Teach yourself statistics

Sampling Distribution of a Proportion

Suppose that we draw all possible random samples of size n from a given population. And within each sample, suppose we count the number of successes (x) and compute a proportion (p), where p = x/n. The probability distribution of this sample proportion is the sampling distribution for the proportion.

The AP Statistics curriculum describes three different ways to represent the sampling distribution of a proportion: (1) as a binomial distribution, (2) as a normal approximation to the binomial without a continuity correction, and (3) as a normal approximation to the binomial with a continuity correction. In this lesson, we'll cover the binomial distribution. We'll look at normal approximations in the next lesson.

The Binomial Distribution

The binomial distribution models the number of successes (x) in a fixed number of trials (n), where each trial has two possible outcomes (success or failure) and each trial is independent. The binomial distribution provides an exact probability (not an approximation) for every sample outcome; that is, for every sample proportion (p), where p = x/n.

How to Display Probability

Probabilities for every possible value that a sample proportion can take are typically displayed in a table or a histogram.

Here’s a simple example. Suppose we wanted to know the proportion of families that own dogs in a city of 100,000 families. If we surveyed every family in the city, we might find that 40% own dogs, so the actual proportion of dog owners in the population is 0.4. It would be impractical to survey every single family; but we could sample a subset of families to estimate the proportion of dog owners. If we randomly selected two families for our sample, we could observe three possible outcomes.

If nobody in our two-family sample owned a dog, the estimated sample proportion would be 0.
If one of two families in the sample owned a dog, the estimated sample proportion would be 0.5.
And if both families in the sample owned a dog, the estimated sample proportion would be 1.

Given these three possible outcomes and knowing that the true proportion of dog-owning families is 0.4, we can display a sampling distribution for this study in a table or a histogram, as shown below.

[Table: sampling distribution of a proportion]

[Histogram: sampling distribution of a proportion]

This table and this histogram are both examples of sampling distributions, because both show probabilities for each possible sample outcome. From the table and the histogram, we see there is a 36% probability that the sample proportion will be 0; a 48% probability that the sample proportion will be 0.5; and a 16% probability that the sample proportion will be 1.
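The probabilities quoted above can be reproduced with a short calculation. The following is a minimal Python sketch, assuming the binomial formula introduced later in this lesson; the function name `sampling_distribution` is our own and used only for illustration.

```python
from math import comb

def sampling_distribution(n, P):
    """Map each possible sample proportion x/n to its binomial probability."""
    return {x / n: comb(n, x) * P**x * (1 - P)**(n - x) for x in range(n + 1)}

# Two-family sample from a population where 40% of families own dogs
dist = sampling_distribution(n=2, P=0.4)
for proportion, probability in dist.items():
    print(f"p = {proportion:.1f}   probability = {probability:.2f}")
```

The three printed probabilities (0.36, 0.48, and 0.16) add up to 1, as they must for any probability distribution.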

How to Calculate Binomial Probability

Suppose you select a random sample of observations and count the proportion of successes in the sample. Finding the probability for any possible sample outcome is a two-step process.

Define the problem.
- P = Population proportion (probability of success)
- n = Sample size
- x = Number of successes in the sample
- p = Proportion of successes in the sample; that is, p = x/n
Use the binomial formula to compute the probability of any sample outcome. Here are three equivalent equations for the binomial formula:

\[ P(X = x) = \binom{n}{x} P^x (1 - P)^{n - x} \]

or

\[ P(X = x) = {}_nC_x \, P^x (1 - P)^{n - x} \]

or

\[ P(X = x) = \frac{n!}{x! \, (n - x)!} \, P^x (1 - P)^{n - x} \]

Note: If you take the AP Statistics exam, the first formula is the one that is provided to you at the exam. So, if you take the exam, it would be prudent to know how to compute binomial probability from that formula.
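To see that the different forms of the binomial formula agree, here is a minimal Python sketch. The function names are hypothetical, and the example values (n = 5, P = 0.5, x = 3) are chosen only for illustration.

```python
from math import comb, factorial

def binomial_pmf(x, n, P):
    """P(X = x) using the combination form of the binomial formula."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

def binomial_pmf_factorial(x, n, P):
    """P(X = x) using the factorial form of the binomial formula."""
    n_choose_x = factorial(n) // (factorial(x) * factorial(n - x))
    return n_choose_x * P**x * (1 - P)**(n - x)

# Probability of exactly 3 successes in 5 trials when P = 0.5
pmf_comb = binomial_pmf(3, 5, 0.5)
pmf_fact = binomial_pmf_factorial(3, 5, 0.5)
print(f"{pmf_comb:.4f}  {pmf_fact:.4f}")  # prints 0.3125  0.3125
```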

What Is a Combination?

One of the terms in the binomial formula is a combination. If you are not familiar with combinations or with notation for combinations, here is an explanation. A combination represents the number of ways to choose r items from a set of n items without regard to order. The notation for a combination is typically written as:

\[ \binom{n}{r} \quad \text{or} \quad {}_nC_r \]

This is read as "n choose r." And the formula for a combination is:

\[ \binom{n}{r} = {}_nC_r = \frac{n(n - 1)(n - 2) \cdots (n - r + 1)}{r!} \]

\[ \binom{n}{r} = {}_nC_r = \frac{n!}{r! \, (n - r)!} \]

where r! is the factorial of r, and (n−r)! is the factorial of (n−r).
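Both versions of the combination formula give the same count. The sketch below checks this in Python against the standard library function `math.comb`; the two helper function names are our own.

```python
from math import comb, factorial

def n_choose_r_product(n, r):
    """Compute n(n - 1)(n - 2)...(n - r + 1) / r!"""
    numerator = 1
    for k in range(r):
        numerator *= n - k
    return numerator // factorial(r)

def n_choose_r_factorial(n, r):
    """Compute n! / [ r! (n - r)! ]"""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Number of ways to choose 2 items from a set of 5
print(n_choose_r_product(5, 2), n_choose_r_factorial(5, 2), comb(5, 2))  # prints 10 10 10
```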

Advantages and Disadvantages

When working with the sampling distribution of a proportion, you have two main options for calculating probabilities: the binomial distribution (which we are covering in this lesson) and the normal approximation (which we will cover in the next lesson). Here are the advantages and disadvantages of the binomial distribution.

Advantages of the Binomial

Exact probabilities. The binomial distribution provides exact probabilities for sample outcomes, whereas the normal approximation gives approximate probabilities.
No need for continuity correction. Since the binomial distribution is discrete, you don't need to apply a continuity correction (unlike the normal approximation, which requires it for better accuracy).
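Works for extreme values of P. The binomial distribution remains exact for extreme values of P (close to 0 or 1). The normal approximation may fail there, because the binomial distribution is skewed when P is extreme, whereas the normal distribution is symmetric.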
Works for small samples. The binomial distribution gives accurate results for any sample size, whereas the normal approximation requires a larger sample for good results.
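To make the last two points concrete, the sketch below compares an exact binomial probability with its normal approximation for a small sample and an extreme P. This is an illustrative sketch only: the values n = 20 and P = 0.05 are our own, and the normal approximation (mean nP, standard deviation √(nP(1 − P))) is covered properly in the next lesson.

```python
from math import comb, erf, sqrt

def binomial_cdf(x_max, n, P):
    """Exact P(X <= x_max) from the binomial formula."""
    return sum(comb(n, i) * P**i * (1 - P)**(n - i) for i in range(x_max + 1))

def normal_cdf(value, mean, sd):
    """P(Y <= value) for a normal variable with the given mean and sd."""
    return 0.5 * (1 + erf((value - mean) / (sd * sqrt(2))))

n, P = 20, 0.05                 # small sample, extreme P (illustrative values)
mean = n * P
sd = sqrt(n * P * (1 - P))

exact = binomial_cdf(0, n, P)            # exact probability of zero successes
approx = normal_cdf(0, mean, sd)         # normal approximation, no continuity correction
approx_cc = normal_cdf(0.5, mean, sd)    # normal approximation, with continuity correction

print(f"exact binomial:                  {exact:.4f}")
print(f"normal, no continuity correction: {approx:.4f}")
print(f"normal, continuity correction:    {approx_cc:.4f}")
```

With these values, the exact probability is about 0.36, and both normal approximations fall noticeably below it.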

Disadvantages of the Binomial

Computational complexity. For large sample sizes, calculating binomial probabilities can be demanding, especially without a calculator or software.
May be harder to use. For some applications (e.g., confidence intervals and hypothesis testing), the normal approximation simplifies calculations significantly.

Test Your Understanding

Problem 1

Suppose it were possible to take a simple random sample of 120 newborns. Find the probability that no more than 40% will be boys. Assume equal probabilities for the births of boys and girls (i.e., P = 0.5).

Solution: The binomial distribution provides an exact probability for the sampling distribution of a proportion. Finding the probability for any particular outcome is a two-step process:

Define the problem. Here we know the following:
- P = Population proportion = 0.5
- n = Sample size = 120
- p = Proportion of successes in the sample = 0.4
- x = Number of successes in the sample = 0.4 * 120 = 48
Given the problem definition, find the probability that no more than 40% of newborns will be boys. This is equivalent to finding the probability that no more than 48 of the sampled newborns will be boys. We can use the binomial formula to find that probability.
P(X ≤ 48) = \(\sum_{i=0}^{48} \binom{120}{i} P^i (1 - P)^{120 - i}\)

P(X ≤ 48) = P(X = 0) + P(X = 1) + ... + P(X = 47) + P(X = 48)

P(X ≤ 48) = 0.0 + 0.0 + ... + 0.00435 + 0.00662

P(X ≤ 48) = 0.01766

Finding this cumulative binomial probability is computationally complex. It requires computing 49 individual binomial probabilities (as shown above). It can be done by hand, but it is much easier to use modern technology (e.g., a graphing calculator, Excel, or other statistical software). Below, we used the Binomial Calculator found on this site:

Screenshot of Binomial Calculator
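As an alternative to an online calculator, the 49-term sum can also be evaluated with a few lines of code. The following is a minimal Python sketch, not the calculator's own method; the function name `binomial_cdf` is hypothetical.

```python
from math import comb

def binomial_cdf(x_max, n, P):
    """P(X <= x_max): sum the binomial formula over x = 0, 1, ..., x_max."""
    return sum(comb(n, i) * P**i * (1 - P)**(n - i) for i in range(x_max + 1))

# Problem 1: n = 120 newborns, P = 0.5, no more than 48 boys
prob = binomial_cdf(48, 120, 0.5)
print(f"P(X <= 48) = {prob:.5f}")  # compare with the 0.01766 reported in this lesson
```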

In the next lesson, we use a normal approximation of the binomial to describe the sampling distribution of a proportion. The normal approximation avoids the computational complexity associated with the binomial formula.

Note: Back in the 20th century, before binomial calculators were widely available, the binomial distribution was seldom used in applied work. Analysts used the normal approximation to the binomial distribution instead. Today, with modern technology, the normal approximation is less necessary. However, it is still in use and is a topic in the AP Statistics curriculum, so we include it in this tutorial.
