Stat Trek

Teach yourself statistics


Sampling Distribution of a Proportion

Suppose that we draw all possible random samples of size n from a given population. Suppose further that we compute a proportion for each sample. The probability distribution of this statistic is the sampling distribution for the proportion.
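Enumerating all possible samples is rarely practical, but the idea can be approximated by simulation. The sketch below draws many random samples and records the proportion of successes in each one. It assumes an effectively infinite population, and the function name, the seed, and the values P = 0.5, n = 120, and 2000 samples are illustrative choices rather than part of the lesson.

```python
import random
import statistics


def simulate_sample_proportions(P, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample's proportion.

    Assumes an effectively infinite population, so every observation is an
    independent success with probability P.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(rng.random() < P for _ in range(n))
        proportions.append(successes / n)
    return proportions


props = simulate_sample_proportions(P=0.5, n=120, num_samples=2000)
print("mean of sample proportions:", statistics.mean(props))
print("std dev of sample proportions:", statistics.pstdev(props))
```

The simulated proportions should cluster around P, which is the behavior the sampling distribution describes.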

Shape of Sampling Distribution

It is safe to assume that the shape of the sampling distribution for a proportion will be approximately normal when the following conditions are true:

  • Population size (N) is at least 10 times sample size (n).
  • The sampling method is simple random sampling.
  • n * p ≥ 10, where p is the sample proportion.
  • n * (1 - p) ≥ 10.

Note: Taken together, the last two conditions require that at least 20 observations be sampled from the population. The farther the sample proportion p is from 0.5, the more observations are required.
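The numeric conditions above can be checked with a few lines of code. This is a minimal sketch: the function name is hypothetical, passing N=None is an assumed convention for an effectively infinite population, and the simple random sampling condition cannot be verified from numbers alone.

```python
def normal_approx_ok(n, p, N=None):
    """Check the lesson's numeric rules of thumb for a normal approximation.

    n: sample size, p: proportion, N: population size (None = very large).
    Simple random sampling is assumed, not checked.
    """
    large_population = N is None or N >= 10 * n
    enough_successes = n * p >= 10
    enough_failures = n * (1 - p) >= 10
    return large_population and enough_successes and enough_failures


print(normal_approx_ok(120, 0.5))          # True: 60 successes, 60 failures
print(normal_approx_ok(50, 0.1))           # False: only 5 expected successes
print(normal_approx_ok(120, 0.5, N=600))   # False: N is less than 10 * n
```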

Standard Deviation of the Sampling Distribution

In a population of size N, suppose that the probability of the occurrence of an event (dubbed a "success") is P; and the probability of the event's non-occurrence (dubbed a "failure") is Q. From this population, suppose that we draw all possible simple random samples of size n. And finally, within each sample, suppose that we determine the proportion of successes p and failures q. In this way, we create a sampling distribution of the proportion.

The standard deviation of the sampling distribution (σp) is determined by the population proportion P, the population size N, and the sample size n, as shown below:

σp = sqrt[ PQ/n ] * sqrt[ (N - n ) / (N - 1) ]

When the population size is very large relative to the sample size, the standard deviation formula can be approximated by:

σp = sqrt[ PQ/n ]

You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
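To see how little the finite population correction matters when the sample is a small fraction of the population, the two formulas can be compared directly. The sketch below is illustrative only: the function name and the values P = 0.5, n = 100, and N = 2000 (a sample exactly 1/20 of the population) are assumptions, not figures from the lesson.

```python
import math


def sd_proportion(P, n, N=None):
    """Standard deviation of the sampling distribution of a proportion.

    With N given, the finite population correction sqrt((N - n) / (N - 1))
    is applied; with N=None the approximate formula sqrt(PQ/n) is used.
    """
    sd = math.sqrt(P * (1 - P) / n)
    if N is not None:
        sd *= math.sqrt((N - n) / (N - 1))
    return sd


approx = sd_proportion(0.5, 100)          # sqrt(0.25 / 100) = 0.05
exact = sd_proportion(0.5, 100, N=2000)   # slightly smaller than approx
print(round(approx, 5), round(exact, 5))
```

At this 1/20 ratio the two results differ by only a few percent, which is why the approximate formula is usually considered acceptable there.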

Standard Error of the Sampling Distribution

Typically, we don't know the value for population parameter P. And, if we don't know P, we cannot compute the standard deviation of the sampling distribution (σp).

However, we do know the sample proportions p and q. Substituting p and q into the equation for σp, we get:

SEp = sqrt[ pq/n ] * sqrt[ (N - n ) / (N - 1) ]

In this equation, p is the sample estimate of P, q is the sample estimate of Q, and SEp is a sample estimate of σp, the standard deviation of the sampling distribution. SEp is the standard error of the sample proportion.

And when the population size is very large relative to the sample size, the standard error formula can be approximated by:

SEp = sqrt[ pq/n ]
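As a rough illustration, the standard error can be computed from observed counts alone. In this sketch the function name and the data (48 successes in a sample of 120, and a population of 1000 for the corrected version) are hypothetical.

```python
import math


def se_proportion(successes, n, N=None):
    """Standard error of a sample proportion, estimated from sample counts.

    p and q are computed from the sample; the finite population correction
    is applied only when a population size N is supplied.
    """
    p = successes / n
    q = 1 - p
    se = math.sqrt(p * q / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


se_large = se_proportion(48, 120)           # population treated as very large
se_finite = se_proportion(48, 120, N=1000)  # with finite population correction
print(round(se_large, 5), round(se_finite, 5))
```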

In future lessons, you will see that being able to compute the standard error from sample data is essential for inferential statistics. It will allow us to compute confidence intervals for proportions and to test hypotheses about proportions.

Summary of Key Points

The key takeaways from this lesson are summarized below.

  • The sampling distribution for a sample proportion will be approximately normally distributed when:
    • Population size (N) is at least 10 times sample size (n).
    • The sampling method is simple random sampling.
    • n * p ≥ 10, where p is the sample proportion.
    • n * (1 - p) ≥ 10.
  • If population size is large relative to sample size, the standard error of the sampling distribution can be computed from the following formula:

    SEp = sqrt[ pq/n ]

    A population is considered "large" if it is at least 20 times bigger than its sample.

Test Your Understanding

In this section, we work through an example to illustrate how sampling distributions are used to solve common statistical problems. In this problem, the population proportion is known, and the sample size is large. So you can use the Normal Distribution Calculator to compute probabilities.

Normal Distribution Calculator

The normal calculator solves common statistical problems, based on the normal distribution. The calculator computes cumulative probabilities, based on three simple inputs. Simple instructions guide you to an accurate solution, quickly and easily. If anything is unclear, frequently-asked questions and sample problems provide straightforward explanations. The calculator is free. It can be found in the Stat Trek main menu under the Stat Tools tab.

Example 1

Suppose it were possible to take a simple random sample of 120 newborns. Find the probability that no more than 40% will be boys. Assume equal probabilities for the births of boys and girls. Assume also that the number of births in the population (N) is very large, essentially infinite.

Solution:

This problem satisfies the conditions that allow us to assume the sampling distribution is approximately normal.

  • Population size (N = ∞) is at least 10 times sample size (n = 120).
  • The sampling method is simple random sampling.
  • n * p ≥ 10. Here the population proportion is known, so n * p = 120 * 0.5 = 60.
  • n * (1 - p) ≥ 10. Here n * (1 - p) = 120 * 0.5 = 60.

The mean of the sampling distribution will equal the mean of the population distribution. In the population, half of the births result in boys; and half, in girls. Therefore, the probability of boy births in the population is 0.50. Thus, the mean proportion in the sampling distribution should also be 0.50.

The standard deviation of the sampling distribution can be computed using the following formula.

σp = sqrt[ PQ/n ] * sqrt[ (N - n ) / (N - 1) ]

Here, the finite population correction is equal to 1.0, since the population size (N) was assumed to be infinite. Therefore, the standard deviation formula reduces to:

σp = sqrt[ PQ/n ]
σp = sqrt[ (0.5)(0.5)/120 ] = sqrt[0.25/120 ] = 0.04564

Let's review what we know and what we want to know. We know that the sampling distribution of the proportion is approximately normally distributed with a mean of 0.50 and a standard deviation of 0.04564. We want to know the probability that no more than 40% of the sampled births are boys.

Because the sampling distribution is approximately normal, we'll use the normal distribution to find the probability that no more than 40% of sampled births are boys. To find the probability, we plug these inputs into the Normal Distribution Calculator: mean of sampling distribution = 0.5, standard deviation of sampling distribution = 0.04564, and the raw score (i.e., the sample proportion) = 0.4.

The Calculator tells us that the probability that no more than 40% of the sampled births are boys is equal to 0.01422.
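The same calculation can be reproduced without the online calculator. The sketch below uses NormalDist from Python's standard library (available in Python 3.8 and later) as a stand-in for the Normal Distribution Calculator; the variable names are illustrative.

```python
import math
from statistics import NormalDist

P = 0.5   # population proportion of boys
n = 120   # sample size

# Population is treated as infinite, so no finite population correction.
sigma_p = math.sqrt(P * (1 - P) / n)

# Cumulative probability that the sample proportion is at most 0.40.
prob = NormalDist(mu=P, sigma=sigma_p).cdf(0.40)

print(round(sigma_p, 5))  # 0.04564
print(round(prob, 4))     # 0.0142
```

The result should agree with the calculator to about four decimal places; tiny differences come from rounding the standard deviation before entering it.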

Note: This problem can also be treated as a binomial experiment. In a previous lesson, we explained how to analyze a binomial experiment, and we showed how to solve this problem when it is treated as a binomial experiment. The binomial experiment is actually the more exact analysis. When this problem is treated as a binomial experiment, we find a probability of 0.01766 (versus a probability of 0.014 that we found using the normal distribution).
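For comparison, the exact binomial probability can be computed by summing the binomial terms directly. This is a minimal sketch that assumes "no more than 40%" of 120 births means at most 48 boys; the variable names are illustrative.

```python
from math import comb

n = 120        # sample size
max_boys = 48  # 40% of 120

# With P = 0.5, each outcome with k boys has probability C(n, k) / 2**n,
# so P(X <= 48) is the sum of C(n, k) for k = 0..48, divided by 2**n.
binomial_prob = sum(comb(n, k) for k in range(max_boys + 1)) / 2**n

print(round(binomial_prob, 5))
```

The printed value should land close to the 0.01766 quoted above, slightly higher than the normal approximation.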

The use of the normal distribution to estimate binomial probabilities is called the normal approximation to the binomial distribution. The normal approximation to the binomial distribution was used more in the 20th century, before binomial calculators were widely available, than it is used today. It is still a topic in the AP Statistics curriculum, so we include it in this tutorial.