Stat Trek

Teach yourself statistics

Sampling Distribution of a Proportion

Suppose that we draw all possible random samples of size n from a given population. Suppose further that we compute a proportion for each sample. The probability distribution of this statistic is the sampling distribution for the proportion.

In this lesson, you'll learn how to find the mean of the sampling distribution, how to compute the standard deviation of the sampling distribution, how to compute the standard error of the sampling distribution, and how to find the cumulative probability that a sample proportion will be less than or equal to some critical value, which we call d.

How to Represent Sampling Distribution

The sampling distribution of a proportion (or any discrete variable) is typically represented by a table or a histogram.

Here’s a simple example. Suppose we wanted to know the proportion of families that own dogs in a city of 100,000 families. If we surveyed every family in the city, we might find that 40% own dogs, so the actual proportion of dog owners in the population is 0.4.

It would be impractical to survey every single family; but we could sample a subset of families to estimate the proportion of dog owners. If we randomly selected two families for our sample, we could observe three possible outcomes. If nobody in our two-family sample owned a dog, the estimated sample proportion would be 0. If one family in the sample owned a dog, the estimated sample proportion would be 0.5. And if both families in the sample owned a dog, the estimated sample proportion would be 1. Given these three possible outcomes, a sampling distribution for this study might take the form of a table or a histogram, as shown below.

Table showing sampling distribution of a proportion

Sample proportion    Probability
0                    0.36
0.5                  0.48
1                    0.16

Histogram showing sampling distribution of a proportion

This table and this histogram are both examples of sampling distributions, because both show probabilities for each possible sample outcome. From the table, we see there is a 36% probability that the sample proportion will be 0; a 48% probability that the sample proportion will be 0.5; and a 16% probability that the sample proportion will be 1. That covers every possible outcome, since this study yields only three possible outcomes: 0, 0.5, or 1. The histogram shows the same three probabilities as bar heights.
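The probabilities in this example can be reproduced with a short calculation. The sketch below assumes each sampled family is an independent draw with a 0.4 chance of owning a dog (a reasonable simplification when the population is much larger than the sample); the function name `sampling_distribution` is made up for illustration.

```python
from math import comb

def sampling_distribution(n, P):
    """Return {sample proportion: probability} for samples of size n.

    Assumes independent draws with success probability P, so the number
    of successes follows a binomial distribution.
    """
    return {k / n: comb(n, k) * P**k * (1 - P)**(n - k) for k in range(n + 1)}

# Dog-ownership example: two families, population proportion 0.4
dist = sampling_distribution(n=2, P=0.4)
for proportion, prob in dist.items():
    print(f"sample proportion = {proportion:.1f}: probability = {prob:.2f}")
```

Running it with n = 2 and P = 0.4 gives the 0.36, 0.48, and 0.16 shown in the table.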

What is the Effect of Sample Size?

Suppose we sampled more than two households in our dog-ownership study. The histograms below show sampling distributions for four different sample sizes: n=2, n=5, n=10, and n=25.

Histograms showing effect of sample size on sampling distribution of a proportion

From the histograms, we see two effects of interest.

  • As sample size increases, the histograms grow narrower, more closely concentrated around the population proportion of 0.4. This reflects greater precision of sample estimates with increased sample size.
  • As sample size increases, the histograms become increasingly more bell-shaped, like a normal distribution (as illustrated below by the green normal curve superimposed over the last histogram from the series above).
Histogram showing normal curve on sampling distribution of a proportion
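The narrowing effect can also be seen numerically. The sketch below uses the large-population standard deviation formula that is introduced later in this lesson, sqrt[ P(1 - P)/n ], and evaluates it for the four sample sizes in the histograms; the helper name `sd_of_proportion` is hypothetical.

```python
from math import sqrt

P = 0.4  # population proportion from the dog-ownership example

def sd_of_proportion(P, n):
    # Large-population form of the standard deviation: sqrt(P * (1 - P) / n)
    return sqrt(P * (1 - P) / n)

# The spread shrinks as the sample size grows
for n in (2, 5, 10, 25):
    print(f"n = {n:2d}: standard deviation = {sd_of_proportion(P, n):.3f}")
```

The printed values fall steadily as n increases, which matches the narrowing histograms.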

The tendency of a sampling distribution to approximate a normal distribution has implications for statistical analysis. When the approximation is sufficiently close, we can use the normal distribution to test hypotheses about proportions and to express confidence intervals around proportions – things you will learn to do in future lessons. For now, let's just answer the question: Under what conditions can we safely assume the sampling distribution of a proportion will be approximately normal in shape?

When is Distribution Normal?

It is safe to assume that the shape of the sampling distribution for a proportion will be approximately normal when the following conditions are true:

  • Population size (N) is at least 10 times sample size (n).
  • The sampling method is simple random sampling.
  • n * p ≥ 10, where p is the sample proportion.
  • n * (1 - p) ≥ 10.

Note: When the sample proportion p equals 0.5, the last two conditions require that at least 20 observations be sampled from a population for the sampling distribution to be approximately normal. When the sample proportion p is farther from 0.5 (closer to 0 or 1), more observations are required.
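The numeric conditions above are easy to check in code. This is a minimal sketch, not a formal test: the function name and the `threshold` parameter are hypothetical, and the simple-random-sampling condition cannot be verified numerically, so it is assumed to hold.

```python
def is_approximately_normal(n, p, N=float("inf"), threshold=10):
    """Check the rule-of-thumb conditions from this lesson.

    n: sample size, p: proportion, N: population size (infinite by default).
    Simple random sampling is assumed; it cannot be checked here.
    """
    population_ok = N >= 10 * n             # N at least 10 times n
    successes_ok = n * p >= threshold       # n * p >= 10
    failures_ok = n * (1 - p) >= threshold  # n * (1 - p) >= 10
    return population_ok and successes_ok and failures_ok

print(is_approximately_normal(n=20, p=0.5))         # conditions met
print(is_approximately_normal(n=20, p=0.4))         # n * p = 8, too small
print(is_approximately_normal(n=20, p=0.5, N=150))  # population too small
```

The first call illustrates the note above: with p = 0.5, a sample of 20 is the smallest that satisfies both of the last two conditions.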

Standard Deviation of the Sampling Distribution

In a population of size N, suppose that each element can be characterized as a "success" or a "failure". The proportion of successes in the population is P; and the proportion of failures is Q. From this population, suppose that we draw all possible simple random samples of size n. And finally, within each sample, suppose that we determine the proportion of successes p and failures q. In this way, we create a sampling distribution of the proportion.

The standard deviation of the sampling distribution (σp) is determined by the population proportion P, the population size N, and the sample size n, as shown below:

σp = sqrt[ PQ/n ] * sqrt[ (N - n ) / (N - 1) ]

where

Q = 1 - P

When the population size is very large relative to the sample size, the standard deviation formula can be approximated by:

σp = sqrt[ PQ/n ] = sqrt[ P*(1-P)/n ]

You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
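To see how little the two formulas differ when the sample is small relative to the population, here is a sketch that computes both. The function names are hypothetical; the inputs reuse the dog-ownership figures (P = 0.4, N = 100,000) with a sample of 25, plus a deliberately large sample of 50,000 for contrast.

```python
from math import sqrt

def sd_exact(P, n, N):
    # sqrt[ PQ/n ] * sqrt[ (N - n) / (N - 1) ]
    return sqrt(P * (1 - P) / n) * sqrt((N - n) / (N - 1))

def sd_approx(P, n):
    # sqrt[ PQ/n ], valid when N is very large relative to n
    return sqrt(P * (1 - P) / n)

N = 100_000
for n in (25, 50_000):
    print(f"n = {n}: exact = {sd_exact(0.4, n, N):.5f}, "
          f"approximate = {sd_approx(0.4, n):.5f}")
```

With n = 25 the two results are nearly identical. With n = 50,000 (half the population) the second factor, sqrt[ (N - n) / (N - 1) ], shrinks the standard deviation noticeably, so the approximate formula should not be used.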

Standard Error of the Sampling Distribution

Typically, we don't know the value for population parameter P. And, if we don't know P, we cannot compute the standard deviation of the sampling distribution (σp).

However, we do know the sample proportions p and q. Substituting p and q into the equation for σp, we get:

SEp = sqrt[ pq/n ] * sqrt[ (N - n ) / (N - 1) ]

where

q = 1 - p

In this equation, p is the sample estimate of P, q is the sample estimate of Q, and SEp is the standard error of the sampling distribution of the proportion. The standard error (SEp) is a sample estimate of the standard deviation (σp) of the sampling distribution of a proportion.

And when the population size is very large relative to the sample size, the standard error formula can be approximated by:

SEp = sqrt[ pq/n ] = sqrt[ p*(1-p)/n ]

In future lessons, you will see that being able to compute the standard error from sample data is essential for inferential statistics. It will allow us to compute confidence intervals for proportions and to test hypotheses about proportions.
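As an illustration, the standard error can be computed directly from sample counts. In this sketch the function name and the sample (9 dog owners among 25 families) are hypothetical; the population-size argument `N` is optional and, when omitted, the large-population formula is used.

```python
from math import sqrt

def standard_error(successes, n, N=None):
    """Standard error of a sample proportion.

    successes: number of successes in the sample
    n: sample size
    N: population size; if None, the large-population formula is used
    """
    p = successes / n            # sample proportion
    se = sqrt(p * (1 - p) / n)   # sqrt[ pq/n ]
    if N is not None:
        se *= sqrt((N - n) / (N - 1))  # correction for a finite population
    return se

# Hypothetical sample: 9 of 25 families own dogs, so p = 0.36
print(round(standard_error(9, 25), 4))             # 0.096
print(round(standard_error(9, 25, N=100_000), 4))  # nearly the same
```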

How to Find Probability

When the sampling distribution of a proportion is approximately normal in shape, you can use the normal distribution to find a cumulative probability for any sample proportion. Specifically, you can find:

P(p ≤ d)

where p is a sample proportion and d is a constant called the critical value. Finding the probability that a sample proportion will be no greater than the critical value d is a four-step process:

Step 1: Find Mean of Sampling Distribution

When the sampling distribution is approximately normal in shape, it will be symmetric and centered over the population proportion P. Therefore, the mean of the sampling distribution of a sample proportion will equal P. Thus,

Ps = P

where Ps is the mean of the sampling distribution and P is the population proportion.

Step 2: Find Standard Deviation

Earlier in this lesson (see above), we explained how to compute standard deviation of the sampling distribution when you know the population proportion. And we showed how to estimate the standard deviation with the standard error when you don't know the population proportion. You can use these formulas for standard deviation and standard error:

σp = sqrt[ P(1 - P)/n ] * sqrt[ (N - n ) / (N - 1) ]

SEp = sqrt[ p(1 - p)/n ] * sqrt[ (N - n ) / (N - 1) ]

When population size is big relative to sample size, you can use these simpler formulas to get a good approximation of standard deviation and standard error:

σp = sqrt[ P(1 - P)/n ]

SEp = sqrt[ p(1 - p)/n ]

where σp is the standard deviation of the sampling distribution, SEp is the standard error, P is the population proportion, p is the sample estimate of the population proportion, N is population size, and n is sample size.

Step 3: Transform d Into z-Score

If you know the standard deviation of the sampling distribution, compute a z-score using this formula:

z = (d – Ps) / σp

If you know the standard error, use this formula:

z = (d – Ps) / SEp

where d is the critical value for which we want to find a probability, Ps is the mean of the sampling distribution, σp is the standard deviation of the sampling distribution, and SEp is the standard error of the sampling distribution.

Step 4: Find Probability

Find the probability for the z-score you calculated in Step 3; and you have found the probability that a sample proportion will be no greater than the critical value, d.

You can find the probability for the z-score from a handheld graphing calculator, from a written probability table commonly found in the appendix of introductory statistics texts, or from an online probability calculator, like Stat Trek's normal distribution calculator.
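The four steps can also be sketched in code. The example below is one possible way to do it, not the calculator's own method: it computes the normal cumulative probability from the error function in Python's standard library, and the names `normal_cdf` and `prob_proportion_at_most` as well as the example inputs (d = 0.3, P = 0.4, n = 100) are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Cumulative probability of the standard normal distribution
    return 0.5 * (1 + erf(z / sqrt(2)))

def prob_proportion_at_most(d, P, n, N=None):
    """Approximate P(p <= d), assuming the sampling distribution is
    approximately normal and the population proportion P is known."""
    mean = P                               # Step 1: mean of sampling distribution
    sd = sqrt(P * (1 - P) / n)             # Step 2: standard deviation
    if N is not None:
        sd *= sqrt((N - n) / (N - 1))      # finite-population version
    z = (d - mean) / sd                    # Step 3: transform d into a z-score
    return normal_cdf(z)                   # Step 4: find the probability

# Hypothetical inputs: how likely is a sample proportion of 0.3 or less
# when P = 0.4 and n = 100?
print(round(prob_proportion_at_most(d=0.3, P=0.4, n=100), 4))
```

If only sample data are available, the same steps apply with the standard error in place of the standard deviation in Step 2.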

Test Your Understanding

In this section, we work through a sample problem to illustrate how to find probability when the sampling distribution of a sample proportion is shaped approximately like a normal distribution. We will use the Normal Distribution Calculator to compute probability.

Normal Distribution Calculator

The normal calculator solves common statistical problems, based on the normal distribution. The calculator computes cumulative probabilities, based on three simple inputs. Simple instructions guide you to an accurate solution, quickly and easily. If anything is unclear, frequently-asked questions and sample problems provide straightforward explanations. The calculator is free. It can be found in the Stat Trek main menu under the Stat Tools tab.

Problem 1

Suppose it were possible to take a simple random sample of 120 newborns. Find the probability that no more than 40% will be boys. Assume equal probabilities for the births of boys and girls. Assume also that the number of births in the population (N) is very large, essentially infinite.

Solution:

This problem satisfies the conditions that allow us to assume the sampling distribution is approximately normal.

  • Population size (N = ∞) is at least 10 times sample size (n = 120).
  • The sampling method is simple random sampling.
  • n * p ≥ 10. With p taken as the assumed proportion of boys (0.5), n * p = 120 * 0.5 = 60.
  • n * (1 - p) ≥ 10. Likewise, n * (1 - p) = 120 * 0.5 = 60.

Therefore, we can use the four-step solution to find probability.

  • Step 1. Find the mean of the sampling distribution. In the population, 50% of births are boys; and the mean of the sampling distribution (μs, written as Ps earlier in this lesson) equals the population proportion P, so:

    μs = P

    μs = 0.50

  • Step 2. Find the standard deviation of the sampling distribution. Since sample size (n = 120) is much smaller than the population size (very large) we can use the simpler formula for standard deviation:

    σp = sqrt[ P(1 - P)/n ]
    σp = sqrt[ (0.5) * (1 - 0.5) / 120 ] = 0.0456

  • Step 3. Transform d into a z-score. In this problem, d is 0.40, the critical value for which we want to find a cumulative probability; and the z-score formula is:

    z = (d - μs)/σp = (0.40 - 0.50)/0.0456 = -2.19

  • Step 4. Find the probability. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: -2.19, for the z-score; 0, for the mean; and 1, for the standard deviation. (It is not necessary to compute the mean or standard deviation of the z-score, because every z-score has a mean of 0 and a standard deviation of 1.)

The Calculator tells us that the probability that the proportion of male births in our sample will be no greater than 0.40 is about 0.014. Not very likely.

Note: This problem can also be treated as a binomial experiment. In a previous lesson, we explained how to analyze a binomial experiment, and we showed how to solve this problem when it is treated as a binomial experiment. The binomial experiment is actually the more exact analysis. When this problem is treated as a binomial experiment, we find a probability of 0.01766 (versus a probability of about 0.014 that we found using the normal distribution).
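The two approaches to Problem 1 can be compared side by side. This is a rough sketch using only Python's standard library: the normal probability comes from the error function rather than the Stat Trek calculator, the binomial probability sums the chances of 0 through 48 boys (48 is 40% of 120), and the variable names are made up for illustration.

```python
from math import comb, erf, sqrt

n, P, d = 120, 0.5, 0.40   # sample size, population proportion, critical value

# Normal approximation: the four-step solution from Problem 1
sd = sqrt(P * (1 - P) / n)                       # about 0.0456
z = (d - P) / sd                                 # about -2.19
normal_prob = 0.5 * (1 + erf(z / sqrt(2)))

# Exact binomial: probability of 48 or fewer boys in 120 births.
# With P = 0.5, every sequence of births has probability 1 / 2**n.
max_boys = 48                                    # 40% of 120
exact_prob = sum(comb(n, k) for k in range(max_boys + 1)) / 2**n

print(f"normal approximation: {normal_prob:.4f}")
print(f"exact binomial:       {exact_prob:.4f}")
```

The exact binomial result comes out somewhat higher than the normal approximation, in line with the comparison in the note above.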

The use of the normal distribution to estimate binomial probabilities is called the normal approximation to the binomial distribution. The normal approximation to the binomial distribution was used more in the 20th century, before binomial calculators were widely available, than it is used today. It is still a topic in the AP Statistics curriculum, so we include it in this tutorial.