Stat Trek

Teach yourself statistics

Sampling Distribution: Difference Between Proportions

Statistics problems often involve comparisons between sample proportions from two independent populations. This lesson describes the sampling distribution for the difference between sample proportions.

In this lesson, you'll learn how to find the mean of the sampling distribution, how to compute the standard deviation of the sampling distribution, how to compute the standard error of the sampling distribution, and how to find the cumulative probability that the difference between sample proportions will be less than or equal to some critical value, which we call d.

Sampling Distribution: Difference Between Proportions

Suppose we have two populations with proportions equal to P1 and P2. Suppose further that we take all possible samples of size n1 and n2. And finally, suppose that the following assumptions are valid.

  • The samples from each population are big enough to justify using a normal distribution to model differences between proportions. The sample sizes will be big enough when the following conditions are met:
    • n1P1 > 10
    • n1(1 - P1) > 10
    • n2P2 > 10
    • n2(1 - P2) > 10
    When P1 and P2 each equal 0.5, this criterion requires that more than 20 observations be sampled from each population. When P1 or P2 is farther from 0.5, even more observations are required.
  • The samples are independent; that is, observations selected from population 1 are not affected by observations selected from population 2, and vice versa.
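The sample-size conditions above can be checked mechanically. The following is a minimal sketch; the function name `sample_sizes_large_enough` and the `threshold` parameter are illustrative choices, not part of the lesson.

```python
def sample_sizes_large_enough(n1, P1, n2, P2, threshold=10):
    """Return True when all four expected counts exceed the threshold."""
    counts = [n1 * P1, n1 * (1 - P1), n2 * P2, n2 * (1 - P2)]
    return all(count > threshold for count in counts)


# 100 observations per sample with proportions near 0.5 pass easily.
print(sample_sizes_large_enough(100, 0.52, 100, 0.47))  # True

# 20 * 0.5 = 10, which is not strictly greater than 10.
print(sample_sizes_large_enough(20, 0.5, 20, 0.5))  # False
```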

Given these assumptions, we know the following about the sampling distribution for the difference between sample proportions.

  • The sampling distribution for the difference between independent sample proportions will be approximately normally distributed.
  • The expected value of the difference between all possible sample proportions is equal to the difference between population proportions. Thus, the mean of the sampling distribution for the difference between sample proportions is:

    μd = E(p1 - p2) = P1 - P2

    where μd is the mean of the sampling distribution, p1 and p2 are sample proportions, and P1 and P2 are population proportions.
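One way to build intuition for μd = P1 - P2 is a small simulation. The sketch below draws many pairs of independent samples and averages the observed differences. The proportions, sample sizes, replication count, and seed are assumptions chosen for illustration, and a simulation only approximates "all possible samples".

```python
import random


def simulate_mean_difference(P1, n1, P2, n2, replications=2000, seed=0):
    """Average p1 - p2 over many simulated pairs of independent samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        # Each observation is a "success" with probability P1 (or P2).
        p1 = sum(rng.random() < P1 for _ in range(n1)) / n1
        p2 = sum(rng.random() < P2 for _ in range(n2)) / n2
        total += p1 - p2
    return total / replications


estimate = simulate_mean_difference(0.52, 100, 0.47, 100)
print(f"simulated mean of p1 - p2: {estimate:.4f} (theory: {0.52 - 0.47:.4f})")
```

The simulated average should land close to P1 - P2, with the gap shrinking as the number of replications grows.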

Standard Deviation of Sampling Distribution

When population sizes are large relative to sample sizes, the standard deviation of the difference between sample proportions (σd) is approximately equal to:

σd = sqrt{ [P1(1 - P1) / n1] + [P2(1 - P2) / n2] }

It is straightforward to derive this equation, based on material covered in previous lessons. The derivation starts with a recognition that the variance of the difference between independent random variables is equal to the sum of the individual variances. Thus,

σd² = Var(p1 - p2) = σ1² + σ2²

where σ1² and σ2² are the variances of the sample proportions p1 and p2. If the populations N1 and N2 are both large relative to n1 and n2, respectively, then

σ1² = P1(1 - P1) / n1       and       σ2² = P2(1 - P2) / n2

In this context, a population is considered to be "large" relative to a sample if it is at least 20 times bigger than the sample.

Therefore,

σd² = [ P1(1 - P1) / n1 ] + [ P2(1 - P2) / n2 ]

and

σd = sqrt{ [ P1(1 - P1) / n1 ] + [ P2(1 - P2) / n2 ] }

Bottom line: We can use the formula above to compute the standard deviation of the sampling distribution for the difference between sample proportions if:

  • N1 is large relative to n1, and N2 is large relative to n2.
  • We know each sample size (n1 and n2).
  • We know the population proportions (P1 and P2).
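The formula for σd translates directly into code. This is a minimal sketch that assumes both populations are large relative to their samples; the function name `sd_difference` is illustrative.

```python
import math


def sd_difference(P1, n1, P2, n2):
    """Standard deviation of p1 - p2 for known population proportions."""
    return math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)


# Population proportions 0.52 and 0.47, with 100 observations per sample.
print(round(sd_difference(0.52, 100, 0.47, 100), 4))  # 0.0706
```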

Standard Error of Sampling Distribution

Typically, we don't know the values for population parameters P1 or P2. And, if we don't know P1 and P2, we cannot compute the standard deviation of the difference between sample proportions (σd).

However, we can compute sample estimates of P1 and P2 from sample data. Substituting those estimates into the equation for σd, we get:

SEd = sqrt{ [ p1(1 - p1) / n1 ] + [ p2(1 - p2) / n2 ] }

In this equation, p1 is the sample estimate of P1, p2 is the sample estimate of P2, and SEd is a sample estimate of σd, the standard deviation of the difference between sample proportions. SEd is the standard error of the difference between sample proportions.

Reminder: This formula for standard error assumes that N1 is large relative to n1, and N2 is large relative to n2.
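In practice the standard error is computed from observed counts. The sketch below assumes hypothetical survey results (55 successes out of 100 in the first sample, 45 out of 100 in the second). These numbers are made up for illustration and do not come from the lesson.

```python
import math


def se_difference(x1, n1, x2, n2):
    """Standard error of p1 - p2, estimated from sample success counts."""
    p1 = x1 / n1  # sample estimate of P1
    p2 = x2 / n2  # sample estimate of P2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical data: 55 of 100 in sample 1, 45 of 100 in sample 2.
print(round(se_difference(55, 100, 45, 100), 4))  # 0.0704
```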

In future lessons, you will see that being able to compute the standard error from sample data is essential for inferential statistics. It will allow us to compute confidence intervals for the difference between proportions and to test hypotheses about the difference between proportions.

How to Find Probability

When the sampling distribution for the difference between sample proportions is approximately normal in shape, you can use the normal distribution to find a cumulative probability for any difference in independent sample proportions. Specifically, you can find:

P(p1 - p2 ≤ d)

where p1 is a sample proportion from population 1, p2 is a sample proportion from population 2, and d is a constant called the critical value. Finding the probability that the difference between sample proportions will be no greater than the critical value d is a four-step process:

Step 1: Find Mean of Sampling Distribution

When the sampling distribution is approximately normal in shape, the sampling distribution will be symmetric and centered over the difference between population proportions. Therefore, the mean of the sampling distribution of a difference between two independent sample proportions will equal:

μd = P1 - P2

where μd is the mean of the sampling distribution, P1 is the population proportion for population 1, and P2 is the population proportion for population 2.

Step 2: Find Standard Deviation

Earlier in this lesson (see above), we explained how to compute standard deviation of the sampling distribution when you know the population proportion. And we showed how to estimate the standard deviation with the standard error when you don't know the population proportion. When population size is big relative to sample size, you can use these formulas to find standard deviation and standard error:

σd = sqrt{ [ P1(1 - P1) / n1 ] + [ P2(1 - P2) / n2 ] }

SEd = sqrt{ [ p1(1 - p1) / n1 ] + [ p2(1 - p2) / n2 ] }

where σd is the standard deviation of the sampling distribution, SEd is the standard error, P1 and P2 are independent population proportions, p1 and p2 are sample estimates of the population proportions, n1 is sample size from population 1, and n2 is sample size from population 2.

Step 3: Transform d Into z-Score

If you know the standard deviation of the sampling distribution, compute a z-score using this formula:

z = (d – μd) / σd

If you know the standard error, use this formula:

z = (d – μd) / SEd

where d is the critical value for which we want to find a probability, μd is the mean of the sampling distribution, σd is the standard deviation of the sampling distribution, and SEd is the standard error of the sampling distribution.

Step 4: Find Probability

Find the cumulative probability for the z-score you calculated in Step 3. That is the probability that a difference between two independent sample proportions will be no greater than the critical value, d.

You can find the probability for the z-score from a handheld graphing calculator, from a written probability table commonly found in the appendix of introductory statistics texts, or from an online probability calculator, like Stat Trek's normal distribution calculator.
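The four steps can also be carried out in a few lines of code instead of a table or calculator. The sketch below uses `NormalDist` from Python's standard `statistics` module for the cumulative probability. The function name `prob_difference_at_most` is illustrative, and the function assumes known population proportions and large populations.

```python
import math
from statistics import NormalDist


def prob_difference_at_most(d, P1, n1, P2, n2):
    """Approximate P(p1 - p2 <= d) using the normal distribution."""
    mu_d = P1 - P2                                              # Step 1
    sigma_d = math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)  # Step 2
    z = (d - mu_d) / sigma_d                                    # Step 3
    return NormalDist().cdf(z)                                  # Step 4


# When P1 equals P2, the difference is centered on zero, so d = 0 splits
# the sampling distribution in half.
print(prob_difference_at_most(0, 0.5, 50, 0.5, 50))  # 0.5
```

If only sample data are available, the same function can be called with the sample proportions p1 and p2 in place of P1 and P2, which substitutes the standard error for the standard deviation.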

Test Your Understanding

In this section, we work through a sample problem to show how to apply the guidelines presented above. For this problem, we will use Stat Trek's Normal Distribution Calculator to compute probability.

Sample Problem

In one state, 52% of the voters are Republicans, and 48% are Democrats. In a second state, 47% of the voters are Republicans, and 53% are Democrats. Suppose 100 voters are surveyed from each state. Assume the survey uses simple random sampling.

What is the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state?

Solution:

This problem satisfies the conditions that allow us to assume the sampling distribution is approximately normal.

  • For each sample, population size is at least 20 times sample size (a state's electorate is far larger than 2,000 voters).
  • The sampling method is simple random sampling.
  • For each sample, n * P > 10, where P is the population proportion: 100(0.52) = 52 and 100(0.47) = 47.
  • For each sample, n * (1 - P) > 10: 100(0.48) = 48 and 100(0.53) = 53.

Therefore, we can use the four-step solution to find probability.

  • Step 1. Find the mean of the sampling distribution. In the first state, 52% of voters are Republican; and in the second state, 47% of voters are Republican. Therefore, the mean of the sampling distribution (μd) is:

    μd = P1 - P2

    μd = 0.52 - 0.47 = 0.05

  • Step 2. Find the standard deviation of the sampling distribution. Since we know population proportions, we can compute the standard deviation, rather than estimate it with standard error. The standard deviation of the sampling distribution is:

    σd = sqrt{ [ P1(1 - P1) / n1 ] + [ P2(1 - P2) / n2 ] }

    σd = sqrt{ [ (0.52)(1 - 0.52) / 100 ] + [ (0.47)(1 - 0.47) / 100 ] }

    σd = sqrt{ [ 0.002496 ] + [ 0.002491] } = 0.0706

  • Step 3. Transform d into a z-score. This problem requires us to find the probability that p1 is less than p2. This is equivalent to finding the probability that p1 - p2 is less than zero. Therefore, for this problem, the critical value d for which we want to find a cumulative probability is zero; and the z-score formula is:

    z = (d - μd)/σd = (0 - 0.05)/0.0706 = -0.7082

  • Step 4. Find the probability. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: -0.7082, for the z-score; 0, for the mean; and 1, for the standard deviation. (It is not necessary to compute the mean or standard deviation of the z-score, because every z-score has a mean of 0 and a standard deviation of 1.)

The calculator tells us that the probability of finding a z-score less than -0.7082 is 0.23941. Therefore, the probability that the survey will show a greater percentage of Republican voters in the second state than in the first state is about 0.24.
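As a cross-check, the same four steps can be reproduced in code. This sketch uses the standard library's `NormalDist` in place of the online calculator. Because it does not round σd before computing the z-score, the z-score differs slightly from -0.7082 in the fourth decimal place, but the final probability still rounds to 0.24.

```python
import math
from statistics import NormalDist

# Values from the sample problem.
P1, n1 = 0.52, 100  # first state: proportion Republican, sample size
P2, n2 = 0.47, 100  # second state: proportion Republican, sample size
d = 0.0             # critical value: p1 - p2 below zero means p2 is larger

mu_d = P1 - P2                                                # Step 1
sigma_d = math.sqrt(P1 * (1 - P1) / n1 + P2 * (1 - P2) / n2)  # Step 2
z = (d - mu_d) / sigma_d                                      # Step 3
probability = NormalDist().cdf(z)                             # Step 4

print(f"mu_d    = {mu_d:.2f}")     # 0.05
print(f"sigma_d = {sigma_d:.4f}")  # 0.0706
print(f"z       = {z:.4f}")
print(f"P(p1 - p2 <= 0) = {probability:.2f}")  # 0.24
```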