Teach yourself statistics


Confidence Interval: Difference Between Proportions

This lesson describes how to construct a confidence interval for the difference between two population proportions, P₁ - P₂, based on the difference between two sample proportions, p₁ - p₂.

Estimation Requirements

The approach described in this lesson is valid whenever the following conditions are met:

Both samples are simple random samples.
The samples are independent.
Each sample includes at least 10 successes and 10 failures.
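Of these three requirements, only the success/failure condition can be checked from the counts alone; random sampling and independence depend on how the data were collected. The sketch below is a minimal Python illustration of that one check. The function names are hypothetical, and the counts in the usage line are made up for illustration.

```python
def meets_success_failure_condition(successes: int, n: int, minimum: int = 10) -> bool:
    """Check that a sample of size n has at least `minimum` successes and failures."""
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    failures = n - successes
    return successes >= minimum and failures >= minimum


def both_samples_ok(x1: int, n1: int, x2: int, n2: int) -> bool:
    """Apply the success/failure check to each of the two samples."""
    return (meets_success_failure_condition(x1, n1)
            and meets_success_failure_condition(x2, n2))


# Hypothetical counts: 25 successes in a sample of 60, and 8 in a sample of 50.
print(both_samples_ok(25, 60, 8, 50))  # False: the second sample has only 8 successes
```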

The Variability of the Difference Between Proportions

To construct a confidence interval for the difference between two sample proportions, we need to know how to compute the standard deviation of the sampling distribution or the standard error.

Standard deviation: The standard deviation (SD) of the sampling distribution is the "average" deviation between all possible sample differences (p₁ - p₂) and the true population difference, (P₁ - P₂). The standard deviation of the difference between sample proportions is:
SD = sqrt{ [P₁ * (1 - P₁) / n₁] * [(N₁ - n₁) / (N₁ - 1)] + [P₂ * (1 - P₂) / n₂] * [(N₂ - n₂) / (N₂ - 1)] }
where P₁ is the proportion for population 1, P₂ is the proportion for population 2, n₁ is the sample size from population 1, n₂ is the sample size from population 2, N₁ is the number of observations in population 1, and N₂ is the number of observations in population 2. When each sample is small (less than 5% of its population), the standard deviation can be approximated by:
SD = sqrt{ [P₁ * (1 - P₁) / n₁] + [P₂ * (1 - P₂) / n₂] }
Standard error: When the population parameters (P₁ and P₂) are not known, the standard deviation of the sampling distribution cannot be calculated. Under these circumstances, use the standard error. The standard error (SE) can be calculated from the equation below.
SE = sqrt{ [p₁ * (1 - p₁) / n₁] * [(N₁ - n₁) / (N₁ - 1)] + [p₂ * (1 - p₂) / n₂] * [(N₂ - n₂) / (N₂ - 1)] }
where p₁ is the sample proportion for sample 1 and p₂ is the sample proportion for sample 2. When each sample is small (less than 5% of its population), the standard error can be approximated by:
SE = sqrt{ [p₁ * (1 - p₁) / n₁] + [p₂ * (1 - p₂) / n₂] }
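The exact and approximate formulas differ only by the finite population correction factors, so one function can cover both. The Python sketch below (function name hypothetical) returns the standard error when given sample proportions; passing the population proportions P₁ and P₂ instead gives the standard deviation. The population sizes in the usage example are hypothetical and deliberately small, so that the samples are half of their populations.

```python
import math
from typing import Optional


def se_diff_proportions(p1: float, n1: int, p2: float, n2: int,
                        N1: Optional[int] = None,
                        N2: Optional[int] = None) -> float:
    """Standard error of p1 - p2 (or the SD, if P1 and P2 are passed instead).

    When population sizes N1 and N2 are given, each variance term is multiplied
    by its finite population correction (N - n) / (N - 1); when they are
    omitted, the approximate large-population formula is used.
    """
    var1 = p1 * (1 - p1) / n1
    var2 = p2 * (1 - p2) / n2
    if N1 is not None:
        var1 *= (N1 - n1) / (N1 - 1)
    if N2 is not None:
        var2 *= (N2 - n2) / (N2 - 1)
    return math.sqrt(var1 + var2)


# Approximate formula versus the exact formula for two hypothetical small populations.
approx = se_diff_proportions(0.40, 400, 0.30, 300)
exact = se_diff_proportions(0.40, 400, 0.30, 300, N1=800, N2=600)
print(round(approx, 4), round(exact, 4))
```

Because each hypothetical sample is half of its population, the exact standard error is noticeably smaller than the approximate one; with very large populations the two agree closely.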

Alert

The Advanced Placement Statistics Examination only covers the "approximate" formulas for the standard deviation and standard error.

SD = sqrt{ [P₁ * (1 - P₁) / n₁] + [P₂ * (1 - P₂) / n₂] }

SE = sqrt{ [p₁ * (1 - p₁) / n₁] + [p₂ * (1 - p₂) / n₂] }

However, students are expected to be aware of the limitations of these formulas; namely, the approximate formulas should only be used when the population size is at least 20 times larger than the sample size and when the sampling method is simple random sampling.
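The "20 times larger" rule of thumb is easy to check, and it also explains why the approximation is safe: when N is at least 20n, the correction (N - n) / (N - 1) is at least 1 - n/N, which is at least 0.95, so dropping it inflates each variance term by only a few percent. The Python sketch below uses hypothetical function names and hypothetical population sizes.

```python
def fpc(n: int, N: int) -> float:
    """Finite population correction (N - n) / (N - 1) applied to one variance term."""
    return (N - n) / (N - 1)


def approximation_ok(n: int, N: int, ratio: int = 20) -> bool:
    """True when the population is at least `ratio` times the sample size.

    With ratio = 20 this is the same as the sample being at most 5% of the population.
    """
    return N >= ratio * n


# Hypothetical population sizes for a sample of 400.
for N in (100_000, 8_000, 4_000):
    print(N, approximation_ok(400, N), round(fpc(400, N), 3))
```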

How to Find the Confidence Interval for Difference Between Proportions

Previously, we described how to construct confidence intervals. For convenience, we repeat the five steps below.

Choose the confidence level. The confidence level describes the uncertainty of a sampling plan. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
Compute the standard deviation or standard error. The standard deviation (SD) and the standard error (SE) for the difference between proportions can be calculated from the following formulas:
SD = sqrt{ [P₁ * (1 - P₁) / n₁] + [P₂ * (1 - P₂) / n₂] }

SE = sqrt{ [p₁ * (1 - p₁) / n₁] + [p₂ * (1 - p₂) / n₂] }
Find the critical value. The critical value will be a z-score. The value of the z-score depends on the confidence level. Common z-score critical values are 1.645 for a 90% confidence level, 1.96 for a 95% confidence level, and 2.576 for a 99% confidence level. Instructions for finding other z-score critical values are provided in the lesson on margin of error.
Find the margin of error. Compute the margin of error (ME) from one of the following equations.
ME = CV * SD

ME = CV * SE

where CV is the z-score critical value, SD is the standard deviation of the sampling distribution for the difference between proportions, and SE is the standard error.
Specify the confidence interval. The uncertainty is denoted by the confidence level, and the range of the confidence interval (CI) is defined by the following equation.
CI = (p₁ - p₂) ± ME

where p₁ and p₂ are sample proportions, and ME is the margin of error.
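The five steps can be collected into a single function. The Python sketch below is a minimal illustration, assuming large populations (so the approximate standard error applies) and using the standard library's `statistics.NormalDist` to find the two-sided z critical value instead of a table. The function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def diff_proportions_ci(p1: float, n1: int, p2: float, n2: int,
                        confidence: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for P1 - P2 (large populations assumed)."""
    # Step 1: the confidence level is the `confidence` argument.
    # Step 2: standard error from the sample proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Step 3: two-sided z critical value (about 1.645, 1.96, 2.576 for 90%, 95%, 99%).
    cv = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 4: margin of error.
    me = cv * se
    # Step 5: interval centred on the observed difference.
    diff = p1 - p2
    return diff - me, diff + me


# Hypothetical samples: 55% of 200 versus 45% of 250, at a 95% confidence level.
print(diff_proportions_ci(0.55, 200, 0.45, 250, confidence=0.95))
```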

In the next section, we work through a problem that shows how to use this approach to construct a confidence interval for the difference between proportions.

Test Your Understanding

Problem 1

Suppose the Cartoon Network conducts a nationwide survey to assess viewer attitudes toward Superman. Using simple random sampling, they select 400 boys and 300 girls to participate in the study. Forty percent of the boys say that Superman is their favorite character, compared to thirty percent of the girls. What is the 90% confidence interval for the true difference in attitudes toward Superman?

(A) 0 to 20 percent more boys prefer Superman
(B) 2 to 18 percent more boys prefer Superman
(C) 4 to 16 percent more boys prefer Superman
(D) 6 to 14 percent more boys prefer Superman
(E) None of the above

Solution

The correct answer is (C). The approach that we used to solve this problem is valid when the following conditions are met.

The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
Both samples should be independent. This condition is satisfied since neither sample was affected by responses of the other sample.
Each sample should include at least 10 successes and 10 failures. Suppose we classify choosing Superman as a success, and any other response as a failure. Then, we have plenty of successes and failures in both samples.
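The phrase "plenty of successes and failures" can be made concrete by converting the reported percentages into counts. The short Python sketch below does this for the two samples in the problem; the variable names are illustrative only.

```python
# Superman survey from Problem 1: 40% of 400 boys, 30% of 300 girls.
samples = {"boys": (400, 0.40), "girls": (300, 0.30)}

counts = {}
for name, (n, p) in samples.items():
    successes = round(n * p)   # chose Superman
    failures = n - successes   # any other response
    counts[name] = (successes, failures)
    print(f"{name}: {successes} successes, {failures} failures, "
          f"condition met: {successes >= 10 and failures >= 10}")
```

The boys' sample has 160 successes and 240 failures, and the girls' sample has 90 successes and 210 failures, far above the minimum of 10 each.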

Since the above requirements are satisfied, we can use the following five-step approach to construct a confidence interval.

Choose a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 90% confidence level.
Compute the standard deviation or standard error. Since we do not know the population proportions, we cannot compute the standard deviation; instead, we compute the standard error. And since each population is more than 20 times larger than its sample, we can use the following formula to compute the standard error (SE) of the difference between proportions:
SE = sqrt{ [p₁ * (1 - p₁) / n₁] + [p₂ * (1 - p₂) / n₂] }
SE = sqrt{ [0.40 * 0.60 / 400] + [0.30 * 0.70 / 300] }
SE = sqrt[ (0.24 / 400) + (0.21 / 300) ] = sqrt(0.0006 + 0.0007) = sqrt(0.0013) = 0.036
Find the critical value. The critical z-score value for a 90% confidence level is 1.645.
Find the margin of error (ME): The margin of error is computed from this formula:
ME = critical value * standard error

ME = 1.645 * 0.036 ≈ 0.06
Specify the confidence interval. The uncertainty of the confidence interval is denoted by the confidence level (90%). The range of the confidence interval (CI) is defined by the sample statistic ± margin of error.
CI = (p₁ - p₂) ± ME

CI = (0.4 - 0.3) ± 0.06 = 0.10 ± 0.06
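The hand calculation above rounds at each step. The Python sketch below repeats it at full precision, using `statistics.NormalDist` for the 90% critical value instead of the tabled 1.645, as a check on the arithmetic.

```python
import math
from statistics import NormalDist

p1, n1 = 0.40, 400   # boys
p2, n2 = 0.30, 300   # girls

se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # about 0.036
z = NormalDist().inv_cdf(0.95)                             # about 1.645 (90%, two-sided)
me = z * se                                                # about 0.059
lower, upper = (p1 - p2) - me, (p1 - p2) + me
print(f"SE = {se:.4f}, ME = {me:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```

At full precision the interval runs from about 0.041 to 0.159, which rounds to the 0.04 to 0.16 range of answer (C).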

Conclusion: We are 90% confident that the true difference between population proportions is in the range defined by 0.10 ± 0.06 (i.e., between 0.04 and 0.16). Here is what that actually means. If we replicated the study many times (i.e., used the same sampling plan with different samples), the sampling plan we used should produce a confidence interval that includes the true difference between population proportions 90% of the time.
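The "replicate the study many times" interpretation can be checked by simulation. The Python sketch below assumes hypothetical true population proportions of 0.40 and 0.30 (in a real study these are unknown) and treats each response as an independent draw, which matches the large-population assumption. It builds a 90% interval from each simulated pair of samples and reports the fraction of intervals that contain the true difference. The function name is hypothetical, and the result should be close to, but not exactly, 0.90 because of simulation noise and because the normal approximation is not exact.

```python
import math
import random
from statistics import NormalDist


def coverage(P1: float, n1: int, P2: float, n2: int,
             confidence: float = 0.90, reps: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated intervals that contain the true difference P1 - P2."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    true_diff = P1 - P2
    hits = 0
    for _ in range(reps):
        # Draw two independent samples as Bernoulli trials.
        x1 = sum(rng.random() < P1 for _ in range(n1))
        x2 = sum(rng.random() < P2 for _ in range(n2))
        p1, p2 = x1 / n1, x2 / n2
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        diff = p1 - p2
        if diff - z * se <= true_diff <= diff + z * se:
            hits += 1
    return hits / reps


rate = coverage(0.40, 400, 0.30, 300)
print(rate)
```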

Note: You might also use shorthand notation to describe this confidence interval as (0.04, 0.16).
