# Confidence Interval: Difference Between Proportions

This lesson describes how to construct a confidence interval for the difference between two sample proportions, $p_1 - p_2$.

## Estimation Requirements

The approach described in this lesson is valid whenever the following conditions are met:

- Both samples are simple random samples.
- The samples are independent.
- Each sample includes at least 10 successes and 10 failures. (Some texts say that 5 successes and 5 failures are enough.)
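The success/failure condition is easy to check mechanically. The sketch below is only an illustration: the function name `meets_success_failure_rule` and the sample counts are hypothetical, not part of the lesson. The `minimum` parameter defaults to 10 and can be lowered to 5 for texts that use the looser rule.

```python
def meets_success_failure_rule(successes, n, minimum=10):
    """Return True if a sample has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# Hypothetical samples: (number of successes, sample size)
sample_1 = (45, 120)
sample_2 = (8, 150)  # only 8 successes, so the 10-success rule fails

both_ok = all(meets_success_failure_rule(x, n) for x, n in (sample_1, sample_2))
print(both_ok)  # False
```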

## The Variability of the Difference Between Proportions

To construct a confidence interval for the difference between two sample proportions, we need to know about the sampling distribution of the difference. Specifically, we need to know how to compute the standard deviation or standard error of the sampling distribution.

- The standard deviation of the sampling distribution is the "average" deviation between all possible sample differences ($p_1 - p_2$) and the true population difference ($P_1 - P_2$). The standard deviation of the difference between sample proportions, $\sigma_{p_1 - p_2}$, is:

  $$\sigma_{p_1 - p_2} = \sqrt{ \frac{P_1 (1 - P_1)}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{P_2 (1 - P_2)}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1} }$$

  where $P_1$ is the population proportion for sample 1, $P_2$ is the population proportion for sample 2, $n_1$ is the sample size from population 1, $n_2$ is the sample size from population 2, $N_1$ is the number of observations in population 1, and $N_2$ is the number of observations in population 2. When each sample is small (less than 10% of its population), the standard deviation can be approximated by:

  $$\sigma_{p_1 - p_2} = \sqrt{ \frac{P_1 (1 - P_1)}{n_1} + \frac{P_2 (1 - P_2)}{n_2} }$$

- When the population parameters ($P_1$ and $P_2$) are not known, the standard deviation of the sampling distribution cannot be calculated. Under these circumstances, use the standard error. The standard error (SE) provides an estimate of the standard deviation. It can be calculated from the equation below.

  $$SE_{p_1 - p_2} = \sqrt{ \frac{p_1 (1 - p_1)}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{p_2 (1 - p_2)}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1} }$$

  where $p_1$ is the sample proportion for sample 1, and $p_2$ is the sample proportion for sample 2. When each sample is small (less than 10% of its population), the standard error can be approximated by:

  $$SE_{p_1 - p_2} = \sqrt{ \frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2} }$$
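The standard error formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `se_diff_proportions`, its optional `N1`/`N2` arguments, and all input values are assumptions made for the example. When the population sizes are omitted, the function falls back to the approximate formula.

```python
import math


def se_diff_proportions(p1, n1, p2, n2, N1=None, N2=None):
    """Standard error of the difference p1 - p2.

    If a population size is supplied, that sample's variance term is
    multiplied by the finite population correction (N - n) / (N - 1).
    Otherwise the approximate (large-population) term is used.
    """
    var1 = p1 * (1 - p1) / n1
    var2 = p2 * (1 - p2) / n2
    if N1 is not None:
        var1 *= (N1 - n1) / (N1 - 1)
    if N2 is not None:
        var2 *= (N2 - n2) / (N2 - 1)
    return math.sqrt(var1 + var2)


# Hypothetical inputs, chosen only for illustration
approx = se_diff_proportions(0.55, 200, 0.45, 250)
exact = se_diff_proportions(0.55, 200, 0.45, 250, N1=5000, N2=8000)
print(f"approximate SE = {approx:.4f}, corrected SE = {exact:.4f}")
```

Because each correction factor is below 1 whenever a sample has more than one observation, the corrected value is never larger than the approximate one.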

**Note:** The Advanced Placement Statistics Examination only covers
the "approximate" formulas for the standard deviation and standard
error. However, students are expected to be aware of the limitations
of these formulas; namely, that they should only be used when each
population is at least 10 times larger than its respective
sample.
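To see what the 10-times threshold implies, consider a hypothetical sample of $n = 400$ drawn from a population of $N = 4000$, which sits exactly at the limit. The finite population correction that the approximate formulas drop is then:

$$\sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{4000 - 400}{4000 - 1}} = \sqrt{\frac{3600}{3999}} \approx 0.949$$

At the threshold, then, the approximate formula overstates that sample's contribution to the standard deviation by roughly 5%. For populations larger than 10 times the sample, the factor moves closer to 1 and the gap shrinks.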

## Alert

Some texts present a different, less general version of the approximate formulas. These formulas, which appear below, are valid when the proportions are equal.

- $\sigma_{p_1 - p_2} = \sqrt{P (1 - P)} \cdot \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} }$, where $P = P_1 = P_2$
- $SE_{p_1 - p_2} = \sqrt{p (1 - p)} \cdot \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} }$, where $p = p_1 = p_2$

Remember, these two formulas should be used only when the proportions from each group are equal, and when each sample size is small (less than 10% of the population size).
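The equal-proportion version is a special case of the general approximate formula. Substituting $P_1 = P_2 = P$ and factoring out the common term gives:

$$\sigma_{p_1 - p_2} = \sqrt{ \frac{P (1 - P)}{n_1} + \frac{P (1 - P)}{n_2} } = \sqrt{ P (1 - P) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } = \sqrt{P (1 - P)} \cdot \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} }$$

The same substitution with sample proportions ($p_1 = p_2 = p$) yields the corresponding standard error.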

## How to Find the Confidence Interval for the Difference Between Proportions

Previously, we described how to construct confidence intervals. For convenience, we repeat the key steps below.

- Identify a sample statistic. Use the difference between sample proportions ($p_1 - p_2$) to estimate the difference between population proportions ($P_1 - P_2$).
- Select a confidence level. The confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
- Find the margin of error. Previously, we showed how to compute the margin of error.
- Specify the confidence interval. The range of the confidence interval is defined by the *sample statistic* ± *margin of error*. And the uncertainty is denoted by the confidence level.
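The four steps above can be sketched as a single Python function. This is an illustrative sketch under stated assumptions: the function name `diff_proportion_ci` and the input values are hypothetical, the approximate standard error formula is used (so each sample is assumed to be under 10% of its population), and the critical value comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(p1, n1, p2, n2, confidence=0.95):
    """Confidence interval for P1 - P2 using the approximate SE formula."""
    # Step 1: the sample statistic is the difference in sample proportions.
    diff = p1 - p2
    # Step 2: turn the confidence level into a critical z score.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: margin of error = critical value * standard error.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    me = z * se
    # Step 4: the interval is the sample statistic +/- the margin of error.
    return diff - me, diff + me


# Hypothetical data: 52% of 500 versus 45% of 450, at 95% confidence
low, high = diff_proportion_ci(0.52, 500, 0.45, 450, confidence=0.95)
print(f"{low:.3f} to {high:.3f}")
```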

In the next section, we work through a problem that shows how to use this approach to construct a confidence interval for the difference between proportions.

## Test Your Understanding

**Problem 1**

Suppose the Cartoon Network conducts a nation-wide survey to assess viewer attitudes toward Superman. Using a simple random sample, they select 400 boys and 300 girls to participate in the study. Forty percent of the boys say that Superman is their favorite character, compared to thirty percent of the girls. What is the 90% confidence interval for the true difference in attitudes toward Superman?

(A) 0 to 20 percent more boys prefer Superman

(B) 2 to 18 percent more boys prefer Superman

(C) 4 to 16 percent more boys prefer Superman

(D) 6 to 14 percent more boys prefer Superman

(E) None of the above

**Solution**

The correct answer is (C). The approach that we used to solve this problem is valid when the following conditions are met.

- The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
- Both samples should be independent. This condition is satisfied since neither sample was affected by responses of the other sample.
- Each sample should include at least 10 successes and 10 failures. Suppose we classify choosing Superman as a success, and any other response as a failure. Then, we have plenty of successes and failures in both samples.
- The sampling distribution should be approximately normally distributed. Because each sample size is large, we know from the central limit theorem that the sampling distribution of the difference between sample proportions will be normal or nearly normal; so this condition is satisfied.
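For reference, the success and failure counts implied by the problem statement work out as follows:

$$\begin{aligned}
\text{Boys:} \quad & 400 \times 0.40 = 160 \text{ successes}, \quad 400 \times 0.60 = 240 \text{ failures} \\
\text{Girls:} \quad & 300 \times 0.30 = 90 \text{ successes}, \quad 300 \times 0.70 = 210 \text{ failures}
\end{aligned}$$

All four counts are well above 10.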

Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.

- Identify a sample statistic. Since we are trying to estimate the difference between population proportions, we choose the difference between sample proportions as the sample statistic. Thus, the sample statistic is $p_{boy} - p_{girl} = 0.40 - 0.30 = 0.10$.
- Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 90% confidence level.
- Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.

  - Find the standard deviation or standard error. Since we do not know the population proportions, we cannot compute the standard deviation; instead, we compute the standard error. And since each population is more than 10 times larger than its sample, we can use the following formula to compute the standard error (SE) of the difference between proportions:

    $$SE = \sqrt{ \frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2} }$$

    $$SE = \sqrt{ \frac{0.40 \times 0.60}{400} + \frac{0.30 \times 0.70}{300} }$$

    $$SE = \sqrt{ \frac{0.24}{400} + \frac{0.21}{300} } = \sqrt{0.0006 + 0.0007} = \sqrt{0.0013} \approx 0.036$$

  - Find the critical value. The critical value is a factor used to compute the margin of error. Because the sampling distribution is approximately normal and the sample sizes are large, we can express the critical value as a z score by following these steps.

    - Compute alpha (α): α = 1 - (confidence level / 100) = 1 - (90/100) = 0.10
    - Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.10/2 = 0.95
    - The critical value is the z score having a cumulative probability equal to 0.95. From the Normal Distribution Calculator, we find that the critical value is 1.645.

  - Compute the margin of error (ME): ME = critical value × standard error = 1.645 × 0.036 ≈ 0.06

- Specify the confidence interval. The range of the confidence interval is defined by the *sample statistic* ± *margin of error*. And the uncertainty is denoted by the confidence level.

Therefore, the 90% confidence interval is 0.04 to 0.16. That is, we are 90% confident that the true difference between the population proportions is in the range defined by 0.10 ± 0.06. Since both ends of the confidence interval are positive, we can conclude that more boys than girls choose Superman as their favorite cartoon character.
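The arithmetic in this solution can be checked with a short Python script. This is a sketch for verification only: the variable names are illustrative, and the standard library's `statistics.NormalDist` stands in for the Normal Distribution Calculator mentioned above.

```python
import math
from statistics import NormalDist

# Values taken from the problem statement
p_boy, n_boy = 0.40, 400
p_girl, n_girl = 0.30, 300
confidence = 0.90

# Sample statistic and approximate standard error
diff = p_boy - p_girl
se = math.sqrt(p_boy * (1 - p_boy) / n_boy + p_girl * (1 - p_girl) / n_girl)

# Critical z score for the chosen confidence level
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval endpoints
me = z * se
print(f"SE = {se:.3f}, z = {z:.3f}, ME = {me:.2f}")  # SE = 0.036, z = 1.645, ME = 0.06
print(f"90% CI: {diff - me:.2f} to {diff + me:.2f}")  # 90% CI: 0.04 to 0.16
```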