Confidence Interval: Difference In Proportions
This lesson describes how to construct a
confidence interval
for the difference between two population proportions,
P_{1} - P_{2}, using the difference between two sample proportions,
p_{1} - p_{2}.
Estimation Requirements
The approach described in this lesson is valid whenever the
following conditions are met:
- Each sample includes at least 10 successes and 10 failures.
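The success/failure requirement above can be checked mechanically. The following is a minimal sketch; the function name `meets_success_failure_condition` and its parameters are illustrative assumptions, not part of the lesson.

```python
def meets_success_failure_condition(successes, n, minimum=10):
    """Return True if a sample of size n has at least `minimum`
    successes and at least `minimum` failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# A sample of 50 with 25 successes passes; a sample of 40 with 4 successes does not.
print(meets_success_failure_condition(25, 50))
print(meets_success_failure_condition(4, 40))
```

The check has to pass for each of the two samples separately.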
The Variability of the Difference Between Proportions
To construct a
confidence interval
for the difference between two proportions,
we need to know about the
sampling distribution
of the difference between sample proportions. Specifically, we need to know how to compute the
standard deviation
or
standard error
of the sampling distribution.
- The standard deviation of the sampling distribution is
the "average" deviation between all possible sample
differences (p_{1} - p_{2})
and the true population difference,
(P_{1} - P_{2}).
The standard deviation of the difference between sample proportions,
σ_{p1 - p2}, is:
σ_{p1 - p2} =
sqrt{ [P_{1} * (1 - P_{1}) / n_{1}] * [(N_{1} - n_{1}) / (N_{1} - 1)]
+ [P_{2} * (1 - P_{2}) / n_{2}] * [(N_{2} - n_{2}) / (N_{2} - 1)] }
where P_{1} is the proportion in population 1,
P_{2} is the proportion in population 2,
n_{1} is the sample size from population 1,
n_{2} is the sample size from population 2,
N_{1} is the number of observations in population 1, and
N_{2} is the number of observations in population 2.
When each sample is small (less than 5% of its population),
the standard deviation can be approximated by:
σ_{p1 - p2} =
sqrt{ [P_{1} * (1 - P_{1}) / n_{1}]
+ [P_{2} * (1 - P_{2}) / n_{2}] }
- When the population parameters (P_{1} and P_{2})
are not known, the standard deviation of the sampling distribution
cannot be calculated.
Under these circumstances, use the
standard error.
The standard error (SE) can be calculated from the equation below.
SE_{p1 - p2} =
sqrt{ [p_{1} * (1 - p_{1}) / n_{1}] * [(N_{1} - n_{1}) / (N_{1} - 1)]
+ [p_{2} * (1 - p_{2}) / n_{2}] * [(N_{2} - n_{2}) / (N_{2} - 1)] }
where p_{1} is the sample proportion for sample 1, and
p_{2} is the sample proportion for sample 2.
When each sample is small (less than 5% of its population),
the standard error can be approximated by:
SE_{p1 - p2} =
sqrt{ [p_{1} * (1 - p_{1}) / n_{1}]
+ [p_{2} * (1 - p_{2}) / n_{2}] }
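The formulas above translate directly into code. The sketch below is one possible implementation, not the lesson's own; the function name `se_difference` and the optional `N1`/`N2` arguments are assumptions made for illustration. When the population sizes are supplied, each term is multiplied by its finite population correction; when they are omitted, the approximate formula is used. Passing population proportions instead of sample proportions gives the standard deviation rather than the standard error.

```python
import math


def se_difference(p1, n1, p2, n2, N1=None, N2=None):
    """Standard error of p1 - p2.

    If a population size (N1 or N2) is given, the corresponding term is
    multiplied by (N - n) / (N - 1); otherwise that term is left uncorrected.
    """
    term1 = p1 * (1 - p1) / n1
    term2 = p2 * (1 - p2) / n2
    if N1 is not None:
        term1 *= (N1 - n1) / (N1 - 1)
    if N2 is not None:
        term2 *= (N2 - n2) / (N2 - 1)
    return math.sqrt(term1 + term2)


# Approximate formula versus the corrected formula for fairly small populations.
print(se_difference(0.5, 100, 0.4, 100))
print(se_difference(0.5, 100, 0.4, 100, N1=1000, N2=1000))
```

Because each correction factor is below 1 whenever the sample has more than one observation, the corrected value is never larger than the approximate one.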
Alert
The Advanced Placement Statistics
Examination only covers the "approximate" formulas for the standard
deviation and standard error.
σ_{p1 - p2} =
sqrt{ [P_{1} * (1 - P_{1}) / n_{1}]
+ [P_{2} * (1 - P_{2}) / n_{2}] }
SE_{p1 - p2} =
sqrt{ [p_{1} * (1 - p_{1}) / n_{1}]
+ [p_{2} * (1 - p_{2}) / n_{2}] }
However, students are expected to be
aware of the limitations of these formulas; namely, the
approximate formulas should only be used when the population
size is at least 20 times larger than the sample size.
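To see why the 20-times guideline is a reasonable cutoff, it helps to look at how far the finite population correction moves a single term away from the approximate value. The snippet below is a rough illustration; the helper name `fpc` and the chosen sample size of 100 are assumptions for the example only.

```python
import math


def fpc(n, N):
    """Square root of the finite population correction (N - n) / (N - 1),
    i.e. the factor by which one sample's standard error shrinks."""
    return math.sqrt((N - n) / (N - 1))


n = 100
for multiple in (2, 5, 20, 100):
    N = multiple * n
    print(f"N = {multiple:>3} x n: factor = {fpc(n, N):.3f}")
```

When the population is 20 times the sample size, the factor is about 0.975, so ignoring the correction overstates that sample's standard error by only a few percent.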
How to Find the Confidence Interval for a Difference Between Proportions
Previously, we described
how to construct confidence intervals. For convenience, we
repeat the key steps below.
- Identify a sample statistic. Use the difference between sample proportions
(p_{1} - p_{2}) to
estimate the difference between population proportions
(P_{1} - P_{2}).
- Select a confidence level. The confidence level describes the
uncertainty of a sampling
method. Often, researchers choose 90%, 95%, or 99% confidence
levels; but any percentage can be used.
- Find the margin of error. The margin of error is the critical value
multiplied by the standard deviation or standard error of the statistic.
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
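The four steps above can be sketched as a single function. This is an illustrative sketch under stated assumptions: it uses the approximate standard error formula (samples small relative to their populations) and a normal critical value from Python's standard-library `statistics.NormalDist`; the function name `diff_proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(p1, n1, p2, n2, confidence=0.95):
    """Confidence interval for P1 - P2 from two sample proportions."""
    # Step 1: the sample statistic is the difference between sample proportions.
    estimate = p1 - p2
    # Step 2: the confidence level fixes alpha.
    alpha = 1 - confidence
    # Step 3: margin of error = critical value * standard error.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * se
    # Step 4: the interval is the statistic plus or minus the margin of error.
    return estimate - margin, estimate + margin


lower, upper = diff_proportion_ci(0.5, 200, 0.4, 200, confidence=0.95)
print(f"{lower:.3f} to {upper:.3f}")
```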
In the next section, we work through a problem that shows how to use
this approach to construct a confidence interval for the
difference between proportions.
Test Your Understanding
Problem 1
Suppose the Cartoon Network conducts a nation-wide survey to
assess viewer attitudes toward Superman. Using a simple
random sample, they select 400 boys and 300 girls to
participate in the study. Forty percent of the boys say
that Superman is their favorite character, compared to thirty percent
of the girls. What is the 90% confidence interval for the true
difference in attitudes toward Superman?
(A) 0 to 20 percent more boys prefer Superman
(B) 2 to 18 percent more boys prefer Superman
(C) 4 to 16 percent more boys prefer Superman
(D) 6 to 14 percent more boys prefer Superman
(E) None of the above
Solution
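The correct answer is (C). The approach that we used to solve this
problem is valid when each sample includes at least 10 successes and
10 failures. If we count choosing Superman as a success, the boys' sample
has 160 successes and 240 failures, and the girls' sample has 90 successes
and 210 failures.
Since this requirement is satisfied, we can use the following
four-step approach to construct a confidence interval.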
- Identify a sample statistic. Since we are trying to estimate
the difference between population proportions, we choose the
difference between sample proportions
as the sample statistic. Thus, the sample statistic is
p_{boy} - p_{girl} = 0.40 - 0.30 = 0.10.
- Select a confidence level. In this analysis, the confidence level
is defined for us in the problem. We are working with a 90%
confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Find standard deviation or standard error. Since we do not
know the population proportions, we cannot compute the
standard deviation; instead, we compute the standard
error. And since each population is more than 20 times larger
than its sample, we can use the following formula
to compute the standard error (SE) of the difference
between proportions:
SE =
sqrt{ [p_{1} * (1 - p_{1}) / n_{1}]
+
[p_{2} * (1 - p_{2}) / n_{2}] }
SE =
sqrt{ [0.40 * 0.60 / 400] + [0.30 * 0.70 / 300] }
SE = sqrt[ (0.24 / 400) + (0.21 / 300) ] = sqrt(0.0006 + 0.0007)
= sqrt(0.0013) = 0.036
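- Find critical value. The critical value is a factor used to
compute the margin of error. Because the sampling
distribution is approximately normal and the sample
sizes are large, we can express the critical value as a
z-score.
For a 90% confidence level, alpha = 1 - 0.90 = 0.10, so the
critical probability is 1 - alpha/2 = 0.95. The z-score with a
cumulative probability of 0.95 is 1.645.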
- Compute margin of error (ME):
ME = critical value * standard error
ME = 1.645 * 0.036 = 0.06
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
Therefore, the 90% confidence interval is 0.04 to 0.16. That is, we are 90%
confident that the true difference between population proportions is in
the range defined by 0.10 ± 0.06. Since both ends of the
confidence interval are positive, we can conclude that more boys than
girls choose Superman as their favorite cartoon character.
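The arithmetic in this solution can be reproduced with a few lines of Python. This is a minimal sketch that mirrors the steps above; the variable names are illustrative, and the critical value comes from the standard-library `statistics.NormalDist` rather than a z-table.

```python
import math
from statistics import NormalDist

n_boys, n_girls = 400, 300
p_boys, p_girls = 0.40, 0.30

# Sample statistic: difference between sample proportions.
estimate = p_boys - p_girls

# Standard error, using the approximate formula.
se = math.sqrt(p_boys * (1 - p_boys) / n_boys
               + p_girls * (1 - p_girls) / n_girls)

# Critical value for 90% confidence: z-score with cumulative probability 0.95.
z = NormalDist().inv_cdf(0.95)

# Margin of error and confidence interval.
me = z * se
lower, upper = estimate - me, estimate + me

print(f"SE = {se:.3f}, z = {z:.3f}, ME = {me:.3f}")
print(f"90% CI: {lower:.2f} to {upper:.2f}")
# prints:
# SE = 0.036, z = 1.645, ME = 0.059
# 90% CI: 0.04 to 0.16
```

The unrounded margin of error is about 0.059, which the solution rounds to 0.06; the resulting interval is the same to two decimal places.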