Confidence Interval: Difference Between Means
This lesson describes how to construct a
confidence interval
for the difference between two means.
Estimation Requirements
The approach described in this lesson is valid whenever the
following conditions are met:
Generally, the sampling distribution will be approximately
normally distributed when the sample size is greater than or
equal to 30.
The Variability of the Difference Between Sample Means
To construct a
confidence interval, we need to know the variability
of the difference between sample means. This means we need to know
how to compute the
standard deviation
of the
sampling distribution of the difference.
Note: In real-world analyses, the standard deviation of the
population is seldom known. Therefore, SE_{x1-x2} is used
more often than σ_{x1-x2}.
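For reference, the two quantities named above can be written out as follows. This is a sketch that assumes the two samples are independent; σ_1 and σ_2 denote the population standard deviations, s_1 and s_2 the sample standard deviations, and n_1 and n_2 the sample sizes. The second expression is the same standard error formula used in the sample problems later in this lesson.

```latex
\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\qquad
SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```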
Alert
Some texts present additional options
for calculating standard deviations.
These formulas, which should only be used under special
circumstances, are described below.
Remember, these two formulas
should be used only when the various required underlying
assumptions are justified.
How to Find the Confidence Interval for the Difference Between Means
Previously, we described
how to construct confidence intervals. For convenience, we
repeat the key steps below.
- Identify a sample statistic. Use the difference between sample means to
estimate the difference between population means.
- Select a confidence level. The confidence level describes the
uncertainty of a sampling
method. Often, researchers choose 90%, 95%, or 99% confidence
levels; but any percentage can be used.
- Find the margin of error. Previously, we showed
how to compute the margin of error, based on the
critical value and standard deviation.
When the sample size is large, you can use a t statistic or a
z-score
for the critical value.
Since it does not require computing degrees of freedom, the
z-score is a little easier. When the sample
sizes are small (less than 40), use a
t score
for the critical value.
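If you use a t statistic, you will need to compute
degrees of freedom (DF):
DF = (s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2})^{2} / { [ (s_{1}^{2}/n_{1})^{2} / (n_{1} - 1) ] + [ (s_{2}^{2}/n_{2})^{2} / (n_{2} - 1) ] }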
The next section presents sample problems that illustrate how to
use z-scores and t statistics as critical values.
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
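The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `diff_means_interval` and the example inputs are hypothetical, and the critical value (a z-score or t statistic) is assumed to be looked up separately and passed in.

```python
import math


def diff_means_interval(mean1, sd1, n1, mean2, sd2, n2, critical_value):
    """Return (low, high) for the difference mean1 - mean2.

    Hypothetical helper: the critical value is supplied by the caller.
    """
    # Step 1: the sample statistic is the difference between sample means.
    diff = mean1 - mean2
    # Step 3a: standard error of the difference, from the sample
    # standard deviations and sample sizes.
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Step 3b: margin of error = critical value * standard error.
    me = critical_value * se
    # Step 4: the interval is the sample statistic +/- the margin of error.
    return diff - me, diff + me


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
low, high = diff_means_interval(10.0, 3.0, 9, 8.0, 4.0, 16, critical_value=2.0)
print(f"{low:.3f} to {high:.3f}")
```

Step 2 (selecting the confidence level) shows up only indirectly here, through the critical value the caller chooses.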
Test Your Understanding
Problem 1: Small Samples
Suppose that simple random samples of college freshmen are
selected from two universities - 15 students
from school A and 20 students from school B. On a standardized
test, the sample from school A has an average score of 1000 with a
standard deviation of 100. The sample from school B has an average
score of 950 with a standard deviation of 90.
What is the 90%
confidence interval for the difference in test scores at
the two schools, assuming that test scores came from normal
distributions in both schools? (Hint: Since the sample sizes are
small, use a
t score
as the
critical value.)
(A) 50 ± 1.70
(B) 50 ± 28.49
(C) 50 ± 32.74
(D) 50 ± 55.66
(E) None of the above
Solution
The correct answer is (D). The approach that we used to solve this
problem is valid when the following conditions are met.
- The
sampling distribution
should be approximately normally distributed. The problem states
that test scores in each population are normally distributed,
so the difference between sample means will also be normally
distributed.
Since the above requirements are satisfied, we can use the following
four-step approach to construct a confidence interval.
- Identify a sample statistic. Since we are trying to estimate
the difference between population means, we choose the
difference between sample means as the sample statistic. Thus,
x_{1} - x_{2} = 1000 - 950 = 50.
- Select a confidence level. In this analysis, the confidence level
is defined for us in the problem. We are working with a 90%
confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Find standard error. Using the sample standard deviations,
we compute the standard error (SE), which is an estimate of the
standard deviation of the difference between sample means.
SE = sqrt [ s_{1}^{2} / n_{1} + s_{2}^{2} / n_{2} ]
SE = sqrt [ (100)^{2} / 15 + (90)^{2} / 20 ]
SE = sqrt (10,000/15 + 8100/20)
SE = sqrt (666.67 + 405) = 32.74
- Find critical value. The critical value is a factor used to
compute the margin of error. Because the sample sizes
are small, we express the critical value as a
t score
rather than a
z-score.
To find the critical value, we take these steps.
- Compute alpha (α):
α = 1 - (confidence level / 100)
α = 1 - 90/100 = 0.10
- Find the critical probability (p*):
p* = 1 - α/2 = 1 - 0.10/2 = 0.95
- Find the
degrees of freedom (df):
DF = (s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2})^{2} / { [ (s_{1}^{2}/n_{1})^{2} / (n_{1} - 1) ] + [ (s_{2}^{2}/n_{2})^{2} / (n_{2} - 1) ] }
DF = (100^{2}/15 + 90^{2}/20)^{2} / { [ (100^{2}/15)^{2} / 14 ] + [ (90^{2}/20)^{2} / 19 ] }
DF = (666.67 + 405)^{2} / (31746.03 + 8632.89)
DF = 1148469.4 / 40378.92 = 28.44
Rounding off to the nearest whole number, we conclude
that there are 28 degrees of freedom.
- The critical value is
the t statistic having 28 degrees of freedom and a
cumulative probability
equal to 0.95. From the
t Distribution Calculator,
we find that the critical value is 1.7.
- Compute margin of error (ME):
ME = critical value * standard error
ME = 1.7 * 32.74 = 55.66
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
Therefore, the 90% confidence interval is 50 ± 55.66; that is, -5.66 to 105.66. Here's how to interpret this confidence interval.
Suppose we repeated this study many times with different random samples from school A and school B, and computed a 90% confidence interval each time.
We would expect about 90% of those intervals to contain the true difference between the population means.
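As a quick check on the arithmetic in Problem 1, the short Python sketch below recomputes the standard error, the degrees of freedom, and the margin of error from the sample summaries. The variable names are illustrative, and the critical value of 1.7 is simply taken from the solution above rather than computed, since the Python standard library has no t distribution.

```python
import math

# Sample summaries from Problem 1.
mean_a, sd_a, n_a = 1000.0, 100.0, 15
mean_b, sd_b, n_b = 950.0, 90.0, 20

# Each term is s^2 / n for one school.
term_a = sd_a**2 / n_a
term_b = sd_b**2 / n_b

# Standard error of the difference between sample means.
se = math.sqrt(term_a + term_b)

# Degrees of freedom, using the DF formula from the solution.
df = (term_a + term_b) ** 2 / (
    term_a**2 / (n_a - 1) + term_b**2 / (n_b - 1)
)

# Critical value as given in the solution (assumed, not computed here).
t_critical = 1.7

me = t_critical * se
diff = mean_a - mean_b
low, high = diff - me, diff + me

print(f"SE = {se:.2f}, DF = {df:.2f}, ME = {me:.2f}")
print(f"90% interval: {low:.2f} to {high:.2f}")
```

Because this sketch carries the unrounded standard error forward, its margin of error comes out about 0.01 smaller than the 55.66 obtained from the rounded value 32.74; the degrees of freedom still round to 28.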
Problem 2: Large Samples
The local baseball team conducts a study to find the amount
spent on refreshments at the ball park. Over the course of
the season they gather simple random samples of 500 men and 1000 women.
For men, the average expenditure was $20, with a standard deviation of
$3. For women, it was $15, with a standard deviation of $2.
What is the 99%
confidence interval for the spending difference between
men and women? Assume that the two populations are independent
and normally distributed.
(A) $5 ± $0.15
(B) $5 ± $0.38
(C) $5 ± $1.15
(D) $5 ± $1.38
(E) None of the above
Solution
The correct answer is (B). The approach that we used to solve this
problem is valid when the following conditions are met.
- The
sampling distribution
should be approximately normally distributed. The problem states
that spending in each population is normally distributed,
so the difference between sample means will also be normally
distributed.
Since the above requirements are satisfied, we can use the following
four-step approach to construct a confidence interval.
- Identify a sample statistic. Since we are trying to estimate
the difference between population means,
we choose the difference between sample means
as the sample statistic. Thus,
x_{1} - x_{2} = $20 - $15 = $5.
- Select a confidence level. In this analysis, the confidence level
is defined for us in the problem. We are working with a 99%
confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Find standard error. The standard error is an estimate of
the standard deviation of the difference between sample means.
We use the sample standard deviations to compute the
standard error (SE).
SE = sqrt [ s_{1}^{2} / n_{1} + s_{2}^{2} / n_{2} ]
SE = sqrt [ (3)^{2} / 500 + (2)^{2} / 1000 ]
SE = sqrt (9/500 + 4/1000)
SE = sqrt (0.018 + 0.004) = 0.148
- Find critical value. The critical value is a factor used to
compute the margin of error. Because the sample sizes
are large enough, we express the critical value as a
z-score.
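To find the critical value, we take these steps.
- Compute alpha (α):
α = 1 - (confidence level / 100)
α = 1 - 99/100 = 0.01
- Find the critical probability (p*):
p* = 1 - α/2 = 1 - 0.01/2 = 0.995
- The critical value is the z-score having a
cumulative probability equal to 0.995, which is 2.58.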
- Compute margin of error (ME):
ME = critical value * standard error
ME = 2.58 * 0.148 = 0.38
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
Therefore, the 99% confidence interval is $5 ± $0.38; that is, $4.62 to $5.38.
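The z-score used in Problem 2 can also be obtained in code rather than from a table. The sketch below is one way to do it, assuming Python 3.8 or later, where the standard library's `statistics.NormalDist` provides an inverse cumulative distribution function; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sample summaries from Problem 2.
mean_men, sd_men, n_men = 20.0, 3.0, 500
mean_women, sd_women, n_women = 15.0, 2.0, 1000

# Confidence level -> alpha -> critical probability -> z-score.
confidence = 0.99
alpha = 1 - confidence
p_star = 1 - alpha / 2
z_critical = NormalDist().inv_cdf(p_star)

# Standard error and margin of error for the difference in means.
se = math.sqrt(sd_men**2 / n_men + sd_women**2 / n_women)
me = z_critical * se

diff = mean_men - mean_women
low, high = diff - me, diff + me

print(f"z = {z_critical:.2f}, SE = {se:.3f}, ME = {me:.2f}")
print(f"99% interval: ${low:.2f} to ${high:.2f}")
```

The computed z-score is slightly below the rounded 2.58 used in the solution, but the margin of error still rounds to $0.38.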