Confidence Interval: Difference Between Means
This lesson describes how to construct a confidence interval for the difference between two means.
Estimation Requirements
The approach described in this lesson is valid whenever the
following conditions are met:
Generally, the sampling distribution will be approximately
normally distributed if each sample is described by at least
one of the following statements.
- The sample size is greater than 40, without outliers.
The Variability of the Difference Between Sample Means
To construct a confidence interval, we need to know the variability of the difference between sample means. This means we need to know how to compute the standard deviation of the sampling distribution of the difference.
Note: In real-world analyses, the standard deviation of the population is seldom known. Therefore, the standard error SE_(x1 - x2) is used more often than the standard deviation σ_(x1 - x2).
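For reference, the two quantities named in the note have standard textbook definitions for independent samples. The LaTeX notation below (bars over x, subscripts 1 and 2 for the two samples) is an assumption about how the lesson's symbols map; σ denotes a population standard deviation, s a sample standard deviation, and n a sample size.

```latex
\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},
\qquad
SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
```

The standard error simply substitutes the sample standard deviations for the unknown population values.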
Alert
Some texts present additional options
for calculating standard deviations.
These formulas, which should only be used under special
circumstances, are described below.
- Standard deviation. Use this formula when the population standard deviations are known and are equal.
σ_(x1 - x2) = σ_d = σ * sqrt[ (1 / n1) + (1 / n2) ]
where σ = σ1 = σ2
- Pooled standard deviation. Use this formula when the population standard deviations are unknown but assumed to be equal, and the sample sizes (n1 and n2) are small (under 30).
SD_pooled = sqrt{ [ (n1 - 1) * s1^2 + (n2 - 1) * s2^2 ] / (n1 + n2 - 2) }
where σ1 = σ2
Remember, these two formulas
should be used only when the various required underlying
assumptions are justified.
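As a minimal sketch, the two special-case formulas can be written as small Python functions. The function names and the example inputs are hypothetical and chosen only for illustration. Both functions assume that the equal-variance condition really holds.

```python
import math


def sd_diff_equal_sigma(sigma, n1, n2):
    """SD of the difference between sample means when the population
    standard deviations are known and equal (sigma1 == sigma2 == sigma)."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples, assuming the unknown
    population standard deviations are equal."""
    numerator = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    return math.sqrt(numerator / (n1 + n2 - 2))


# Hypothetical inputs, for illustration only.
print(round(sd_diff_equal_sigma(10, 25, 25), 4))
print(round(pooled_sd(100, 15, 90, 20), 4))
```

When the two sample standard deviations are identical, the pooled value equals that common value, which makes a quick sanity check.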
How to Find the Confidence Interval for the Difference Between Means
Previously, we described
how to construct confidence intervals. For convenience, we
repeat the key steps below.
- Identify a sample statistic. Use the difference between sample means to
estimate the difference between population means.
- Select a confidence level. The confidence level describes the
uncertainty of a sampling
method. Often, researchers choose 90%, 95%, or 99% confidence
levels; but any percentage can be used.
- Find the margin of error. Previously, we showed how to compute the margin of error, based on the critical value and standard deviation.
When the sample size is large, you can use a t score or a z score for the critical value. Since it does not require computing degrees of freedom, the z score is a little easier. When the sample sizes are small (less than 40), use a t score for the critical value.
If you use a t score, you will need to compute degrees of freedom (DF).
The next section presents sample problems that illustrate how to use z scores and t scores as critical values.
- Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.
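The four steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the lesson's own procedure. The degrees of freedom use the Welch-Satterthwaite approximation for unequal variances, which the text does not spell out, and all function names are hypothetical. The standard library has no t distribution, so a t critical value would have to come from a table or a statistics package and be passed in as `critical`.

```python
import math
from statistics import NormalDist


def standard_error(s1, n1, s2, n2):
    """Standard error of the difference between two sample means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom (Welch-Satterthwaite; an assumption here)."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def z_critical(confidence):
    """Two-sided z critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def confidence_interval(diff, critical, se):
    """Return (lower, upper) as sample statistic -/+ margin of error."""
    margin = critical * se
    return diff - margin, diff + margin
```

For large samples, `confidence_interval(diff, z_critical(0.95), standard_error(...))` gives the interval directly. For small samples, look up the t critical value for `welch_df(...)` degrees of freedom and pass that in instead.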
Test Your Understanding of This Lesson
Problem 1: Small Samples
Suppose that simple random samples of college freshmen are selected from two universities: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90.
What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)
(A) 50 ± 1.70
(B) 50 ± 28.49
(C) 50 ± 32.74
(D) 50 ± 55.66
(E) None of the above
Solution
The correct answer is (D). The approach that we used to solve this
problem is valid when the following conditions are met.
- The sampling distribution should be approximately normally distributed. The problem states that test scores in each population are normally distributed, so the difference between test scores will also be normally distributed.
Since the above requirements are satisfied, we can use the following
four-step approach to construct a confidence interval.
- Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus, x1 - x2 = 1000 - 950 = 50.
- Select a confidence level. In this analysis, the confidence level
is defined for us in the problem. We are working with a 90%
confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.
Therefore, the 90% confidence interval is -5.66 to 105.66. That is, we are 90% confident that the true difference in population means is in the range defined by 50 ± 55.66.
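The margin-of-error arithmetic for Problem 1 can be reproduced with a short script. It makes two assumptions. First, the degrees of freedom use the Welch-Satterthwaite approximation. Second, the t critical value of 1.7 (cumulative probability 0.95, about 28 degrees of freedom) is taken from a t table and rounded, because the Python standard library has no t distribution. The standard error is rounded to two decimals before multiplying, since that intermediate rounding is what yields 55.66.

```python
import math

# Sample data from Problem 1.
n1, mean1, s1 = 15, 1000, 100   # school A
n2, mean2, s2 = 20, 950, 90     # school B

# Step 1: sample statistic.
diff = mean1 - mean2

# Step 3a: standard error, rounded to two decimals (assumed intermediate rounding).
se = round(math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2), 2)

# Step 3b: degrees of freedom (Welch-Satterthwaite approximation, an assumption).
a, b = s1 ** 2 / n1, s2 ** 2 / n2
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Step 3c: critical value, assumed from a t table for ~28 DF at 90% confidence.
t_crit = 1.7

# Step 3d: margin of error, then Step 4: the interval.
margin = t_crit * se
lower, upper = diff - margin, diff + margin

print(f"SE = {se}, DF = {df:.2f}, margin = {margin:.2f}")
print(f"90% CI: {lower:.2f} to {upper:.2f}")
```

Carrying the unrounded standard error and a more precise critical value changes the margin only in the second decimal place, so the choice of answer (D) is not affected.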
Problem 2: Large Samples
The local baseball team conducts a study to find the amount spent on refreshments at the ballpark. Over the course of the season, they gather simple random samples of 50 men and 100 women. For men, the average expenditure was $20, with a standard deviation of $3. For women, it was $15, with a standard deviation of $2.
What is the 99% confidence interval for the spending difference between men and women? Assume that the two populations are independent and normally distributed.
(A) $5 ± $0.47
(B) $5 ± $1.21
(C) $5 ± $2.58
(D) $5 ± $5.00
(E) None of the above
Solution
The correct answer is (B). The approach that we used to solve this
problem is valid when the following conditions are met.
- The sampling distribution should be approximately normally distributed. The problem states that expenditures in each population are normally distributed, so the difference between mean expenditures will also be normally distributed.
Since the above requirements are satisfied, we can use the following
four-step approach to construct a confidence interval.
- Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus, x1 - x2 = $20 - $15 = $5.
- Select a confidence level. In this analysis, the confidence level
is defined for us in the problem. We are working with a 99%
confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.
Therefore, the 99% confidence interval is $3.79 to $6.21. That is, we are 99% confident that men outspend women at the ballpark by about $5 ± $1.21.
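The Problem 2 calculation can be checked the same way. Because both samples are large, this sketch uses a z score for the critical value, computed with `statistics.NormalDist` from the Python standard library. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Sample data from Problem 2 (dollars).
n1, mean1, s1 = 50, 20.0, 3.0    # men
n2, mean2, s2 = 100, 15.0, 2.0   # women

# Step 1: sample statistic.
diff = mean1 - mean2

# Step 3a: standard error of the difference between sample means.
se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Step 3b: two-sided z critical value for 99% confidence
# (cumulative probability 1 - 0.01 / 2 = 0.995).
z_crit = NormalDist().inv_cdf(0.995)

# Step 3c: margin of error, then Step 4: the interval.
margin = z_crit * se
lower, upper = diff - margin, diff + margin

print(f"SE = {se:.3f}, z = {z_crit:.2f}, margin = {margin:.2f}")
print(f"99% CI: ${lower:.2f} to ${upper:.2f}")
```

Using the table value z = 2.58 instead of the computed critical value gives the same margin of error to two decimal places.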