Confidence Interval: Difference Between Means
This lesson describes how to construct a confidence interval for the difference between two means.
Estimation Requirements
The approach described in this lesson is valid whenever the following conditions are met:
- Both samples are simple random samples.
- The samples are independent.
- Each population is at least 20 times larger than its respective sample.
- The sampling distribution of the difference between means is approximately normally distributed.
Generally, the sampling distribution will be approximately normally distributed when both populations are normally distributed or when each sample size is greater than or equal to 30.
The Variability of the Difference Between Sample Means
To construct a confidence interval, we need to know the variability of the difference between sample means. This means we need to know how to compute the standard deviation of the sampling distribution of the difference.
- If the population standard deviations are known, the standard deviation of the sampling distribution is:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $\sigma_1$ is the standard deviation of population 1, $\sigma_2$ is the standard deviation of population 2, $n_1$ is the size of sample 1, and $n_2$ is the size of sample 2.
- When the standard deviation of either population is unknown and the sample sizes ($n_1$ and $n_2$) are large, the standard deviation of the sampling distribution can be estimated by the standard error, using the equation below.
$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where SE is the standard error, $s_1$ is the standard deviation of sample 1, $s_2$ is the standard deviation of sample 2, $n_1$ is the size of sample 1, and $n_2$ is the size of sample 2.
Note: In real-world analyses, the population standard deviations are seldom known. Therefore, $SE_{\bar{x}_1 - \bar{x}_2}$ is used more often than $\sigma_{\bar{x}_1 - \bar{x}_2}$.
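As a quick check of the formulas above, here is a minimal Python sketch. The helper name `se_difference` is hypothetical (it is not from any library); it applies the same square-root formula whether you pass population standard deviations (giving σ) or sample standard deviations (giving SE).

```python
import math


def se_difference(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard deviation of the sampling distribution of x̄1 - x̄2.

    Pass population standard deviations (σ1, σ2) if they are known, or
    sample standard deviations (s1, s2) to get the standard error SE;
    the arithmetic is identical in both cases.
    """
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


# Sample figures from Problem 1 below: s1 = 100, n1 = 15, s2 = 90, n2 = 20
print(round(se_difference(100, 15, 90, 20), 2))  # 32.74
```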
Alert
Some texts present additional options for calculating standard deviations. These formulas, which should only be used under special circumstances, are described below.
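- Standard deviation. Use this formula when the population standard deviations are known and are equal.
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sigma_d = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where $\sigma = \sigma_1 = \sigma_2$.
- Pooled standard deviation. Use this formula when the population standard deviations are unknown but assumed to be equal, and the sample sizes ($n_1$ and $n_2$) are small (under 30).
$$SD_{pooled} = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}$$
where $\sigma_1 = \sigma_2$ is assumed. The pooled standard deviation estimates the common population standard deviation; the standard error of the difference between sample means is then $SD_{pooled} \sqrt{1/n_1 + 1/n_2}$.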
Remember, these two formulas should be used only when the various required underlying assumptions are justified.
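The two special-case formulas can be sketched the same way. This is a minimal illustration that simply assumes σ1 = σ2, as the formulas require; the function names are hypothetical. The last function multiplies the pooled standard deviation by sqrt(1/n1 + 1/n2) to turn it into a standard error for x̄1 - x̄2.

```python
import math


def sd_diff_equal_sigma(sigma: float, n1: int, n2: int) -> float:
    """σ of x̄1 - x̄2 when both populations share a known σ."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled sample standard deviation, assuming σ1 = σ2."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def se_pooled(s1: float, n1: int, s2: float, n2: int) -> float:
    """Standard error of x̄1 - x̄2 based on the pooled standard deviation."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


print(sd_diff_equal_sigma(10, 50, 50))
print(pooled_sd(100, 15, 90, 20))
print(se_pooled(100, 15, 90, 20))
```

Problem 1 below does not assume equal population variances, so it uses the unpooled standard error instead; the pooled version is shown only for comparison.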
How to Find the Confidence Interval for the Difference Between Means
Previously, we described how to construct confidence intervals. For convenience, we repeat the key steps below.
- Identify a sample statistic. Use the difference between sample means to estimate the difference between population means.
- Select a confidence level. The confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
- Find the margin of error. Previously, we showed how to compute the margin of error, based on the critical value and standard deviation.
When the sample sizes are large, you can use a t statistic or a z-score for the critical value. Since it does not require computing degrees of freedom, the z-score is a little easier. When the sample sizes are small (less than 30), use a t statistic for the critical value.
If you use a t statistic, you will need to compute degrees of freedom (DF). Here's how.
- The following formula is appropriate whenever a t statistic is used to analyze the difference between means.
$$DF = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}}$$
- If you are working with a pooled standard deviation (see above), $DF = n_1 + n_2 - 2$.
- Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error, and the uncertainty is denoted by the confidence level.
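The four steps reduce to a few lines of arithmetic. The sketch below is one possible arrangement, not a prescribed implementation. `welch_df` evaluates the DF formula above (often called the Welch-Satterthwaite approximation), and `ci_difference` is a hypothetical helper that returns the interval for a critical value you supply, for example one read from a t or normal table for the chosen confidence level.

```python
import math


def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Degrees of freedom for a t critical value (the DF formula above)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def ci_difference(mean1, s1, n1, mean2, s2, n2, critical_value):
    """Return (lower, upper) for mu1 - mu2.

    Step 2 (the confidence level) is handled by the caller, who picks
    critical_value to match it.
    """
    diff = mean1 - mean2                     # Step 1: sample statistic
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error
    margin = critical_value * se             # Step 3: margin of error
    return diff - margin, diff + margin      # Step 4: statistic ± margin


print(welch_df(100, 15, 90, 20))
print(ci_difference(1000, 100, 15, 950, 90, 20, critical_value=1.7))
```

With the Problem 1 figures and a table value of 1.7, this reproduces the interval of roughly -5.66 to 105.66 derived below.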
Note: The next two problems show how to use t statistics (see Problem 1) and z-scores (see Problem 2) as critical values.
Test Your Understanding
Problem 1: Small Samples
Suppose that simple random samples of college freshmen are selected from two universities - 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90.
What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t statistic as the critical value.)
(A) 50 ± 1.70
(B) 50 ± 28.49
(C) 50 ± 32.74
(D) 50 ± 55.66
(E) None of the above
Solution
The correct answer is (D). The approach that we used to solve this problem is valid when the following conditions are met.
- The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
- The samples must be independent. Since responses from one sample did not affect responses from the other sample, the samples are independent.
- The sampling distribution should be approximately normally distributed. The problem states that test scores in each population are normally distributed, so the sampling distribution of the difference between sample means will also be normally distributed.
Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.
- Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus, $\bar{x}_1 - \bar{x}_2 = 1000 - 950 = 50$.
- Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 90% confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Find standard error. Using the sample standard deviations,
we compute the standard error (SE), which is an estimate of the
standard deviation of the difference between sample means.
$$\begin{aligned}
SE &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\
&= \sqrt{\frac{100^2}{15} + \frac{90^2}{20}} \\
&= \sqrt{\frac{10{,}000}{15} + \frac{8100}{20}} \\
&= \sqrt{666.67 + 405} = 32.74
\end{aligned}$$
- Find critical value. The critical value is a factor used to compute the margin of error. Because the sample sizes are small, we express the critical value as a t statistic rather than a z-score.
To find the critical value, we take these steps.
- Compute alpha (α):
α = 1 - (confidence level / 100)
α = 1 - 90/100 = 0.10
- Find the critical probability (p*):
p* = 1 - α/2 = 1 - 0.10/2 = 0.95
- Find the degrees of freedom (DF):
$$\begin{aligned}
DF &= \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}} \\
&= \frac{\left( 100^2/15 + 90^2/20 \right)^2}{\dfrac{\left( 100^2/15 \right)^2}{14} + \dfrac{\left( 90^2/20 \right)^2}{19}} \\
&= \frac{(666.67 + 405)^2}{31{,}746.03 + 8{,}632.89} \\
&= \frac{1{,}148{,}469.44}{40{,}378.92} \approx 28.44
\end{aligned}$$
Rounding off to the nearest whole number, we conclude that there are 28 degrees of freedom.
- The critical value is the t statistic having 28 degrees of freedom and a cumulative probability equal to 0.95. From the t Distribution Calculator, we find that the critical value is about 1.7.
- Compute margin of error (ME):
ME = critical value * standard error
ME = 1.7 * 32.74 = 55.66
- Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error, and the uncertainty is denoted by the confidence level.
Therefore, the 90% confidence interval is 50 ± 55.66; that is, -5.66 to 105.66. Here's how to interpret this confidence interval. Suppose we repeated this study many times, each time drawing new random samples from school A and school B and constructing a 90% confidence interval in the same way. About 90% of those intervals would contain the true difference between the two population mean scores.
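To check Problem 1 end to end without a t table, the sketch below recomputes each step in plain Python. It assumes no statistics library is available, so it finds the t critical value by numerically integrating the t density (Simpson's rule) and bisecting; the helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical. If SciPy is installed, `scipy.stats.t.ppf(p_star, df)` returns the same critical value directly.

```python
import math

# Summary statistics from Problem 1
mean1, s1, n1 = 1000, 100, 15  # school A
mean2, s2, n2 = 950, 90, 20    # school B
confidence = 90                # percent


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(p_star: float, df: float) -> float:
    """Bisection for the t value with cumulative probability p_star (> 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


diff = mean1 - mean2                         # Step 1: sample statistic
v1, v2 = s1**2 / n1, s2**2 / n2
se = math.sqrt(v1 + v2)                      # standard error
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
alpha = 1 - confidence / 100                 # Step 2: confidence level -> alpha
p_star = 1 - alpha / 2                       # critical probability
t_star = t_critical(p_star, round(df))       # DF rounded to nearest whole number
margin = t_star * se                         # Step 3: margin of error
lower, upper = diff - margin, diff + margin  # Step 4: statistic ± margin

print(f"SE = {se:.2f}, DF = {df:.2f}, t* = {t_star:.3f}, ME = {margin:.2f}")
print(f"90% CI: {lower:.2f} to {upper:.2f}")
```

Because the script keeps t* at about 1.701 instead of rounding it to 1.7, its margin of error comes out slightly larger than 55.66 (about 55.7), so the interval widens by a few hundredths on each side.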
Problem 2: Large Samples
The local baseball team conducts a study to find the amount spent on refreshments at the ball park. Over the course of the season they gather simple random samples of 500 men and 1000 women. For men, the average expenditure was $20, with a standard deviation of $3. For women, it was $15, with a standard deviation of $2.
What is the 99% confidence interval for the spending difference between men and women? Assume that the two populations are independent and normally distributed.
(A) $5 ± $0.15
(B) $5 ± $0.38
(C) $5 ± $1.15
(D) $5 ± $1.38
(E) None of the above
Solution
The correct answer is (B). The approach that we used to solve this problem is valid when the following conditions are met.
- The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
- The samples must be independent. Again, the problem statement satisfies this condition.
- The sampling distribution should be approximately normally distributed. The problem states that spending in each population is normally distributed, so the sampling distribution of the difference between sample means will also be normally distributed.
Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.
- Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus, $\bar{x}_1 - \bar{x}_2 = \$20 - \$15 = \$5$.
- Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 99% confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Find standard error. The standard error is an estimate of the standard deviation of the difference between sample means. We use the sample standard deviations to compute the standard error (SE).
$$\begin{aligned}
SE &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\
&= \sqrt{\frac{3^2}{500} + \frac{2^2}{1000}} \\
&= \sqrt{\frac{9}{500} + \frac{4}{1000}} \\
&= \sqrt{0.018 + 0.004} = 0.148
\end{aligned}$$
- Find critical value. The critical value is a factor used to compute the margin of error. Because the sample sizes are large enough, we express the critical value as a z-score.
To find the critical value, we take these steps.
- Compute alpha (α):
α = 1 - (confidence level / 100)
α = 1 - 99/100 = 0.01
- Find the critical probability (p*):
p* = 1 - α/2 = 1 - 0.01/2 = 0.995
- The critical value is the z-score having a cumulative probability equal to 0.995. From the Normal Distribution Calculator, we find that the critical value is about 2.58.
- Compute margin of error (ME):
ME = critical value * standard error
ME = 2.58 * 0.148 = 0.38
- Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error, and the uncertainty is denoted by the confidence level.
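A similar sketch for Problem 2 needs only the standard library, because `statistics.NormalDist().inv_cdf` (Python 3.8 and later) returns the z-score for a given cumulative probability. The variable names are illustrative, not part of the lesson.

```python
import math
from statistics import NormalDist

# Summary statistics from Problem 2
mean_men, s_men, n_men = 20, 3, 500
mean_women, s_women, n_women = 15, 2, 1000
confidence = 99  # percent

diff = mean_men - mean_women                             # Step 1: sample statistic
se = math.sqrt(s_men**2 / n_men + s_women**2 / n_women)  # standard error
alpha = 1 - confidence / 100                             # Step 2: confidence level -> alpha
p_star = 1 - alpha / 2                                   # critical probability
z_star = NormalDist().inv_cdf(p_star)                    # z-score with cumulative probability p*
margin = z_star * se                                     # Step 3: margin of error
lower, upper = diff - margin, diff + margin              # Step 4: statistic ± margin

print(f"SE = {se:.3f}, z* = {z_star:.3f}, ME = {margin:.2f}")
print(f"99% CI: ${lower:.2f} to ${upper:.2f}")
```

Carrying unrounded values gives a margin of error of about $0.38, matching answer (B), and an interval of roughly $4.62 to $5.38.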