# Confidence Interval: Difference Between Means

This lesson describes how to construct a confidence interval for the difference between two means.

## Estimation Requirements

The approach described in this lesson is valid whenever the following conditions are met:

- Both samples are simple random samples.
- The samples are independent.
- Each population is at least 20 times larger than its respective sample.
- The sampling distribution of the difference between means is approximately normally distributed.

Generally, the sampling distribution will be approximately normally distributed when each sample contains at least 30 observations.

## The Variability of the Difference Between Sample Means

To construct a confidence interval, we need to know the variability of the difference between sample means. This means we need to know how to compute the standard deviation of the sampling distribution of the difference.

- If the population standard deviations are known, the standard deviation of the sampling distribution is:

  $$\sigma_{x_1 - x_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

  where $\sigma_1$ is the standard deviation of population 1, $\sigma_2$ is the standard deviation of population 2, $n_1$ is the size of sample 1, and $n_2$ is the size of sample 2.

- When the standard deviation of either population is unknown and the sample sizes ($n_1$ and $n_2$) are large, the standard deviation of the sampling distribution can be estimated by the standard error, using the equation below.

  $$SE_{x_1 - x_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

  where SE is the standard error, $s_1$ is the standard deviation of sample 1, $s_2$ is the standard deviation of sample 2, $n_1$ is the size of sample 1, and $n_2$ is the size of sample 2.

**Note:** In real-world analyses, the standard deviation of the population is seldom known. Therefore, $SE_{x_1 - x_2}$ is used more often than $\sigma_{x_1 - x_2}$.
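As a minimal sketch of the square-root formula above, the Python function below computes the standard deviation of the difference between sample means. The function name `standard_error` and its argument names are illustrative choices, not part of the lesson. The inputs are the sample figures from Problem 1 later in this lesson.

```python
import math


def standard_error(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard deviation of the difference between two sample means.

    Pass population standard deviations to get sigma, or sample
    standard deviations to get the estimated standard error SE.
    """
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


# Problem 1 figures: s1 = 100, n1 = 15, s2 = 90, n2 = 20.
se = standard_error(100, 15, 90, 20)
print(round(se, 2))  # 32.74
```

The same function serves both bullets because the two formulas differ only in whether the inputs are population or sample standard deviations.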

## Alert

Some texts present additional options for calculating standard deviations. These formulas, which should only be used under special circumstances, are described below.

- Standard deviation. Use this formula when the population standard deviations are known and are equal.

  $$\sigma_{x_1 - x_2} = \sigma_d = \sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

  where $\sigma = \sigma_1 = \sigma_2$.

- Pooled standard deviation. Use this formula when the population standard deviations are unknown but assumed to be equal, and the sample sizes ($n_1$ and $n_2$) are small (under 30).

  $$SD_{pooled} = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$

  where $\sigma_1 = \sigma_2$.

Remember, these two formulas should be used only when the various required underlying assumptions are justified.
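The two special-case formulas can be sketched in Python as follows. The function names are hypothetical, and plugging the pooled standard deviation in for the common σ of the first formula is the usual convention rather than something this lesson spells out. The inputs reuse the Problem 1 figures purely for illustration; that problem does not assume equal population standard deviations.

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled standard deviation, assuming sigma_1 == sigma_2."""
    numerator = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(numerator / (n1 + n2 - 2))


def equal_sd_standard_error(sd: float, n1: int, n2: int) -> float:
    """sigma * sqrt(1/n1 + 1/n2) for a common standard deviation."""
    return sd * math.sqrt(1 / n1 + 1 / n2)


# Illustration only: s1 = 100, n1 = 15, s2 = 90, n2 = 20.
sp = pooled_sd(100, 15, 90, 20)
se_pooled = equal_sd_standard_error(sp, 15, 20)
print(round(sp, 2), round(se_pooled, 2))  # 94.37 32.23
```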

## How to Find the Confidence Interval for the Difference Between Means

Previously, we described how to construct confidence intervals. For convenience, we repeat the key steps below.

- Identify a sample statistic. Use the difference between sample means to estimate the difference between population means.
- Select a confidence level. The confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels, but any percentage can be used.
- Find the margin of error. Previously, we showed how to compute the margin of error, based on the critical value and standard deviation. When the sample sizes are large, you can use a t statistic or a z score for the critical value. Since it does not require computing degrees of freedom, the z score is a little easier. When the sample sizes are small (less than 40), use a t score for the critical value.

If you use a t statistic, you will need to compute degrees of freedom (DF). Here's how.

- The following formula is appropriate whenever a t statistic is used to analyze the difference between means.

  $$DF = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

- If you are working with a pooled standard deviation (see above), $DF = n_1 + n_2 - 2$.
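The two degrees-of-freedom rules above can be sketched in Python. The function names `welch_df` and `pooled_df` are illustrative; the first formula is commonly known as the Welch-Satterthwaite approximation, a name this lesson does not use. The inputs are the Problem 1 figures.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Degrees of freedom when a t statistic is used without pooling."""
    v1 = s1**2 / n1  # variance contribution of sample 1
    v2 = s2**2 / n2  # variance contribution of sample 2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def pooled_df(n1: int, n2: int) -> int:
    """Degrees of freedom when a pooled standard deviation is used."""
    return n1 + n2 - 2


# Problem 1 figures: s1 = 100, n1 = 15, s2 = 90, n2 = 20.
df = welch_df(100, 15, 90, 20)
print(round(df, 2), pooled_df(15, 20))  # 28.44 33
```

The first result is generally not a whole number, so it is rounded before looking up a t score.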

The next section presents sample problems that illustrate how to use z scores and t statistics as critical values.

- Specify the confidence interval. The range of the confidence interval is defined by the *sample statistic* ± *margin of error*, and the uncertainty is denoted by the confidence level.
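When a z score is used as the critical value, it can be derived from the confidence level with Python's standard library. This is a sketch: the function name `z_critical` is hypothetical, and the alpha and critical-probability steps mirror the ones used in the worked problems below. A t score needs a t-distribution table or calculator instead, which the standard library does not provide.

```python
from statistics import NormalDist


def z_critical(confidence_level_pct: float) -> float:
    """z score whose cumulative probability is 1 - alpha/2."""
    alpha = 1 - confidence_level_pct / 100
    critical_probability = 1 - alpha / 2
    return NormalDist().inv_cdf(critical_probability)


for level in (90, 95, 99):
    print(level, round(z_critical(level), 3))
# 90 1.645
# 95 1.96
# 99 2.576
```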

## Test Your Understanding

**Problem 1: Small Samples**

Suppose that simple random samples of college freshmen are selected from two universities: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90.

What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)

(A) 50 ± 1.70

(B) 50 ± 28.49

(C) 50 ± 32.74

(D) 50 ± 55.66

(E) None of the above

**Solution**

The correct answer is (D). The approach that we used to solve this problem is valid when the following conditions are met.

- The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
- The samples must be independent. Since responses from one sample did not affect responses from the other sample, the samples are independent.
- The sampling distribution should be approximately normally distributed. The problem states that test scores in each population are normally distributed, so the sampling distribution of the difference between sample means will also be normally distributed.

Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.

- Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus, $x_1 - x_2 = 1000 - 950 = 50$.
- Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 90% confidence level.
- Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.

  - Find standard error. Using the sample standard deviations, we compute the standard error (SE), which is an estimate of the standard deviation of the difference between sample means.

    $$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

    $$SE = \sqrt{\frac{100^2}{15} + \frac{90^2}{20}} = \sqrt{\frac{10{,}000}{15} + \frac{8100}{20}} = \sqrt{666.67 + 405} = 32.74$$

  - Find critical value. The critical value is a factor used to compute the margin of error. Because the sample sizes are small, we express the critical value as a t score rather than a z score. To find the critical value, we take these steps.

    - Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 90/100 = 0.10
    - Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.10/2 = 0.95
    - Find the degrees of freedom (DF):

      $$DF = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

      $$DF = \frac{\left( 100^2/15 + 90^2/20 \right)^2}{\dfrac{(100^2/15)^2}{14} + \dfrac{(90^2/20)^2}{19}}$$

      $$DF = \frac{(666.67 + 405)^2}{31746.03 + 8632.89} \approx \frac{1148469.44}{40378.92} \approx 28.44$$

      Rounding off to the nearest whole number, we conclude that there are 28 degrees of freedom.
    - The critical value is the t statistic having 28 degrees of freedom and a cumulative probability equal to 0.95. From the t Distribution Calculator, we find that the critical value is 1.7.

  - Compute margin of error (ME): ME = critical value * standard error = 1.7 * 32.74 = 55.66

- Specify the confidence interval. The range of the confidence interval is defined by the *sample statistic* ± *margin of error*, and the uncertainty is denoted by the confidence level.

Therefore, the 90% confidence interval is 50 ± 55.66; that is, -5.66 to 105.66. Here's how to interpret this confidence interval. Suppose we repeated this study many times with different random samples from school A and school B, and computed a 90% confidence interval each time. We would expect about 90% of those intervals to contain the true difference between the population means.
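The arithmetic for Problem 1 can be checked with a short Python sketch. The variable names are illustrative, and the critical value 1.7 is taken from the solution above rather than computed, since the standard library has no t-distribution function.

```python
import math

# Sample figures from Problem 1.
mean_a, sd_a, n_a = 1000, 100, 15
mean_b, sd_b, n_b = 950, 90, 20
t_critical = 1.7  # from the solution text (28 DF, p* = 0.95); not computed here

diff = mean_a - mean_b
se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
me = t_critical * se
lower, upper = diff - me, diff + me

print(f"SE = {se:.2f}, ME = {me:.2f}")
print(f"interval: {lower:.2f} to {upper:.2f}")
```

Carrying full precision gives a margin of error of about 55.65. The solution reports 55.66 because it rounds the standard error to 32.74 before multiplying. Either way, the interval includes zero.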

**Problem 2: Large Samples**

The local baseball team conducts a study to find the amount spent on refreshments at the ball park. Over the course of the season they gather simple random samples of 500 men and 1000 women. For men, the average expenditure was $20, with a standard deviation of $3. For women, it was $15, with a standard deviation of $2.

What is the 99% confidence interval for the spending difference between men and women? Assume that the two populations are independent and normally distributed.

(A) $5 ± $0.15

(B) $5 ± $0.38

(C) $5 ± $1.15

(D) $5 ± $1.38

(E) None of the above

**Solution**

The correct answer is (B). The approach that we used to solve this problem is valid when the following conditions are met.

- The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
- The samples must be independent. Again, the problem statement satisfies this condition.
- The sampling distribution should be approximately normally distributed. The problem states that expenditures in each population are normally distributed, so the sampling distribution of the difference between sample means will also be normally distributed.

Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.

- Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus,

  $$x_1 - x_2 = \$20 - \$15 = \$5$$

- Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 99% confidence level.
- Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.

  - Find standard error. The standard error is an estimate of the standard deviation of the difference between sample means. We use the sample standard deviations to compute the standard error (SE).

    $$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

    $$SE = \sqrt{\frac{3^2}{500} + \frac{2^2}{1000}} = \sqrt{\frac{9}{500} + \frac{4}{1000}} = \sqrt{0.018 + 0.004} = 0.148$$

  - Find critical value. The critical value is a factor used to compute the margin of error. Because the sample sizes are large enough, we express the critical value as a z score. To find the critical value, we take these steps.

    - Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 99/100 = 0.01
    - Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.01/2 = 0.995
    - The critical value is the z score having a cumulative probability equal to 0.995. From the Normal Distribution Calculator, we find that the critical value is 2.58.

  - Compute margin of error (ME): ME = critical value * standard error = 2.58 * 0.148 = 0.38

- Specify the confidence interval. The range of the confidence interval is defined by the *sample statistic* ± *margin of error*, and the uncertainty is denoted by the confidence level.

Therefore, the 99% confidence interval is $5 ± $0.38; that is, $4.62 to $5.38.
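Problem 2 can be reproduced end to end with Python's standard library, because the z score comes from `statistics.NormalDist`. This is a sketch with illustrative variable names, not the calculator used in the solution.

```python
import math
from statistics import NormalDist

# Sample figures from Problem 2 (dollars).
mean_men, sd_men, n_men = 20.0, 3.0, 500
mean_women, sd_women, n_women = 15.0, 2.0, 1000
confidence_level = 99  # percent

diff = mean_men - mean_women
se = math.sqrt(sd_men**2 / n_men + sd_women**2 / n_women)

alpha = 1 - confidence_level / 100
z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for p* = 0.995
me = z * se
lower, upper = diff - me, diff + me

print(f"z = {z:.2f}, SE = {se:.3f}, ME = {me:.2f}")
print(f"interval: ${lower:.2f} to ${upper:.2f}")
# z = 2.58, SE = 0.148, ME = 0.38
# interval: $4.62 to $5.38
```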