Stat Trek

Teach yourself statistics


Confidence Interval: Difference Between Means

This lesson describes how to construct a confidence interval for the difference between two means.

Estimation Requirements

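The approach described in this lesson is valid whenever the following conditions are met:

  • The sampling method is simple random sampling.
  • The samples are independent.
  • The sampling distribution of the difference between sample means is approximately normally distributed.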

Note: The sampling distribution will be approximately normally distributed when each sample size is greater than or equal to 30.

The Variability of the Difference Between Sample Means

To construct a confidence interval, we need to know the variability of the difference between sample means. This means we need to know the standard deviation of the sampling distribution of the difference.

  • If the population standard deviations are known, the standard deviation of the sampling distribution is:

    $$ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} } $$

    where σ1 is the standard deviation of population 1, σ2 is the standard deviation of population 2, n1 is the size of sample 1, and n2 is the size of sample 2.
  • When the standard deviation of either population is unknown, the standard deviation of the sampling distribution can be estimated by the standard error, using the equation below.

    $$ SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } $$

    where SE is the standard error, s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2, n1 is the size of sample 1, and n2 is the size of sample 2.
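Both formulas share the same shape, so one small helper covers them. This is a minimal Python sketch; the function name `sd_of_difference` is hypothetical (it is not from any library). Passing population standard deviations gives σ for the difference; passing sample standard deviations gives SE.

```python
import math

def sd_of_difference(sd1, n1, sd2, n2):
    """sqrt(sd1^2/n1 + sd2^2/n2).

    With population SDs (sigma1, sigma2) this is the standard deviation of the
    sampling distribution; with sample SDs (s1, s2) it is the standard error SE.
    """
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
```

For example, `sd_of_difference(100, 15, 90, 20)` is about 32.74 (the standard error in Problem 1), and `sd_of_difference(3, 500, 2, 1000)` is about 0.148 (Problem 2).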

Note: In real-world analyses, the population standard deviations are seldom known. Therefore, $SE_{\bar{x}_1 - \bar{x}_2}$ is used more often than $\sigma_{\bar{x}_1 - \bar{x}_2}$.

Pooled Standard Error

When the population standard deviations (σ1 and σ2) are unknown but assumed to be equal, the standard error can be computed by combining, or pooling, the sample standard deviations, as shown below:

$$ s_p = \sqrt{ \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} } $$

$$ SE_p = s_p \sqrt{ \frac{1}{n_1} + \frac{1}{n_2} } $$

where sp is the pooled estimate of the equal population standard deviations, SEp is the pooled standard error, s1 and s2 are sample standard deviations, and n1 and n2 are sample sizes.

The critical value for the pooled standard error is a t-score with degrees of freedom (df) equal to:

df = n1 + n2 - 2
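The pooled formulas translate directly into code. This is a minimal Python sketch; `pooled_standard_error` is a hypothetical helper name, and it returns the pooled estimate, the pooled standard error, and the t-score degrees of freedom together.

```python
import math

def pooled_standard_error(s1, n1, s2, n2):
    """Return (s_p, SE_p, df) assuming equal population standard deviations."""
    df = n1 + n2 - 2
    # Pooled SD: each sample variance is weighted by its own degrees of freedom (n - 1).
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    # Pooled standard error of the difference between sample means.
    se_p = sp * math.sqrt(1 / n1 + 1 / n2)
    return sp, se_p, df
```

As a purely arithmetic illustration, `pooled_standard_error(100, 15, 90, 20)` gives s_p ≈ 94.37, SE_p ≈ 32.23, and df = 33. (Problem 1 does not assume equal population standard deviations, so it uses the unpooled standard error instead.)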

Note: Students taking the AP Statistics exam are expected to be aware of the pooled standard deviation and the pooled standard error, even though formulas for the pooled standard deviation and pooled standard error may not be provided on the formula sheet for the exam.

The Critical Value

The critical value is a factor used to compute the margin of error around a statistic. When the statistic is a difference between sample means, the critical value can be expressed as a z-score or a t-score.

  • z-Score. When both sample sizes are large (n ≥ 30) and the standard deviation of each population distribution is known, use a z-score.
  • t-Score. When either sample size is small (n < 30) or the standard deviation of either population is unknown, use a t-score.

In practice, population standard deviations are rarely known, so the critical value is most often expressed as a t-score.

Warning: If either sample size is small (n < 30) and its population distribution is distinctly not normal (e.g., heavily skewed or contains outliers), do not express the critical value as a z-score or a t-score. (Such cases are not part of the AP Statistics curriculum and are beyond the scope of what we cover in this tutorial.)
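These rules amount to a small decision procedure. The Python sketch below encodes them; the function and parameter names are hypothetical, and the normality flags stand for the analyst's judgment about each population rather than anything the code can check.

```python
def critical_value_type(n1, n2, both_sigmas_known, pop1_normal, pop2_normal):
    """Return "z", "t", or None following the rules of thumb in this lesson.

    None means a sample is small (n < 30) and its population is distinctly
    non-normal, a case this lesson does not cover.
    """
    small1, small2 = n1 < 30, n2 < 30
    if (small1 and not pop1_normal) or (small2 and not pop2_normal):
        return None
    if not small1 and not small2 and both_sigmas_known:
        return "z"
    return "t"
```

With Problem 1's setup (samples of 15 and 20, unknown σ, normal populations) it returns "t"; with Problem 2's (samples of 500 and 1000, known σ) it returns "z".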

How to Express Critical Value as t-Score

To express the critical value as a t-score, follow these steps.

  • Compute alpha (α): α = 1 - (confidence level / 100)
    • When the confidence level is 99%, α is 1 - 99/100 or 0.01.
    • When the confidence level is 95%, α is 1 - 95/100 or 0.05.
    • When the confidence level is 90%, α is 1 - 90/100 or 0.1.
  • Find the critical probability (p*): p* = 1 - α/2
  • Find the degrees of freedom (df).
    • When the population standard deviations are equal, use this formula to find degrees of freedom:

      df = n1 + n2 - 2

    • When the population standard deviations are unequal, use the Welch-Satterthwaite Approximation to find degrees of freedom:

      $$ \text{num} = \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2 $$

      $$ \text{den} = \frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1} $$

      $$ df = \text{num} / \text{den} $$

      where s1² and s2² are the sample variances for the two groups, and n1 and n2 are the sample sizes for the two groups. (If this formula produces a non-integer df, round df down to the nearest whole number.)

  • Find the t-score having degrees of freedom equal to df and a cumulative probability equal to the critical probability (p*).
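The first three steps are plain arithmetic. The Python sketch below mirrors them; all function names are hypothetical, and `df_welch` returns the unrounded value so the caller can round down as the lesson instructs.

```python
import math

def alpha_from_confidence(confidence_level):
    """alpha = 1 - (confidence level / 100); e.g. 95 -> 0.05."""
    return 1 - confidence_level / 100

def critical_probability(alpha):
    """p* = 1 - alpha / 2."""
    return 1 - alpha / 2

def df_equal_sds(n1, n2):
    """Degrees of freedom when the population SDs are assumed equal."""
    return n1 + n2 - 2

def df_welch(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (unrounded)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    num = (v1 + v2) ** 2
    den = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    return num / den
```

With Problem 1's numbers, `df_welch(100, 15, 90, 20)` is about 28.44, and `math.floor` of that gives 28 degrees of freedom. When the two samples have equal sizes and equal standard deviations, the Welch value reduces to n1 + n2 - 2.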

To find the critical t-score, use an online calculator (e.g., Stat Trek's t Distribution Calculator), a graphing calculator, or a t-distribution statistical table (found in the appendix of most introductory statistics texts).
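If no calculator or table is at hand, the critical t-score can also be computed numerically. The standard-library-only Python sketch below integrates the t density with Simpson's rule and inverts the cumulative probability by bisection. The function names are hypothetical, and the numerical settings (2,000 Simpson steps, a search range of 0 to 100) are assumptions chosen for this illustration, adequate for the confidence levels used in this lesson. With SciPy installed, `scipy.stats.t.ppf(p_star, df)` gives the same quantity directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(p_star, df, lo=0.0, hi=100.0, tol=1e-6):
    """t-score whose cumulative probability is p_star (assumes p_star > 0.5 and answer < hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, `t_critical(0.95, 28)` comes out at about 1.701, consistent with the value of about 1.7 used in Problem 1.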

How to Express Critical Value as z-Score

When the critical value is expressed as a z-score, its value depends on the confidence level. Common z-score critical values are 1.645 for a 90% confidence level, 1.96 for a 95% confidence level, and 2.576 for a 99% confidence level.

To express the critical value as a z-score when the confidence level is not 90%, 95%, or 99%, follow these steps.

  • Compute alpha (α): α = 1 - (confidence level / 100)
  • Find the critical probability (p*): p* = 1 - α/2
  • Find the z-score having a cumulative probability equal to the critical probability (p*).

To find the critical z-score, use an online calculator (e.g., Stat Trek's Normal Distribution Calculator), a graphing calculator, or a normal distribution statistical table (found in the appendix of most introductory statistics texts).
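Python's standard library (3.8 and later) can play the role of the normal distribution calculator: `statistics.NormalDist().inv_cdf` returns the z-score with a given cumulative probability. The wrapper name `z_critical` below is hypothetical; it simply chains the three steps above.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Critical z-score for a two-sided interval at the given confidence level (in percent)."""
    alpha = 1 - confidence_level / 100
    p_star = 1 - alpha / 2
    # inv_cdf returns the z-score whose cumulative probability is p_star.
    return NormalDist().inv_cdf(p_star)
```

For 90%, 95%, and 99% this reproduces the common values 1.645, 1.96, and 2.576 after rounding; for an 80% confidence level it gives about 1.2816.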

How to Find the Confidence Interval for the Difference Between Means

Previously, we described how to construct confidence intervals. For convenience, we repeat the five steps below.

  1. Choose the confidence level. The confidence level describes the uncertainty of a sampling plan. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
  2. Compute the standard deviation or standard error. The standard deviation ($\sigma_{\bar{x}_1 - \bar{x}_2}$) and the standard error (SE) of the difference in means can be calculated from the following formulas:

    $$ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} } $$

    $$ SE = \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } $$

  3. Find the critical value. Follow the instructions for finding z-score and t-score critical values provided above.
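  4. Find the margin of error. Compute the margin of error with one of the following equations, using the standard deviation of the statistic when the population standard deviations are known and the standard error otherwise.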

    Margin of error = Critical value * Standard deviation of statistic

    Margin of error = Critical value * Standard error of statistic

  5. Specify the confidence interval. The uncertainty is denoted by the confidence level, and the range of the confidence interval is defined by the following equation.

    Confidence interval = Sample statistic ± Margin of error
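Once a critical value has been chosen, steps 2, 4, and 5 are pure arithmetic. The Python sketch below (the function name is hypothetical) takes the critical value from step 3 as an input and returns the lower and upper limits; step 1, choosing the confidence level, is implicit in that critical value.

```python
import math

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, critical_value):
    """Confidence interval for mu1 - mu2; the step-3 critical value is supplied by the caller.

    Pass population SDs if known (giving sigma), otherwise sample SDs (giving SE).
    """
    statistic = mean1 - mean2                          # sample statistic
    spread = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # step 2
    margin = critical_value * spread                   # step 4
    return statistic - margin, statistic + margin      # step 5
```

For example, `diff_means_ci(1000, 100, 15, 950, 90, 20, 1.7)` returns roughly (-5.65, 105.65), matching Problem 1 up to rounding.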

Note: The next two problems show how to use t-scores (see Problem 1) and z-scores (see Problem 2) as critical values.

Test Your Understanding

Problem 1: Small Samples

Suppose that simple random samples of college freshmen are selected from two universities: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90.

What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t-score as the critical value.)

(A) 50 ± 1.70
(B) 50 ± 28.49
(C) 50 ± 32.74
(D) 50 ± 55.66
(E) None of the above

Solution

The correct answer is (D). The approach that we used to solve this problem is valid when the following conditions are met.

  • The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
  • The samples must be independent. Since responses from one sample did not affect responses from the other sample, the samples are independent.
  • The sampling distribution should be approximately normally distributed. The problem states that test scores in each population are normally distributed, so the difference between test scores will also be normally distributed.

Since the above requirements are satisfied, we can use the following five-step approach to construct a confidence interval.

  • Choose confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 90% confidence level.
  • Compute the standard deviation or standard error. Using the sample standard deviations, we compute the standard error (SE), which is an estimate of the standard deviation of the difference between sample means.

    $$ SE = \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } $$

    $$ SE = \sqrt{ \frac{100^2}{15} + \frac{90^2}{20} } $$

    $$ SE = \sqrt{ 10{,}000/15 + 8100/20 } $$

    $$ SE = \sqrt{ 666.67 + 405 } = 32.74 $$

  • Find critical value. The critical value is a factor used to compute the margin of error. Because the sample sizes are small, we express the critical value as a t-score rather than a z-score. We follow the instructions for finding t-score critical values described above, as shown below:
    • Compute alpha (α):

      α = 1 - (confidence level / 100)

      α = 1 - 90/100 = 0.10

    • Find the critical probability (p*):

      p* = 1 - α/2 = 1 - 0.10/2 = 0.95

    • Find the degrees of freedom (df). Since we cannot assume that the population standard deviations are equal, we use the Welch-Satterthwaite Approximation to compute degrees of freedom:

      $$ df = \frac{ \left( s_1^2/n_1 + s_2^2/n_2 \right)^2 }{ \frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1} } $$

      $$ df = \frac{ \left( 100^2/15 + 90^2/20 \right)^2 }{ \frac{(100^2/15)^2}{14} + \frac{(90^2/20)^2}{19} } $$

      $$ df = \frac{ (666.67 + 405)^2 }{ 31746.03 + 8632.89 } $$

      $$ df = 1148469.4 / 40378.92 \approx 28.44 $$

      Rounding down to the nearest whole number, we conclude that there are 28 degrees of freedom.
    • The critical value is the t-score having 28 degrees of freedom and a cumulative probability equal to 0.95. From the t Distribution Calculator, we find that the critical value is about 1.7.
  • Find the margin of error (ME): We can compute the margin of error from the critical value (CV) and the standard error (SE):

    ME = CV * SE

    ME = 1.7 * 32.74 = 55.66

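  • Specify the confidence interval (CI). The range of the confidence interval is defined by the sample statistic ± margin of error. Here, the sample statistic is the difference between sample means, 1000 - 950 = 50.

    $$ CI = (\bar{x}_1 - \bar{x}_2) \pm ME $$

    CI = 50 ± 55.66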

    And the uncertainty is denoted by the confidence level, which is 90%.
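The arithmetic in this solution can be reproduced in a few lines of Python. This sketch hard-codes the problem's data, computes the standard error and the Welch-Satterthwaite degrees of freedom, and, as a simplifying assumption, reuses the critical value of 1.7 quoted above instead of computing it. Carried at full precision, the unrounded degrees of freedom come out near 28.44, which still rounds down to 28.

```python
import math

# Problem 1 data: school A vs school B
mean_a, s_a, n_a = 1000, 100, 15
mean_b, s_b, n_b = 950, 90, 20

v_a, v_b = s_a ** 2 / n_a, s_b ** 2 / n_b
se = math.sqrt(v_a + v_b)                               # standard error of the difference
df_raw = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
df = math.floor(df_raw)                                 # round down, as the lesson instructs
t_crit = 1.7  # critical value for df = 28 and p* = 0.95, as quoted in the solution
me = t_crit * se
low, high = (mean_a - mean_b) - me, (mean_a - mean_b) + me
print(f"SE = {se:.2f}, df = {df_raw:.2f} -> {df}, ME = {me:.2f}, CI = ({low:.2f}, {high:.2f})")
```

It prints SE = 32.74, df = 28.44 -> 28, ME = 55.65, CI = (-5.65, 105.65). The margin differs from 55.66 in the last digit only because the worked solution multiplies by the rounded SE of 32.74.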

Therefore, the 90% confidence interval is 50 ± 55.66; that is, -5.66 to 105.66. Here's how to interpret this confidence interval. Suppose we repeated this study with different random samples for school A and school B, and we computed a separate confidence interval for each pair of samples. We would expect 90% of the confidence intervals to include the true difference in population means.
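The repeated-sampling interpretation can be checked by simulation. The Python sketch below works under stated assumptions: it treats the Problem 1 sample values (means 1000 and 950, standard deviations 100 and 90) as if they were the true parameters of two normal populations, which is hypothetical, and, to keep the example short, it builds each interval with the known-σ z-score method rather than the Welch t-score method, so that the nominal 90% coverage is exact. The function name and the seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu1, sigma1, n1, mu2, sigma2, n2, confidence=90, reps=10_000, seed=1):
    """Fraction of simulated intervals that contain the true difference mu1 - mu2."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    sd = (sigma1 ** 2 / n1 + sigma2 ** 2 / n2) ** 0.5  # known-sigma SD of the difference
    true_diff = mu1 - mu2
    hits = 0
    for _ in range(reps):
        # Draw a fresh pair of samples and compute the observed difference in means.
        diff = (fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
                - fmean(rng.gauss(mu2, sigma2) for _ in range(n2)))
        if diff - z * sd <= true_diff <= diff + z * sd:
            hits += 1
    return hits / reps
```

With 10,000 repetitions, `coverage(1000, 100, 15, 950, 90, 20)` should land close to 0.90; the exact figure depends on the random seed.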

Problem 2: Large Samples and Known Standard Deviations

The local baseball team conducts a study to find the amount spent on refreshments at the ballpark. Over the course of the season, they gather simple random samples of 500 men and 1000 women. For the sampled men, the average expenditure was $20; for the sampled women, the average expenditure was $15. Assume that the population standard deviations are known: $3 for men and $2 for women.

What is the 99% confidence interval for the spending difference between men and women?

(A) $5 ± $0.15
(B) $5 ± $0.38
(C) $5 ± $1.15
(D) $5 ± $1.38
(E) None of the above

Solution

The correct answer is (B). The approach that we used to solve this problem is valid when the following conditions are met.

  • The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
  • The samples must be independent. The men and the women form two separate simple random samples, so the samples are independent.
  • The sampling distribution should be approximately normally distributed. Both samples are large (500 men and 1000 women, each well above 30), so the sampling distribution of the difference between sample means is approximately normal.

Since the above requirements are satisfied, we can use the following five-step approach to construct a confidence interval.

  • Choose the confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 99% confidence level.
  • Compute the standard deviation or standard error. Because we know the population standard deviations (σ1 and σ2), we can compute the standard deviation (SD) of the sampling distribution.

    $$ SD = \sqrt{ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} } $$

    $$ SD = \sqrt{ \frac{3^2}{500} + \frac{2^2}{1000} } $$

    $$ SD = \sqrt{ 9/500 + 4/1000 } $$

    $$ SD = \sqrt{ 0.018 + 0.004 } = 0.148 $$

  • Find the critical value. The critical value (CV) is a factor used to compute the margin of error. Because the sample sizes are large and the population standard deviations are known, we will express the critical value as a z-score.

    Common z-score critical values are 1.645 for a 90% confidence level, 1.96 for a 95% confidence level, and 2.576 for a 99% confidence level. Since we are working with a 99% confidence level, the z-score critical value for this problem will be 2.576.

  • Find the margin of error (ME). We can compute the margin of error from the critical value (CV) and the standard deviation of the sampling distribution (SD):

    ME = CV * SD

    ME = 2.576 * 0.148 = 0.38

    So, the margin of error is 0.38 when the confidence level is 99%.
  • Specify the confidence interval (CI). The range of the confidence interval is defined by the sample statistic ± margin of error. Here, the sample statistic is:

    $$ \bar{x}_1 - \bar{x}_2 = \$20 - \$15 = \$5 $$

    So, the range of the confidence interval is:

    CI = statistic ± ME

    CI = 5 ± 0.38

    So, the difference in spending ranges from $4.62 to $5.38, with 99% confidence.
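As a check on this solution, the short Python sketch below recomputes each step with the standard library, using `statistics.NormalDist().inv_cdf` for the exact 99% critical value instead of the rounded 2.576. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Problem 2 data: men vs women, population SDs known
mean_m, sigma_m, n_m = 20, 3, 500
mean_w, sigma_w, n_w = 15, 2, 1000

sd = math.sqrt(sigma_m ** 2 / n_m + sigma_w ** 2 / n_w)  # SD of the sampling distribution
z = NormalDist().inv_cdf(1 - 0.01 / 2)                    # critical z for a 99% level
me = z * sd                                               # margin of error
low, high = (mean_m - mean_w) - me, (mean_m - mean_w) + me
print(f"SD = {sd:.3f}, z = {z:.3f}, ME = {me:.2f}, CI = ({low:.2f}, {high:.2f})")
# SD = 0.148, z = 2.576, ME = 0.38, CI = (4.62, 5.38)
```

The printed values agree with the worked solution: a margin of error of about $0.38 around the $5 difference.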