Confidence Interval: Difference Between Means

This lesson describes how to construct a confidence interval for the difference between two means.

Estimation Requirements

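The approach described in this lesson is valid whenever the following conditions are met:

  • Both samples are simple random samples.
  • The samples are independent.
  • The sampling distribution of the difference between sample means is approximately normal.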

Generally, the sampling distribution will be approximately normally distributed if each sample is described by at least one of the following statements.

  • The population distribution is normal.
  • The sample data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
  • The sample data are moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.

The Variability of the Difference Between Sample Means

To construct a confidence interval, we need to know the variability of the difference between sample means. This means we need to know how to compute the standard deviation of the sampling distribution of the difference.

  • If the population standard deviations are known, the standard deviation of the sampling distribution is:

    $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$

    where σ1 is the standard deviation of population 1, σ2 is the standard deviation of population 2, n1 is the size of sample 1, and n2 is the size of sample 2.

  • When the standard deviation of either population is unknown and the sample sizes (n1 and n2) are large, the standard deviation of the sampling distribution can be estimated by the standard error, using the equation below.

    $SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$

    where SE is the standard error, s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2, n1 is the size of sample 1, and n2 is the size of sample 2.

Note: In real-world analyses, the standard deviation of the population is seldom known. Therefore, $SE_{\bar{x}_1 - \bar{x}_2}$ is used more often than $\sigma_{\bar{x}_1 - \bar{x}_2}$.
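
As a minimal sketch of the two formulas above, the helper below computes the square root of the summed variance-over-size terms. The function name `sd_of_difference` is hypothetical, not something from the lesson. Passing population standard deviations gives the standard deviation of the sampling distribution; passing sample standard deviations gives the standard error. The example values are the ones used in Problem 1 and Problem 2 later in this lesson.

```python
import math


def sd_of_difference(sd1, n1, sd2, n2):
    """Return sqrt(sd1^2 / n1 + sd2^2 / n2).

    With population standard deviations this is the standard deviation
    of the sampling distribution; with sample standard deviations it is
    the standard error (an estimate of that quantity).
    """
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Sample standard deviations and sizes from the two problems below.
se_problem_1 = sd_of_difference(100, 15, 90, 20)
se_problem_2 = sd_of_difference(3, 50, 2, 100)
print(round(se_problem_1, 2), round(se_problem_2, 2))
```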

Alert

Some texts present additional options for calculating standard deviations. These formulas, which should only be used under special circumstances, are described below.

  • Standard deviation. Use this formula when the population standard deviations are known and are equal.
    $\sigma_{\bar{x}_1 - \bar{x}_2} = \sigma_d = \sigma \sqrt{1/n_1 + 1/n_2}$       where σ = σ1 = σ2

  • Pooled standard deviation. Use this formula when the population standard deviations are unknown but assumed to be equal, and the sample sizes (n1 and n2) are small (under 30).
    $SD_{pooled} = \sqrt{ \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} }$       where σ1 = σ2

Remember, these two formulas should be used only when the various required underlying assumptions are justified.
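
The pooled formula can be sketched as follows. The function names are hypothetical, and the numbers in the usage lines are made up for illustration. The second helper combines the pooled standard deviation with the equal-variance formula above (with SDpooled standing in for σ) to get a standard error; that combination is a standard step, but the lesson does not spell it out.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation, assuming equal population variances."""
    numerator = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2
    return math.sqrt(numerator / (n1 + n2 - 2))


def pooled_standard_error(s1, n1, s2, n2):
    """Standard error of the difference, using the pooled SD in place of sigma."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


# Illustrative (made-up) inputs: two small samples with similar spread.
print(pooled_sd(10, 12, 10, 8))
print(pooled_standard_error(10, 12, 10, 8))
```

When both sample standard deviations are equal, the pooled value is that same number, which makes a quick sanity check.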

How to Find the Confidence Interval for the Difference Between Means

Previously, we described how to construct confidence intervals. For convenience, we repeat the key steps below.

  • Identify a sample statistic. Use the difference between sample means to estimate the difference between population means.

  • Select a confidence level. The confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.

  • Find the margin of error. Previously, we showed how to compute the margin of error, based on the critical value and standard deviation.

    When the sample size is large, you can use a t score or a z score for the critical value. Since it does not require computing degrees of freedom, the z score is a little easier. When the sample sizes are small (less than 40), use a t score for the critical value.

    If you use a t score, you will need to compute degrees of freedom (DF). Here's how.

    • The following formula is appropriate whenever a t score is used to analyze the difference between means.

      $DF = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

    • If you are working with a pooled standard deviation (see above), DF = n1 + n2 - 2.

    The next section presents sample problems that illustrate how to use z scores and t scores as critical values.

  • Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.
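
The steps above can be sketched in a few small functions. All function names here are hypothetical. The z critical value comes from Python's standard-library `statistics.NormalDist`. The standard library has no t distribution, so a t critical value would have to come from a table or calculator and be passed in by hand.

```python
from statistics import NormalDist


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for a t score when the standard deviations are not pooled."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def z_critical(confidence_percent):
    """z score whose cumulative probability is 1 - alpha / 2."""
    alpha = 1 - confidence_percent / 100
    return NormalDist().inv_cdf(1 - alpha / 2)


def confidence_interval(difference, critical_value, standard_error):
    """Return (lower, upper) = difference -/+ critical value * standard error."""
    margin_of_error = critical_value * standard_error
    return difference - margin_of_error, difference + margin_of_error


# With equal standard deviations and sizes, DF reduces to 2 * (n - 1).
print(welch_df(5, 10, 5, 10))
print(z_critical(95))
```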

Test Your Understanding of This Lesson

Problem 1: Small Samples

Suppose that simple random samples of college freshmen are selected from two universities - 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90.

What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)

(A) 50 ± 1.70
(B) 50 ± 28.49
(C) 50 ± 32.74
(D) 50 ± 55.66
(E) None of the above

Solution

The correct answer is (D). The approach that we used to solve this problem is valid when the following conditions are met.

  • The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
  • The samples must be independent. Since responses from one sample did not affect responses from the other sample, the samples are independent.
  • The sampling distribution should be approximately normally distributed. The problem states that test scores in each population are normally distributed, so the difference between sample means will also be normally distributed.

Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.

  • Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus, x1 - x2 = 1000 - 950 = 50.

  • Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 90% confidence level.

  • Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.

    • Find standard error. Using the sample standard deviations, we compute the standard error (SE), which is an estimate of the standard deviation of the difference between sample means.

      $SE = \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$
      $SE = \sqrt{100^2 / 15 + 90^2 / 20}$
      $SE = \sqrt{10{,}000/15 + 8100/20} = \sqrt{666.67 + 405} = 32.74$

    • Find critical value. The critical value is a factor used to compute the margin of error. Because the sample sizes are small, we express the critical value as a t score rather than a z score. To find the critical value, we take these steps.

      • Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 90/100 = 0.10
      • Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.10/2 = 0.95
      • Find the degrees of freedom (df):

        $DF = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
        $DF = \frac{(100^2/15 + 90^2/20)^2}{\frac{(100^2/15)^2}{14} + \frac{(90^2/20)^2}{19}}$
        $DF = \frac{(666.67 + 405)^2}{31746.03 + 8632.89} = \frac{1148469.4}{40378.92} = 28.44$

        Rounding off to the nearest whole number, we conclude that there are 28 degrees of freedom.
      • The critical value is the t score having 28 degrees of freedom and a cumulative probability equal to 0.95. From the t Distribution Calculator, we find that the critical value is 1.7.

    • Compute margin of error (ME): ME = critical value * standard error = 1.7 * 32.74 = 55.66

  • Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.

Therefore, the 90% confidence interval is -5.66 to 105.66. That is, we are 90% confident that the true difference in population means is in the range defined by 50 ± 55.66.
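
The arithmetic in this solution can be checked with a short script. This is only a sketch: the variable names are ours, and the t critical value of 1.7 is copied from the solution rather than computed, because Python's standard library has no t distribution. The standard error is rounded to two decimals before use, as the solution does.

```python
import math

# Sample data from Problem 1.
n1, mean1, s1 = 15, 1000, 100
n2, mean2, s2 = 20, 950, 90
t_critical = 1.7  # taken from the solution (28 DF, cumulative probability 0.95)

difference = mean1 - mean2
v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2

standard_error = math.sqrt(v1 + v2)
degrees_of_freedom = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Round SE to two decimals first, matching the hand calculation.
margin_of_error = t_critical * round(standard_error, 2)

print(f"SE = {standard_error:.2f}, DF rounds to {round(degrees_of_freedom)}")
print(f"ME = {margin_of_error:.2f}")
print(f"90% CI: {difference - margin_of_error:.2f} to {difference + margin_of_error:.2f}")
```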

 

Problem 2: Large Samples

The local baseball team conducts a study to find the amount spent on refreshments at the ball park. Over the course of the season they gather simple random samples of 50 men and 100 women. For men, the average expenditure was $20, with a standard deviation of $3. For women, it was $15, with a standard deviation of $2.

What is the 99% confidence interval for the spending difference between men and women? Assume that the two populations are independent and normally distributed.

(A) $5 ± $0.47
(B) $5 ± $1.21
(C) $5 ± $2.58
(D) $5 ± $5.00
(E) None of the above

Solution

The correct answer is (B). The approach that we used to solve this problem is valid when the following conditions are met.

  • The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
  • The samples must be independent. Again, the problem statement satisfies this condition.
  • The sampling distribution should be approximately normally distributed. The problem states that spending in each population is normally distributed, so the difference between sample means will also be normally distributed.

Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.

  • Identify a sample statistic. Since we are trying to estimate the difference between population means, we choose the difference between sample means as the sample statistic. Thus, x1 - x2 = $20 - $15 = $5.

  • Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 99% confidence level.

  • Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.

    • Find standard error. The standard error is an estimate of the standard deviation of the difference between sample means. We use the sample standard deviations to compute the standard error (SE).

      $SE = \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$
      $SE = \sqrt{3^2 / 50 + 2^2 / 100} = \sqrt{9/50 + 4/100} = \sqrt{0.18 + 0.04} = 0.47$

    • Find critical value. The critical value is a factor used to compute the margin of error. Because the sample sizes are large enough, we express the critical value as a z score. To find the critical value, we take these steps.

      • Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 99/100 = 0.01
      • Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.01/2 = 0.995
      • The critical value is the z score having a cumulative probability equal to 0.995. From the Normal Distribution Calculator, we find that the critical value is 2.58.

    • Compute margin of error (ME): ME = critical value * standard error = 2.58 * 0.47 = 1.21

  • Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.

Therefore, the 99% confidence interval is $3.79 to $6.21. That is, we are 99% confident that men outspend women at the ballpark by about $5 ± $1.21.
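
A similar sketch reproduces this solution end to end, this time computing the z critical value with the standard-library `statistics.NormalDist` instead of reading it from a calculator. The variable names are our own. Because the script keeps full precision rather than rounding at each step, intermediate values may differ slightly from the hand calculation before rounding.

```python
import math
from statistics import NormalDist

# Sample data from Problem 2 (dollars).
n1, mean1, s1 = 50, 20.0, 3.0    # men
n2, mean2, s2 = 100, 15.0, 2.0   # women
confidence_percent = 99

alpha = 1 - confidence_percent / 100
critical_probability = 1 - alpha / 2
z_critical = NormalDist().inv_cdf(critical_probability)

difference = mean1 - mean2
standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
margin_of_error = z_critical * standard_error

print(f"z = {z_critical:.2f}, SE = {standard_error:.2f}, ME = {margin_of_error:.2f}")
print(f"99% CI: ${difference - margin_of_error:.2f} to ${difference + margin_of_error:.2f}")
```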