Stat Trek

Teach yourself statistics


Mean Difference Between Matched Pairs

This lesson describes how to construct a confidence interval to estimate the mean difference between matched data pairs.

Estimation Requirements

The approach described in this lesson is valid whenever the following conditions are met:

  • The data set is a simple random sample of observations from the population of interest.
  • Each element of the population includes measurements on two paired variables (e.g., x and y) such that the paired difference between x and y is: d = x - y.
  • The sampling distribution of the mean difference between data pairs (d̄) is approximately normally distributed.

Generally, the sampling distribution will be approximately normally distributed if the sample is described by at least one of the following statements.

  • The population distribution of paired differences (i.e., the variable d) is normal.
  • The sample distribution of paired differences is symmetric, unimodal, without outliers, and the sample size is 15 or less.
  • The sample distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.

The Variability of the Mean Difference Between Matched Pairs

Suppose d̄ is the mean difference between sample data pairs. To construct a confidence interval for the population mean difference, we need to know how to compute the standard deviation or the standard error of the sampling distribution of d̄.

  • The standard deviation of the mean difference, σ_d̄, is:

    σ_d̄ = σ_d * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }

    where σ_d is the standard deviation of the population differences, N is the population size, and n is the sample size. When the population size is much larger (at least 20 times larger) than the sample size, the standard deviation can be approximated by:

    σ_d̄ = σ_d / sqrt( n )

  • When the standard deviation of the population differences, σ_d, is unknown, the standard deviation of the sampling distribution cannot be calculated. Under these circumstances, use the standard error. The standard error (SE) can be calculated from the equation below.

    SE_d̄ = s_d * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }

    where s_d is the standard deviation of the sample differences, N is the population size, and n is the sample size. When the population size is much larger (at least 20 times larger) than the sample size, the standard error can be approximated by:

    SE_d̄ = s_d / sqrt( n )

Note: In real-world analyses, the standard deviation of the population is seldom known. Therefore, the standard error is used more often than the standard deviation.
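The two standard error formulas above can be compared side by side. The following is a minimal Python sketch; the function name `standard_error` and the numbers s_d = 4.0, n = 25, N = 400 are hypothetical values chosen only for illustration, not data from this lesson.

```python
import math

def standard_error(s_d, n, N=None):
    """Standard error of the mean paired difference.

    s_d: sample standard deviation of the paired differences
    n:   sample size (number of pairs)
    N:   population size; if None, use the large-population approximation
    """
    if N is None:
        # Approximate formula: s_d / sqrt(n)
        return s_d / math.sqrt(n)
    # Full formula, including the finite population terms
    return s_d * math.sqrt((1 / n) * (1 - n / N) * (N / (N - 1)))

# Hypothetical values (illustration only)
s_d, n, N = 4.0, 25, 400
exact = standard_error(s_d, n, N)
approx = standard_error(s_d, n)
print(f"exact SE  = {exact:.4f}")
print(f"approx SE = {approx:.4f}")
```

In this illustration the population is only 16 times larger than the sample, so the approximate formula gives a noticeably larger value than the full formula. As N grows relative to n, the two values converge.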

Alert

The Advanced Placement Statistics Examination only covers the "approximate" formulas for the standard deviation and standard error.

σ_d̄ = σ_d / sqrt( n )

SE_d̄ = s_d / sqrt( n )

However, students are expected to be aware of the limitations of these formulas; namely, the approximate formulas should only be used when the population size is much larger than the sample size.

How to Find the Confidence Interval for Mean Difference With Paired Data

Previously, we described how to construct confidence intervals. For convenience, we repeat the key steps below.

  • Identify a sample statistic. Use the mean difference between sample data pairs (d̄) to estimate the mean difference between population data pairs (μ_d).
  • Select a confidence level. The confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
  • Find the margin of error. Previously, we showed how to compute the margin of error, based on the critical value and standard deviation.

    When the sample size is large, you can use a t statistic or a z-score for the critical value. Since it does not require computing degrees of freedom, the z-score is a little easier. When the sample size is small (less than 40), use a t statistic for the critical value. (For additional explanation, see choosing between a t statistic and a z-score.)

    If you use a t statistic, you will need to compute degrees of freedom (DF). In this case, the degrees of freedom is equal to the sample size minus one: DF = n - 1.
  • Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.
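The four steps above can be sketched in Python. This is an illustrative sketch only: the function names `paired_mean_ci` and `z_critical` and the five data pairs are hypothetical, the standard error uses the approximate formula (population much larger than the sample), and the t critical value is supplied by the caller from a t table or calculator.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_mean_ci(x, y, critical_value):
    """Confidence interval for the mean paired difference d = x - y."""
    d = [xi - yi for xi, yi in zip(x, y)]   # paired differences
    n = len(d)
    d_bar = mean(d)                         # step 1: sample statistic
    se = stdev(d) / math.sqrt(n)            # approximate standard error
    me = critical_value * se                # step 3: margin of error
    return d_bar - me, d_bar + me           # step 4: statistic +/- ME

def z_critical(confidence_level):
    """z-score critical value, suitable for large samples."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical data: 5 pairs, so a t critical value is needed.
# 2.776 is the t-table value for 4 degrees of freedom at 95% confidence.
x = [12, 15, 11, 14, 13]
y = [10, 14, 12, 11, 12]
low, high = paired_mean_ci(x, y, critical_value=2.776)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries; for large samples, `z_critical(0.95)` could be passed instead.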

Test Your Understanding

Problem

Twenty-two students were randomly selected from a population of 1000 students. The sampling method was simple random sampling. All of the students were given a standardized English test and a standardized math test. Test results are summarized below.

Student   English   Math   Diff, d   (d - d̄)²
   1         95      90       5         16
   2         89      85       4          9
   3         76      73       3          4
   4         92      90       2          1
   5         91      90       1          0
   6         53      53       0          1
   7         67      68      -1          4
   8         88      90      -2          9
   9         75      78      -3         16
  10         85      89      -4         25
  11         90      95      -5         36
  12         85      83       2          1
  13         87      83       4          9
  14         85      83       2          1
  15         85      82       3          4
  16         68      65       3          4
  17         81      79       2          1
  18         84      83       1          0
  19         71      60      11        100
  20         46      47      -1          4
  21         75      77      -2          9
  22         80      83      -3         16

Σ(d - d̄)² = 270
d̄ = 1
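The summary values under the table can be checked directly from the scores. The short Python sketch below recomputes the differences (English minus math), their mean, and the sum of squared deviations; the variable names are illustrative.

```python
# Scores for students 1 through 22, copied from the table above
english = [95, 89, 76, 92, 91, 53, 67, 88, 75, 85, 90,
           85, 87, 85, 85, 68, 81, 84, 71, 46, 75, 80]
math_scores = [90, 85, 73, 90, 90, 53, 68, 90, 78, 89, 95,
               83, 83, 83, 82, 65, 79, 83, 60, 47, 77, 83]

# Paired differences: d = English - Math
d = [e - m for e, m in zip(english, math_scores)]

d_bar = sum(d) / len(d)                      # mean difference
ss = sum((di - d_bar) ** 2 for di in d)      # sum of squared deviations

print(d_bar, ss)  # prints: 1.0 270.0
```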

Find the 90% confidence interval for the mean difference between student scores on the English and math tests (d = English - Math). Assume that the differences are approximately normally distributed.

Solution

The approach that we used to solve this problem is valid when the following conditions are met.

  • The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
  • The sampling distribution should be approximately normally distributed. The problem statement says that the differences were normally distributed; so this condition is satisfied.

Since the above requirements are satisfied, we can use the following four-step approach to construct a confidence interval.

  • Identify a sample statistic. Since we are trying to estimate a population mean difference in math and English test scores, we use the sample mean difference (d̄ = 1) as the sample statistic.
  • Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 90% confidence level.
  • Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.
    • Find standard deviation or standard error. Since we do not know the standard deviation of the population, we cannot compute the standard deviation of the sample mean; instead, we compute the standard error (SE). Since the sample size is much smaller than the population size, we can use the approximation equation for the standard error.

      s_d = sqrt[ Σ(d_i - d̄)² / (n - 1) ]

      s_d = sqrt[ 270 / (22 - 1) ]

      s_d = sqrt(12.857) = 3.586

      SE = s_d / sqrt( n )

      SE = 3.586 / sqrt(22)

      SE = 3.586 / 4.69 = 0.765

    • Find critical value. The critical value is a factor used to compute the margin of error. Because the sample size is small, we express the critical value as a t score rather than a z-score. (See how to choose between a t statistic and a z-score.) To find the critical value, we take these steps.
      • Compute alpha (α):

        α = 1 - (confidence level / 100)

        α = 1 - 90/100 = 0.10

      • Find the critical probability (p*):

        p* = 1 - α/2 = 1 - 0.10/2 = 0.95

      • Find the degrees of freedom (df):

        df = n - 1 = 22 - 1 = 21

      • The critical value is the t statistic having 21 degrees of freedom and a cumulative probability equal to 0.95. From the t Distribution Calculator, we find that the critical value is about 1.72.
    • Compute margin of error (ME):

      ME = critical value * standard error

      ME = 1.72 * 0.765 = 1.316, or about 1.3

  • Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.

Therefore, the 90% confidence interval is -0.3 to 2.3, or 1 ± 1.3.
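The solution above can be reproduced numerically. The Python sketch below uses only the standard library, so it approximates the t critical value with Simpson's rule and bisection rather than a statistics package; the helper names `t_pdf`, `t_cdf`, and `t_critical` are illustrative and not part of the lesson. If SciPy is installed, `scipy.stats.t.ppf(0.95, 21)` should give the same critical value.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Cumulative probability for t >= 0, using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(p, df):
    """Find t with cumulative probability p by bisection (assumes 0.5 < p < 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Values taken from the problem above
n = 22                     # sample size
d_bar = 1.0                # sample mean difference
sum_sq = 270.0             # sum of squared deviations
confidence = 0.90

s_d = math.sqrt(sum_sq / (n - 1))        # sample standard deviation
se = s_d / math.sqrt(n)                  # approximate standard error
alpha = 1 - confidence
t_crit = t_critical(1 - alpha / 2, n - 1)
me = t_crit * se                         # margin of error
lower, upper = d_bar - me, d_bar + me

print(f"s_d = {s_d:.3f}, SE = {se:.3f}, t* = {t_crit:.2f}, ME = {me:.1f}")
print(f"90% CI: {lower:.1f} to {upper:.1f}")
```

The rounded results agree with the hand calculation: a margin of error of about 1.3 and an interval of about -0.3 to 2.3.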