Mean Difference Between Matched Pairs
This lesson describes how to construct a
confidence interval to estimate
the mean difference between matched
data pairs.
Estimation Requirements
The approach described in this lesson is valid whenever the
following conditions are met:
- The sampling distribution of the mean difference between data pairs
(d̄) is approximately normally distributed.
Generally, the sampling distribution will be approximately
normally distributed if the sample is described by at least
one of the following statements.
- The sample size is greater than 40, without outliers.
The Variability of the Mean Difference Between Matched Pairs
Suppose d̄ is the mean difference between sample data pairs. To
construct a confidence interval for d̄, we need to know how to compute
the standard deviation or the standard error of the sampling
distribution for d̄.
Note: In real-world analyses, the standard deviation of the
population is seldom known. Therefore, the standard error is used
more often than the standard deviation.
Alert
The Advanced Placement Statistics
Examination only covers the "approximate" formulas for the standard
deviation and standard error.
σ_{d̄} = σ_{d} / sqrt( n )
SE_{d̄} = s_{d} / sqrt( n )
However, students are expected to be
aware of the limitations of these formulas; namely, the
approximate formulas should only be used when the population
size is much larger than the sample size.
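As a minimal sketch of the approximate standard error formula above, the Python snippet below computes s_{d} / sqrt( n ) from a list of paired differences. The function name and the five example differences are invented for illustration; they are not part of the lesson's data.

```python
import math
import statistics

def standard_error_of_mean_difference(differences):
    """Approximate standard error of the mean difference: s_d / sqrt(n)."""
    n = len(differences)
    s_d = statistics.stdev(differences)  # sample SD, divides by n - 1
    return s_d / math.sqrt(n)

# Hypothetical paired differences, invented for illustration only.
example_differences = [2, -1, 3, 0, 1]
se = standard_error_of_mean_difference(example_differences)
print(round(se, 3))  # 0.707
```

This sketch applies only the approximate formula, so it assumes the population is much larger than the sample.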
How to Find the Confidence Interval for Mean Difference With Paired Data
Previously, we described
how to construct confidence intervals. For convenience, we
repeat the key steps below.
- Identify a sample statistic. Use the mean difference between
sample data pairs (d̄)
to estimate the mean difference between population data
pairs (μ_{d}).
- Select a confidence level. The confidence level describes the
uncertainty of a sampling
method. Often, researchers choose 90%, 95%, or 99% confidence
levels; but any percentage can be used.
- Find the margin of error. Previously, we showed
how to compute the margin of error, based on the
critical value and standard deviation.
When the sample size is large, you can use a t statistic or a
z-score
for the critical value.
Since it does not require computing degrees of freedom, the
z-score is a little easier. When the sample
sizes are small (less than 40), use a
t score
for the critical value.
(For additional explanation, see
choosing between a t statistic and a z-score.)
If you use a t statistic, you will need to compute
degrees of freedom (DF). In this case, the degrees
of freedom is equal to the sample size minus one:
DF = n - 1.
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
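The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the 50 differences are made up, and the sample is deliberately large so that a z-score is an acceptable critical value. For a small sample, a t score with DF = n - 1 would be passed in as `critical_value` instead.

```python
import math
import statistics
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided z critical value for a confidence level such as 0.95."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

def paired_confidence_interval(differences, critical_value):
    """Return (lower, upper): mean difference -/+ critical value * SE."""
    n = len(differences)
    d_bar = statistics.mean(differences)                 # step 1: sample statistic
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    margin_of_error = critical_value * standard_error    # step 3: margin of error
    return d_bar - margin_of_error, d_bar + margin_of_error  # step 4: interval

# Hypothetical differences (n = 50), invented for illustration only.
differences = [1, 2, 0, -1, 3] * 10
confidence_level = 0.95                                  # step 2: confidence level
lower, upper = paired_confidence_interval(differences, z_critical(confidence_level))
print(round(lower, 3), round(upper, 3))
```

Passing the critical value in as an argument keeps the choice between a t score and a z-score outside the interval calculation.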
Test Your Understanding
Problem
Twenty-two students were randomly selected from a population of
1000 students. The sampling method was simple random sampling.
All of the students were given a standardized English test and
a standardized math test. Test results are summarized below.
Student | English | Math | Diff, d | (d - d̄)^{2}
1 | 95 | 90 | 5 | 16
2 | 89 | 85 | 4 | 9
3 | 76 | 73 | 3 | 4
4 | 92 | 90 | 2 | 1
5 | 91 | 90 | 1 | 0
6 | 53 | 53 | 0 | 1
7 | 67 | 68 | -1 | 4
8 | 88 | 90 | -2 | 9
9 | 75 | 78 | -3 | 16
10 | 85 | 89 | -4 | 25
11 | 90 | 95 | -5 | 36
12 | 85 | 83 | 2 | 1
13 | 87 | 83 | 4 | 9
14 | 85 | 83 | 2 | 1
15 | 85 | 82 | 3 | 4
16 | 68 | 65 | 3 | 4
17 | 81 | 79 | 2 | 1
18 | 84 | 83 | 1 | 0
19 | 71 | 60 | 11 | 100
20 | 46 | 47 | -1 | 4
21 | 75 | 77 | -2 | 9
22 | 80 | 83 | -3 | 16
Σ(d - d̄)^{2} = 270
d̄ = 1
Find the 90% confidence interval for the mean difference between
student scores on the math and English tests. Assume that the
mean differences are approximately normally distributed.
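The summary values in the table can be reproduced from the raw scores. The short Python check below is a sketch: the variable names are arbitrary, and each difference is taken as English minus Math, as in the table.

```python
# Scores for students 1 through 22, copied from the table.
english = [95, 89, 76, 92, 91, 53, 67, 88, 75, 85, 90,
           85, 87, 85, 85, 68, 81, 84, 71, 46, 75, 80]
math_scores = [90, 85, 73, 90, 90, 53, 68, 90, 78, 89, 95,
               83, 83, 83, 82, 65, 79, 83, 60, 47, 77, 83]

# Difference for each matched pair (English minus Math).
differences = [e - m for e, m in zip(english, math_scores)]

d_bar = sum(differences) / len(differences)          # mean difference
sum_sq = sum((d - d_bar) ** 2 for d in differences)  # sum of squared deviations
print(d_bar, sum_sq)  # 1.0 270.0
```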
Solution
The approach that we used to solve this
problem is valid when the following conditions are met.
- The
sampling distribution
should be approximately normally distributed. The problem statement
says that the differences were normally distributed; so this
condition is satisfied.
Since the above requirements are satisfied, we can use the following
four-step approach to construct a confidence interval.
- Identify a sample statistic. Since we are trying to estimate
a population mean difference in math and English test scores,
we use the sample mean difference
(d̄ = 1) as the sample
statistic.
- Select a confidence level. In this analysis, the confidence level
is defined for us in the problem. We are working with a 90%
confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Find standard deviation or standard error. Since we do not
know the standard deviation of the population, we cannot compute the
standard deviation of the sample mean; instead, we compute the standard
error (SE). Since the sample size is much smaller than the
population size, we can use the approximation equation for the
standard error.
s_{d} = sqrt[ Σ(d_{i} - d̄)^{2} / (n - 1) ]
s_{d} = sqrt[ 270 / (22 - 1) ]
s_{d} = sqrt(12.857) = 3.586
SE = s_{d} / sqrt( n )
SE = 3.586 / sqrt(22)
SE = 3.586 / 4.69 = 0.765
- Find critical value. The critical value is a factor used to
compute the margin of error. Because the sample size is small, we
express the critical value as a t score rather than a z-score.
(See how to choose between a t statistic and a z-score.)
With a 90% confidence level and degrees of freedom
DF = n - 1 = 22 - 1 = 21, the critical value is a t score of 1.72.
- Compute margin of error (ME):
ME = critical value * standard error
ME = 1.72 * 0.765 = 1.3
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
Therefore, the 90% confidence interval is -0.3 to 2.3 or 1 ± 1.3.
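The whole solution can be checked with a short Python sketch that uses only the standard library. The lesson reads the t score from a table or calculator; here, as an assumption for illustration, it is approximated numerically by integrating the t density with Simpson's rule and bisecting on the result. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: Simpson's rule on [0, x], plus 0.5 by symmetry."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence_level, df):
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence_level) / 2
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# Differences (English minus Math) for the 22 students in the problem.
differences = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5,
               2, 4, 2, 3, 3, 2, 1, 11, -1, -2, -3]

n = len(differences)
d_bar = statistics.mean(differences)     # sample mean difference
s_d = statistics.stdev(differences)      # about 3.586
se = s_d / math.sqrt(n)                  # about 0.765
t_star = t_critical(0.90, n - 1)         # about 1.72 with DF = 21
me = t_star * se                         # about 1.3
print(round(d_bar - me, 1), round(d_bar + me, 1))  # -0.3 2.3
```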