One-Factor Repeated Measures: Example
This lesson shows how to use analysis of variance to analyze and interpret data from a one-factor, repeated measures experiment.
To illustrate the process, we walk step-by-step through a real-world example.
Computations for analysis of variance are usually handled by a software package.
For this example, however, we will do the computations "manually", since the gory details have educational value.
Note: A one-factor, repeated measures experiment is a type of
randomized block experiment.
Specifically, it is a randomized block experiment in which each experimental unit serves as a
blocking variable.
As a result, the computations required for analysis of variance with a one-factor, repeated measures experiment are identical
to the computations for a one-factor, randomized block experiment (see
Randomized Block Experiment: Example).
Problem Statement
As part of a repeated measures experiment, a researcher tests the effect of three treatments on
short-term cognitive performance. Each treatment is administered in pill form. The first treatment (T1)
is a placebo; the second treatment (T2) is an herbal relaxant; and the third treatment (T3) is an herbal
stimulant. The researcher randomly selects six subjects to participate in the experiment.
Using human subjects as experimental units, the researcher conducts this experiment over a three-day period.
Each day, each subject receives a different treatment.
After each treatment, subjects complete a memory test. Test scores for each subject
following each treatment are shown in the table below:
Table 1. Dependent Variable Scores (Test Score by Treatment)

Subject | T1 | T2 | T3
S1      | 87 | 85 | 87
S2      | 84 | 84 | 85
S3      | 83 | 84 | 84
S4      | 82 | 82 | 83
S5      | 81 | 82 | 83
S6      | 80 | 80 | 82
In conducting this experiment, the researcher has one main research question:
Does the treatment have a significant effect on cognitive performance (as measured by test score)?
What About Order Effects?
Repeated measures experiments have a potential problem: vulnerability to order effects (e.g., fatigue, learning)
that can affect subject performance. To control for order effects, researchers vary the order
in which treatment levels are administered (e.g., randomizing or reversing the order of treatments among experimental units).
With the present experiment, for example, there are six possible sequences in which treatments can be administered:
- T1, T2, T3
- T1, T3, T2
- T2, T1, T3
- T2, T3, T1
- T3, T1, T2
- T3, T2, T1
Since there are also six subjects, it would make sense in this experiment to randomly assign a different treatment
sequence to each subject. By balancing treatment sequences across subjects, you can control order effects.
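For example, here is a minimal Python sketch (illustrative only, using the subject and treatment labels from this example) of how a researcher might randomly assign a different treatment sequence to each subject:

```python
import random

# The six possible orders in which the three treatments can be administered
sequences = [
    ("T1", "T2", "T3"), ("T1", "T3", "T2"),
    ("T2", "T1", "T3"), ("T2", "T3", "T1"),
    ("T3", "T1", "T2"), ("T3", "T2", "T1"),
]
subjects = ["S1", "S2", "S3", "S4", "S5", "S6"]

# Shuffle the sequences, then pair each subject with a different one,
# so that every possible treatment order is used exactly once.
random.shuffle(sequences)
for subject, sequence in zip(subjects, sequences):
    print(subject, "receives treatments in order:", sequence)
```

Because each of the six orders is used exactly once, any order effect is spread evenly across treatments rather than confounded with a particular treatment.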
Analytical Logic
To implement analysis of variance with a repeated measures experiment,
a researcher takes the following steps:
- Specify a mathematical model to describe how treatment effects and subject effects influence the dependent variable.
- Write statistical hypotheses to be tested by experimental data.
- Specify a significance level for a hypothesis test.
- Compute the grand mean and marginal means for the treatment and for subjects.
- Compute sums of squares for each effect in the model.
- Find the degrees of freedom associated with each effect in the model.
- Based on sums of squares and degrees of freedom, compute mean squares for each effect in the model.
- Find the expected value of the mean squares for each effect in the model.
- Compute a test statistic for the treatment effect and a test statistic for the subject effect,
based on observed mean squares and their expected values.
- Find the P value for each test statistic.
- Accept or reject null hypotheses, based on P value and significance level.
Below, we'll explain how to implement each step in the analysis.
Mathematical Model
For every experimental design, there is a mathematical model that accounts for all of the
independent and extraneous variables that affect the dependent variable.
Here is a mathematical model for a single-factor, repeated measures experiment:
\( X_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij} \)

where \( X_{ij} \) is the dependent variable score (in this example, the test score) for subject i under treatment j;
\( \mu \) is the population mean;
\( \pi_i \) is the effect of subject i;
\( \tau_j \) is the effect of treatment j;
and \( \varepsilon_{ij} \) is the experimental error (i.e., the effect of all other extraneous variables).
For this model, it is assumed that \( \varepsilon_{ij} \) is normally and independently
distributed with a mean of zero and a variance of \( \sigma^2_\varepsilon \).
The mean \( \mu \) is constant.
Statistical Hypotheses
With a single-factor, repeated measures experiment, it is possible to test both subject ( \( \pi_i \) ) and treatment ( \( \tau_j \) ) effects.
Here are the null hypotheses (H0) and alternative hypotheses (H1) for each effect.

Subject effect:
H0: \( \pi_i = 0 \) for all i
H1: \( \pi_i \neq 0 \) for some i

Treatment effect:
H0: \( \tau_j = 0 \) for all j
H1: \( \tau_j \neq 0 \) for some j
With a repeated measures experiment, the main hypothesis test of interest is the test of the treatment effect(s). For instance,
in this example the experimenter is primarily interested in the effect of a pill (placebo, relaxant, or stimulant) on cognitive performance (i.e., test score).
Subject effects are less interesting, since we expect subjects to bring individual differences to the experiment.
We would be surprised if we didn't find significant differences between subjects.
Significance Level
The significance level (also known as alpha or α) is the probability of rejecting the null hypothesis when it
is actually true. The significance level for an experiment is specified by the experimenter, before data collection
begins. Experimenters often choose significance levels of 0.05 or 0.01. For this experiment, we'll assume that the
experimenter chose 0.05 as the significance level.
A significance level of 0.05 means that there is a 5% chance of rejecting the null hypothesis
when it is true. A significance level of 0.01 means that there is a 1% chance of rejecting the null hypothesis
when it is true. The lower the significance level, the more persuasive the evidence needs to be
before an experimenter can reject the null hypothesis.
Mean Scores
Analysis of variance for a repeated measures experiment begins by computing a grand mean and
marginal means for the treatment and for subjects.
Here are computations for the various means, based on dependent variable scores from Table 1:
- Grand mean. The grand mean ( \( \bar{X} \) ) is the mean of all observations, computed as follows:

  \( N = nk = 6 \times 3 = 18 \)

  \( \bar{X} = \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{k} X_{ij} = \frac{1}{18} \sum_{i=1}^{6} \sum_{j=1}^{3} X_{ij} = 83.222 \)
- Marginal means for treatment levels. The mean for treatment level j ( \( \bar{X}_{.j} \) ) is computed as follows:

  \( \bar{X}_{.j} = \frac{1}{n} \sum_{i=1}^{n} X_{ij} \)

  \( \bar{X}_{.1} = \frac{1}{6} \sum_{i=1}^{6} X_{i1} = 82.833 \)

  \( \bar{X}_{.2} = \frac{1}{6} \sum_{i=1}^{6} X_{i2} = 82.833 \)

  \( \bar{X}_{.3} = \frac{1}{6} \sum_{i=1}^{6} X_{i3} = 84.000 \)
- Marginal means for subjects. The mean for subject i ( \( \bar{X}_{i.} \) ) is computed as follows:

  \( \bar{X}_{i.} = \frac{1}{k} \sum_{j=1}^{k} X_{ij} \)

  \( \bar{X}_{1.} = \frac{1}{3} \sum_{j=1}^{3} X_{1j} = 86.333 \)

  \( \bar{X}_{2.} = \frac{1}{3} \sum_{j=1}^{3} X_{2j} = 84.333 \)

  \( \bar{X}_{3.} = \frac{1}{3} \sum_{j=1}^{3} X_{3j} = 83.667 \)

  \( \bar{X}_{4.} = \frac{1}{3} \sum_{j=1}^{3} X_{4j} = 82.333 \)

  \( \bar{X}_{5.} = \frac{1}{3} \sum_{j=1}^{3} X_{5j} = 82.000 \)

  \( \bar{X}_{6.} = \frac{1}{3} \sum_{j=1}^{3} X_{6j} = 80.667 \)
In the equations above, N is the total sample size (18);
n is the number of subjects (6), and
k is the number of treatment levels (3).
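As a sanity check, here is a short Python sketch (illustrative, not part of the original lesson) that reproduces these means from the Table 1 scores:

```python
# Test scores from Table 1: rows are subjects S1-S6, columns are T1-T3
scores = [
    [87, 85, 87],
    [84, 84, 85],
    [83, 84, 84],
    [82, 82, 83],
    [81, 82, 83],
    [80, 80, 82],
]
n = len(scores)       # number of subjects (6)
k = len(scores[0])    # number of treatment levels (3)
N = n * k             # total sample size (18)

grand_mean = sum(sum(row) for row in scores) / N
treatment_means = [sum(row[j] for row in scores) / n for j in range(k)]
subject_means = [sum(row) / k for row in scores]

print(round(grand_mean, 3))                    # 83.222
print([round(m, 3) for m in treatment_means])  # [82.833, 82.833, 84.0]
print([round(m, 3) for m in subject_means])    # [86.333, 84.333, 83.667, 82.333, 82.0, 80.667]
```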
Sums of Squares
A sum of squares is the sum of squared deviations from a mean score. The single-factor, repeated measures design in this experiment makes use of four sums of squares:
- Sum of squares for treatments. The sum of squares for treatments (SSTR) measures variation of the marginal means of treatment levels ( \( \bar{X}_{.j} \) ) around the grand mean ( \( \bar{X} \) ). It can be computed from the following formula:

  \( SSTR = n \sum_{j=1}^{k} ( \bar{X}_{.j} - \bar{X} )^2 \)

  \( SSTR = 6 \sum_{j=1}^{3} ( \bar{X}_{.j} - 83.222 )^2 = 5.444 \)
- Sum of squares for subjects. The sum of squares for subjects (SSS) measures variation of the marginal means of subjects ( \( \bar{X}_{i.} \) ) around the grand mean ( \( \bar{X} \) ). It can be computed from the following formula:

  \( SSS = k \sum_{i=1}^{n} ( \bar{X}_{i.} - \bar{X} )^2 \)

  \( SSS = 3 \sum_{i=1}^{6} ( \bar{X}_{i.} - 83.222 )^2 = 59.778 \)
- Error sum of squares. The error sum of squares (SSE) measures variation of all scores ( \( X_{ij} \) ) attributable to extraneous variables. It can be computed from the following formula:

  \( SSE = \sum_{i=1}^{n} \sum_{j=1}^{k} ( X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X} )^2 \)

  \( SSE = \sum_{i=1}^{6} \sum_{j=1}^{3} ( X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + 83.222 )^2 = 3.889 \)
- Total sum of squares. The total sum of squares (SST) measures variation of all scores ( \( X_{ij} \) ) around the grand mean ( \( \bar{X} \) ). It can be computed from the following formula:

  \( SST = \sum_{i=1}^{n} \sum_{j=1}^{k} ( X_{ij} - \bar{X} )^2 \)

  \( SST = \sum_{i=1}^{6} \sum_{j=1}^{3} ( X_{ij} - 83.222 )^2 = 69.111 \)
In the formulas above, n is the number of subjects, and k is the number of treatment levels.
And the total sum of squares is equal to the sum of the component sums of squares, as shown below:
SST = SSTR + SSS + SSE
SST = 5.444 + 59.778 + 3.889 = 69.111
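Here is a brief Python sketch (again illustrative) that reproduces these sums of squares from the Table 1 scores and confirms the additive relationship:

```python
# Test scores from Table 1: rows are subjects, columns are treatments
scores = [
    [87, 85, 87], [84, 84, 85], [83, 84, 84],
    [82, 82, 83], [81, 82, 83], [80, 80, 82],
]
n, k = len(scores), len(scores[0])
grand = sum(sum(row) for row in scores) / (n * k)
t_means = [sum(row[j] for row in scores) / n for j in range(k)]
s_means = [sum(row) / k for row in scores]

# Sums of squares, following the formulas above
sstr = n * sum((tm - grand) ** 2 for tm in t_means)
sss = k * sum((sm - grand) ** 2 for sm in s_means)
sse = sum(
    (scores[i][j] - s_means[i] - t_means[j] + grand) ** 2
    for i in range(n) for j in range(k)
)
sst = sum((x - grand) ** 2 for row in scores for x in row)

print(round(sstr, 3), round(sss, 3), round(sse, 3), round(sst, 3))
# 5.444 59.778 3.889 69.111 -- and SST = SSTR + SSS + SSE
```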
Degrees of Freedom
The term degrees of freedom (df) refers to the number of independent sample points used to compute a
statistic minus the number of
parameters estimated from the sample points.
The degrees of freedom used to compute the various sums of squares for a single-factor, repeated measures experiment
are shown in the table below:
Sum of squares | Degrees of freedom
Treatment      | k - 1 = 2
Subject        | n - 1 = 5
Error          | (k - 1)(n - 1) = 10
Total          | nk - 1 = 17
Notice that there is an additive relationship between the various degrees of freedom. The degrees of freedom
for the total sum of squares (dfTOT) is equal to the degrees of freedom for the treatment sum of squares (dfTR) plus
the degrees of freedom for the subject sum of squares (dfS) plus
the degrees of freedom for the error sum of squares (dfE). That is,
dfTOT = dfTR + dfS + dfE
dfTOT = 2 + 5 + 10 = 17
Mean Squares
A mean square is an estimate of population variance. It is computed by dividing
a sum of squares (SS) by its corresponding degrees of freedom (df), as shown below:
MS = SS / df
To conduct analysis of variance with a single-factor, repeated measures experiment, we are interested in three mean squares:

- Mean square for treatments: MST = SSTR / dfTR = 5.444 / 2 = 2.722
- Mean square for subjects: MSS = SSS / dfS = 59.778 / 5 = 11.956
- Error mean square: MSE = SSE / dfE = 3.889 / 10 = 0.389
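A quick Python check of this arithmetic (a sketch using the sums of squares and degrees of freedom computed above):

```python
# Sums of squares and design constants from the steps above
sstr, sss, sse = 5.444, 59.778, 3.889
k, n = 3, 6  # treatment levels, subjects

mst = sstr / (k - 1)             # 5.444 / 2  = 2.722
mss = sss / (n - 1)              # 59.778 / 5 = 11.956
mse = sse / ((k - 1) * (n - 1))  # 3.889 / 10 = 0.389
print(round(mst, 3), round(mss, 3), round(mse, 3))
```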
Expected Value
The expected value
of a mean square is the average value of the mean square over a large number of experiments.
Statisticians have derived formulas for the expected value of mean squares, assuming the
mathematical model described earlier is correct. Those formulas
appear below:
Mean square | Expected value
MST         | \( \sigma^2_E + n\sigma^2_T \)
MSS         | \( \sigma^2_E + k\sigma^2_S \)
MSE         | \( \sigma^2_E \)
In the table above, MST is the mean square for treatments; MSS is the mean square for subjects;
and MSE is the error mean square.
Test Statistics
The main data analysis goal for this experiment is to test the hypotheses that we stated earlier
(see Statistical Hypotheses).
That will require the use of test statistics. Let's talk about how to compute test
statistics for this study and how to interpret the statistics we compute.
How to Compute Test Statistics
Suppose we want to test the significance of treatment levels or of subjects in a
single-factor, repeated measures experiment. We can use the mean squares to define a test statistic F
for each source of variation, as shown in the table below:
Source        | Mean square: Expected value      | F ratio
Treatment (T) | \( \sigma^2_E + n\sigma^2_T \)   | FT = MST / MSE
Subjects (S)  | \( \sigma^2_E + k\sigma^2_S \)   | FS = MSS / MSE
Error         | \( \sigma^2_E \)                 |
Using formulas from the table with data from this repeated measures experiment,
we can compute an F ratio for treatments ( FT ) and an F ratio for subjects ( FS ).
FT = MST / MSE = 2.722/0.389 = 7.0
FS = MSS / MSE = 11.956/0.389 = 30.7
How to Interpret Test Statistics
Consider the F ratio for the treatment effect in this repeated measures experiment. For convenience,
we display once again the table that shows expected mean squares and F ratio formulas:
Source        | Mean square: Expected value      | F ratio
Treatment (T) | \( \sigma^2_E + n\sigma^2_T \)   | FT = MST / MSE
Subjects (S)  | \( \sigma^2_E + k\sigma^2_S \)   | FS = MSS / MSE
Error         | \( \sigma^2_E \)                 |
Notice that the numerator of the F ratio for the treatment effect should equal the denominator
when the variation due to the treatment ( \( \sigma^2_T \) ) is zero (i.e., when the treatment does not affect the
dependent variable). And the numerator should be bigger than the denominator
when the variation due to the treatment is not zero (i.e., when the treatment does affect the
dependent variable).
The F ratio for subjects works the same way. When subject differences do not affect the dependent variable,
the numerator of the F ratio should equal the denominator. Otherwise, the numerator should be bigger than the denominator.
Each F ratio is a convenient measure that we can use to test the null hypothesis about
the effect of a source (the treatment or the subjects) on the dependent variable. Here's how
to conduct the test:
- When the F ratio is close to one, the numerator of the F ratio is approximately equal to the denominator.
This indicates that the source did not affect the dependent variable, so we cannot
reject the null hypothesis.
- When the F ratio is significantly greater than one, the numerator is bigger than the denominator.
This indicates that the source did affect the dependent variable, so we must reject the null hypothesis.
What does it mean for the F ratio to be significantly greater than one?
To answer that question, we need to talk about the P-value.
P-Value
In an experiment, a P-value is the probability of obtaining a result more extreme than the observed experimental outcome,
assuming the null hypothesis is true.
With analysis of variance for a repeated measures experiment, the F ratios are the observed experimental outcomes that we are interested in.
So, the P-value would be the probability that an F ratio would be more extreme (i.e., bigger) than the
actual F ratio computed from experimental data.
How does an experimenter attach a probability to an observed F ratio?
Luckily, the F ratio is a random variable
that has an F distribution.
The degrees of freedom (v1 and v2) for the F ratio are the degrees of freedom associated with the mean squares
used to compute the F ratio.
For example, consider the F ratio for a treatment effect. That F ratio ( FT ) is computed from
the following formula:
FT = F(v1, v2) = MST / MSE
MST (the numerator in the formula) has degrees of freedom equal to dfTR ; so for FT , v1 is equal to
dfTR . Similarly, MSE (the denominator in the formula) has degrees of freedom equal to dfE ; so
for FT , v2 is equal to dfE .
Knowing the F ratio and its degrees of freedom, we can use an F table or
Stat Trek's free F distribution calculator to find the probability that
an F ratio will be bigger than the actual F ratio observed in the experiment.
To illustrate the process, let's find P-values for the treatment effect and for the subject effect in this
repeated measures experiment.
Treatment Variable P-Value
From previous computations, we know the following:

- The F ratio for the treatment effect is FT = 7.0.
- MST, the numerator of FT, has 2 degrees of freedom (v1 = 2).
- MSE, the denominator of FT, has 10 degrees of freedom (v2 = 10).

Therefore, the P-value we are looking for is the probability that an F with 2 and 10 degrees of freedom is greater than
7. We want to know:
P [ F(2, 10) > 7 ]
Now, we are ready to use the F Distribution Calculator.
We enter the degrees of freedom (v1 = 2) for the treatment mean square,
the degrees of freedom (v2 = 10) for the error mean square, and the F value (7) into the calculator;
and hit the Calculate button.
The calculator reports that the probability that F is less than or equal to 7 is 0.99. Therefore, the probability that
F is greater than 7 equals 1 minus 0.99 or 0.01. Hence, the correct P-value for the treatment variable is 0.01.
P-Value for Subjects
The process to compute the P-value for subjects is exactly the same as the process used
for the treatment variable. From previous computations, we know the following:

- The F ratio for the subject effect is FS = 30.7.
- MSS, the numerator of FS, has 5 degrees of freedom (v1 = 5).
- MSE, the denominator of FS, has 10 degrees of freedom (v2 = 10).

Therefore, the P-value we are looking for is the probability that an F with 5 and 10 degrees of freedom is greater than
30.7. We want to know:
P [ F(5, 10) > 30.7 ]
Now, we are ready to use the F Distribution Calculator.
We enter the degrees of freedom (v1 = 5) for the subjects mean square,
the degrees of freedom (v2 = 10) for the error mean square, and the F value (30.7) into the calculator;
and hit the Calculate button.
The calculator reports that the probability that F is less than or equal to 30.7 is 0.99999. Therefore, the probability that
F is greater than 30.7 equals 1 minus 0.99999 or 0.00001. Hence, the correct P-value is 0.00001.
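If you prefer software to an F table or calculator, the same P-values can be reproduced with SciPy's F distribution (a sketch assuming the scipy package is installed):

```python
from scipy.stats import f

# The survival function sf(x, v1, v2) returns P[ F(v1, v2) > x ]
p_treatment = f.sf(7.0, 2, 10)   # about 0.0126, which rounds to the 0.01 above
p_subjects = f.sf(30.7, 5, 10)   # about 0.00001, as above
print(p_treatment, p_subjects)
```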
Interpretation of Results
Having completed the computations for analysis, we are ready to interpret results. We begin by displaying key findings in
an ANOVA summary table. Then, we use those findings to test the hypothesis that there is no significant difference
between treatment levels.
ANOVA Summary Table
It is traditional to summarize ANOVA results in an analysis of variance table. Here,
filled with key results, is the analysis of variance table for the repeated measures experiment
that we have been working on.
Analysis of Variance Table

Source    | SS    | df | MS    | F    | P
Treatment | 5.44  | 2  | 2.72  | 7.0  | 0.01
Subjects  | 59.78 | 5  | 11.96 | 30.7 | <0.01
Error     | 3.89  | 10 | 0.39  |      |
Total     | 69.11 | 17 |       |      |
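As a final cross-check, the treatment row of this table can be reproduced directly with a repeated measures ANOVA routine. This is a sketch assuming the pandas and statsmodels packages are available; note that AnovaRM reports only the within-subjects treatment test, not the subject effect:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Table 1 scores, reshaped to long format:
# one row per (subject, treatment) observation
scores = {
    "S1": [87, 85, 87], "S2": [84, 84, 85], "S3": [83, 84, 84],
    "S4": [82, 82, 83], "S5": [81, 82, 83], "S6": [80, 80, 82],
}
rows = [
    {"subject": s, "treatment": "T%d" % (j + 1), "score": x}
    for s, values in scores.items()
    for j, x in enumerate(values)
]
data = pd.DataFrame(rows)

result = AnovaRM(data, depvar="score", subject="subject", within=["treatment"]).fit()
print(result)  # Treatment: F(2, 10) = 7.0, p = 0.01 (approximately 0.012 unrounded)
```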
Recall that the experimenter specified a significance level of 0.05 for this study.
Once you know the significance level and the P-values, the hypothesis tests are routine.
The P-value (shown in the last column of the ANOVA table) is the probability that an F statistic would be more extreme (bigger) than the
F ratio shown in the table, assuming the null hypothesis is true. When a P-value for the treatment effect or the subject effect is bigger
than the significance level, we accept the null hypothesis for the effect; when it is smaller, we reject the null hypothesis.
Based on the P-values in the table above, we can draw the following conclusions:
- The P-value for treatments (i.e., the independent variable) is 0.01. Since the P-value is smaller than the significance level (0.05),
we reject the null hypothesis that the independent variable (the type of pill given to the subject) has no effect on the dependent variable.
- The P-value for subjects is less than 0.01. Since this P-value is also smaller than the significance level (0.05),
we reject the null hypothesis that subjects had no effect on the dependent variable.
What About Sphericity?
In the
previous lesson,
we noted the importance of the sphericity assumption for analysis of variance with
repeated measures experiments. Specifically, we noted that violations of the
sphericity assumption
increase the likelihood of making a Type I error (i.e., rejecting the null hypothesis when it is, in fact, true).
Whenever a standard analysis of variance for a repeated measures experiment leads you to reject the
null hypothesis, it is important to consider the role of sphericity in your analysis. You need to
(a) test for a violation of the sphericity assumption and/or (b) adjust the analysis to account
for sphericity effects.
You may have noticed that we rejected the null hypothesis in the sample problem for this lesson,
but we failed to address sphericity in any way. So our work on the sample problem is not finished.
We'll learn more about sphericity, and we'll complete the sample problem in the
next lesson.