Stat Trek

Teach yourself statistics


One-Factor Repeated Measures ANOVA: Example

This lesson shows how to use analysis of variance to analyze and interpret data from a one-factor, repeated measures experiment. To illustrate the process, we walk step-by-step through a real-world example.

Computations for analysis of variance are usually handled by a software package. For this example, however, we will do the computations "manually", since the gory details have educational value.

Note: A one-factor, repeated measures experiment is a type of randomized block experiment. Specifically, it is a randomized block experiment in which each experimental unit (subject) serves as a block. As a result, the computations required for analysis of variance with a one-factor, repeated measures experiment are identical to the computations for a one-factor, randomized block experiment (see Randomized Block Experiment: Example).

Problem Statement

As part of a repeated measures experiment, a researcher tests the effect of three treatments on short-term cognitive performance. Each treatment is administered in pill form. The first treatment (T1) is a placebo; the second treatment (T2) is an herbal relaxant; and the third treatment (T3) is an herbal stimulant. The researcher randomly selects six subjects to participate in the experiment.

Using human subjects as experimental units, the researcher conducts this experiment over a three-day period. Each day, each subject receives a different treatment. After each treatment, subjects complete a memory test. Test scores for each subject following each treatment are shown in the table below:

Table 1. Dependent Variable Scores

Subject    Test score
           T1    T2    T3
S1         87    85    87
S2         84    84    85
S3         83    84    84
S4         82    82    83
S5         81    82    83
S6         80    80    82

In conducting this experiment, the researcher has one main research question: Does the treatment have a significant effect on cognitive performance (as measured by test score)?

What About Order Effects?

Repeated measures experiments have a potential problem: vulnerability to order effects (e.g., fatigue, learning) that can affect subject performance. To control for order effects, researchers vary the order in which treatment levels are administered (e.g., randomizing or reversing the order of treatments among experimental units).

With the present experiment, for example, there are six possible sequences in which treatments can be administered:

(T1, T2, T3)    (T1, T3, T2)    (T2, T1, T3)
(T2, T3, T1)    (T3, T2, T1)    (T3, T1, T2)

Since there are also six subjects, it would make sense in this experiment to randomly assign a different treatment sequence to each subject. By balancing treatment sequences across subjects, you can control order effects.
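The balancing idea can be sketched in a few lines of Python. This is a minimal illustration, assuming exactly one subject per treatment sequence as in this example; the helper `assign_sequences` and the fixed seed are hypothetical choices for reproducibility, not part of the original lesson.

```python
import itertools
import random

treatments = ["T1", "T2", "T3"]
subjects = ["S1", "S2", "S3", "S4", "S5", "S6"]

# All 3! = 6 possible orders in which the treatments can be administered.
sequences = list(itertools.permutations(treatments))


def assign_sequences(subjects, sequences, seed=None):
    """Randomly give each subject a distinct treatment sequence.

    Hypothetical helper; assumes one subject per sequence.
    """
    if len(subjects) != len(sequences):
        raise ValueError("expected one subject per treatment sequence")
    rng = random.Random(seed)
    shuffled = sequences[:]  # copy so the master list keeps its order
    rng.shuffle(shuffled)
    return dict(zip(subjects, shuffled))


assignment = assign_sequences(subjects, sequences, seed=1)
for subject, order in assignment.items():
    print(subject, ", ".join(order))
```

Shuffling the list of sequences (rather than drawing each subject's order independently) guarantees that every sequence is used exactly once.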

Note: The sample size for a perfectly balanced repeated measures experiment should be a multiple of the number of possible treatment sequences. Thus,

ntotal = n * r

where ntotal is the total sample size for the experiment, n is the number of subjects that see a particular treatment sequence, and r is the number of treatment sequences.

Analytical Logic

To implement analysis of variance with a repeated measures experiment, a researcher takes the following steps:

  • Specify a mathematical model to describe how treatment effects and subject effects influence the dependent variable.
  • Write statistical hypotheses to be tested by experimental data.
  • Specify a significance level for a hypothesis test.
  • Compute the grand mean and marginal means for the treatment and for subjects.
  • Compute sums of squares for each effect in the model.
  • Find the degrees of freedom associated with each effect in the model.
  • Based on sums of squares and degrees of freedom, compute mean squares for each effect in the model.
  • Find the expected value of the mean squares for each effect in the model.
  • Compute a test statistic for the treatment effect and a test statistic for the subject effect, based on observed mean squares and their expected values.
  • Find the P value for each test statistic.
  • Accept or reject null hypotheses, based on P value and significance level.

Below, we'll explain how to implement each step in the analysis.

Mathematical Model

For every experimental design, there is a mathematical model that accounts for all of the independent and extraneous variables that affect the dependent variable. Here is a mathematical model for a single-factor, repeated measures experiment:

$$X_{ij} = \mu + \pi_i + \tau_j + \varepsilon_{ij}$$

where $X_{ij}$ is the dependent variable score (in this example, the test score) for subject $i$ under treatment $j$; $\mu$ is the population mean; $\pi_i$ is the effect of subject $i$; $\tau_j$ is the effect of treatment $j$; and $\varepsilon_{ij}$ is the experimental error (i.e., the effect of all other extraneous variables).

For this model, it is assumed that $\varepsilon_{ij}$ is normally and independently distributed with a mean of zero and a variance of $\sigma_E^2$. The mean ($\mu$) is constant.
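To make the model concrete, here is a minimal Python sketch that generates one table of scores from it. The parameter values below (the population mean, the subject effects, the treatment effects, and the error standard deviation) are hypothetical numbers chosen only for illustration; they are not estimates from Table 1.

```python
import random


def simulate_scores(mu, subject_effects, treatment_effects, sigma_e, seed=None):
    """Draw one data table from X_ij = mu + pi_i + tau_j + e_ij.

    Each error e_ij is drawn independently from Normal(0, sigma_e**2).
    Returns one row per subject and one column per treatment.
    """
    rng = random.Random(seed)
    return [
        [mu + pi_i + tau_j + rng.gauss(0.0, sigma_e) for tau_j in treatment_effects]
        for pi_i in subject_effects
    ]


# Hypothetical parameter values (effects sum to zero within each factor).
table = simulate_scores(
    mu=83.0,
    subject_effects=[3.0, 1.0, 0.5, -1.0, -1.5, -2.0],
    treatment_effects=[-0.5, -0.5, 1.0],
    sigma_e=0.6,
    seed=42,
)
for row in table:
    print(["%.1f" % x for x in row])
```

Setting `sigma_e` to zero removes the error term, so each cell is then exactly $\mu + \pi_i + \tau_j$.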

Statistical Hypotheses

With a single-factor, repeated measures experiment, it is possible to test both subject ($\pi_i$) and treatment ($\tau_j$) effects. Here are the null hypotheses ($H_0$) and alternative hypotheses ($H_1$) for each effect.

$H_0$: $\pi_i = 0$ for all $i$

$H_1$: $\pi_i \neq 0$ for some $i$

$H_0$: $\tau_j = 0$ for all $j$

$H_1$: $\tau_j \neq 0$ for some $j$

With a repeated measures experiment, the main hypothesis test of interest is the test of the treatment effect(s). For instance, in this example the experimenter is primarily interested in the effect of a pill (placebo, relaxant, or stimulant) on cognitive performance (i.e., test score).

Subject effects are less interesting, since we expect subjects to bring individual differences to the experiment. We would be surprised if we didn't find significant differences between subjects.

Significance Level

The significance level (also known as alpha or α) is the probability of rejecting the null hypothesis when it is actually true. The significance level for an experiment is specified by the experimenter, before data collection begins. Experimenters often choose significance levels of 0.05 or 0.01. For this experiment, we'll assume that the experimenter chose 0.05 as the significance level.

A significance level of 0.05 means that there is a 5% chance of rejecting the null hypothesis when it is true. A significance level of 0.01 means that there is a 1% chance of rejecting the null hypothesis when it is true. The lower the significance level, the more persuasive the evidence needs to be before an experimenter can reject the null hypothesis.

Mean Scores

Analysis of variance for a repeated measures experiment begins by computing a grand mean and marginal means for the treatment and for subjects. Here are computations for the various means, based on dependent variable scores from Table 1:

  • Grand mean. The grand mean ($\bar{X}$) is the mean of all observations, computed as follows:
    $$N = nk = 6 \times 3 = 18$$
    $$\bar{X} = \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{k} X_{ij} = \frac{1}{18} \sum_{i=1}^{6} \sum_{j=1}^{3} X_{ij} = 83.222$$
  • Marginal means for treatment levels. The mean for treatment level $j$ ($\bar{X}_{.j}$) is computed as follows:
    $$\bar{X}_{.j} = \frac{1}{n} \sum_{i=1}^{n} X_{ij}$$
    $$\bar{X}_{.1} = \frac{1}{6} \sum_{i=1}^{6} X_{i1} = 82.833$$
    $$\bar{X}_{.2} = \frac{1}{6} \sum_{i=1}^{6} X_{i2} = 82.833$$
    $$\bar{X}_{.3} = \frac{1}{6} \sum_{i=1}^{6} X_{i3} = 84.000$$
  • Marginal means for subjects. The mean for subject $i$ ($\bar{X}_{i.}$) is computed as follows:
    $$\bar{X}_{i.} = \frac{1}{k} \sum_{j=1}^{k} X_{ij}$$
    $$\bar{X}_{1.} = \frac{1}{3} \sum_{j=1}^{3} X_{1j} = 86.333$$
    $$\bar{X}_{2.} = \frac{1}{3} \sum_{j=1}^{3} X_{2j} = 84.333$$
    $$\bar{X}_{3.} = \frac{1}{3} \sum_{j=1}^{3} X_{3j} = 83.667$$
    $$\bar{X}_{4.} = \frac{1}{3} \sum_{j=1}^{3} X_{4j} = 82.333$$
    $$\bar{X}_{5.} = \frac{1}{3} \sum_{j=1}^{3} X_{5j} = 82.000$$
    $$\bar{X}_{6.} = \frac{1}{3} \sum_{j=1}^{3} X_{6j} = 80.667$$

In the equations above, N is the total sample size (18); n is the number of subjects (6), and k is the number of treatment levels (3).
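As a check on the hand calculations, the same means can be computed directly from Table 1. This is a minimal plain-Python sketch; the variable names are illustrative, and the table is stored with one row per subject and one column per treatment.

```python
# Test scores from Table 1: one row per subject (S1..S6), columns T1, T2, T3.
scores = [
    [87, 85, 87],
    [84, 84, 85],
    [83, 84, 84],
    [82, 82, 83],
    [81, 82, 83],
    [80, 80, 82],
]

n = len(scores)      # number of subjects
k = len(scores[0])   # number of treatment levels
N = n * k            # total number of observations

grand_mean = sum(sum(row) for row in scores) / N
treatment_means = [sum(row[j] for row in scores) / n for j in range(k)]
subject_means = [sum(row) / k for row in scores]

print(round(grand_mean, 3))                    # 83.222
print([round(m, 3) for m in treatment_means])  # [82.833, 82.833, 84.0]
print([round(m, 3) for m in subject_means])    # [86.333, 84.333, 83.667, 82.333, 82.0, 80.667]
```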

Sums of Squares

A sum of squares is the sum of squared deviations from a mean score. The single-factor repeated measures design in this experiment makes use of four sums of squares:

  • Sum of squares for treatments. The sum of squares for treatments (SSTR) measures variation of the marginal means of treatment levels ($\bar{X}_{.j}$) around the grand mean ($\bar{X}$). It can be computed from the following formula:
    $$\text{SSTR} = n \sum_{j=1}^{k} (\bar{X}_{.j} - \bar{X})^2$$
    $$\text{SSTR} = 6 \sum_{j=1}^{3} (\bar{X}_{.j} - 83.222)^2 = 5.444$$
  • Sum of squares for subjects. The sum of squares for subjects (SSS) measures variation of the marginal means of subjects ($\bar{X}_{i.}$) around the grand mean ($\bar{X}$). It can be computed from the following formula:
    $$\text{SSS} = k \sum_{i=1}^{n} (\bar{X}_{i.} - \bar{X})^2$$
    $$\text{SSS} = 3 \sum_{i=1}^{6} (\bar{X}_{i.} - 83.222)^2 = 59.778$$
  • Error sum of squares. The error sum of squares (SSE) measures variation of all scores ($X_{ij}$) attributable to extraneous variables. It can be computed from the following formula:
    $$\text{SSE} = \sum_{i=1}^{n} \sum_{j=1}^{k} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2$$
    $$\text{SSE} = \sum_{i=1}^{6} \sum_{j=1}^{3} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + 83.222)^2 = 3.889$$
  • Total sum of squares. The total sum of squares (SST) measures variation of all scores ($X_{ij}$) around the grand mean ($\bar{X}$). It can be computed from the following formula:
    $$\text{SST} = \sum_{i=1}^{n} \sum_{j=1}^{k} (X_{ij} - \bar{X})^2$$
    $$\text{SST} = \sum_{i=1}^{6} \sum_{j=1}^{3} (X_{ij} - 83.222)^2 = 69.111$$

In the formulas above, n is the number of subjects, and k is the number of treatment levels. The total sum of squares equals the sum of the component sums of squares, as shown below:

SST = SSTR + SSS + SSE

SST = 5.444 + 59.778 + 3.889 = 69.111
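The four sums of squares, and the additivity check, can be reproduced from Table 1 with a short Python sketch. It simply transcribes the formulas above; the variable names are illustrative.

```python
# Test scores from Table 1: one row per subject, columns T1, T2, T3.
scores = [
    [87, 85, 87],
    [84, 84, 85],
    [83, 84, 84],
    [82, 82, 83],
    [81, 82, 83],
    [80, 80, 82],
]
n, k = len(scores), len(scores[0])

grand = sum(map(sum, scores)) / (n * k)
t_means = [sum(row[j] for row in scores) / n for j in range(k)]  # treatment means
s_means = [sum(row) / k for row in scores]                       # subject means

ss_tr = n * sum((m - grand) ** 2 for m in t_means)
ss_s = k * sum((m - grand) ** 2 for m in s_means)
ss_e = sum(
    (scores[i][j] - s_means[i] - t_means[j] + grand) ** 2
    for i in range(n)
    for j in range(k)
)
ss_t = sum((x - grand) ** 2 for row in scores for x in row)

print(round(ss_tr, 3), round(ss_s, 3), round(ss_e, 3), round(ss_t, 3))
# 5.444 59.778 3.889 69.111
print(abs(ss_t - (ss_tr + ss_s + ss_e)) < 1e-9)  # True
```

Because SSE is computed from its own formula rather than by subtraction, the final line is a genuine check of SST = SSTR + SSS + SSE.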

Degrees of Freedom

The term degrees of freedom (df) refers to the number of independent sample points used to compute a statistic minus the number of parameters estimated from the sample points.

The degrees of freedom used to compute the various sums of squares for a single-factor, repeated measures experiment are shown in the table below:

Sum of squares    Degrees of freedom
Treatment         k - 1 = 2
Subject           n - 1 = 5
Error             ( k - 1 )( n - 1 ) = 10
Total             nk - 1 = 17

Notice that there is also an additive relationship among the degrees of freedom. The degrees of freedom for the total sum of squares (dfTOT) is equal to the degrees of freedom for the treatment sum of squares (dfTR) plus the degrees of freedom for the subjects sum of squares (dfS) plus the degrees of freedom for the error sum of squares (dfE). That is,

dfTOT = dfTR + dfS + dfE

dfTOT = 2 + 5 + 10 = 17
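Expanding the error term shows that this additivity is not specific to this data set; it holds for any number of subjects $n$ and treatment levels $k$:

```latex
\begin{aligned}
df_{TR} + df_S + df_E &= (k - 1) + (n - 1) + (k - 1)(n - 1) \\
                      &= (k - 1) + (n - 1) + (nk - k - n + 1) \\
                      &= nk - 1 = df_{TOT}
\end{aligned}
```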

Mean Squares

A mean square is an estimate of population variance. It is computed by dividing a sum of squares (SS) by its corresponding degrees of freedom (df), as shown below:

MS = SS / df

To conduct analysis of variance with a single-factor, repeated measures experiment, we are interested in three mean squares:

  • Treatment mean square. The treatment mean square ( MST ) measures variation due to treatment levels. It can be computed as follows:

    MST = SSTR / dfTR

    MST = 5.444 / 2 = 2.722

  • Subjects mean square. The subjects mean square ( MSS ) measures variation due to subjects. It can be computed as follows:

    MSS = SSS / dfS

    MSS = 59.778 / 5 = 11.956

  • Error mean square. The error mean square ( MSE ) measures variation due to extraneous variables (anything other than the treatment or subjects). The error mean square can be computed as follows:

    MSE = SSE / dfE

    MSE = 3.889 / 10 = 0.389
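The three mean squares follow mechanically from the sums of squares and degrees of freedom reported above. A minimal Python sketch, using the rounded values from the text:

```python
# Sums of squares and degrees of freedom from the sections above.
ss = {"Treatment": 5.444, "Subjects": 59.778, "Error": 3.889}
df = {"Treatment": 2, "Subjects": 5, "Error": 10}

# A mean square is a sum of squares divided by its degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

for source, value in ms.items():
    print(f"{source:<10} {value:.3f}")
# Treatment  2.722
# Subjects   11.956
# Error      0.389
```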

Expected Value

The expected value of a mean square is the average value of the mean square over a large number of experiments.

Statisticians have derived formulas for the expected value of mean squares, assuming the mathematical model described earlier is correct. Those formulas appear below:

Mean square    Expected value
MST            $\sigma_E^2 + n\sigma_T^2$
MSS            $\sigma_E^2 + k\sigma_S^2$
MSE            $\sigma_E^2$

In the table above, MST is the mean square for treatments; MSS is the mean square for subjects; and MSE is the error mean square.
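Because an expected value is an average over many experiments, the table can be illustrated by simulation. The sketch below repeatedly generates data from the model with no treatment effect (so $\sigma_T^2 = 0$) and averages each mean square. It assumes, for illustration only, that subject effects are drawn from a normal distribution with standard deviation `sigma_s`; the values `sigma_e = 1`, `sigma_s = 2`, the baseline of 50, and the number of replications are hypothetical choices.

```python
import random


def mean_squares(scores):
    """Return (MST, MSS, MSE) for a table with rows = subjects, columns = treatments."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    t_means = [sum(row[j] for row in scores) / n for j in range(k)]
    s_means = [sum(row) / k for row in scores]
    ss_tr = n * sum((m - grand) ** 2 for m in t_means)
    ss_s = k * sum((m - grand) ** 2 for m in s_means)
    ss_e = sum((scores[i][j] - s_means[i] - t_means[j] + grand) ** 2
               for i in range(n) for j in range(k))
    return ss_tr / (k - 1), ss_s / (n - 1), ss_e / ((k - 1) * (n - 1))


def average_mean_squares(n=6, k=3, sigma_e=1.0, sigma_s=2.0, reps=5000, seed=0):
    """Average MST, MSS, and MSE over many simulated experiments.

    Assumptions: every treatment effect is zero, subject effects are drawn
    from Normal(0, sigma_s**2), and errors from Normal(0, sigma_e**2).
    """
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        table = []
        for _ in range(n):
            pi_i = rng.gauss(0.0, sigma_s)  # one random effect per subject
            table.append([50.0 + pi_i + rng.gauss(0.0, sigma_e) for _ in range(k)])
        for idx, value in enumerate(mean_squares(table)):
            totals[idx] += value
    return [t / reps for t in totals]


mst, mss, mse = average_mean_squares()
print(f"average MST = {mst:.2f}  (theory: sigma_E^2 = 1)")
print(f"average MSS = {mss:.2f}  (theory: sigma_E^2 + k*sigma_S^2 = 13)")
print(f"average MSE = {mse:.2f}  (theory: sigma_E^2 = 1)")
```

With no treatment effect, the averages of MST and MSE should both land near $\sigma_E^2$, which is why their ratio is expected to be near one under the null hypothesis.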

Test Statistics

The main data analysis goal for this experiment is to test the hypotheses that we stated earlier (see Statistical Hypotheses). That will require the use of test statistics. Let's talk about how to compute test statistics for this study and how to interpret the statistics we compute.

How to Compute Test Statistics

Suppose we want to test the significance of treatment levels or of subjects in a single-factor, repeated measures experiment. We can use the mean squares to define a test statistic F for each source of variation, as shown in the table below:

Source          Mean square: expected value       F ratio
Treatment (T)   $\sigma_E^2 + n\sigma_T^2$        MST / MSE
Subjects (S)    $\sigma_E^2 + k\sigma_S^2$        MSS / MSE
Error           $\sigma_E^2$

Using formulas from the table with data from this repeated measures experiment, we can compute an F ratio for treatments ( FT ) and an F ratio for subjects ( FS ).

FT = MST / MSE = 2.722/0.389 = 7.0

FS = MSS / MSE = 11.956/0.389 = 30.7
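A short Python sketch of the two F ratios. The first part uses the rounded mean squares from the text; the second part uses exact fractions for the sums of squares of Table 1 (5.444 = 49/9, 59.778 = 538/9, and 3.889 = 35/9) to show that rounding has little effect on the result.

```python
from fractions import Fraction

# Mean squares as reported in the text (rounded to three decimals).
ms_tr, ms_s, ms_e = 2.722, 11.956, 0.389
print(round(ms_tr / ms_e, 1), round(ms_s / ms_e, 1))  # 7.0 30.7

# Exact sums of squares for Table 1, divided by their degrees of freedom.
exact_ms_tr = Fraction(49, 9) / 2
exact_ms_s = Fraction(538, 9) / 5
exact_ms_e = Fraction(35, 9) / 10

f_t = exact_ms_tr / exact_ms_e  # F ratio for treatments
f_s = exact_ms_s / exact_ms_e   # F ratio for subjects
print(f_t, round(float(f_s), 2))  # 7 30.74
```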

How to Interpret Test Statistics

Consider the F ratio for the treatment effect in this repeated measures experiment. For convenience, we display once again the table that shows expected mean squares and F ratio formulas:

Source          Mean square: expected value       F ratio
Treatment (T)   $\sigma_E^2 + n\sigma_T^2$        MST / MSE
Subjects (S)    $\sigma_E^2 + k\sigma_S^2$        MSS / MSE
Error           $\sigma_E^2$

Notice that the expected value of the numerator of the F ratio for the treatment effect equals the expected value of the denominator when the variation due to the treatment ($\sigma_T^2$) is zero (i.e., when the treatment does not affect the dependent variable). And the numerator should be bigger than the denominator when the variation due to the treatment is not zero (i.e., when the treatment does affect the dependent variable).

The F ratio for subjects works the same way. When subject differences do not affect the dependent variable, the numerator of the F ratio should equal the denominator. Otherwise, the numerator should be bigger than the denominator.

Each F ratio is a convenient measure that we can use to test the null hypothesis about the effect of a source (the treatment or the subjects) on the dependent variable. Here's how to conduct the test:

  • When the F ratio is close to one, the numerator of the F ratio is approximately equal to the denominator. This suggests that the source did not affect the dependent variable, so we cannot reject the null hypothesis.
  • When the F ratio is significantly greater than one, the numerator is bigger than the denominator. This indicates that the source did affect the dependent variable, so we must reject the null hypothesis.

What does it mean for the F ratio to be significantly greater than one? To answer that question, we need to talk about the P-value.

P-Value

In an experiment, a P-value is the probability of obtaining a result at least as extreme as the observed experimental outcome, assuming the null hypothesis is true.

With analysis of variance for a repeated measures experiment, the F ratios are the observed experimental outcomes that we are interested in. So, the P-value is the probability that an F ratio would be at least as extreme (i.e., at least as big) as the actual F ratio computed from experimental data.

How does an experimenter attach a probability to an observed F ratio? Luckily, when the null hypothesis is true, the F ratio is a random variable that has an F distribution. The degrees of freedom (v1 and v2) for the F ratio are the degrees of freedom associated with the mean squares used to compute the F ratio.

For example, consider the F ratio for a treatment effect. That F ratio ( FT ) is computed from the following formula:

FT = F(v1, v2) = MST / MSE

MST (the numerator in the formula) has degrees of freedom equal to dfTR; so for F, v1 is equal to dfTR. Similarly, MSE (the denominator in the formula) has degrees of freedom equal to dfE; so for F, v2 is equal to dfE. Knowing the F ratio and its degrees of freedom, we can use an F table or Stat Trek's free F distribution calculator to find the probability that an F ratio will be bigger than the actual F ratio observed in the experiment.
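Without a table or calculator, the same upper-tail probability can be computed from the regularized incomplete beta function, using the identity $P[F(v_1, v_2) > f] = I_x(v_2/2,\ v_1/2)$ with $x = v_2 / (v_2 + v_1 f)$. The pure-Python sketch below evaluates that function with the standard continued fraction (modified Lentz method). The function names are hypothetical; if SciPy is installed, `scipy.stats.f.sf(7, 2, 10)` should give the same value.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry for faster convergence


def f_sf(f, v1, v2):
    """P(F > f) for an F distribution with v1 and v2 degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(v2 / 2.0, v1 / 2.0, v2 / (v2 + v1 * f))


print(f"P[F(2, 10) > 7]    = {f_sf(7.0, 2, 10):.4f}")
print(f"P[F(5, 10) > 30.7] = {f_sf(30.7, 5, 10):.6f}")
```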

To illustrate the process, let's find P-values for the treatment effect and for the subject effect in this repeated measures experiment.

Treatment Variable P-Value

From previous computations, we know the following:

  • The observed value of the F ratio for the treatment variable is 7.
  • The F ratio (FT) was computed from the following formula:

    FT = F(v1, v2) = MST / MSE

  • The degrees of freedom (v1) for the treatment variable mean square (MST) is 2.
  • The degrees of freedom (v2) for the error mean square (MSE) is 10.

Therefore, the P-value we are looking for is the probability that an F with 2 and 10 degrees of freedom is greater than 7. We want to know:

P [ F(2, 10) > 7 ]

Now, we are ready to use the F Distribution Calculator. We enter the degrees of freedom (v1 = 2) for the treatment mean square, the degrees of freedom (v2 = 10) for the error mean square, and the F value (7) into the calculator; and hit the Calculate button.

Figure: F Distribution Calculator output (cumulative probability of about 0.99).

The calculator reports that the probability that F is greater than 7 is about 0.01 (more precisely, about 0.013). Hence, the P-value for the treatment variable is about 0.01.
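When the numerator has exactly 2 degrees of freedom, as it does here, the upper tail of the F distribution has a simple closed form, $P[F(2, v_2) > f] = (1 + 2f/v_2)^{-v_2/2}$, which makes a handy cross-check on the calculator. A minimal sketch (the function name is hypothetical):

```python
def f_sf_v1_equals_2(f, v2):
    """P(F > f) when the numerator has exactly 2 degrees of freedom."""
    return (1.0 + 2.0 * f / v2) ** (-v2 / 2.0)


p_treatment = f_sf_v1_equals_2(7.0, 10)  # equals 2.4 ** -5
print(round(p_treatment, 4))  # 0.0126
```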

P-Value for Subjects

The process to compute the P-value for subjects is exactly the same as the process used for the treatment variable. From previous computations, we know the following:

  • The observed value of the F ratio for subjects is 30.7.
  • The F ratio (FS) was computed from the following formula:

    FS = F(v1, v2) = MSS / MSE

  • The degrees of freedom (v1) for the subjects mean square (MSS) is 5.
  • The degrees of freedom (v2) for the error mean square (MSE) is 10.

Therefore, the P-value we are looking for is the probability that an F with 5 and 10 degrees of freedom is greater than 30.7. We want to know:

P [ F(5, 10) > 30.7 ]

Now, we are ready to use the F Distribution Calculator. We enter the degrees of freedom (v1 = 5) for the subjects mean square, the degrees of freedom (v2 = 10) for the error mean square, and the F value (30.7) into the calculator; and hit the Calculate button.

Figure: F Distribution Calculator output (cumulative probability of about 0.99999).

The calculator reports that the probability that F is greater than 30.7 is about 0.00001. Hence, the P-value for subjects is about 0.00001.

Interpretation of Results

Having completed the computations for analysis, we are ready to interpret results. We begin by displaying key findings in an ANOVA summary table. Then, we use those findings to test the hypothesis that there is no significant difference between treatment levels.

ANOVA Summary Table

It is traditional to summarize ANOVA results in an analysis of variance table. Here, filled with key results, is the analysis of variance table for the repeated measures experiment that we have been working on.

Analysis of Variance Table

Source SS df MS F P
Treatment 5.44 2 2.72 7.0 0.01
Subjects 59.78 5 11.96 30.7 <0.01
Error 3.89 10 0.39
Total 69.11 17

Recall that the experimenter specified a significance level of 0.05 for this study. Once you know the significance level and the P-values, the hypothesis tests are routine.

The P-value (shown in the last column of the ANOVA table) is the probability that an F statistic would be more extreme (bigger) than the F ratio shown in the table, assuming the null hypothesis is true. When a P-value for the treatment effect or the subject effect is bigger than the significance level, we accept the null hypothesis for the effect; when it is smaller, we reject the null hypothesis.

Based on the P-values in the table above, we can draw the following conclusions:

  • The P-value for treatments (i.e., the independent variable) is 0.01. Since the P-value is smaller than the significance level (0.05), we reject the null hypothesis that the independent variable (the type of pill given to the subject) has no effect on the dependent variable.
  • The P-value for subjects is less than 0.01. Since this P-value is also smaller than the significance level (0.05), we reject the null hypothesis that subjects had no effect on the dependent variable.
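The decision rule used above is a single comparison between each P-value and the significance level. A minimal sketch using the P-values reported in this lesson; the function name `decide` is hypothetical, and the wording "accept H0" follows this lesson's convention.

```python
ALPHA = 0.05  # significance level chosen before data collection

# P-values from the ANOVA summary table above.
p_values = {"Treatment": 0.01, "Subjects": 0.00001}


def decide(p_value, alpha=ALPHA):
    """Reject H0 when the P-value is smaller than the significance level."""
    return "reject H0" if p_value < alpha else "accept H0"


for source, p in p_values.items():
    print(f"{source}: P = {p} -> {decide(p)}")
```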

What About Sphericity?

In the previous lesson, we noted the importance of the sphericity assumption for analysis of variance with repeated measures experiments. Specifically, we noted that violations of the sphericity assumption increase the likelihood of making a Type I error (i.e., rejecting the null hypothesis when it is, in fact, true).

Whenever a standard analysis of variance for a repeated measures experiment leads you to reject the null hypothesis, it is important to consider the role of sphericity in your analysis. You need to (a) test for a violation of the sphericity assumption and/or (b) adjust the analysis to account for sphericity effects.

You may have noticed that we rejected the null hypothesis in the sample problem for this lesson, but we failed to address sphericity in any way. So our work on the sample problem is not finished. We'll learn more about sphericity, and we'll complete the sample problem in the next lesson.
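As a descriptive preview (not a formal test such as Mauchly's test), sphericity means that the variances of the differences between every pair of treatment levels are equal. The sketch below computes those difference-score variances for Table 1 using sample variances (divisor n - 1); whether differences of this size matter is a question for the formal procedures in the next lesson.

```python
import itertools
import statistics

# Test scores from Table 1, one list per treatment (subjects S1..S6 in order).
columns = {
    "T1": [87, 84, 83, 82, 81, 80],
    "T2": [85, 84, 84, 82, 82, 80],
    "T3": [87, 85, 84, 83, 83, 82],
}

diff_variances = {}
for a, b in itertools.combinations(columns, 2):
    diffs = [x - y for x, y in zip(columns[a], columns[b])]  # per-subject differences
    diff_variances[(a, b)] = statistics.variance(diffs)      # sample variance

for pair, var in diff_variances.items():
    print(pair, round(var, 3))
# ('T1', 'T2') 1.2
# ('T1', 'T3') 0.567
# ('T2', 'T3') 0.567
```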