### Analysis of Variance: Table of Contents

- Introduction
- Completely randomized design
- Follow-up tests
- Full factorial design
- Randomized block design
- Repeated measures design
- Calculators
- Appendices

# Comparison of Treatment Means

A **comparison** (aka, a contrast) is a weighted sum of factor level means. Researchers use comparisons to address additional
questions that are not answered by a standard, omnibus analysis of variance.

A standard, omnibus analysis of variance answers one question: Do mean scores differ significantly among treatment groups?
A significant F ratio indicates that the mean score in at least one treatment group differs significantly
from the mean score in at least one other treatment group; but a significant F ratio
does not reveal *which* mean scores are significantly different.

To understand which mean scores are significantly different, researchers conduct follow-up analyses in which they look at comparisons.

## Attributes of a Comparison

A comparison is a weighted sum of mean scores. Mathematically, a comparison can be expressed as:

L = Σ c_{j} X_{j}

In addition, all comparisons are subject to the following constraint:

Σ n_{j} c_{j} = 0

In the equations above, L is the value of the comparison, c_{j} is a coefficient (weight) for treatment *j*,
X_{j} is the mean score for treatment *j*,
n_{j} is the number of subjects assigned to treatment *j* , and
*k* is the number of treatment groups.

### Alternative Definition of a Comparison

In some textbooks, you may see a comparison defined as a weighted sum of total scores (T_{j}),
rather than as a weighted sum of mean scores (X_{j}).

L = Σ c_{j} T_{j}

On this website, when we refer to a comparison - in text or in equations - we will be referring to a weighted sum of mean scores. Others may make a different choice. So, if you read about comparisons in other places, be aware of which definition is being used.

With balanced designs (i.e., designs in which sample size is constant across treatment groups), the necessary condition for a comparison reduces to:

Σ c_{j} = 0

And, for convenience, we will assign one additional constraint to the comparisons that we work with in this tutorial:

Σ | c_{j} | = 2

In the equation above, the symbol | c_{j} | refers to the
absolute value of c_{j} .

So, here are the key things you should know about a comparison from a balanced experimental design:

- A comparison is a weighted sum of factor level means.
- The sum of the coefficient raw scores (weights) is equal to zero.
- The sum of the coefficient absolute values is equal to two.
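These attributes are easy to check numerically. The sketch below (plain Python; the coefficient and mean values are illustrative, not from any particular experiment) evaluates a comparison and verifies both constraints:

```python
# Evaluate a comparison L = sum(c_j * X_j) and check the two constraints
# used in this tutorial for balanced designs.

def comparison_value(coefficients, means):
    """Weighted sum of factor level means."""
    return sum(c * x for c, x in zip(coefficients, means))

def is_valid_comparison(coefficients, tol=1e-9):
    """Check that sum(c_j) == 0 and sum(|c_j|) == 2 (balanced design)."""
    return (abs(sum(coefficients)) < tol
            and abs(sum(abs(c) for c in coefficients) - 2) < tol)

c = [1, -0.5, -0.5]        # compares Group 1 to the average of Groups 2 and 3
means = [258, 246, 210]    # hypothetical group means

print(is_valid_comparison(c))        # True
print(comparison_value(c, means))    # 30.0
```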

## How to Use Comparisons

Researchers use comparisons to identify particular treatment means for analysis. To understand how they do this, it helps to look at an example. So, consider the following completely randomized, one-factor experiment.

**Treatment**

| Group 1 | Group 2 | Group 3 |
|---|---|---|
| 210 | 210 | 180 |
| 240 | 240 | 210 |
| 270 | 240 | 210 |
| 270 | 270 | 210 |
| 300 | 270 | 240 |

We conducted a standard analysis of variance for this experiment in a previous lesson (see One-Way Analysis of Variance: Example). That analysis resulted in a significant F ratio (p = .04).
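As a check, the omnibus F ratio for this data set can be reproduced from first principles with a few lines of plain Python (a sketch of the standard one-way ANOVA arithmetic, not a library call):

```python
# One-way ANOVA F ratio for the example data, computed from first principles.
groups = [
    [210, 240, 270, 270, 300],  # Group 1
    [210, 240, 240, 270, 270],  # Group 2
    [180, 210, 210, 210, 240],  # Group 3
]

k = len(groups)                    # number of treatment groups
n = sum(len(g) for g in groups)    # total sample size
grand_mean = sum(sum(g) for g in groups) / n

# Between-groups and within-groups sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)   # df = k - 1 = 2
ms_within = ss_within / (n - k)     # df = n - k = 12
f_ratio = ms_between / ms_within

print(round(f_ratio, 2))  # 4.16
```

With 2 and 12 degrees of freedom, this F ratio corresponds to the significant result (p = .04) reported above.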

### Additional Research Questions

The significant, omnibus F test (p = .04) tells us that the mean score in at least one treatment group is different from the mean score in at least one other treatment group. But it does not say anything about how the mean scores differ. For example, here are some additional research questions that are not addressed by an omnibus F test:

- Is the mean score in Group 1 significantly different from the mean score in Group 2?
- Is the mean score in Group 1 significantly different from the mean score in Group 3?
- Is the mean score in Group 2 significantly different from the mean score in Group 3?
- Is the mean score in Group 1 significantly different from the average of mean scores in Groups 2 and 3?
- Is the mean score in Group 2 significantly different from the average of mean scores in Groups 1 and 3?
- Is the mean score in Group 3 significantly different from the average of mean scores in Groups 1 and 2?

### Comparisons and Research Questions

Each of the research questions listed above can be represented mathematically by a comparison (a weighted sum of factor level means) in the following form:

L_{i} = Σ c_{j} X_{j}

L_{i} = c_{1}X_{1} + c_{2}X_{2} + c_{3}X_{3}

To illustrate the process, let's define a comparison for each research question listed above.

**Is the mean score in Group 1 significantly different from the mean score in Group 2?** A comparison (L_{1}) to represent this research question is obtained by setting c_{1} = 1, c_{2} = -1, and c_{3} = 0, as shown below:

L_{1} = 1 * X_{1} - 1 * X_{2} + 0 * X_{3}

L_{1} = X_{1} - X_{2}

**Is the mean score in Group 1 significantly different from the mean score in Group 3?** A comparison (L_{2}) to represent this research question is obtained by setting c_{1} = 1, c_{2} = 0, and c_{3} = -1, as shown below:

L_{2} = 1 * X_{1} + 0 * X_{2} - 1 * X_{3}

L_{2} = X_{1} - X_{3}

**Is the mean score in Group 2 significantly different from the mean score in Group 3?** A comparison (L_{3}) to represent this research question is obtained by setting c_{1} = 0, c_{2} = 1, and c_{3} = -1, as shown below:

L_{3} = 0 * X_{1} + 1 * X_{2} - 1 * X_{3}

L_{3} = X_{2} - X_{3}

**Is the mean score in Group 1 significantly different from the average of mean scores in Groups 2 and 3?** A comparison (L_{4}) to represent this research question is obtained by setting c_{1} = 1, c_{2} = -0.5, and c_{3} = -0.5, as shown below:

L_{4} = 1 * X_{1} - 0.5 * X_{2} - 0.5 * X_{3}

L_{4} = X_{1} - (X_{2} + X_{3}) / 2

**Is the mean score in Group 2 significantly different from the average of mean scores in Groups 1 and 3?** A comparison (L_{5}) to represent this research question is obtained by setting c_{1} = -0.5, c_{2} = 1, and c_{3} = -0.5, as shown below:

L_{5} = 1 * X_{2} - 0.5 * X_{1} - 0.5 * X_{3}

L_{5} = X_{2} - (X_{1} + X_{3}) / 2

**Is the mean score in Group 3 significantly different from the average of mean scores in Groups 1 and 2?** A comparison (L_{6}) to represent this research question is obtained by setting c_{1} = -0.5, c_{2} = -0.5, and c_{3} = 1, as shown below:

L_{6} = 1 * X_{3} - 0.5 * X_{1} - 0.5 * X_{2}

L_{6} = X_{3} - (X_{1} + X_{2}) / 2

Notice that each of the comparisons satisfies the two constraints that we mentioned earlier for a balanced experimental design:

Σ c_{j} = 0
and
Σ | c_{j} | = 2

For each comparison, the sum of the coefficient raw scores is equal to zero; and the sum of the coefficient absolute values is equal to two.
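To make this concrete, the values of all six comparisons can be computed directly from the group means of the example data (X_{1} = 258, X_{2} = 246, X_{3} = 210). A minimal Python sketch:

```python
# Compute each comparison L_i = sum(c_j * X_j) for the example data,
# and confirm that every coefficient set satisfies the two constraints.

x = [258.0, 246.0, 210.0]    # group means from the example data above

comparisons = {
    "L1": [1, -1, 0],        # Group 1 vs Group 2
    "L2": [1, 0, -1],        # Group 1 vs Group 3
    "L3": [0, 1, -1],        # Group 2 vs Group 3
    "L4": [1, -0.5, -0.5],   # Group 1 vs average of Groups 2 and 3
    "L5": [-0.5, 1, -0.5],   # Group 2 vs average of Groups 1 and 3
    "L6": [-0.5, -0.5, 1],   # Group 3 vs average of Groups 1 and 2
}

values = {}
for name, c in comparisons.items():
    assert sum(c) == 0                    # coefficients sum to zero
    assert sum(abs(w) for w in c) == 2    # absolute values sum to two
    values[name] = sum(w * m for w, m in zip(c, x))

print(values)
# {'L1': 12.0, 'L2': 48.0, 'L3': 36.0, 'L4': 30.0, 'L5': 12.0, 'L6': -42.0}
```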

### Comparison Sum of Squares

With a balanced design, the sum of squares for a given comparison ( L_{i} ) can be computed from the following formula:

SS_{i} = n * L_{i}^{2} / Σ c^{2}_{ij}

where SS_{i} is the sum of squares for comparison L_{i} ,
L_{i} is the value of the comparison, n is the sample size in each group,
and c_{ij} is the coefficient (weight) for level *j* in the formula for comparison L_{i}.

When the design is unbalanced and Σ n_{j}c_{j} = 0,
the sum of squares for a given comparison ( L_{i} ) can be computed from the following formula:

SS_{i} = ( Σ n_{j}c_{ij} X_{j} )^{2} / Σ n_{j}c^{2}_{ij}

where SS_{i} is the sum of squares for comparison L_{i} ,
n_{j} is the sample size in Group j ,
c_{ij} is the coefficient (weight) for level *j* in the formula for comparison L_{i},
and X_{j} is the mean score for Group j .

**Note:** For an example that uses this formula to compute the sum of squares for a comparison,
see Problem 2.
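Both formulas are straightforward to implement. The sketch below (plain Python; group sizes, coefficients, and means are illustrative values) uses the general, unbalanced-design formula, which reduces to the balanced formula when every group has the same sample size:

```python
# Sum of squares for a comparison, using the general (unbalanced-design)
# formula: SS_i = (sum n_j c_ij X_j)^2 / sum n_j c_ij^2.
# With equal group sizes this agrees with SS_i = n * L_i^2 / sum(c_ij^2).

def comparison_ss(ns, cs, means):
    """ns: group sizes; cs: coefficients; means: group means."""
    numerator = sum(n * c * m for n, c, m in zip(ns, cs, means)) ** 2
    denominator = sum(n * c * c for n, c in zip(ns, cs))
    return numerator / denominator

# Balanced example: 10 subjects per group, L = X1 - X3,
# with hypothetical group means of 50, 60, and 70.
ss = comparison_ss([10, 10, 10], [1, 0, -1], [50, 60, 70])
print(ss)  # 2000.0
```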

## Planned Comparisons vs. Post Hoc Comparisons

Comparisons fall into one of two groups, depending on their origin in the research plan.

- **Planned comparisons.** Planned comparisons (aka, *a priori* comparisons) test hypotheses that were posed upfront in the analysis plan. These are the hypotheses that the experiment was designed to test.
- **Post hoc comparisons.** Post hoc comparisons (aka, *a posteriori* comparisons) test hypotheses that did not appear in the original analysis plan. These are hypotheses posed after data collection to shed additional light on relationships between mean scores.

## Why Do We Care?

Why do we care about comparisons? With the right tweaks to a standard analysis of variance, comparisons can be tested for statistical significance. With comparisons, we can perform follow-up analyses to address research questions that are not addressed by a standard, omnibus analysis of variance.

To perform these follow-up analyses, you need to do all of the things we've covered in this lesson:

- Define a comparison that represents the research question of interest.
- Compute the value of that comparison.
- Calculate a sum of squares for that comparison.
- Discriminate between planned comparisons and post hoc comparisons.

In subsequent lessons, we will fill in the missing details that will allow you to supplement a standard analysis of variance with relevant follow-up tests.

## Test Your Understanding

**Problem 1**

You're running a single-factor experiment with four treatment groups. Group 1 is a control group. Subjects in Group 1 do not receive any vitamins. Subjects in Groups 2, 3, and 4 receive vitamin A, vitamin B, or vitamin C, respectively.

| Group 1 | Group 2 | Group 3 | Group 4 |
|---|---|---|---|
| Control | Vitamin A | Vitamin B | Vitamin C |

You want to know whether the mean score in the control group (X_{1})
is significantly different from the average of mean scores
(X_{2}, X_{3},
and X_{4})
in the other three groups. (Assume that sample size is the same in each group.)

Which of the following comparisons describes the research question you want to test?

(A) L_{1} = X_{1} - X_{2}

(B) L_{2} = X_{1} - X_{2}
- X_{3} - X_{4}

(C) L_{3} = X_{1}
- (X_{2} + X_{3}
+ X_{4}) / 3

(D) L_{4} = X_{1}
- (X_{2} + X_{3}
+ X_{4}) / 4

(E) None of the above.

**Solution**

The correct answer is (C). The comparison L_{3} is expressed in the correct form:

L = Σ c_{j} X_{j}

where c_{1} = 1, c_{2} = -1/3, c_{3} = -1/3,
and c_{4} = -1/3.

Note also that the coefficients of comparison L_{3} satisfy the constraints that we described earlier:

Σ c_{j} = 0
and
Σ | c_{j} | = 2

Comparison L_{3} measures the difference between the mean score in the control group (X_{1})
and the average of the other three treatment means - (X_{2} + X_{3}
+ X_{4}) / 3. If L_{3} were close to zero, we would conclude that the mean of the
control group was not very different from the mean of the other three groups combined.

Comparison L_{1} compares the mean score in
Group 1 to the mean score in Group 2, but it ignores the mean scores in Group 3 and Group 4. So
L_{1} does not address the research question posed by the researcher.

Comparisons L_{2} and L_{4} also do not address the research question of interest.
And comparisons L_{2} and L_{4} do not satisfy the constraints that we described earlier:

Σ c_{j} = 0
and
Σ | c_{j} | = 2

So comparisons L_{2} and L_{4} cannot be correct answers to this problem.
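The constraint checks in this solution can be verified exactly with Python's fractions module (a sketch; the coefficient sets below correspond to answer options (B), (C), and (D)):

```python
from fractions import Fraction as F

def satisfies_constraints(cs):
    """Check sum(c_j) == 0 and sum(|c_j|) == 2, with exact arithmetic."""
    return sum(cs) == 0 and sum(abs(c) for c in cs) == 2

option_b = [F(1), F(-1), F(-1), F(-1)]           # L2 = X1 - X2 - X3 - X4
option_c = [F(1), F(-1, 3), F(-1, 3), F(-1, 3)]  # L3 = X1 - (X2 + X3 + X4) / 3
option_d = [F(1), F(-1, 4), F(-1, 4), F(-1, 4)]  # L4 = X1 - (X2 + X3 + X4) / 4

print(satisfies_constraints(option_b))  # False
print(satisfies_constraints(option_c))  # True
print(satisfies_constraints(option_d))  # False
```

Exact rational coefficients avoid the floating-point round-off that 1/3 would otherwise introduce into the constraint sums.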

**Problem 2**

You're running a single-factor experiment with three treatment groups. Each group has 10 subjects. Mean scores for each group appear below:

| Group 1 | Group 2 | Group 3 |
|---|---|---|
| 50 | 60 | 70 |

You want to test the hypothesis that the mean score in Group 1 is not significantly different from the mean score in Group 3. Here is the comparison relevant to that hypothesis:

L_{1} = 1 * X_{1}
+ 0 * X_{2}
- 1 * X_{3}

L_{1} = X_{1} - X_{3}

L_{1} = 50 - 70 = -20

What is the sum of squares for this comparison?

(A) 1000

(B) 2000

(C) 3000

(D) 4000

(E) None of the above

**Solution**

The correct answer is (B). Since we are dealing with a
balanced design, the sum of squares for comparison L_{1} can be computed from the following formula:

SS_{i} = n * L_{i}^{2} / Σ c^{2}_{ij}

SS_{1} = 10 * (-20)^{2} / [ (1)^{2} + (0)^{2} + (-1)^{2} ]

SS_{1} = 4000 / 2 = 2000

where SS_{i} is the sum of squares for comparison L_{i} ,
L_{i} is the value of the comparison, n is the sample size in each group,
and c_{ij} is the coefficient (weight) for level *j* in the formula for comparison L_{i}.
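The arithmetic in this solution can be confirmed with a few lines of Python:

```python
# Verify the Problem 2 solution: SS_1 = n * L_1^2 / sum(c_j^2).
n = 10                  # subjects per group (balanced design)
c = [1, 0, -1]          # coefficients for L1 = X1 - X3
means = [50, 60, 70]    # group means

L1 = sum(w * m for w, m in zip(c, means))
ss1 = n * L1 ** 2 / sum(w ** 2 for w in c)

print(L1)   # -20
print(ss1)  # 2000.0
```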