Scheffé's Test for Multiple Comparisons
Scheffé's test (also known as Scheffé's method or Scheffé's S method) is a procedure for testing multiple comparisons in analysis of variance.
This lesson covers Scheffé's test: what it is, why it is needed, when to use it, and how to implement it.
Prerequisites: This lesson assumes familiarity with comparisons.
You should know how to represent a statistical hypothesis mathematically by a comparison.
You should be able to compute the sum of squares associated with a comparison.
And you should understand how the probability of committing a
Type I error is affected by the number of comparisons tested.
If you don't know these things, review the following lessons:
 Comparison of Treatment Means.
This lesson defines an ordinary comparison. It explains how to represent a statistical hypothesis mathematically by a comparison.
And it explains how to compute the sum of squares for a comparison.
 Multiple Comparisons.
This lesson describes how the probability of committing a Type I error is affected by the number of comparisons tested.
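As a quick refresher on that last point, here is a minimal sketch of how the chance of at least one Type I error grows with the number of comparisons. It assumes every comparison is tested at the same significance level and that the tests are independent, which is only an approximation for comparisons drawn from the same ANOVA. The function name is illustrative.

```python
def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Chance of at least one Type I error across n independent tests.

    Assumption: every comparison is tested at the same level alpha and the
    tests are independent, which is only an approximation in ANOVA.
    """
    return 1.0 - (1.0 - alpha) ** n_comparisons


# Error rate for 1, 4, and 10 comparisons, each tested at alpha = 0.05.
rates = {m: familywise_error_rate(0.05, m) for m in (1, 4, 10)}
for m, rate in rates.items():
    print(f"{m:>2} comparisons: {rate:.3f}")
```

Under these assumptions, the error rate climbs well above 0.05 as comparisons are added, which is the problem that familywise procedures are meant to control.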
What is Scheffé's test?
Scheffé's test is a method for testing all pairwise and all nonpairwise comparisons of treatment means.
Here's how it works:
 Step 1. Set a significance level (α) for the error rate familywise.
(The significance level for Scheffé's test should equal the significance level used for the omnibus ANOVA in Step 3.)
 Step 2. Find the value for each comparison (L_{i}) that you want to test.
 Step 3. Generate an ANOVA table from a standard, omnibus analysis of variance.
 Step 4. Use the following formula to compute a critical value for Scheffé's test of comparison L_{i}:

CV_{i} = \sqrt{ (k - 1) F(v_{1}, v_{2}) MSE [ Σ ( c_{j}^{2} / n_{j} ) ] }

where CV_{i} is the critical value for comparison L_{i}, (k - 1) is the between-groups degrees of freedom,
F(v_{1}, v_{2}) is the F value with v_{1}, v_{2} degrees of freedom and a significance level of α,
v_{1} is degrees of freedom for the between groups factor,
v_{2} is degrees of freedom for the mean square error,
MSE is the mean square error,
c_{j} is a coefficient (weight) for treatment j in comparison L_{i}, and
n_{j} is sample size in Group j.
Note: To find values for the degrees of freedom and the mean squared error,
refer to the ANOVA table from Step 3. To find F(v_{1}, v_{2}), use Stat Trek's
F Distribution Calculator with the significance level from Step 1.
 Step 5. Compare the absolute value of the comparison from Step 2 (|L_{i}|) with the critical value from Step 4 (CV_{i}).
If |L_{i}| is bigger than CV_{i}, the comparison is statistically significant.
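The critical-value formula and the decision rule can be sketched in a few lines of Python. This is an illustration, not a reference implementation: the function names are made up for this example, the F value is supplied by the caller (from a table or calculator), and the number of groups k is inferred from the number of coefficients, so every group needs a coefficient, even if it is zero. The sample inputs are borrowed from the worked example later in this lesson.

```python
import math


def scheffe_critical_value(f_crit, mse, coefficients, sample_sizes):
    """Critical value CV for one comparison under Scheffe's test.

    f_crit is F(v1, v2) at the chosen significance level, looked up elsewhere.
    coefficients and sample_sizes hold one entry per treatment group.
    """
    k = len(coefficients)
    if len(sample_sizes) != k:
        raise ValueError("need one sample size per coefficient")
    if abs(sum(coefficients)) > 1e-9:
        raise ValueError("comparison coefficients must sum to zero")
    weight_term = sum(c ** 2 / n for c, n in zip(coefficients, sample_sizes))
    return math.sqrt((k - 1) * f_crit * mse * weight_term)


def is_significant(l_value, critical_value):
    """Decision rule; the absolute value handles negative comparisons."""
    return abs(l_value) > critical_value


# Inputs from the worked example: F(2, 12) = 3.89, MSE = 125, n = 5 per group.
cv = scheffe_critical_value(3.89, 125, [1, 0, -1], [5, 5, 5])
print(round(cv, 1), is_significant(20, cv))
```

Passing the F value in as an argument keeps the sketch free of third-party dependencies and mirrors the lookup described in the note above.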
Why Do We Need Scheffé's test?
The Scheffé test is used mainly with post hoc comparisons in analysis of variance (ANOVA) experiments.
The test is used to determine whether the mean score in one treatment group differs from the mean score in a second treatment group, or
whether the mean score for one set of treatment groups differs from the mean score for a second set of treatment groups.
When to Use Scheffé's test
In some situations, Scheffé's test is a good technique for testing the statistical significance of multiple comparisons.
In other situations, it is not so good.
Advantages
There are several things to like about the Scheffé test, including the following:
 The Scheffé test can be used to make all possible comparisons among treatment means: pairwise comparisons (comparisons involving only two means) and nonpairwise comparisons
(comparisons involving more than two means).
 The Scheffé test sets the error rate familywise equal to a significance level (α) specified by the experimenter.
 The Scheffé test can be used with unequal sample sizes between groups.
 The Scheffé test provides a more sensitive test of nonpairwise comparisons than some other post hoc testing procedures
(e.g., Tukey's HSD test).
 When an experiment calls for many planned comparisons, the risk of Type I errors
can be unacceptably high. In this situation, the Scheffé test, which controls error rate familywise,
may be a good alternative to tests that are normally used for planned comparisons.
For an experimenter who wants to test a lot of comparisons post hoc (particularly nonpairwise comparisons)
and still control error rate familywise, the Scheffé test is a good choice.
Disadvantages
There are several things to dislike about the Scheffé test, including the following:
 The Scheffé test has lower statistical power than tests that are designed for planned comparisons.
 For testing pairwise comparisons, the Scheffé test is less sensitive than some other post hoc procedures
(e.g., Tukey's HSD test).
Note: A good way to increase the power of the Scheffé test is to use large sample sizes.
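To see why larger samples help, the sketch below computes the Scheffé critical value for a pairwise comparison at several per-group sample sizes. It rests on assumptions that are not part of this lesson: three groups of equal size, a mean squared error fixed at 125 regardless of sample size, and a closed-form F critical value that is valid only when the numerator degrees of freedom equal 2. The function names are illustrative.

```python
import math


def f_critical_two_numerator_df(alpha, v2):
    """Upper-alpha critical value of F(2, v2); valid only for 2 numerator df."""
    return (v2 / 2.0) * (alpha ** (-2.0 / v2) - 1.0)


def pairwise_critical_value(n_per_group, mse=125.0, alpha=0.05):
    """Scheffe critical value for a pairwise comparison among 3 equal groups.

    Assumption: MSE stays at 125 as the sample size changes.
    """
    k = 3                          # three groups, so v1 = k - 1 = 2
    v2 = k * (n_per_group - 1)     # error degrees of freedom
    f_crit = f_critical_two_numerator_df(alpha, v2)
    weight_term = 2.0 / n_per_group   # coefficients (1, -1, 0), equal n
    return math.sqrt((k - 1) * f_crit * mse * weight_term)


sizes = (5, 10, 20)
critical_values = [pairwise_critical_value(n) for n in sizes]
for n, cv in zip(sizes, critical_values):
    print(f"n = {n:>2} per group: critical value = {cv:.2f}")
```

Under these assumptions, the critical value shrinks as the sample size grows, so smaller differences between means can reach significance.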
What Do Statisticians Say?
If you ask a statistician about when to use Scheffé's test, here are some comments you might hear:
 For post hoc testing, it only makes sense to use Scheffé's test after a significant omnibus analysis of variance.
If the analysis of variance does not provide evidence of significant differences among means,
there is no need to conduct follow-up tests looking for those differences.
 For post hoc testing of many comparisons, it makes sense to use Scheffé's test. For post hoc testing of only a few comparisons,
Bonferroni's correction might be the better choice.
 For a priori testing, Scheffé's test can be an acceptable choice when the experiment calls for tests
of many comparisons. When there are many comparisons to be tested, Scheffé's test might be considered a "safe" technique
because, compared to other methods, it provides a reasonable balance between control of Type I errors
and risk of Type II errors.
A Step-By-Step Example
In this section, we'll work through a simple example to illustrate the planning and analysis
required for post hoc testing with Scheffé's test.
Experimental Design
To test the long-term effect of aerobic exercise on resting pulse rate,
an investigator conducts a controlled experiment.
The experiment uses a completely randomized design, consisting of three treatment groups:
 Control. Subjects do not participate in an exercise program.
 Low-effort. Subjects jog 1 mile on Monday, Wednesday, and Friday.
 High-effort. Subjects jog 2 miles every day, except Sunday.
Five subjects are randomly assigned to each group; and, after 28 days of treatment,
their resting pulse rate is measured on day 29.
A Priori Analysis
To test planned comparisons, the investigator poses the research questions to be answered,
states statistical hypotheses implied by each research question, and
identifies the analytical technique(s) used to test each statistical hypothesis, all before any data is collected.
Then, following data collection, data is analyzed according to plan.
Research Question
For this experiment, the researcher is initially interested in one research question. That question,
and the associated statistical hypotheses, appear below:
 Overall research question. Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group?
H_{0}: μ_{i} = μ_{j}
H_{1}: μ_{i} ≠ μ_{j}
Analytical Techniques
The overall research question asks whether the mean pulse rate in one treatment group differs from the mean pulse rate in any other group.
The null hypothesis implied by this research question can be tested by an omnibus analysis of variance.
For this example, assume that the investigator specifies a significance level of 0.05 to test the statistical significance
of the main research question.
Experimental Data
Pulse rate measurements for each subject in each treatment group appear below:
Table 1. Pulse Rate for Each Subject in Each Group
Group 1 (control)    Group 2 (low effort)    Group 3 (high effort)
80                   70                      50
85                   75                      60
90                   80                      70
95                   85                      80
100                  90                      90
ANOVA Results
The overall research question for a priori analysis is: Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group?
The statistical hypotheses implied by that question are:
H_{0}: μ_{i} = μ_{j}
H_{1}: μ_{i} ≠ μ_{j}
We can test this null hypothesis with a standard, omnibus analysis of variance.
Here is the ANOVA table from that analysis.
Table 2. ANOVA Summary Table
Source    SS      df    MS     F      P
BG        1000    2     500    4.0    0.046
Error     1500    12    125
Total     2500    14
The P value for the between-groups (BG) effect is 0.046, which is less than the significance level of 0.05.
Therefore, we reject the null hypothesis of no difference in pulse rates between treatment groups.
Note: We explained how to conduct a one-way analysis of variance in previous lessons.
If you're wondering how to produce the ANOVA table shown above, see One-Way Analysis of Variance: Example
or One-Way Analysis of Variance With Excel.
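For readers who prefer code to a spreadsheet, the sums of squares, degrees of freedom, mean squares, and F ratio in Table 2 can be reproduced from the raw scores in Table 1. The sketch below uses only the Python standard library. The variable names are illustrative, and the P value is left out because it requires an F distribution routine.

```python
from statistics import mean

# Raw pulse rates from Table 1, one list per treatment group.
groups = {
    "control": [80, 85, 90, 95, 100],
    "low_effort": [70, 75, 80, 85, 90],
    "high_effort": [50, 60, 70, 80, 90],
}

scores = [x for g in groups.values() for x in g]
grand_mean = mean(scores)
k = len(groups)          # number of treatment groups
n_total = len(scores)    # total number of subjects

# Between-groups and error (within-groups) sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_error = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between = k - 1
df_error = n_total - k
ms_between = ss_between / df_between
mse = ss_error / df_error
f_ratio = ms_between / mse

print(ss_between, ss_error, df_between, df_error, ms_between, mse, f_ratio)
```

The mean squared error and its degrees of freedom are the two values that carry over to Scheffé's test.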
Post Hoc Analysis
Having ascertained through the a priori analysis that a significant difference exists among the mean scores,
suppose the experimenter wants to investigate how the means differ.
Post Hoc Research Questions
For this post hoc analysis, the researcher decides to ask four follow-up questions. For each question,
there is an implied statistical hypothesis which can be tested by a unique comparison. The questions,
hypotheses, and comparisons appear below:
 Follow-up question 1. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects
in the low-effort group (Group 2)?
H_{0}: μ_{1} = μ_{2}
H_{1}: μ_{1} ≠ μ_{2}
This statistical hypothesis can be represented mathematically by the comparison L_{1}:
L_{1} = X_{1} - X_{2}
 Follow-up question 2. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects
in the high-effort group (Group 3)?
H_{0}: μ_{1} = μ_{3}
H_{1}: μ_{1} ≠ μ_{3}
This statistical hypothesis can be represented mathematically by the comparison L_{2}:
L_{2} = X_{1} - X_{3}
 Follow-up question 3. Will mean pulse rate of subjects in the low-effort group (Group 2) differ from the mean pulse rate of subjects
in the high-effort group (Group 3)?
H_{0}: μ_{2} = μ_{3}
H_{1}: μ_{2} ≠ μ_{3}
This statistical hypothesis can be represented mathematically by the comparison L_{3}:
L_{3} = X_{2} - X_{3}
 Follow-up question 4. Will mean pulse rate of subjects in the control group (Group 1) differ from the average pulse rate of subjects
in the two exercise groups (Group 2 and Group 3)?
H_{0}: μ_{1} = (μ_{2} + μ_{3}) / 2
H_{1}: μ_{1} ≠ (μ_{2} + μ_{3}) / 2
This statistical hypothesis can be represented mathematically by the comparison L_{4}:
L_{4} = X_{1} - 0.5X_{2} - 0.5X_{3}
In the equations above, X_{1}, X_{2},
and X_{3} are mean scores for Groups 1, 2, and 3, respectively.
Post Hoc Analysis With Scheffé's Test
Each null hypothesis associated with a follow-up question can be
represented mathematically by a unique comparison. To determine whether to reject the null hypothesis for a follow-up question,
we can test its associated comparison for statistical significance, using Scheffé's test.
To illustrate the process, we'll work through Scheffé's test step by step.
Step 1. Specify a Significance Level
For post hoc analyses with Scheffé's test, the significance level should equal the significance level used a priori
for the omnibus analysis of variance. We used a significance level of 0.05 for the a priori analysis, so we will use a
significance level of 0.05 for Scheffé's test.
Step 2. Find Comparison Values
Each comparison is a function of mean scores from treatment groups.
Mean pulse rate within each group (computed from raw scores in Table 1) appears below:
Table 3. Mean Pulse Rate in Each Treatment Group
Group 1 (control)    Group 2 (low effort)    Group 3 (high effort)
90                   80                      70
Given the treatment means, it is a simple matter to compute values for each comparison, as shown below:
Table 4. Comparison Values
Comparison                                Value
L_{1} = X_{1} - X_{2}                     10
L_{2} = X_{1} - X_{3}                     20
L_{3} = X_{2} - X_{3}                     10
L_{4} = X_{1} - 0.5X_{2} - 0.5X_{3}       15
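The comparison values can also be computed by treating each comparison as a set of coefficients applied to the group means. The sketch below starts from the raw scores in Table 1. The dictionary keys and variable names are illustrative.

```python
from statistics import mean

# Raw pulse rates from Table 1, in group order (control, low effort, high effort).
groups = [
    [80, 85, 90, 95, 100],
    [70, 75, 80, 85, 90],
    [50, 60, 70, 80, 90],
]
group_means = [mean(g) for g in groups]

# One coefficient per group; each set of coefficients sums to zero.
comparisons = {
    "L1": (1, -1, 0),
    "L2": (1, 0, -1),
    "L3": (0, 1, -1),
    "L4": (1, -0.5, -0.5),
}

comparison_values = {
    name: sum(c * m for c, m in zip(coefs, group_means))
    for name, coefs in comparisons.items()
}
print(group_means, comparison_values)
```

Writing each comparison as a coefficient vector is convenient because the same coefficients are needed again when computing critical values.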
Step 3. Generate ANOVA Table
The summary table from an omnibus analysis of variance includes two outputs that we can use
to test the statistical significance of a comparison. Those outputs are
(1) the value of the mean squared error and (2) the degrees of freedom for the mean squared error.
We generated the ANOVA summary table earlier, as part of the a priori analysis. For convenience, here it is again.
Table 2. ANOVA Summary Table
Source    SS      df    MS     F      P
BG        1000    2     500    4.0    0.046
Error     1500    12    125
Total     2500    14
Step 4. Find the Critical Values
The critical value for Scheffé's test of comparison L_{i} can be computed from the following formula:

CV_{i} = \sqrt{ (k - 1) F(v_{1}, v_{2}) MSE [ Σ ( c_{j}^{2} / n_{j} ) ] }

where CV_{i} is the critical value for comparison L_{i}, (k - 1) is the between-groups degrees of freedom,
F(v_{1}, v_{2}) is the F value with v_{1}, v_{2} degrees of freedom,
v_{1} is degrees of freedom for the between groups factor,
v_{2} is degrees of freedom for the mean square error,
MSE is the mean square error,
c_{j} is a coefficient (weight) for treatment j in comparison L_{i}, and
n_{j} is sample size in Group j.
To find values for the degrees of freedom and the mean squared error,
refer to the ANOVA table. From the table, we see that v_{1} equals 2,
v_{2} equals 12, and the mean squared error equals 125.
To find F(v_{1}, v_{2}), use Stat Trek's
F Distribution Calculator.
In the field for the numerator degrees of freedom, enter 2. In the field for the denominator degrees of freedom, enter 12.
And in the field for the cumulative probability, enter 1 - α, which is 1 - 0.05 or 0.95. Then, click the Calculate button.
From the calculator, we see that F(2,12) equals 3.89 when the significance level (α) is 0.05.
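If a calculator is not at hand, the same F value can be checked in code. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do in this example. The function name is illustrative. For other degrees of freedom, a statistics library would be needed; with SciPy installed, `scipy.stats.f.ppf(0.95, 2, 12)` is one option.

```python
def f_critical_two_numerator_df(alpha: float, v2: int) -> float:
    """Upper-alpha critical value of F(2, v2).

    Valid only when the numerator degrees of freedom equal 2, where the
    upper-tail probability of F is (1 + 2x / v2) ** (-v2 / 2).
    """
    return (v2 / 2.0) * (alpha ** (-2.0 / v2) - 1.0)


f_crit = f_critical_two_numerator_df(alpha=0.05, v2=12)
print(round(f_crit, 2))  # 3.89
```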
At last, we have all the values we need to compute a critical value for each comparison:

CV_{i} = \sqrt{ (k - 1) F(v_{1}, v_{2}) MSE [ Σ ( c_{j}^{2} / n_{j} ) ] }

CV_{1} = \sqrt{ 2 * 3.89 * 125 * (0.2 + 0.2) } = 19.7

CV_{2} = \sqrt{ 2 * 3.89 * 125 * (0.2 + 0.2) } = 19.7

CV_{3} = \sqrt{ 2 * 3.89 * 125 * (0.2 + 0.2) } = 19.7

CV_{4} = \sqrt{ 2 * 3.89 * 125 * (0.2 + 0.05 + 0.05) } = 17.0
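The four critical values can be recomputed from the coefficients of each comparison, using the values found above (k = 3, F = 3.89, MSE = 125, and 5 subjects per group). The names in the sketch below are illustrative. Small differences from hand-computed values in the last digit come down to rounding.

```python
import math

K = 3                   # number of treatment groups
F_CRIT = 3.89           # F(2, 12) at alpha = 0.05, from the calculator
MSE = 125               # mean squared error from the ANOVA table
SAMPLE_SIZES = (5, 5, 5)

# Coefficients for Groups 1, 2, and 3 in each comparison.
comparisons = {
    "L1": (1, -1, 0),
    "L2": (1, 0, -1),
    "L3": (0, 1, -1),
    "L4": (1, -0.5, -0.5),
}


def scheffe_cv(coefs):
    """Critical value for one comparison, following the formula above."""
    weight_term = sum(c ** 2 / n for c, n in zip(coefs, SAMPLE_SIZES))
    return math.sqrt((K - 1) * F_CRIT * MSE * weight_term)


critical_values = {name: scheffe_cv(coefs) for name, coefs in comparisons.items()}
for name, cv in critical_values.items():
    print(f"CV for {name}: {cv:.2f}")
```

The three pairwise comparisons share one critical value because they have the same squared coefficients and the same sample sizes.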

Step 5. Test Hypotheses
To test the statistical significance of each comparison, we compare the absolute value of the comparison (|L_{i}| from Step 2)
with the critical value for the comparison (CV_{i} from Step 4).
If |L_{i}| is bigger than CV_{i}, the comparison is statistically significant.
Table 5 shows Scheffé test results for each comparison.
Table 5. Scheffé Test Results
Comparison                        L_{i} value    CV_{i} value    Conclusion
X_{1} - X_{2}                     10             19.7            Not significant
X_{1} - X_{3}                     20             19.7            Significant
X_{2} - X_{3}                     10             19.7            Not significant
X_{1} - 0.5X_{2} - 0.5X_{3}       15             17.0            Not significant
The second comparison is statistically significant, since L_{2} is bigger than CV_{2}.
The second comparison measures the difference between resting pulse rate in the control group (Group 1) and
resting pulse rate in the high-effort group (Group 3). From this post hoc analysis, we conclude that the high-effort treatment has
a significant effect on resting pulse rate.
None of the other comparisons are statistically significant.
Note: In a previous lesson, we tested the fourth comparison as part of a planned analysis and found it to be
statistically significant. This illustrates the value of deciding in advance which comparisons to test.
When the number of hypotheses tested is small, a priori tests (like the F ratio) tend to be more sensitive than
post hoc tests (like the Scheffé test).
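The contrast between the two approaches can be made concrete for the fourth comparison. The sketch below expresses both tests as F ratios: the comparison's sum of squares divided by the mean squared error. It makes two assumptions that this lesson does not state: the planned test compares that F ratio with F(1, 12) at a significance level of 0.05, and the critical value 4.75 is taken from a standard F table. Scheffé's test compares the same F ratio with (k - 1) times F(2, 12), which is algebraically equivalent to comparing |L_{4}| with CV_{4}.

```python
K = 3                      # number of treatment groups
MSE = 125.0                # mean squared error from the ANOVA table
SAMPLE_SIZES = (5, 5, 5)
F_CRIT_OMNIBUS = 3.89      # F(2, 12) at alpha = 0.05, from this lesson
F_CRIT_PLANNED = 4.75      # F(1, 12) at alpha = 0.05 (assumed, from an F table)

coefs = (1, -0.5, -0.5)    # comparison L4
l_value = 15.0             # value of L4 from Table 4

# Sum of squares for the comparison, then its F ratio.
weight_term = sum(c ** 2 / n for c, n in zip(coefs, SAMPLE_SIZES))
ss_comparison = l_value ** 2 / weight_term
f_comparison = ss_comparison / MSE

planned_significant = f_comparison > F_CRIT_PLANNED
scheffe_significant = f_comparison > (K - 1) * F_CRIT_OMNIBUS

print(round(f_comparison, 2), planned_significant, scheffe_significant)
```

Scheffé's test holds the same F ratio to a higher threshold, which is the price of protecting against Type I errors across every possible comparison.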