F Ratio for Planned Comparisons

This lesson explains when and how to use an F ratio with analysis of variance to test statistical hypotheses represented by one or more planned comparisons.

Prerequisites: This lesson assumes familiarity with comparisons and orthogonal comparisons. You should be able to distinguish an ordinary comparison from an orthogonal comparison. You should know how to represent a statistical hypothesis mathematically by a comparison. You should be able to compute the sum of squares associated with a comparison. And you should understand how the probability of committing a Type I error is affected by the number of comparisons tested. If you don't know these things, review the following lessons:

  • Comparison of Treatment Means. This lesson defines an ordinary comparison. It explains how to represent a statistical hypothesis mathematically by a comparison. And it explains how to compute the sum of squares for a comparison.
  • Orthogonal Comparisons. This lesson explains how to distinguish an ordinary comparison from an orthogonal comparison.
  • Multiple Comparisons. This lesson describes how the probability of committing a Type I error is affected by the number of comparisons tested.

How to Compute an F Ratio

In statistics, F ratios abound. They are computed in many different ways for many different purposes. In this lesson, we want to compute an F ratio that can be used to test a statistical hypothesis represented by a planned comparison. For this purpose, an F ratio can be computed from the following formula:

$$F(1, v_2) = \frac{SS_i}{MS_{WG}}$$

where F is the value of the F ratio, SSi is the sum of squares for comparison i, and MSWG is the within-groups mean square (from an ANOVA table). The numerator of the F ratio has one degree of freedom. The denominator of the F ratio has degrees of freedom (v2) equal to the degrees of freedom for the within-groups mean square.

Note how easy it is to compute this F ratio. You only need:

  • The within-groups mean square, which is readily available from a standard ANOVA table.
  • The sum of squares for a comparison, which is simple to calculate by hand. (Later in this lesson, we'll work through an example to demonstrate the hand computation.)

What About a t Ratio?

Some textbooks use t ratios, rather than F ratios, to test the statistical significance of multiple comparisons. The two ratios are related according to the following formula:

$$t_v^2 = F(1, v)$$

The square of a t ratio with v degrees of freedom is equal to an F ratio with 1 and v degrees of freedom.

Each ratio leads to the same conclusion about statistical significance. If a t ratio (tested two-tailed) is statistically significant, the corresponding F ratio will also be statistically significant, and vice versa.
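To see the relationship numerically, here is a minimal Python sketch. It borrows the numbers from the step-by-step example later in this lesson (n = 5 per group, MSWG = 125, comparison weights 1, -0.5, -0.5) and assumes a balanced design; the variable names are illustrative only. It computes the t ratio as the comparison value divided by its standard error, computes the F ratio as SSi / MSWG, and shows that the square of t equals F.

```python
import math

# Illustrative values taken from the worked example (balanced design assumed).
n = 5                       # subjects per group
ms_wg = 125.0               # within-groups mean square from the ANOVA table
coefs = [1.0, -0.5, -0.5]   # weights for comparison L1
means = [90.0, 80.0, 70.0]  # group means

sum_c2 = sum(c * c for c in coefs)

# Value of the comparison: weighted sum of the group means.
L = sum(c * m for c, m in zip(coefs, means))

# Standard error of the comparison in a balanced design.
se = math.sqrt(ms_wg * sum_c2 / n)

t = L / se                  # t ratio with 12 degrees of freedom
ss = n * L ** 2 / sum_c2    # sum of squares for the comparison
F = ss / ms_wg              # F ratio with 1 and 12 degrees of freedom

print(f"t = {t:.4f}, t^2 = {t ** 2:.4f}, F = {F:.4f}")
```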

When to Use an F Ratio

In some situations, the F ratio is a good technique for testing the statistical significance of multiple comparisons. In other situations, it is not so good.

Advantages

There are several things to like about the F ratio, including the following:

  • The F ratio does a great job of controlling error rate per comparison. The experimenter sets the significance level for each individual comparison, and the F ratio assesses statistical significance accordingly.
  • The F ratio is easy to compute. All you need to compute an F ratio is output from a standard ANOVA table and the sum of squares for the comparison.

For an experimenter who is most concerned with controlling error rate per comparison, the F ratio is a good choice.

Disadvantages

There are several things to dislike about the F ratio, including the following:

  • The F ratio does a poor job of controlling error rate familywise. The more hypotheses you test, the more likely it is that you will reject at least one hypothesis that should not be rejected.
  • The F ratio is not a good choice for post hoc testing. With post hoc tests, the significance level of an uncorrected F ratio underestimates the true likelihood of committing a Type I error.

For an experimenter who is most concerned with controlling error rate familywise, or who is interested in post hoc testing (data snooping), the F ratio may be a poor choice.
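As a rough illustration of why familywise error grows, the sketch below assumes c independent tests, each run at significance level alpha, so the chance of at least one Type I error is 1 - (1 - alpha)^c. For correlated tests this formula is only an approximation. The sketch also shows the Bonferroni per-comparison level alpha / c, which keeps the familywise rate at or below alpha whether or not the tests are independent. The function names are hypothetical.

```python
def familywise_error_rate(alpha: float, c: int) -> float:
    """Chance of at least one Type I error across c independent tests,
    each run at significance level alpha (independence is assumed)."""
    return 1.0 - (1.0 - alpha) ** c


def bonferroni_alpha(alpha_fw: float, c: int) -> float:
    """Per-comparison level that keeps the familywise rate at or below alpha_fw."""
    return alpha_fw / c


# Compare the uncorrected familywise rate with the Bonferroni-adjusted level.
for c in (1, 2, 5, 10):
    print(c, round(familywise_error_rate(0.05, c), 4), bonferroni_alpha(0.05, c))
```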

What Do Statisticians Say?

Most statisticians agree that the F ratio is the wrong choice for post hoc testing. Other techniques (e.g., Scheffé's S method, Bonferroni correction) are preferred.

For planned tests of multiple comparisons, statisticians fall into one of three camps.

  • Don't use an unadjusted F ratio. When an experiment calls for many hypothesis tests, error rate familywise can become unacceptably high. Folks in this camp favor methods that control error rate familywise (e.g., Bonferroni's correction).
  • Use an unadjusted F ratio only when a small number of comparisons will be tested. This strategy allows the experimenter to control error rate per comparison, while holding error rate familywise to a level that the experimenter considers acceptably low.
  • Use an unadjusted F ratio only when a small number of orthogonal comparisons will be tested. This strategy recognizes that orthogonal comparisons divide the between-groups variation into nonoverlapping pieces, so testing them is almost like conducting a separate experiment for each comparison. If you did conduct a separate experiment for each comparison, the error rate per comparison would equal the error rate familywise for that experiment.

In the end, the analysis method you choose will reflect whether you care more about controlling error rate per comparison or error rate familywise. On this website, we use an unadjusted F ratio only when the experiment calls for testing a small number of orthogonal comparisons.

A Step-By-Step Example

In this section, we'll work through an example to demonstrate how to use an F ratio to test the statistical significance of planned, orthogonal comparisons.

Experimental Design

To test the long-term effect of aerobic exercise on resting pulse rate, an investigator conducts a controlled experiment. The experiment uses a completely randomized design, consisting of three treatment groups:

  • Control. Subjects do not participate in an exercise program.
  • Low-effort. Subjects jog 1 mile on Monday, Wednesday, and Friday.
  • High-effort. Subjects jog 2 miles every day, except Sunday.

Five subjects are randomly assigned to each group. After 28 days of treatment, each subject's resting pulse rate is measured on day 29.

Analysis Plan

Before collecting any data, the investigator poses the research questions to be answered, states statistical hypotheses implied by each research question, and identifies the analytical technique used to test each statistical hypothesis.

Research Questions

For this experiment, the researcher is interested in three research questions. Those questions, the associated statistical hypotheses, and the associated comparisons appear below:

  • Overall research question. Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group?

    H0: μ1 = μ2 = μ3

    H1: μi ≠ μj for at least one pair of groups i and j

  • Follow-up question 1. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the two exercise groups (Group 2 and Group 3)?

    H0: μ1 = (μ2 + μ3) / 2

    H1: μ1 ≠ (μ2 + μ3) / 2

    This statistical hypothesis can be represented mathematically by the comparison L1:

    L1 = X1 - 0.5X2 - 0.5X3

    where X1, X2, and X3 are mean scores for Groups 1, 2, and 3, respectively.
  • Follow-up question 2. Will mean pulse rate of subjects in the low-effort group (Group 2) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?

    H0: μ2 = μ3

    H1: μ2 ≠ μ3

    This statistical hypothesis can be represented mathematically by comparison L2:

    L2 = X2 - X3

    where X2 and X3 are mean scores for Groups 2 and 3, respectively.

Analytical Techniques

The overall research question asks whether the mean pulse rate in one treatment group differs from the mean pulse rate in any other group. The null hypothesis implied by this research question can be tested by an omnibus analysis of variance.

The remaining questions are follow-up questions. To determine whether to reject the null hypothesis for a follow-up question, we test its associated comparison for statistical significance. Here are the comparisons we will be testing:

  • L1 = X1 - 0.5X2 - 0.5X3
  • L2 = X2 - X3

These comparisons are orthogonal: with equal group sizes, the sum of the products of their coefficients is (1)(0) + (-0.5)(1) + (-0.5)(-1) = 0. Because we are testing a small number of orthogonal comparisons (only two), we can use an F ratio to test the significance of each comparison.
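Here is a small Python sketch of the orthogonality check, assuming equal group sizes as in this experiment: each set of weights should sum to zero, and the products of matching weights for the two comparisons should also sum to zero. Fractions are used only to keep the arithmetic exact; the helper names are illustrative.

```python
from fractions import Fraction

# Coefficients (weights) for each comparison, in group order 1, 2, 3.
c1 = [Fraction(1), Fraction(-1, 2), Fraction(-1, 2)]  # L1 = X1 - 0.5X2 - 0.5X3
c2 = [Fraction(0), Fraction(1), Fraction(-1)]         # L2 = X2 - X3


def is_contrast(coefs):
    """A comparison's coefficients must sum to zero."""
    return sum(coefs) == 0


def are_orthogonal(a, b):
    """With equal group sizes, two comparisons are orthogonal when the
    sum of the products of their coefficients is zero."""
    return sum(x * y for x, y in zip(a, b)) == 0


print(is_contrast(c1), is_contrast(c2), are_orthogonal(c1, c2))
```

With unequal group sizes, the usual criterion weights each product by 1/nj, so this simple check would no longer apply.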

For this example, assume that the investigator specifies a significance level of 0.05 to test the statistical significance of the hypotheses associated with each research question.

Experimental Data

Pulse rate measurements for each subject in each treatment group appear below:

Table 1. Pulse Rate for Each Subject in Each Group

Group 1 (control) | Group 2 (low effort) | Group 3 (high effort)
80 | 70 | 50
85 | 75 | 60
90 | 80 | 70
95 | 85 | 80
100 | 90 | 90

ANOVA Results

The overall research question is: Will mean pulse rate in one treatment group differ from mean pulse rate in any other treatment group? The statistical hypotheses implied by that question are:

H0: μ1 = μ2 = μ3

H1: μi ≠ μj for at least one pair of groups i and j

We can test this null hypothesis with a standard, omnibus analysis of variance. Here is the ANOVA table from that analysis.

Table 2. ANOVA Summary Table

Source | SS | df | MS | F | P
BG | 1000 | 2 | 500 | 4.0 | 0.046
WG | 1500 | 12 | 125 | |
Total | 2500 | 14 | | |

The P value for the between-groups (BG) effect is 0.046, which is less than the significance level of 0.05. Therefore, we reject the null hypothesis that mean pulse rate is the same in all treatment groups.

Note: We explained how to conduct a one-way analysis of variance in previous lessons. If you're wondering how to produce the ANOVA table shown above, see One-Way Analysis of Variance: Example or One-Way Analysis of Variance With Excel.
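For readers who prefer code, here is a minimal Python sketch (not the method from the lessons linked above) that reproduces the SS, df, MS, and F columns of Table 2 from the raw data in Table 1. It stops at the F ratio; the P-value still comes from an F distribution with 2 and 12 degrees of freedom, for example via the F Distribution Calculator used later in this lesson. The function name is illustrative.

```python
# Raw pulse rates from Table 1.
groups = [
    [80, 85, 90, 95, 100],  # Group 1 (control)
    [70, 75, 80, 85, 90],   # Group 2 (low effort)
    [50, 60, 70, 80, 90],   # Group 3 (high effort)
]


def one_way_anova(samples):
    """Return (SS_BG, df_BG, MS_BG, SS_WG, df_WG, MS_WG, F) for a one-way ANOVA."""
    all_scores = [x for s in samples for x in s]
    grand_mean = sum(all_scores) / len(all_scores)
    k, total_n = len(samples), len(all_scores)
    group_means = [sum(s) / len(s) for s in samples]

    # Between groups: squared deviations of group means from the grand mean,
    # weighted by group size.
    ss_bg = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Within groups: squared deviations of each score from its own group mean.
    ss_wg = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)

    df_bg, df_wg = k - 1, total_n - k
    ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg
    return ss_bg, df_bg, ms_bg, ss_wg, df_wg, ms_wg, ms_bg / ms_wg


ss_bg, df_bg, ms_bg, ss_wg, df_wg, ms_wg, f = one_way_anova(groups)
print(f"BG: SS={ss_bg:.0f} df={df_bg} MS={ms_bg:.0f} F={f:.1f}")
print(f"WG: SS={ss_wg:.0f} df={df_wg} MS={ms_wg:.0f}")
```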

Follow-up Tests

For this experiment, the investigator planned to conduct two follow-up tests to supplement the omnibus analysis of variance. In case you've forgotten, here are the two follow-up questions:

  • Follow-up question 1. Will mean pulse rate of subjects in the control group (Group 1) differ from the mean pulse rate of subjects in the two exercise groups (Group 2 and Group 3)?
  • Follow-up question 2. Will mean pulse rate of subjects in the low-effort group (Group 2) differ from the mean pulse rate of subjects in the high-effort group (Group 3)?

Each of these questions can be addressed by testing the statistical significance of a particular comparison. To illustrate the process, we'll work through a step-by-step analysis for the first follow-up question.

Step 1. Compute Mean Scores

Mean pulse rate within each group (computed from raw scores in Table 1) appears below:

Table 3. Mean Pulse Rate in Each Treatment Group

Group 1 (control) | Group 2 (low effort) | Group 3 (high effort)
90 | 80 | 70

Step 2. Define a Comparison

Next, we define a comparison that represents our research question. For the first follow-up question, we want to compare the mean score in Group 1 with the average of the mean scores in Groups 2 and 3. Therefore, this is the comparison we need to use:

L1 = X1 - 0.5X2 - 0.5X3

L1 = 90 - 0.5*80 - 0.5*70 = 15

where L1 is the value of the comparison, X1 is the mean score in Group 1, X2 is the mean score in Group 2, and X3 is the mean score in Group 3.
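Steps 1 and 2 are simple arithmetic. The Python sketch below, with illustrative names, computes the group means from Table 1 and then evaluates each comparison as a weighted sum of those means. It also evaluates L2 for the second follow-up question.

```python
from statistics import mean

# Raw pulse rates from Table 1.
table1 = [
    [80, 85, 90, 95, 100],  # Group 1 (control)
    [70, 75, 80, 85, 90],   # Group 2 (low effort)
    [50, 60, 70, 80, 90],   # Group 3 (high effort)
]

# Step 1: mean score in each group.
means = [mean(g) for g in table1]


def comparison_value(coefs, group_means):
    """Step 2: a comparison is a weighted sum of the group means."""
    return sum(c * m for c, m in zip(coefs, group_means))


L1 = comparison_value([1, -0.5, -0.5], means)  # follow-up question 1
L2 = comparison_value([0, 1, -1], means)       # follow-up question 2
print(means, L1, L2)
```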

Step 3. Compute Sum of Squares

With a balanced design, the sum of squares for a given comparison (Li) can be computed from the following formula:

$$SS_i = \frac{n \, L_i^2}{\sum_j c_{ij}^2}$$

where SSi is the sum of squares for comparison Li, Li is the value of the comparison, n is the sample size in each group, and cij is the coefficient (weight) for level j in the formula for comparison Li.

Plugging values from our sample problem into the formula, we get:

$$SS_1 = \frac{5 \times 15^2}{(1)^2 + (-0.5)^2 + (-0.5)^2}$$

$$SS_1 = \frac{1125}{1.5} = 750$$
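The same formula can be sketched in Python for a balanced design (the function name is hypothetical). As a check, because L1 and L2 are orthogonal and there are only three groups, their sums of squares should add up to the between-groups sum of squares in Table 2 (1000).

```python
def comparison_ss(coefs, group_means, n):
    """Sum of squares for a comparison in a balanced design:
    SS = n * L**2 / sum(c**2), where L is the weighted sum of group means."""
    L = sum(c * m for c, m in zip(coefs, group_means))
    return n * L ** 2 / sum(c * c for c in coefs)


means = [90, 80, 70]  # group means from Table 3
n = 5                 # subjects per group

ss1 = comparison_ss([1, -0.5, -0.5], means, n)  # follow-up question 1
ss2 = comparison_ss([0, 1, -1], means, n)       # follow-up question 2
print(ss1, ss2, ss1 + ss2)
```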

Step 4. Produce ANOVA Summary Table

The summary table from an omnibus analysis of variance includes two outputs that we can use to test the statistical significance of a comparison. Those outputs are (1) the value of the within-groups mean square and (2) the degrees of freedom for the within-groups mean square.

We generated the ANOVA summary table earlier. For convenience, here it is again.

Table 2. ANOVA Summary Table

Source | SS | df | MS | F | P
BG | 1000 | 2 | 500 | 4.0 | 0.046
WG | 1500 | 12 | 125 | |
Total | 2500 | 14 | | |

Step 5. Find the F Ratio

The F ratio for a comparison equals its sum of squares divided by the within-groups mean square (from the ANOVA table).

$$F(1, v_2) = \frac{SS_i}{MS_{WG}}$$

where F is the value of the F ratio, SSi is the sum of squares for comparison i, and MSWG is the within-groups mean square. The numerator of any F ratio for a comparison has one degree of freedom. The degrees of freedom (v2) for the denominator equal the degrees of freedom (from the ANOVA table) for the within-groups mean square.

For this problem, the F ratio is:

F(1, 12) = 750 / 125 = 6.0

Step 6. Find the P-Value

With a planned comparison, the P-value is the probability that an F statistic would be more extreme (i.e., bigger) than the F ratio actually computed from the experimental data, assuming the null hypothesis is true.

We can use Stat Trek's F Distribution Calculator to find the probability that an F statistic will be bigger than the actual F ratio observed in the experiment. Enter the numerator degrees of freedom (1), the denominator degrees of freedom (12), and the observed F ratio (6.0) into the calculator; then, click the Calculate button.


From the calculator, we see that P(F < 6.0) equals 0.97, so P(F > 6.0) equals 1 minus 0.97, or 0.03. Therefore, the P-value is 0.03.
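If you'd rather not use an online calculator, the same upper-tail probability can be computed from the regularized incomplete beta function, using the identity P(F > f) = I_x(v2/2, v1/2) with x = v2 / (v2 + v1 f). The plain-Python sketch below uses a standard continued-fraction evaluation (modified Lentz method); the function names are illustrative. In practice, a library routine such as SciPy's scipy.stats.f.sf(f, dfn, dfd) does the same job.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction used by betainc (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction directly or via the symmetry relation,
    # whichever converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f)
    return betainc(df2 / 2.0, df1 / 2.0, x)


p1 = f_upper_tail(6.0, 1, 12)  # comparison L1
p2 = f_upper_tail(2.0, 1, 12)  # comparison L2
print(round(p1, 4), round(p2, 4))
```

The first result should agree with the calculator's 0.03 to two decimal places.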

Step 7. Test the Hypothesis

If the P-value for a comparison is less than the significance level, we reject the associated hypothesis. Otherwise, we fail to reject.

In this example, the P-value (0.03) is less than the significance level (0.05). Therefore, we reject the null hypothesis that the mean score in Group 1 is equal to the average of the mean scores in Groups 2 and 3.

What About the Other Follow-up Test?

As part of this experiment, the investigator planned to conduct two follow-up tests to supplement the omnibus analysis of variance. In the previous section, we described a seven-step process for conducting the first follow-up test. You would use the same seven-step process to conduct the second follow-up test.

ANOVA Summary Table

Follow-up tests are often reported in an enhanced ANOVA summary table. The enhanced table shows all of the results from a standard ANOVA summary table. In addition, it shows results (sum of squares, mean square, degrees of freedom, F ratio, and P-value) for each planned comparison.

Here is the enhanced ANOVA summary table for the present experiment.

Table 4. Enhanced ANOVA Summary Table

Source | SS | df | MS | F | P
BG | 1000 | 2 | 500 | 4.0 | 0.046
    L1 | 750 | 1 | 750 | 6.0 | 0.03
    L2 | 250 | 1 | 250 | 2.0 | 0.18
WG | 1500 | 12 | 125 | |

In this example, the between-groups effect (BG) is statistically significant (p = 0.046), indicating that the mean pulse rate in at least one group is significantly different from the mean pulse rate in another group. The comparison effects (L1 and L2) show results for the follow-up tests. Only the L1 effect is statistically significant (p = 0.03), indicating that the mean pulse rate in the control group is significantly different from the average of the mean pulse rates in the two exercise groups. The mean pulse rate in the low-effort group (Group 2) is not significantly different from the mean pulse rate in the high-effort group (Group 3), since p = 0.18 exceeds 0.05. Based on these findings, the investigator concludes that one or both of the treatments affect resting pulse rate.

Note: The mean square for a comparison is computed just like the mean square for any other treatment effect:

MS = SS / df

where MS is the mean square, SS is the sum of squares, and df is the degrees of freedom.

Every comparison has one degree of freedom. Therefore, the mean square for a comparison equals the sum of squares for the comparison.