Stat Trek

Teach yourself statistics


How to Calculate Degrees of Freedom

The degrees of freedom (df) play a crucial role in several types of statistical analysis. The exact way to calculate degrees of freedom depends on the specific analysis you are conducting. Below, we show how to calculate degrees of freedom for several types of t-tests, for several types of chi-square tests, and for regression analysis.

What Are Degrees of Freedom?

In statistics, degrees of freedom (df) refer to the number of independent values that are free to vary given certain constraints. For example,

  • Given a sample of n observations, suppose the mean is constrained to equal 5. Here, the degrees of freedom are one less than sample size (df = n - 1). This is because only n - 1 observations can independently vary, since the value of the last observation is determined by the constraint that the mean equal 5.

In general, degrees of freedom are important in hypothesis testing, regression analysis, and the calculation of confidence intervals, as they affect the shape of statistical distributions (like the t-distribution or chi-square distribution) used in these analyses.

One-Sample t-Test

In a one-sample t-test, you're comparing the sample mean to a known population mean.

  • Formula for degrees of freedom (df):

    df = n - 1

    where n is sample size.
  • If sample size is 30, degrees of freedom would be:

    df = 30 - 1 = 29
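The calculation above can be expressed as a one-line Python helper (the function name is ours, for illustration only):

```python
def one_sample_t_df(n):
    """Degrees of freedom for a one-sample t-test: df = n - 1."""
    return n - 1

print(one_sample_t_df(30))  # 29, matching the example above
```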

Independent Two-Sample t-Test

In an independent two-sample t-test, you're comparing the means of two independent groups. The degrees of freedom depend on whether the two samples have equal or unequal variances and on the precision required.

Equal Variances Assumed (Pooled t-test)

If you assume equal variances between the two groups, the degrees of freedom are calculated using a pooled variance estimate.

  • Formula for degrees of freedom (df):

    df = n1 + n2 - 2

    where n1 is sample size in the first group, and n2 is sample size in the second group.
  • If you have 30 observations in the first group (n1 = 30) and 40 observations in the second group (n2 = 40), the degrees of freedom would be:

    df = 30 + 40 - 2 = 68
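In Python, the pooled-variance df is just as direct (again, the helper name is ours):

```python
def pooled_t_df(n1, n2):
    """Degrees of freedom for a pooled (equal-variance) two-sample t-test."""
    return n1 + n2 - 2

print(pooled_t_df(30, 40))  # 68, matching the example above
```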

Unequal Variances Assumed (Welch’s t-test)

If you do not assume equal variances between the two groups, Welch’s t-test is used, and the degrees of freedom are calculated with a more complex formula that accounts for the difference in variances between the two groups.

  • Formula for degrees of freedom (Welch-Satterthwaite equation):

    num = (s1²/n1 + s2²/n2)²

    den = [(s1²/n1)²/(n1 - 1)] + [(s2²/n2)²/(n2 - 1)]

    df = num / den

    where s1² and s2² are the sample variances, and n1 and n2 are the sample sizes in the two groups.
  • This formula requires you to know both the sample variances and the sample sizes. The result is typically a non-integer value; statistical software uses the fractional value directly, while with printed t-tables it is usually rounded down to the nearest whole number, which is the more conservative choice.
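The Welch–Satterthwaite equation is easier to read as code. Here is a small Python sketch; the function name and the example variances and sample sizes are ours, chosen only to illustrate the arithmetic:

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom.

    var1, var2: sample variances (s1^2, s2^2)
    n1, n2: sample sizes
    """
    a = var1 / n1                # s1^2 / n1
    b = var2 / n2                # s2^2 / n2
    num = (a + b) ** 2
    den = a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1)
    return num / den

# Hypothetical example: variances 4.5 and 9.8, sample sizes 30 and 40.
print(round(welch_df(4.5, 30, 9.8, 40), 1))  # ≈ 67.4, a non-integer df
```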

Unequal Variances Assumed (Conservative Approach)

Compared to the Welch-Satterthwaite equation, the conservative approach simplifies the calculation; but it results in wider confidence intervals and more cautious hypothesis tests.

  • Formula for degrees of freedom (conservative approach):

    df = min(n1 - 1, n2 - 1)

    where n1 and n2 are sample sizes in the two groups.
  • If you have 20 observations in the first group (n1 = 20) and 35 observations in the second group (n2 = 35), the degrees of freedom would be:

    df = 20 - 1 = 19
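The conservative rule maps directly onto Python's built-in min (the helper name is ours):

```python
def conservative_df(n1, n2):
    """Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

print(conservative_df(20, 35))  # 19, matching the example above
```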

Matched Pairs t-Test

In a matched pairs t-test, you're comparing the means of two related groups or measurements (e.g., before and after treatment on the same subjects). The degrees of freedom are based on the number of paired differences.

  • Formula for degrees of freedom:

    df = n - 1

    where n is the number of pairs of observations (or the number of differences).
  • If you have 15 pairs of data points (e.g., 15 subjects before and after treatment), the degrees of freedom would be:

    df = 15 - 1 = 14
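In code, the key point is that df depends on the number of paired differences, not the total number of measurements. The before/after values below are made-up illustrative data:

```python
# Hypothetical paired data: 15 subjects measured before and after treatment.
before = [72, 75, 68, 80, 77, 74, 69, 81, 73, 76, 70, 78, 75, 71, 79]
after  = [70, 73, 69, 78, 74, 72, 68, 79, 71, 74, 69, 75, 73, 70, 76]

diffs = [b - a for b, a in zip(before, after)]  # one difference per pair
df = len(diffs) - 1                             # df = number of pairs - 1
print(df)  # 14
```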

Chi-Square Goodness of Fit Test

The chi-square goodness of fit test is used to determine whether sample data are consistent with a hypothesized distribution.

The degrees of freedom (df) are determined by the number of categories or groups in the data, minus 1, and sometimes adjusted if certain parameters are estimated from the data.

When One or More Parameters Are Estimated

When one or more parameters are estimated from the data, the degrees of freedom formula accounts for p, the number of parameters estimated.

  • Formula for degrees of freedom:

    df = k - 1 - p

    where k is the number of levels of a categorical variable, and p is the number of estimated parameters.

When No Parameters Are Estimated

In a basic chi-square goodness of fit test, you generally have no parameters to estimate from the data, so p = 0. Thus, the formula simplifies to

  • Formula for degrees of freedom:

    df = k - 1

    where k is the number of levels of a categorical variable.
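Both goodness-of-fit cases fit in one small Python helper (the function name and the example category counts are ours):

```python
def gof_df(k, p=0):
    """Degrees of freedom for a chi-square goodness-of-fit test.

    k: number of categories (levels of the categorical variable)
    p: number of parameters estimated from the data (0 in the basic test)
    """
    return k - 1 - p

print(gof_df(6))        # e.g., testing a six-sided die: df = 5
print(gof_df(10, p=1))  # 10 categories, one estimated parameter: df = 8
```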

Chi-Square Test for Homogeneity

The chi-square test for homogeneity is used to compare the distribution of frequency counts across different populations. It answers the following question: Are frequency counts distributed identically across different populations?

  • Formula for degrees of freedom:

    df = (r - 1)(c - 1)

    where r is the number of populations, and c is the number of levels for the categorical variable.

Chi-Square Test of Independence

The chi-square test for independence is used to determine whether two categorical variables are independent of each other. This test is often applied to contingency tables (cross-tabulations of two categorical variables).

  • Formula for degrees of freedom:

    df = (r - 1)(c - 1)

    where r is the number of rows in the contingency table, and c is the number of columns.
  • The degrees of freedom are based on the number of categories (levels) in each of the two variables you're testing. The reasoning is that once the row totals and column totals of the table are fixed, only (r - 1)(c - 1) cell counts are free to vary; the remaining cells are determined by the totals.
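The same (r - 1)(c - 1) formula serves both the homogeneity test and the independence test. A minimal Python sketch (the function name is ours):

```python
def contingency_df(r, c):
    """Degrees of freedom for a chi-square test on an r x c contingency table."""
    return (r - 1) * (c - 1)

print(contingency_df(3, 4))  # 3 rows, 4 columns: df = 2 * 3 = 6
```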

Degrees of Freedom in Regression

To determine the degrees of freedom (df) in a simple linear or multiple regression model, you need to consider the number of observations (n) and the number of estimated parameters (p). Here’s a breakdown of how to calculate the degrees of freedom in regression:

Degrees of Freedom for Error (Residual df)

This represents the variation in the dependent variable that is not explained by the regression model. It is given by:

  • Formula for degrees of freedom:

    df_error = n - p

    where n is the total number of observations (data points) and p is the number of parameters estimated in the model (including the intercept).
  • In simple linear regression (with one predictor), p = 2 (intercept + slope). For multiple regression, p would be the number of predictors plus one (for the intercept).

Degrees of Freedom for Regression (Model df)

This represents the variation explained by the model (how much the predictors help explain the variability in the dependent variable). It is given by:

  • Formula for degrees of freedom:

    df_model = p - 1

    where p is the number of parameters estimated in the model (including the intercept).

For example, here is how to calculate degrees of freedom for the regression model for three common scenarios.

  • In simple linear regression (one predictor), there are two estimated parameters (intercept and slope), so df_model = 2 - 1 = 1.
  • In multiple regression with two predictors and no interactions, there are three estimated parameters (intercept and two regression coefficients), so df_model = 3 - 1 = 2.
  • In multiple regression with two predictors and one interaction, there are four estimated parameters (intercept, two regression coefficients for main effects, and one regression coefficient for the interaction), so df_model = 4 - 1 = 3.
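The model and error degrees of freedom can be computed together; note that they always sum to n - 1. The helper name and the sample size of 50 below are ours, for illustration:

```python
def regression_dfs(n, p):
    """Return (model df, error df) for a regression with n observations
    and p estimated parameters (including the intercept)."""
    return p - 1, n - p

# Simple linear regression: 50 observations, intercept + slope (p = 2).
print(regression_dfs(50, 2))  # (1, 48)

# Two predictors plus one interaction: p = 4 estimated parameters.
print(regression_dfs(50, 4))  # (3, 46)
```

In both cases the two values sum to n - 1 = 49, the total degrees of freedom.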