Statistics Dictionary

Chi-Square Test for Independence

A chi-square test for independence is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.

The test consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

  • State the hypotheses. A chi-square test for independence is conducted on two categorical variables. Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent. The alternative hypothesis states that the variables are not independent.

  • Formulate an analysis plan. The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. The plan should specify a significance level and should identify the chi-square test for independence as the test method.

  • Analyze sample data. Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-value associated with the test statistic.

    • Degrees of freedom. The degrees of freedom (DF) is equal to:

      DF = (r - 1) * (c - 1)

      where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.

    • Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r*c expected frequencies, according to the following formula.

      E_{r,c} = (n_r * n_c) / n

      where E_{r,c} is the expected frequency count for level r of Variable A and level c of Variable B, n_r is the total number of sample observations at level r of Variable A, n_c is the total number of sample observations at level c of Variable B, and n is the total sample size.

    • Test statistic. The test statistic is a chi-square random variable (Χ²) defined by the following equation.

      Χ² = Σ [ (O_{r,c} - E_{r,c})² / E_{r,c} ]

      where O_{r,c} is the observed frequency count at level r of Variable A and level c of Variable B, and E_{r,c} is the expected frequency count at level r of Variable A and level c of Variable B.

    • P-value. The P-value is the probability, assuming the null hypothesis is true, of observing a sample statistic at least as extreme as the test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.

  • Interpret results. If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
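
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the 2 x 3 table of counts is hypothetical data made up for the example, the function names are invented here, and the significance level of 0.05 is an assumed choice. The P-value is computed with a simple series for the chi-square distribution in place of the Chi-Square Distribution Calculator, which should be adequate for moderate test statistics but loses precision for very small P-values.

```python
import math


def chi_square_independence(observed):
    """Return (df, expected, statistic) for an r x c table of observed counts."""
    r = len(observed)
    c = len(observed[0])
    row_totals = [sum(row) for row in observed]         # n_r for each level of Variable A
    col_totals = [sum(col) for col in zip(*observed)]   # n_c for each level of Variable B
    n = sum(row_totals)                                 # total sample size

    # Degrees of freedom: DF = (r - 1) * (c - 1)
    df = (r - 1) * (c - 1)

    # Expected frequencies: E = (n_r * n_c) / n for every cell
    expected = [
        [row_totals[i] * col_totals[j] / n for j in range(c)]
        for i in range(r)
    ]

    # Test statistic: sum of (O - E)^2 / E over all r * c cells
    statistic = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    return df, expected, statistic


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function; a sketch only, not tuned for extreme inputs.
    """
    if statistic <= 0:
        return 1.0
    a, x = df / 2.0, statistic / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:
        term *= x / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical counts: rows are levels of Variable A, columns are levels of Variable B.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]
alpha = 0.05  # assumed significance level

df, expected, statistic = chi_square_independence(observed)
p_value = chi_square_p_value(statistic, df)
reject_null = p_value < alpha

print("DF =", df)
print("Expected =", expected)
print("Chi-square =", round(statistic, 3))
print("P-value =", round(p_value, 3))
print("Reject null hypothesis:", reject_null)
```

For these made-up counts, DF is 2, every row has expected frequencies 25, 30, and 45, the test statistic is about 3.11, and the P-value is about 0.21. Since 0.21 is not less than 0.05, the null hypothesis of independence would not be rejected.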

See also: Chi-Square Test for Independence