Chi-Square Goodness of Fit Test

This lesson explains how to conduct a chi-square goodness of fit test. The test is applied when you have one categorical variable from a single population. It is used to determine whether sample data are consistent with a hypothesized population distribution.

For example, suppose a company prints baseball cards. The company claims that 30% of its cards are rookies; 60% are veterans but not All-Stars; and 10% are veteran All-Stars. We could gather a random sample of baseball cards and use a chi-square goodness of fit test to see whether our sample distribution differed significantly from the distribution claimed by the company. The sample problem at the end of the lesson considers this example.

When to Use the Chi-Square Goodness of Fit Test

The chi-square goodness of fit test is appropriate when the following conditions are met:

The sampling method is simple random sampling.
Population size (N) is at least 10 times as big as sample size (n).
The variable under study is categorical.
The data under study are counts, not means or percentages.
The expected value of the number of sample observations in each level of the variable is at least 5.
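
The two numeric conditions above can be checked before running the test. The following is a minimal sketch, not part of the original lesson; the function names and the default thresholds passed as parameters are illustrative assumptions that simply restate the conditions listed above.

```python
def check_expected_counts(sample_size, hypothesized_proportions, minimum=5):
    """Return the expected counts and whether every one is at least `minimum`.

    Hypothetical helper: the expected count for a level is the sample size
    times the hypothesized proportion for that level.
    """
    expected = [sample_size * p for p in hypothesized_proportions]
    return expected, all(e >= minimum for e in expected)


def population_large_enough(population_size, sample_size, ratio=10):
    """Check that the population is at least `ratio` times the sample size."""
    return population_size >= ratio * sample_size


# Proportions and sample size taken from the baseball card example.
expected, counts_ok = check_expected_counts(100, [0.30, 0.60, 0.10])
print(expected, counts_ok)
```

The remaining conditions (simple random sampling, a categorical variable, and count data) concern how the data were collected and cannot be verified from the numbers alone.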

General Procedure for Hypothesis Testing

To test any hypothesis, the same five-step procedure is used: (1) state the hypotheses, (2) choose the significance level, (3) compute the test statistic, (4) find the P-value, and (5) interpret results. Here, we apply the general procedure to the chi-square goodness of fit test.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis (H₀) and an alternative hypothesis (H_a). The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false, as shown below.

For a chi-square goodness of fit test, the hypotheses take the following form.

H₀: The data are consistent with a specified distribution.
H_a: The data are not consistent with a specified distribution.

Typically, the null hypothesis (H₀) specifies the proportion of observations at each level of the categorical variable. The alternative hypothesis (H_a) is that at least one of the specified proportions is not true.

Choose the Significance Level

The significance level is the probability of rejecting the null hypothesis when it is actually true. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10.

Compute the Test Statistic

The chi-square goodness of fit test requires that you compute the degrees of freedom (df) for the test statistic, an expected frequency count for each level of the variable, and a chi-square test statistic. Formulas for all the necessary computations appear below.

Degrees of freedom. The degrees of freedom (df) is equal to the number of levels (k) of the categorical variable minus 1.
df = k - 1
Expected frequency counts. The expected frequency count at each level of the categorical variable is equal to the sample size times the hypothesized proportion from the null hypothesis:
E_i = np_i
where E_i is the expected frequency count for the ith level of the categorical variable, n is the total sample size, and p_i is the hypothesized proportion of observations in level i.
Test statistic. The test statistic is a chi-square random variable (χ²) defined by the following equation.
χ² = Σ [ (O_i - E_i)² / E_i ]
where O_i is the observed frequency count for the ith level of the categorical variable, and E_i is the expected frequency count for the ith level of the categorical variable.
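
The three formulas above translate directly into code. This is a minimal sketch with hypothetical function names; the counts in the usage lines are made-up illustrative values, not data from the lesson.

```python
def degrees_of_freedom(num_levels):
    """df = k - 1, where k is the number of levels of the variable."""
    return num_levels - 1


def expected_counts(n, proportions):
    """E_i = n * p_i for each hypothesized proportion p_i."""
    return [n * p for p in proportions]


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all levels."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative (made-up) data: 60 observations over three levels that the
# null hypothesis says are equally likely.
observed = [18, 22, 20]
proportions = [1 / 3, 1 / 3, 1 / 3]

df = degrees_of_freedom(len(observed))
expected = expected_counts(sum(observed), proportions)
statistic = chi_square_statistic(observed, expected)
print(df, expected, statistic)
```

Each expected count here is 20, so the statistic works out to (4 + 4 + 0) / 20 = 0.4 with 2 degrees of freedom.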

Find the P-Value

The P-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one computed from the sample. For the chi-square goodness of fit test, this is the probability that a chi-square random variable with df degrees of freedom exceeds the observed χ². To find this probability, use a chi-square distribution table or statistical software. (The sample problem at the end of this lesson uses Stat Trek's Chi-Square Distribution Calculator to find the P-value for a chi-square test statistic.)
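
For readers without a table or calculator at hand, the upper-tail probability can also be approximated with the Python standard library alone. The sketch below is an assumption-laden stand-in, not the method the lesson uses: it evaluates the regularized lower incomplete gamma function by its power series, and the function name and parameters are hypothetical. If SciPy is installed, scipy.stats.chi2.sf(statistic, df) should give the same quantity.

```python
import math


def chi_square_p_value(statistic, df, max_terms=1000, tol=1e-15):
    """Approximate P(X > statistic) for X ~ chi-square with df degrees of freedom.

    Uses the identity P(X <= s) = P(a, x) with a = df / 2 and x = s / 2, where
    P(a, x) is the regularized lower incomplete gamma function, computed here
    from its power series. The series suits moderate statistics; very large
    values would call for a different expansion.
    """
    if statistic <= 0:
        return 1.0
    a = df / 2.0
    x = statistic / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)  # ratio between consecutive series terms
        total += term
        if term < tol * total:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


print(chi_square_p_value(3.841, 1))  # close to 0.05, the familiar df = 1 cutoff
```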

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. This involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
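
The decision rule described above amounts to a single comparison. A minimal sketch, with a hypothetical function name and return strings chosen only for illustration:

```python
def interpret(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the P-value is below alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(interpret(0.20, alpha=0.05))  # P-value above alpha
print(interpret(0.01, alpha=0.05))  # P-value below alpha
```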

Test Your Understanding

Problem

Acme Toy Company prints baseball cards. The company claims that 30% of the cards are rookies, 60% are veterans but not All-Stars, and 10% are veteran All-Stars. Suppose a random sample of 100 cards has 50 rookies, 45 veterans who are not All-Stars, and 5 veteran All-Stars.

Is this result consistent with Acme's claim? Use a 0.05 level of significance.

Solution

To solve this problem, we conduct a chi-square goodness of fit test. Like any hypothesis test, a chi-square goodness of fit test consists of five steps: (1) state the hypotheses, (2) choose the significance level, (3) compute the test statistic, (4) find the P-value, and (5) interpret results.

State the hypotheses. The first step is to state a null hypothesis and an alternative hypothesis.
- Null hypothesis: The proportion of rookies, veterans, and All-Stars is 30%, 60% and 10%, respectively.
- Alternative hypothesis: At least one of the proportions in the null hypothesis is false.
Choose the significance level. For this analysis, the significance level is 0.05.
Compute the test statistic. Applying the chi-square goodness of fit test to sample data, we compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic.
df = k - 1 = 3 - 1 = 2

E_i = n * p_i
E₁ = 100 * 0.30 = 30
E₂ = 100 * 0.60 = 60
E₃ = 100 * 0.10 = 10

χ² = Σ [ (O_i - E_i)² / E_i ]
χ² = [ (50 - 30)² / 30 ] + [ (45 - 60)² / 60 ] + [ (5 - 10)² / 10 ]
χ² = (400 / 30) + (225 / 60) + (25 / 10) = 13.33 + 3.75 + 2.50 = 19.58

where df is the degrees of freedom, k is the number of levels of the categorical variable, n is the number of observations in the sample, E_i is the expected frequency count for level i, O_i is the observed frequency count for level i, and χ² is the chi-square test statistic.
Find the P-value. The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme (bigger) than 19.58. We use the Chi-Square Distribution Calculator to find P(χ² > 19.58) = 0.00006.

Interpret results. Since the P-value (0.00006) is less than the significance level (0.05), we reject the null hypothesis. The sample is not consistent with Acme's claimed distribution of rookies, veterans, and All-Stars.
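
The whole solution can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic, not the calculator the lesson uses; variable names are illustrative. It relies on one shortcut that holds only when df = 2: in that case the chi-square upper-tail probability is exactly exp(-χ²/2).

```python
import math

# Data and claim from the Acme Toy Company problem.
observed = [50, 45, 5]          # rookies, veterans (not All-Stars), veteran All-Stars
claimed = [0.30, 0.60, 0.10]    # proportions under the null hypothesis
alpha = 0.05                    # significance level

n = sum(observed)                        # sample size
df = len(observed) - 1                   # df = k - 1
expected = [n * p for p in claimed]      # E_i = n * p_i
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Shortcut valid ONLY for df = 2: P(chi-square > x) = exp(-x / 2).
assert df == 2
p_value = math.exp(-chi_square / 2)

reject_null = p_value < alpha

print(f"df = {df}")
print(f"chi-square = {chi_square:.2f}")  # about 19.58
print(f"P-value = {p_value:.5f}")        # about 0.00006
print("reject H0" if reject_null else "fail to reject H0")
```

For other degrees of freedom, the P-value would need a chi-square table or statistical software rather than this closed form.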

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the variable under study was categorical, the population size was at least 10 times as large as the sample size, and each level of the categorical variable had an expected frequency count of at least 5.
