Statistics Dictionary
Chi-Square Test for Homogeneity
The chi-square test of homogeneity is applied to a single
categorical variable. It is used to compare the distribution
of frequency counts across different populations. It answers
the following question: Are frequency counts distributed
identically across different populations?
The test procedure is appropriate when the following conditions are met:

For each population, the sampling method is simple random sampling.

Each population is at least 10 times as large as its sample.

The variable under study is categorical.

If sample data are displayed in a contingency table
(Populations x Category levels), the expected frequency count
for each cell of the table is at least 5.

This approach consists of four steps: (1) state the hypotheses,
(2) formulate an analysis plan, (3) analyze sample data, and
(4) interpret results.

State the hypotheses. Every hypothesis test requires a null hypothesis
and an alternative hypothesis. Suppose that data were sampled from
r populations, and assume that the categorical variable had c levels.
At any specified level of the categorical variable,
the null hypothesis states that each population has the same
proportion of observations. Thus,
H_{0}: P_{level 1 of population 1} = P_{level 1 of population 2} = . . . = P_{level 1 of population r}
H_{0}: P_{level 2 of population 1} = P_{level 2 of population 2} = . . . = P_{level 2 of population r}
. . .
H_{0}: P_{level c of population 1} = P_{level c of population 2} = . . . = P_{level c of population r}

The alternative hypothesis (H_{a}) is that at least
one of the null hypothesis statements is false.

Formulate an analysis plan. The analysis plan describes
how to use sample data to decide whether to reject the null
hypothesis. The plan should specify the
significance level and the test method (i.e., the chi-square
test of homogeneity).

Analyze sample data. Using sample data from the contingency table, find the
degrees of freedom, expected frequency counts,
test statistic, and the P-value associated with the test statistic.
The analysis described in this section is illustrated in the
sample problem at the end of this lesson.

Degrees of freedom. The degrees of freedom (DF) is equal to:
DF = (r - 1) * (c - 1)
where r is the number of populations, and
c is the number of levels for the categorical variable.
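
As a minimal sketch of the formula above, the snippet below reads r and c off the shape of a contingency table stored as a list of rows (one row per population, one column per category level). The function name degrees_of_freedom and the counts in observed are hypothetical, chosen only for illustration.

```python
def degrees_of_freedom(table):
    """Return DF = (r - 1) * (c - 1) for a table with r rows and c columns."""
    r = len(table)       # number of populations (rows)
    c = len(table[0])    # number of category levels (columns)
    return (r - 1) * (c - 1)


# Hypothetical counts: 2 populations (rows) by 3 category levels (columns).
observed = [
    [30, 50, 20],
    [20, 50, 30],
]

df = degrees_of_freedom(observed)
print(df)
```

For this 2 x 3 table, DF = (2 - 1) * (3 - 1) = 2.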

Expected frequency counts. The expected frequency counts
are computed separately for each population
at each level of the categorical variable, according to the
following formula.
E_{r,c} = (n_{r} * n_{c}) / n
where
E_{r,c} is the expected frequency count for population r
at level c of the categorical variable,
n_{r} is the total number of observations from population r,
n_{c} is the total number of observations at level c, and
n is the total sample size.
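
The expected-count formula can be sketched in a few lines of plain Python. This is an illustration under assumptions, not a reference implementation: the function name expected_counts and the observed counts are hypothetical. The last line also checks the condition stated earlier, that every expected count is at least 5.

```python
def expected_counts(observed):
    """Compute E[r][c] = (n_r * n_c) / n for every cell of the table."""
    row_totals = [sum(row) for row in observed]        # n_r, one per population
    col_totals = [sum(col) for col in zip(*observed)]  # n_c, one per category level
    n = sum(row_totals)                                # total sample size
    return [[n_r * n_c / n for n_c in col_totals] for n_r in row_totals]


# Hypothetical counts: 2 populations (rows) by 3 category levels (columns).
observed = [
    [30, 50, 20],
    [20, 50, 30],
]

expected = expected_counts(observed)

# Condition check: every expected frequency count should be at least 5.
meets_minimum = all(e >= 5 for row in expected for e in row)

print(expected)
print(meets_minimum)
```

Here each population contributes 100 observations and the level totals are 50, 100, and 50, so every row of expected counts is 25, 50, 25.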

Test statistic. The test statistic is a chi-square random variable
(Χ^{2}) defined by the following equation.
Χ^{2} = Σ [ (O_{r,c} - E_{r,c})^{2} / E_{r,c} ]
where
O_{r,c} is the observed frequency count in population r
for level c of the categorical variable, and
E_{r,c} is the expected frequency count in
population r for level c of the categorical variable.
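
The sum in the test statistic runs over every cell of the table. A minimal sketch follows, assuming the observed and expected counts are already available as lists of rows. The function name chi_square_statistic and all counts are hypothetical; the expected values shown are the ones the expected-count formula gives for these observed counts.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the contingency table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts: 2 populations (rows) by 3 levels (columns).
observed = [
    [30, 50, 20],
    [20, 50, 30],
]
# Expected counts for the table above, from E = (n_r * n_c) / n.
expected = [
    [25.0, 50.0, 25.0],
    [25.0, 50.0, 25.0],
]

statistic = chi_square_statistic(observed, expected)
print(statistic)
```

Four cells each contribute (5)^2 / 25 = 1 and the two middle cells contribute 0, so the statistic for this hypothetical table is 4.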

P-value. The P-value is the probability of observing a
sample statistic at least as extreme as the test statistic,
assuming the null hypothesis is true. Since the
test statistic is a chi-square, use the
Chi-Square Distribution Calculator
to assess the probability associated with the test statistic. Use
the degrees of freedom computed above.
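
When a calculator is not at hand, the upper-tail probability can be computed directly. The sketch below is one way to do it using only Python's standard library, and is not the method used by the calculator mentioned above. It relies on the closed-form expansion of the chi-square upper tail for whole-number degrees of freedom; the function name chi2_upper_tail and the input values are hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), for integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    with z = x / 2, starting from Q(1, z) = exp(-z) when df is even
    and Q(1/2, z) = erfc(sqrt(z)) when df is odd.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical inputs: test statistic of 4.0 with 2 degrees of freedom.
p_value = chi2_upper_tail(4.0, 2)
print(round(p_value, 4))
```

With 2 degrees of freedom the upper tail reduces to exp(-x / 2), so a statistic of 4.0 gives a P-value of exp(-2), roughly 0.135.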

Interpret results. If the sample findings are unlikely, given
the null hypothesis, the researcher rejects the null hypothesis.
Typically, this involves comparing the P-value to the
significance level, and rejecting the null hypothesis when the
P-value is less than the significance level.
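
The decision rule can be written as a single comparison. In the sketch below, the function name interpret, the default significance level of 0.05, and the P-value passed in are all assumptions made for illustration; the significance level should come from the analysis plan.

```python
def interpret(p_value, significance_level=0.05):
    """Reject H0 only when the P-value is less than the significance level."""
    if p_value < significance_level:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical P-value of about 0.135, compared with a 0.05 significance level.
decision = interpret(0.135)
print(decision)
```

A P-value at or above the significance level means the data do not give enough evidence to reject the null hypothesis; it does not show that the null hypothesis is true.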