Dummy Variable
A dummy variable (also called an indicator variable) is a numeric variable that represents
categorical data, such as gender, race, or political affiliation.
Researchers use dummy variables in regression analysis when one or more independent variables are
categorical. The key to the analysis is to express each categorical variable as one or more
dummy variables.
Technically, dummy variables are dichotomous, quantitative variables; they can take
on any two quantitative values. As a practical matter, regression results are easier to interpret when dummy
variables take on two specific values, 1 or 0. Typically, 1 represents the presence of a qualitative
attribute, and 0 represents the absence.
The number of dummy variables required to represent a particular categorical variable depends on
the number of values that the categorical variable can assume. To represent a categorical variable
that can assume k different values, a researcher would need to define k - 1
dummy variables.
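The k - 1 rule can be sketched in a few lines of Python. This is a minimal illustration, not a
standard library routine: the function name dummy_columns, the reference parameter, and the
region data are all hypothetical, chosen only to show that k categories yield k - 1 columns of
0s and 1s.

```python
def dummy_columns(values, reference=None):
    """Encode a categorical sequence as k - 1 columns of 0/1 values.

    The category named by `reference` gets no column of its own; it is
    the case where every dummy variable equals 0.
    """
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumption: default to the first category alphabetically
    if reference not in categories:
        raise ValueError(f"unknown reference category: {reference!r}")
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical data: a variable with k = 4 categories.
regions = ["North", "South", "East", "West", "South"]
columns = dummy_columns(regions, reference="West")

for name, column in columns.items():
    print(name, column)
```

With four regions, the sketch produces three columns; an observation from the reference category
("West" here) is the one whose dummy variables are all 0.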
For example, suppose we are interested in political affiliation, a categorical variable that
might assume three values - Republican, Democrat, or Independent. We could represent political
affiliation with two dummy variables:
- X1 = 1, if Republican; X1 = 0, otherwise.
- X2 = 1, if Democrat; X2 = 0, otherwise.
In this example, notice that we don't have to create a dummy variable to represent the "Independent" category
of political affiliation. If X1 equals 0 and X2 equals 0, we know the
voter is neither Republican nor Democrat. Therefore, the voter must be Independent.
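The political affiliation example can be checked with a short Python sketch. The helper names
encode and decode are hypothetical, and the code assumes that the three values listed above are
the only possible affiliations. It shows that the pair (X1, X2) is enough to recover all three
categories.

```python
def encode(affiliation):
    """Return (X1, X2) for a voter's political affiliation."""
    x1 = 1 if affiliation == "Republican" else 0
    x2 = 1 if affiliation == "Democrat" else 0
    return x1, x2


def decode(x1, x2):
    """Recover the affiliation from the two dummy variables."""
    if (x1, x2) == (1, 0):
        return "Republican"
    if (x1, x2) == (0, 1):
        return "Democrat"
    if (x1, x2) == (0, 0):
        # Neither Republican nor Democrat, so the voter must be Independent
        # (assuming only three possible affiliations).
        return "Independent"
    raise ValueError("X1 and X2 cannot both equal 1 for the same voter")


voters = ["Republican", "Democrat", "Independent"]
coded = [encode(v) for v in voters]
decoded = [decode(x1, x2) for x1, x2 in coded]

print(coded)
print(decoded)
```

Because every category maps to a distinct (X1, X2) pair, a third dummy variable for
"Independent" would add no new information.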