# Interaction Effects in Regression

In regression, an interaction effect exists when the effect of an independent variable on a dependent variable changes, depending on the value(s) of one or more other independent variables.

## Interaction Effects in Equations

In a regression equation, an interaction effect is represented as the product of two or more independent variables.
For example, here is a typical regression equation *without* an interaction:

ŷ = b_{0} + b_{1}X_{1} + b_{2}X_{2}

where ŷ is the predicted value of a dependent variable, X_{1} and X_{2} are independent variables,
and b_{0}, b_{1}, and b_{2} are regression coefficients.

And here is the same regression equation *with* an interaction:

ŷ = b_{0} + b_{1}X_{1} + b_{2}X_{2} + b_{3}X_{1}X_{2}

Here, b_{3} is a regression coefficient, and X_{1}X_{2} is the interaction. The interaction between X_{1} and X_{2}
is called a two-way interaction, because it is the interaction between two independent variables. Higher-order interactions are possible, as illustrated
by the three-way interaction in the following equation:

ŷ = b_{0} + b_{1}X_{1} + b_{2}X_{2} + b_{3}X_{3} + b_{4}X_{1}X_{2} + b_{5}X_{1}X_{3} + b_{6}X_{2}X_{3} + b_{7}X_{1}X_{2}X_{3}

Analysts usually steer clear of higher-order interactions, like X_{1}X_{2}X_{3}, since they can be
hard to interpret.
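One way to see what the product term does in the two-way equation: the change in ŷ per one-unit increase in X_{1} is b_{1} + b_{3}X_{2}, so it depends on the value of X_{2}. The short Python sketch below illustrates this. The coefficient values are hypothetical and were chosen only for illustration; they do not come from any data in this lesson.

```python
def predict(x1, x2, b0, b1, b2, b3):
    """Predicted value from a model with a two-way interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_of_x1(x2, b1, b3):
    """Change in the prediction per one-unit increase in X1, at a given X2."""
    return b1 + b3 * x2


# Hypothetical coefficients, chosen only for illustration.
coef = dict(b0=10.0, b1=2.0, b2=5.0, b3=-0.5)

for x2 in (0, 1, 2):
    # Difference in predictions when X1 moves from 0 to 1 with X2 held fixed.
    step = predict(1, x2, **coef) - predict(0, x2, **coef)
    slope = slope_of_x1(x2, coef["b1"], coef["b3"])
    print(f"X2 = {x2}: slope of X1 = {slope}, one-unit change = {step}")
```

With b_{3} = 0, the slope of X_{1} would be the same at every value of X_{2}, which is the no-interaction case.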

## Interaction Plots

An interaction plot is a line graph that reveals the presence or absence of interactions among independent variables. To create an interaction plot, do the following:

- Show the dependent variable on the vertical axis (i.e., the Y axis) and an independent variable on the horizontal axis (i.e., the X axis).
- Plot mean scores on the dependent variable separately for each level of a potential interacting variable.
- Connect the mean scores, producing a separate line for each level of the interacting variable.

To understand potential interaction effects, compare the lines from the interaction plot:

- If the lines are parallel (or nearly so), the plot suggests no interaction.
- If the lines are clearly not parallel, the plot suggests an interaction.
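The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a plotting routine: it computes the mean score for each combination of X level and group (the points you would plot) and then the slope of each line segment, so the lines can be compared. The function names and the scores are hypothetical.

```python
from collections import defaultdict


def interaction_means(rows):
    """Return {group: [(x, mean_y), ...]} sorted by x, from (x, group, y) rows."""
    cells = defaultdict(list)
    for x, group, y in rows:
        cells[(group, x)].append(y)
    lines = defaultdict(list)
    for (group, x), ys in sorted(cells.items()):
        lines[group].append((x, sum(ys) / len(ys)))
    return dict(lines)


def segment_slopes(points):
    """Slopes of the line segments joining consecutive (x, mean_y) points."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


# Hypothetical scores: (x level, group label, dependent-variable score).
rows = [
    (1, "A", 10), (1, "A", 12), (2, "A", 14), (2, "A", 16),
    (1, "B", 20), (1, "B", 22), (2, "B", 30), (2, "B", 32),
]

lines = interaction_means(rows)
slopes = {group: segment_slopes(points) for group, points in lines.items()}
print(lines)
print(slopes)
```

In this made-up example the line for group B is steeper than the line for group A, so the lines are not parallel and the plot would suggest an interaction.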

For example, suppose researchers develop a drug to treat anxiety. The dependent variable is anxiety (plotted on the Y axis). The independent variable is dose (plotted on the X axis). Researchers might hypothesize an interaction effect, based on gender. To visualize the potential interaction, they would plot mean anxiety score by gender for each dose and connect the means with lines, as shown below:

In the plot above, the lines are parallel. This suggests no interaction effect, based on gender. The drug has the same effect on men as on women. For both men and women, each 1 mg of the drug lowers anxiety level by 0.2 units.

Suppose, however, the interaction plot looked like this:

Here, the lines are not parallel. The line for women is steeper. This suggests a possible interaction effect, based on gender. The plot tells us that the drug reduces anxiety more effectively for women than for men. But is the difference between the slopes statistically significant? To answer that question, we need to conduct a statistical test.

## Regression Analysis With Interaction Terms

In this section, we work through two problems to compare regression analysis with and without interaction terms. With each problem, the goal is to examine effects of drug dosage and gender on anxiety levels.

To conduct the analyses, we will use the following data from eight subjects:

Anxiety | Dose (mg) | Gender | DG |
---|---|---|---|
90 | 0 | 0 | 0 |
88 | 20 | 0 | 0 |
85 | 40 | 0 | 0 |
83 | 60 | 0 | 0 |
88 | 0 | 1 | 0 |
72 | 20 | 1 | 20 |
62 | 40 | 1 | 40 |
43 | 60 | 1 | 60 |

In the table, notice that we've expressed gender as a dummy variable, where 1 represents females and 0 represents non-females (in this case, males). Notice also that the variable in the fourth column (DG) is an interaction term, with a value equal to the product of dose times gender.
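As a small sketch, the interaction column can be built directly from the other two columns. The variable names here are illustrative; the values are the ones in the table.

```python
dose = [0, 20, 40, 60, 0, 20, 40, 60]
gender = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = female, 0 = male

# Interaction term: element-wise product of the two predictors.
dg = [d * g for d, g in zip(dose, gender)]
print(dg)  # [0, 0, 0, 0, 0, 20, 40, 60]
```

Because gender is coded 0 for males, DG is zero for every male subject and equals dose for every female subject.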

### Without Interaction

First, let's ignore the interaction term. When we regress anxiety on dose and gender, we get the following regression table.

We see that both dose and gender are statistically significant at the 0.05 level. And, with further analysis, we find that the coefficient of multiple determination is a respectable 0.80.
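For readers who want to check the fit by hand, here is a minimal sketch of the two-predictor regression in plain Python. It solves the normal equations in deviation form, which is one standard way to fit two predictors; it is not necessarily how the regression table was produced. The coefficients it prints are computed by this sketch from the eight data points and are not quoted from the table.

```python
anxiety = [90, 88, 85, 83, 88, 72, 62, 43]
dose = [0, 20, 40, 60, 0, 20, 40, 60]
gender = [0, 0, 0, 0, 1, 1, 1, 1]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Normal equations for two predictors, written in deviation form.
s11, s22, s12 = cross(dose, dose), cross(gender, gender), cross(dose, gender)
s1y, s2y = cross(dose, anxiety), cross(gender, anxiety)
det = s11 * s22 - s12 ** 2

b_dose = (s22 * s1y - s12 * s2y) / det
b_gender = (s11 * s2y - s12 * s1y) / det
b_0 = mean(anxiety) - b_dose * mean(dose) - b_gender * mean(gender)

# Coefficient of multiple determination: 1 - SSE / SST.
fitted = [b_0 + b_dose * d + b_gender * g for d, g in zip(dose, gender)]
sse = sum((y - f) ** 2 for y, f in zip(anxiety, fitted))
r_squared = 1 - sse / cross(anxiety, anxiety)

print(round(b_0, 3), round(b_dose, 4), round(b_gender, 2), round(r_squared, 2))
```

The R^{2} from this sketch rounds to 0.80, matching the value reported above.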

### With Interaction

Now, let's include the interaction term in our analysis. When we regress anxiety on dose, gender, and the dose-gender interaction, we get the following regression table.

We see that the interaction between dose and gender is statistically significant at the 0.001 level. When we examine the main effects, we see that dose is statistically significant, but gender isn't. And finally, the coefficient of multiple determination is 0.99.
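The fit with the interaction term can be reproduced with a small general-purpose least-squares routine. The sketch below builds the normal equations and solves them by Gauss-Jordan elimination; the helper name `ols` is hypothetical, and in practice a statistics package would do this step (and report p-values, which this sketch does not compute).

```python
def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(k):
            if r != c:
                factor = a[r][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][k] for i in range(k)]


anxiety = [90, 88, 85, 83, 88, 72, 62, 43]
dose = [0, 20, 40, 60, 0, 20, 40, 60]
gender = [0, 0, 0, 0, 1, 1, 1, 1]
dg = [d * g for d, g in zip(dose, gender)]

b_0, b_dose, b_gender, b_dg = ols([dose, gender, dg], anxiety)

fitted = [b_0 + b_dose * d + b_gender * g + b_dg * x
          for d, g, x in zip(dose, gender, dg)]
y_bar = sum(anxiety) / len(anxiety)
sse = sum((y - f) ** 2 for y, f in zip(anxiety, fitted))
sst = sum((y - y_bar) ** 2 for y in anxiety)
r_squared = 1 - sse / sst

print([round(b, 3) for b in (b_0, b_dose, b_gender, b_dg)], round(r_squared, 2))
```

The R^{2} from this sketch rounds to 0.99, in line with the value reported above.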

### How to Interpret Results

Typically, when a regression equation includes an interaction term, the first question you ask is: Does the interaction term contribute in a meaningful way to the explanatory power of the equation? You can answer that question by:

- Assessing the statistical significance of the interaction term.
- Comparing the coefficient of determination with and without the interaction term.

If the interaction term is statistically significant, it is probably important. And if the coefficient of determination is also much bigger with the interaction term, the case for keeping it is stronger still. If neither outcome is observed, the interaction term can be removed from the regression equation.

Results from our sample problem are summarized in the table below:

Analytical output | Without interaction | With interaction |
---|---|---|
Dose, p-value | 0.028 | 0.044 |
Gender, p-value | 0.022 | 0.392 |
DG, p-value | NA | 0.000 |
R^{2} | 0.80 | 0.99 |

The interaction term is statistically significant (p < 0.001, shown as 0.000 in the table because of rounding), and R^{2} is much bigger with the interaction term than without it (0.99 versus 0.80). Therefore, we conclude for this problem that the interaction term contributes in a meaningful way to the predictive ability of the regression equation.
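The two checks are linked. When one term is added to a model, the gain in R^{2} can be turned into a partial F statistic, and for a single added term that F test is equivalent to the t test on its coefficient. The sketch below applies the standard nested-model formula to the rounded R^{2} values from the table; the function name `partial_f` is hypothetical.

```python
def partial_f(r2_reduced, r2_full, n, k_full, q=1):
    """F statistic for adding q terms to a nested regression model.

    n is the number of observations; k_full is the number of predictors
    (excluding the intercept) in the larger model.
    """
    df_error = n - k_full - 1
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_error)


# Rounded R-squared values from the summary table; n = 8 subjects,
# 3 predictors (dose, gender, DG) in the model with the interaction.
f_stat = partial_f(r2_reduced=0.80, r2_full=0.99, n=8, k_full=3)
print(round(f_stat, 1))
```

With these rounded inputs, F is about 76 on 1 and 4 degrees of freedom, far above the 5% critical value of roughly 7.71. Because 1 - R^{2} is so small here, rounding R^{2} to two decimals changes F noticeably, so treat the number as approximate.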

When the interaction term is statistically significant, there's good news and bad news.

- First, the good news. A significant interaction term means a better fit to the data, and better predictions from the regression equation.
- Now, the bad news. A significant interaction term means uncertainty about the relative importance of main effects.

If your goal is to understand the relative importance of individual predictors, that goal will be harder to achieve when interaction effects are significant. When an interaction effect exists, the effect of one independent variable depends on the value(s) of one or more other independent variables.

For example, consider the interaction plot for our sample problem.

For males, drug dosage has a minimal effect on anxiety; but for females, the effect is dramatic. The effect of drug dose cannot be understood without accounting for the gender of the person receiving the medication.
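To put numbers on that contrast, one option is to fit a separate simple regression line for each gender, using the sample data. This is a sketch with a hypothetical helper, `simple_fit`. In a model with a dummy variable and its interaction, the difference between the two slopes is exactly what the DG coefficient estimates.

```python
def simple_fit(x, y):
    """Intercept and slope of a one-predictor least-squares line."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


dose = [0, 20, 40, 60]
males = [90, 88, 85, 83]      # anxiety scores, gender = 0
females = [88, 72, 62, 43]    # anxiety scores, gender = 1

male_intercept, male_slope = simple_fit(dose, males)
female_intercept, female_slope = simple_fit(dose, females)

print(round(male_slope, 3), round(female_slope, 3))
print(round(female_slope - male_slope, 3))  # difference between the slopes
```

In these data, each additional milligram lowers anxiety by about 0.12 units for males and about 0.725 units for females.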

**Bottom line:** When an interaction effect is significant, do not try to interpret the importance of
main effects in isolation.

## Test Your Understanding

**Problem 1**

Consider the interaction plot shown below. The plot shows the effect of income and community type (urban, suburban, rural) on attitudes toward gun control.

What conclusions would you draw from the plot?

I. The effect of income varies, depending on community type.

II. The regression equation should include an interaction term.

III. The regression equation should not include an interaction term.

(A) I only.

(B) II only.

(C) III only.

(D) I and II.

(E) I and III.

**Solution**

The correct answer is (D). Attitudes toward gun control are more strongly affected by income in urban areas than in rural areas. So the effect of income varies, depending on community type. When the effect of one independent variable depends on the level of another independent variable, we have an interaction; and an interaction term should be included in the regression equation.