# Residual Analysis in Regression

Because a linear regression model is not always appropriate for the data, you should assess the fit of the model by computing the residuals and examining a residual plot.

## Residuals

The difference between the observed value of the dependent variable
(*y*) and the predicted value (*ŷ*) is called the
**residual** (*e*). Each data point has one
residual.

Residual = Observed value - Predicted value

*e* = *y* - *ŷ*
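The formula above is plain subtraction, applied once per data point. The short sketch below computes it for the observed and predicted values listed in this lesson's example table. The helper name `residuals` is hypothetical, not part of any library.

```python
def residuals(observed, predicted):
    """Return e = y - y_hat for each pair of observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]

# Observed (y) and predicted (y_hat) values from the example table in this lesson.
y = [70, 65, 70, 95, 85]
y_hat = [65.411, 71.849, 78.288, 81.507, 87.945]

e = residuals(y, y_hat)
print([round(r, 3) for r in e])  # [4.589, -6.849, -8.288, 13.493, -2.945]
```

Rounding to three decimals only hides floating-point noise from the subtraction. It matches the precision of the table.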

For a least-squares regression line with an intercept, both the sum and the mean of the residuals are equal to zero. That is, $\sum e = 0$ and $\bar{e} = 0$.
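The following derivation shows why the residuals sum to zero. It assumes the line is fitted by least squares with an intercept and writes it as $\hat{y} = b_0 + b_1 x$. The symbols $b_0$ (intercept), $b_1$ (slope), and SSE (sum of squared errors) are notation introduced here for illustration. Setting the derivative of SSE with respect to the intercept to zero forces the residuals to sum to zero:

```latex
\begin{aligned}
\text{SSE}(b_0, b_1) &= \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2,
  \qquad \hat{y}_i = b_0 + b_1 x_i \\
\frac{\partial\,\text{SSE}}{\partial b_0} &= -2 \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right) = 0 \\
\Longrightarrow \quad \sum_{i=1}^{n} e_i &= \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right) = 0,
  \qquad \bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i = 0
\end{aligned}
```

If the line is forced through the origin (no intercept term), this derivative condition does not exist, and the residuals need not sum to zero.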

## Residual Plots

A **residual plot** is a graph that shows the
residuals on the vertical axis and the independent variable
on the horizontal axis. If the points in a residual plot
are randomly dispersed
around the horizontal axis, a linear regression model is
appropriate for the data; otherwise, a non-linear model is more
appropriate.

The table below shows inputs and outputs from a simple linear regression analysis.

| x | y | ŷ | e |
|---|---|---|---|
| 60 | 70 | 65.411 | 4.589 |
| 70 | 65 | 71.849 | -6.849 |
| 80 | 70 | 78.288 | -8.288 |
| 85 | 95 | 81.507 | 13.493 |
| 95 | 85 | 87.945 | -2.945 |
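The ŷ and *e* columns can be reproduced from the *x* and *y* columns alone. The sketch below fits an ordinary least-squares line and recomputes both columns. It assumes the table came from a standard least-squares fit, and the recomputed values do match the table to three decimals. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# x and y columns from the table above.
xs = [60, 70, 80, 85, 95]
ys = [70, 65, 70, 95, 85]

b0, b1 = fit_line(xs, ys)
y_hat = [b0 + b1 * x for x in xs]                # predicted values
e = [y - yh for y, yh in zip(ys, y_hat)]         # residuals e = y - y_hat

for x, y, yh, r in zip(xs, ys, y_hat, e):
    print(f"{x:>3} {y:>3} {yh:8.3f} {r:8.3f}")
# The sum is zero up to floating-point rounding.
print(f"sum of residuals: {sum(e):.2e}")
```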

The chart below plots the residuals (*e*) against the independent variable (*x*).

The residual plot shows a fairly random pattern: the first residual is positive, the next two are negative, the fourth is positive, and the last is negative. This random pattern indicates that a linear model provides a decent fit to the data.
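The sign sequence described above can be read directly off the *e* column of the table. This minimal sketch turns the residuals, ordered by increasing *x*, into a string of signs. The helper `sign_pattern` is hypothetical, and a sign string is only an informal summary, not a formal test of randomness.

```python
def sign_pattern(residuals):
    """Map each residual to '+', '-', or '0', keeping the order of the x values."""
    return "".join("+" if r > 0 else "-" if r < 0 else "0" for r in residuals)

# Residuals from the example table, ordered by x = 60, 70, 80, 85, 95.
e = [4.589, -6.849, -8.288, 13.493, -2.945]
pattern = sign_pattern(e)
print(pattern)  # +--+-
```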

The residual plots below show three typical patterns. The first plot shows a random pattern, indicating a good fit for a linear model.

*Figure: three residual plots, titled "Random pattern", "Non-random: U-shaped", and "Non-random: Inverted U".*

The other two patterns are non-random (U-shaped and inverted U), suggesting that a non-linear model would fit the data better.
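A U-shaped pattern arises naturally when a straight line is fitted to curved data. The sketch below is a synthetic illustration, not data from this lesson. It fits a least-squares line to *y* = *x*² and to *y* = −*x*² for *x* = 1, ..., 5 and prints the residuals. The function names `fit_line` and `linear_residuals` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept b0 and slope b1 for y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def linear_residuals(xs, ys):
    """Residuals e = y - y_hat after fitting a straight line to (xs, ys)."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Synthetic curved data: a parabola opening upward and one opening downward.
xs = [1, 2, 3, 4, 5]
u_shaped = linear_residuals(xs, [x ** 2 for x in xs])
inverted_u = linear_residuals(xs, [-(x ** 2) for x in xs])

print(u_shaped)    # [2.0, -1.0, -2.0, -1.0, 2.0]
print(inverted_u)  # [-2.0, 1.0, 2.0, 1.0, -2.0]
```

In the first case the residuals are positive at both ends and negative in the middle, which gives a U shape. The second case gives the mirror image. Both still sum to zero, so a zero sum alone says nothing about whether the relationship is linear.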

In the next lesson, we will work through a problem in which the residual plot shows a non-random pattern, and we will show how to "transform" the data so that a linear model can be used with nonlinear data.

## Test Your Understanding

In the context of regression analysis, which of the following statements are true?

I. When the sum of the residuals is greater than zero, the data set is
nonlinear.

II. A random pattern of residuals supports a linear model.

III. A random pattern of residuals supports a non-linear model.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I and III

**Solution**

The correct answer is (B). A random pattern of residuals supports a linear model; a non-random pattern supports a non-linear model. The sum of the residuals is always zero, whether the data set is linear or nonlinear.
