### Linear Regression

#### Introduction

#### Simple Regression

- Linear Regression
- Regression Example
- Residual Analysis
- Transformations
- Influential Points
- Slope Estimate
- Slope Test

#### Multiple Regression

# Introduction to Multiple Regression

Simple linear regression is a technique for predicting the value of a dependent variable, based on the value of a single independent variable. Sometimes, you only need one relevant independent variable to make an accurate prediction.

Often, however, the prediction is better when you use two or more independent variables. Multiple regression is a technique for predicting the value of a dependent variable, based on the values of two or more independent variables.

## The Regression Equation

This is a tutorial about *linear* regression, so our focus is on *linear* relationships between variables. The
regression equation that expresses the linear relationships between a single dependent variable and one or
more independent variables is:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_{k-1} x_{k-1} + b_k x_k$$

In this equation, $\hat{y}$ is the *predicted* value of the dependent variable. Values of the *k* independent variables
are denoted by $x_1, x_2, x_3, \dots, x_k$.

And finally, we have the *b*'s: $b_0, b_1, b_2, \dots, b_k$. The *b*'s are constants,
called regression coefficients. Values are assigned to the *b*'s based on the principle of least squares.
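As a small illustration of how the regression equation turns coefficients and x values into a prediction, here is a minimal Python sketch. The function name `predict` and the coefficient values are hypothetical, chosen only for this example; they do not come from any fitted data set.

```python
def predict(coefficients, x_values):
    """Return y-hat = b0 + b1*x1 + ... + bk*xk.

    coefficients: [b0, b1, ..., bk]  (k + 1 numbers)
    x_values:     [x1, ..., xk]      (k numbers)
    """
    if len(coefficients) != len(x_values) + 1:
        raise ValueError("need exactly one more coefficient than x values")
    b0, slopes = coefficients[0], coefficients[1:]
    return b0 + sum(b * x for b, x in zip(slopes, x_values))


# Hypothetical coefficients for k = 2 independent variables.
b = [1.0, 2.0, 0.5]

# 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
y_hat = predict(b, [3.0, 4.0])
print(y_hat)
```

Note that the list of coefficients is always one element longer than the list of x values, because of the constant term $b_0$.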

## What is the Principle of Least Squares?

In multiple regression, the deviation of the actual value for a dependent variable from its predicted value is called the
residual. The residual (e) for a single observation *i* is:

$$e_i = y_i - \hat{y}_i = y_i - ( b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki} )$$

Assume that the set of data consists of *n* observations. The principle of least squares requires that the sum of squared residuals
for all *n* observations be minimized. That is, we want the following value to be as small as possible:

$$\sum_{i=1}^{n} \left[ y_i - ( b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki} ) \right]^2$$

Regression analysis requires that the values of $b_0, b_1, \dots, b_k$ be defined to minimize
the sum of the squared residuals. When we assign values to regression coefficients in this way, we are following the
principle of least squares.
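To make the criterion concrete, the sketch below computes the sum of squared residuals for two candidate sets of coefficients. The data and both candidates are made up for illustration, and the helper name `sum_squared_residuals` is an assumption of this example.

```python
def sum_squared_residuals(coefficients, xs, ys):
    """Sum of (y_i - y-hat_i)^2 over all observations.

    coefficients: [b0, b1, ..., bk]
    xs:           list of n rows, each row holding [x1, ..., xk]
    ys:           list of n observed values of the dependent variable
    """
    b0, slopes = coefficients[0], coefficients[1:]
    total = 0.0
    for x_row, y in zip(xs, ys):
        y_hat = b0 + sum(b * x for b, x in zip(slopes, x_row))
        total += (y - y_hat) ** 2  # squared residual for this observation
    return total


# Made-up data: n = 4 observations, k = 1 independent variable.
xs = [[1.0], [2.0], [3.0], [4.0]]
ys = [2.0, 4.0, 5.0, 8.0]

candidate_a = [0.0, 2.0]  # y-hat = 2x
candidate_b = [1.0, 1.5]  # y-hat = 1 + 1.5x

ssr_a = sum_squared_residuals(candidate_a, xs, ys)  # 1.0
ssr_b = sum_squared_residuals(candidate_b, xs, ys)  # 1.5
print(ssr_a, ssr_b)
```

Under the least squares criterion, candidate A fits these data better than candidate B because its sum is smaller. Neither candidate is necessarily the best possible choice; least squares picks the coefficients for which this sum is as small as it can be.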

## Normal Equations for Simple Regression

Finding the right values for regression coefficients (i.e., values that satisfy a least squares criterion) involves solving a set of linear equations. These equations can be derived using calculus, and they are called normal equations.

To illustrate the use of normal equations, let's look at simple linear regression - regression with one dependent variable (y) and one independent variable (x). With simple linear regression, the regression equation is:

$$\hat{y} = b_0 + b_1 x$$

The normal equations for simple linear regression are:

$$\sum y_i = n b_0 + b_1 \sum x_i$$

$$\sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2$$
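For readers who want to see the calculus step, here is a brief sketch. The symbol $Q$ for the sum of squared residuals is a label introduced only for this derivation. Setting both partial derivatives of $Q$ to zero gives:

$$Q(b_0, b_1) = \sum_{i=1}^{n} ( y_i - b_0 - b_1 x_i )^2$$

$$\frac{\partial Q}{\partial b_0} = -2 \sum ( y_i - b_0 - b_1 x_i ) = 0 \quad \Rightarrow \quad \sum y_i = n b_0 + b_1 \sum x_i$$

$$\frac{\partial Q}{\partial b_1} = -2 \sum x_i ( y_i - b_0 - b_1 x_i ) = 0 \quad \Rightarrow \quad \sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2$$

Dividing each derivative by -2 and moving the terms containing $b_0$ and $b_1$ to the right-hand side yields the two normal equations.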

Here, we have two equations with two unknowns. The unknowns are the regression coefficients $b_0$ and $b_1$. Using ordinary
algebra, we can solve for $b_0$ and $b_1$. The result is:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$

where $\bar{x}$ is the mean x score, and $\bar{y}$ is the mean y score. Note that these are the same equations that we presented in a previous lesson, when we introduced the topic of simple linear regression.
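The closed-form solution above can be sketched in a few lines of Python. The data are made up for illustration and the function name `fit_simple_regression` is hypothetical. The last part of the block checks that the fitted coefficients satisfy both normal equations, up to floating-point rounding.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1*x using the closed-form solution."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data: n = 4 observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

b0, b1 = fit_simple_regression(xs, ys)
print(b0, b1)

# Check the two normal equations: each gap should be close to zero.
n = len(xs)
gap_1 = sum(ys) - (n * b0 + b1 * sum(xs))
gap_2 = sum(x * y for x, y in zip(xs, ys)) - (
    b0 * sum(xs) + b1 * sum(x * x for x in xs)
)
print(abs(gap_1) < 1e-9, abs(gap_2) < 1e-9)
```

For these data the slope works out to about 1.9 and the intercept to about 0, since $\sum (x_i - \bar{x})(y_i - \bar{y}) = 9.5$ and $\sum (x_i - \bar{x})^2 = 5$.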

The use of normal equations to assign values to regression coefficients becomes more complicated when there are two or more independent variables. We'll tackle that challenge in the next lesson.

## Test Your Understanding

**Problem 1**

Which of the following statements are true?

I. A regression equation with *k* independent variables has *k* regression coefficients.

II. Regression coefficients ($b_0$, $b_1$, $b_2$, etc.) are variables in the regression equation.

III. The principle of least squares calls for minimizing the sum of the squared residuals.

(A) I only.

(B) II only.

(C) III only.

(D) All of the above.

(E) None of the above.

**Solution**

The correct answer is (C). The principle of least squares defines regression coefficients that minimize the sum of the squared residuals.
A regression equation with *k* independent variables has *k* + 1 regression coefficients. For example, if there were two
independent variables, there would be three regression coefficients: $b_0$, $b_1$, and $b_2$.
And finally, regression coefficients are constants - not variables.