### Linear Regression

#### Introduction

#### Simple Regression

- Linear Regression
- Regression Example
- Residual Analysis
- Transformations
- Influential Points
- Slope Estimate
- Slope Test

#### Multiple Regression

# Hypothesis Test for Regression Slope

This lesson describes how to conduct a hypothesis test to determine
whether there is a significant linear relationship between
an independent variable *X* and a dependent variable
*Y*.

The test focuses on the slope of the regression line:

Y = Β_{0} + Β_{1}X

where Β_{0} is a constant,
Β_{1} is the slope (also called the regression coefficient),
X is the value of the independent variable, and Y is the
value of the dependent variable.

If we find that the slope of the regression line is significantly different from zero, we will conclude that there is a significant relationship between the independent and dependent variables.
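In practice the population slope Β_{1} is unknown and has to be estimated from sample data. As a reference sketch, the usual least-squares estimates of the slope and intercept are shown below. The symbols b_{0}, b_{1}, x̄ and ȳ (sample intercept, sample slope, and the sample means of X and Y) are notation assumed for this illustration.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The hypothesis test then asks whether a sample slope as far from zero as the observed b_{1} would be plausible if Β_{1} were really zero.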

## Test Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

- The dependent variable *Y* has a linear relationship to the independent variable *X*.
- For each value of X, the probability distribution of Y has the same standard deviation σ.
- For any given value of X, the Y values are independent and roughly normally distributed.

Previously, we described how to verify that regression requirements are met.

The test procedure consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

## State the Hypotheses

If there is a significant linear relationship between the independent
variable *X* and the dependent variable
*Y*, the slope will *not* equal zero.

H_{o}: Β_{1} = 0

H_{a}: Β_{1} ≠ 0

The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero.

## Formulate an Analysis Plan

The analysis plan describes how to use sample data to reject or fail to reject the null hypothesis. The plan should specify the following elements.

- Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
- Test method. Use a linear regression t-test (described in the next section) to determine whether the slope of the regression line differs significantly from zero.

## Analyze Sample Data

Using sample data, find the standard error of the slope, the slope of the regression line, the degrees of freedom, the test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.

- Standard error. Many statistical software packages and some graphing calculators provide the standard error of the slope as a regression analysis output. The table below shows hypothetical output for the following regression equation: y = 76 + 35x.

  | Predictor | Coef | SE Coef | T | P |
  |---|---|---|---|---|
  | Constant | 76 | 30 | 2.53 | 0.01 |
  | X | 35 | 20 | 1.75 | 0.04 |

  The standard error of the slope (SE) is given by the following equation.

  SE = s_{b1} = sqrt [ Σ(y_{i} - ŷ_{i})^{2} / (n - 2) ] / sqrt [ Σ(x_{i} - x̄)^{2} ]

  where y_{i} is the value of the dependent variable for observation *i*, ŷ_{i} is the estimated value of the dependent variable for observation *i*, x_{i} is the observed value of the independent variable for observation *i*, x̄ is the mean of the independent variable, and n is the number of observations.
- Slope. Like the standard error, the slope of the regression line will be provided by most statistics software packages. In the hypothetical output above, the slope is equal to 35.
- Degrees of freedom. For simple linear regression (one independent and one dependent variable), the degrees of freedom (DF) is equal to:

  DF = n - 2

  where n is the number of observations in the sample.
- Test statistic. The test statistic is a t statistic (t) defined by the following equation.

  t = b_{1} / SE

  where b_{1} is the slope of the sample regression line, and SE is the standard error of the slope.
- P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
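When only raw data are available, the quantities listed above can be computed directly. The following is a minimal sketch using only the Python standard library. The function name `slope_t_test_inputs` and the five data points are hypothetical, invented purely for illustration.

```python
import math

def slope_t_test_inputs(x, y):
    """Return (b1, se, df, t) for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with at least 3 observations")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x, and cross-products of x and y
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b1 = sxy / sxx                 # sample slope
    b0 = y_mean - b1 * x_mean      # sample intercept
    # Sum of squared residuals: sum of (y_i - yhat_i)^2
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                     # degrees of freedom
    se = math.sqrt(sse / df) / math.sqrt(sxx)  # standard error of the slope
    t = b1 / se                    # test statistic
    return b1, se, df, t

# Hypothetical data, not taken from the lesson.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, se, df, t = slope_t_test_inputs(x, y)
print(f"b1 = {b1:.3f}, SE = {se:.3f}, DF = {df}, t = {t:.3f}")
```

The same relationship can be checked against the hypothetical output table: dividing each coefficient by its standard error (35 / 20 and 76 / 30) reproduces the T column.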

## Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
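The decision rule is simple enough to state as code. This is an illustrative sketch only: the function name `interpret` and the example P-value of 0.03 are assumptions, not part of the lesson.

```python
def interpret(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the P-value is below alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value < alpha:
        return "reject H0: the slope differs significantly from zero"
    return "fail to reject H0: no significant linear relationship detected"

# The same hypothetical P-value leads to different conclusions
# at different significance levels.
print(interpret(0.03, alpha=0.05))
print(interpret(0.03, alpha=0.01))
```

This is why the significance level should be fixed in the analysis plan before the data are analyzed.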

## Test Your Understanding

**Problem**

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Annual bill = 0.55 * Home size + 15

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 15 | 3 | 5.0 | 0.00 |
| Home size | 0.55 | 0.24 | 2.29 | 0.01 |

Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of significance.

**Solution**

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

- **State the hypotheses.** The first step is to state the null hypothesis and an alternative hypothesis.

  H_{o}: The slope of the regression line is equal to zero.

  H_{a}: The slope of the regression line is *not* equal to zero.

  If the relationship between home size and electric bill is significant, the slope will *not* equal zero.
- **Formulate an analysis plan.** For this analysis, the significance level is 0.05. Using sample data, we will conduct a linear regression t-test to determine whether the slope of the regression line differs significantly from zero.
- **Analyze sample data.** To apply the linear regression t-test to sample data, we require the standard error of the slope, the slope of the regression line, the degrees of freedom, the t statistic, and the P-value of the test statistic.

  We get the slope (b_{1}) and the standard error (SE) from the regression output.

  b_{1} = 0.55

  SE = 0.24

  We compute the degrees of freedom and the t statistic, using the following equations.

  DF = n - 2 = 101 - 2 = 99

  t = b_{1}/SE = 0.55/0.24 = 2.29

  where DF is the degrees of freedom, n is the number of observations in the sample, b_{1} is the slope of the regression line, and SE is the standard error of the slope.

  Based on the t statistic and the degrees of freedom, we determine the P-value. The P-value is the probability that a t statistic having 99 degrees of freedom is more extreme than 2.29. Since this is a two-tailed test, "more extreme" means greater than 2.29 or less than -2.29. We use the t Distribution Calculator to find P(t > 2.29) = 0.0121 and P(t < -2.29) = 0.0121. Therefore, the P-value is 0.0121 + 0.0121 or 0.0242.
- **Interpret results.** Since the P-value (0.0242) is less than the significance level (0.05), we reject the null hypothesis and conclude that there is a significant linear relationship between annual bill and home size.
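The arithmetic in this solution can be reproduced without a t distribution calculator. The sketch below stands in for the calculator by integrating the Student's t density numerically with Simpson's rule. That substitution is an assumption of this example, and the helper names `t_pdf` and `two_tailed_p` are hypothetical. It uses only the Python standard library.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on the interval [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    if steps % 2:          # Simpson's rule needs an even number of intervals
        steps += 1
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3   # P(0 <= T <= |t_stat|)
    return 1.0 - 2.0 * central_area

# Values taken from the regression output in the problem.
b1, se, n = 0.55, 0.24, 101
df = n - 2
t_stat = b1 / se
p_value = two_tailed_p(t_stat, df)
print(f"t = {t_stat:.2f}, DF = {df}, P-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The computed P-value should land close to the 0.0242 reported above. A small difference is expected because the code uses the unrounded ratio 0.55/0.24 rather than 2.29.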

**Note:** If you use this approach on an exam, you may also want to mention
that this approach is only appropriate when the
standard requirements for simple linear regression are satisfied.