Hypothesis Test for Regression Slope

This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y. The test, called a linear regression t-test, focuses on the slope of the simple linear regression line

Y = Β₀ + Β₁X

where Β₀ is a constant (the y-intercept), Β₁ is the slope (also called the regression coefficient), X is the value of the independent variable, and Y is the value of the dependent variable.

If we find that the slope of the regression line is significantly different from zero, we will conclude that there is a significant linear relationship between the independent and dependent variables.

Test Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

Linearity. The relationship between the independent variable X and the dependent variable Y should be linear. To check this, make sure that the XY scatterplot is linear and that the residual plot shows a random pattern. (In a previous lesson, we explained how to check linearity with a scatterplot.)
Homoscedasticity. The variance of residuals should be constant across all levels of the independent variable. To check for homoscedasticity, plot residuals against the independent variable. If the spread is roughly constant, homoscedasticity holds. (Bartlett's test and Hartley's Fmax test can also be used to test for homogeneity of variance; but these tests are not part of the AP Statistics curriculum, and they will not appear on the AP Statistics test.)
Independence. Residuals should be independent of each other. The value of one residual should not provide any information about the value of another. Plot residuals against time or observation order. If the residuals fluctuate randomly around the zero line with no clear pattern, they are likely independent. If they show a trend (e.g., increasing or decreasing) or cyclical behavior, this indicates dependence.
Normality. The residuals should be normally distributed, especially for small sample sizes. Plot a histogram of the residuals and check for a bell-shaped distribution. Or produce a normal probability plot. If points on the plot fall approximately along a straight line, the residuals are approximately normally distributed. (This assumption is less critical when the sample size is large.)

Prior to analysis, you would want to verify that each condition was satisfied. If you take the AP Statistics exam, you may be required to state these requirements.
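Most of the checks above start from the residuals. The following is a minimal sketch, assuming the coefficients b0 and b1 have already been fitted; the data, the coefficient values, and the function name residuals are hypothetical and serve only as an illustration.

```python
def residuals(xs, ys, b0, b1):
    """Return observed minus fitted values for the line y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data and coefficients, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys, b0=2.2, b1=0.6)

# Pair each residual with its x value; these pairs are what a residual plot shows.
for x, r in zip(xs, res):
    print(f"x = {x}: residual = {r:+.2f}")
```

Plotting these pairs (residual against x, or against observation order) is what the homoscedasticity and independence checks call for.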

General Procedure for Hypothesis Testing

To test any hypothesis, the same five-step procedure is used: (1) state the hypotheses, (2) choose the significance level, (3) compute the test statistic, (4) find the P-value, and (5) interpret results. Here, we apply the general procedure to a linear regression t-test.

If you have access to statistical software, everything you need to test the statistical significance of regression slope (e.g., the slope value, standard error, t-score test statistic, and P-value) is provided as output in a standard regression table. Below, we show you how to read those values from a regression table. And, if you don't have access to statistical software, we show you how to compute each value by hand.

State the Hypotheses

If there is a significant linear relationship between the independent variable X and the dependent variable Y, the slope (Β₁) will not equal zero.

H₀: Β₁ = 0

H_a: Β₁ ≠ 0

The null hypothesis (H₀) states that the slope is equal to zero, and the alternative hypothesis (H_a) states that the slope is not equal to zero. If the null hypothesis is true, then changes in the independent variable do not impact the dependent variable.

Choose the Significance Level

The significance level is the probability of rejecting the null hypothesis when it is actually true. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10.

Compute the Test Statistic

The test statistic will be a t-score, computed for a linear regression t-test. If you have regression output from statistical software, you can find the t-score test statistic directly in a standard regression table. You only need to compute degrees of freedom (df) for the t-score. If you don't have statistical software, you can compute a value for the t-score test statistic (t) from the value for the regression coefficient (b₁) and the standard error (SE) of the regression coefficient, using the equations below.

Degrees of Freedom

The t-score test statistic follows a t-distribution, with degrees of freedom equal to df. In the general case, degrees of freedom is:

df = n - k - 1

For simple linear regression (one dependent variable and one independent variable), degrees of freedom reduces to:

df = n - 2

where n is sample size, and k is the number of independent variables.
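As a small sketch of the degrees-of-freedom rule, the helper below (a hypothetical name, not part of any library) applies df = n - k - 1 and defaults to the simple-regression case k = 1. The second call uses made-up values.

```python
def degrees_of_freedom(n, k=1):
    """df = n - k - 1, where k is the number of independent variables."""
    if n <= k + 1:
        raise ValueError("need more observations than estimated coefficients")
    return n - k - 1

print(degrees_of_freedom(101))      # simple regression with n = 101
print(degrees_of_freedom(50, k=3))  # hypothetical model with three predictors
```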

Regression Coefficient

With simple linear regression, the regression coefficient is called the slope. Many statistical software packages and some graphing calculators provide the slope as a regression analysis output. The table below shows regression output for the following regression equation: y = 76 + 35x.

Predictor	Coef	SE Coef	T	P
Constant	76	30	2.53	0.01
X	35	20	1.75	0.04

The regression output shows a value for the regression coefficient in two places. From the equation, you can see that the regression coefficient for the independent variable is 35. Or, from the Coef column of the table, you can see that the regression coefficient for the independent variable (X) is 35.

If you need to compute the regression coefficient (b₁) by hand, use this formula:

b₁ = Σ [ (x_i - x̄)(y_i - ȳ) ] / Σ [ (x_i - x̄)² ]

where x_i is the value of the independent variable for subject i, y_i is the value of the dependent variable for subject i, x̄ is the mean score for the independent variable, and ȳ is the mean score for the dependent variable.
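The slope formula can be sketched in a few lines of Python. The data below are made up for illustration, and the function name slope is an assumption of this example rather than anything defined in the lesson.

```python
def slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data: the means are 3 and 4, so the numerator is 6 and the denominator is 10.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b1 = slope(xs, ys)
print(round(b1, 4))  # 0.6
```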

Standard Error

The standard error of the slope is also provided as a regression analysis output. The table below shows the same regression output for the same regression equation: y = 76 + 35x.

Predictor	Coef	SE Coef	T	P
Constant	76	30	2.53	0.01
X	35	20	1.75	0.04

You will find a value for the standard error of the regression coefficient in the SE Coef column of the table. Here, the standard error of the slope is 20. In this example, the standard error is labeled "SE Coef". However, other software packages might use a different label for the standard error. It might be "StDev", "SE", "Std Dev", or something else.

If you need to compute the standard error of the slope (SE) by hand, use the following formula:

SE = s_b₁ = sqrt [ Σ(y_i - ŷ_i)² / (n - 2) ] / sqrt [ Σ(x_i - x̄)² ]

where y_i is the value of the dependent variable for observation i, ŷ_i is the estimated value of the dependent variable for observation i, x_i is the observed value of the independent variable for observation i, x̄ is the mean of the independent variable, and n is the number of observations.
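The standard error formula can be sketched the same way. This example assumes hypothetical data and a hypothetical function name; it also assumes the usual least-squares intercept, b0 = ȳ - b1·x̄, to obtain the fitted values ŷ_i, which the lesson does not spell out.

```python
import math

def slope_standard_error(xs, ys):
    """SE of the slope: sqrt(SSE / (n - 2)) divided by sqrt(sum of squared x-deviations)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar  # assumed least-squares intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

# Hypothetical data: SSE is 2.4 and the sum of squared x-deviations is 10.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
se = slope_standard_error(xs, ys)
print(round(se, 4))  # 0.2828
```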

Test Statistic

The t-score test statistic is also provided as a regression analysis output. The table below shows the t-score test statistic (in the T column) for a two-tailed hypothesis test.

Predictor	Coef	SE Coef	T	P
Constant	76	30	2.53	0.01
X	35	20	1.75	0.04

If you need to compute a t-score test statistic (t) by hand, use the following equation.

t = b₁ / SE

where b₁ is the regression coefficient (i.e., the slope of the regression line), and SE is the standard error of the slope.
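As a quick check of this equation against the regression table above, the sketch below divides the Coef value for X by its SE Coef value. The function name t_statistic is hypothetical.

```python
def t_statistic(b1, se):
    """t-score for the slope test: the estimated slope divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return b1 / se

# Values read from the X row of the table: Coef = 35, SE Coef = 20.
t = t_statistic(35, 20)
print(t)  # 1.75, matching the T column
```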

Find the P-Value

The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true. The P-value appears in the P column of the regression output table. By default, most software programs show P-values for a two-tailed hypothesis test. When the sample slope lies in the direction predicted by the alternative hypothesis, the P-value for a one-tailed hypothesis test is half the P-value shown for a two-tailed test. The table below shows the P-value (in the P column) for a two-tailed hypothesis test.

Predictor	Coef	SE Coef	T	P
Constant	76	30	2.53	0.01
X	35	20	1.75	0.04

If you don't have a regression output table, you can find the P-value from a t-distribution table or a t-distribution calculator. First, find the t-score test statistic and degrees of freedom as described above. For a one-tailed test in which the sample slope lies in the direction predicted by the alternative hypothesis, the P-value is the probability of seeing a t-score greater than the absolute value of the test statistic. For a two-tailed test, the P-value is twice that probability. (See Problem 1 in the next section for an example of how to find the P-value from a t-distribution calculator.)
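If neither a table nor a calculator is at hand, the tail probability can be approximated numerically. The sketch below is one possible approach, not the method used by any particular calculator: it integrates the t-distribution density with Simpson's rule using only the Python standard library. The function names and the step count are assumptions of this example, and the sample values (t = 2.29, df = 99) are those of Problem 1.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) as 0.5 minus the Simpson's-rule area over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_tailed(t, df):
    """Two-tailed P-value: twice the upper-tail probability beyond |t|."""
    return 2 * upper_tail(t, df)

print(round(upper_tail(2.29, 99), 3))  # one tail
print(round(two_tailed(2.29, 99), 3))  # both tails
```

In practice a statistics library would normally be preferred; the numerical integration here only makes the definition of the P-value concrete.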

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. This involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.
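The decision rule amounts to a single comparison. A minimal sketch, with a hypothetical function name and illustrative P-values:

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when the P-value is below the significance level."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.02))              # reject H0
print(decide(0.073))             # fail to reject H0
print(decide(0.02, alpha=0.01))  # fail to reject H0
```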

Test Your Understanding

Problem 1. Two-Tailed Test

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Regression equation: Annual bill = 0.55 * Home size + 15
Predictor	Coef	SE Coef	T	P
Constant	15	3	5.0	0.00
Home size	0.55	0.24	2.29	0.02

Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of significance. Assume that all the requirements for simple linear regression are satisfied.

Solution

The solution to this problem takes five steps: (1) state the hypotheses, (2) choose the significance level, (3) compute the test statistic, (4) find the P-value, and (5) interpret results. Here, we apply the general procedure to a linear regression t-test. We work through those steps below:

State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

H₀: The slope of the regression line is equal to zero.

H_a: The slope of the regression line is not equal to zero.

This is a two-tailed test. If the relationship between home size and electric bill is significant, the slope will be significantly bigger than zero or significantly smaller than zero.

Choose the significance level. For this analysis, the significance level is 0.05.

Compute the test statistic. Since we were provided with a regression output table, we can read the test statistic directly from the T column of the table. The t-score test statistic is 2.29.

Predictor	Coef	SE Coef	T	P
Constant	15	3	5.0	0.00
Home size	0.55	0.24	2.29	0.02

And finally, we compute the degrees of freedom (df) for the test statistic.

df = n - 2 = 101 - 2 = 99

where n is the number of observations in the sample.

Find the P-value. The regression output shows that the P-value (in the P column) for a two-tailed hypothesis slope test is 0.02.

Predictor	Coef	SE Coef	T	P
Constant	15	3	5.0	0.00
Home size	0.55	0.24	2.29	0.02

Here is how that P-value was calculated. Since this is a two-tailed test, the P-value is the probability that a t-score having 99 degrees of freedom is more extreme than the absolute value of the t-score test statistic (i.e., greater than 2.29 or less than -2.29). We use the t Distribution Calculator to find P(t > 2.29) is about 0.012 and P(t < -2.29) is about 0.012. Therefore, the P-value is 0.012 + 0.012 or 0.024.

Interpret results. Since the P-value (0.02) is less than the significance level (0.05), we reject the null hypothesis. Therefore, we conclude that there is a significant linear relationship between home size and annual electric bill.

Requirements for Regression

If you use simple linear regression on the AP Statistics exam, you may want to list the conditions required for regression analysis and/or describe steps you would take to verify that each condition was met:

Linearity. The relationship between the independent variable X and the dependent variable Y should be linear. To check this, make sure that the XY scatterplot is linear and that the residual plot shows a random pattern. (In a previous lesson, we explained how to check linearity with a scatterplot.)
Homoscedasticity. The variance of residuals should be constant across all levels of the independent variable. To check for homoscedasticity, plot residuals against the independent variable. If the spread is roughly constant, homoscedasticity holds. (Bartlett's test and Hartley's Fmax test can also be used to test for homogeneity of variance; but these tests are not part of the AP Statistics curriculum, and they will not appear on the AP Statistics exam.)
Independence. Residuals should be independent of each other. The value of one residual should not provide any information about the value of another. Plot residuals against time or observation order. If the residuals fluctuate randomly around the zero line with no clear pattern, they are likely independent. If they show a trend (e.g., increasing or decreasing) or cyclical behavior, this indicates dependence.
Normality. The residuals should be normally distributed, especially with small samples. Plot a histogram of the residuals and check for a bell-shaped distribution. Or produce a normal probability plot. If points on the plot fall approximately along a straight line, the residuals are approximately normally distributed. (The normality assumption is less critical when the sample size is large.)

Problem 2. One-Tailed Test

A teacher wants to determine whether hours spent on a smartphone have an effect on student exam scores. She hypothesizes that there is an inverse effect - more hours lead to lower grades. The teacher collects data from 21 students and runs a simple linear regression. Here is the output produced by Excel:

Regression equation: Exam score = -0.46 * hours + 113
Predictor	Coef	SE Coef	T	P
Constant	113	18.6	6.08	0.00
Hours	-0.46	0.24	-1.9	0.073

Is there a significant linear relationship between exam scores and hours on the phone? Use a 0.05 level of significance. Assume that all the requirements for simple linear regression are satisfied.

Hint: By default, the regression output from Excel assumes a two-tailed hypothesis test.

Solution

State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

H₀: Β₁ = 0

H_a: Β₁ < 0

This is a one-tailed test. If the relationship between exam scores and phone use is significant, the slope (Β₁) will be significantly smaller than zero.

Choose the significance level. For this analysis, the significance level is 0.05.

Compute the test statistic. Since we were provided with a regression output table, we can read the test statistic directly from the T column of the table. The t-score test statistic is -1.9.

Predictor	Coef	SE Coef	T	P
Constant	113	18.6	6.08	0.00
Hours	-0.46	0.24	-1.9	0.073

And finally, we compute the degrees of freedom (df) for the test statistic.

df = n - 2 = 21 - 2 = 19

where n is the number of observations in the sample.

Note: If we did not have the regression table generated by Excel, we would use the test statistic and degrees of freedom from Step 3 to find the P-value in Step 4. However, a P-value is included in the table, so we could skip Step 3 for this problem.

Find the P-value. By default, Excel computes a P-value for a two-tailed hypothesis test. The regression output shows that the P-value (in the P column) for a two-tailed hypothesis slope test is 0.073.

Predictor	Coef	SE Coef	T	P
Constant	113	18.6	6.08	0.00
Hours	-0.46	0.24	-1.9	0.073

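To obtain the P-value for a one-tailed test, we divide the two-tailed P-value by 2. This is appropriate here because the sample slope (-0.46) is negative, which is the direction stated in the alternative hypothesis.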

P_one-tailed = P_two-tailed / 2 = 0.073 / 2 = 0.036

Interpret results. Since the P-value (0.036) is less than the significance level (0.05), we reject the null hypothesis. Therefore, we conclude that there is a significant inverse linear relationship between exam scores and phone use. Students who spend more time on the phone tend to have lower exam scores.
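The conversion from a two-tailed to a one-tailed P-value depends on whether the sign of the test statistic agrees with the alternative hypothesis. The sketch below makes that condition explicit. The function name and the "less"/"greater" labels are assumptions of this example, not Excel terminology.

```python
def one_tailed_p(p_two_tailed, t, alternative):
    """Convert a two-tailed P-value to a one-tailed one.

    alternative is "less" for H_a: slope < 0 or "greater" for H_a: slope > 0.
    """
    if alternative not in ("less", "greater"):
        raise ValueError('alternative must be "less" or "greater"')
    half = p_two_tailed / 2
    # Halve only when the observed t-score points in the hypothesized direction.
    matches = (t < 0) if alternative == "less" else (t > 0)
    return half if matches else 1 - half

# Problem 2: two-tailed P = 0.073, t = -1.9, H_a: slope < 0.
p = one_tailed_p(0.073, t=-1.9, alternative="less")
print(p < 0.05)  # True, so the null hypothesis is rejected at the 0.05 level
```

Halving the rounded table value gives 0.0365, which the solution above reports as 0.036. Had the sample slope been positive, the same hypotheses would have produced a one-tailed P-value of 1 - 0.0365 = 0.9635 instead.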
