Teach yourself statistics

Regression Slope: Confidence Interval

This lesson describes how to construct a confidence interval around the slope of a regression line. We focus on the equation for simple linear regression, which is:

ŷ = b₀ + b₁x

where b₀ is a constant, b₁ is the slope (also called the regression coefficient), x is the value of the independent variable, and ŷ is the predicted value of the dependent variable.

Estimation Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

Linearity. The relationship between the independent variable X and the dependent variable Y should be linear. To check this, make sure that the XY scatterplot is linear and that the residual plot shows a random pattern. (In a previous lesson, we explained how to check linearity with a scatterplot.)
Homoscedasticity. The variance of residuals should be constant across all levels of the independent variable. To check for homoscedasticity, plot residuals against the independent variable. If the spread is roughly constant, homoscedasticity holds. (Bartlett's test and Hartley's Fmax test can also be used to test for homogeneity of variance; but these tests are not part of the AP Statistics curriculum, and they will not appear on the AP Statistics test.)
Independence. Residuals should be independent of each other. The value of one residual should not provide any information about the value of another. Plot residuals against time or observation order. If the residuals fluctuate randomly around the zero line with no clear pattern, they are likely independent. If they show a trend (e.g., increasing or decreasing) or cyclical behavior, this indicates dependence.
Normality. The residuals should be normally distributed, especially for small sample sizes. Plot a histogram of the residuals and check for a bell-shaped distribution. Or produce a normal probability plot. If the points on the plot fall approximately along a straight line, the residuals are approximately normally distributed. (This assumption is less critical when the sample size is large.)

Note: Before you attempt to construct a confidence interval around the slope of a regression line, make sure the above requirements are met. If you take the AP Statistics exam, you may be required to state these requirements.

The Variability of the Slope Estimate

To construct a confidence interval for the slope of the regression line, we need to know the standard error of the sampling distribution of the slope. Many statistical software packages and some graphing calculators provide the standard error of the slope as a regression analysis output. The table below shows hypothetical output for the following regression equation: ŷ = 76 + 35x.

Predictor	Coef	SE Coef	T	P
Constant	76	30	2.53	0.01
X	35	20	1.75	0.04

In the output above, the standard error of the slope is equal to 20 (the "SE Coef" entry in the row for X). In this example, the standard error is labeled "SE Coef". However, other software packages might use a different label for the standard error. It might be "StDev", "SE", "Std Dev", or something else.
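
As a quick way to read the table, the sketch below assumes that the T column is the coefficient divided by its standard error (the usual convention in regression output, although the lesson does not state it). The dictionary names are illustrative only.

```python
# Values copied from the hypothetical regression output above.
coef = {"Constant": 76, "X": 35}
se_coef = {"Constant": 30, "X": 20}

# Assumption: T = Coef / SE Coef for each row of the table.
t_stat = {name: coef[name] / se_coef[name] for name in coef}

for name, value in t_stat.items():
    print(f"{name}: T = {value:.2f}")
```

Both ratios round to the T values shown in the table (2.53 and 1.75).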

If you need to calculate the standard error of the slope (SE) by hand, use the following formula:

SE = sqrt [ Σ(y_i - ŷ_i)² / (n - 2) ] / sqrt [ Σ(x_i - x̄)² ]

where y_i is the value of the dependent variable for observation i, ŷ_i is the estimated value of the dependent variable for observation i, x_i is the observed value of the independent variable for observation i, x̄ is the mean of the independent variable, and n is the number of observations.
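
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not output from any particular software package: the function name slope_standard_error and the five data points are made up for the example, and the slope and intercept are obtained with the ordinary least-squares formulas.

```python
import math

def slope_standard_error(x, y):
    """Return (b0, b1, SE of slope) for simple linear regression."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Sum of squared deviations of x from its mean: Σ(x_i - x̄)²
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    # Least-squares slope and intercept
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    # Sum of squared residuals: Σ(y_i - ŷ_i)²
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b0, b1, se

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, se = slope_standard_error(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SE = {se:.4f}")
```

For these five points the fitted line is ŷ = 2.2 + 0.6x, and the standard error of the slope is about 0.28.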

How to Find the Confidence Interval for the Slope of a Regression Line

Previously, we described how to construct confidence intervals. For convenience, we repeat the five steps below.

1. Choose the confidence level. The confidence level describes the uncertainty of a sampling plan. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
2. Compute the standard error. Many statistical software packages and some graphing calculators provide the standard error of the slope as a regression analysis output. Use that, if you can. If you need to calculate the standard error (SE) of the slope by hand, use the following formula, where x̄ is the mean of the independent variable:

   SE = sqrt [ Σ(y_i - ŷ_i)² / (n - 2) ] / sqrt [ Σ(x_i - x̄)² ]

3. Find the critical value. When calculating the margin of error for a regression slope, use a t-score for the critical value, with degrees of freedom (df) equal to n - 2. Here's how to find the critical value t-score:
   - Compute alpha (α): α = 1 - (confidence level / 100)
   - Find the critical probability (p*): p* = 1 - α/2
   - Find the degrees of freedom (df): df = n - 2
   - Find the t-score having degrees of freedom equal to df and a cumulative probability equal to the critical probability (p*).
   To find the critical t-score, use an online calculator (e.g., Stat Trek's t Distribution Calculator), a graphing calculator, or a t-distribution statistical table (found in the appendix of most introductory statistics texts).
4. Find the margin of error. To compute the margin of error, use the following equation:
   Margin of error = Critical value * Standard error of slope
5. Specify the confidence interval. The uncertainty is denoted by the confidence level. And the range of the confidence interval is defined by the following equation:
   Confidence interval = Slope ± Margin of error
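
The steps above can be sketched in Python using only the standard library. Statistical libraries and calculators normally supply the inverse t distribution directly; here, as an assumption made to keep the example self-contained, the cumulative probability is approximated with Simpson's rule and the critical value is found by bisection. The function names (t_pdf, t_cdf, t_critical, slope_confidence_interval) are illustrative. The example call reuses the slope (35) and standard error (20) from the hypothetical output earlier in the lesson, with an assumed sample size of n = 25, because that output does not state n.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """Cumulative probability for t >= 0, via Simpson's rule (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence_level, n):
    """Critical t-score for a slope CI: df = n - 2, cumulative probability p*."""
    alpha = 1 - confidence_level / 100   # alpha
    p_star = 1 - alpha / 2               # critical probability
    df = n - 2                           # degrees of freedom
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p_star:        # widen the bracket until it contains p*
        hi *= 2
    for _ in range(50):                  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_confidence_interval(slope, se, n, confidence_level):
    """Return (lower, upper) = slope ± critical value * standard error."""
    margin = t_critical(confidence_level, n) * se
    return slope - margin, slope + margin

# Slope and SE from the hypothetical output; n = 25 is an assumption.
lower, upper = slope_confidence_interval(slope=35, se=20, n=25, confidence_level=95)
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

With 23 degrees of freedom the critical value is about 2.07, so the interval is wide and includes zero.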

In the next section, we work through a problem that shows how to use this approach to construct a confidence interval for the slope of a regression line. Note that this approach is used for simple linear regression (one independent variable and one dependent variable).

Test Your Understanding

Problem 1

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Regression equation: Annual bill = 0.55 * Home size + 15
Predictor	Coef	SE Coef	T	P
Constant	15	3	5.0	0.00
Home size	0.55	0.24	2.29	0.01

What is the 99% confidence interval for the slope of the regression line?

(A) 0.25 to 0.85
(B) 0.02 to 1.08
(C) -0.08 to 1.18
(D) 0.20 to 1.30
(E) 0.30 to 1.40

Solution

The correct answer is (C). Use the following five-step approach to construct a confidence interval.

1. Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 99% confidence level.
2. Find the standard error. The standard error is given in the regression output. It is 0.24.
3. Find the critical value. The critical value is a factor used to compute the margin of error. With simple linear regression, to compute a confidence interval for the slope, the critical value is a t-score with degrees of freedom equal to n - 2. To find the critical value, we take these steps.
   - Compute alpha (α):
     α = 1 - (confidence level / 100)

     α = 1 - 99/100 = 0.01
   - Find the critical probability (p*):
     p* = 1 - α/2 = 1 - 0.01/2 = 0.995
   - Find the degrees of freedom (df):
     df = n - 2 = 101 - 2 = 99
   - The critical value is the t statistic having 99 degrees of freedom and a cumulative probability equal to 0.995. From the t Distribution Calculator, we find that the critical value is about 2.63.
4. Compute the margin of error (ME):
   ME = critical value * standard error

   ME = 2.63 * 0.24 = 0.63
5. Specify the confidence interval (CI). The range of the confidence interval is defined by the sample statistic ± margin of error. Here, the sample statistic is the regression slope, 0.55; so the confidence interval is:
   CI = Slope ± ME

   CI = 0.55 ± 0.63
   And the uncertainty is denoted by the confidence level, which is 99%.

Therefore, the 99% confidence interval for this sample is 0.55 ± 0.63, which is -0.08 to 1.18. If you use shorthand notation, you could describe this confidence interval as (-0.08, 1.18).
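
The arithmetic in this solution can be checked with a short Python sketch. It takes the slope and standard error from the regression output and the critical value of 2.63 reported in the solution; the variable names are illustrative.

```python
slope = 0.55            # "Coef" for Home size
standard_error = 0.24   # "SE Coef" for Home size
critical_value = 2.63   # t-score for df = 99 and p* = 0.995, as given in the solution

margin_of_error = critical_value * standard_error
lower = slope - margin_of_error
upper = slope + margin_of_error

print(f"ME = {margin_of_error:.2f}")            # ME = 0.63
print(f"99% CI = ({lower:.2f}, {upper:.2f})")   # 99% CI = (-0.08, 1.18)
```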

If we replicated the same study multiple times with different random samples and computed a confidence interval for each sample, we would expect 99% of the confidence intervals to contain the true slope of the regression line.
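
This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is built on assumptions that are not part of the problem: a "true" line with slope 0.55 and intercept 15, normally distributed errors with standard deviation 1, and x values drawn uniformly between 0 and 10. It draws many samples of 101 observations, builds a 99% interval for each using the critical value 2.63 from the solution, and counts how often the interval contains the true slope.

```python
import math
import random

def simulate_coverage(true_slope=0.55, true_intercept=15.0, n=101,
                      critical_value=2.63, replications=2000, seed=1):
    """Fraction of simulated confidence intervals that contain true_slope."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]   # assumed x values, held fixed
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)

    covered = 0
    for _ in range(replications):
        # New random sample of y values around the assumed true line
        y = [true_intercept + true_slope * xi + rng.gauss(0, 1) for xi in x]
        y_mean = sum(y) / n
        b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
        b0 = y_mean - b1 * x_mean
        sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
        margin = critical_value * se
        if b1 - margin <= true_slope <= b1 + margin:
            covered += 1
    return covered / replications

coverage = simulate_coverage()
print(f"Share of intervals containing the true slope: {coverage:.3f}")
```

The printed share should be close to 0.99, although it varies slightly with the seed and the number of replications.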

Requirements for Regression

If you use simple linear regression on the AP Statistics exam, you may want to list the conditions required for regression analysis and/or describe steps you would take to verify that each condition was met:

Linearity. The relationship between the independent variable X and the dependent variable Y should be linear. To check this, make sure that the XY scatterplot is linear and that the residual plot shows a random pattern. (In a previous lesson, we explained how to check linearity with a scatterplot.)
Homoscedasticity. The variance of residuals should be constant across all levels of the independent variable. To check for homoscedasticity, plot residuals against the independent variable. If the spread is roughly constant, homoscedasticity holds. (Bartlett's test and Hartley's Fmax test can also be used to test for homogeneity of variance; but these tests are not part of the AP Statistics curriculum, and they will not appear on the AP Statistics exam.)
Independence. Residuals should be independent of each other. The value of one residual should not provide any information about the value of another. Plot residuals against time or observation order. If the residuals fluctuate randomly around the zero line with no clear pattern, they are likely independent. If they show a trend (e.g., increasing or decreasing) or cyclical behavior, this indicates dependence.
Normality. The residuals should be normally distributed, especially with small samples. Plot a histogram of the residuals and check for a bell-shaped distribution. Or produce a normal probability plot. If the points on the plot fall approximately along a straight line, the residuals are approximately normally distributed. (The normality assumption is less critical when the sample size is large.)
