Linear Regression Example
In this lesson, we apply regression analysis to some
fictitious data, and we show how to interpret the results of our analysis.
Note: Regression computations are usually handled by a software package or a
graphing calculator. For this
example, however, we will do the computations "manually", since
the gory details have educational value.
Problem Statement
Last year, five randomly selected students took a math aptitude test
before they began their statistics course. The Statistics
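Department has three questions.
 What linear regression equation best predicts statistics performance, based on math aptitude scores?
 If a student made an 80 on the aptitude test, what statistics grade would we expect?
 How well does the regression equation fit the data?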
How to Find the Regression Equation
In the table below, the x_{i} column shows scores on the
aptitude test. Similarly, the y_{i} column shows statistics
grades. The last two columns show deviation scores: the difference between the
student's score and the average score on each test. The last two rows show sums and mean
scores that we will use to conduct the regression analysis.
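| Student | x_{i} | y_{i} | (x_{i} - x̄) | (y_{i} - ȳ) |
|---------|-------|-------|--------------|--------------|
| 1       | 95    | 85    | 17           | 8            |
| 2       | 85    | 95    | 7            | 18           |
| 3       | 80    | 70    | 2            | -7           |
| 4       | 70    | 65    | -8           | -12          |
| 5       | 60    | 70    | -18          | -7           |
| Sum     | 390   | 385   |              |              |
| Mean    | 78    | 77    |              |              |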
And for each student, we also need to compute the squares of the deviation scores (the last two columns in the table below).
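| Student | x_{i} | y_{i} | (x_{i} - x̄)^{2} | (y_{i} - ȳ)^{2} |
|---------|-------|-------|------------------|------------------|
| 1       | 95    | 85    | 289              | 64               |
| 2       | 85    | 95    | 49               | 324              |
| 3       | 80    | 70    | 4                | 49               |
| 4       | 70    | 65    | 64               | 144              |
| 5       | 60    | 70    | 324              | 49               |
| Sum     | 390   | 385   | 730              | 630              |
| Mean    | 78    | 77    |                  |                  |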
And finally, for each student, we need to compute the product of the
deviation scores.
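| Student | x_{i} | y_{i} | (x_{i} - x̄)(y_{i} - ȳ) |
|---------|-------|-------|--------------------------|
| 1       | 95    | 85    | 136                      |
| 2       | 85    | 95    | 126                      |
| 3       | 80    | 70    | -14                      |
| 4       | 70    | 65    | 96                       |
| 5       | 60    | 70    | 126                      |
| Sum     | 390   | 385   | 470                      |
| Mean    | 78    | 77    |                          |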
The regression equation is a linear equation of the form:
ŷ = b_{0} + b_{1}x . To conduct a regression
analysis, we need to solve for b_{0} and b_{1}.
Computations are shown below. Notice that all of our inputs for the
regression analysis come from the above three tables.
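As a cross-check, the sums in the three tables can be recomputed with a few lines of Python. This is only a sketch: the variable names (`dx`, `dy`, `ss_xx`, `ss_yy`, `s_xy`) are chosen here for illustration and do not appear in the lesson.

```python
# Aptitude scores (x) and statistics grades (y) from the tables above.
x = [95, 85, 80, 70, 60]
y = [85, 95, 70, 65, 70]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Deviation scores: each observation minus its column mean.
dx = [xi - x_mean for xi in x]
dy = [yi - y_mean for yi in y]

# The three sums that feed the regression formulas.
ss_xx = sum(d * d for d in dx)             # sum of squared x deviations
ss_yy = sum(d * d for d in dy)             # sum of squared y deviations
s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of products of deviations

print(x_mean, y_mean)      # 78.0 77.0
print(ss_xx, ss_yy, s_xy)  # 730.0 630.0 470.0
```

Note that three of the five students score below the mean on at least one test, so several deviation scores are negative; the product for student 3 is negative as well.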
First, we solve for the regression coefficient (b_{1}):
b_{1} = Σ [ (x_{i} - x̄)(y_{i} - ȳ) ] / Σ [ (x_{i} - x̄)^{2} ]
b_{1} = 470/730
b_{1} = 0.644
Once we know the value of the regression coefficient (b_{1}), we can solve for the regression intercept (b_{0}):
b_{0} = ȳ - b_{1} * x̄
b_{0} = 77 - (0.644)(78)
b_{0} = 26.768
Therefore, the regression equation is: ŷ = 26.768 + 0.644x .
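The same two coefficients can be computed in code. The sketch below assumes a helper named `fit_line`, which is a hypothetical name introduced here and not part of any library.

```python
x = [95, 85, 80, 70, 60]  # aptitude scores
y = [85, 95, 70, 65, 70]  # statistics grades


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = s_xy / ss_xx          # 470 / 730
    b0 = y_mean - b1 * x_mean  # 77 - b1 * 78
    return b0, b1


b0, b1 = fit_line(x, y)
print(round(b1, 3))  # 0.644
print(round(b0, 3))  # 26.781
```

The value of b_{0} printed here differs slightly from the 26.768 used in this lesson because the code carries b_{1} at full precision, whereas the manual computation rounds b_{1} to 0.644 before multiplying by 78.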
How to Use the Regression Equation
Once you have the regression equation, using it is a snap. Choose
a value for the independent variable (x), perform the
computation, and you have an estimated value (ŷ)
for the dependent variable.
In our example, the independent variable is the student's score
on the aptitude test. The dependent variable is the student's
statistics grade. If a student made an 80 on the aptitude
test, the estimated statistics grade (ŷ) would be:
ŷ = b_{0} + b_{1}x
ŷ = 26.768 + 0.644x = 26.768 + 0.644 * 80
ŷ = 26.768 + 51.52 = 78.288
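The prediction step amounts to a one-line function. In this sketch, `predict_grade` is a hypothetical name, and the coefficients are the rounded values from this lesson.

```python
B0 = 26.768  # b0 as computed in this lesson
B1 = 0.644   # b1 as computed in this lesson (rounded)


def predict_grade(aptitude_score):
    """Estimated statistics grade (y-hat) for a given aptitude score."""
    return B0 + B1 * aptitude_score


print(round(predict_grade(80), 3))  # 78.288
```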
Warning: When you use a regression equation,
do not use values for the independent variable that are outside
the range of values used to create the equation. That is called
extrapolation, and it can produce unreasonable
estimates.
In this example, the aptitude test scores used to create the
regression equation ranged from 60 to 95. Therefore,
only use values inside that range to estimate statistics grades.
Using values outside that range (less than 60 or greater than 95)
is problematic.
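One way to enforce this warning in code is to reject inputs outside the observed range. The sketch below is one possible approach rather than a standard recipe; `predict_grade_checked`, `X_MIN`, and `X_MAX` are hypothetical names, and raising `ValueError` is a design choice made for this example.

```python
B0 = 26.768  # b0 as computed in this lesson
B1 = 0.644   # b1 as computed in this lesson (rounded)
X_MIN, X_MAX = 60, 95  # range of aptitude scores used to fit the equation


def predict_grade_checked(aptitude_score):
    """Predict a statistics grade, refusing to extrapolate."""
    if not X_MIN <= aptitude_score <= X_MAX:
        raise ValueError(
            f"aptitude score {aptitude_score} is outside the fitted range "
            f"[{X_MIN}, {X_MAX}]; prediction would be an extrapolation"
        )
    return B0 + B1 * aptitude_score


print(round(predict_grade_checked(80), 3))  # 78.288

try:
    predict_grade_checked(100)
except ValueError as err:
    print(err)
```

Whether to raise an error, return a warning, or merely flag the estimate depends on the application; the point is only that the range check is explicit.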
How to Find the Coefficient of Determination
Whenever you use a regression equation, you should ask how well
the equation fits the data. One way to assess fit is to check the
coefficient of determination, which can be computed from
the following formula.
R^{2} = { ( 1 / N ) * Σ [ (x_{i} - x̄) * (y_{i} - ȳ) ] / (σ_{x} * σ_{y} ) }^{2}
where N is the number of
observations used to fit the model, Σ is the summation symbol,
x_{i} is the x value for observation i,
x̄ is the mean x value,
y_{i} is the y value for observation i,
ȳ is the mean y value,
σ_{x} is the standard deviation of x, and
σ_{y} is the standard deviation of y.
Computations for the sample problem of this lesson are shown below. We begin by computing the standard deviation of x (σ_{x}):
σ_{x} = sqrt [ Σ ( x_{i} - x̄ )^{2} / N ]
σ_{x} = sqrt( 730/5 ) = sqrt(146) = 12.083
Next, we find the standard deviation of y, (σ_{y}):
σ_{y} = sqrt [ Σ ( y_{i} - ȳ )^{2} / N ]
σ_{y} = sqrt( 630/5 ) = sqrt(126) = 11.225
And finally, we compute the coefficient of determination (R^{2}):
R^{2} = { ( 1 / N ) * Σ [ (x_{i} - x̄) * (y_{i} - ȳ) ] / (σ_{x} * σ_{y} ) }^{2}
R^{2} = [ ( 1/5 ) * 470 / ( 12.083 * 11.225 ) ]^{2}
R^{2} = ( 94 / 135.632 )^{2} = ( 0.693 )^{2} = 0.48
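As a check on the arithmetic, the same value can be reached without computing the standard deviations separately. Because both standard deviations here divide by N, the factors of N cancel, and the formula reduces to a ratio of the three sums from the tables:

```latex
\[
R^{2}
= \left( \frac{\tfrac{1}{N} \sum (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \, \sigma_y} \right)^{2}
= \frac{\left[ \sum (x_i - \bar{x})(y_i - \bar{y}) \right]^{2}}
       {\sum (x_i - \bar{x})^{2} \, \sum (y_i - \bar{y})^{2}}
= \frac{470^{2}}{730 \times 630}
= \frac{220900}{459900}
\approx 0.480
\]
```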
A coefficient of determination equal to 0.48 indicates that about
48% of the variation in statistics grades (the
dependent variable) can be explained by the
relationship to math aptitude scores (the
independent variable). This would be considered a good fit
to the data, in the sense that it would substantially improve an
educator's ability to predict student performance in statistics
class.
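The full coefficient-of-determination computation can be sketched in Python as follows. The variable names are illustrative, and the standard deviations divide by N, matching the formulas used in this lesson.

```python
import math

x = [95, 85, 80, 70, 60]  # aptitude scores
y = [85, 95, 70, 65, 70]  # statistics grades

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_mean) ** 2 for xi in x)
ss_yy = sum((yi - y_mean) ** 2 for yi in y)

# Standard deviations with N in the denominator, as in this lesson.
sigma_x = math.sqrt(ss_xx / n)
sigma_y = math.sqrt(ss_yy / n)

# Correlation coefficient, then its square.
r = (s_xy / n) / (sigma_x * sigma_y)
r_squared = r ** 2

print(round(sigma_x, 3), round(sigma_y, 3))  # 12.083 11.225
print(round(r, 3), round(r_squared, 2))      # 0.693 0.48
```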