AP* Statistics Tutorial: Measures of Variability
Statisticians use summary measures to describe the amount of variability
or spread in a set of data. The most common measures of variability
are the range, the interquartile range (IQR), variance, and
standard deviation.
The Range
The range is the difference between the largest
and smallest values in a
set of values.
For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11.
For this set of numbers, the range would be 11 - 1 or 10.
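As a quick check, the same calculation can be sketched in Python. The variable names here are illustrative only.

```python
# Data set from the example above.
values = [1, 3, 4, 5, 5, 6, 7, 11]

# Range = largest value minus smallest value.
value_range = max(values) - min(values)
print(value_range)  # 10
```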
The Interquartile Range (IQR)
The interquartile range (IQR) is the difference between
the largest and smallest values in the middle 50% of a set of
data.
To compute an interquartile range from a set of data, first remove
observations from the lower quartile. Then, remove observations
from the upper quartile. Then, from the remaining observations,
compute the difference between the largest and smallest values.
For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11.
After we remove observations from the lower and upper quartiles,
we are left with: 4, 5, 5, 6. The interquartile range (IQR) would
be 6 - 4 = 2.
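The procedure above can be sketched in Python as follows. The helper name `tutorial_iqr` is hypothetical, and the sketch assumes the number of observations is a multiple of 4 so that each quartile holds a whole number of observations. Other quartile conventions (for example, the interpolation methods used by statistical software) may give a different value for the same data.

```python
def tutorial_iqr(values):
    """IQR by this lesson's method: drop the lowest and highest quarter
    of the sorted observations, then take largest minus smallest.

    Hypothetical helper; assumes len(values) is a multiple of 4.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes a non-empty data set whose size is a multiple of 4")
    quarter = n // 4
    middle = ordered[quarter:n - quarter]  # the middle 50% of observations
    return max(middle) - min(middle)


data = [1, 3, 4, 5, 5, 6, 7, 11]
print(tutorial_iqr(data))  # 2
```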
The Variance
In a population, variance is the average squared deviation from the population mean, as defined by the following formula:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
where $\sigma^2$ is the population variance, $\mu$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population.
The variance of a sample is defined by a slightly different formula and uses slightly different notation:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample. Using this formula, the sample variance can be considered an unbiased estimate of the true population variance. Therefore, if you need to estimate an unknown population variance from sample data, this is the formula to use.
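To make the difference between the two formulas concrete, here is a minimal Python sketch. The function names `population_variance` and `sample_variance` are illustrative, not part of the lesson; the standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import math
import statistics


def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [1, 3, 4, 5, 5, 6, 7, 11]

pop_var = population_variance(data)
samp_var = sample_variance(data)
print(pop_var)   # 7.6875
print(samp_var)  # slightly larger, because the divisor is n - 1

# Cross-check against the standard library.
print(math.isclose(pop_var, statistics.pvariance(data)))
print(math.isclose(samp_var, statistics.variance(data)))
```

For the same data, the sample formula always gives the larger value, since the same sum of squares is divided by a smaller number.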
The Standard Deviation
The standard deviation is the square root of the
variance. Thus, the standard deviation of a population is:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
where $\sigma$ is the population standard deviation, $\sigma^2$ is the population variance, $\mu$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population.
And the standard deviation of a sample is:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
where $s$ is the sample standard deviation, $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample.
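A short Python sketch, assuming the eight-number data set used earlier in this lesson, shows that each standard deviation is simply the square root of the matching variance. It relies on the standard library's `statistics` module.

```python
import math
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 11]

# Square root of the population variance (divisor N).
sigma = math.sqrt(statistics.pvariance(data))

# Square root of the sample variance (divisor n - 1).
s = math.sqrt(statistics.variance(data))

print(round(sigma, 2))  # 2.77
print(round(s, 2))      # 2.96

# The library's own standard deviation functions agree.
print(math.isclose(sigma, statistics.pstdev(data)))
print(math.isclose(s, statistics.stdev(data)))
```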
Effect of Changing Units
Sometimes, researchers change units (minutes to hours, feet to meters, etc.).
Here is how measures of variability are affected when we change units.
- If you add a constant to every value, the distance between values does
not change. As a result, all of the measures of variability (range,
interquartile range, standard deviation, and variance) remain the
same.
- On the other hand, suppose you multiply every value by a constant. This
multiplies the range, interquartile range (IQR), and standard deviation
by the absolute value of that constant. It has an even stronger
effect on the variance: it multiplies the variance by the square
of the constant.
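Both rules can be checked numerically. The sketch below is illustrative only: the helper `spread` is hypothetical, the constants 10 and 3 are arbitrary, and the IQR is computed by this lesson's method (drop the lowest and highest quarter of the observations), which assumes the data set size is a multiple of 4.

```python
import math
import statistics


def spread(values):
    """Return the four measures of variability for a data set.

    Hypothetical helper; the IQR follows this lesson's method and
    assumes len(values) is a multiple of 4.
    """
    ordered = sorted(values)
    quarter = len(ordered) // 4
    middle = ordered[quarter:len(ordered) - quarter]
    return {
        "range": ordered[-1] - ordered[0],
        "iqr": middle[-1] - middle[0],
        "sd": statistics.pstdev(ordered),
        "variance": statistics.pvariance(ordered),
    }


data = [1, 3, 4, 5, 5, 6, 7, 11]
shift, scale = 10, 3  # arbitrary illustrative constants

original = spread(data)
shifted = spread([x + shift for x in data])
scaled = spread([x * scale for x in data])

# Adding a constant leaves every measure unchanged.
print(all(math.isclose(shifted[k], original[k]) for k in original))

# Multiplying by a constant scales range, IQR, and SD by that constant,
# and scales the variance by the constant squared.
for key in ("range", "iqr", "sd"):
    print(key, math.isclose(scaled[key], scale * original[key]))
print("variance", math.isclose(scaled["variance"], scale ** 2 * original["variance"]))
```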
Test Your Understanding of This Lesson
Problem 1
A population consists of four observations: {1, 3, 5, 7}. What is the variance?
(A) 2
(B) 4
(C) 5
(D) 6
(E) None of the above
Solution
The correct answer is (C). First, we need to compute the population mean.
$$\mu = \frac{1 + 3 + 5 + 7}{4} = 4$$
Then we plug all of the known values into the formula for the variance of a
population, as shown below:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
$$\sigma^2 = \frac{(1 - 4)^2 + (3 - 4)^2 + (5 - 4)^2 + (7 - 4)^2}{4}$$
$$\sigma^2 = \frac{(-3)^2 + (-1)^2 + (1)^2 + (3)^2}{4}$$
$$\sigma^2 = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5$$
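The same steps can be traced in a few lines of Python. This is only a sketch that mirrors the hand calculation; the variable names are illustrative.

```python
population = [1, 3, 5, 7]

# Step 1: population mean.
mu = sum(population) / len(population)

# Step 2: squared deviation of each observation from the mean.
squared_devs = [(x - mu) ** 2 for x in population]

# Step 3: average the squared deviations (divide by N).
variance = sum(squared_devs) / len(population)

print(mu)            # 4.0
print(squared_devs)  # [9.0, 1.0, 1.0, 9.0]
print(variance)      # 5.0
```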
Problem 2
A sample consists of four observations: {1, 3, 5, 7}. What is the
standard deviation?
(A) 2
(B) 2.58
(C) 6
(D) 6.67
(E) None of the above
Solution
The correct answer is (B). First, we need to compute the sample mean.
$$\bar{x} = \frac{1 + 3 + 5 + 7}{4} = 4$$
Then we plug all of the known values into the formula for the standard deviation of a
sample, as shown below:
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{\frac{(1 - 4)^2 + (3 - 4)^2 + (5 - 4)^2 + (7 - 4)^2}{4 - 1}}$$
$$s = \sqrt{\frac{(-3)^2 + (-1)^2 + (1)^2 + (3)^2}{3}}$$
$$s = \sqrt{\frac{9 + 1 + 1 + 9}{3}} = \sqrt{\frac{20}{3}} \approx \sqrt{6.67} \approx 2.58$$
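As a cross-check, Python's standard `statistics` module gives the same result. The sketch also prints the population standard deviation for the same four numbers, to show how much the choice of divisor (n - 1 versus N) matters for a small data set.

```python
import statistics

sample = [1, 3, 5, 7]

# Sample standard deviation (divisor n - 1), as the problem asks.
s = statistics.stdev(sample)
print(round(s, 2))  # 2.58

# Population standard deviation (divisor N), for comparison only.
sigma = statistics.pstdev(sample)
print(round(sigma, 2))  # 2.24
```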