AP Statistics Tutorial: Difference Between Means
Many statistical applications involve comparisons between two
independent sample means.
Difference Between Means: Theory
Suppose we have two populations with means equal to μ1 and μ2.
Suppose further that we take all possible samples of size n1 from the
first population and all possible samples of size n2 from the second.
Finally, suppose that the following assumptions are valid.
- The size of each population is large relative to the sample
drawn from it. That is, N1 is large relative
to n1, and N2 is large relative
to n2. (In this context, a population is considered
large if it is at least 10 times as large as its sample.)
- The samples are
independent;
that is, observations in population 1 are not affected by observations
in population 2, and vice versa.
- The set of differences between sample means is normally
distributed. This will be true if each population is normal
or if the sample sizes are large. (Based on the
central limit theorem, sample sizes of 40 or more are
large enough.)
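The two size conditions in the list above can be checked mechanically. The following is a minimal Python sketch, not part of the original tutorial. The function name size_conditions_met and its parameters are hypothetical, and the thresholds are simply the rules of thumb stated above (a population at least 10 times its sample, samples of 40 or more). Independence and population normality cannot be judged from sizes alone, so they are not checked here.

```python
def size_conditions_met(pop_size, sample_size, ratio=10, large_n=40):
    """Return (population_large, sample_large) for one population/sample pair.

    ratio and large_n are the rule-of-thumb thresholds from the text.
    """
    population_large = pop_size >= ratio * sample_size
    sample_large = sample_size >= large_n
    return population_large, sample_large


# Hypothetical numbers, for illustration only.
print(size_conditions_met(pop_size=5000, sample_size=100))  # (True, True)
print(size_conditions_met(pop_size=300, sample_size=50))    # (False, True)
```

Both conditions should hold for each of the two populations before the results that follow are applied.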
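Given these assumptions, we know the following.
- The expected value of the difference between sample means is equal
to the difference between population means: $\mu_d = \mu_1 - \mu_2$.
- The standard deviation of the difference between sample means is
$\sigma_d = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$.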
It is straightforward to derive the last bullet point, based on material
covered in previous lessons. The derivation starts with a recognition
that the variance of the difference between independent random variables
is equal to the sum of the individual variances. Thus,
$$\sigma_d^2 = \sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2}$$
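The rule that variances add for a difference of independent random variables can be checked numerically. The sketch below is an illustration rather than part of the original lesson. It assumes two independent normal variables with standard deviations 3 and 4 (values chosen only for the example) and compares the simulated variance of their difference with the sum of the two variances.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SD_X, SD_Y = 3.0, 4.0  # illustrative standard deviations (assumed values)
N = 100_000            # number of simulated pairs

# Draw X and Y independently, then record the difference X - Y.
diffs = [random.gauss(0.0, SD_X) - random.gauss(0.0, SD_Y) for _ in range(N)]

# Population-style variance of the simulated differences.
mean_diff = statistics.fmean(diffs)
simulated_var = statistics.fmean([(d - mean_diff) ** 2 for d in diffs])

theoretical_var = SD_X**2 + SD_Y**2  # 9 + 16 = 25

print(f"simulated variance:   {simulated_var:.2f}")
print(f"theoretical variance: {theoretical_var:.2f}")
```

The simulated value should land close to 25. Note that the variances add even though the variables are subtracted.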
If the population sizes N1 and N2 are both large
relative to n1 and n2, respectively,
then
$$\sigma^2_{\bar{x}_1} = \frac{\sigma_1^2}{n_1}$$
and
$$\sigma^2_{\bar{x}_2} = \frac{\sigma_2^2}{n_2}$$
Therefore,
$$\sigma_d^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
and
$$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
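The final formula is easy to wrap in a small helper. This is a minimal sketch. The function name sd_of_difference is hypothetical, and the inputs in the usage line are made-up values chosen so the arithmetic is easy to follow.

```python
import math


def sd_of_difference(sigma1, n1, sigma2, n2):
    """Standard deviation of (sample mean 1 - sample mean 2).

    Assumes independent samples drawn from populations that are large
    relative to the sample sizes, as in the derivation above.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Illustrative (assumed) inputs: sigma1 = 3, n1 = 9, sigma2 = 4, n2 = 16.
# The variance is 9/9 + 16/16 = 2, so the result is sqrt(2).
print(sd_of_difference(3, 9, 4, 16))
```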
Difference Between Means: Sample Problem
In this section, we work through a sample problem to show how to apply
the theory presented above. The approach presented is valid
whenever we need to analyze
differences between independent sample means. In this example,
differences between means are modeled with a normal distribution;
so we use Stat Trek's
Normal Distribution Calculator
to compute probabilities. The Calculator is free.
Problem 1
For boys, the average number of absences in the first grade
is 15 with a standard deviation of 7; for girls, the average
number of absences is 10 with a standard deviation of 6.
In a nationwide survey, suppose 100 boys and 50 girls are
sampled. What is the probability that the mean number of absences
in the boys' sample exceeds the mean number of absences in the
girls' sample by at most three days?
(A) 0.025
(B) 0.035
(C) 0.045
(D) 0.055
(E) None of the above
Solution
The correct answer is B. The solution involves four steps.
- Find the mean difference (male absences minus female absences)
in the population.
μd = μ1 - μ2 =
15 - 10 = 5
- Find the standard deviation of the difference.
$$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{7^2}{100} + \frac{6^2}{50}} = \sqrt{\frac{49}{100} + \frac{36}{50}} = \sqrt{0.49 + 0.72} = \sqrt{1.21} = 1.1$$
- Find the
z-score
that is produced when boys have three more days of absences than
girls. When boys have three more days of absences, the number of
male absences minus female absences is three.
And the associated z-score
is
z = (x - μ)/σ = (3 - 5)/1.1 = -2/1.1 = -1.818
- Find the probability. This problem requires us to find the
probability that the average number of absences in the boy sample
minus the average number of absences in the girl sample
is 3 or less.
To find this probability, we enter the z-score (-1.818) into
Stat Trek's
Normal Distribution Calculator.
We find that the probability of a z-score being -1.818 or less
is about 0.035.
Therefore, the probability that the difference between samples will be
no more than 3 days is 0.035.
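The four steps can also be reproduced in a few lines of Python. The sketch below uses NormalDist from Python's standard statistics module in place of the online calculator, which is a substitution made for this example. The variable names are chosen for illustration, and the numbers come from Problem 1.

```python
import math
from statistics import NormalDist

# Values from Problem 1.
mu_boys, sigma_boys, n_boys = 15, 7, 100
mu_girls, sigma_girls, n_girls = 10, 6, 50
threshold = 3  # boys' sample mean minus girls' sample mean, in days

# Step 1: mean of the difference.
mu_d = mu_boys - mu_girls

# Step 2: standard deviation of the difference.
sigma_d = math.sqrt(sigma_boys**2 / n_boys + sigma_girls**2 / n_girls)

# Step 3: z-score for a difference of 3 days.
z = (threshold - mu_d) / sigma_d

# Step 4: cumulative probability under the standard normal curve.
probability = NormalDist().cdf(z)

print(f"mu_d = {mu_d}")
print(f"sigma_d = {sigma_d:.3f}")
print(f"z = {z:.3f}")
print(f"P(difference <= 3) = {probability:.3f}")
```

The printed probability should agree with answer B, about 0.035.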