Difference Between Means
Statistics problems often involve comparisons between two
independent sample means. This lesson explains how to compute
probabilities associated with differences between means.
Difference Between Means: Theory
Suppose we have two populations with means equal to $\mu_1$ and $\mu_2$. Suppose further that we take all possible samples of size $n_1$ and $n_2$. And finally, suppose that the following assumptions are valid.
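- The size of each population is large relative to the sample drawn from it. That is, $N_1$ is large relative to $n_1$, and $N_2$ is large relative to $n_2$.
- The samples are independent.
- The set of differences between sample means is normally distributed. This will be true if each population is normal or if the sample sizes are large. (Based on the central limit theorem, sample sizes of 40 would probably be large enough.)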
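Given these assumptions, we know the following.
- The expected value of the difference between all possible sample means is equal to the difference between population means. Thus, $E(\bar{x}_1 - \bar{x}_2) = \mu_d = \mu_1 - \mu_2$.
- The standard deviation of the difference between sample means ($\sigma_d$) is approximately equal to $\sigma_d = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$.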
It is straightforward to derive the last bullet point, based on material
covered in previous lessons. The derivation starts with a recognition
that the variance of the difference between independent random variables
is equal to the sum of the individual variances. Thus,
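$$\sigma_d^2 = \sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2}$$

If the populations $N_1$ and $N_2$ are both large relative to $n_1$ and $n_2$, respectively, then

$$\sigma^2_{\bar{x}_1} = \sigma_1^2 / n_1 \qquad \sigma^2_{\bar{x}_2} = \sigma_2^2 / n_2$$

Therefore,

$$\sigma_d^2 = \sigma_1^2 / n_1 + \sigma_2^2 / n_2$$

$$\sigma_d = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$$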
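The formula for the standard deviation of the difference can be checked numerically. The sketch below is illustrative only: the function names and the population values (means 50 and 45, standard deviations 4 and 3, sample sizes 64 and 36) are made up for this example, and the simulation assumes both populations are normal and effectively infinite.

```python
import math
import random
import statistics


def sd_of_difference(sigma1, n1, sigma2, n2):
    """Standard deviation of (sample mean 1 - sample mean 2), independent samples."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def simulated_sd_of_difference(mu1, sigma1, n1, mu2, sigma2, n2,
                               trials=4000, seed=0):
    """Estimate the same quantity by repeatedly drawing two independent samples.

    Assumption: both populations are normal, so random.gauss is used to draw values.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        sample1 = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        sample2 = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return statistics.stdev(diffs)


# Hypothetical populations, chosen only to illustrate the formula.
formula_value = sd_of_difference(4, 64, 3, 36)          # sqrt(0.25 + 0.25)
simulated_value = simulated_sd_of_difference(50, 4, 64, 45, 3, 36)
print(f"formula: {formula_value:.3f}, simulated: {simulated_value:.3f}")
```

The two printed values should be close but not identical, since the simulated value carries sampling error that shrinks as `trials` grows.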
Difference Between Means: Sample Problem
In this section, we work through a sample problem to show how to apply the theory presented above. In this example, we will use Stat Trek's Normal Distribution Calculator to compute probabilities.
Problem 1
For boys, the average number of absences in the first grade
is 15 with a standard deviation of 7; for girls, the average
number of absences is 10 with a standard deviation of 6.
In a nationwide survey, suppose 100 boys and 50 girls are
sampled. What is the probability that the male sample
will average at most three more days of absences than
the female sample?
(A) 0.025
(B) 0.035
(C) 0.045
(D) 0.055
(E) None of the above
Solution
The correct answer is B. The solution involves three or four steps, depending on
whether you work directly with raw scores or z-scores. The "raw score" solution
appears below:
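- Find the mean difference (male absences minus female absences) in the population.
$$\mu_d = \mu_1 - \mu_2 = 15 - 10 = 5$$
- Find the standard deviation of the difference.
$$\sigma_d = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2} = \sqrt{7^2 / 100 + 6^2 / 50} = \sqrt{49/100 + 36/50}$$
$$\sigma_d = \sqrt{0.49 + 0.72} = \sqrt{1.21} = 1.1$$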
- Find the probability. This problem requires us to find the
probability that the average number of absences in the boy sample
minus the average number of absences in the girl sample
is 3 or less.
To find this probability, we use Stat Trek's
Normal Distribution Calculator.
Specifically, we enter the following inputs: 3, for the normal random variable;
5, for the mean; and 1.1, for the standard deviation.
We find that the probability of the mean difference
(male absences minus female absences) being 3 or less
is about 0.035.
Thus, the probability that the difference between sample means will be
no more than 3 days is about 0.035.
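The raw score calculation can also be reproduced without the calculator. The following is a minimal sketch using `NormalDist` from Python's standard library; the function name `prob_difference_at_most` is hypothetical, and the code assumes, as the lesson does, that the difference between sample means is normally distributed.

```python
import math
from statistics import NormalDist


def prob_difference_at_most(x, mu1, sigma1, n1, mu2, sigma2, n2):
    """P(sample mean 1 - sample mean 2 <= x), assuming the difference is normal."""
    mean_diff = mu1 - mu2                                  # 15 - 10 = 5
    sd_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # sqrt(1.21), about 1.1
    return NormalDist(mu=mean_diff, sigma=sd_diff).cdf(x)


# Boys: mean 15, SD 7, n = 100. Girls: mean 10, SD 6, n = 50.
p = prob_difference_at_most(3, mu1=15, sigma1=7, n1=100, mu2=10, sigma2=6, n2=50)
print(round(p, 3))  # 0.035
```

Here `cdf(3)` plays the role of the calculator's cumulative probability for a normal random variable of 3 with mean 5 and standard deviation 1.1.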
Alternatively, we could have worked with z-scores (which have a mean of 0 and
a standard deviation of 1). Here's the z-score solution:
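- Find the mean difference and the standard deviation of the difference, as in the raw score solution: $\mu_d = 5$ and $\sigma_d = 1.1$.
- Find the z-score that corresponds to a difference of 3 days (male absences minus female absences).
$$z = (x - \mu_d) / \sigma_d = (3 - 5) / 1.1 = -1.818$$
- Find the probability. To find this probability, we use Stat Trek's
Normal Distribution Calculator.
Specifically, we enter the following inputs: -1.818, for the normal random variable;
0, for the mean; and 1, for the standard deviation.
We find that the probability of a z-score being -1.818 or less
is about 0.035.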
Of course, the result is the same, whether you work with raw scores or z-scores.
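To see that the two routes agree, the sketch below converts the raw score to a z-score and evaluates the standard normal cumulative probability with `math.erf`, then compares it with the raw score result. The helper names `raw_to_z` and `standard_normal_cdf` are hypothetical, introduced only for this illustration.

```python
import math
from statistics import NormalDist


def raw_to_z(x, mean, sd):
    """Convert a raw score to a z-score."""
    return (x - mean) / sd


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = raw_to_z(3, mean=5, sd=1.1)
p_from_z = standard_normal_cdf(z)
p_from_raw = NormalDist(mu=5, sigma=1.1).cdf(3)

print(round(z, 3))           # -1.818
print(round(p_from_z, 3))    # 0.035
print(round(p_from_raw, 3))  # 0.035
```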
Note: Some analysts might have used the t-distribution to compute probabilities
for this problem. We chose the normal distribution because the population standard deviations were known
and the sample sizes were large; but either choice would have been acceptable.
In a previous lesson, we offered some guidelines for
choosing between the normal and the t-distribution.