# Difference Between Means

Statistics problems often involve comparisons between two independent sample means. This lesson explains how to compute probabilities associated with differences between means.

## Difference Between Means: Theory

Suppose we have two populations with means equal to $\mu_1$ and $\mu_2$. Suppose further that we take all possible samples of size $n_1$ and $n_2$. And finally, suppose that the following assumptions are valid.

- The size of each population is large relative to the sample drawn from the population. That is, $N_1$ is large relative to $n_1$, and $N_2$ is large relative to $n_2$. (In this context, populations are considered to be large if they are at least 20 times bigger than their sample.)
- The samples are independent; that is, observations in population 1 are not affected by observations in population 2, and vice versa.
- The set of differences between sample means is normally distributed. This will be true if each population is normal or if the sample sizes are large. (Based on the central limit theorem, sample sizes of 40 would probably be large enough.)
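As a rough illustration, the first and third assumptions can be screened with a small helper. This is only a sketch: the function name `assumptions_look_reasonable` and its parameters are hypothetical, the thresholds (20 times the sample size, samples of 40) are the rules of thumb quoted above, and independence of the samples cannot be checked this way.

```python
def assumptions_look_reasonable(pop_size, sample_size, population_is_normal=False,
                                min_ratio=20, large_sample=40):
    """Rough screen for one population/sample pair (hypothetical helper).

    Checks only the rules of thumb from the text; independence between
    the two samples has to be judged from the study design.
    """
    # Population should be at least `min_ratio` times bigger than its sample.
    population_large_enough = pop_size >= min_ratio * sample_size
    # Normality of the sample mean is plausible if the population is normal
    # or the sample is large (central limit theorem).
    normality_plausible = population_is_normal or sample_size >= large_sample
    return population_large_enough and normality_plausible


# Made-up numbers, for illustration only.
print(assumptions_look_reasonable(pop_size=10_000, sample_size=100))  # True
print(assumptions_look_reasonable(pop_size=500, sample_size=50))      # False: 500 < 20 * 50
```

Both samples would need to pass this screen before the results below are applied.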

Given these assumptions, we know the following.

- The expected value of the difference between all possible sample means is equal to the difference between population means. Thus,

  $$E(\bar{x}_1 - \bar{x}_2) = \mu_d = \mu_1 - \mu_2$$

- The standard deviation of the difference between sample means ($\sigma_d$) is approximately equal to:

  $$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

It is straightforward to derive the last bullet point, based on material covered in previous lessons. The derivation starts with a recognition that the variance of the difference between independent random variables is equal to the sum of the individual variances. Thus,

$$\sigma_d^2 = \sigma_{\bar{x}_1 - \bar{x}_2}^2 = \sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2$$

If the population sizes $N_1$ and $N_2$ are both large relative to $n_1$ and $n_2$, respectively, then

$$\sigma_{\bar{x}_1}^2 = \frac{\sigma_1^2}{n_1}$$

$$\sigma_{\bar{x}_2}^2 = \frac{\sigma_2^2}{n_2}$$

$$\sigma_d^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

$$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
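The result can also be checked empirically. The sketch below draws repeated pairs of independent samples from two normal populations and compares the simulated mean and standard deviation of the differences with the formulas above. The function names and the population parameters (means 50 and 45, standard deviations 10 and 8, sample sizes 40 and 60) are made up for illustration; only the Python standard library is used.

```python
import math
import random
import statistics


def sd_of_difference(sigma1, n1, sigma2, n2):
    """Standard deviation of (mean of sample 1 - mean of sample 2)."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2, trials=2000, seed=0):
    """Draw `trials` pairs of independent samples; return the mean differences."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(trials):
        mean1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        mean2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(mean1 - mean2)
    return diffs


# Hypothetical populations, chosen only to exercise the formulas.
diffs = simulate_differences(mu1=50, sigma1=10, n1=40, mu2=45, sigma2=8, n2=60)

print("theoretical mean:", 50 - 45)
print("simulated mean:  ", statistics.fmean(diffs))
print("theoretical sd:  ", sd_of_difference(10, 40, 8, 60))
print("simulated sd:    ", statistics.stdev(diffs))
```

With a few thousand trials, the simulated values should land close to the theoretical ones, though they will not match exactly because of sampling noise.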

## Difference Between Means: Sample Problem

In this section, we work through a sample problem to show how to apply the theory presented above. In this example, we will use Stat Trek's Normal Distribution Calculator to compute probabilities.

**Problem 1**

For boys, the average number of absences in the first grade is 15 with a standard deviation of 7; for girls, the average number of absences is 10 with a standard deviation of 6.

In a nationwide survey, suppose 100 boys and 50 girls are sampled. What is the probability that the boys' sample will average *at most* three more days of absences than the girls' sample?

(A) 0.025

(B) 0.035

(C) 0.045

(D) 0.055

(E) None of the above

**Solution**

The correct answer is B. The solution involves three or four steps, depending on whether you work directly with raw scores or z-scores. The "raw score" solution appears below:

- Find the mean difference (male absences minus female absences) in the population.

  $$\mu_d = \mu_1 - \mu_2 = 15 - 10 = 5$$

- Find the standard deviation of the difference.

  $$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{7^2}{100} + \frac{6^2}{50}} = \sqrt{\frac{49}{100} + \frac{36}{50}}$$

  $$\sigma_d = \sqrt{0.49 + 0.72} = \sqrt{1.21} = 1.1$$

- Find the probability. This problem requires us to find the probability that the average number of absences in the boy sample minus the average number of absences in the girl sample is 3 or less. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: 3, for the normal random variable; 5, for the mean; and 1.1, for the standard deviation. We find that the probability of the mean difference (male absences minus female absences) being 3 or less is about 0.035.

Thus, the probability that the difference between samples will be no more than 3 days is 0.035.
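For readers who prefer code to a calculator, the raw-score solution can be reproduced with Python's standard library. This is a sketch that substitutes `statistics.NormalDist` (available in Python 3.8 and later) for Stat Trek's calculator; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Population parameters from the problem statement.
mu_boys, sigma_boys, n_boys = 15, 7, 100
mu_girls, sigma_girls, n_girls = 10, 6, 50

# Mean and standard deviation of the difference between sample means.
mu_d = mu_boys - mu_girls
sigma_d = math.sqrt(sigma_boys**2 / n_boys + sigma_girls**2 / n_girls)

# P(difference <= 3) under a normal distribution with mean 5 and sd 1.1.
prob = NormalDist(mu=mu_d, sigma=sigma_d).cdf(3)

print("mu_d    =", mu_d)
print("sigma_d =", round(sigma_d, 4))
print("P(difference <= 3) =", prob)  # about 0.035, in line with answer (B)
```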

Alternatively, we could have worked with z-scores (which have a mean of 0 and a standard deviation of 1). Here's the z-score solution:

- Find the mean difference (male absences minus female absences) in the population.

  $$\mu_d = \mu_1 - \mu_2 = 15 - 10 = 5$$

- Find the standard deviation of the difference.

  $$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{7^2}{100} + \frac{6^2}{50}} = \sqrt{\frac{49}{100} + \frac{36}{50}}$$

  $$\sigma_d = \sqrt{0.49 + 0.72} = \sqrt{1.21} = 1.1$$

- Find the z-score that is produced when boys have three more days of absences than girls. When boys have three more days of absences, the number of male absences minus female absences is three. And the associated z-score is

  $$z = \frac{x - \mu}{\sigma} = \frac{3 - 5}{1.1} = \frac{-2}{1.1} = -1.818$$

- Find the probability. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: -1.818, for the normal random variable; 0, for the mean; and 1, for the standard deviation. We find that the probability of a z-score being -1.818 or less is about 0.035.

Of course, the result is the same, whether you work with raw scores or z-scores.
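The agreement between the two routes can be confirmed numerically. The sketch below computes the z-score, evaluates the standard normal cumulative probability with `math.erf`, and compares it with the raw-score probability from `statistics.NormalDist`. The helper name `standard_normal_cdf` is an assumption made for this example.

```python
import math
from statistics import NormalDist


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu_d = 5.0      # mean difference from step 1
sigma_d = 1.1   # standard deviation of the difference from step 2

# z-score route: standardize x = 3, then use the standard normal curve.
z = (3 - mu_d) / sigma_d
p_from_z = standard_normal_cdf(z)

# Raw-score route: evaluate the N(5, 1.1) curve directly at x = 3.
p_from_raw = NormalDist(mu=mu_d, sigma=sigma_d).cdf(3)

print("z =", round(z, 3))
print("P from z-score:  ", p_from_z)
print("P from raw score:", p_from_raw)
print("routes agree:", math.isclose(p_from_z, p_from_raw, rel_tol=1e-9))
```

Standardizing only rescales the horizontal axis, so both calls evaluate the same area under the curve.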

**Note:** Some analysts might have used the t-distribution to compute probabilities for this problem. We chose the normal distribution because the population standard deviations were known and the sample sizes were large; but either choice would have been acceptable. In a previous lesson, we offered some guidelines for choosing between the normal and the t-distribution.