Teach yourself statistics

Sampling Distribution of the Mean

Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a mean score for each sample. The probability distribution of this statistic is the sampling distribution of the mean.

Shape of Sampling Distribution

When the sampling method is simple random sampling, the sampling distribution of the mean will often be shaped like a t-distribution or a normal distribution, centered over the mean of the population. The mean of the sampling distribution equals the mean of the population distribution.

μ_s = μ_p

where μ_s is the mean of the sampling distribution and μ_p is the mean of population.
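The equality of the two means can be checked directly on a population small enough to enumerate. The sketch below is only an illustration: the population values and the sample size are hypothetical, and sampling is assumed to be without replacement.

```python
from itertools import combinations
from statistics import mean

# Hypothetical small population, chosen only for illustration.
population = [2, 4, 6, 8, 10, 15]
n = 3  # sample size

# Every possible sample of size n, drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]

mu_p = mean(population)     # mean of the population
mu_s = mean(sample_means)   # mean of the sampling distribution

print(len(sample_means), mu_p, mu_s)
```

Up to floating-point rounding, the two means printed at the end should agree.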

When to Use the t-Distribution

It is safe to assume that the shape of the sampling distribution for a mean will be close to a t-distribution when any of the following conditions are true:

The population distribution is normal.
The sample size is at least 15; and the sample has no outliers and exhibits no skewness.
The sample size is at least 30; and the sample has little skewness and no outliers.
The sample size is greater than 30.

When to Use the Normal Distribution

The central limit theorem predicts that the sampling distribution will be approximately normally distributed when the sample size is sufficiently large. If the population distribution is already approximately normal, a sample size of 30 will produce a sampling distribution that is approximately normal.
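The central limit theorem can be explored informally by simulation. The sketch below assumes a hypothetical, strongly skewed population (exponential with mean 1 and standard deviation 1) and an arbitrary number of simulated samples. It is a rough check, not a proof.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30          # sample size
trials = 5000   # number of simulated samples (an arbitrary choice)

# Exponential population with rate 1: right-skewed, mean 1, SD 1.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

center = mean(sample_means)   # should be near the population mean
spread = stdev(sample_means)  # should be near 1 / sqrt(n)

# For a normal distribution, roughly 68% of values fall within one SD.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / trials

print(center, spread, within_one_sd)
```

If the sample means are approximately normal, the last value should be close to 0.68 even though the population itself is far from normal.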

Normal Distribution or t-Distribution?

When the sample size is large, the t-distribution is almost identical to the normal distribution. In that case, you could use either distribution for analysis. Here are guidelines for choosing between the two.

If the population standard deviation is unknown and sample size is large, use the t-distribution with degrees of freedom equal to sample size minus one.
If the population standard deviation is known and sample size is large, use the normal distribution.
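To see how closely the t-distribution tracks the normal distribution as degrees of freedom grow, the following sketch compares the two density curves on a grid of points. The grid limits, the step count, and the chosen degrees of freedom are illustrative assumptions. The t density is written out from its standard formula rather than taken from a library.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def max_gap(df, lo=-4.0, hi=4.0, steps=800):
    """Largest absolute gap between the t and standard normal densities."""
    z = NormalDist()
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(abs(t_pdf(x, df) - z.pdf(x)) for x in grid)

# Degrees of freedom for sample sizes 5, 30, and 100.
for df in (4, 29, 99):
    print(df, round(max_gap(df), 4))
```

The gap should shrink as the degrees of freedom increase, which is why either distribution gives similar answers for large samples.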

Warning: If the population distribution is highly skewed, a larger sample size (perhaps 50 or more) will be needed to produce a sampling distribution that is approximately normal.

Standard Deviation of the Sampling Distribution

Suppose we draw all possible simple random samples of size n from a population of size N. Suppose further that we compute a mean score x̄ for each sample. In this way, we create a sampling distribution of the mean.

We know the following about the sampling distribution of the mean. The mean of the sampling distribution (μ_x̄) is equal to the mean of the population (μ). And the standard deviation of the sampling distribution (SD) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n), as shown in the equation below:

SD = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]

In the standard deviation formula, the factor sqrt[ (N - n ) / (N - 1) ] is called the finite population correction or fpc. When the population size is very large relative to the sample size, the fpc is approximately equal to one; and the standard deviation formula can be approximated by:

SD = σ / sqrt(n).

You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
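The standard deviation formula, with and without the finite population correction, can be sketched as a small function. The function name, the hypothetical population, and the sizes 50 and 1000 used for the 1/20 rule are assumptions made for illustration. Enumerating every sample from the small population gives a direct check of the formula.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def sd_of_sample_mean(sigma, n, N=None):
    """SD of the sampling distribution of the mean.

    Applies the finite population correction when a population size N
    is given; otherwise returns the approximation sigma / sqrt(n).
    """
    sd = sigma / math.sqrt(n)
    if N is not None:
        sd *= math.sqrt((N - n) / (N - 1))
    return sd

# Hypothetical small population so that every sample can be enumerated.
population = [2, 4, 6, 8, 10, 15]
n = 3
sample_means = [mean(s) for s in combinations(population, n)]

exact = pstdev(sample_means)  # SD computed over all possible samples
formula = sd_of_sample_mean(pstdev(population), n, N=len(population))
print(exact, formula)

# When the sample is 1/20 of the population, the fpc is already near one.
fpc = math.sqrt((1000 - 50) / (1000 - 1))
print(fpc)
```

Here σ is the population standard deviation computed with divisor N, which is what `pstdev` returns.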

Standard Error of the Sampling Distribution

Often, we don't know the value for population standard deviation σ. And, if we don't know the population standard deviation, we cannot compute the standard deviation of the sampling distribution of the mean (SD).

However, we can use the sample standard deviation s to estimate the unknown population standard deviation:

s = sqrt [ Σ ( x_i - x̄ )² / ( n - 1 ) ]

Substituting s for σ in the equation for SD, we get:

SE = [ s / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]

where s is the sample standard deviation, x̄ is the sample mean, x_i is the ith element from the sample, n is the number of elements in the sample, N is the population size, and SE is a sample estimate of SD, the standard deviation of the sampling distribution. SE is the standard error of the sampling distribution of the mean.

And when the population size is very large relative to the sample size, the standard error formula can be approximated by:

SE = s / sqrt(n)

In future lessons, you will see that being able to compute the standard error from sample data is essential for inferential statistics. For example, it will allow us to compute confidence intervals for mean scores and to test hypotheses about mean scores.
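As a minimal sketch, the standard error can be computed from raw sample data as follows. The function name and the sample values are hypothetical. The population size N is optional, and the finite population correction is applied only when N is supplied.

```python
import math
from statistics import stdev

def standard_error(sample, N=None):
    """Standard error of the mean, estimated from a sample.

    Uses the sample standard deviation (divisor n - 1). If a population
    size N is given, the finite population correction is applied.
    """
    n = len(sample)
    s = stdev(sample)
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical sample of eight weights, in pounds.
sample = [78, 85, 92, 67, 81, 74, 88, 79]

print(standard_error(sample))         # large-population approximation
print(standard_error(sample, N=100))  # with the finite population correction
```

The corrected value is always the smaller of the two, because the correction factor is less than one whenever n is greater than 1.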

How to Find Probability

The sampling distribution of a sample mean is a probability distribution. You can use the sampling distribution to find a cumulative probability for any sample mean. Specifically, you can find:

P(x̄ ≤ C)

where x̄ is a sample mean and C is a constant, which could be any real number.

Finding the probability that a sample mean will be no greater than the constant C is a four-step process:

Step 1: Find Mean of Distribution

The mean of the sampling distribution of sample mean equals the mean of the population from which the sample was drawn. Thus,

μ_s = μ_p

where μ_s is the mean of the sampling distribution and μ_p is the mean of population.

Step 2: Find Standard Deviation

Earlier in this lesson (see above), we explained how to compute standard deviation of the sampling distribution when you know population variance. And we showed how to estimate the standard deviation with the standard error when you don't know the population variance. When population size is big relative to sample size, you can use these formulas for standard deviation and standard error:

SD = σ / sqrt(n)

SE = s / sqrt(n)

where SD is the standard deviation of the sampling distribution, SE is the standard error, σ is the population standard deviation, s is the sample estimate of the population standard deviation, and n is sample size.

Step 3: Transform C Into z- or t-Score

If you know the standard deviation of the sampling distribution and sample size is large (30 or more), compute a z-score using this formula:

z = (C – μ_s) / SD

where C is a constant for which we want to find a probability, μ_s is the mean of the sampling distribution, and SD is the standard deviation of the sampling distribution.

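If you don't know the standard deviation of the sampling distribution, estimate it with the standard error and compute a t-score. The distinction matters most when sample size is small (less than 30), but the t-score can also be used with larger samples. Use this formula: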

t = (C – μ_s) / SE

where SE is the standard error of the sampling distribution.

If you compute a t-score, you will also need to find the degrees of freedom. For the sampling distribution of a mean, degrees of freedom equals sample size minus one.

df = n - 1

where df is degrees of freedom.

Step 4: Find Probability

Find the cumulative probability for the z-score or t-score that you calculated in Step 3. This is the probability that a sample mean will be no greater than the constant C.

You can find the probability for the z-score or a t-score from a handheld graphing calculator, from a written probability table commonly found in the appendix of introductory statistics texts, or from an online probability calculator, like Stat Trek's normal distribution calculator and t distribution calculator.
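The four steps can also be carried out in a few lines of code. The sketch below covers only the z-score case (known population standard deviation, large sample, population large relative to the sample). The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def prob_sample_mean_at_most(C, mu_p, sigma, n):
    """P(sample mean <= C) when the population SD is known and n is large."""
    mu_s = mu_p                  # Step 1: mean of the sampling distribution
    sd = sigma / math.sqrt(n)    # Step 2: SD, large-population approximation
    z = (C - mu_s) / sd          # Step 3: z-score for the constant C
    return NormalDist().cdf(z)   # Step 4: cumulative probability

# Hypothetical inputs: population mean 100, population SD 15, sample size 36.
p = prob_sample_mean_at_most(105, 100, 15, 36)
print(round(p, 4))
```

`NormalDist().cdf` plays the role of the probability table or calculator in Step 4.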

Test Your Understanding

Here are two problems to illustrate how to use the sampling distribution of the sample mean to solve common statistical problems. In the first problem, we compute a z-score and use a normal distribution calculator to arrive at a solution. In the second problem, we compute a t-score and use a t distribution calculator.

Problem 1

Assume that a school district has 10,000 6th graders. In this district, the average weight of a 6th grader is 80 pounds, with a population standard deviation of 20 pounds. Suppose you draw a random sample of 50 students. What is the probability that the sample mean will be less than 75 pounds?

Solution: Here is the four-step solution to this problem.

Step 1. Find the mean of the sampling distribution. The mean of the sampling distribution (μ_s) will equal the mean of the population (μ_p). Thus, the mean of the sampling distribution is equal to 80.
μ_s = μ_p

μ_s = 80
Step 2. Find the standard deviation of the sampling distribution. The standard deviation of the sampling distribution can be computed using the following formula.
SD = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
SD = [ 20 / sqrt(50) ] * sqrt[ (10,000 - 50 ) / (10,000 - 1) ]
SD = (20/7.071) * (0.9975) = 2.82

Note: Because population size (10,000) is large relative to sample size (50), we could have used this simpler formula to compute standard deviation:

SD = σ / sqrt(n)

We'll demonstrate the simpler formula in the next problem.
Step 3. Transform C into a z- or t-score. In this problem, C is 75, the constant value for which we want to find a cumulative probability. Because sample size is large and we know the population standard deviation, we compute a z-score.
z = (C - μ_s)/SD = (75 - 80)/2.82 = -1.77
Step 4. Find the probability. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: -1.77, for the z-score; 0, for the mean; and 1, for the standard deviation. (It is not necessary to compute the mean or standard deviation of the z-score, because every z-score has a mean of 0 and a standard deviation of 1.)

The Calculator tells us that the probability that the sample mean will be less than 75 pounds is 0.0384. Not very likely.
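For readers who prefer code to a calculator, here is one way to reproduce Problem 1 in Python. It carries full precision through every step, so the result (about 0.038) can differ in the third decimal place from an answer built on hand-rounded intermediate values. The variable names are illustrative.

```python
import math
from statistics import NormalDist

N, n = 10_000, 50      # population size and sample size
mu_p, sigma = 80, 20   # population mean and SD, in pounds
C = 75                 # constant of interest, in pounds

fpc = math.sqrt((N - n) / (N - 1))   # finite population correction
sd_exact = sigma / math.sqrt(n) * fpc
sd_approx = sigma / math.sqrt(n)     # simpler formula, fpc treated as one

p_exact = NormalDist().cdf((C - mu_p) / sd_exact)
p_approx = NormalDist().cdf((C - mu_p) / sd_approx)

print(f"with fpc:    SD = {sd_exact:.4f}, P = {p_exact:.4f}")
print(f"without fpc: SD = {sd_approx:.4f}, P = {p_approx:.4f}")
```

The two probabilities are nearly identical, which illustrates why the simpler formula is acceptable when the population is large relative to the sample.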

Problem 2

Let's revisit Problem 1, with a twist. Here is the problem now. Assume that a school district has 10,000 6th graders. In this district, the average weight of a 6th grader is 80 pounds. Suppose you draw a random sample of 50 students and find the sample standard deviation to be 20 pounds. If you drew another random sample of 50 students, what is the probability that the sample mean in the second sample would be less than 75 pounds?

Solution: Here is the four-step solution to this problem.

Step 1. Find the mean of the sampling distribution. The mean of the sampling distribution (μ_s) will equal the mean of the population (μ_p). Thus, the mean of the sampling distribution is equal to 80.
μ_s = μ_p

μ_s = 80
Step 2. Find the standard deviation of the sampling distribution. Since we don't know the standard deviation of the population (σ), we cannot compute the standard deviation of the sampling distribution. But we do know the standard deviation of the sample (s), so we can compute the standard error (SE) and use it to estimate the standard deviation of the sampling distribution. Since population size (10,000) is large relative to sample size (50), we can use this simple formula to compute standard error:
SE = s / sqrt(n)

SE = [ 20 / sqrt(50) ] = 2.83
Step 3. Transform C into a z- or t-score. In this problem, C is 75, the constant value for which we want to find a cumulative probability. Because we are using the sample standard deviation to estimate the population standard deviation, we compute a t-score.
t = (C - μ_s)/SE = (75 - 80)/2.83 = -1.77

And we find the degrees of freedom for this t-score to be:

df = n - 1 = 50 - 1 = 49
Step 4. Find the probability. To find this probability, we use Stat Trek's t Distribution Calculator. Specifically, we enter the following inputs: -1.77 for the t-score and 49 for the degrees of freedom.

The Calculator tells us that the probability that the sample mean will be less than 75 pounds is 0.041.

Note: As sample size increases, the t distribution more closely resembles the normal distribution. Since the sample size (n=50) in Problem 1 and Problem 2 is relatively large, it is not surprising that we get a similar result, whether we use a normal distribution calculator or a t distribution calculator. In both cases, we find that the probability that the sample mean will be less than 75 pounds is approximately 0.04.
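Problem 2 can also be reproduced without a t distribution calculator. Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule. This is a substitute technique chosen for illustration, not the method the calculator uses, and the function names and step count are assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry about zero.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

n, s = 50, 20     # sample size and sample standard deviation, in pounds
mu_s, C = 80, 75  # mean of the sampling distribution and the constant C

se = s / math.sqrt(n)   # standard error, simple formula
t = (C - mu_s) / se     # t-score
df = n - 1              # degrees of freedom
p = t_cdf(t, df)

print(f"SE = {se:.4f}, t = {t:.4f}, df = {df}, P = {p:.4f}")
```

Because the t-score is not rounded before the probability is computed, the result may differ slightly in the third decimal place from the calculator value.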
