Sampling Distribution of the Mean
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a mean score for each sample. The probability distribution of this statistic is the sampling distribution of the mean.
Shape of Sampling Distribution
When the sampling method is simple random sampling, the sampling distribution of the mean will often be shaped like a t-distribution or a normal distribution.
When to Use the t-Distribution
It is safe to assume that the shape of the sampling distribution for a mean will be close to a t-distribution when any of the following conditions are true:
- Population values are approximately normally distributed.
- Sample size is smaller than 15, and the plot of sample data is symmetric and unimodal, without outliers.
- Sample size is between 15 and 40, and the plot of sample data is unimodal, without outliers, and only moderately skewed.
- Sample size is greater than 40, and the sample data have no outliers.
When to Use the Normal Distribution
The central limit theorem predicts that the sampling distribution will be approximately normally distributed when the sample size is sufficiently large.
If the population distribution is already approximately normal, a sample size of 30 will produce a sampling distribution that is approximately normal. If the population distribution is highly skewed, a sample size of 50 or more may be needed to produce a sampling distribution that is approximately normal.
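The effect of sample size on the shape of the sampling distribution can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the 5,000 replications, the seed, and the helper names sample_means and skewness are assumptions chosen for the example, not part of the lesson.

```python
import random
import statistics


def sample_means(n, replications, rng):
    """Draw `replications` samples of size n from a right-skewed
    (exponential, mean 1) population and return each sample's mean."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])


rng = random.Random(42)  # fixed seed so the run is repeatable
skew_small = skewness(sample_means(5, 5000, rng))
skew_large = skewness(sample_means(50, 5000, rng))

print(f"n = 5:  skewness of sample means = {skew_small:.2f}")
print(f"n = 50: skewness of sample means = {skew_large:.2f}")
```

With a skewed population, the sample means for n = 5 should still show clear right skew, while those for n = 50 should be much closer to symmetric, which is the pattern the central limit theorem predicts.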
Normal Distribution or t-Distribution?
When the sample size is large, the t-distribution is almost identical to the normal distribution. In that case, you could use either distribution for analysis. Here are guidelines for choosing between the two.
- If the population standard deviation is unknown and sample size is large, use the t-distribution with degrees of freedom equal to sample size minus one.
- If the population standard deviation is known and sample size is large, use the normal distribution.
Standard Deviation of the Sampling Distribution
Suppose we draw all possible simple random samples of size n from a population of size N. Suppose further that we compute a mean score x for each sample. In this way, we create a sampling distribution of the mean.
We know the following about the sampling distribution of the mean. The mean of the sampling distribution (μx) is equal to the mean of the population (μ). And the standard deviation of the sampling distribution (σx) is determined by the standard deviation of the population (σ), the population size (N), and the sample size (n), as shown in the equation below:
σx = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
In the standard deviation formula, the factor sqrt[ (N - n ) / (N - 1) ] is called the finite population correction or fpc. When the population size is very large relative to the sample size, the fpc is approximately equal to one; and the standard deviation formula can be approximated by:
σx = σ / sqrt(n).
You often see this "approximate" formula in introductory statistics texts. As a general rule, it is safe to use the approximate formula when the sample size is no bigger than 1/20 of the population size.
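For a very small population, the exact formula can be checked by brute force, because "all possible samples" can actually be listed. The following sketch assumes a made-up population of six values and samples of size 3 drawn without replacement; the data and variable names are hypothetical.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8, 10, 12]  # hypothetical population, N = 6
N, n = len(population), 3

# Every possible simple random sample of size n (without replacement).
means = [statistics.fmean(s) for s in itertools.combinations(population, n)]

sigma = statistics.pstdev(population)     # population standard deviation
fpc = math.sqrt((N - n) / (N - 1))        # finite population correction
formula_sd = sigma / math.sqrt(n) * fpc   # exact formula from the text
enumerated_sd = statistics.pstdev(means)  # sd of all possible sample means

print("mean of sample means:", statistics.fmean(means))
print("population mean:     ", statistics.fmean(population))
print("sd by enumeration:   ", enumerated_sd)
print("sd by formula:       ", formula_sd)
```

Here the sample is half of the population, far above the 1/20 guideline, so the fpc matters a great deal: dropping it would overstate the standard deviation of the sampling distribution.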
Standard Error of the Sampling Distribution
Often, we don't know the value for population standard deviation σ. And, if we don't know the population standard deviation, we cannot compute the standard deviation of the sampling distribution of the mean (σx).
However, we can use the sample standard deviation s to estimate the unknown population standard deviation. Substituting s into the equation for σx, we get:
s = sqrt[ Σ ( xi - x )^2 / ( n - 1 ) ]
SEm = [ s / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
where s is the sample standard deviation, x is the sample mean, xi is the ith element from the sample, n is the number of elements in the sample, and SEm is a sample estimate of σx, the standard deviation of the sampling distribution. SEm is the standard error of the sampling distribution of the mean.
And when the population size is very large relative to the sample size, the standard error formula can be approximated by:
SEm = s / sqrt(n)
In future lessons, you will see that being able to compute the standard error from sample data is essential for inferential statistics. It will allow us to compute confidence intervals for mean scores and to test hypotheses about mean scores.
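As a rough illustration of the two standard error formulas, here is a minimal sketch. The function name standard_error, its optional population_size parameter, and the five sample weights are all hypothetical, introduced only for this example.

```python
import math
import statistics


def standard_error(sample, population_size=None):
    """Estimate the standard error of the mean from sample data.

    If population_size is given, apply the finite population correction;
    otherwise use the large-population approximation s / sqrt(n).
    """
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


weights = [78, 85, 80, 72, 90]  # hypothetical sample, in pounds

print(standard_error(weights))                      # approximate formula
print(standard_error(weights, population_size=60))  # with the fpc
```

The corrected value is always the smaller of the two, and the gap shrinks as the population grows relative to the sample.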
Summary of Key Points
The key takeaways from this lesson are summarized below.
- A sampling distribution of the mean can be approximated by a t-distribution when:
  - The sampling method is simple random sampling.
  - Population values are normally distributed.
- When sample size is large, a sampling distribution of the mean can be approximated by a normal distribution.
- The standard error of the sampling distribution of the mean can be computed from the following formula:
SEm = [ s / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
- If population size is large relative to sample size (at least 20 times bigger than sample size), the standard error of the sampling distribution of the mean can be computed from the following formula:
SEm = s / sqrt(n)
Test Your Understanding
Here are three problems to illustrate how sampling distributions are used to solve common statistical problems.
Problem 1
Assume that a school district has 10,000 6th graders. In this district, the average weight of a 6th grader is 80 pounds, with a population standard deviation of 20 pounds. Suppose you draw a random sample of 50 students. What is the probability that the sample mean will be less than 75 pounds?
Solution: To solve this problem, we need to define the sampling distribution of the mean. Because our sample size is relatively large (greater than 40), the central limit theorem tells us that the sampling distribution will approximate a normal distribution.
To define our normal distribution, we need to know both the mean of the sampling distribution and the standard deviation. Finding the mean of the sampling distribution is easy, since it is equal to the mean of the population. Thus, the mean of the sampling distribution is equal to 80.
The standard deviation of the sampling distribution can be computed using the following formula.
σx = [ σ / sqrt(n) ] * sqrt[ (N - n ) / (N - 1) ]
σx = [ 20 / sqrt(50) ] * sqrt[ (10,000 - 50 ) / (10,000 - 1) ]
σx = (20/7.071) * (0.9975) = 2.82
Let's review what we know and what we want to know. We know that the sampling distribution of the mean is approximately normally distributed with a mean of 80 and a standard deviation of 2.82. We want to know the probability that a sample mean is less than 75 pounds.
Because we know the population standard deviation and the sample size is large, we'll use the normal distribution to find probability. To solve the problem, we plug these inputs into the Normal Distribution Calculator: mean = 80, standard deviation = 2.82, and normal random variable = 75.
The Calculator tells us that the probability that the average weight of a sampled student is less than 75 pounds is approximately 0.038.
Notice we enter the standard error, not the sample standard deviation, in the standard deviation field. We do this because the normal distribution is not a sampling distribution; it is a population distribution. It is defined by the population mean and population standard deviation. But the normal distribution becomes a sampling distribution when we set its standard deviation equal to the standard error.
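The same calculation can be scripted instead of using an online calculator. A minimal sketch using Python's standard library follows; the variable names are illustrative. It carries the finite population correction at full precision (about 0.9975), so its output can differ slightly from figures worked by hand with rounded intermediate values.

```python
import math
from statistics import NormalDist

N, n = 10_000, 50    # population size, sample size
mu, sigma = 80, 20   # population mean and standard deviation (pounds)

fpc = math.sqrt((N - n) / (N - 1))    # finite population correction
sd_mean = sigma / math.sqrt(n) * fpc  # sd of the sampling distribution

# Probability that the sample mean falls below 75 pounds.
p = NormalDist(mu=mu, sigma=sd_mean).cdf(75)

print(f"fpc = {fpc:.4f}")
print(f"sd of sampling distribution = {sd_mean:.2f}")
print(f"P(sample mean < 75) = {p:.3f}")
```

Note that the value passed as sigma is the standard deviation of the sampling distribution, not the population standard deviation of 20.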
Problem 2
Let's revisit Problem 1, with a twist. Here is the problem now. Assume that a school district has 10,000 6th graders. In this district, the average weight of a 6th grader is 80 pounds. Suppose you draw a random sample of 50 students and find the sample standard deviation to be 20 pounds. If you drew another random sample of 50 students, what is the probability that the sample mean in the second sample would be less than 75 pounds?
Solution: Because our sample size is relatively large and we don't know the population standard deviation (we can only estimate it from sample data), we decide that the t distribution may be a good choice to represent the shape of the sampling distribution of the mean.
With that in mind, we find the degrees of freedom (df).
df = n - 1 = 50 - 1 = 49
And we compute a t score, based on sample data.
t = [ x - μ ] / [ s / sqrt( n ) ]
t = [ 75 - 80 ] / [ 20 / sqrt( 50 ) ] = -1.7677
where t is the t score, x is the sample mean, μ is the population mean, s is the sample standard deviation, and n is sample size.
To solve the problem, we select "t score" in the Statistic field of the calculator; and we plug these inputs into the t Distribution Calculator: degrees of freedom = 49, t score = -1.7677.
The Calculator tells us that the probability that the average weight of a sampled student is less than 75 pounds is equal to 0.042.
Alternatively, we might have input sample raw scores into the t Distribution Calculator. Using this approach, we select "mean score" in the Statistic field of the calculator; and we plug these inputs into the calculator: degrees of freedom = 49, sample mean = 75, population mean = 80, and sample standard deviation = 20.
The Calculator tells us that the probability that the average weight of a sampled student is less than 75 pounds is equal to 0.042. We get the same result whether we use mean scores or t scores as input. Using the mean score saves us the trouble of computing a t score. The calculator does that for us behind the scenes.
Notice that when we use the t Distribution Calculator instead of the Normal Distribution Calculator, we don't enter the standard error in the standard deviation field. The calculator computes standard error behind the scenes, based on other inputs that we provide (i.e., degrees of freedom and sample standard deviation).
Note: As sample size increases, the t distribution more closely resembles the normal distribution. Since the sample size (n=50) in Problem 1 and Problem 2 is relatively large, it is not surprising that we get a similar result, whether we use a normal distribution calculator or a t distribution calculator. In both cases, we find the probability that the average weight of a sampled student will be less than 75 pounds is approximately 0.04.
Problem 3
When you use the Normal Distribution Calculator and the t Distribution Calculator to find cumulative probabilities for the sampling distribution of the mean, what should you enter in the "Standard deviation" field of each calculator?
(A) Enter the sample standard deviation in each calculator.
(B) Enter the standard error in each calculator.
(C) Enter the standard error in the Normal Distribution Calculator and the sample standard deviation in the t Distribution Calculator.
(D) Enter the sample standard deviation in the Normal Distribution Calculator and the standard error in the t Distribution Calculator.
(E) None of the above.
Solution
The correct answer is (C). Here's why:
- The normal distribution is not a sampling distribution; it is a population distribution. But the normal distribution becomes a sampling distribution when we set its standard deviation equal to the standard error. So, when we are trying to find cumulative probabilities for a sampling distribution, the standard error is the right input for the normal distribution calculator.
- The t distribution is a sampling distribution. It is defined in part by the sample standard deviation, so the sample standard deviation is a necessary input for the t distribution calculator. The standard error is not required to define the t distribution or to make the t distribution a sampling distribution.
Bottom line: With the t distribution calculator, enter the sample standard deviation in the standard deviation field. With the normal distribution calculator, enter the standard error in the standard deviation field. This is a subtle difference to be aware of when you want to find probabilities from a sampling distribution.