Sampling Distributions
Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution.
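To make the definition concrete, here is a minimal sketch that enumerates every possible sample of size 2 from a tiny population and tabulates the resulting sample means. The population values and the function name sampling_distribution_of_mean are hypothetical, chosen only for illustration, and the sketch assumes sampling without replacement with every sample equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sampling_distribution_of_mean(population, n):
    """Map each possible sample mean to its probability, treating every
    size-n sample (drawn without replacement) as equally likely."""
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

population = [1, 2, 3, 4, 5]  # hypothetical population, N = 5
dist = sampling_distribution_of_mean(population, n=2)

for mean, prob in dist.items():
    print(f"mean = {float(mean):.1f}  probability = {float(prob):.1f}")
```

The printed table is the sampling distribution of the mean for this population. With realistic population sizes, full enumeration is impractical, so sampling distributions are usually described by formulas or approximated by simulation.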
Sampling distributions play a critical role in inferential statistics (e.g., testing hypotheses, defining confidence intervals). To make use of a sampling distribution, analysts must understand the variability of the distribution and the shape of the distribution. This lesson introduces those topics.
Variability of a Sampling Distribution
The variability of a sampling distribution depends on four factors:
- The standard deviation in the population from which the sample is drawn.
- N: The number of observations in the population.
- n: The number of observations in the sample.
- The way that the random sample is chosen.
When the population standard deviation is known, the standard deviation of a sampling distribution can be computed. When the population standard deviation is not known, the standard deviation of a sampling distribution can be estimated from sample data. The estimate of the standard deviation of a sampling distribution is called the standard error.
Formulas for computing the standard deviation of a sampling distribution differ, depending on the statistic in question. Similarly, formulas for computing the standard error of a sampling distribution differ, depending on the statistic in question. In future lessons, we present formulas for computing the standard deviation and the standard error for different kinds of statistics.
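As one illustration, consider the sample mean. Assuming simple random sampling from a population that is large relative to the sample, the standard deviation of its sampling distribution is the population standard deviation divided by the square root of n, and the standard error replaces the population standard deviation with the sample standard deviation. The sketch below uses hypothetical function names and made-up data; other statistics require different formulas.

```python
import math
import statistics

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean when the population SD (sigma) is known."""
    return sigma / math.sqrt(n)

def standard_error_of_mean(sample):
    """Estimate of the same quantity when sigma is unknown: use the sample SD instead."""
    s = statistics.stdev(sample)  # sample SD, computed with an n - 1 denominator
    return s / math.sqrt(len(sample))

sample = [12.0, 15.0, 11.0, 14.0, 13.0]  # hypothetical observations

print(sd_of_sample_mean(sigma=2.0, n=16))  # known population SD
print(standard_error_of_mean(sample))      # estimated from the sample
```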
Note: If the population size is much larger than the sample size, then the sampling distribution has roughly the same standard deviation and the same standard error whether we sample with or without replacement. On the other hand, if the sample represents a significant fraction (say, 1/20 or more) of the population, the standard deviation and the standard error will be meaningfully smaller when we sample without replacement.
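The size of this effect can be sketched for the sample mean. When sampling without replacement, the standard deviation of the sample mean is multiplied by a factor commonly called the finite population correction, the square root of (N - n) / (N - 1). The function name and the population sizes below are assumptions made for illustration.

```python
import math

def finite_population_correction(N, n):
    """Factor applied to the SD of the sample mean when sampling
    without replacement from a population of size N."""
    return math.sqrt((N - n) / (N - 1))

# Same sample size, two hypothetical population sizes.
fpc_small_fraction = finite_population_correction(N=1_000_000, n=50)  # n/N is tiny
fpc_one_twentieth = finite_population_correction(N=1_000, n=50)       # n/N = 1/20

print(f"n/N tiny:   factor = {fpc_small_fraction:.5f}")
print(f"n/N = 1/20: factor = {fpc_one_twentieth:.5f}")
```

A factor very close to 1 means that sampling with or without replacement gives nearly the same variability; the factor moves further below 1 as the sample takes up a larger share of the population.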
Shape of a Sampling Distribution
In some situations, a sampling distribution will be approximately normal in shape. In those situations, a researcher can use the normal distribution for analysis. In other situations, a sampling distribution will more closely follow a t-distribution, and a researcher can use the t-distribution for analysis.
Let's look at some guidelines for determining when a sampling distribution will be shaped like a normal distribution or a t-distribution.
A Normal Distribution
The central limit theorem states that the sampling distribution of the mean of independent observations drawn from the same population will be normal or nearly normal if the sample size is large enough, provided that the population has a finite variance.
How large is "large enough"? The answer depends on two factors.
- Requirements for accuracy. The more closely the sampling distribution needs to resemble a normal distribution, the more sample points will be required.
- The shape of the underlying population. The more closely the original population resembles a normal distribution, the fewer sample points will be required.
In practice, some statisticians say that a sample size of 20 is large enough when the population distribution is roughly bell-shaped. Others recommend a sample size of at least 30. But if the original population is distinctly not normal (e.g., is badly skewed, has multiple peaks, and/or has outliers), researchers like the sample size to be even larger.
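A small simulation can illustrate how sample size interacts with population shape. The sketch below draws repeated samples from a strongly right-skewed population (an exponential distribution, chosen here only as an example) and measures the skewness of the resulting sample means, where a value near 0 indicates a symmetric, more normal-looking shape. The function names, the seed, and the number of repetitions are arbitrary assumptions.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population
    and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(simulate_sample_means(n, 4000, rng)) for n in (2, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

The skewness of the sample means should be clearly smaller at n = 30 than at n = 2, although with a population this skewed it does not vanish entirely, which is why larger samples are preferred for distinctly non-normal populations.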
A t-distribution
If the underlying population is normally distributed and the population standard deviation is estimated from the sample, the standardized sample mean (the t statistic) follows a t-distribution with n - 1 degrees of freedom. This is true even when the sample size is small.
In practice, many statisticians relax the normality requirement. They are comfortable using the t-distribution when the population distribution is roughly bell-shaped, even if it is not exactly normal.
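The difference between the two distributions shows up in the tails. The sketch below simulates small samples (n = 5) from a normal population and standardizes each sample mean twice: once with the known population standard deviation (a z statistic) and once with the sample standard deviation (a t statistic). It then counts how often each statistic falls beyond ±1.96, the cutoff that a standard normal variable exceeds about 5% of the time. The function name, seed, and parameter values are illustrative assumptions.

```python
import math
import random

def tail_fractions(n, reps, cutoff=1.96, mu=0.0, sigma=1.0, seed=0):
    """Return the fraction of z statistics and of t statistics whose
    absolute value exceeds `cutoff`, for samples from a normal population."""
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation with an n - 1 denominator.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        z = (mean - mu) / (sigma / math.sqrt(n))  # uses the known sigma
        t = (mean - mu) / (s / math.sqrt(n))      # uses the estimate s
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / reps, t_hits / reps

z_frac, t_frac = tail_fractions(n=5, reps=20000)
print(f"|z| > 1.96: {z_frac:.3f}")
print(f"|t| > 1.96: {t_frac:.3f}")
```

The z fraction should land near 0.05, while the t fraction should be noticeably larger, reflecting the heavier tails of the t-distribution at small sample sizes.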
How to Choose Between t-distribution and Normal Distribution
The t-distribution and the normal distribution are both bell-shaped distributions. This suggests that we might use either the t-distribution or the normal distribution to analyze sampling distributions that are roughly bell-shaped. Which should we choose?
Guidelines exist to help you make that choice. Some focus on the population standard deviation.
- If the population standard deviation is known, use the normal distribution.
- If the population standard deviation is unknown, use the t-distribution.
Other guidelines focus on sample size.
- If the sample size is large, use the normal distribution. (See the discussion above on the central limit theorem to understand what is meant by a "large" sample.)
- If the sample size is small, use the t-distribution.
In practice, researchers employ a mix of the above guidelines. On this site, we use the normal distribution when the population standard deviation is known and the sample size is large. We use the t-distribution when the population standard deviation is unknown, although the t-distribution and the normal distribution are nearly identical when the sample size is very large. We also use the t-distribution when the sample size is small, unless the underlying distribution is distinctly not normal. The t-distribution should not be used with small samples from populations that are not approximately normal.
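The rules in the preceding paragraph can be written as a small decision function. This is only one possible reading of those rules: the function name, its arguments, and the cutoff of 30 for a "large" sample are assumptions made for illustration, not fixed conventions.

```python
def choose_distribution(sigma_known, n, roughly_normal, large_n=30):
    """Suggest 'normal', 't', or 'neither' for analyzing a sample mean.

    sigma_known    -- True if the population standard deviation is known
    n              -- sample size
    roughly_normal -- True if the population looks roughly bell-shaped
    large_n        -- assumed threshold for a "large" sample
    """
    is_large = n >= large_n
    if not is_large and not roughly_normal:
        # Small sample from a distinctly non-normal population:
        # the t-distribution should not be used.
        return "neither"
    if sigma_known and is_large:
        return "normal"
    # Unknown sigma, or a small sample from a roughly normal population.
    return "t"

print(choose_distribution(sigma_known=True, n=100, roughly_normal=False))
print(choose_distribution(sigma_known=False, n=100, roughly_normal=True))
print(choose_distribution(sigma_known=False, n=12, roughly_normal=True))
print(choose_distribution(sigma_known=False, n=12, roughly_normal=False))
```

Raising large_n makes the function more conservative about treating a sample as large, which may be appropriate when the population is badly skewed or has outliers.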