Sampling Distributions
Suppose that we draw all possible samples of size n from a given
population. Suppose further that we compute a
statistic (e.g., a mean, proportion, standard deviation) for each
sample. The probability
distribution of this statistic is called a sampling distribution.
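The definition above can be made concrete with a very small population. The following is a minimal sketch, assuming a hypothetical toy population of four values (not taken from the lesson) and sampling without replacement; the function name is illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def sampling_distribution_of_mean(population, n):
    """Map each possible sample mean to its probability.

    Enumerates every sample of size n drawn without replacement and
    treats each sample as equally likely.
    """
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {mean: Fraction(c, len(samples)) for mean, c in counts.items()}


# Hypothetical toy population, chosen only for illustration.
population = [2, 4, 6, 8]
dist = sampling_distribution_of_mean(population, n=2)

for mean, prob in sorted(dist.items()):
    print(f"sample mean = {float(mean):.1f}, probability = {prob}")
```

Each printed row is one point of the sampling distribution of the mean for samples of size 2.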
Variability of a Sampling Distribution
The variability of a sampling distribution is measured by its
variance or its
standard deviation. The variability of a sampling
distribution depends on three factors:
- N: The number of observations in the population.
- n: The number of observations in the sample.
- The way that the random sample is chosen.
If the population size is much larger than the sample size, then the sampling
distribution has roughly the same sampling error, whether we sample
with or
without replacement. On the other hand, if the sample represents a
significant fraction (say, 1/10 or more) of the population size, the sampling
error will be noticeably smaller when we sample without replacement.
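The effect of sampling with or without replacement can be checked by brute force. This is a rough sketch, assuming a hypothetical population of the integers 1 through 10 and a sample of size 3 (so the sample is 3/10 of the population); it enumerates every possible sample rather than relying on a formula.

```python
from itertools import combinations, product
from statistics import fmean, pstdev


def standard_error_by_enumeration(population, n, replace):
    """Standard deviation of the sample mean over all possible samples."""
    if replace:
        # Ordered samples with replacement are all equally likely.
        samples = product(population, repeat=n)
    else:
        # Unordered samples without replacement are all equally likely.
        samples = combinations(population, n)
    return pstdev([fmean(s) for s in samples])


# Hypothetical population, used only for illustration.
population = list(range(1, 11))
se_with = standard_error_by_enumeration(population, n=3, replace=True)
se_without = standard_error_by_enumeration(population, n=3, replace=False)

print(f"with replacement:    {se_with:.4f}")
print(f"without replacement: {se_without:.4f}")
```

Because the sample is a large fraction of this small population, the standard error without replacement comes out smaller.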
Central Limit Theorem
The central limit theorem states that the
sampling distribution of the mean (and of statistics built from sums or
averages, such as a proportion) will be normal or nearly normal,
if the sample size is large enough.
How large is "large enough"? As a rough rule of thumb, many statisticians
say that a sample size of 30 is large enough. If you know something
about the shape of the sample distribution, you can refine that
rule. The sample size is large enough if any of the following
conditions apply.
- The sample size is greater than 40, without outliers.
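The role of sample size can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the lesson: it uses a strongly skewed exponential population, a fixed random seed, and skewness as a crude measure of how far the distribution of sample means is from a symmetric, normal-like shape.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric shape)."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(0)  # fixed seed so the run is repeatable
skew_n1 = skewness(simulate_sample_means(1, 5000, rng))
skew_n40 = skewness(simulate_sample_means(40, 5000, rng))

print(f"skewness of sample means, n = 1:  {skew_n1:.2f}")
print(f"skewness of sample means, n = 40: {skew_n40:.2f}")
```

The skewness for n = 40 should be much closer to zero than for n = 1, which is consistent with the sample means becoming more nearly normal as n grows.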
The exact shape of any normal curve is totally determined by its mean and
standard deviation. Therefore, if we know the mean and standard deviation
of the sampling distribution of a statistic, we know the whole
sampling distribution (assuming that the
statistic came from a "large" sample).
Sampling Distribution of the Mean
Suppose we draw all possible samples of size n from a population of size N.
Suppose further that we compute a mean score for each sample. In this way, we
create a sampling distribution of the mean.
We know the following.
The mean of the population (μ) is equal
to the mean of the sampling distribution (μx).
And the standard error of the sampling distribution (σx)
is determined by the standard deviation of the population (σ),
the population size, and the sample size. These relationships are shown in the
equations below:
μx = μ and σx = σ * sqrt( 1/n - 1/N )
Therefore, we can specify the sampling distribution of the mean whenever two
conditions are met:
- The population is normally distributed, or the sample size is sufficiently large.
- The population standard deviation σ is known.
Note: When the population size is very large, the term 1/N is approximately
equal to zero, and the standard deviation formula reduces to:
σx = σ / sqrt(n).
You often see this formula in introductory statistics texts.
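The two standard error formulas above can be wrapped in one small helper. This is a minimal sketch; the function name and the convention of passing N=None for a very large population are assumptions made for illustration.

```python
from math import sqrt


def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size, or None to treat the population as very large
    """
    if N is None:
        # Large-population form: sigma / sqrt(n)
        return sigma / sqrt(n)
    # Finite-population form used in this lesson: sigma * sqrt(1/n - 1/N)
    return sigma * sqrt(1 / n - 1 / N)


# Illustrative inputs: sigma = 20, n = 50, N = 10,000.
print(round(standard_error_of_mean(20, 50, 10_000), 2))
print(round(standard_error_of_mean(20, 50), 2))
```

With a population 200 times the sample size, the two forms differ only in the second decimal place.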
Sampling Distribution of the Proportion
In a population of size N, suppose that the probability of the occurrence
of an event (dubbed a "success") is P; and the probability of the event's
non-occurrence (dubbed a "failure") is Q. From this population, suppose that we
draw all possible samples of size n. And finally, within each sample,
suppose that we determine the proportion of successes p and failures q.
In this way, we create a sampling distribution of the proportion.
We find that the mean of the sampling distribution of the proportion (μp)
is equal to the probability of success in the population (P). And the standard
error of the sampling distribution (σp)
is determined by the standard deviation of the population (σ),
the population size, and the sample size. These relationships are shown in the
equations below:
μp = P and σp = σ * sqrt( 1/n - 1/N ) = sqrt[ PQ/n - PQ/N ]
where σ = sqrt[ PQ ].
Note: When the population size is very large, the term PQ/N is approximately
equal to zero, and the standard deviation formula reduces to:
σp = sqrt( PQ/n ).
You often see this formula in intro statistics texts.
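The same pattern applies to the proportion. The sketch below assumes Q = 1 - P and uses a hypothetical helper name; passing N=None stands in for a very large population.

```python
from math import sqrt


def standard_error_of_proportion(P, n, N=None):
    """Standard error of the sample proportion.

    P: probability of success in the population
    n: sample size
    N: population size, or None to treat the population as very large
    """
    Q = 1 - P  # probability of failure
    if N is None:
        # Large-population form: sqrt(PQ/n)
        return sqrt(P * Q / n)
    # Finite-population form used in this lesson: sqrt(PQ/n - PQ/N)
    return sqrt(P * Q / n - P * Q / N)


# Illustrative inputs: P = 0.5, n = 120, very large population.
print(round(standard_error_of_proportion(0.5, 120), 5))
```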
Test Your Understanding of This Lesson
In this section, we offer two examples to illustrate how to apply the Central
Limit Theorem to solve some common statistical problems. Since the Central
Limit Theorem makes use of the normal distribution, use the Normal Distribution
Calculator to compute probabilities. The Calculator is free.
Example 1
Assume that a school district has 10,000 6th graders. In this district, the
average weight of a 6th grader is 80 pounds, with a standard deviation of 20
pounds. Suppose you draw a random sample of 50 students. What is the
probability that the average weight of a sampled student will be less than 75
pounds?
Solution: To solve this problem, we need to define the sampling
distribution of the mean. Because our sample size is greater than
40, the Central Limit Theorem tells us that the sampling distribution will be
approximately normal.
To define our normal distribution, we need to know both the mean of the sampling
distribution and the standard deviation. Finding the mean of the sampling
distribution is easy, since it is equal to the mean of the population. Thus,
the mean of the sampling distribution is equal to 80.
The standard deviation of the sampling distribution can be computed using the
following formula.
σx = σ * sqrt( 1/n - 1/N )
σx = 20 * sqrt( 1/50 - 1/10000 ) = 20 * sqrt( 0.0199 ) = 20 * 0.141 = 2.82
Let's review what we know and what we want to know. We know that the sampling
distribution of the mean is approximately normal, with a mean of 80 and a
standard deviation of 2.82. We want to know the probability that a sample mean
is less than 75 pounds. To solve the problem, we plug these inputs
into the Normal Distribution Calculator: mean = 80, standard deviation = 2.82,
and value = 75. The Calculator tells us that the probability that the average
weight of a sampled student is less than 75 pounds is equal to 0.038.
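The same result can be reproduced without the Calculator. This is a minimal sketch using the NormalDist class from Python's standard library as a stand-in for the Calculator; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from Example 1.
mu = 80        # population mean weight (pounds)
sigma = 20     # population standard deviation (pounds)
n = 50         # sample size
N = 10_000     # population size

# Standard error of the mean, using the formula from this lesson.
se = sigma * sqrt(1 / n - 1 / N)

# Probability that the sample mean falls below 75 pounds.
prob_below_75 = NormalDist(mu=mu, sigma=se).cdf(75)

print(f"standard error = {se:.2f}")
print(f"P(sample mean < 75) = {prob_below_75:.3f}")
```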
Example 2
Find the probability that of the next 120 births, no more than 40% will be
boys. Assume equal probabilities for the births of boys and girls. Assume
also that the number of births in the population (N) is very large, essentially
infinite.
Solution: The Central Limit Theorem tells us that the proportion of boys
in 120 births will be approximately normally distributed.
The mean of the sampling distribution will be equal to the mean of the
population distribution. In the population, half of the births result in boys;
and half, in girls. Therefore, the probability of boy births in the population
is 0.50. Thus, the mean proportion in the sampling distribution should also be
0.50.
The standard deviation of the sampling distribution can be computed using the
following formula.
σp = sqrt[ PQ/n - PQ/N ]
σp = sqrt[ (0.5)(0.5)/120 ] = sqrt[ 0.25/120 ] = 0.04564
In the above calculation, the term PQ/N was equal to zero, since the population
size (N) was assumed to be infinite.
Let's review what we know and what we want to know. We know that the sampling
distribution of the proportion is approximately normal, with a mean of 0.50 and
a standard deviation of 0.04564. We want to know the probability that no more
than 40% of the sampled births are boys. To solve the problem, we plug these
inputs into the Normal Distribution Calculator: mean = 0.5, standard deviation =
0.04564, and value = 0.4. The Calculator tells us that the probability that no
more than 40% of the sampled births are boys is equal to 0.014.
Note: This use of the Central Limit Theorem provides a good approximation
of the true probabilities. The exact probability, computed using a binomial
distribution, is 0.018, which is close to the approximation obtained with the
Central Limit Theorem. The accuracy of the approximation increases as sample
size increases.
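Both the normal approximation and the exact binomial probability for Example 2 can be computed directly. The sketch below assumes that "no more than 40%" of 120 births means at most 48 boys, and uses only Python's standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

# Inputs from Example 2.
n = 120   # number of births
P = 0.5   # probability that a birth is a boy
Q = 1 - P

# Normal approximation: the population is treated as infinite, so PQ/N = 0.
se = sqrt(P * Q / n)
normal_approx = NormalDist(mu=P, sigma=se).cdf(0.40)

# Exact binomial probability of at most 48 boys (40% of 120).
max_boys = 48
exact = sum(comb(n, k) * P**k * Q**(n - k) for k in range(max_boys + 1))

print(f"normal approximation: {normal_approx:.3f}")
print(f"exact binomial:       {exact:.3f}")
```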