Sample Size: Stratified Random Samples
The precision and cost of a stratified design are influenced by the way
that sample elements are allocated to strata.
How to Assign Sample to Strata
One approach is
proportionate stratification. With proportionate stratification,
the sample size of each stratum is proportionate to the population
size of the stratum. Stratum
sample sizes are determined by the following equation:
nh = ( Nh / N ) * n
where nh is the sample size for stratum h, Nh
is the population size for stratum h, N is total population size, and n
is total sample size.
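The proportionate allocation equation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function name `proportionate_allocation` and the three-stratum population are hypothetical, and the results are left unrounded.

```python
def proportionate_allocation(stratum_sizes, n):
    """Return nh = (Nh / N) * n for each stratum, unrounded."""
    total = sum(stratum_sizes.values())  # N, the total population size
    return {h: n * size / total for h, size in stratum_sizes.items()}


# Hypothetical population (assumed for illustration): three strata, total sample of 100.
sizes = {"A": 5000, "B": 3000, "C": 2000}
allocation = proportionate_allocation(sizes, 100)
print(allocation)
```

In practice the stratum sample sizes would be rounded to whole numbers that still add up to n.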
Another approach is
disproportionate stratification, which can be a better choice (e.g.,
less cost, more precision) if sample elements are assigned correctly to strata.
To take advantage of disproportionate stratification, researchers need to
answer such questions as:
Although a consideration of all these questions is beyond the scope of this
tutorial, the remainder of this lesson does address the first two questions.
(To answer the other questions, as well as the first two questions, consider
using the Sample
Size Calculator.)
Sample Size Calculator
Stat Trek's Sample Size Calculator can help you find the right sample allocation
plan for your stratified design. You specify your main goal - maximize precision,
minimize cost, stay within budget, etc. Based on your goal, the calculator prompts you
for the necessary inputs and handles all computations automatically. It tells you
the best sample size for each stratum. The calculator also creates a summary
report that lists key findings, including the margin of error, and it describes
analytical techniques. The calculator
is free. You can find the Sample Size Calculator in Stat Trek's
main menu under the Stat Tools tab.
How to Maximize Precision, Given a Stratified Sample With a Fixed Budget
The ideal sample allocation plan would provide the most precision for the least
cost. Optimal allocation does just that. Based on optimal
allocation, the best sample size for stratum h would be:
nh = n * [ ( Nh * σh ) / sqrt( ch ) ] / [ Σ ( Ni * σi ) / sqrt( ci ) ]
where nh is the sample size for stratum h, n is total sample
size, Nh is the population size for stratum h, σh is
the standard deviation of stratum h, and ch is the direct
cost to sample an individual element from stratum h. Note that ch
does not include indirect costs, such as overhead costs.
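The effect of the above equation is to sample more heavily from a stratum when
- The cost to sample an element from the stratum is low.
- The variability within the stratum is large.
- The population size of the stratum is large.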
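As a rough sketch of the optimal allocation equation, the Python below computes the unrounded stratum sample sizes. The function name `optimal_allocation` and the two strata are hypothetical and chosen only for illustration: the strata are identical except that one costs four times as much per element to sample.

```python
from math import sqrt


def optimal_allocation(strata, n):
    """strata maps a name to (Nh, sigma_h, c_h); returns unrounded nh."""
    # Weight for each stratum: Nh * sigma_h / sqrt(c_h)
    weights = {h: N * sd / sqrt(c) for h, (N, sd, c) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


# Hypothetical strata (assumed): same size and variability, different unit cost.
strata = {
    "low_cost": (1000, 10.0, 1.0),
    "high_cost": (1000, 10.0, 4.0),
}
alloc = optimal_allocation(strata, 90)
print(alloc)
```

Because the cost enters through its square root, quadrupling the unit cost halves the stratum's weight, so the cheaper stratum receives twice as many sample elements.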
How to Maximize Precision, Given a Stratified Sample With a Fixed Sample Size
Sometimes, researchers want to find the sample allocation plan that provides the
most precision, given a fixed sample size. The solution to this problem is a
special case of optimal allocation, called Neyman allocation.
The equation for Neyman allocation can be derived from the equation for optimal
allocation by assuming that the direct cost to sample an individual element is
equal across strata. Based on Neyman allocation, the best sample size for
stratum h would be:
nh = n * ( Nh * σh ) / [ Σ ( Ni * σi ) ]
where nh is the sample size for stratum h, n is total sample
size, Nh is the population size for stratum h, and σh
is the standard deviation of stratum h.
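To illustrate that Neyman allocation is the equal-cost special case of optimal allocation, the sketch below implements both equations and compares their results. The function names, the "urban" and "rural" strata, and the unit cost of 7.0 are all hypothetical values assumed for illustration.

```python
from math import isclose, sqrt


def neyman_allocation(strata, n):
    """strata maps a name to (Nh, sigma_h); returns unrounded nh."""
    weights = {h: N * sd for h, (N, sd) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


def optimal_allocation(strata, costs, n):
    """Same as above, but each weight is divided by sqrt(c_h)."""
    weights = {h: N * sd / sqrt(costs[h]) for h, (N, sd) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


# Hypothetical strata (assumed): (population size, standard deviation).
strata = {"urban": (6000, 4.0), "rural": (4000, 9.0)}
equal_costs = {"urban": 7.0, "rural": 7.0}  # any common cost cancels out

neyman = neyman_allocation(strata, 50)
optimal = optimal_allocation(strata, equal_costs, 50)
same = all(isclose(neyman[h], optimal[h]) for h in strata)
print(neyman, same)
```

When every stratum has the same cost, the sqrt(c) factor appears in both the numerator and the denominator and cancels, which is why the two allocations agree.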
Test Your Understanding
This section presents a sample problem that illustrates how to maximize
precision, given a fixed sample size and a stratified sample.
(In a
subsequent lesson, we re-visit this problem and see how stratified
sampling compares to other sampling methods.)
Problem 1
At the end of every school year, the state administers a reading test to a
sample of 36 third graders. The school system has 20,000 third graders, half
boys and half girls. The results from last year's test are shown in the table
below.
Stratum | Mean score | Standard deviation
Boys | 70 | 10.27
Girls | 80 | 6.66
This year, the researchers plan to use a stratified sample, with one stratum
consisting of boys and the other, girls. Use the results from last year to
answer the following questions. Assume a 95% confidence level.
Solution: The first step is to decide how to allocate sample in order to
maximize precision. Based on Neyman allocation, the best sample size for
stratum h is:
nh = n * ( Nh * σh ) / [ Σ ( Ni * σi ) ]
where nh is the sample size for stratum h, n is total sample
size, Nh is the population size for stratum h, and σh
is the standard deviation of stratum h. By this equation, the number of
boys in the sample is:
nboys = 36 * ( 10,000 * 10.27 ) / [ ( 10,000 * 10.27 ) + ( 10,000 * 6.66 ) ]
nboys = 21.84
Therefore, to maximize precision, the total sample of 36 students should consist
of 22 boys and (36 - 22) = 14 girls.
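The allocation for this problem can be checked with a short script. This is a sketch that uses the stratum sizes from the problem statement and the standard deviations from the table; the variable names are illustrative, and rounding the boys' share and giving the remainder to girls mirrors the step above.

```python
n = 36  # total sample size from the problem

# (population size, last year's standard deviation) for each stratum
strata = {"boys": (10_000, 10.27), "girls": (10_000, 6.66)}

# Neyman weights: Nh * sigma_h
weights = {h: N * sd for h, (N, sd) in strata.items()}
total = sum(weights.values())
raw = {h: n * w / total for h, w in weights.items()}

n_boys = round(raw["boys"])  # round the unrounded share to a whole student
n_girls = n - n_boys         # the rest of the sample goes to girls
print(n_boys, n_girls)
```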
The remaining questions can be answered during the process of
computing the
confidence interval.
Elsewhere on this website, we described
how to compute a confidence interval.
We employ that process below.
- Identify a sample statistic. For this problem, we use
the overall sample mean to estimate the population mean. To compute the overall
sample mean, we use the following equation (which was introduced
in a
previous lesson):
x = Σ ( Nh / N ) * xh
x = ( 10,000/20,000 ) * 70 + ( 10,000/20,000 ) * 80
x = 75
Therefore, based on data from the sample strata, we estimate that the mean
reading achievement level in the population is equal to 75.
- Select a confidence level. In this analysis, the confidence level
is defined for us in the problem. We are working with a 95%
confidence level.
- Find the margin of error. Elsewhere on this site, we show
how to compute the margin of error when the sampling
distribution is approximately normal. The key steps are
shown below.
- Find standard deviation or standard error. The equation to
compute the standard error was introduced in a
previous lesson. We use that equation here:
SE = (1 / N) * sqrt { Σ [ Nh^2 * ( 1 - nh/Nh ) * sh^2 / nh ] }
SE = (1 / 20,000) * sqrt { [ 10,000^2 * ( 1 - 22/10,000 ) * (10.27)^2 / 22 ] + [ 10,000^2 * ( 1 - 14/10,000 ) * (6.66)^2 / 14 ] }
SE = 1.41
Thus, the standard deviation of the sampling distribution (i.e., the standard
error) is 1.41.
- Find critical value. The critical value is a factor used to
compute the margin of error. We express the critical
value as a z-score.
For a 95% confidence level, the critical value is the z-score with a
cumulative probability of 0.975, which is 1.96.
- Compute margin of error (ME):
ME = critical value * standard error
ME = 1.96 * 1.41 = 2.76
- Specify the confidence interval. The range of the confidence
interval is defined by the sample statistic ±
margin of error. And the uncertainty is denoted
by the confidence level.
Thus, with this sample design, we are 95% confident that the population mean
reading achievement score lies within 75 ± 2.76.
In summary, given a total sample size of 36 students, we can get the
greatest precision from a stratified sample if we sample 22 boys
and 14 girls. This results in a 95% confidence interval of
72.24 to 77.76. The margin of error is 2.76.
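The confidence interval steps can be reproduced with the sketch below, which uses only the Python standard library. The dictionary layout and variable names are illustrative assumptions. The critical value comes from the standard normal distribution rather than the rounded 1.96, so intermediate values differ slightly from the hand calculation before rounding.

```python
from math import sqrt
from statistics import NormalDist

N = 20_000  # total population of third graders
strata = {
    "boys": {"N": 10_000, "n": 22, "mean": 70, "sd": 10.27},
    "girls": {"N": 10_000, "n": 14, "mean": 80, "sd": 6.66},
}

# Overall mean: sum of (Nh / N) * stratum mean
mean = sum(s["N"] / N * s["mean"] for s in strata.values())

# Standard error with the finite population correction (1 - nh/Nh)
var_sum = sum(
    s["N"] ** 2 * (1 - s["n"] / s["N"]) * s["sd"] ** 2 / s["n"]
    for s in strata.values()
)
se = sqrt(var_sum) / N

# Critical z-score for a 95% confidence level (cumulative probability 0.975)
confidence = 0.95
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

me = z * se                       # margin of error
low, high = mean - me, mean + me  # confidence interval bounds
print(round(mean, 2), round(se, 2), round(me, 2), round(low, 2), round(high, 2))
```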