How to Estimate a Population Total from a Cluster Sample
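This lesson describes how to estimate a population total, given survey data from a cluster sample. A good
analysis should provide two outputs: a point estimate of the population total, and a measure of the uncertainty in that estimate (a margin of error or a confidence interval).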
First, we describe how to conduct a good analysis step by step. Then, we illustrate the analysis
with a sample problem.
How to Analyze Survey Data
A good analysis of survey data from a cluster sample includes eight steps:
 Estimate a population parameter (in this case, the population total).
 Compute sample variance within each cluster.
 Compute sample variance between clusters.
 Compute standard error.
 Specify a confidence level.
 Find the critical value (often a z-score or a t-score).
 Compute margin of error.
 Define confidence interval.
Let's look more closely at each step: what we do in each step and why we do it. When you understand what is really going on,
it will be easier for you to apply formulas correctly and to interpret analytical findings.
Note: The formulas presented below are only appropriate for cluster sampling.
Estimating a Population Total
The first step in the analysis is to develop a point estimate
for the population total. Before we can accomplish this objective, we need to compute a mean score or a proportion
for each sampled cluster.
Use this formula to compute the sample means:
Sample mean in cluster h = x_{h} = Σx_{h} / m_{h}
where Σx_{h} is the sum of all the sample observations in cluster h, and m_{h} is the number of sample observations
in cluster h.
Once we know the sample mean in each cluster, we can estimate the population total (t) from the following formula:
Population total = t = N/n * Σ ( M_{h} * x_{h} )
where N is the number of clusters in the population,
n is the number of clusters in the sample,
M_{h} is the number of observations in the population from cluster h,
and x_{h} is the sample mean from cluster h.
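As a minimal sketch of the formula above, the Python function below computes the estimated total from per-cluster sample means. The function name `estimate_total`, its parameter names, and the example data are illustrative assumptions, not part of the lesson.

```python
def estimate_total(N, cluster_sizes, cluster_means):
    """Estimate a population total: t = N/n * sum(M_h * x_h).

    N             -- number of clusters in the population
    cluster_sizes -- M_h, the population size of each sampled cluster
    cluster_means -- x_h, the sample mean of each sampled cluster
    """
    if len(cluster_sizes) != len(cluster_means):
        raise ValueError("need one size and one mean per sampled cluster")
    n = len(cluster_means)  # number of clusters in the sample
    return N / n * sum(M * x for M, x in zip(cluster_sizes, cluster_means))


# Hypothetical data: 3 clusters sampled out of 10.
print(estimate_total(10, [50, 40, 60], [2.0, 3.0, 1.5]))
```

The weighted sum inside the function is the estimated total for the sampled clusters; the factor N/n scales it up to all clusters in the population.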
A proportion is a special case of the mean. It represents the number of observations that have a particular attribute divided by the
total number of observations in the group. Use this formula to compute the proportion for each sampled cluster:
p_{h} = m'_{h} / m_{h}
where p_{h} is a sample estimate of the population proportion for cluster h,
m'_{h} is the number of sample observations from cluster h that have the attribute,
and m_{h} is the total number of sample observations from cluster h.
Once we have estimated a sample proportion for each cluster, we can estimate a population total:
Population total = t = N/n * Σ ( M_{h} * p_{h} )
where t is an estimate of the number of elements in the population that have a specified attribute,
N is the number of clusters in the population,
n is the number of clusters in the sample,
M_{h} is the number of observations from cluster h in the population,
and p_{h} is the sample proportion from cluster h.
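The same calculation for an attribute count can be sketched as follows. This is an illustration under assumed inputs: the function name `estimate_total_with_attribute` and the example numbers are hypothetical.

```python
def estimate_total_with_attribute(N, cluster_sizes, with_attribute, sampled):
    """Estimate how many population elements have an attribute.

    N              -- number of clusters in the population
    cluster_sizes  -- M_h, the population size of each sampled cluster
    with_attribute -- m'_h, sampled elements in cluster h that have the attribute
    sampled        -- m_h, total sampled elements in cluster h
    """
    n = len(cluster_sizes)  # number of clusters in the sample
    # Sample proportion for each cluster: p_h = m'_h / m_h
    proportions = [a / m for a, m in zip(with_attribute, sampled)]
    # Population total: t = N/n * sum(M_h * p_h)
    return N / n * sum(M * p for M, p in zip(cluster_sizes, proportions))


# Hypothetical data: 2 clusters sampled out of 10.
print(estimate_total_with_attribute(10, [100, 200], [5, 10], [20, 40]))
```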
Because different samples can produce different point estimates, you can be fairly sure that the estimate from your sample does
not equal the true value of the population parameter exactly.
Therefore, you need a way to express the uncertainty inherent in your estimate. The remaining seven steps in the analysis are
geared toward quantifying the uncertainty in your estimate.
Computing Variance Within Clusters
If you are using one-stage cluster sampling, you can skip this step. But if you are using
two-stage cluster sampling, you will need to compute the variance within each sampled cluster.
For a mean score, the variance within each cluster can be estimated from sample data as:
s^{2}_{h} = Σ ( x_{ih} - x_{h} )^{2} / ( m_{h} - 1 )
where s^{2}_{h} is a sample estimate of population variance in cluster h,
x_{ih} is the value of the ith element from cluster h,
x_{h} is the sample mean from cluster h,
and m_{h} is the number of observations sampled from cluster h.
For a proportion, the variance within each cluster can be estimated as:
s^{2}_{h} = [ m_{h} / ( m_{h} - 1 ) ] * p_{h} * ( 1 - p_{h} )
where s^{2}_{h} is a sample estimate of the variance within cluster h,
m_{h} is the number of observations sampled from cluster h,
and p_{h} is a sample estimate of the proportion in cluster h.
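The two within-cluster formulas can be sketched in Python as shown below. The function names and the example data are assumptions made for illustration. Coding an attribute as 1 (present) or 0 (absent) shows that the proportion formula is a special case of the mean formula.

```python
def within_variance_mean(values):
    """s2_h = sum((x_ih - x_h)^2) / (m_h - 1) for one cluster's sampled values."""
    m = len(values)
    if m < 2:
        raise ValueError("need at least two observations in the cluster")
    mean = sum(values) / m
    return sum((x - mean) ** 2 for x in values) / (m - 1)


def within_variance_proportion(m, p):
    """s2_h = [m_h / (m_h - 1)] * p_h * (1 - p_h)."""
    if m < 2:
        raise ValueError("need at least two observations in the cluster")
    return m / (m - 1) * p * (1 - p)


# Hypothetical cluster: 5 sampled elements, 3 of which have the attribute.
coded = [1, 0, 1, 1, 0]
print(within_variance_mean(coded))
print(within_variance_proportion(len(coded), sum(coded) / len(coded)))
```

Both calls return the same value (up to floating-point rounding), because the variance of 0/1 data depends only on the proportion of ones.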
Computing Variance Between Clusters
Use the following formula to estimate the variance of total scores between sampled clusters (s^{2}_{b}):
s^{2}_{b} = Σ ( t_{h} - t/N )^{2} / ( n - 1 )
where s^{2}_{b} is a sample estimate of the variance between sampled clusters,
t_{h} is the total from cluster h,
t is the sample estimate of the population total,
N is the number of clusters in the population,
and n is the number of clusters in the sample.
Note: If you are working with proportions, t_{h} is:
t_{h} = M_{h} * p_{h}
where M_{h} is the number of population elements in cluster h,
and p_{h} is the observed proportion in cluster h.
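A short Python sketch of the between-cluster variance follows. The function name `between_variance` and the example totals are hypothetical; the calculation follows the formula above.

```python
def between_variance(cluster_totals, t, N):
    """s2_b = sum((t_h - t/N)^2) / (n - 1).

    cluster_totals -- t_h, the total for each sampled cluster
    t              -- sample estimate of the population total
    N              -- number of clusters in the population
    """
    n = len(cluster_totals)  # number of clusters in the sample
    if n < 2:
        raise ValueError("need at least two sampled clusters")
    center = t / N  # estimated average total per cluster
    return sum((t_h - center) ** 2 for t_h in cluster_totals) / (n - 1)


# Hypothetical data: 3 clusters sampled out of 10, with totals 2, 4 and 6.
totals = [2, 4, 6]
t_hat = 10 / 3 * sum(totals)  # t = N/n * sum(t_h)
print(between_variance(totals, t_hat, 10))
```

When t is computed as N/n times the sum of the sampled cluster totals, t/N is simply the mean of those totals, so s^{2}_{b} is the ordinary sample variance of the cluster totals.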
Computing Standard Error
The standard error is possibly the most important
output from our analysis. It allows us to compute the
margin of error and the
confidence interval.
When we estimate a population total from a cluster sample, the standard error (SE) of the estimate is:
SE = sqrt { N^{2} * ( 1 - n/N ) * s^{2}_{b}/n + N/n * Σ [ ( 1 - m_{h}/M_{h} ) * M^{2}_{h} * s^{2}_{h}/m_{h} ] }
where N is the number of clusters in the population,
n is the number of clusters in the sample,
s^{2}_{b} is a sample estimate of the variance between clusters,
m_{h} is the number of elements from cluster h in the sample,
M_{h} is the number of elements from cluster h in the population,
and s^{2}_{h} is a sample estimate of the population variance in cluster h.
With one-stage cluster sampling, the formula for the standard error reduces to:
SE = N * sqrt [ ( 1 - n/N ) * s^{2}_{b}/n ]
Think of the standard error as the standard deviation
of a sample statistic.
In survey sampling, there are usually many different subsets of the population that we might choose for analysis. Each different
sample might produce a different estimate of the value of a population parameter. The standard error provides a quantitative
measure of the variability of those estimates.
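The sketch below shows one way to compute the standard error in Python. It assumes the form SE = sqrt( N^{2} * (1 - n/N) * s^{2}_{b}/n + N/n * Σ (1 - m_{h}/M_{h}) * M^{2}_{h} * s^{2}_{h}/m_{h} ), which, when the within-cluster term is omitted, equals the one-stage expression N * sqrt[ (1 - n/N) * s^{2}_{b}/n ] used in this lesson's sample problem. The function name and argument layout are illustrative assumptions.

```python
import math


def standard_error(N, n, s2_b, within=None):
    """Standard error of an estimated total from a cluster sample.

    N      -- number of clusters in the population
    n      -- number of clusters in the sample
    s2_b   -- sample estimate of the variance between clusters
    within -- optional list of (m_h, M_h, s2_h) tuples, one per sampled
              cluster; omit it for one-stage cluster sampling
    """
    between_term = N ** 2 * (1 - n / N) * s2_b / n
    within_term = 0.0
    if within:
        within_term = N / n * sum(
            (1 - m / M) * M ** 2 * s2 / m for m, M, s2 in within
        )
    return math.sqrt(between_term + within_term)


# One-stage example with assumed inputs: N = 1000, n = 20, s2_b = 50/19.
print(round(standard_error(1000, 20, 50 / 19), 2))
```

When every element of a sampled cluster is observed (m_{h} = M_{h}), each within-cluster term is zero, which is why the one-stage case needs only the between-cluster variance.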
Specifying Confidence Level
In survey sampling, different samples can be randomly selected from the same population;
and each sample can often produce a different confidence interval.
Some confidence intervals include the true population parameter; others do not.
A confidence level refers to the percentage of all possible samples that produce confidence intervals that include the true population parameter.
For example, suppose all possible samples were selected from the same population, and a confidence interval were computed for each sample.
A 95% confidence level implies that 95% of the confidence intervals would include the true population parameter.
As part of the analysis, survey researchers choose a confidence level. The most frequently chosen confidence level is probably 95%.
Finding Critical Value
Often expressed as a t-score or a
z-score, the critical value is a factor used to compute the margin of error.
To find the critical value, follow these steps:
 Compute alpha (α): α = 1 - (confidence level / 100)
 Find the critical probability (p*): p* = 1 - α/2
 To express the critical value as a z-score, find the z-score having a cumulative probability equal to the critical probability (p*).
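 To express the critical value as a t-score, find the t statistic having a cumulative probability equal to the critical probability (p*) and degrees of freedom equal to the number of sampled clusters minus one (n - 1).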
Researchers use a t-score when sample size is small; a z-score when it is large (at least 30).
You can use the Normal Distribution Calculator to find the critical z-score, and the
t Distribution Calculator to find the critical t statistic.
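For readers who prefer code to an online calculator, the following Python sketch finds both kinds of critical value using only the standard library. The z-score uses `statistics.NormalDist`. The t-score is found by integrating the t density numerically and then searching by bisection, which is a simple stand-in chosen for this example; the function names are assumptions, and the search assumes the answer lies between 0 and 100. If SciPy is available, `scipy.stats.t.ppf` returns the same quantile directly.

```python
import math
from statistics import NormalDist


def critical_probability(confidence_level):
    """Return p* = 1 - alpha/2 for a confidence level given in percent."""
    alpha = 1 - confidence_level / 100
    return 1 - alpha / 2


def critical_z(confidence_level):
    """z-score whose cumulative probability equals p*."""
    return NormalDist().inv_cdf(critical_probability(confidence_level))


def t_cdf(x, df, steps=2000):
    """Cumulative probability of Student's t at x >= 0 (Simpson's rule)."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def critical_t(confidence_level, df):
    """t-score whose cumulative probability equals p*, found by bisection."""
    p_star = critical_probability(confidence_level)
    lo, hi = 0.0, 100.0  # assumed search range
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(critical_z(95), 3))
print(round(critical_t(95, 19), 3))
```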
Computing Margin of Error
The margin of error
expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter.
Here is the formula for computing margin of error (ME):
ME = SE * CV
where SE is standard error, and CV is the critical value.
Defining Confidence Interval
Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic.
A confidence interval is an interval estimate combined with a probability statement.
Here is how to compute the minimum and maximum values for a confidence interval.
Mean | Proportion
CI_{min} = x - SE * CV | CI_{min} = p - SE * CV
CI_{max} = x + SE * CV | CI_{max} = p + SE * CV
In the table above, x is the sample estimate of the population mean, p is the sample estimate of the population proportion,
SE is the standard error, and CV is the critical value (either a z-score or a t-score).
The confidence interval is an interval estimate that ranges between CI_{min} and CI_{max}.
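The margin of error and the confidence interval follow directly from the standard error and the critical value. A minimal Python sketch, with a hypothetical function name and made-up inputs:

```python
def confidence_interval(estimate, standard_error, critical_value):
    """Return (ci_min, ci_max, margin_of_error) for a point estimate."""
    margin = standard_error * critical_value  # ME = SE * CV
    return estimate - margin, estimate + margin, margin


# Hypothetical inputs: a sample proportion of 0.40 with SE = 0.05 and z = 1.96.
ci_min, ci_max, me = confidence_interval(0.40, 0.05, 1.96)
print(ci_min, ci_max, me)
```

The same function applies to an estimated total: pass the estimate of the total and its standard error instead of a mean or a proportion.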
Sample Problem
This section presents a sample problem that illustrates how to estimate a population total
when the sampling method is one-stage cluster sampling.
Example 1
A botanist divides a field into 1000 equal-size plots. In each plot, he plants 100 clover seeds; and ultimately, each seed sprouts.
The botanist randomly selects 20 plots and counts the number of four-leaf clovers in each sampled plot. His findings appear below:
0, 1, 1, 2, 2, 2, 3, 3, 3, 3
3, 3, 3, 3, 4, 4, 4, 4, 4, 8

Using sample data, estimate the total number of four-leaf clovers in the field. Find the margin of error
and the confidence interval. Assume a 95% confidence level.
Solution: To solve this problem, we follow the eight-step process described above.
 Estimate the population total. Before we can estimate the population total, we need to first estimate the sample mean for
each cluster. The formula for a cluster mean is:
Sample mean in cluster h = x_{h} = Σx_{h} / m_{h}
where Σx_{h} is the sum of all the sample observations in cluster h, and m_{h} is the number of sample observations
in cluster h.
Using the above formula, we can compute a sample mean for each of the 20 sampled plots:
Mean_{plot 1} = x_{1} = Σx_{1} / m_{1} = 0/100 = 0
Mean_{plot 2} = x_{2} = Σx_{2} / m_{2} = 1/100 = 0.01
. . .
Mean_{plot 19} = x_{19} = Σx_{19} / m_{19} = 4/100 = 0.04
Mean_{plot 20} = x_{20} = Σx_{20} / m_{20} = 8/100 = 0.08
Given the sample means within clusters, we can estimate the population total (t) from the following formula:
t = N/n * Σ ( M_{h} * x_{h} )
t = 1000/20 * ( 100 * 0 + 100 * 0.01 + ... + 100 * 0.04 + 100 * 0.08 )
t = 50 * 60 = 3000
Therefore, based on sampled data, we estimate that there are 3000 four-leaf clovers in the field.
 Compute sample variance within each cluster. If our problem involved two-stage cluster sampling, we would need to compute sample
variance within each cluster. But since our problem uses one-stage cluster sampling, we don't need to compute variance within clusters.
 Compute sample variance between clusters. We use the following formula to estimate the variance of total scores between
sampled clusters (s^{2}_{b}):
s^{2}_{b} = Σ ( t_{h} - t/N )^{2} / ( n - 1 )
s^{2}_{b} = 1/19 * Σ ( t_{h} - 3000/1000 )^{2}
= 0.05263 * Σ ( t_{h} - 3 )^{2}
s^{2}_{b} = 0.05263 * [(0 - 3)^{2} + (1 - 3)^{2} + ... + (4 - 3)^{2} + (8 - 3)^{2}]
s^{2}_{b} = 0.05263 * [9 + 4 + 4 + 1 + 1 + 1 + . . . + 1 + 1 + 1 + 1 + 1 + 25]
s^{2}_{b} = 0.05263 * 50 = 2.63
 Compute standard error. With one-stage cluster sampling, the standard error (SE) of the estimate is:
SE = N * sqrt [ ( 1 - n/N ) * s^{2}_{b}/n ]
SE = 1000 * sqrt [ ( 1 - 20/1000 ) * 2.63/20 ]
SE = 1000 * sqrt ( 0.98 * 0.1315 )
SE = 1000 * sqrt (0.12887) = 1000 * 0.359 = 359
Thus, the standard error of the sampling distribution of the total is 359.
 Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 95%
confidence level.
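 Find the critical value. The critical value is a factor used to compute the margin of error. To find the critical value, we take these steps.
Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 95/100 = 0.05
Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975
Because the sample is small (20 clusters), express the critical value as a t-score with 20 - 1 = 19 degrees of freedom. The t-score having 19 degrees of freedom and a cumulative probability equal to 0.975 is 2.093.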
 Compute the margin of error (ME):
ME = critical value * standard error
ME = 2.093 * 359 = 751
 Specify the confidence interval. The minimum and maximum values of the confidence interval are:
CI_{min} = t - SE * CV = 3000 - 751 = 2249
CI_{max} = t + SE * CV = 3000 + 751 = 3751
In summary, here are the results of our analysis. Based on sample data, we estimate that there are 3000 four-leaf clovers in the field.
Given a 95% confidence level, the margin of error around that estimate is 751; and the 95% confidence interval is 2249 to 3751.
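The whole worked example can be checked with a few lines of Python. This is a sketch of the one-stage calculation only; the variable names are illustrative, and the critical value 2.093 is taken from the solution above rather than computed.

```python
import math

# Four-leaf clover counts in the 20 sampled plots (from the problem statement).
counts = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3,
          3, 3, 3, 3, 4, 4, 4, 4, 4, 8]

N = 1000              # plots (clusters) in the field
M = 100               # plants in every plot
n = len(counts)       # plots in the sample
T_CRITICAL = 2.093    # critical value used in the solution above

# Step 1: cluster means, cluster totals, and the estimated population total.
means = [c / M for c in counts]
cluster_totals = [M * x for x in means]
t = N / n * sum(cluster_totals)

# Step 3: variance between clusters (step 2 is skipped for one-stage sampling).
s2_b = sum((t_h - t / N) ** 2 for t_h in cluster_totals) / (n - 1)

# Step 4: standard error for one-stage cluster sampling.
se = N * math.sqrt((1 - n / N) * s2_b / n)

# Steps 7 and 8: margin of error and confidence interval.
me = T_CRITICAL * se
ci_min, ci_max = t - me, t + me

print(f"total = {t:.0f}, s2_b = {s2_b:.2f}, SE = {se:.1f}")
print(f"ME = {me:.1f}, CI = ({ci_min:.1f}, {ci_max:.1f})")
```

Carrying full precision through every step gives a standard error of about 359.1 and a margin of error of about 751.6. The solution above reports 751 because it rounds the standard error to 359 before multiplying by the critical value.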