How to Estimate a Population Total from a Simple Random Sample
This lesson describes how to estimate a population total, given survey data from a simple random sample. A good analysis should provide two outputs:
- A point estimate of the population total.
- A quantitative measure of uncertainty associated with the point estimate (e.g., a margin of error and/or a confidence interval).
First, we describe how to conduct a good analysis step-by-step. Then, we will illustrate the analysis with a sample problem.
How to Analyze Survey Data
Any good analysis of survey sample data includes the same seven steps:
- Estimate a population parameter (in this case, the population total).
- Estimate population variance.
- Compute standard error.
- Specify a confidence level.
- Find the critical value (often a z-score or a t-score).
- Compute margin of error.
- Define confidence interval.
Let's look a little bit closer at each step - what we do in each step and why we do it. When you understand what is really going on, it will be easier for you to apply formulas correctly and to interpret analytical findings.
Note: The formulas presented below are only appropriate for simple random sampling.
Estimating a Population Total
The main goal of the analysis is to develop a point estimate for the population total. Before we can accomplish this objective, we need to estimate the population mean or the population proportion.
The sample mean is an unbiased estimate of the population mean:
Sample mean = $\bar{x} = \sum x / n$
where $\sum x$ is the sum of all the sample observations, and $n$ is the number of sample observations.
Once we know the sample mean, we can estimate the population total ($t$) from the following formula:
Population total = $t = N \bar{x}$
where $N$ is the number of observations in the population, and $\bar{x}$ is the sample mean.
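As a rough illustration, the two formulas above can be sketched in Python. The function name `estimate_total_from_mean` and the five sample values are assumptions made up for this example; they are not part of the lesson.

```python
def estimate_total_from_mean(sample, population_size):
    """Estimate a population total as N times the sample mean."""
    n = len(sample)
    sample_mean = sum(sample) / n          # sample mean = (sum of observations) / n
    return population_size * sample_mean   # population total = N * sample mean


# Hypothetical data: 5 sampled values from a population of 40 units.
print(estimate_total_from_mean([2, 4, 4, 5, 5], 40))
```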
A proportion is a special case of the mean. It represents the number of observations that have a particular attribute divided by the total number of observations in the group. To estimate a population proportion (P), use this formula for the sample proportion (p):
Sample proportion = $p = \dfrac{\text{Sample observations with attribute}}{\text{Total sample size } (n)}$
Once we know the sample proportion, we can estimate a population total:
Population total = t = N * p
where t is an estimate of the number of elements in the population that have a specified attribute, N is the number of observations in the population, and p is the sample proportion.
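The proportion-based estimate can be sketched the same way. In this minimal example, the function name `estimate_total_from_proportion`, the list of flags, and the population size of 500 are all hypothetical.

```python
def estimate_total_from_proportion(has_attribute, population_size):
    """Estimate how many population elements have an attribute: t = N * p."""
    n = len(has_attribute)
    # Sample proportion = observations with the attribute / total sample size.
    p = sum(1 for flag in has_attribute if flag) / n
    return population_size * p


# Hypothetical sample: 4 of 10 sampled elements have the attribute.
flags = [True, False, True, True, False, False, True, False, False, False]
print(estimate_total_from_proportion(flags, 500))
```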
Whether you use a sample mean or a sample proportion to estimate a population total, you know that different samples can produce different point estimates of the population total. As a result, you can be fairly sure that the estimate from your sample will not equal the true population total exactly.
Therefore, you need a way to express the uncertainty inherent in your estimate. The remaining six steps in the analysis are geared toward quantifying the uncertainty in your estimate.
Estimating Population Variance
The variance is a numerical value used to measure the variability of observations in a group. If individual observations vary greatly from the group mean, the variance is big; and vice versa.
When you use a mean score to estimate a population total from a simple random sample, the best estimate of the population variance is:
$s^2 = \sum ( x_i - \bar{x} )^2 / ( n - 1 )$
where $s^2$ is a sample estimate of population variance, $\bar{x}$ is the sample mean, $x_i$ is the $i$th element from the sample, and $n$ is the number of elements in the sample.
When you use a proportion to estimate a population total, the population variance can be estimated from a sample as:
$s^2 = [ n / (n - 1) ] \times p \times (1 - p)$
where $s^2$ is a sample estimate of population variance, $p$ is a sample estimate of the population proportion, and $n$ is the number of elements in the sample.
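A short sketch may help show that the two variance formulas agree when the attribute is coded as 0 or 1, which is one way to see why a proportion is a special case of the mean. The function names and the 0/1 data below are assumptions for illustration only.

```python
def variance_from_values(sample):
    """Sample estimate of population variance: sum of squared deviations / (n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def variance_from_proportion(p, n):
    """Variance estimate for a proportion: [n / (n - 1)] * p * (1 - p)."""
    return (n / (n - 1)) * p * (1 - p)


# Hypothetical 0/1 data: 1 means the element has the attribute.
coded = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
p = sum(coded) / len(coded)

# Both lines print the same value, up to floating-point rounding.
print(variance_from_values(coded))
print(variance_from_proportion(p, len(coded)))
```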
Why do we care about population variance? The variance is needed to compute the standard error. And why do we care about the standard error? Read on.
Computing Standard Error
When we use a mean or a proportion to estimate a population total from a simple random sample, the standard error (SE) of the estimate is:
SE = $\sqrt{ N^2 \times (1 - n/N) \times s^2 / n }$
where $N$ is the population size, $n$ is the sample size, and $s^2$ is a sample estimate of the population variance.
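The standard error formula translates directly into code. This is a minimal sketch; the function name `standard_error_of_total` and the input values (N = 200, n = 25, variance estimate 4.0) are hypothetical.

```python
import math


def standard_error_of_total(population_size, sample_size, s2):
    """Standard error of an estimated total under simple random sampling."""
    # (1 - n/N) is the finite population correction: it shrinks the
    # standard error as the sample covers more of the population.
    fpc = 1 - sample_size / population_size
    return math.sqrt(population_size ** 2 * fpc * s2 / sample_size)


# Hypothetical inputs: N = 200, n = 25, estimated variance = 4.0.
print(round(standard_error_of_total(200, 25, 4.0), 2))
```

Note that when the sample is the whole population (n = N), the correction term is zero and so is the standard error, since there is no sampling uncertainty left.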
Think of the standard error as the standard deviation of a sample statistic. In survey sampling, there are usually many different subsets of the population that we might choose for analysis. Each different sample might produce a different estimate of the value of a population parameter. The standard error provides a quantitative measure of the variability of those estimates.
Specifying Confidence Level
In survey sampling, different samples can be randomly selected from the same population; and each sample can often produce a different confidence interval. Some confidence intervals include the true population parameter; others do not.
A confidence level refers to the percentage of all possible samples that produce confidence intervals that include the true population parameter. For example, suppose all possible samples were selected from the same population, and a confidence interval were computed for each sample. A 95% confidence level implies that 95% of the confidence intervals would include the true population parameter.
As part of the analysis, survey researchers choose a confidence level. Probably, the most frequently chosen confidence level is 95%.
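The idea of a confidence level can be checked approximately by simulation. Enumerating all possible samples is rarely feasible, so the sketch below draws many random samples instead and counts how often the interval covers the true total. Everything here is an assumption made for illustration: the synthetic population, the sample size of 50, the z-score of 1.96, and the function name `coverage_of_total_ci`.

```python
import math
import random


def coverage_of_total_ci(population, sample_size, critical_value, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true total."""
    rng = random.Random(seed)
    N = len(population)
    true_total = sum(population)
    hits = 0
    for _ in range(trials):
        # Simple random sample without replacement.
        sample = rng.sample(population, sample_size)
        mean = sum(sample) / sample_size
        s2 = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
        se = math.sqrt(N ** 2 * (1 - sample_size / N) * s2 / sample_size)
        estimate = N * mean
        margin = critical_value * se
        if estimate - margin <= true_total <= estimate + margin:
            hits += 1
    return hits / trials


# Synthetic population of 1000 roughly normal values (hypothetical).
population_rng = random.Random(1)
population = [population_rng.gauss(50, 10) for _ in range(1000)]

# With a 95% critical value, the printed fraction should be close to 0.95.
print(coverage_of_total_ci(population, 50, 1.96, 2000))
```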
Finding Critical Value
The critical value is a factor used to compute the margin of error. To find the critical value, follow these steps:
- Compute alpha (α): α = 1 - (confidence level / 100)
- Find the critical probability (p*): p* = 1 - α/2
- To express the critical value as a z-score, find the z-score having a cumulative probability equal to the critical probability (p*).
- To express the critical value as a t statistic, follow these steps:
  - Find the degrees of freedom (df). When you estimate a mean or proportion from a simple random sample, degrees of freedom is equal to the sample size minus one.
  - The critical t statistic (t*) is the t statistic having degrees of freedom equal to df and a cumulative probability equal to the critical probability (p*).
Note: You can use a t-score or a z-score for the critical value. You don't have to compute both. Researchers use a t-score when sample size is small; a z-score when it is large (at least 30). You can use the Normal Distribution Calculator to find the critical z-score, and the t Distribution Calculator to find the critical t-score.
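As an alternative to an online calculator, the z-score steps can be sketched with Python's standard library. The function name `critical_z` is an assumption for this example. The standard library has no t distribution; for a critical t-score, one option (not shown here) is SciPy's `scipy.stats.t.ppf(p, df)`.

```python
from statistics import NormalDist


def critical_z(confidence_level_percent):
    """Critical z-score for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level_percent / 100     # alpha = 1 - (confidence level / 100)
    critical_probability = 1 - alpha / 2           # p* = 1 - alpha/2
    # z-score whose cumulative probability equals p* under the standard normal.
    return NormalDist().inv_cdf(critical_probability)


print(round(critical_z(95), 2))  # 1.96
```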
Computing Margin of Error
The margin of error expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter.
Here is the formula for computing margin of error (ME):
ME = SE * CV
where SE is standard error, and CV is the critical value.
Defining Confidence Interval
Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement.
Here is how to compute the minimum and maximum values for a confidence interval around an estimated population total.
$CI_{min} = t - SE \times CV$
$CI_{max} = t + SE \times CV$
where $CI_{min}$ is the minimum value in the confidence interval, $CI_{max}$ is the maximum value in the confidence interval, $t$ is the sample estimate of the population total, SE is the standard error, and CV is the critical value (either a z-score or a t-score). Thus, the confidence interval is an interval estimate that ranges between $CI_{min}$ and $CI_{max}$.
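The margin of error and the interval endpoints can be computed together, since the interval is just the estimate plus or minus the margin of error. This is a minimal sketch; the function names and the inputs (estimate 1000, SE 50, critical value 1.96) are hypothetical.

```python
def margin_of_error(standard_error, critical_value):
    """ME = SE * CV."""
    return standard_error * critical_value


def confidence_interval(estimated_total, standard_error, critical_value):
    """Return (minimum, maximum) of the interval around an estimated total."""
    me = margin_of_error(standard_error, critical_value)
    return estimated_total - me, estimated_total + me


# Hypothetical inputs: estimated total 1000, SE 50, critical z-score 1.96.
ci_min, ci_max = confidence_interval(1000, 50, 1.96)
print(ci_min, ci_max)
```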
Sample Problem
This section presents a sample problem that illustrates how to analyze survey data when the sampling method is simple random sampling, and the parameter of interest is a total score.
In a small community of 100 homes, the local Animal Control Department conducts a survey of 16 homes to estimate the number of four-legged pets (dogs, cats, pigs, etc.) living in the community. The department uses simple random sampling to select homes for the study. The number of pets from each sampled home is shown below:
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 5
Solution: To solve this problem, we follow the seven-step process described above.
- Estimate the population total. Before we can estimate the population total, we first need to estimate the population mean, using the sample mean as the sample statistic. The sample mean is:
$\bar{x} = \sum x_i / n$
$\bar{x} = ( 0 + 0 + 0 + ... + 2 + 3 + 5 ) / 16 = 20/16 = 1.25$
Therefore, based on data from the simple random sample, we estimate that the average home in the community has 1.25 four-legged pets. Now, we can estimate the total number of four-legged pets as:
Population total = $t = N \bar{x} = 100 \times 1.25 = 125$
- Estimate population variance. We need to estimate population variance ($s^2$) now, so we can compute the standard error in the next step.
$s^2 = \sum ( x_i - \bar{x} )^2 / ( n - 1 )$
$s^2 = [ (0 - 1.25)^2 + (0 - 1.25)^2 + ... + (3 - 1.25)^2 + (5 - 1.25)^2 ] / 15 = 1.80$
- Compute standard error. The standard error measures the variability of our sample estimate of the population total. We will use standard error to compute the margin of error and to define a confidence interval.
SE = $\sqrt{ N^2 \times (1 - n/N) \times s^2 / n }$
SE = $\sqrt{ 100^2 \times ( 1 - 16/100 ) \times 1.8 / 16 }$
SE = $\sqrt{ 10{,}000 \times 0.84 \times 0.1125 }$
SE = $\sqrt{945} = 30.74$
- Select a confidence level. In this analysis, we work with a 95% confidence level, the most frequently chosen level.
- Find the critical value. The critical value is a factor used to compute the margin of error. To find the critical value, we take these steps.
- Compute alpha (α):
α = 1 - (confidence level / 100)
α = 1 - 95/100 = 0.05
- Find the critical probability (p*):
p* = 1 - α/2 = 1 - 0.05/2 = 0.975
- Since the sample size (n = 16) is less than 30, we will express the critical value as a t-score with 15 degrees of freedom (since n-1 equals 15). The critical value is the t-score with 15 degrees of freedom that has a cumulative probability equal to 0.975. From the t-Distribution Calculator, we find that the critical value is about 2.13.
- Compute the margin of error (ME):
ME = critical value * standard error
ME = 2.13 * 30.74 = 65.5
- Specify the confidence interval. The minimum and maximum values of the confidence interval are:
$CI_{min} = t - SE \times CV = 125 - 30.74 \times 2.13 = 59.5$
$CI_{max} = t + SE \times CV = 125 + 30.74 \times 2.13 = 190.5$
In summary, here are the results of our analysis. Based on sample data, we estimate that the total number of four-legged pets in the community is 125. Given a 95% confidence level, the margin of error around that estimate is 65.5; and the 95% confidence interval is 59.5 to 190.5.
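The whole sample problem can be reproduced in a few lines of Python as a check on the arithmetic. This is a sketch, not the lesson's own method: the variable names are assumptions, and the critical value of 2.13 is copied from the solution above rather than computed, because the standard library has no t distribution.

```python
import math

# Number of four-legged pets in each of the 16 sampled homes.
pets = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 5]
N = 100            # homes in the community
n = len(pets)      # homes in the sample
T_CRITICAL = 2.13  # t-score for 15 df at cumulative probability 0.975 (from the lesson)

mean = sum(pets) / n                                # sample mean
total = N * mean                                    # estimated population total
s2 = sum((x - mean) ** 2 for x in pets) / (n - 1)   # estimated population variance
se = math.sqrt(N ** 2 * (1 - n / N) * s2 / n)       # standard error of the total
me = T_CRITICAL * se                                # margin of error
ci = (total - me, total + me)                       # 95% confidence interval

print(f"total={total:.0f}, s2={s2:.2f}, SE={se:.2f}, "
      f"ME={me:.1f}, CI=({ci[0]:.1f}, {ci[1]:.1f})")
# total=125, s2=1.80, SE=30.74, ME=65.5, CI=(59.5, 190.5)
```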