# How to Estimate a Population Total from a Stratified Sample

This lesson describes how to estimate a population total, given survey data from a stratified random sample. A good analysis should provide two outputs:

- A point estimate of the population total.
- A quantitative measure of uncertainty associated with the point estimate (e.g., a margin of error and/or a confidence interval).

First, we describe how to conduct a good analysis step by step. Then, we illustrate the analysis with a sample problem.

## How to Analyze Survey Data

Any good analysis of survey data from a stratified sample includes the same seven steps:

- Estimate a population parameter (in this case, the population total).
- Compute sample variance within each stratum.
- Compute standard error.
- Specify a confidence level.
- Find the critical value (often a z-score or a t-score).
- Compute margin of error.
- Define confidence interval.

Let's look a little bit closer at each step - what we do in each step and why we do it. When you understand what is really going on, it will be easier for you to apply formulas correctly and to interpret analytical findings.

**Note:** The formulas presented below are only appropriate for stratified random sampling.

### Estimating a Population Total

The main goal of the analysis is to develop a point estimate
for the population total. Before we can accomplish this objective, we need to estimate the population mean or the population proportion
for *each* stratum.

The sample mean is an unbiased estimate of the population mean. Use the following formula to compute the sample mean in each stratum:

Sample mean in stratum *h* = x_{h} = Σx_{h} / n_{h}

where Σx_{h} is the sum of all the sample observations in stratum *h*, and n_{h} is the number of sample observations
in stratum h.

Once we know the sample mean in each stratum, we can estimate the population total (t) from the following formula:

Population total = t = ΣN_{h} * x_{h}

where N_{h} is the number of observations in the population from stratum *h*,
and x_{h} is the sample mean from stratum *h*.
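As a minimal sketch of the two formulas above, the following Python snippet computes the sample mean in each stratum and then the estimated population total. The function name `estimate_total_from_means` and the example numbers are illustrative assumptions, not part of the lesson.

```python
def estimate_total_from_means(strata):
    """Estimate a population total from a stratified sample.

    strata: list of (N_h, sample_values) pairs, where N_h is the
    population size of stratum h and sample_values are its observations.
    """
    total = 0.0
    for N_h, sample in strata:
        mean_h = sum(sample) / len(sample)  # sample mean in stratum h
        total += N_h * mean_h               # N_h * x_h, summed over strata
    return total


# Hypothetical data: two strata with population sizes 100 and 300.
strata = [(100, [2, 4, 6]), (300, [1, 1, 4])]
print(estimate_total_from_means(strata))  # 100*4 + 300*2 = 1000.0
```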

A proportion is a special case of the mean. It represents the number of observations that have a particular attribute divided by the
total number of observations in the group. Use this formula to estimate the population proportion for *each* stratum:

p_{h} = n'_{h} / n_{h}

where p_{h} is a sample estimate of the population proportion for stratum *h*,
n'_{h} is the number of sample observations from stratum *h* that have the attribute,
and n_{h} is the total number of sample observations from stratum *h*.

Once we have estimated a sample proportion for each stratum, we can estimate a population total:

Population total = t = ΣN_{h} * p_{h}

where t is an estimate of the number of elements in the population that have a specified attribute,
N_{h} is the number of observations from stratum *h* in the population,
and p_{h} is the sample proportion from stratum *h*.
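The proportion-based estimate can be sketched the same way. In this hedged example, the function name `estimate_total_from_proportions` and the counts are made up for illustration; each stratum is described by its population size, its sample size, and the number of sampled units that have the attribute.

```python
def estimate_total_from_proportions(strata):
    """Estimate how many population elements have an attribute.

    strata: list of (N_h, n_h, count_with_attribute) tuples.
    """
    total = 0.0
    for N_h, n_h, count_h in strata:
        p_h = count_h / n_h  # sample proportion in stratum h
        total += N_h * p_h   # N_h * p_h, summed over strata
    return total


# Hypothetical data: 5 of 10 sampled units in a stratum of 200,
# and 5 of 20 sampled units in a stratum of 800.
strata = [(200, 10, 5), (800, 20, 5)]
print(estimate_total_from_proportions(strata))  # 200*0.5 + 800*0.25 = 300.0
```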

Whether you use a sample mean or a sample proportion to estimate a population total, you know that different samples can produce different point estimates of the population total. As a result, you can be fairly sure that the estimate from your sample will not equal the true population total exactly.

Therefore, you need a way to express the uncertainty inherent in your estimate. The remaining six steps in the analysis are geared toward quantifying the uncertainty in your estimate.

### Computing Variance Within Strata

The variance is a numerical value used to measure the variability of observations in a group. If individual observations vary greatly from the group mean, the variance is large; if they cluster closely around the mean, the variance is small.

Given a stratified random sample, we need to compute the sample variance within each stratum (s^{2}_{h}):

*s*^{2}_{h} = Σ ( x_{ih} - x_{h} )^{2} / ( n_{h} - 1 )

where *s*^{2}_{h} is a sample estimate of population variance in stratum *h*,
x_{ih} is the value of the *i*th element from stratum *h*,
x_{h} is the sample mean from stratum *h*,
and n_{h} is the number of sample observations from stratum *h*.

With a proportion, the variance within each stratum can be estimated from a sample as:

*s*^{2}_{h} = [ n_{h} / (n_{h} - 1) ] * p_{h} * (1 - p_{h})

where *s*^{2}_{h} is a sample estimate of the variance within stratum *h*,
n_{h} is the number of observations from stratum *h* in the sample,
and p_{h} is a sample estimate of the proportion in stratum *h*.
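The sketch below illustrates both variance formulas. It assumes Python's standard `statistics.variance`, which divides by (n - 1), and uses hypothetical 0/1 data to show that the proportion formula gives the same result as the general formula when each observation is coded 1 (has the attribute) or 0 (does not). The helper names are illustrative.

```python
import math
import statistics


def variance_from_values(sample):
    """Within-stratum sample variance, with (n_h - 1) in the denominator."""
    return statistics.variance(sample)


def variance_from_proportion(n_h, p_h):
    """Within-stratum variance estimate for a proportion."""
    return n_h / (n_h - 1) * p_h * (1 - p_h)


# Hypothetical stratum: 2 of 8 sampled units have the attribute.
has_attribute = [1, 1, 0, 0, 0, 0, 0, 0]
n_h = len(has_attribute)
p_h = sum(has_attribute) / n_h

s2_values = variance_from_values(has_attribute)
s2_prop = variance_from_proportion(n_h, p_h)

print(round(s2_values, 4), round(s2_prop, 4))  # both 0.2143
print(math.isclose(s2_values, s2_prop))        # True
```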

Why do we care about the variance within each stratum? Stratum variance is needed to compute the standard error. And why do we care about the standard error? Read on.

### Computing Standard Error

The standard error is possibly the most important output from our analysis. It allows us to compute the margin of error and the confidence interval.

When we estimate a total from a stratified random sample, the standard error (SE) of the estimate is:

SE = sqrt { Σ [ N_{h}^{2} * ( 1 - n_{h}/N_{h} ) * s_{h}^{2} / n_{h} ] }

where N_{h} is the number of elements from stratum h in the population,
n_{h} is the number of sample observations from stratum h,
and *s*^{2}_{h} is a sample estimate of the population variance in stratum h.

Think of the standard error as the standard deviation of a sample statistic. In survey sampling, there are usually many different subsets of the population that we might choose for analysis. Each different sample might produce a different estimate of the value of a population parameter. The standard error provides a quantitative measure of the variability of those estimates.
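A minimal sketch of the standard error formula follows. The function name `standard_error_of_total` and the input values are assumptions made for illustration; the term (1 - n_h/N_h) is the finite population correction that appears in the formula above.

```python
import math


def standard_error_of_total(strata):
    """Standard error of an estimated total from a stratified random sample.

    strata: list of (N_h, n_h, s2_h) tuples, where s2_h is the sample
    variance within stratum h.
    """
    variance_of_total = 0.0
    for N_h, n_h, s2_h in strata:
        fpc = 1 - n_h / N_h  # finite population correction
        variance_of_total += N_h**2 * fpc * s2_h / n_h
    return math.sqrt(variance_of_total)


# Hypothetical strata: (population size, sample size, sample variance).
strata = [(100, 10, 4.0), (200, 20, 9.0)]
se = standard_error_of_total(strata)
print(round(se, 1))  # sqrt(3600 + 16200), about 140.7
```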

### Specifying Confidence Level

In survey sampling, different samples can be randomly selected from the same population; and each sample can often produce a different confidence interval. Some confidence intervals include the true population parameter; others do not.

A confidence level refers to the percentage of all possible samples that produce confidence intervals that include the true population parameter. For example, suppose all possible samples were selected from the same population, and a confidence interval were computed for each sample. A 95% confidence level implies that 95% of the confidence intervals would include the true population parameter.

As part of the analysis, survey researchers choose a confidence level. The most frequently chosen confidence level is probably 95%.

### Finding Critical Value

Often expressed as a t-score or a z-score, the critical value is a factor used to compute the margin of error. To find the critical value, follow these steps:

- Compute alpha (α): α = 1 - (confidence level / 100)
- Find the critical probability (p*): p* = 1 - α/2
- To express the critical value as a z-score, find the z-score having a cumulative probability equal to the critical probability (p*).
- To express the critical value as a t-score, follow these steps:
  - Find the degrees of freedom (df). To compute degrees of freedom for a stratified random sample, use this equation:

    df = Σ ( n_{h} - 1 )

    where n_{h} is the number of sample observations from stratum *h*.
  - The critical t-score is the t statistic having degrees of freedom equal to df and a cumulative probability equal to the critical probability (p*).

**Note:** You can use a t-score __or__ a z-score for the critical value. You don't have to compute both.
Researchers use a t-score when sample size is small; a z-score when it is large (at least 30).
You can use the Normal Distribution Calculator to find the critical z-score, and the
t Distribution Calculator to find the critical t-score.
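For readers who prefer code to an online calculator, here is a hedged sketch that uses only the Python standard library. The z-score comes from `statistics.NormalDist().inv_cdf`. The standard library has no t-distribution quantile function, so the t-score is approximated here by integrating the t density with Simpson's rule and then bisecting; this is an illustrative numerical shortcut rather than the method behind the calculators mentioned above. The function names, the search bracket of 0 to 100, and the step count are assumptions. If SciPy is available, `scipy.stats.t.ppf` is the more usual route.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """Approximate cumulative probability of Student's t at x >= 0.

    Uses Simpson's rule on the density; adequate for moderate df
    (math.gamma overflows for df above roughly 340).
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    area = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + area * h / 3


def critical_t(confidence_level, df):
    """Critical t-score for a confidence level given in percent."""
    alpha = 1 - confidence_level / 100
    p_star = 1 - alpha / 2          # critical probability
    lo, hi = 0.0, 100.0             # assumed bracket for the answer
    for _ in range(60):             # bisection on the approximate CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_z(confidence_level):
    """Critical z-score for a confidence level given in percent."""
    alpha = 1 - confidence_level / 100
    return NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical stratum sample sizes; df = sum of (n_h - 1).
sample_sizes = [8, 8, 8]
df = sum(n_h - 1 for n_h in sample_sizes)

print(df)                             # 21
print(round(critical_t(95, df), 2))   # 2.08
print(round(critical_z(95), 2))       # 1.96
```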

### Computing Margin of Error

The margin of error expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter.

Here is the formula for computing margin of error (ME):

ME = SE * CV

where SE is standard error, and CV is the critical value.

### Defining Confidence Interval

Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement.

Here is how to compute the minimum and maximum values for a confidence interval around an estimated population total.

CI_{min} = t - SE * CV

CI_{max} = t + SE * CV

where CI_{min} is the minimum value in the confidence interval, CI_{max} is the maximum value in the confidence interval,
t is the sample estimate of the population total, SE is the standard error, and CV is the critical value (either a z-score or a t-score). Thus,
the confidence interval is an interval estimate that ranges between CI_{min} and CI_{max}.
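The margin of error and confidence interval formulas reduce to a few lines of arithmetic. In this sketch, the function name `confidence_interval` and the input values are hypothetical.

```python
def confidence_interval(total, se, critical_value):
    """Return (margin of error, CI minimum, CI maximum) for an estimated total."""
    me = se * critical_value  # ME = SE * CV
    return me, total - me, total + me


# Hypothetical inputs: estimated total 1000, standard error 50, z-score 1.96.
me, ci_min, ci_max = confidence_interval(1000.0, 50.0, 1.96)
print(round(me, 1), round(ci_min, 1), round(ci_max, 1))  # 98.0 902.0 1098.0
```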

## Sample Problem

This section presents a sample problem that illustrates how to analyze survey data when the sampling method is disproportionate stratified sampling (the same number of units is sampled from strata of different sizes).

**Problem 1**

In a small community of 1000 families, the local Animal Control Department conducts a survey of 24 families to estimate the number of four-legged pets (dogs, cats, pigs, etc.) living in the community. In this community, 200 families live in single-family homes, 300 families live in condos, and 500 families live in apartments. The department uses stratified sampling to randomly select eight families from each group for the study.

The number of four-legged pets from each sampled household is shown below:

Home type | Number of pets |
---|---|
Single-family | 0, 0, 1, 1, 2, 2, 2, 4 |
Condo | 0, 0, 0, 0, 0, 1, 1, 2 |
Apartment | 0, 0, 0, 0, 0, 0, 1, 1 |
Using sample data, estimate the total number of four-legged pets living in the community. Find the margin of error and the confidence interval. Assume a 95% confidence level.

*Solution:* To solve this problem, we follow the seven-step process described above.

- Estimate the population total. Before we can estimate the population total, we first need to compute the sample mean for each stratum. The formula for a stratum mean is:

  Sample mean in stratum *h* = x_{h} = Σx_{h} / n_{h}

  where Σx_{h} is the sum of all the sample observations in stratum *h*, and n_{h} is the number of sample observations in stratum *h*.

  Using the above formula, we can compute a sample mean for each home type:

  Mean_{single-family} = x_{s} = Σx_{s} / n_{s} = 12/8 = 1.5

  Mean_{condo} = x_{c} = Σx_{c} / n_{c} = 4/8 = 0.5

  Mean_{apartment} = x_{a} = Σx_{a} / n_{a} = 2/8 = 0.25

  where Σx_{s} is the number of pets sampled from single-family homes, Σx_{c} is the number of pets sampled from condos, Σx_{a} is the number of pets sampled from apartments, n_{s} is the number of single-family homes in the sample, n_{c} is the number of condos in the sample, and n_{a} is the number of apartments in the sample.

  Given the sample means within strata, we can estimate the population total (t) from the following formula:

  t = ΣN_{h} * x_{h}

  t = 200 * 1.5 + 300 * 0.5 + 500 * 0.25 = 575

  Therefore, based on sampled data, we estimate that there are 575 four-legged pets living in the community.

- Compute sample variance within strata. We need to compute the sample variance within each stratum, so we can compute the standard error in the next step. For single-family homes, the within-stratum sample variance (s^{2}_{s}) is equal to:

  s^{2}_{s} = Σ ( x_{i} - x_{s} )^{2} / ( n_{s} - 1 )

  s^{2}_{s} = [ (0 - 1.5)^{2} + (0 - 1.5)^{2} + ... + (2 - 1.5)^{2} + (4 - 1.5)^{2} ] / 7

  s^{2}_{s} = 1.714

  where x_{i} is the number of pets sampled from home *i*, x_{s} is the mean number of pets sampled from single-family homes, and n_{s} is the number of single-family homes in the sample.

  The within-stratum sample variance for condos and apartments is computed similarly. It is equal to 0.571 for condos and 0.214 for apartments.

- Compute standard error. The standard error measures the variability of our sample estimate of the population total. We will use standard error to compute the margin of error and to define a confidence interval.

  SE = sqrt { Σ [ N_{h}^{2} * ( 1 - n_{h}/N_{h} ) * s_{h}^{2} / n_{h} ] }

  SE = sqrt { [ (200)^{2} * ( 1 - 8/200 ) * 1.714 / 8 ] + [ (300)^{2} * ( 1 - 8/300 ) * 0.571 / 8 ] + [ (500)^{2} * ( 1 - 8/500 ) * 0.214 / 8 ] }

  SE = sqrt { [ 40,000 * 0.96 * 0.214 ] + [ 90,000 * 0.973 * 0.0714 ] + [ 250,000 * 0.984 * 0.027 ] }

  SE = sqrt ( 8217.6 + 6252.5 + 6642 ) = sqrt (21,112.1) = 145.3

  Thus, the standard error of the sampling distribution of the total is 145.3.

- Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 95% confidence level.

- Find the critical value. The critical value is a factor used to compute the margin of error. To find the critical value, we take these steps.
  - Compute alpha (α):

    α = 1 - (confidence level / 100)

    α = 1 - 95/100 = 0.05
  - Find the critical probability (p*):

    p* = 1 - α/2 = 1 - 0.05/2 = 0.975
  - Since the sample size (n = 24) is less than 30, we will express the critical value as a t-score with degrees of freedom (df) equal to:

    df = Σ ( n_{h} - 1 ) = 7 + 7 + 7 = 21

    Thus, the critical value is the t-score with 21 degrees of freedom that has a cumulative probability equal to 0.975. From the t-Distribution Calculator, we find that the critical value is 2.08.

- Compute the margin of error (ME):

  ME = critical value * standard error

  ME = 2.08 * 145.3 = 302.2

- Specify the confidence interval. The minimum and maximum values of the confidence interval are:

  CI_{min} = t - SE * CV = 575 - 145.3 * 2.08 = 272.8

  CI_{max} = t + SE * CV = 575 + 145.3 * 2.08 = 877.2

In summary, here are the results of our analysis. Based on sample data, we estimate that 575 four-legged pets live in the community. Given a 95% confidence level, the margin of error around that estimate is 302.2; and the 95% confidence interval is 272.8 to 877.2.
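The whole analysis can also be checked with a short script. The sketch below is one possible way to code it, not part of the original lesson: it uses the pet counts and stratum sizes from the problem, Python's standard `statistics.variance` for the within-stratum variances, and the critical value of 2.08 taken from the solution above. Because the script carries full precision instead of rounding intermediate values, its standard error (about 145.2) and margin of error (about 302.0) differ slightly from the hand-calculated 145.3 and 302.2.

```python
import math
import statistics

# Stratum population sizes and sampled pet counts from Problem 1.
strata = {
    "single-family": (200, [0, 0, 1, 1, 2, 2, 2, 4]),
    "condo": (300, [0, 0, 0, 0, 0, 1, 1, 2]),
    "apartment": (500, [0, 0, 0, 0, 0, 0, 1, 1]),
}

total = 0.0
variance_of_total = 0.0
df = 0
for N_h, sample in strata.values():
    n_h = len(sample)
    mean_h = sum(sample) / n_h           # stratum sample mean
    s2_h = statistics.variance(sample)   # divides by (n_h - 1)
    total += N_h * mean_h
    variance_of_total += N_h**2 * (1 - n_h / N_h) * s2_h / n_h
    df += n_h - 1

se = math.sqrt(variance_of_total)
critical_value = 2.08                    # t-score for df = 21, from the lesson
me = critical_value * se

print(f"total = {total:.0f}, df = {df}")
print(f"SE = {se:.1f}, ME = {me:.1f}")
print(f"95% CI = {total - me:.1f} to {total + me:.1f}")
```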