How to Analyze Stratified Random Samples

In this lesson, we describe how to analyze survey data from stratified random samples.

Notation

The following notation is helpful when we talk about analyzing data from stratified samples.

  • H: The number of strata in the population.
  • N: The number of observations in the population.
  • Nh: The number of observations in stratum h of the population.
  • Ph: The true proportion in stratum h of the population.
  • σ^2: The known variance of the population.
  • σ: The known standard deviation of the population.
  • σh: The known standard deviation in stratum h of the population.
  • x: The sample estimate of the population mean.
  • xh: The mean of observations from stratum h of the sample.
  • ph: The proportion of successes in stratum h of the sample.
  • sh: The sample estimate of the population standard deviation in stratum h.
  • sh^2: The sample estimate of the population variance in stratum h.
  • n: The number of observations in the sample.
  • nh: The number of observations in stratum h of the sample.
  • SD: The standard deviation of the sampling distribution.
  • SE: The standard error. (This is an estimate of the standard deviation of the sampling distribution.)
  • Σ: Summation symbol. ( To illustrate the use of the symbol, Σ xh = x1 + x2 + ... + xH-1 + xH )

How to Analyze Data From Stratified Samples

When it comes to analyzing data from stratified samples, there is good news and there is bad news.

First, the bad news. Different sampling methods use different formulas to estimate population parameters and to estimate standard errors. The formulas that we have used so far in this tutorial work for simple random samples, but they are not right for stratified samples.

Now, the good news. Once you know the correct formulas, you can readily estimate population parameters and standard errors. And once you have the standard error, the procedures for computing other things (e.g., margin of error, confidence interval, and region of acceptance) are largely the same for stratified samples as for simple random samples. The next two sections provide formulas that can be used with stratified sampling. The sample problem at the end of this lesson shows how to use these formulas to analyze data from stratified samples.

Measures of Central Tendency

The table below shows formulas that can be used with stratified sampling to estimate a population mean and a population proportion.

Population parameter    Formula for sample estimate
Mean                    Σ [ ( Nh / N ) * xh ]
Proportion              Σ [ ( Nh / N ) * ph ]

Note that Nh/N is the stratum weight, that is, the share of the population that falls in stratum h. (It is not the sampling fraction, which is nh/Nh.) Thus, to compute a sample estimate of the population mean or population proportion, we need to know the relative size of each stratum.
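
The weighted sum in the table can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function name stratified_estimate and the two-stratum figures are hypothetical, chosen only to show how the weights Nh / N are applied.

```python
def stratified_estimate(stratum_sizes, stratum_stats):
    """Weight each stratum's sample mean (or proportion) by Nh / N and sum."""
    if len(stratum_sizes) != len(stratum_stats):
        raise ValueError("need exactly one statistic per stratum")
    N = sum(stratum_sizes)  # population size is the sum of the stratum sizes
    return sum((Nh / N) * stat for Nh, stat in zip(stratum_sizes, stratum_stats))


# Hypothetical population: two strata with 6,000 and 4,000 units.
sizes = [6000, 4000]
mean_estimate = stratified_estimate(sizes, [50.0, 60.0])        # stratum sample means
proportion_estimate = stratified_estimate(sizes, [0.30, 0.50])  # stratum sample proportions
print(mean_estimate, proportion_estimate)
```

The same function serves for means and proportions because both estimates are the same weighted sum; only the per-stratum statistic changes.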

The Variability of the Estimate

The precision of a sample design is directly related to the variability of the estimate, which is measured by the standard deviation or standard error. The tables below show how to compute the standard deviation (SD) and standard error (SE), assuming that the sample method is stratified random sampling.

The first table shows how to compute the variability for a mean score. Note that the table shows four sample designs. In two of the designs, the true population variance is known; and in two, it is estimated from sample data. Also, in two of the designs, the researcher sampled with replacement; and in two, without replacement.

Population variance    Replacement strategy    Variability
Known                  With replacement        SD = (1 / N) * sqrt { Σ [ Nh^2 * σh^2 / nh ] }
Known                  Without replacement     SD = (1 / N) * sqrt { Σ [ ( Nh^3 / ( Nh - 1 ) ) * ( 1 - nh / Nh ) * σh^2 / nh ] }
Estimated              With replacement        SE = (1 / N) * sqrt { Σ [ Nh^2 * sh^2 / nh ] }
Estimated              Without replacement     SE = (1 / N) * sqrt { Σ [ Nh^2 * ( 1 - nh / Nh ) * sh^2 / nh ] }
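
To see how the four designs relate, the formulas for a mean score can be folded into one function. The sketch below is an assumption-laden illustration: the function name, its keyword arguments, and the example stratum sizes and variances are all hypothetical and do not come from the lesson.

```python
import math


def stratified_mean_variability(Nh, nh, variances, known=False, replacement=False):
    """Return SD (known variances) or SE (estimated variances) of a stratified mean.

    Nh: population size of each stratum
    nh: sample size of each stratum
    variances: sigma_h^2 if known=True, otherwise s_h^2, one value per stratum
    """
    N = sum(Nh)
    total = 0.0
    for N_h, n_h, var_h in zip(Nh, nh, variances):
        term = N_h**2 * var_h / n_h
        if not replacement:
            term *= 1 - n_h / N_h        # finite population correction
            if known:
                term *= N_h / (N_h - 1)  # turns Nh^2 into Nh^3 / (Nh - 1)
        total += term
    return math.sqrt(total) / N


# Hypothetical design: strata of 600 and 400 units, samples of 30 and 20.
Nh, nh, s2 = [600, 400], [30, 20], [100.0, 225.0]
se_with = stratified_mean_variability(Nh, nh, s2, replacement=True)
se_without = stratified_mean_variability(Nh, nh, s2, replacement=False)
print(se_with, se_without)
```

Sampling without replacement gives the smaller value here, because the factor ( 1 - nh / Nh ) shrinks every term in the sum.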

The next table shows how to compute the variability for a proportion. Like the previous table, this table shows four sample designs. In this case, however, the designs are based on whether the true population proportion is known and whether the design calls for sampling with or without replacement.

Population proportion    Replacement strategy    Variability
Known                    With replacement        SD = (1 / N) * sqrt { Σ [ Nh^2 * Ph * ( 1 - Ph ) / nh ] }
Known                    Without replacement     SD = (1 / N) * sqrt { Σ [ ( Nh^3 / ( Nh - 1 ) ) * ( 1 - nh / Nh ) * Ph * ( 1 - Ph ) / nh ] }
Estimated                With replacement        SE = (1 / N) * sqrt { Σ [ Nh^2 * ph * ( 1 - ph ) / ( nh - 1 ) ] }
Estimated                Without replacement     SE = (1 / N) * sqrt { Σ [ Nh^2 * ( 1 - nh / Nh ) * ph * ( 1 - ph ) / ( nh - 1 ) ] }
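
The proportion formulas follow the same pattern, with Ph * ( 1 - Ph ) or ph * ( 1 - ph ) standing in for the variance and with nh - 1 in the denominator when the proportion is estimated. A minimal sketch follows; the function name and the example figures are hypothetical, used only for illustration.

```python
import math


def stratified_proportion_variability(Nh, nh, props, known=False, replacement=False):
    """Return SD (known proportions) or SE (estimated proportions) of a stratified proportion."""
    N = sum(Nh)
    total = 0.0
    for N_h, n_h, p_h in zip(Nh, nh, props):
        denom = n_h if known else n_h - 1  # nh - 1 when p_h comes from the sample
        term = N_h**2 * p_h * (1 - p_h) / denom
        if not replacement:
            term *= 1 - n_h / N_h          # finite population correction
            if known:
                term *= N_h / (N_h - 1)    # turns Nh^2 into Nh^3 / (Nh - 1)
        total += term
    return math.sqrt(total) / N


# Hypothetical design: strata of 600 and 400 units, samples of 30 and 20.
sd_known_with = stratified_proportion_variability(
    [600, 400], [30, 20], [0.30, 0.50], known=True, replacement=True
)
se_est_without = stratified_proportion_variability(
    [600, 400], [30, 20], [0.30, 0.50], known=False, replacement=False
)
print(sd_known_with, se_est_without)
```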

Sample Problem

This section presents a sample problem that illustrates how to analyze survey data when the sampling method is proportionate stratified sampling. (In a subsequent lesson, we re-visit this problem and see how stratified sampling compares to other sampling methods.)

Problem 1

At the end of every school year, the state administers a reading test to a sample of third graders. The school system has 20,000 third graders, half boys and half girls.

This year, a proportionate stratified sample was used to select 36 students for testing. Because the population is half boys and half girls, the sample consisted of 18 boys drawn from one stratum and 18 girls drawn from the other. Test scores for the sampled students are shown below:

Boys 50, 55, 60, 62, 62, 65, 67, 67, 70, 70, 73, 73, 75, 78, 78, 80, 85, 90
Girls 70, 70, 72, 72, 75, 75, 78, 78, 80, 80, 82, 82, 85, 85, 88, 88, 90, 90

Using sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level.

Solution: Previously we described how to compute the confidence interval for a mean score. We follow that process below.

  • Identify a sample statistic. For this problem, we use the overall sample mean to estimate the population mean. To compute the overall sample mean, we need to compute the sample means for each stratum. The stratum mean for boys is equal to:

    xboys = Σ ( xi ) / n
    xboys = ( 50 + 55 + 60 + ... + 80 + 85 + 90 ) / 18 = 70

    The stratum mean for girls is computed similarly. It is equal to 80. Therefore, the overall sample mean is:

    x = Σ [ ( Nh / N ) * xh ]
    x = ( 10,000 / 20,000 ) * 70 + ( 10,000 / 20,000 ) * 80 = 75

    Therefore, based on data from the sample strata, we estimate that the mean reading achievement level in the population is equal to 75.

  • Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 95% confidence level.

  • Find the margin of error. Elsewhere on this site, we show how to compute the margin of error when the sampling distribution is approximately normal. The key steps are shown below.

    • Find the standard error of the sampling distribution. First, we estimate the variance of the test scores (sh^2) within each stratum; then we compute the standard error (SE). For boys, the within-stratum sample variance is equal to:

      sh^2 = Σ ( xi - xh )^2 / ( n - 1 )
      sh^2 = [ (50 - 70)^2 + (55 - 70)^2 + (60 - 70)^2 + ... + (85 - 70)^2 + (90 - 70)^2 ] / 17 = 105.41

      The within-stratum sample variance for girls is computed similarly. It is equal to 45.41.

      Using results from the above computations, we compute the standard error (SE):

      SE = (1 / N) * sqrt { Σ [ Nh^2 * ( 1 - nh / Nh ) * sh^2 / nh ] }
      SE = (1 / 20,000) * sqrt { [ 100,000,000 * ( 1 - 18/10,000 ) * 105.41 / 18 ] + [ 100,000,000 * ( 1 - 18/10,000 ) * 45.41 / 18 ] }
      SE = (1 / 20,000) * sqrt { [ 99,820,000 * 105.41 / 18 ] + [ 99,820,000 * 45.41 / 18 ] } = 1.45

      Thus, the standard error of the sampling distribution of the mean is 1.45.

    • Find the critical value. The critical value is a factor used to compute the margin of error. Based on the central limit theorem, we can assume that the sampling distribution of the mean is approximately normal. Therefore, we express the critical value as a z score. To find the critical value, we take these steps.

      • Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 95/100 = 0.05
      • Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975
      • The critical value is the z score having a cumulative probability equal to 0.975. From the Normal Distribution Calculator, we find that the critical value is 1.96.

    • Compute margin of error (ME): ME = critical value * standard error = 1.96 * 1.45 = 2.84

  • Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error. And the uncertainty is denoted by the confidence level.

Therefore, the 95% confidence interval is 72.16 to 77.84, and the margin of error is equal to 2.84. That is, we are 95% confident that the true population mean is in the range defined by 75 ± 2.84.
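
The steps of this solution can be checked with a short Python script. This is a sketch that uses only the standard library; the variable names are our own, and the script assumes the design described in the problem (two strata of 10,000 students each, sampled without replacement, with variances estimated from the sample).

```python
import math
from statistics import NormalDist, mean, variance

boys = [50, 55, 60, 62, 62, 65, 67, 67, 70, 70, 73, 73, 75, 78, 78, 80, 85, 90]
girls = [70, 70, 72, 72, 75, 75, 78, 78, 80, 80, 82, 82, 85, 85, 88, 88, 90, 90]

# Each stratum is (Nh, sampled scores); half of the 20,000 students are in each.
strata = [(10_000, boys), (10_000, girls)]
N = sum(Nh for Nh, _ in strata)

# Sample statistic: the stratum means weighted by Nh / N.
estimate = sum((Nh / N) * mean(scores) for Nh, scores in strata)

# Standard error: estimated variance, sampling without replacement.
se = math.sqrt(sum(
    Nh**2 * (1 - len(scores) / Nh) * variance(scores) / len(scores)
    for Nh, scores in strata
)) / N

# Critical value: z score with cumulative probability 1 - alpha / 2.
alpha = 1 - 95 / 100
z = NormalDist().inv_cdf(1 - alpha / 2)

me = z * se  # margin of error
print(f"mean = {estimate:.2f}, SE = {se:.2f}, ME = {me:.2f}")
print(f"95% CI: {estimate - me:.2f} to {estimate + me:.2f}")
```

Because the script carries unrounded values through every step, its margin of error comes out at about 2.83 rather than 2.84; the hand calculation above rounds the standard error to 1.45 and the critical value to 1.96 before multiplying.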