Stat Trek

Teach yourself statistics

What Is Cluster Sampling?

Cluster sampling refers to a sampling method that has the following properties.

  • The population is divided into N groups, called clusters.
  • The researcher randomly selects n clusters to include in the sample.
  • The number of observations within each cluster, M_i, is known, and M = M_1 + M_2 + M_3 + ... + M_{N-1} + M_N.
  • Each element of the population can be assigned to one, and only one, cluster.

This tutorial covers two types of cluster sampling methods.

  • One-stage sampling. All of the elements within selected clusters are included in the sample.
  • Two-stage sampling. A subset of elements within selected clusters is randomly selected for inclusion in the sample.
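
The two methods can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names, the store data, and the parameter m (the number of elements drawn per selected cluster in stage 2) are all hypothetical and are not part of the tutorial.

```python
import random


def one_stage_cluster_sample(clusters, n, rng):
    """Randomly select n clusters and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n)
    return {cid: list(clusters[cid]) for cid in chosen}


def two_stage_cluster_sample(clusters, n, m, rng):
    """Randomly select n clusters, then up to m elements within each one."""
    chosen = rng.sample(sorted(clusters), n)
    return {
        cid: rng.sample(clusters[cid], min(m, len(clusters[cid])))
        for cid in chosen
    }


# Hypothetical population: N = 4 clusters (stores), each element (customer)
# belongs to exactly one cluster.
clusters = {
    "store_A": ["a1", "a2", "a3", "a4"],
    "store_B": ["b1", "b2", "b3"],
    "store_C": ["c1", "c2", "c3", "c4", "c5"],
    "store_D": ["d1", "d2"],
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
one_stage = one_stage_cluster_sample(clusters, n=2, rng=rng)
two_stage = two_stage_cluster_sample(clusters, n=2, m=2, rng=rng)
print(one_stage)
print(two_stage)
```

In the one-stage result, every customer of each selected store appears. In the two-stage result, only a random subset of customers from each selected store appears.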

Cluster Sampling: Advantages and Disadvantages

Assuming the sample size is constant across sampling methods, cluster sampling generally provides less precision than either simple random sampling or stratified sampling. This is the main disadvantage of cluster sampling.

Given this disadvantage, it is natural to ask: Why use cluster sampling? Sometimes, the cost per sample point is less for cluster sampling than for other sampling methods. Given a fixed budget, the researcher may be able to use a bigger sample with cluster sampling than with the other methods. When the increased sample size is sufficient to offset the loss in precision, cluster sampling may be the best choice.
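
The budget argument is simple arithmetic, sketched below. All of the figures are hypothetical and chosen only for illustration; real costs per sample point depend on the survey.

```python
# Hypothetical figures, for illustration only.
budget = 10_000               # total budget available for data collection
cost_per_point_srs = 50       # cost per observation, simple random sampling
cost_per_point_cluster = 20   # cost per observation, cluster sampling

# Largest sample each method can afford under the same budget.
n_srs = budget // cost_per_point_srs
n_cluster = budget // cost_per_point_cluster

print(n_srs, n_cluster)  # 200 500
```

Under these assumed costs, cluster sampling affords 500 observations where simple random sampling affords 200. Whether the extra 300 observations offset the loss in precision depends on the population being sampled.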

When to Use Cluster Sampling

Cluster sampling should be used only when it is economically justified; that is, when the cost savings are large enough to overcome the loss in precision. This is most likely to occur in the following situations.

  • Constructing a complete list of population elements is difficult, costly, or impossible. For example, it may not be possible to list all of the customers of a chain of hardware stores. However, it would be possible to randomly select a subset of stores (stage 1 of cluster sampling) and then interview a random sample of customers who visit those stores (stage 2 of cluster sampling).
  • The population is concentrated in "natural" clusters (city blocks, schools, hospitals, etc.). For example, to conduct personal interviews of operating room nurses, it might make sense to randomly select a sample of hospitals (stage 1 of cluster sampling) and then interview all of the operating room nurses at each selected hospital. Using cluster sampling, the interviewer could conduct many interviews in a single day at a single hospital. Simple random sampling, in contrast, might require the interviewer to spend all day traveling to conduct a single interview at a single hospital.

Even when the above situations exist, it is often unclear which sampling method should be used. Test different options, using hypothetical data if necessary. Choose the most cost-effective approach; that is, choose the sampling method that delivers the greatest precision for the least cost. (We will talk more about this in a future lesson.)

The Difference Between Strata and Clusters

Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways.

  • All strata are represented in the sample, but only a subset of clusters is.
  • With stratified sampling, the best survey results occur when elements within strata are internally homogeneous. However, with cluster sampling, the best results occur when elements within clusters are internally heterogeneous.
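
The effect of within-cluster heterogeneity can be checked with a small enumeration. The sketch below assumes a hypothetical population of the 200 values 0 through 199, split into N = 20 clusters of 10 elements in two different ways, and computes the variance of the one-stage sample mean over every possible sample of n = 2 clusters. The population, the cluster layouts, and the function name are illustrative assumptions, not part of the tutorial.

```python
from itertools import combinations
from statistics import mean, pvariance

population = list(range(200))  # hypothetical values 0..199
N, size, n = 20, 10, 2         # clusters, elements per cluster, clusters sampled

# Internally homogeneous clusters: consecutive blocks of similar values.
homogeneous = [population[k * size:(k + 1) * size] for k in range(N)]

# Internally heterogeneous clusters: every 20th value, so each cluster
# spans the full range of the population.
heterogeneous = [population[k::N] for k in range(N)]


def estimator_variance(clusters, n):
    """Variance of the one-stage sample mean over all samples of n clusters."""
    estimates = [
        mean(x for cluster in chosen for x in cluster)
        for chosen in combinations(clusters, n)
    ]
    return pvariance(estimates)


var_homogeneous = estimator_variance(homogeneous, n)
var_heterogeneous = estimator_variance(heterogeneous, n)
print(var_homogeneous, var_heterogeneous)
```

Both layouts use the same population and the same sample size, yet the estimate varies far less from sample to sample when each cluster resembles the population as a whole.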