Introduction to Survey Sampling

Sampling is the process of selecting a subset of elements (the sample) from a larger population of elements.

Probability vs. Non-Probability Sampling

Statisticians distinguish between two broad categories of sampling.

  • Probability sampling. With probability sampling, every element of the population has a known probability of being included in the sample.

  • Non-probability sampling. With non-probability sampling, we cannot specify the probability that each element will be included in the sample.

Each approach has advantages and disadvantages. The main advantages of non-probability sampling are convenience and cost. However, with non-probability samples, we cannot make probability statements about our sample statistics. For example, we cannot compute a confidence interval for an estimation problem or a region of acceptance for a hypothesis test.

Probability samples, in contrast, allow us to make probability statements about sample statistics. We can estimate the extent to which a sample statistic is likely to differ from a population parameter. The remainder of this tutorial focuses on probability sampling.
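To make the idea of a "known probability" concrete, here is a minimal sketch of one probability sampling method, simple random sampling without replacement. The function names, the population of 100 numbered elements, and the sample size of 10 are all assumptions chosen for illustration; they do not come from the tutorial.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be repeated
    return rng.sample(population, n)


def inclusion_probability(population_size, n):
    """Under simple random sampling, each element is included with probability n / N."""
    return n / population_size


# Hypothetical population: elements numbered 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
prob = inclusion_probability(len(population), 10)

print(sample)
print(prob)  # 0.1
```

Because the inclusion probability is known in advance (here, 10 out of 100), probability statements about the resulting sample statistics are possible. A convenience sample offers no comparable figure.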

Quality of Survey Results

When researchers describe the quality of survey results, they may use one or more of the following terms.

  • Accuracy. Accuracy refers to how close a sample statistic is to a population parameter. Thus, if you know that a sample mean is 99 and the true population mean is 100, you can make a statement about the sample accuracy. For example, you might say the sample mean is accurate to within 1 unit.

  • Precision. Precision refers to how close estimates from different samples are to each other. The standard error is a measure of precision, and the two are inversely related: when the standard error is small, estimates from different samples are close in value and the estimates are more precise; when the standard error is large, estimates vary more from sample to sample and are less precise.

  • Margin of error. The margin of error expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter. To be meaningful, the margin of error should be qualified by a probability statement. For example, a pollster might report that 50% of voters will choose the Democratic candidate. To indicate the quality of the survey result, the pollster might add that the margin of error is ±5%, with a confidence level of 90%. This means that if the same sampling method were applied repeatedly to different samples, the interval formed by each sample estimate plus or minus its margin of error would contain the true percentage of Democratic voters about 90% of the time.

    The margin of error is equal to half the width of the confidence interval. A previous lesson in this tutorial describes how to construct a confidence interval.
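The relationship between standard error, margin of error, and confidence interval can be sketched in a few lines. This is an illustration under stated assumptions, not the tutorial's own formula: it assumes a simple random sample from a large population and a normal approximation for a sample proportion. The sample size of 271 is hypothetical, chosen only because it gives a margin of error near the pollster's 5% at a 90% confidence level.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(p_hat, n):
    """Standard error of a sample proportion (simple random sample assumed)."""
    return sqrt(p_hat * (1 - p_hat) / n)


def margin_of_error(p_hat, n, confidence):
    """Critical z value times the standard error (normal approximation assumed)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * standard_error(p_hat, n)


p_hat = 0.50       # sample estimate: 50% favor the candidate
n = 271            # hypothetical sample size, not given in the tutorial
confidence = 0.90

moe = margin_of_error(p_hat, n, confidence)
interval = (p_hat - moe, p_hat + moe)

print(round(moe, 3))  # roughly 0.05, i.e. about 5 percentage points
print(interval)       # the margin of error is half the width of this interval
```

A larger sample shrinks the standard error, which in turn narrows the interval: greater precision yields a smaller margin of error at the same confidence level.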

Sample Design

A sample design can be described by two factors.

  • Sampling method. Sampling method refers to the rules and procedures by which some elements of the population are included in the sample. Some common sampling methods are described elsewhere in the tutorial (see simple random sampling, stratified sampling, and cluster sampling).

  • Estimator. The estimation process for calculating sample statistics is called the estimator. Different sampling methods may use different estimators. For example, the formula for computing a mean score with a simple random sample is different from the formula for computing a mean score with a stratified sample. Similarly, the formula for the standard error may vary from one sampling method to the next.
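To see why the estimator depends on the sampling method, compare the mean computed from a simple random sample with the mean computed from a stratified sample. The sketch below uses the usual population-weighted stratified mean; the function names and the two-stratum data are hypothetical and serve only to show that the two formulas can give different answers for the same observations.

```python
def srs_mean(values):
    """Mean for a simple random sample: every observation gets equal weight."""
    return sum(values) / len(values)


def stratified_mean(strata):
    """Mean for a stratified sample.

    strata is a list of (stratum_population_size, sampled_values) pairs.
    Each stratum mean is weighted by that stratum's share of the population.
    """
    total_population = sum(size for size, _ in strata)
    weighted_sum = sum(size * srs_mean(values) for size, values in strata)
    return weighted_sum / total_population


# Hypothetical data: a large stratum (800 elements) and a small one (200 elements).
strata = [
    (800, [10, 12, 14]),
    (200, [30, 34]),
]

pooled = [v for _, values in strata for v in values]

print(srs_mean(pooled))         # 20.0, treats all five observations alike
print(stratified_mean(strata))  # 16.0, weights each stratum by population share
```

In this made-up example the small stratum is over-represented in the sample, so the unweighted formula overstates the mean. The stratified estimator corrects for that by weighting.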

The "best" sample design depends on survey objectives and on survey resources. For example, a researcher might select the most economical design that provides a desired level of precision. Alternatively, if the budget is limited, a researcher might choose the design that provides the greatest precision without going over budget. Other factors may also guide the choice of sample design.
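The two selection rules just described can be written as a short sketch. Everything here is hypothetical: the Design class, the candidate designs, and their costs and standard errors are invented for illustration, and precision is represented by the standard error (smaller means more precise).

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    cost: float            # hypothetical survey cost
    standard_error: float  # hypothetical standard error of the estimate


def cheapest_meeting_precision(designs, max_standard_error):
    """Most economical design whose standard error is small enough."""
    eligible = [d for d in designs if d.standard_error <= max_standard_error]
    return min(eligible, key=lambda d: d.cost) if eligible else None


def most_precise_within_budget(designs, budget):
    """Design with the smallest standard error that does not exceed the budget."""
    affordable = [d for d in designs if d.cost <= budget]
    return min(affordable, key=lambda d: d.standard_error) if affordable else None


# Invented candidates; real costs and standard errors depend on the survey.
candidates = [
    Design("simple random", cost=10_000, standard_error=2.0),
    Design("stratified", cost=12_000, standard_error=1.5),
    Design("cluster", cost=6_000, standard_error=3.0),
]

by_precision = cheapest_meeting_precision(candidates, max_standard_error=2.0)
by_budget = most_precise_within_budget(candidates, budget=8_000)

print(by_precision.name)  # simple random
print(by_budget.name)     # cluster
```

Both functions return None when no candidate satisfies the constraint, which signals that the precision target or the budget would need to be revisited.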