Statistics Tutorial: Introduction to Survey Sampling
Sampling refers to the process of choosing a
sample of elements from a total
population of elements.
Probability vs. Non-Probability Sampling
Statisticians distinguish between two broad categories of sampling.
-
Probability sampling. With probability sampling, every element
of the population has a known probability of being included in the sample.
-
Non-probability sampling. With non-probability sampling, we
cannot specify the probability that each element will be included in the
sample.
Each approach has advantages and disadvantages. The main advantages of
non-probability sampling are convenience and cost. However, with
non-probability samples, we cannot make probability statements about our sample
statistics. For example, we cannot compute a
confidence interval for an estimation problem or a
region of acceptance for a hypothesis test.
Probability samples, in contrast, allow us to make probability statements about
sample statistics. We can estimate the extent to which a sample
statistic is likely to differ from a population
parameter. The remainder of this tutorial focuses on probability
sampling.
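To make the idea of a "known probability of being included" concrete, the following is a minimal sketch of one probability sampling method, simple random sampling without replacement. The population of 100 numbered elements, the sample size of 10, and the function names are illustrative assumptions, not part of the tutorial. Under this method every element has the same inclusion probability, n / N.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements without replacement.

    Every element of the population has the same chance of selection.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


def inclusion_probability(population_size, n):
    """Known inclusion probability under simple random sampling: n / N."""
    return n / population_size


# Hypothetical population: 100 element IDs (an assumption for illustration).
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=42)
prob = inclusion_probability(len(population), 10)

print("sampled elements:", sample)
print("inclusion probability of each element:", prob)
```

With a convenience sample, by contrast, there is no such formula to write down, which is why probability statements about the sample statistics are not available.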
Quality of Survey Results
When researchers describe the quality of survey results, they may use one or
more of the following terms.
-
Accuracy. Accuracy refers to how close a sample
statistic is to a population
parameter. Thus, if you know that a sample mean is 99 and the true
population mean is 100, you can make a statement about the sample accuracy. For
example, you might say the sample mean is accurate to within 1 unit.
-
Precision. Precision refers to how close estimates from
different samples are to each other. For example, the
standard error is a measure of precision. Precision is inversely related to
the standard error: when the standard error is small, estimates from different
samples are close in value and therefore more precise; when the standard error
is large, estimates vary more from sample to sample and are less precise.
-
Margin of error. The margin of error expresses the maximum
expected difference between the true population parameter and a sample estimate of that
parameter. To be meaningful, the margin of error should be qualified by a probability
statement. For example, a pollster might report that 50% of voters will choose
the Democratic candidate. To indicate the quality of the survey result, the
pollster might add that the margin of error is ±5%, with a
confidence level of 90%. This means that if the same sampling method were
applied to many different samples, the sample estimate would fall within 5
percentage points of the true percentage of Democratic voters about 90% of the time.
The margin of error is equal to half of the width of the
confidence interval. A previous lesson in this tutorial described
how to construct a confidence interval.
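The relationship among the standard error, the margin of error, and the confidence interval can be sketched for the polling example above. This is a minimal sketch that assumes a simple random sample and the usual normal approximation for a sample proportion. The sample size of 271 is a hypothetical value chosen for illustration (the tutorial does not state one), and the function names are likewise assumptions.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence):
    """Normal-approximation margin of error for a sample proportion.

    Assumes a simple random sample of size n.
    """
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error


def confidence_interval(p_hat, n, confidence):
    """Interval: sample estimate plus or minus the margin of error."""
    me = margin_of_error(p_hat, n, confidence)
    return p_hat - me, p_hat + me


# Pollster example: 50% support, 90% confidence, hypothetical n = 271.
me = margin_of_error(0.50, 271, 0.90)
low, high = confidence_interval(0.50, 271, 0.90)

print(f"margin of error: {me:.3f}")
print(f"confidence interval: ({low:.3f}, {high:.3f})")
print(f"half the interval width: {(high - low) / 2:.3f}")
```

Under these assumptions the margin of error comes out at roughly 0.05, and half the width of the interval equals the margin of error, as the text states. A larger sample would shrink the standard error and therefore the margin of error.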
Sample Design
A sample design can be described by two factors.
-
Sampling method. Sampling method refers to the rules and
procedures by which some elements of the population are included in the sample.
Some common sampling methods are described elsewhere in the tutorial (see
simple random sampling, stratified sampling,
and cluster sampling).
-
Estimator. The estimation process for calculating sample
statistics is called the estimator. Different sampling methods may use
different estimators. For example, the formula for computing a mean score with
a simple random sample is different from the formula for computing a mean score
with a stratified sample. Similarly, the formula for the
standard error may vary from one sampling method to the next.
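To illustrate how the estimator depends on the sampling method, here is a minimal sketch that contrasts the mean from a simple random sample with the standard stratified mean, in which each stratum's sample mean is weighted by that stratum's share of the population. The tutorial does not give these formulas at this point, so the function names and the two-stratum data are hypothetical values chosen for illustration.

```python
def srs_mean(values):
    """Estimator for a simple random sample: the unweighted sample mean."""
    return sum(values) / len(values)


def stratified_mean(strata):
    """Estimator for a stratified sample.

    strata is a list of (stratum_population_size, sample_values) pairs.
    Each stratum mean is weighted by its share of the total population.
    """
    total_population = sum(size for size, _ in strata)
    return sum(
        (size / total_population) * srs_mean(values)
        for size, values in strata
    )


# Hypothetical data: stratum A has 800 population elements, stratum B has 200.
strata = [
    (800, [10, 12, 14]),
    (200, [30, 34]),
]

# Treating all five observations as one simple random sample ignores the strata.
pooled = srs_mean([10, 12, 14, 30, 34])
weighted = stratified_mean(strata)

print("unweighted mean of pooled observations:", pooled)
print("stratified (population-weighted) mean:", weighted)
```

The two results differ here because stratum B is overrepresented in the sample relative to its share of the population, so applying the simple random sample formula to stratified data would give a misleading estimate.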
The "best" sample design depends on survey objectives and on survey resources.
For example, a researcher might select the most economical design that provides
a desired level of precision. Or, if the budget is limited, a researcher might
choose the design that provides the greatest precision without going over
budget. Or other factors might guide the choice of sample design.
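The two decision rules described above can be sketched as a small selection routine. Everything in this sketch is hypothetical: the design names, costs, and standard errors are invented purely for illustration, and in practice a researcher would have to estimate both quantities for each candidate design.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    cost: float            # projected survey cost
    standard_error: float  # projected standard error (smaller is more precise)


def cheapest_meeting_precision(designs, max_standard_error):
    """Most economical design whose standard error meets the target."""
    eligible = [d for d in designs if d.standard_error <= max_standard_error]
    return min(eligible, key=lambda d: d.cost) if eligible else None


def most_precise_within_budget(designs, budget):
    """Design with the smallest standard error that does not exceed the budget."""
    affordable = [d for d in designs if d.cost <= budget]
    return min(affordable, key=lambda d: d.standard_error) if affordable else None


# Hypothetical candidates (illustrative numbers only).
designs = [
    Design("simple random", cost=10_000, standard_error=2.0),
    Design("stratified", cost=12_000, standard_error=1.5),
    Design("cluster", cost=6_000, standard_error=3.0),
]

by_precision = cheapest_meeting_precision(designs, max_standard_error=2.0)
by_budget = most_precise_within_budget(designs, budget=8_000)

print("cheapest design with standard error <= 2.0:", by_precision.name)
print("most precise design costing <= 8000:", by_budget.name)
```

Both functions return None when no candidate satisfies the constraint, which corresponds to the case where the objectives or the budget would have to be revised.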