Teach yourself statistics

Populations and Samples

The study of statistics revolves around the study of datasets. This lesson describes two important types of datasets: populations and samples. Along the way, we'll introduce simple random sampling, the main method used in this tutorial to select samples.

Population vs Sample

The main difference between a population and a sample has to do with how observations are assigned to the dataset.

A population includes all of the elements from a set of data.
A sample consists of one or more observations drawn from the population.

Depending on the sampling method, a sample can have fewer observations than the population, the same number of observations, or (when sampling with replacement) more observations. More than one sample can be derived from the same population.

Other differences have to do with nomenclature, notation, and computations. For example,

A measurable characteristic of a population, such as a mean or standard deviation, is called a parameter, whereas a measurable characteristic of a sample is called a statistic.
We will see in future lessons that the mean of a population is denoted by the symbol μ, whereas the mean of a sample is denoted by the symbol x̄.
We will also learn in future lessons that the formula for the standard deviation of a population is different from the formula for the standard deviation of a sample.
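As a brief preview of these differences, the Python sketch below computes a parameter (the population mean μ and population standard deviation) and the corresponding statistics (the sample mean x̄ and sample standard deviation) using the standard-library statistics module. The data values are hypothetical and chosen only for illustration. The key point is that statistics.pstdev divides squared deviations by N, while statistics.stdev divides by n − 1.

```python
import statistics

# Hypothetical population of N = 5 values (illustrative data only).
population = [2, 4, 4, 6, 9]
# Hypothetical sample of n = 3 values taken from that population.
sample = [2, 4, 6]

# Parameter: the population mean (mu), computed from every element.
mu = statistics.mean(population)
# Statistic: the sample mean (x-bar), computed from the sample only.
x_bar = statistics.mean(sample)

# Population standard deviation: squared deviations divided by N.
sigma = statistics.pstdev(population)
# Sample standard deviation: squared deviations divided by n - 1.
s = statistics.stdev(sample)

print("population mean:", mu, " sample mean:", x_bar)
print("population SD:", round(sigma, 3), " sample SD:", round(s, 3))
```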

What is Simple Random Sampling?

A sampling method is a procedure for selecting sample elements from a population. Simple random sampling refers to a sampling method that has the following properties.

The population consists of N objects.
The sample consists of n objects.
All possible samples of n objects are equally likely to occur.
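The third property can be checked empirically. The sketch below, which assumes a hypothetical population of N = 4 labelled members and samples of n = 2 drawn without replacement, lists every possible sample with itertools.combinations and then draws many samples with random.sample. Each of the C(4, 2) = 6 possible samples should turn up about one-sixth of the time.

```python
import itertools
import math
import random
from collections import Counter

# Hypothetical population of N = 4 labelled members.
population = ["A", "B", "C", "D"]
n = 2  # sample size

# Every possible sample of n distinct members (order ignored).
all_samples = list(itertools.combinations(population, n))
print("number of possible samples:", len(all_samples), "=", math.comb(len(population), n))

# Draw many simple random samples and count how often each one occurs.
rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 60_000
counts = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(trials))

for combo in all_samples:
    # Each relative frequency should be close to 1 / C(N, n) = 1/6.
    print(combo, round(counts[combo] / trials, 3))
```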

An important benefit of simple random sampling is that it allows researchers to use statistical methods to analyze sample results. For example, given a simple random sample, researchers can use statistical methods to define a confidence interval around a sample mean or to test hypotheses about population parameters. This kind of statistical analysis is not appropriate when non-random sampling methods are used.
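To make the confidence-interval example concrete, here is a minimal sketch that assumes a simple random sample and uses a large-sample normal approximation, x̄ ± z · s / √n. This is only one way to build such an interval; for small samples like the hypothetical one below, a t-based interval (wider) is usually preferred. The function name approx_confidence_interval and the data are illustrative assumptions, not part of this tutorial.

```python
import math
import statistics

def approx_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation x_bar +/- z * s / sqrt(n), which assumes
    a simple random sample that is reasonably large.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    z = statistics.NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements from a simple random sample (illustrative only).
data = [12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9]
low, high = approx_confidence_interval(data)
print(f"approx. 95% CI for the population mean: ({low:.2f}, {high:.2f})")
```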

There are many ways to obtain a simple random sample. One way would be the lottery method. Each of the N population members is assigned a unique number. The numbers are placed in a bowl and thoroughly mixed. Then, a blindfolded researcher selects n numbers. Population members having the selected numbers are included in the sample.
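The lottery method translates almost step for step into code. In this sketch, shuffling the list of numbers stands in for mixing the bowl, and taking the first n numbers stands in for the blindfolded draw. The function name lottery_sample and the student names are hypothetical; numbers are set aside once drawn, so this version samples without replacement.

```python
import random

def lottery_sample(population, n, rng=random):
    """Lottery-method sketch: number each member, mix, draw n numbers."""
    if not 0 <= n <= len(population):
        raise ValueError("n must be between 0 and the population size")
    # Assign each of the N members a unique number 1..N.
    numbered = {i: member for i, member in enumerate(population, start=1)}
    bowl = list(numbered)  # the numbers placed in the bowl
    rng.shuffle(bowl)      # "thoroughly mixed"
    drawn = bowl[:n]       # draw n numbers; none is put back
    # Members whose numbers were drawn form the sample.
    return [numbered[k] for k in drawn]

# Hypothetical population of 10 named members.
students = ["Ana", "Ben", "Cai", "Dev", "Eli", "Fay", "Gus", "Hal", "Ivy", "Jo"]
print(lottery_sample(students, 3, random.Random(42)))
```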

Sampling With Replacement and Without Replacement

Suppose we use the lottery method described above to select a simple random sample. After we pick a number from the bowl, we can put the number aside or we can put it back into the bowl. If we put the number back in the bowl, it may be selected more than once; if we put it aside, it can be selected only one time.

When a population element can be selected more than one time, we are sampling with replacement. When a population element can be selected only one time, we are sampling without replacement.
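Python's standard random module happens to offer both schemes directly: random.sample draws without replacement, and random.choices draws with replacement. The sketch below uses a hypothetical bowl of five numbers to show that repeats (and a sample larger than the population) are possible only when sampling with replacement.

```python
import random

rng = random.Random(7)  # fixed seed for a reproducible illustration
bowl = [1, 2, 3, 4, 5]  # hypothetical population of N = 5 numbered members

# Without replacement: each number is set aside after it is drawn,
# so no number repeats and n cannot exceed N.
without = rng.sample(bowl, 5)

# With replacement: each number goes back in the bowl before the next draw,
# so numbers may repeat and n may exceed N.
with_repl = rng.choices(bowl, k=8)

print("without replacement:", without)
print("with replacement:   ", with_repl)

# random.sample refuses a sample larger than the population.
try:
    rng.sample(bowl, 8)
except ValueError as err:
    print("without replacement, n = 8 fails:", err)
```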

Test Your Understanding

Problem 1

Which of the following statements are true?

I. The mean of a population is denoted by x̄.
II. Sample size is never bigger than population size.
III. The population mean is a statistic.

(A) I only.
(B) II only.
(C) III only.
(D) All of the above.
(E) None of the above.

Solution

The correct answer is (E), none of the above.

The mean of a population is denoted by μ; the mean of a sample is denoted by x̄. When sampling with replacement, sample size can be greater than population size. And the population mean is a parameter; the sample mean is a statistic.
