Statistics Tutorial: Patterns in Data
Graphical displays are useful for seeing patterns in data. Patterns
in data are commonly described in terms of center, spread, shape,
and unusual features.
Center
Graphically, the center of a distribution is located at the median
of the distribution. This is the point in a graphic display where
about half of the observations fall on each side. In the chart to the
right, the height of each column indicates the frequency of observations.
Here, the observations are centered over 4.
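The description of center above can be checked numerically. The following is a minimal sketch in Python; the observations are hypothetical values chosen so that the counts peak at 4, and are not the data behind the original chart.

```python
from collections import Counter
from statistics import median

# Hypothetical observations (not the tutorial's data), peaking at 4.
observations = [2, 3, 3, 4, 4, 4, 5, 5, 6]

# The median is the value with about half of the observations on each side.
center = median(observations)

# Frequency of each value, i.e. the column heights in a bar chart.
frequencies = Counter(observations)

# Count the observations strictly below and strictly above the center.
below = sum(1 for x in observations if x < center)
above = sum(1 for x in observations if x > center)

print("center:", center)
print("below:", below, "above:", above)
```

With these values, the same number of observations lies on each side of the median, which is what "centered" means in the text.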
Spread
The spread of a distribution refers to the variability of the
data. If the observations cover a wide range, the spread is
larger. If the observations are clustered around a single value, the
spread is smaller.
Consider the figures above. In the figure on
the left, data values range from 3 to 7; whereas in the figure on the right,
values range from 1 to 9. The figure on the right is more
variable, so it has the greater spread.
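The comparison above can be sketched in a few lines of Python. The two data sets are hypothetical; they were chosen only so that their values run from 3 to 7 and from 1 to 9, as in the description, and the helper name `value_range` is an illustrative choice.

```python
# Hypothetical data sets matching the ranges described in the text.
narrow = [3, 4, 4, 5, 5, 5, 6, 6, 7]  # values from 3 to 7
wide = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # values from 1 to 9


def value_range(data):
    """Return the range: largest observation minus smallest observation."""
    return max(data) - min(data)


narrow_range = value_range(narrow)
wide_range = value_range(wide)

# The data set with the larger range has the greater spread.
print("narrow range:", narrow_range)
print("wide range:", wide_range)
print("wide has greater spread:", wide_range > narrow_range)
```

The range is only one simple measure of spread, but it is the one this comparison relies on.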
Shape
The shape of a distribution is described by the following
characteristics.
- Symmetry. When it is graphed, a symmetric
distribution can be divided at the center so that each half is
a mirror image of the other.
- Number of peaks. Distributions can have few or
many peaks. Distributions with one clear peak are called
unimodal, and distributions with two clear
peaks are called bimodal. When a symmetric
distribution has a single peak at the center, it is referred
to as bell-shaped.
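- Skewness. When they are displayed graphically,
some distributions have many more observations on one side
of the graph than the other. Skewness is named for the direction
of the long tail, not for the side where most observations sit.
Distributions with most of their observations on the left (toward
lower values) and a tail stretching toward higher values are said to be
skewed right; distributions with most of their observations on the
right (toward higher values) and a tail stretching toward lower values
are said to be skewed left.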
- Uniform. When the observations in a set of
data are equally spread across the range of the distribution,
the distribution is called a uniform distribution.
A uniform distribution has no clear peaks.
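The direction of skew described above can also be estimated from the numbers alone. The sketch below uses the moment coefficient of skewness, a common measure that this tutorial does not itself define; the `tolerance` cutoff, the function names, and the sample data are all assumptions made for illustration.

```python
from statistics import fmean


def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5.

    Positive values suggest a longer tail on the right, negative values
    a longer tail on the left. Undefined if all values are identical.
    """
    m = fmean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


def describe_skew(data, tolerance=0.1):
    """Label the shape; `tolerance` is an arbitrary, hypothetical cutoff."""
    g = skewness(data)
    if g > tolerance:
        return "skewed right"
    if g < -tolerance:
        return "skewed left"
    return "roughly symmetric"


# Hypothetical samples: most observations low, most high, and balanced.
mostly_low = [1, 1, 1, 2, 2, 3, 4, 7]
mostly_high = [7, 7, 7, 6, 6, 5, 4, 1]
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]

for sample in (mostly_low, mostly_high, balanced):
    print(sample, "->", describe_skew(sample))
```

Note that the sample with most of its observations at low values is the one labeled skewed right, which matches the naming convention in the list above.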
Here are some examples of distributions and shapes.
[Figures not shown. Panel titles: Symmetric, unimodal, bell-shaped; Skewed right; Non-symmetric, bimodal; Uniform; Skewed left; Symmetric, bimodal.]
Unusual Features
Sometimes, statisticians refer to unusual features in a set of data.
The two most common unusual features are gaps and outliers.
- Gaps. Gaps refer to areas of a distribution
where there are no observations. The first figure below has
a gap; there are no observations in the middle of the
distribution.
- Outliers. Sometimes, distributions are
characterized by extreme values that differ greatly from
the other observations. These extreme values are called
outliers. The second figure below illustrates a distribution
with an outlier. Except for one lonely observation (the outlier
on the extreme right), all of the observations fall between 0 and 4.
As a "rule of thumb", an extreme value is often considered to be an
outlier if it is at least 1.5 interquartile ranges below the first
quartile (Q1), or at least 1.5 interquartile ranges above the third
quartile (Q3). The interquartile range is the difference Q3 - Q1.
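The rule of thumb above translates directly into code. This is a minimal sketch using Python's standard `statistics.quantiles`; the function name `iqr_outliers` and the sample data are hypothetical, and the `"inclusive"` quartile method is an assumption, since quartile conventions differ between textbooks and software and can shift the cutoffs slightly.

```python
from statistics import quantiles


def iqr_outliers(data):
    """Return values at least 1.5 IQRs below Q1 or above Q3."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x <= lower_fence or x >= upper_fence]


# Hypothetical data: every observation lies between 0 and 4 except one.
sample = [0, 1, 1, 2, 2, 2, 3, 3, 4, 12]

print("outliers:", iqr_outliers(sample))
```

In this sample, only the single observation far to the right of the others is flagged, mirroring the outlier figure described above.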