What is a Scatterplot?
A scatterplot is a graphic tool used to display
the relationship between two
quantitative variables.
How to Read a Scatterplot
A scatterplot consists of an X axis (the horizontal axis), a
Y axis (the vertical axis), and a series of dots.
Each dot on the scatterplot represents one observation
from a data set. The position of the dot on the scatterplot
represents its X and Y values.
Let's work through an example. Here is a
table showing the height and weight of five starters on a high
school basketball team.
Height, inches | Weight, pounds
67 | 155
72 | 220
77 | 240
74 | 195
69 | 175
And here is the same data displayed in a scatterplot.
Each player in the table is represented by a dot on the scatterplot.
The leftmost dot, for example, represents the shortest, lightest player.
From the scale on the X axis, you see that the shortest player is 67
inches tall; and from the scale on the Y axis, you see that this player
weighs 155 pounds. In a similar way, you can read the height and
weight of every other player represented on the scatterplot.
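To make the dot-per-observation idea concrete, here is a minimal Python sketch that turns the table into (X, Y) pairs and draws them on a coarse character grid. The helper name `text_scatter` and the grid size are assumptions made for illustration; a plotting library would normally draw the real chart.

```python
# Heights (inches) and weights (pounds) from the table above.
heights = [67, 72, 77, 74, 69]
weights = [155, 220, 240, 195, 175]

# Each observation becomes one (x, y) dot: x = height, y = weight.
dots = list(zip(heights, weights))

# The leftmost dot has the smallest X value, i.e. the shortest player.
first_dot = min(dots)  # tuples compare by x first


def text_scatter(points, cols=21, rows=10):
    """Render points on a character grid (hypothetical helper, not a library call)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in points:
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


plot = text_scatter(dots)
print("first dot (height, weight):", first_dot)
print(plot)
```

The shortest, lightest player lands in the bottom-left corner of the grid and the tallest, heaviest player in the top-right corner, which mirrors how the dots are read from the axes.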
Patterns of Data in Scatterplots
Scatterplots are used to analyze patterns in
bivariate data.
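These patterns are described in terms of linearity, slope, and
strength.
- Linearity refers to whether the data pattern is linear (straight)
or nonlinear (curved).
- Slope refers to the direction of change in variable Y as variable X
gets bigger. If Y also gets bigger, the slope is positive; if Y gets
smaller, the slope is negative.
- Strength refers to the degree of "scatter" in the plot. If the
dots are widely spread, the relationship between variables is
weak. If the dots are concentrated around a line, the
relationship is strong.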
Additionally, scatterplots can reveal unusual features in data sets, such
as clusters, gaps, and
outliers.
The scatterplots below illustrate some common patterns.
[Figure: six example scatterplots, in this order]
- Linear, positive slope, weak
- Linear, zero slope, strong
- Linear, negative slope, strong, with outlier
- Nonlinear, positive slope, weak
- Nonlinear, negative slope, strong, with gap
- Nonlinear, zero slope, weak
The pattern in the last example (nonlinear, zero slope, weak) is
the pattern that is found when two variables are not related.
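Slope and strength can also be put into rough numbers. The sketch below is not part of this lesson's method: it assumes a least-squares line for the slope and Pearson's correlation coefficient r for strength, and applies both to the basketball data from the earlier table. The function name `slope_and_r` is hypothetical.

```python
import math

# Heights (inches) and weights (pounds) from the basketball table.
heights = [67, 72, 77, 74, 69]
weights = [155, 220, 240, 195, 175]


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared and cross deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


slope, r = slope_and_r(heights, weights)
# The sign of the slope gives the direction of the pattern;
# |r| close to 1 means the dots sit close to a straight line.
print(f"slope = {slope:.2f} pounds per inch, r = {r:.2f}")
```

Keep in mind that r only measures how close the dots are to a straight line, so a tight but curved pattern can be strong in the sense used here and still have an r near zero.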
Test Your Understanding
Problem 1
The scatterplot below shows the relation between two variables.
Which of the following statements are true?
I. The relation is strong.
II. The slope is positive.
III. The slope is negative.
(A) I only
(B) II only
(C) III only
(D) I and II
(E) I and III
Solution
The correct answer is (A). The relation is strong because the dots
are tightly clustered around a line. Note that the line does not
have to be straight for a relationship to be strong. In this case,
the line is a U-shaped curve.
Across the entire scatterplot, the slope is zero. In the first half
of the scatterplot, the Y variable gets smaller as the X variable gets
bigger; so the slope in the first half of the scatterplot is
negative. But in the second half of the scatterplot, just the
opposite occurs. The Y variable gets bigger as the X variable gets
bigger; so the slope in the second half is positive. When the
slope is positive in one half of a
symmetric scatterplot and negative in the
other half, the slope for the entire scatterplot is zero.
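The half-by-half argument can be checked with a small sketch. It assumes made-up, perfectly symmetric U-shaped data (y = x squared), not the data behind the scatterplot in the problem, and it uses a least-squares slope as a stand-in for "the slope"; the function name is hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Illustrative symmetric U-shaped data (an assumption, not the problem's data).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

left_slope = least_squares_slope(xs[:4], ys[:4])   # first half: Y falls as X rises
right_slope = least_squares_slope(xs[3:], ys[3:])  # second half: Y rises as X rises
overall_slope = least_squares_slope(xs, ys)        # the two halves cancel out

print(left_slope, right_slope, overall_slope)
```

The first half yields a negative slope, the second half a positive slope of the same size, and the slope over the full range is zero, even though every dot lies exactly on the curve.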