Stat Trek

Teach yourself statistics


What is a Scatterplot?

A scatterplot is a graphic tool used to display the relationship between two quantitative variables.

How to Read a Scatterplot

A scatterplot consists of an X axis (the horizontal axis), a Y axis (the vertical axis), and a series of dots. Each dot on the scatterplot represents one observation from a data set. The position of the dot on the scatterplot represents its X and Y values.

Let's work through an example. Here is a table showing the height and weight of five starters on a high school basketball team.

Height (inches)    Weight (pounds)
67                 155
72                 220
77                 240
74                 195
69                 175

And here is the same data displayed in a scatterplot.

[Scatterplot: weight (pounds) versus height (inches) for the five players]

Each player in the table is represented by a dot on the scatterplot. The leftmost dot, for example, represents the shortest player, who is also the lightest. From the scale on the X axis, you see that this player is 67 inches tall; and from the scale on the Y axis, you see that he or she weighs 155 pounds. In the same way, you can read the height and weight of every other player from the scatterplot.
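To make the idea concrete, here is a minimal text-only sketch that turns the five rows of the table into dots on a grid. It assumes no plotting library is available; the function name ascii_scatter and the 40 x 12 grid size are illustrative choices, not part of any standard tool. Each (height, weight) pair becomes one '*', with X increasing to the right and Y increasing upward.

```python
# Height (inches) and weight (pounds) of the five players from the table.
players = [(67, 155), (72, 220), (77, 240), (74, 195), (69, 175)]


def ascii_scatter(points, width=40, height=12):
    """Return a list of text rows with one '*' per (x, y) point.

    X increases to the right and Y increases upward, as on a scatterplot.
    Assumes the points do not all share the same x or the same y value.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value onto the grid: column from X, row from Y.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is printed first (at the top), so flip the Y direction.
        grid[height - 1 - row][col] = "*"
    return ["".join(cells) for cells in grid]


for line in ascii_scatter(players):
    print("|" + line)
print("+" + "-" * 40)

# The leftmost dot is the shortest player; read both coordinates from it.
shortest = min(players)  # tuples compare by height first
print(f"Shortest player: {shortest[0]} inches, {shortest[1]} pounds")
```

The shortest player's dot lands in the bottom-left corner of the grid, and the tallest, heaviest player's dot lands in the top-right corner, matching how the scatterplot is read above.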

Patterns of Data in Scatterplots

Scatterplots are used to analyze patterns in bivariate data. These patterns are described in terms of linearity, slope, and strength.

  • Linearity refers to whether a data pattern is linear (straight) or nonlinear (curved).
  • Slope refers to the direction of change in variable Y as variable X gets bigger. If Y also gets bigger, the slope is positive; if Y gets smaller, the slope is negative; if Y shows no consistent upward or downward trend, the slope is zero.
  • Strength refers to the degree of "scatter" in the plot. If the dots are widely spread, the relationship between the variables is weak. If the dots are tightly concentrated around a line or curve, the relationship is strong.
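The descriptions above are visual, but slope and strength can also be checked numerically. The sketch below uses two standard measures that this page does not itself introduce, so treat them as an added assumption: the least-squares slope (its sign gives the direction) and Pearson's correlation coefficient r (values near +1 or -1 mean the dots sit close to a straight line). Note that r measures only linear strength, so it can miss a strong curved pattern. The data are the basketball heights and weights from the table; the helper name slope_and_r is hypothetical.

```python
from math import sqrt

heights = [67, 72, 77, 74, 69]       # X, inches
weights = [155, 220, 240, 195, 175]  # Y, pounds


def slope_and_r(xs, ys):
    """Least-squares slope and Pearson correlation r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx          # sign gives the direction (positive/negative)
    r = sxy / sqrt(sxx * syy)  # |r| near 1 = dots close to a straight line
    return slope, r


slope, r = slope_and_r(heights, weights)
direction = "positive" if slope > 0 else "negative" if slope < 0 else "zero"
print(f"slope = {slope:.2f} pounds per inch ({direction}), r = {r:.2f}")
# prints: slope = 7.75 pounds per inch (positive), r = 0.90
```

For these five players, the pattern is positive (taller players tend to be heavier) and fairly strong, with r close to 0.9.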

Additionally, scatterplots can reveal unusual features in data sets, such as clusters, gaps, and outliers. The scatterplots below illustrate some common patterns.

[Scatterplot: Linear, positive slope, weak]

[Scatterplot: Linear, zero slope, strong]

[Scatterplot: Linear, negative slope, strong, with outlier]

[Scatterplot: Nonlinear, positive slope, weak]

[Scatterplot: Nonlinear, negative slope, strong, with gap]

[Scatterplot: Nonlinear, zero slope, weak]
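An outlier is a dot that sits far from the pattern formed by the rest of the data. This page identifies outliers by eye; the sketch below shows one common numerical rule of thumb, which is an assumption rather than the page's method: fit a least-squares line and flag any point whose residual exceeds about 2 residual standard deviations. The data (a perfect line y = 20 - 2x with one point pushed upward) and the helper name flag_outliers are hypothetical.

```python
from math import sqrt

# Hypothetical data: a perfect line y = 20 - 2x, except one point pushed up.
xs = list(range(1, 11))
ys = [20 - 2 * x for x in xs]
ys[4] = 20  # the point at x = 5 would be 10 on the line


def flag_outliers(xs, ys, cutoff=2.0):
    """Return (x, y) points whose residual from the least-squares line
    exceeds `cutoff` residual standard deviations (an assumed rule of thumb)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope of the fitted line
    a = mean_y - b * mean_x  # intercept of the fitted line
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # residual standard error
    return [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > cutoff * s]


print(flag_outliers(xs, ys))  # prints: [(5, 20)]
```

Because the outlier itself pulls the fitted line toward it, a rule like this works best when outliers are few; with very small data sets, looking at the scatterplot remains the simpler check.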

The final pattern above (nonlinear, zero slope, weak) is the pattern typically found when two variables are not related.
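A quick simulation shows what "not related" looks like in numbers. In this sketch, which uses simulated data not taken from the page, X and Y are drawn independently at random, so there is no relationship by construction. Pearson's r (an added measure, as it is not defined on this page) then comes out close to 0, consistent with a zero slope and widely scattered dots. Keep in mind that r near 0 does not by itself rule out a relationship: a symmetric U-shaped pattern, like the one in Problem 1 below, can also give r near 0. The seed value and sample size are arbitrary choices.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
# Simulated data: X and Y are drawn independently, so they are unrelated.
xs = [rng.uniform(0, 100) for _ in range(1000)]
ys = [rng.uniform(0, 100) for _ in range(1000)]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")  # close to 0: no consistent slope, widely scattered dots
```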

Test Your Understanding

Problem 1

The scatterplot below shows the relation between two variables.

[Scatterplot for Problem 1]

Which of the following statements are true?

I. The relation is strong.
II. The slope is positive.
III. The slope is negative.

(A) I only
(B) II only
(C) III only
(D) I and II
(E) I and III

Solution

The correct answer is (A). The relation is strong because the dots are tightly clustered around a curve. Note that the pattern does not have to be a straight line for a relationship to be strong; in this case, the dots follow a U-shaped curve.

Across the entire scatterplot, the slope is zero, so statements II and III are both false. In the left half of the scatterplot, the Y variable gets smaller as the X variable gets bigger, so the slope there is negative. In the right half, just the opposite occurs: the Y variable gets bigger as the X variable gets bigger, so the slope there is positive. When the slope is positive in one half of a symmetric scatterplot and negative in the other half, the slope for the entire scatterplot is zero.
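The argument in the solution can be checked on a small example. This sketch assumes hypothetical U-shaped data, y = x squared for x from -5 to 5 (symmetric about x = 0), and uses the least-squares slope as a numerical stand-in for "slope", which is an added assumption. Splitting at x = 0, with the bottom point shared by both halves, is an illustrative choice. The helper name ls_slope is hypothetical.

```python
def ls_slope(points):
    """Least-squares slope of Y on X for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical U-shaped data, symmetric about x = 0: y = x squared.
points = [(x, x * x) for x in range(-5, 6)]
left = [p for p in points if p[0] <= 0]   # Y falls as X rises
right = [p for p in points if p[0] >= 0]  # Y rises as X rises

print(f"left half:  {ls_slope(left):+.1f}")    # prints: left half:  -5.0
print(f"right half: {ls_slope(right):+.1f}")   # prints: right half: +5.0
print(f"whole plot: {ls_slope(points):+.1f}")  # prints: whole plot: +0.0
```

The two halves have equal and opposite slopes, and they cancel across the whole plot, even though every dot lies exactly on the curve, so the relationship itself is perfectly strong.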