Stat Trek: Teach yourself statistics

Statistics Tutorial: Comparing Distributions

Common graphical displays (e.g., dotplots, boxplots, stemplots, bar charts) can be effective tools for comparing data from two or more populations.

How to Compare Distributions

When you compare two or more data sets, focus on four features:

  • Center. Graphically, the center of a distribution is the point where about half of the observations are on either side.
  • Spread. The spread of a distribution refers to the variability of the data. If the observations cover a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.
  • Shape. The shape of a distribution is described by symmetry, skewness, number of peaks, etc.
  • Unusual features. Unusual features refer to gaps (areas of the distribution where there are no observations) and outliers.

The remainder of this lesson shows how to interpret various graphs in terms of center, spread, shape, and unusual features. This is a skill that will probably be tested on the Advanced Placement (AP) Statistics Exam.

Dotplots

Block A
*
*  *
*  *  *
*  *  *  *
*  *  *  *  *
+--+--+--+--+--+--+
Block B
      *
   *  *  *
*  *  *  *
*  *  *  *  *  *  *
+--+--+--+--+--+--+
0  1  2  3  4  5  6
Number of pets per household

When dotplots are used to compare distributions, they are positioned one above the other, using the same scale of measurement, as shown above.

The dotplots above show pet ownership in homes on two city blocks. Pet ownership is a little lower in block A. In block A, most households have zero or one pet; in block B, most households have two or more pets. In block A, pet ownership is skewed right; in block B, it is roughly bell-shaped. In block B, pet ownership ranges from 0 to 6 pets per household versus 0 to 4 pets in block A, so there is more variability in the block B distribution. There are no outliers or gaps in either data set.
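As a rough check on this description, the sketch below rebuilds each block's data from the dotplot and computes its center (median), spread (range), and any gaps. The household counts per pet value are a transcription of the dotplot figure and should be treated as an assumption; `expand`, `gaps`, and `summarize` are hypothetical helper names, and only `statistics.median` comes from Python's standard library.

```python
from statistics import median

# Number of households at each pet count, transcribed from the dotplots
# (assumption: one dot = one household).
BLOCK_A = {0: 5, 1: 4, 2: 3, 3: 2, 4: 1}
BLOCK_B = {0: 2, 1: 3, 2: 4, 3: 3, 4: 1, 5: 1, 6: 1}


def expand(counts):
    """Turn {value: number of dots} into a sorted list of observations."""
    return sorted(value for value, n in counts.items() for _ in range(n))


def gaps(values):
    """Integer values between the min and max that have no observations."""
    observed = set(values)
    return [v for v in range(min(values), max(values) + 1) if v not in observed]


def summarize(counts):
    data = expand(counts)
    return {
        "n": len(data),
        "median": median(data),          # center
        "range": max(data) - min(data),  # spread
        "gaps": gaps(data),              # unusual features (empty values)
    }


for name, counts in [("Block A", BLOCK_A), ("Block B", BLOCK_B)]:
    print(name, summarize(counts))
```

With these counts, block A has a median of 1 pet and a range of 4, block B has a median of 2 pets and a range of 6, and neither block has a gap.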

Back-to-Back Stemplots

       Boys |   | Girls
          7 | 0 |
          1 | 1 | 1
      6 4 1 | 2 | 2 6 8
      8 5 4 | 3 | 3 4 4 6 6 8 9
9 8 2 2 2 1 | 4 | 3 4 6
    9 7 4 3 | 5 | 4
      8 5 2 | 6 |
        3 1 | 7 |

Key: a leaf of 2 on stem 4 represents $42.

Back-to-back stemplots are another graphical option for comparing data from two populations. The center of a back-to-back stemplot is a column of stems, with a vertical line on each side. Leaves representing one data set extend to the right, and leaves representing the other data set extend to the left.

The back-to-back stemplot above shows the amount of cash (in dollars) carried by a random sample of teenage boys and girls. The boys carried more cash than the girls: a median of $42 for the boys versus $36 for the girls. Both distributions were roughly bell-shaped, although there was more variation among the boys. Finally, there were neither gaps nor outliers in either group.
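A stemplot can be decoded back into its data values, which makes it easy to confirm the medians quoted above. This is a minimal sketch: the `{stem: [leaves]}` layout and the `decode` helper are illustrative choices, stems are taken as tens of dollars and leaves as units, and the leaves are transcribed from the stemplot.

```python
from statistics import median

# Leaves for each stem (stem = tens of dollars, leaf = units), transcribed
# from the back-to-back stemplot of cash carried.
BOYS = {0: [7], 1: [1], 2: [1, 4, 6], 3: [4, 5, 8], 4: [1, 2, 2, 2, 8, 9],
        5: [3, 4, 7, 9], 6: [2, 5, 8], 7: [1, 3]}
GIRLS = {1: [1], 2: [2, 6, 8], 3: [3, 4, 4, 6, 6, 8, 9], 4: [3, 4, 6], 5: [4]}


def decode(rows):
    """Rebuild sorted data values from {stem: [leaves]} as 10 * stem + leaf."""
    return sorted(10 * stem + leaf for stem, leaves in rows.items() for leaf in leaves)


boys, girls = decode(BOYS), decode(GIRLS)
for name, data in [("Boys", boys), ("Girls", girls)]:
    print(f"{name}: n = {len(data)}, median = {median(data)}, "
          f"range = {max(data) - min(data)}")
```

It reports 23 boys with a median of $42 and 15 girls with a median of $36; the boys' range ($66) is also wider than the girls' ($43), matching the observation that the boys varied more.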

Parallel Boxplots

Parallel boxplots: days with cold symptoms, control group (top) and treatment group (bottom), on a common axis marked from 2 to 16 days

With parallel boxplots (also called side-by-side boxplots), data from two distributions are displayed on the same chart, using the same measurement scale.

The boxplots above summarize results from a medical study. The treatment group received an experimental drug to relieve cold symptoms, and the control group received a placebo. The boxplots show the number of days each group continued to report symptoms.

Neither distribution has unusual features, such as gaps or outliers. Both distributions are skewed to the right, although the skew is more prominent in the treatment group. Patient response was slightly less variable in the treatment group than in the control group: cold symptoms lasted 1 to 14 days (range = 13) in the treatment group versus 3 to 17 days (range = 14) in the control group. The median recovery time is more telling: about 5 days for the treatment group versus about 9 days for the control group. It appears that the drug had a positive effect on patient recovery.

Double Bar Charts

A double bar chart is similar to a regular bar chart, except that it provides two pieces of information for each category rather than just one. Often, the charts are color-coded with a different colored bar representing each piece of information.

In this example, a double bar chart shows customer satisfaction ratings for different cars, broken out by gender. The blue bars represent males; the red bars, females.

Both groups prefer the Japanese cars to the American cars, with Honda receiving the highest ratings and Ford receiving the lowest ratings. Moreover, both genders agree on the rank order in which the cars are rated. As a group, the men seem to be tougher raters; they gave lower ratings to each car than the women gave.

Test Your Understanding of This Lesson

Problem

College |   | High school
      7 | 0 |
  6 6 3 | 1 | 0 0 3 5
4 3 2 1 | 2 | 1 2 4 4 6
9 8 8 6 | 3 | 1 8 9
    8 2 | 4 | 0 1
        | 5 |
        | 6 |
      3 | 7 |

Key: a leaf of 3 on stem 1 represents 13 books.


The back-to-back stemplot above shows the number of books read in a year by a random sample of college and high school students. Which of the following statements are true?

I. Seven college students did not read any books.
II. The college median is equal to the high school median.
III. The mean is greater than the median in both groups.

(A) I only
(B) II only
(C) III only
(D) I and II
(E) II and III

Solution

The correct answer is (E). Statement I is false: none of the college students failed to read a book during the year; the fewest read by any college student was seven. Statement II is true: in both groups, the median is 24 books. Statement III is true: the mean number of books read per year is 25.3 for high school students versus 30.4 for college students, so the mean is greater than the median in both groups.
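The arithmetic in the solution can be checked by decoding the problem's stemplot. In this sketch, the stem-to-leaves dictionaries are a transcription of the display (stem = tens of books, leaf = units), and `decode` is a hypothetical helper, not a library function.

```python
from statistics import mean, median

# Leaves for each stem (stem = tens of books, leaf = units), transcribed
# from the problem's back-to-back stemplot. Empty stems are omitted.
COLLEGE = {0: [7], 1: [3, 6, 6], 2: [1, 2, 3, 4], 3: [6, 8, 8, 9], 4: [2, 8], 7: [3]}
HIGH_SCHOOL = {1: [0, 0, 3, 5], 2: [1, 2, 4, 4, 6], 3: [1, 8, 9], 4: [0, 1]}


def decode(rows):
    """Rebuild sorted data values from {stem: [leaves]} as 10 * stem + leaf."""
    return sorted(10 * stem + leaf for stem, leaves in rows.items() for leaf in leaves)


for name, rows in [("College", COLLEGE), ("High school", HIGH_SCHOOL)]:
    data = decode(rows)
    print(f"{name}: min = {data[0]}, median = {median(data):g}, "
          f"mean = {mean(data):.1f}")
```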



    
The contents of this webpage are copyright © 2009 StatTrek.com. All Rights Reserved.