Unit 5: Distributions and the Normal curve

Text reading

Read chapters 16, 5, 17, and 18 (in that order) in the text.


1) From the book:
Chapter 16, pages 285-286: 1-4, 6-8, 10
Chapter 5, pages 93-96: 1-4, 8, 9, 11
Chapter 17, pages 304-306: 1, 2, 4, 7-9, 12
Chapter 18, pages 327-329: 2, 4-7, 12-14

2) Worksheet


Rick Weiss and Ka Leo O Hawaii, "Class status linked to determining of IQ", University Wire, Sept. 10, 2003.
Available from LexisNexis here.

Note: You may experience a browser crash with Safari on a Mac. Try Firefox if you have problems.

Use the following to search:
Text Box: Class status linked to determining of IQ
Source: All News (English, Full Text)
Date: Previous 10 years

Possible essay questions

Remark: A common IQ test produces scores that follow a bell (Normal) curve with average 100 and SD 15.

FYI: Here is an online IQ test.

Computer project

This week we work with payroll data (total of the players' salaries on a team) for 2007 Major League Baseball teams.

Preliminary Writeup
Before you do any computations with Excel, what percent of teams do you expect to have below-average payrolls? And what percent of teams do you expect to have payrolls below the median? Do you expect the data to be normally distributed?

Now, suppose the data is normally distributed. What percent of the teams would have payrolls below the 84th percentile? What value would the 84th percentile be for a normally distributed dataset with an average of $82.63 million and an SD of $33.35 million?
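
If you want to check a hand calculation for the normal-curve question above, a short sketch in Python (not part of the assignment, which uses Excel) is shown here. It assumes only the average and SD quoted in the question; the variable names are made up for illustration. It compares "one SD above average" with the exact 84th percentile of the normal curve, which is why the two are often treated as the same point.

```python
from statistics import NormalDist

# Normal curve with the average and SD given in the question (millions of dollars).
payroll_curve = NormalDist(mu=82.63, sigma=33.35)

# Rule of thumb: about 68% of the area lies within 1 SD of the average,
# so roughly 84% lies below "average + 1 SD".
one_sd_above = payroll_curve.mean + payroll_curve.stdev
share_below_one_sd = payroll_curve.cdf(one_sd_above)

# Exact 84th percentile of the same curve, for comparison.
p84 = payroll_curve.inv_cdf(0.84)

print(f"average + 1 SD: {one_sd_above:.2f} million")
print(f"share of the curve below that point: {share_below_one_sd:.4f}")
print(f"exact 84th percentile: {p84:.2f} million")
```

The two values differ slightly because the share below one SD above average is a little more than 84%.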

On the Computer
Copy the 2007 baseball payroll data below into a spreadsheet program:
2007 Baseball payroll (salaries-only list): Move these into the spreadsheet.

  1. Create a histogram for the payroll data using class intervals of length $20,000,000, up to $200,000,000.
  2. Compute the value of the average, median, mode, 84th percentile and standard deviation for the salaries. (The Excel function for what we call SD is "stdevp" [the "p" stands for population]. The function "stdev" gives what we will later in the course call "SD+", the "sample standard deviation".) Which team would you say lies in the 84th percentile?
  3. Determine the percentage of teams with below average payroll.
  4. Compare the actual data with your predictions. Is payroll data for baseball players approximately normally distributed? Would you say that the payroll data is symmetric, skewed to the left, or skewed to the right? Does the rule "the average follows the tail" seem to be valid in this situation?
  5. Find the correlation coefficient between payroll and number of wins. How would you describe the correlation?
  6. Create a scatterplot for payroll versus number of wins. Let payroll be on the x-axis. Is the use of the correlation coefficient valid, based on your scatterplot? While you're at it, insert the regression line in the scatterplot.
  7. What would you predict for the payroll of a team with 88 wins? There are 3 teams with 88 wins. Which team comes closest to your prediction?
  8. There seems to be one outlier in the scatterplot. Remove this data point and recalculate the correlation coefficient. What do you get? Did the correlation coefficient change in the way (both in size and direction) you expected? Explain.
If you have trouble with the spreadsheet program, consult the supplement Using Excel 1: Excel Basics.
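
For those who want to cross-check their spreadsheet results, the steps above can also be sketched in Python. This is only an illustration under stated assumptions: the ten payrolls and win totals below are made up (they are NOT the 2007 data), and the helper names `correlation` and `predict` are hypothetical. The correlation is computed as the average of the products of the two variables in standard units, using the population SD.

```python
from collections import Counter
from statistics import mean, median, pstdev, stdev, quantiles

# Made-up payrolls (millions of dollars) and win totals for ten
# hypothetical teams. These are NOT the 2007 figures.
payrolls = [24.0, 38.0, 52.0, 61.0, 70.0, 79.0, 88.0, 95.0, 108.0, 190.0]
wins = [66, 90, 71, 78, 82, 85, 88, 83, 96, 94]

avg = mean(payrolls)
med = median(payrolls)
sd = pstdev(payrolls)       # population SD, the counterpart of Excel's STDEVP
sd_plus = stdev(payrolls)   # sample SD ("SD+"), the counterpart of Excel's STDEV

# 84th percentile by linear interpolation between the sorted data points.
p84 = quantiles(payrolls, n=100, method="inclusive")[83]

# Percentage of teams with a below-average payroll.
pct_below_avg = 100 * sum(p < avg for p in payrolls) / len(payrolls)

# Histogram counts for class intervals of length 20 (million dollars);
# key k stands for the interval from 20*k up to, but not including, 20*(k+1).
bins = Counter(int(p // 20) for p in payrolls)


def correlation(xs, ys):
    """Average of the products of xs and ys in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return mean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))


def predict(xs, ys, x_new):
    """Regression estimate of y at x_new (line for predicting y from x)."""
    slope = correlation(xs, ys) * pstdev(ys) / pstdev(xs)
    return mean(ys) + slope * (x_new - mean(xs))


r_all = correlation(payrolls, wins)

# Drop the team with the largest payroll (the outlier in this made-up data).
keep = [i for i, p in enumerate(payrolls) if p != max(payrolls)]
r_trimmed = correlation([payrolls[i] for i in keep], [wins[i] for i in keep])

# Predicting payroll from wins uses the regression of payroll on wins.
predicted_payroll = predict(wins, payrolls, 88)

print(f"average {avg}, median {med}, SD {sd:.2f}, SD+ {sd_plus:.2f}")
print(f"84th percentile {p84:.2f}, below average {pct_below_avg:.0f}%")
print("histogram counts:", sorted(bins.items()))
print(f"r with all teams {r_all:.3f}, r without the outlier {r_trimmed:.3f}")
print(f"predicted payroll at 88 wins: {predicted_payroll:.1f} million")
```

To use the sketch with the real data, replace the two lists with the payrolls and win totals from the spreadsheet. Note that the line for predicting payroll from wins is not the same line as the one for predicting wins from payroll.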

Last revised February, 2008. Mail to arobertson@mail.colgate.edu
Copyright 2008 © Colgate University. All rights reserved.