Excel Project #2

We will work with payroll data (total of the players' salaries on a team) for 2007 Major League Baseball teams.

Preliminary Writeup
Before you do any computations with Excel, what percent of teams do you expect to have below-average payrolls? And what percent of teams do you expect to have payrolls below the median? Do you expect the data to be normally distributed?

Now, suppose the data is normally distributed. What percent of the teams would have salaries below the 84th percentile? What value would the 84th percentile be for a normally distributed dataset with average $82.63 million and an SD of $33.35 million?

On the Computer
Copy the 2007 baseball payroll data below into a spreadsheet program:
2007 Baseball payroll (salaries-only list): Move these into the spreadsheet.

  1. Create a frequency histogram (Excel's standard histogram) for the payroll data using class intervals of length $20,000,000, up to $200,000,000.
  2. Compute the value of the average, median, 84th percentile (search Excel for the percentile function), and standard deviation (remember, STDEVP) for the salaries. (The Excel function for what we call SD is "stdevp" [the "p" stands for population]. The function "stdev" gives what we will later in the course call "SD+", the "sample standard deviation".) Which team would you say lies in the 84th percentile?
  3. Determine the percentage of teams with below average payroll.
  4. Compare the actual data with your predictions. Is payroll data for baseball players approximately normally distributed? Would you say that the payroll data is symmetric, left-sided, or right-sided. Does the rule, the average follows the tail seem to be valid in this situation?
  5. Find the correlation coefficient between payroll and number of wins. How would you describe the correlation?
  6. Create a scatterplot for payroll versus number of wins. Let payroll be on the x-axis. Is the use of the correlation coefficient valid, based on your scatterplot? While you're at it, insert the regression line in the scatterplot.
  7. What would you predict for the payroll of a team with 88 wins? There are 3 teams with 88 wins. Which teams comes closest to your prediction?
  8. There seems to be one outlier in the scatterplot. Remove this data point and recalculate the correlation coefficient. What do you get? Did the correlation coefficient change in the way (both in size and direction) you expected? Explain.
If you have trouble with the spreadsheet program, consult Excel Basics.


Last revised August, 2009. Mail to arobertson@colgate.edu
Copyright 2009 © Colgate University. All rights reserved.