Excel Project #2
We will work with payroll data (total of the
players' salaries on a team) for 2007 Major League Baseball teams.
Preliminary Writeup
Before you do any computations with Excel, what percent of teams
do you expect to have below-average payrolls? And what percent of teams
do you expect to have payrolls below the median? Do you expect the data
to be normally distributed?
Now, suppose the data is normally distributed. What percent of the teams
would have salaries below the 84th percentile? What value would
the 84th percentile be for a normally distributed dataset with
average $82.63 million and an SD of $33.35 million?
On the Computer
Copy the 2007 baseball payroll data below into a spreadsheet program:
2007
Baseball payroll (salaries-only list): Move these into the spreadsheet.
- Create a frequency histogram (Excel's standard histogram) for the payroll data using
class intervals of length $20,000,000,
up to $200,000,000.
- Compute the value of the average, median, 84th percentile
(search Excel for the percentile function),
and standard deviation (remember, STDEVP) for the salaries. (The Excel function for what we
call SD is "stdevp" [the "p" stands for population]. The function "stdev"
gives what we will later in the course call "SD+", the "sample
standard deviation".) Which team would you say lies in the
84th percentile?
- Determine the percentage of teams with below average payroll.
- Compare the actual data with your predictions. Is payroll data for baseball
players approximately normally distributed? Would you say
that the payroll data is symmetric,
left-sided, or right-sided. Does the rule, the average
follows the tail seem to be valid in this situation?
- Find the correlation coefficient between payroll and number
of wins. How would you describe the correlation?
- Create a scatterplot for payroll versus number of wins.
Let payroll be on the x-axis.
Is the
use of the correlation coefficient valid, based on your
scatterplot? While you're at it, insert the regression
line in the scatterplot.
- What would you predict for the payroll of a
team with 88 wins? There are 3 teams with 88 wins.
Which teams comes closest to your prediction?
- There seems to be one outlier in the scatterplot. Remove
this data point and recalculate the correlation coefficient.
What do you get? Did the correlation coefficient change in
the way (both in size and direction) you expected? Explain.
If you have trouble with the spreadsheet program,
consult
Excel
Basics.
Last revised August, 2009. Mail to
arobertson@colgate.edu
Copyright 2009 © Colgate University. All rights reserved.