Computer Project

**This is to be turned in separately from your daily homework -- not on the same sheet of paper.**

This week we work with salary data for 2002 Major League Baseball players... again.

Preliminary Writeup
Recall the 2002 Baseball salaries from the first computer project. What was the population average and population standard deviation (stdevp) that you found? (It's very important that you use stdevp here, not stdev.)
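If it helps to see why the stdevp/stdev distinction matters, here is a minimal Python sketch. The five values are made up for illustration and are not the salary data. Python's `statistics.pstdev` divides by N like the spreadsheet's stdevp, and `statistics.stdev` divides by N - 1 like stdev.

```python
import statistics

# Hypothetical toy values for illustration only, NOT the 2002 salary data.
values = [200_000, 300_000, 500_000, 1_000_000, 4_000_000]

# Population SD: divides by N (what stdevp does in a spreadsheet).
pop_sd = statistics.pstdev(values)

# Sample SD: divides by N - 1 (what stdev does in a spreadsheet).
sample_sd = statistics.stdev(values)

print(f"population SD (stdevp): {pop_sd:,.0f}")
print(f"sample SD (stdev):      {sample_sd:,.0f}")
```

The sample SD always comes out a little larger than the population SD for the same list, which is why the two are not interchangeable here.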

For samples of size n=30, what would the mean (EV_ave) and standard deviation (SE_ave) for the sampling distribution be? What would you expect the shape of the sampling distribution to be?
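As a rough sketch of the calculation, assuming the usual formulas (EV_ave equals the population average, and SE_ave equals the population SD divided by the square root of n), the Python below shows the arithmetic. The function name and the input numbers are hypothetical placeholders; substitute the values you found in the first project. The formula also ignores the small correction for drawing without replacement, which matters little when n is much smaller than the population.

```python
import math

def sampling_distribution_of_average(pop_mean, pop_sd, n):
    """Return (EV_ave, SE_ave) for sample averages of size n.

    Assumes SE_ave = SD / sqrt(n), with no finite-population correction.
    """
    ev_ave = pop_mean
    se_ave = pop_sd / math.sqrt(n)
    return ev_ave, se_ave

# Placeholder inputs for illustration only, NOT the real salary figures.
ev, se = sampling_distribution_of_average(pop_mean=1_000_000, pop_sd=600_000, n=30)
print(f"EV_ave = {ev:,.0f}")
print(f"SE_ave = {se:,.0f}")
```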

On the Computer
(You might need to turn on auto update.)
Copy the 2002 baseball salary data below into a spreadsheet program:
2002 Baseball salaries (salaries-only list): Move these into the spreadsheet in Column A and label Column A "Salary" (A1).
These are the salary data we used for the first computer project.

Label and find the average and population standard deviation SD (stdevp) for the salaries. Place these in Column C.
Label Column B "Random" (B1). Type "=rand()" into cell B2 (without the quotes) and drag and copy for each salary. You should have a random number next to each salary.
Set up a cell that finds the average for the first 30 salaries. Place this in Column C.
Select Column A and Column B. Click Data, then Sort. A window should pop up. Under Column, make sure that Random is selected (not Salary). Click OK. This rearranges both columns together, ordered by the random numbers in Column B, so the average from the last step is now the average of a random sample of 30 salaries. Type the average into Column D (type the number itself rather than copying the formula, so it does not change on the next sort). Label Column D "Sample Averages" (D1).
Do this 50 times so that you have a list of 50 sample averages from the salaries. (We need to have enough sample averages so that we can see something interesting happen. If we had fewer sample averages, then we might not have enough to see anything.)
Calculate the average and standard deviation (stdevp) of the list of sample averages. How do these compare with your original calculations for EV_ave and SE_ave in your preliminary writeup? How do you think the values of EV_ave and SE_ave would change if we had calculated more than just 50 sample averages (same sample size of n=30)?
Create a frequency table (like on Page 6 of the class notes) for the sample averages. Start with $1,000,000. Choose class intervals of length $200,000, up to maybe $4,000,000. Draw a histogram for this frequency table. Since the class intervals are so large, you do not have to adjust for height. You can just use the percentage directly for the height.
Describe the shape of your histogram. Is this what you expected in your preliminary writeup?
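If you want to check your spreadsheet work, the steps above can be sketched in Python. This is only an illustration under stated assumptions: the population here is synthetic, right-skewed, made-up data standing in for the salary column, and all variable names are hypothetical. Shuffling the list and taking the first 30 values plays the role of sorting by the Random column.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same output each run

# Synthetic, right-skewed stand-in for Column A (NOT the real 2002 salaries).
population = [200_000 + random.expovariate(1 / 2_000_000) for _ in range(800)]

SAMPLE_SIZE = 30
NUM_SAMPLES = 50

sample_averages = []
for _ in range(NUM_SAMPLES):
    # A shuffled copy mirrors sorting by the random column;
    # the first 30 rows are then a random sample of 30.
    shuffled = random.sample(population, len(population))
    sample_averages.append(statistics.mean(shuffled[:SAMPLE_SIZE]))

# Average and population SD (stdevp) of the list of sample averages.
mean_of_averages = statistics.mean(sample_averages)
sd_of_averages = statistics.pstdev(sample_averages)
print(f"average of sample averages: {mean_of_averages:,.0f}")
print(f"SD of sample averages:      {sd_of_averages:,.0f}")

# Frequency table: $200,000 class intervals from $1,000,000 up to $4,000,000.
LOW, WIDTH, HIGH = 1_000_000, 200_000, 4_000_000
num_bins = (HIGH - LOW) // WIDTH
counts = [0] * num_bins
for avg in sample_averages:
    if LOW <= avg < HIGH:  # averages outside the table's range are skipped
        counts[int((avg - LOW) // WIDTH)] += 1

for i, count in enumerate(counts):
    lo = LOW + i * WIDTH
    pct = 100 * count / NUM_SAMPLES
    print(f"${lo:,} to ${lo + WIDTH:,}: {count} ({pct:.0f}%)")
```

Raising `NUM_SAMPLES` is a quick way to explore what happens with more than 50 sample averages at the same sample size.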

Print off your list of sample averages, your frequency table, and the outputs in Column C (the average and population standard deviation from the first step). To print just a part of your spreadsheet, set a Print Area.

Notes:
Don't email me your project and expect me to print it off.
Don't staple this to your daily homework.