Unit 5: Regression Analysis
Text reading and homework:
Read chapters 10, 11, and 12 of FPP and do the following review
exercises:
Chapter 10 (pages 176-178): 2, 3, 4, 7, 8, 9, 10
Chapter 11 (pages 198-201): 2, 3, 4, 6, 9, 10
Chapter 12 (pages 213-216): 2, 4, 5, 6
Reading:
"It's Always the Economy, Stupid: To win in
November, focus on paychecks, not polls" by
Ezra Klein, Newsweek, July 19, 2010, page 22.
Document source: LexisNexis. (Sign in via the Colgate portal.)
Possible essay questions:
- Describe the use of correlation and regression in this article.
- "... it seems awfully convenient that the three worst
candidates happened to end up in the three most impossible election
years." Discuss this statement from a "What causes what?" point of
view: Why might an impossible election year cause a bad candidate to
be chosen, or vice versa?
Computer Project:
Boy Scouts and Girl Scouts are taught that a way to measure temperature is
by counting cricket chirps and applying a formula to find the temperature.
The data in the reference below give the number of cricket chirps in a
15-second period and the temperature in degrees Fahrenheit at that time:
Data on cricket chirps vs. temperature
- Copy the data above into a spreadsheet program, with the
temperatures in the left column and the numbers of chirps in
a 15-second period in the right column.
- Create a scatter plot for the data. Put Temperature
on the horizontal axis and Chirps per Period on the vertical axis.
- In a sentence or two, describe the association that exists
between the two variables. What type of association is present
(linear, curvilinear, or none; positive or negative)? How strong
is the association? (Type this into Excel or write it on your printout
when you get it.)
- Have the spreadsheet compute the averages and standard
deviations of the columns and the correlation coefficient between
them. Label these values as such in the spreadsheet.
- Have the spreadsheet place in a third column (say, next to the
other two) a list of "Chirps per Period" values that
lie on the SD-line directly above or below the data points (i.e., at
the same "Temperature" values). (See the link
below for help in getting this data from Excel.)
- Have the spreadsheet compute the slope and y-intercept of the
regression line for Chirps per Period on Temperature. Label these
values appropriately.
- Create a fourth column (column D) containing a list of "predicted"
Chirps per Period values (lying on the regression line).
- Create a second scatter diagram that plots the data, the SD-line,
and the regression line for Chirps per Period on Temperature. For the
two series that represent the lines, change the style from symbols to
lines (double-click on a data point in the series, then turn on
lines and turn off symbols).
- Which is steeper, the SD-line or the regression line? Does
that make sense? Why?
Excel instructions for finding the SD-line and regression line.
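For anyone who wants to check the spreadsheet results another way, the calculations in the project steps can be sketched in Python. This is only a minimal sketch under stated assumptions: the numbers below are placeholders invented for illustration (they are NOT the cricket data from the reference), the names correlation and line_through_averages are hypothetical helpers, and the SD is taken with divisor n, as in FPP.

```python
from statistics import mean, pstdev

# Placeholder numbers for illustration only; NOT the assignment's data set.
temperature = [60.0, 70.0, 80.0, 90.0]  # x: degrees Fahrenheit
chirps = [12.0, 14.0, 19.0, 21.0]       # y: chirps per 15-second period


def correlation(x, y):
    """Average product of the deviations, divided by SD_x * SD_y."""
    mx, my = mean(x), mean(y)
    avg_product = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return avg_product / (pstdev(x) * pstdev(y))


def line_through_averages(x, y, slope):
    """Return (slope, intercept) of the line with the given slope
    that passes through the point of averages (mean x, mean y)."""
    return slope, mean(y) - slope * mean(x)


r = correlation(temperature, chirps)
sd_ratio = pstdev(chirps) / pstdev(temperature)

# SD-line: slope is +SD_y/SD_x when r is positive, -SD_y/SD_x when negative.
# Treating r == 0 as positive is an arbitrary choice made for this sketch.
sd_slope, sd_intercept = line_through_averages(
    temperature, chirps, sd_ratio if r >= 0 else -sd_ratio
)

# Regression line for chirps on temperature: slope is r * SD_y/SD_x.
reg_slope, reg_intercept = line_through_averages(
    temperature, chirps, r * sd_ratio
)

# Third and fourth columns: values on each line at the observed temperatures.
sd_line_values = [sd_slope * t + sd_intercept for t in temperature]
predicted = [reg_slope * t + reg_intercept for t in temperature]

print("r =", round(r, 4))
print("SD-line:    slope", round(sd_slope, 4), "intercept", round(sd_intercept, 4))
print("Regression: slope", round(reg_slope, 4), "intercept", round(reg_intercept, 4))
for t, c, s, p in zip(temperature, chirps, sd_line_values, predicted):
    print(t, c, round(s, 2), round(p, 2))
```

Note that pstdev divides by n, which corresponds to Excel's STDEVP; Excel's STDEV divides by n - 1 and will give slightly larger SDs. The correlation coefficient and the regression slope come out the same either way, as long as the same kind of SD is used for both columns.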