Using Excel 2: Slope and Intercept of Regression Lines
Overview
These instructions walk you through the steps needed to complete
the assignment in Unit 5. Similar steps will work to create
predicted values
for any dataset, although Excel offers many useful alternatives to the
methods below. This page discusses the following topics:
A note on organization
It is a good idea to set up your spreadsheet in some logical fashion.
While
it is possible to create a nice graph from data with the X-component
data
at the top of the worksheet and the Y-component 1000 cells below, it's
not a good way to set things up. Organization of the worksheet is an
art,
but any attempts are very helpful especially when you have to come back
to a spreadsheet you created last year and update a report before your
boss's meeting at noon. [And who knows when your boss will want to use
your spreadsheet when you're gone.] In addition to logical arrangement
of data, labels and comments are very helpful additions to any complex
spreadsheet.
Here we set up two extra columns of data to produce straight lines
that
pass through the original data. One line is the SD-line and the other
is
the regression line. So, we will set up columns of data with X-values
next
to Y-values. I suggest leaving some room on your spreadsheet
to put comments, calculated values, like the average etc, and labels .
Creating the SD line data
-
Copy the data into the worksheet and move the columns [cut and paste]
so
that the variables are lined up next to each other.
-
Find the averages and standard deviations for each column and the
correlation
coefficient; place them in cells adjacent to labels so that you [and
others]
know what they are.
- We now have all the information needed to create the slope and
intercept for the SD-line. Have Excel compute the slope of the
SD-line in a cell by creating a formula that computes (SD Y)/(SD
X). In another cell create a formula for the intercept value
using the slope you just determined and the point of averages.
See the supplement The
Equation of the Regression Line for an idea of what this formula
should look like.
- You can now use the slope and intercept values to obtain points
on the SD-line. Put an
appropriate label at the top of a new column adjacent to your X and Y
data. Enter a formula in the
first row of data which uses the SD-line slope and intercept to
calculate the correct SD-line value for the X-value of that row.
Each row corresponds to an X-value
and its related SD-line value. Remember to use dollar signs
before the letter
and number of any cells that shouldn't change when you copy from one
row
to the next. For example, the slope and intercept are the same in
each calculation. If one of these appears in cell D4, for
example, then use $D$4 in place of D4.
- Check that your formula gives the correct answer. Then copy the
formula
down the column to the other rows. Check to make sure that the cell
references for the X-values are being incremented while those for the
slope and intercept are not. Add more dollar signs if needed and
copy again.
Creating the regression line data
To obtain regression line predictions, we can adapt the methodology
used to obtain the SD-line values. There are only two changes
that you have to be aware of: the slope and intercept for the
regression line will be different from the slope and intercept for the
SD-line.
- By entering your own formulas into cells, use Excel to compute
the slope and intercept for the regression line. Be sure to label
this information. If necessary, see the supplement The
Equation of the Regression Line for more information on formulas
that can be used to obtain the slope and intercept for the regression
line.
- In a column adjacent to your SD-line values, use the slope and
intercept for the regression line to create a formula that computes the
regression line prediction for each X-value. If you're careful
entering this formula, you can copy and paste it into other cells in
this column.
- Be sure to add an appropriate label to this column of output.
Making a scatter plot with SD and regression
lines
You now have four columns of data! You can make a scatter plot with
lines
for the SD-line and regression line.
-
Highlight the four data columns [with their labels at the top] and
choose
the "chart wizard" [a button at the top with a bar chart]. Choose the
scatter
plot format without lines connecting the points.
-
The next screen asks you to verify that the four columns that Excel
thinks
you want are really the ones you want. Select the
Series tab at
the top and verify that the X and Y ranges are correct for each of the
three data series [data series seems to mean: Y values] All OK? move to
Next screen.
-
Put titles as you wish and/or remove the legend. There will be three
sets
of data points. Click FINISH to put the chart into your worksheet.
-
By clicking on the emphasized points on the rim of the chart and moving
the mouse, you can change the size or position of the chart.
-
Initially, the axes will cross at (0,0), while all the data points
appear
well above and to the right. To make the graph focus on the data
points,
change the scale of your plot by double-clicking on the x [or y] axis
and
then choosing the "Scale" Tab in the window that appears. Choose
appropriate
values for the minimum and maximum values of the x and y variables,
near
the minima and maxima for the data points. You may also want to change
the major units to some smaller value, to add more grid lines to the
graph.
-
Make the second [SD line] data points and third [predicted] data points
appear as lines: Double-click on one of the SD-line points
[respectively,
"predicted" data points]. Then, under the "Pattern" Tab, set the line
style
to "automatic" and the marker style to "none". Click OK, then click
somewhere
off the chart [to unselect the data] and look at your chart.
Last revised: February 2004. Mail to kvalente@mail.colgate.edu
Copyright 1999 © Colgate University. All rights reserved.