On this web page you will investigate some of the
properties of residuals and see how you can evaluate a line of fit.
This exploration will reinforce the concepts
in Lesson 3.5 of Discovering Advanced Algebra: An
This sketch shows two data points, A and B,
and a line through points P and Q. You can move any
of the points by dragging them.
The difference between the y-value of a data
point and the y-value of the line at the same x-value is the residual.
For a point above the line the residual is positive, and for a point
below the line the residual is negative. In the sketch, the residuals
of points A and B are represented by the lengths of the vertical
segments connecting these points to line PQ.
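As a minimal sketch of this definition, the Python below computes signed residuals for two data points relative to the line through P and Q. The coordinates and the helper names `line_through` and `residual` are made up for illustration; they are not taken from the sketch.

```python
def line_through(p, q):
    """Return slope m and intercept b of the non-vertical line through p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


def residual(point, m, b):
    """Signed residual: data y-value minus the line's y-value at the same x."""
    x, y = point
    return y - (m * x + b)


# Hypothetical coordinates, chosen only for illustration.
P, Q = (0, 1), (4, 3)      # line y = 0.5x + 1
A, B = (2, 3), (6, 3)      # A lies above the line, B below it

m, b = line_through(P, Q)
res_a = residual(A, m, b)  # 3 - 2 = 1.0 (positive: above the line)
res_b = residual(B, m, b)  # 3 - 4 = -1.0 (negative: below the line)
print(res_a, res_b)
```

With these values the two residuals are opposites, which is the situation the first question below asks you to create by dragging P and Q.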
You often want to find a line that's a good fit to a
data set so you can make predictions. If the line is a good fit for the
data, there should be about as many data points above the line as below
it, at about the same distances. In that case the sum of the residuals
is close to 0.
- Drag points P and Q so that the residuals of A and B
(shown below the graph) have the same absolute values but opposite
signs. How does the line lie with respect to the data points?
- Now press Show All Data Points. Below the
graph you can see the coordinates and residuals of all the points and
the sum of the residuals. Drag points P and Q
to make the sum of the residuals as close to 0 as possible. How close
can you come to 0?
- Press Hide Data Points. How close are the
residuals of A and B to being opposites? Explain.
- Explain why knowing the residuals and their sum can
help you find a line of best fit for a set of data points.
- Press Show All Data Points and click to see
Data Set 2. Move P and Q to make the sum of the
residuals as close to 0 as possible. How close to 0 can you get?
- Make the line very different but also with a sum of
residuals that's very close to 0.
- Explain why sums of residuals might be a misleading
way to measure how well a line fits a data set.
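One way to check this numerically: for a line y = mx + b, the sum of the residuals equals n times (mean of y minus m times mean of x minus b), so it is 0 for any line through the point (mean x, mean y), whatever the slope. The sketch below uses a made-up data set (not Data Set 2) and a hypothetical helper `residual_sum` to compare two such lines.

```python
def residual_sum(points, m, b):
    """Sum of signed residuals y - (m*x + b) over all points."""
    return sum(y - (m * x + b) for x, y in points)


# Made-up data lying roughly along y = x (illustration only).
data = [(1, 1), (2, 3), (3, 2), (4, 5), (5, 4)]

n = len(data)
x_mean = sum(x for x, _ in data) / n   # 3.0
y_mean = sum(y for _, y in data) / n   # 3.0

# Two very different lines, both forced through (x_mean, y_mean).
for m in (1.0, -2.0):
    b = y_mean - m * x_mean
    print(f"slope {m:+.1f}: sum of residuals = {residual_sum(data, m, b):.1f}")
```

Both lines give a sum of 0, yet the second slopes downward through data that trends upward.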
This sketch also shows ten data points and a line
through points P and Q. You can move any of the
points by dragging them. Again, the residuals are represented by the
lengths of vertical segments.
Below the graph you see not only the y-coordinates
and residuals of the data points but also the squares of the residuals.
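Given at the bottom of the table is the root mean square error,
which is defined as
$$\sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$$
where $y_i$ represents the y-value
of a data point, $\hat{y}_i$
represents the y-value of the corresponding
point on the line, and n is the number of data points. (Each
$y_i - \hat{y}_i$ is a residual.)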
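A minimal sketch of this calculation in Python follows. It assumes the sum of the squared residuals is divided by n - 2 before the square root is taken; the `divisor_offset` parameter is a hypothetical knob, so passing 0 gives a plain mean of the squares instead. The data and the line are made up for illustration.

```python
import math


def rmse(points, m, b, divisor_offset=2):
    """Root mean square error of the line y = m*x + b for the given points.

    Assumption: the sum of squared residuals is divided by n - divisor_offset.
    Use divisor_offset=0 for a plain mean of the squares.
    """
    residuals = [y - (m * x + b) for x, y in points]
    n = len(points)
    return math.sqrt(sum(r * r for r in residuals) / (n - divisor_offset))


# Made-up data and line (illustration only, not the sketch's data set).
data = [(0, 1), (1, 2), (2, 5), (3, 6)]
m, b = 2.0, 0.0   # line y = 2x; residuals are 1, 0, 1, 0

error = rmse(data, m, b)   # sqrt((1 + 0 + 1 + 0) / (4 - 2)) = 1.0
print(error)
```

Because each residual is squared, positive and negative residuals cannot cancel each other the way they do in a plain sum.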
- Drag points P and Q so that the
root mean square error is as
small as possible. Does the line seem to be a good fit for the data?
- Change the data set by dragging the points into a
straight line. Now try to find a line of fit that doesn't go through
any data point. How small can you get the value of the root mean square error?
- Explain why the root mean square error is better than
the sum of
residuals for measuring how well the line fits the data.
- How is the root mean square error like and unlike the
standard deviation of the residuals?
- Suppose you've found a line of fit and are predicting
a y-value for a particular x-value. What can the
root mean square error tell you about your prediction?
- What's an example of a real-world data set for which
you might use the root mean square error?
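To experiment with the comparison between the root mean square error and the standard deviation of the residuals, here is a small sketch. It uses made-up data lying exactly on a line and a parallel line of fit that misses every point, and it assumes the root mean square error divides by n - 2. `statistics.stdev` is the sample standard deviation from the Python standard library.

```python
import math
import statistics

# Made-up data lying exactly on y = 2x + 1 (illustration only).
data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

# A parallel line one unit too low, so it misses every data point.
m, b = 2.0, 0.0
residuals = [y - (m * x + b) for x, y in data]   # every residual is 1.0

n = len(residuals)
# Assumption: the root mean square error divides by n - 2.
rms_error = math.sqrt(sum(r * r for r in residuals) / (n - 2))

# The sample standard deviation measures spread about the residuals' own mean.
spread = statistics.stdev(residuals)

print(rms_error, spread)
```

Here the standard deviation is 0 because all the residuals are equal, while the root mean square error is not 0, because it measures how far the residuals are from 0 rather than from their mean.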