Residuals

On this web page you will investigate some of the properties of residuals and see how you can evaluate a line of fit. This exploration will reinforce the concepts in Lesson 3.5 of Discovering Advanced Algebra: An Investigative Approach.

Sketch

This sketch shows two data points, A and B, and a line through points P and Q. You can move any of the points by dragging them.

The residual of a data point is the difference between its y-value and the y-value of the point on the line with the same x-value, that is, the point on the line directly above or below it. For a point above the line the residual is positive, and for a point below the line the residual is negative. In the sketch, the vertical segments connecting points A and B to line PQ represent the residuals; the length of each segment is the absolute value of the residual.

You often want to find a line that's a good fit to a data set so you can make predictions. If the line is a good fit for the data, there should be about as many data points above the line as below it, at about the same distances. In that case the sum of the residuals is close to 0.
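If the interactive sketch is not available, the same quantities can be computed by hand or with a few lines of code. The following is a minimal sketch in Python; the coordinates of A, B, P, and Q are made up for illustration, and the helper names `line_through` and `residual` are hypothetical.

```python
# Hypothetical coordinates, chosen for illustration only;
# they are not taken from the sketch on this page.
A = (2.0, 5.0)
B = (6.0, 4.0)
P = (0.0, 2.0)
Q = (8.0, 6.0)


def line_through(p, q):
    """Return (slope, intercept) of the non-vertical line through p and q."""
    slope = (q[1] - p[1]) / (q[0] - p[0])
    intercept = p[1] - slope * p[0]
    return slope, intercept


def residual(point, slope, intercept):
    """Data y-value minus the y-value on the line at the same x-value."""
    x, y = point
    return y - (slope * x + intercept)


slope, intercept = line_through(P, Q)
residuals = [residual(pt, slope, intercept) for pt in (A, B)]

print(residuals)       # A lies above the line (positive), B below (negative)
print(sum(residuals))  # the sum of the residuals
```

Changing the coordinates of P and Q plays the same role as dragging those points in the sketch.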

Investigate

  1. Drag points P and Q so that the residuals (shown below the graph) have the same absolute values but opposite signs. How does the line lie with respect to the data points?
  2. Now press Show All Data Points. Below the graph you can see the coordinates and residuals of all the points and the sum of the residuals. Drag points P and Q to make the sum of the residuals as close to 0 as possible. How close can you come to 0?
  3. Press Hide Data Points. How close are the residuals of A and B to being opposites? Explain.
  4. Explain why knowing the residuals and their sum can help you find a line of best fit for a set of data points.
  5. Press Show All Data Points and click to see Data Set 2. Move P and Q to make the sum of the residuals as close to 0 as possible. How close to 0 can you get?
  6. Now find a very different line whose sum of residuals is also very close to 0.
  7. Explain why sums of residuals might be a misleading way to measure how well a line fits a data set.
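Steps 5 through 7 can also be explored numerically. The sketch below is one possible approach, not the data or method of the applet: it uses a made-up data set and a hypothetical helper `sum_of_residuals`, and it tries several lines that all pass through the point whose coordinates are the mean x-value and the mean y-value of the data.

```python
# Hypothetical data set, chosen for illustration only
# (it is not Data Set 2 from the sketch).
data = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

mean_x = sum(x for x, _ in data) / len(data)
mean_y = sum(y for _, y in data) / len(data)


def sum_of_residuals(slope):
    """Sum of residuals for the line with this slope through (mean_x, mean_y)."""
    intercept = mean_y - slope * mean_x
    return sum(y - (slope * x + intercept) for x, y in data)


# Three very different slopes, all passing through the same point.
for slope in (0.6, -3.0, 10.0):
    print(slope, round(sum_of_residuals(slope), 9))
```

Comparing the printed sums with how well each line would visually follow the data is a useful check on your answer to step 7.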

Sketch

This sketch also shows ten data points and a line through points P and Q. You can move any of the points by dragging them. Again, the residuals are represented by the lengths of vertical segments.

Below the graph you see not only the y-coordinates and residuals of the data points but also the squares of the residuals. Given at the bottom of the table is the root mean square error, which is defined as

$$\text{root mean square error} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$$

where $y_i$ represents the y-value of a data point, $\hat{y}_i$ represents the y-value of the corresponding point on the line, and $n$ is the number of data points. (Each $y_i - \hat{y}_i$ is a residual.)
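The calculation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the applet's own code: the data and the line are made up, the function name `root_mean_square_error` is hypothetical, and the denominator is assumed to be n - 2, as in the textbook's definition (some other sources divide by n instead).

```python
import math

# Hypothetical data and line, chosen for illustration only.
data = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
slope, intercept = 0.6, 2.2


def root_mean_square_error(points, slope, intercept):
    """Square each residual, add the squares, divide, and take the square root.

    Assumption: the sum of squares is divided by n - 2. Some sources
    divide by n instead.
    """
    n = len(points)
    if n <= 2:
        raise ValueError("need more than two data points")
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return math.sqrt(sum(squared) / (n - 2))


rmse = root_mean_square_error(data, slope, intercept)
print(rmse)
```

Because every residual is squared before the terms are added, positive and negative residuals cannot cancel each other in this measure.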

Investigate

  1. Drag points P and Q so that the root mean square error is as small as possible. Does the line seem to be a good fit for the data?
  2. Change the data set by dragging the points into a straight line. Now try to find a line of fit that doesn't go through any data point. How small can you get the value of the root mean square error?
  3. Explain why the root mean square error is better than the sum of residuals for measuring how well the line fits the data.
  4. How is the root mean square error like and unlike the standard deviation of the residuals?
  5. Suppose you've found a line of fit and are predicting a y-value for a particular x-value. What can the root mean square error tell you about your prediction?
  6. What's an example of a real-world data set for which you might use the root mean square error?