Section 4.2

Least-Squares Regression

Learn how to find the best-fit line that minimizes prediction errors and use it to make predictions for the response variable.

1

The Regression Equation

Definition

The Least-Squares Regression Line is the line that minimizes the sum of the squared residuals (errors). It's the "best-fit" line through the data.

The Regression Equation

ŷ = b₁x + b₀

ŷ: Predicted value
b₁: Slope
b₀: Y-Intercept
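
As a minimal sketch of how the slope and intercept can be computed, the Python below uses the standard closed-form formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The function names and the five data points are hypothetical, made up for illustration; they do not come from this section. The sketch also compares the sum of squared residuals of the fitted line with that of a slightly different line, to illustrate what "best-fit" means.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed - predicted)^2 for the line y-hat = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 72]

b1, b0 = least_squares_line(hours, scores)
print(f"y-hat = {b1:.2f}x + {b0:.2f}")  # y-hat = 5.00x + 48.00

# Any other line should give a larger sum of squared residuals.
sse_fit = sum_squared_residuals(hours, scores, b1, b0)
sse_other = sum_squared_residuals(hours, scores, 5.5, 48.0)
print(sse_fit < sse_other)  # True
```

Writing the formulas out by hand keeps the arithmetic visible; in practice a statistics library would normally do this step.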

LogicLens: ŷ vs y

ŷ (y-hat): The predicted value, that is, what the regression line says y should be for a given x.

y: The observed (actual) value, that is, what the data point really is.

2

Interpreting Slope and Intercept

The Slope (b₁)

The slope represents the average change in the response variable (y) for every 1-unit increase in the explanatory variable (x).

Example: If b₁ = 4,200 in a model predicting salary (in dollars) from years of education:
"For each additional year of education, salary increases by $4,200 on average."

The Y-Intercept (b₀)

The y-intercept is the predicted value of y when x = 0.

Example: If b₀ = 25,000 in the salary model:
"With 0 years of education, the predicted salary is $25,000."

LogicLens: When is the Y-Intercept Meaningful?

The y-intercept is only meaningful when BOTH conditions are met:

1. x = 0 is a reasonable value in the context
2. x = 0 is within the observed data range

Example: In a tree height vs. age model, does x = 0 (a tree at age 0) make sense? Probably not, so the intercept has no practical meaning here.
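
To make the two interpretations concrete, here is a small sketch using the salary example's numbers (a $4,200 slope and a $25,000 intercept). The function name and the education levels 12 and 13 are arbitrary choices for illustration. It shows that predictions for x-values one unit apart differ by exactly the slope, and that the prediction at x = 0 is the intercept.

```python
SLOPE = 4200       # dollars per additional year of education (salary example)
INTERCEPT = 25000  # predicted salary in dollars at x = 0 (salary example)


def predicted_salary(years_of_education):
    """Predicted salary from the line y-hat = SLOPE * x + INTERCEPT."""
    return SLOPE * years_of_education + INTERCEPT


# Two x-values one unit apart differ in prediction by the slope.
difference = predicted_salary(13) - predicted_salary(12)
print(difference)           # 4200

# The prediction at x = 0 is the y-intercept.
print(predicted_salary(0))  # 25000
```

Whether the x = 0 prediction means anything in practice still depends on the two conditions listed above; the code only shows where the number comes from.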
3

Prediction and Extrapolation

Making Predictions

To predict a value, substitute the x-value into the regression equation:

If the equation is ŷ = b₁x + b₀ and we want to predict ŷ when x = x₀, then ŷ = b₁x₀ + b₀.

Danger: Extrapolation

Extrapolation is predicting values for x that are far outside the observed range. This is dangerous because the linear relationship may not hold beyond the data!

Example: The Breaking Model

A model predicts test scores based on study hours using data from 1-8 hours.
Predicting for x = 100 hours doesn't make sense — at some point, more studying doesn't help, and the linear pattern breaks down!

LogicLens: Safe Prediction Zone

Interpolation (predicting within the data range) is generally safe.
Extrapolation (predicting outside the data range) is risky and should be done with extreme caution.
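
The distinction can be sketched in code as a prediction helper that also reports whether the requested x lies outside the observed range. The coefficients (slope 5.0, intercept 48.0) are hypothetical; only the 1 to 8 hour range comes from the study-hours example above. The function name and return shape are likewise assumptions for illustration.

```python
def predict(slope, intercept, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = slope*x + intercept.

    x_min and x_max are the smallest and largest x-values in the observed data.
    """
    y_hat = slope * x + intercept
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# Hypothetical study-hours model; data observed for 1 to 8 hours of study.
slope, intercept = 5.0, 48.0
for hours in (6, 100):
    y_hat, risky = predict(slope, intercept, hours, x_min=1, x_max=8)
    label = "extrapolation, treat with caution" if risky else "interpolation"
    print(f"x = {hours}: predicted score {y_hat:.1f} ({label})")
```

For x = 100 the line returns a score of 548.0, which illustrates how the arithmetic still produces a number even when the linear pattern can no longer be trusted.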

4

Residuals (Error Analysis)

Definition

A Residual is the difference between the observed value and the predicted value.

Residual = y − ŷ (Observed minus Predicted)

LogicLens: Interpreting Residuals

Positive Residual: The actual value was higher than predicted. The model underestimated.
Negative Residual: The actual value was lower than predicted. The model overestimated.

Example: Calculating a Residual

Given: a regression line and the observed point (4, 15)

Predicted: ŷ = 13 at x = 4

Residual: y − ŷ = 15 − 13 = 2

✓ Positive residual: the model underestimated by 2 units.
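
The same calculation can be sketched in Python. The line ŷ = 2x + 5 used here is hypothetical, chosen only because it predicts 13 at x = 4, which is consistent with the 2-unit underestimate in the example; the function names are also assumptions.

```python
def predict(x, slope=2.0, intercept=5.0):
    """Predicted value from a hypothetical line y-hat = 2x + 5."""
    return slope * x + intercept


def residual(observed_y, predicted_y):
    """Residual = observed minus predicted."""
    return observed_y - predicted_y


x, y = 4, 15                 # observed point from the example
y_hat = predict(x)           # 13.0
res = residual(y, y_hat)     # 2.0

if res > 0:
    verdict = "model underestimated"
elif res < 0:
    verdict = "model overestimated"
else:
    verdict = "model matched the observation"

print(f"predicted {y_hat}, residual {res} ({verdict})")
```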

Try It Yourself

Regression Calculator

Least-Squares Regression Line (Exam Score vs. Hours Studied)
Slope: 5.0476
Intercept: 48.5357
R²: 99.13%

Slope Interpretation

For every 1-unit increase in Hours Studied, Exam Score increases by about 5.05 units on average.

R² Interpretation

99.1% of the variation in Exam Score can be explained by the linear relationship with Hours Studied.
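
For a least-squares line, R² can be computed as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total squared variation of y around its mean. The sketch below applies that formula to hypothetical data and a hypothetical fitted line (slope 5.0, intercept 48.0). These are not the calculator's data, so the result differs from the 99.13% reported above.

```python
def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the line y-hat = slope*x + intercept."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("all y-values are identical; R^2 is undefined")
    return 1 - sse / sst


# Hypothetical data and its least-squares line, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 72]

r2 = r_squared(hours, scores, slope=5.0, intercept=48.0)
print(f"R^2 = {r2:.1%}")  # R^2 = 94.7%
```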
