Least-Squares Regression
Learn how to find the best-fit line that minimizes prediction errors and use it to make predictions for the response variable.
The Regression Equation
Definition
The Least-Squares Regression Line is the line that minimizes the sum of the squared residuals (errors). It's the "best-fit" line through the data.
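One way to see what "minimizes the sum of the squared residuals" means is to compute the line from its closed-form formulas and then check that moving the line makes the fit worse. The sketch below is only an illustration: the data set and the function names fit_least_squares and sum_squared_residuals are assumptions, not part of the lesson.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

a, b = fit_least_squares(hours, scores)
best = sum_squared_residuals(hours, scores, a, b)

# Nudging the slope or the intercept away from the fitted values
# increases the sum of squared residuals.
worse_slope = sum_squared_residuals(hours, scores, a, b + 0.5)
worse_intercept = sum_squared_residuals(hours, scores, a + 1.0, b)

print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"SSE at the fitted line: {best:.2f}")
print(f"SSE with a steeper slope: {worse_slope:.2f}")
print(f"SSE with a higher intercept: {worse_intercept:.2f}")
```

Any other line through these points gives a larger sum of squared residuals than the fitted one, which is the sense in which the line is the "best fit".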
LogicLens: ŷ vs y
ŷ (y-hat): The predicted value, which is what the regression line says y should be for a given x.
y: The observed (actual) value, which is what the data point really is.
Interpreting Slope and Intercept
The Slope
The slope represents the average change in the response variable (y) for every 1-unit increase in the explanatory variable (x).
"For each additional year of education, salary increases by $4,200 on average."
The Y-Intercept
The y-intercept is the predicted value of y when x = 0.
"With 0 years of education, the predicted salary is $25,000."
LogicLens: When is the Y-Intercept Meaningful?
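The y-intercept is only meaningful when BOTH conditions are met:
A value of x = 0 makes sense in context.
x = 0 falls within (or very close to) the range of the observed data, so the prediction is not an extrapolation.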
Prediction and Extrapolation
Making Predictions
To predict a value, substitute the x-value into the regression equation and evaluate it. The result is the predicted response ŷ for that x.
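As a rough illustration of substitution, the sketch below reuses the numbers from the salary examples earlier in this section (a slope of $4,200 per year and an intercept of $25,000). Treating those two statements as a single equation, ŷ = 25000 + 4200x, is an assumption, as are the function name predict_salary and the input x = 16.

```python
# Coefficients taken from the salary examples; combining them into one
# equation is an assumption made for illustration.
INTERCEPT = 25_000  # predicted salary (dollars) at 0 years of education
SLOPE = 4_200       # predicted change in salary per additional year


def predict_salary(years_of_education):
    """Evaluate y-hat = INTERCEPT + SLOPE * x."""
    return INTERCEPT + SLOPE * years_of_education


# Prediction: plug x = 16 (a hypothetical input) into the equation.
print(predict_salary(16))  # 92200

# Slope: predictions one year apart always differ by SLOPE.
print(predict_salary(17) - predict_salary(16))  # 4200

# Y-intercept: the prediction at x = 0 equals INTERCEPT.
print(predict_salary(0))  # 25000
```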
Danger: Extrapolation
Extrapolation is predicting y for x-values that fall outside the observed range of the data. This is dangerous because the linear relationship may not hold beyond the data!
Example: The Breaking Model
A model predicts test scores based on study hours using data from 1-8 hours.
Predicting for x = 100 hours doesn't make sense — at some point, more studying doesn't help, and the linear pattern breaks down!
LogicLens: Safe Prediction Zone
Interpolation (predicting within the data range) is generally safe.
Extrapolation (predicting outside the data range) is risky and should be done with extreme caution.
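The safe prediction zone can be sketched as a simple range check that flags any x outside the observed data. The 1 to 8 hour range comes from the study-hours example above. The coefficients and the function name predict_score are hypothetical, since the lesson does not give that model's equation.

```python
# Observed range from the study-hours example (data from 1 to 8 hours).
X_MIN, X_MAX = 1, 8

# Hypothetical coefficients, chosen only to make the sketch runnable.
INTERCEPT = 50.0
SLOPE = 5.0


def predict_score(hours):
    """Return (prediction, is_extrapolation) for a given number of hours."""
    is_extrapolation = not (X_MIN <= hours <= X_MAX)
    return INTERCEPT + SLOPE * hours, is_extrapolation


for hours in (5, 100):
    prediction, risky = predict_score(hours)
    label = "extrapolation, treat with caution" if risky else "interpolation"
    print(f"x = {hours}: y-hat = {prediction} ({label})")
```

The line still returns a number for x = 100, which is exactly the problem: nothing in the arithmetic signals that the prediction is unsupported by the data, so the range check has to be made separately.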
Residuals (Error Analysis)
Definition
A Residual is the difference between the observed value and the predicted value.
Residual = y − ŷ (Observed minus Predicted)
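LogicLens: Interpreting Residuals
Positive residual: the observed value lies above the line, so the model underestimated.
Negative residual: the observed value lies below the line, so the model overestimated.
Zero residual: the point lies exactly on the line.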
Example: Calculating a Residual
Given: a regression equation that predicts ŷ = 13 at x = 4, and the observed point (4, 15)
Predicted: ŷ = 13
Residual: y − ŷ = 15 − 13 = 2
✓ Positive residual — the model underestimated by 2 units.
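The same calculation can be sketched in a few lines of code. The observed value of 15 comes from the example. The predicted value of 13 is inferred from the stated result (an underestimate of 2 units), and the function names residual and describe are hypothetical.

```python
def residual(observed, predicted):
    """Residual = observed minus predicted (y - y_hat)."""
    return observed - predicted


def describe(res):
    """Translate the sign of a residual into plain language."""
    if res > 0:
        return f"model underestimated by {res} units"
    if res < 0:
        return f"model overestimated by {-res} units"
    return "model predicted the observed value exactly"


# Observed y = 15; predicted y-hat = 13 is inferred from the stated
# underestimate of 2 units.
r = residual(observed=15, predicted=13)
print(r, "->", describe(r))  # 2 -> model underestimated by 2 units
```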
Try It Yourself
Regression Calculator
Slope Interpretation
For every 1-unit increase in Hours Studied, Exam Score increases by about 5.05 units on average.
R² Interpretation
99.1% of the variation in Exam Score can be explained by the linear relationship with Hours Studied.
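For readers who want to see where an R² figure like this comes from, here is a minimal sketch that fits a least-squares line and computes R² as 1 minus the ratio of residual variation to total variation. The data set is hypothetical (the calculator's data is not shown in the lesson), so the printed values will not match the 5.05 and 99.1% quoted above. The function name slope_and_r_squared is also an assumption.

```python
def slope_and_r_squared(xs, ys):
    """Fit the least-squares line and return (slope, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Variation left unexplained by the line (sum of squared residuals).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, 1 - ss_res / ss_tot


# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_scores = [55, 61, 64, 71, 74, 81]

slope, r2 = slope_and_r_squared(hours_studied, exam_scores)
print(f"Slope: {slope:.2f} points per additional hour, on average")
print(f"R^2: {r2:.1%} of the variation in score is explained by the line")
```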