Section 4.1

Scatter Diagrams and Correlation

Explore the relationship between two quantitative variables using scatter plots and measure the strength of linear associations with the correlation coefficient.

1. Visualizing Relationships

Explanatory Variable

The explanatory variable (or independent variable) is the variable we believe may explain or influence changes in another variable.

Response Variable

The response variable (or dependent variable) is the variable we measure to see how it responds to changes in the explanatory variable.

LogicLens: Reading a Scatter Plot

When examining a scatter diagram, look for two key features:

Direction
  • Positive: As x increases, y tends to increase
  • Negative: As x increases, y tends to decrease
  • None: No clear pattern
Form
  • Linear: Points follow a straight-line pattern
  • Nonlinear: Points follow a curved pattern
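
As a rough illustration of direction, the sketch below draws a text-mode scatter plot using only the Python standard library. The function name `ascii_scatter` and the sample values are hypothetical and chosen for illustration; in practice a plotting library would normally be used instead.

```python
def ascii_scatter(xs, ys, width=20, height=8):
    """Render (x, y) pairs as a rough text scatter plot; returns a list of rows."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index; "or 1" guards a zero range.
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(r) for r in grid]


# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 63, 71, 78, 85]
for line in ascii_scatter(hours, scores):
    print(line)
```

A positive direction shows up as points rising from lower left to upper right; a negative direction falls from upper left to lower right.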
2. The Linear Correlation Coefficient (r)

Definition

The linear correlation coefficient (r) is a numerical measure of the strength and direction of the linear relationship between two quantitative variables.

Sample Linear Correlation Coefficient Formula

$$ r = \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1} $$

where
  • $x_i$ is the ith observation of the explanatory variable
  • $\bar{x}$ is the sample mean of the explanatory variable
  • $s_x$ is the sample standard deviation of the explanatory variable
  • $y_i$ is the ith observation of the response variable
  • $\bar{y}$ is the sample mean of the response variable
  • $s_y$ is the sample standard deviation of the response variable
  • $n$ is the number of individuals in the sample

Reference points on the scale of r:
  • r = −1: perfect negative linear relation
  • r = 0: no linear relation
  • r = +1: perfect positive linear relation
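
The sample correlation coefficient can be computed directly from standardized deviations: subtract the mean, divide by the sample standard deviation, multiply the paired results, sum, and divide by n − 1. The following is a minimal sketch under that reading; the function name `pearson_r` and the data are illustrative assumptions, not part of the lesson.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample linear correlation coefficient from standardized deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # stdev() uses the n - 1 divisor, i.e. the sample standard deviation.
    # A constant variable gives a standard deviation of 0, so r is undefined.
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


x = [1, 2, 3, 4, 5]          # made-up explanatory values
y = [2, 4, 5, 4, 5]          # made-up response values
print(round(pearson_r(x, y), 4))  # 0.7746
```

For this data the sum of deviation products is 6, and the sums of squared deviations are 10 and 6, so r = 6 / sqrt(60), which is about 0.7746.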

LogicLens: Critical Properties of r

  • r is always between −1 and +1, inclusive.
  • r is unitless: it carries no units (such as meters or dollars).
  • r = +1 or r = −1 indicates a perfect linear relationship.
  • r is NOT resistant: it is sensitive to outliers.
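
Two of these properties are easy to check numerically. The sketch below uses an algebraically equivalent sum-of-products form of r; the helper name `pearson_r` and all data values are hypothetical. It shows that changing units (hours to minutes) leaves r unchanged, while a single outlier can change r drastically.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient via sums of deviation products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]            # made-up, perfectly linear in hours

# Unitless: converting hours to minutes does not change r.
minutes = [h * 60 for h in hours]
r_hours = pearson_r(hours, scores)
r_minutes = pearson_r(minutes, scores)

# Not resistant: one outlier at (6, 0) pulls r from 1 down to about 0.14.
r_outlier = pearson_r(hours + [6], scores + [0])

print(r_hours, r_minutes, round(r_outlier, 2))
```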

|r| Value       Interpretation
0.90 to 1.00    Very Strong
0.70 to 0.89    Strong
0.50 to 0.69    Moderate
0.30 to 0.49    Weak
0.00 to 0.29    Very Weak / None
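
The strength scale can be expressed as a small lookup. This is a sketch only: the function name `strength_label` is hypothetical, and treating each range's lower bound as a "greater than or equal to" cutoff is an assumption made to cover values such as 0.895 that fall between the listed ranges.

```python
def strength_label(r):
    """Map a correlation coefficient to the strength label for |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on magnitude, not sign
    if size >= 0.90:
        return "Very Strong"
    if size >= 0.70:
        return "Strong"
    if size >= 0.50:
        return "Moderate"
    if size >= 0.30:
        return "Weak"
    return "Very Weak / None"


print(strength_label(-0.75))  # Strong
```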
3. Determining Linearity

To formally test whether a linear relationship exists, we compare the absolute value of the calculated correlation coefficient, |r|, to a critical value from a statistical table based on the sample size n.

LogicLens: The Linearity Test

  • If |r| is greater than the critical value: ✓ A linear relation EXISTS
  • If |r| is less than or equal to the critical value: ✗ NO linear relation concluded

Example Critical Values
n (sample size)    5       10      15      20      25
Critical Value     0.878   0.632   0.514   0.444   0.396
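
The comparison can be sketched as a table lookup followed by a single inequality. The names `CRITICAL_VALUES` and `linear_relation_exists` are hypothetical, and the dictionary holds only the five example values listed here; other sample sizes would need the full table, so the sketch raises an error rather than guessing.

```python
# Example critical values from the table above, keyed by sample size n.
CRITICAL_VALUES = {5: 0.878, 10: 0.632, 15: 0.514, 20: 0.444, 25: 0.396}


def linear_relation_exists(r, n):
    """Return True when |r| exceeds the critical value for sample size n."""
    if n not in CRITICAL_VALUES:
        raise KeyError(f"no critical value listed for n = {n}")
    return abs(r) > CRITICAL_VALUES[n]


print(linear_relation_exists(0.7746, 5))   # False: 0.7746 < 0.878
print(linear_relation_exists(-0.70, 10))   # True: 0.70 > 0.632
```

Note that the same value of r can pass the test at one sample size and fail at another, because the critical value shrinks as n grows.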
4. Correlation vs. Causation

Critical Warning

Correlation does NOT imply Causation!

Just because two variables are strongly correlated does NOT mean one causes the other.

LogicLens: The Lurking Variable

A lurking variable is a variable that is not included in the study but affects both the explanatory and response variables, creating a false appearance of a direct relationship.

Classic Example: Ice Cream & Shark Attacks
Correlated variables: Ice Cream Sales and Shark Attacks
Lurking Variable: Summer / Hot Weather

Hot weather causes both increased ice cream consumption AND more people swimming (leading to more shark encounters).
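
A small synthetic example can make the mechanism concrete. In the sketch below, every number is made up for illustration: temperature drives both ice cream sales and the number of beach swimmers (used here as a stand-in for exposure to shark encounters), and neither of those two variables is computed from the other. The helper name `pearson_r` is also an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient via sums of deviation products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Lurking variable: daily temperature (arbitrary, made-up values).
temperature = [15, 18, 21, 24, 27, 30, 33]

# Each variable depends only on temperature plus a small fixed wobble.
sales_wobble = [2, -1, 1, -2, 0, 1, -1]
swim_wobble = [-3, 2, -1, 3, -2, 1, 0]
ice_cream_sales = [3 * t + w for t, w in zip(temperature, sales_wobble)]
beach_swimmers = [10 * t + w for t, w in zip(temperature, swim_wobble)]

# Strongly correlated, even though neither variable influences the other.
r = pearson_r(ice_cream_sales, beach_swimmers)
print(round(r, 3))
```

The correlation comes out very close to +1 purely because both lists track temperature, which is exactly the pattern a lurking variable produces.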

Another Example: Fire Trucks & Damage

There's a strong positive correlation between the number of fire trucks at a fire and the amount of property damage. Does this mean fire trucks cause damage?

No! The lurking variable is the size of the fire. Larger fires require more trucks AND cause more damage.

Example: Hours Studied vs. Exam Score

[Scatter plot: Hours Studied (x, explanatory) vs. Exam Score (y, response), showing data points with a trend line. Direction: positive. Strength: very strong.]

LogicLens Interpretation

This scatter plot shows a very strong positive linear relationship. As Hours Studied increases, Exam Score tends to increase in a fairly predictable pattern.
