Section 4.4

Contingency Tables and Association

Explore the relationship between two categorical variables using two-way tables, marginal and conditional distributions, and tests for association.

1

Marginal Distributions

Contingency Table (Two-Way Table)

A contingency table summarizes the relationship between two categorical variables. Each cell shows the count of observations in that category combination.

            Smoker   Non-Smoker   Row Total
Male           120          280         400
Female          80          320         400
Col Total      200          600         800

LogicLens: Marginal Distribution

The marginal distribution shows the distribution of one variable on its own, ignoring the other. It is computed from the row or column totals, each divided by the grand total.

Example: Marginal proportion of Males = 400/800 = 50%
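
The marginal calculation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the nested-dictionary layout and the function name marginal_distributions are assumptions made for this example, while the counts come from the two-way table in this section.

```python
# Counts from the two-way table in this section:
# rows are gender, columns are smoking status.
table = {
    "Male":   {"Smoker": 120, "Non-Smoker": 280},
    "Female": {"Smoker": 80,  "Non-Smoker": 320},
}


def marginal_distributions(table):
    """Return (row_marginals, col_marginals) as proportions of the grand total.

    The function name and dict layout are illustrative assumptions.
    """
    # Row totals: sum across the columns of each row.
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}

    # Column totals: accumulate each column's count over all rows.
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count

    grand_total = sum(row_totals.values())

    # Marginal proportions always divide by the grand total.
    row_marginals = {row: t / grand_total for row, t in row_totals.items()}
    col_marginals = {col: t / grand_total for col, t in col_totals.items()}
    return row_marginals, col_marginals


row_marginals, col_marginals = marginal_distributions(table)
print(row_marginals)  # {'Male': 0.5, 'Female': 0.5}
print(col_marginals)  # {'Smoker': 0.25, 'Non-Smoker': 0.75}
```

The row marginals describe gender alone (50% male, 50% female), and the column marginals describe smoking status alone (25% smokers, 75% non-smokers).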
2

Conditional Distributions

Definition

A conditional distribution shows the distribution of one variable given (within) a specific category of the other variable.

LogicLens: The "Given" Filter

Instead of dividing by the grand total, you divide by the specific row or column total of the "given" category.

Example: "Given Male"

What proportion of males are smokers?

We use the Male row total (400), not the grand total: 120/400 = 30% of males are smokers.

Example: "Given Female"

What proportion of females are smokers?

We use the Female row total (400), not the grand total: 80/400 = 20% of females are smokers.
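
The "given" filter can be sketched as a small Python function. This is an illustrative sketch only: the helper name conditional_given_row and the dictionary layout are assumptions for this example, and the counts are those of the smoking table in this section.

```python
# Counts from the two-way table in this section.
table = {
    "Male":   {"Smoker": 120, "Non-Smoker": 280},
    "Female": {"Smoker": 80,  "Non-Smoker": 320},
}


def conditional_given_row(table, row):
    """Distribution of the column variable within one row category.

    Hypothetical helper: it divides by the row total of the "given"
    category, not by the grand total.
    """
    cells = table[row]
    row_total = sum(cells.values())
    return {col: count / row_total for col, count in cells.items()}


given_male = conditional_given_row(table, "Male")
given_female = conditional_given_row(table, "Female")
print(given_male)    # {'Smoker': 0.3, 'Non-Smoker': 0.7}
print(given_female)  # {'Smoker': 0.2, 'Non-Smoker': 0.8}
```

Each conditional distribution sums to 100% within its own row, which is what makes the two rows directly comparable.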

3

Identifying Association

How to Detect Association

Compare the conditional distributions across categories. If they are nearly identical, the data show no evidence of association. If they differ by more than sampling variation would explain, an association likely exists.

✓ No Association

Conditional distributions are similar:

Males: 25% smokers, 75% non-smokers
Females: 24% smokers, 76% non-smokers

The distributions are nearly identical → no association

✗ Association Exists

Conditional distributions are different:

Males: 30% smokers, 70% non-smokers
Females: 20% smokers, 80% non-smokers

The proportions differ → suggests association between gender and smoking
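
One way to put a number on "the proportions differ" is a formal test. This section does not name a specific test, so the sketch below assumes Pearson's chi-square statistic, a common choice for two-way tables. The function name chi_square_statistic is hypothetical, and the counts are those of the 30% vs 20% smoking table used in this section.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a two-way table (list of rows).

    Expected counts assume no association:
    expected = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Rows: Male, Female. Columns: Smoker, Non-Smoker.
observed = [[120, 280], [80, 320]]

# Difference between the two conditional proportions of smokers.
difference = 120 / 400 - 80 / 400
chi2 = chi_square_statistic(observed)

print(round(difference, 3))  # 0.1
print(round(chi2, 2))        # 10.67
```

A 2x2 table has 1 degree of freedom, for which the conventional 5% critical value is about 3.84. A statistic of roughly 10.67 is well above that, which is consistent with reading the 30% vs 20% gap as an association rather than noise.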

Why Use Proportions, Not Counts?

Raw counts can be misleading when group sizes differ. If we have 1000 males and only 100 females, comparing the raw number of smokers mostly reflects group size rather than smoking behavior. Proportions put both groups on the same scale for a fair comparison.
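
A short Python sketch shows how counts and proportions can point in opposite directions. The group sizes (1000 males, 100 females) come from the paragraph above, but the smoker counts (200 and 50) are hypothetical numbers invented for this illustration.

```python
# Group sizes from the text; smoker counts are hypothetical.
groups = {
    "Male":   {"smokers": 200, "total": 1000},
    "Female": {"smokers": 50,  "total": 100},
}

# Comparison by raw count: which group has more smokers?
more_smokers_by_count = max(groups, key=lambda g: groups[g]["smokers"])

# Comparison by proportion: which group has the higher smoking rate?
rates = {g: v["smokers"] / v["total"] for g, v in groups.items()}
higher_rate = max(rates, key=rates.get)

print(more_smokers_by_count)  # Male
print(rates)                  # {'Male': 0.2, 'Female': 0.5}
print(higher_rate)            # Female
```

With these made-up numbers, males have four times as many smokers by count, yet the smoking rate among females is more than double the rate among males.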

4

Simpson's Paradox

Definition

Simpson's Paradox occurs when an association between two variables reverses or disappears once the data are split by a third (lurking) variable.

LogicLens: Classic Example

Hospital Survival Rates

Overall:
Hospital A: 80% survival
Hospital B: 70% survival
→ Hospital A looks better!

By Case Type:
Easy Cases: A = 95%, B = 98%
Hard Cases: A = 30%, B = 40%
→ Hospital B is better in BOTH!

What happened? Hospital A treats a larger share of "easy" cases, which inflates its overall rate. Once the lurking variable (case difficulty) is accounted for, the conclusion reverses.
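
The reversal can be reproduced with concrete numbers. The example above gives only percentages, so the (survived, total) counts below are hypothetical values chosen to match those percentages exactly. The function name summarize is also an assumption for this sketch.

```python
# Hypothetical (survived, total) counts chosen to reproduce the
# survival rates quoted in the hospital example.
cases = {
    "Hospital A": {"easy": (950, 1000), "hard": (90, 300)},
    "Hospital B": {"easy": (1470, 1500), "hard": (560, 1400)},
}


def summarize(by_type):
    """Return overall rate, per-type rates, and share of easy cases."""
    total_survived = sum(survived for survived, _ in by_type.values())
    total_cases = sum(total for _, total in by_type.values())
    return {
        "overall": total_survived / total_cases,
        "easy": by_type["easy"][0] / by_type["easy"][1],
        "hard": by_type["hard"][0] / by_type["hard"][1],
        "easy_share": by_type["easy"][1] / total_cases,
    }


summary = {hospital: summarize(by_type) for hospital, by_type in cases.items()}

for hospital, s in summary.items():
    print(
        f"{hospital}: overall {s['overall']:.0%}, easy {s['easy']:.0%}, "
        f"hard {s['hard']:.0%}, easy share {s['easy_share']:.0%}"
    )
# Hospital A: overall 80%, easy 95%, hard 30%, easy share 77%
# Hospital B: overall 70%, easy 98%, hard 40%, easy share 52%
```

Under these assumed counts, about 77% of Hospital A's cases are easy compared with about 52% of Hospital B's. That difference in case mix is enough to put A ahead overall even though B has the higher survival rate in both case types.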

Key Takeaway

Always consider whether there could be a lurking variable that might change the story. Aggregated data can hide important patterns in subgroups.
