Contingency Tables and Association
Explore the relationship between two categorical variables using two-way tables, marginal and conditional distributions, and tests for association.
Marginal Distributions
Contingency Table (Two-Way Table)
A contingency table summarizes the relationship between two categorical variables. Each cell shows the count of observations in that category combination.
| | Smoker | Non-Smoker | Row Total |
|---|---|---|---|
| Male | 120 | 280 | 400 |
| Female | 80 | 320 | 400 |
| Col Total | 200 | 600 | 800 |
LogicLens: Marginal Distribution
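The marginal distribution shows the distribution of one variable, ignoring the other. It uses the row or column totals, each divided by the grand total. In the table above, 200/800 = 25% of all respondents are smokers and 600/800 = 75% are non-smokers.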
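The calculation can be sketched in a few lines of Python. This is a minimal illustration using the counts from the table above; the dictionary layout and variable names are assumptions made for the example, not part of the lesson.

```python
# Counts from the two-way table above: rows are gender, columns are smoking status.
table = {
    "Male": {"Smoker": 120, "Non-Smoker": 280},
    "Female": {"Smoker": 80, "Non-Smoker": 320},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: each row total divided by the grand total.
gender_marginal = {g: sum(row.values()) / grand_total for g, row in table.items()}

# Marginal distribution of smoking status: each column total divided by the grand total.
statuses = ["Smoker", "Non-Smoker"]
smoking_marginal = {
    s: sum(row[s] for row in table.values()) / grand_total for s in statuses
}

print(gender_marginal)   # {'Male': 0.5, 'Female': 0.5}
print(smoking_marginal)  # {'Smoker': 0.25, 'Non-Smoker': 0.75}
```

Both marginal distributions sum to 1, because each one accounts for all 800 observations.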
Conditional Distributions
Definition
A conditional distribution shows the distribution of one variable given (within) a specific category of the other variable.
LogicLens: The "Given" Filter
Instead of dividing by the grand total, you divide by the specific row or column total of the "given" category.
Example: "Given Male"
What proportion of males are smokers?
We use the Male row total (400), not the grand total: 120/400 = 0.30, so 30% of males are smokers.
Example: "Given Female"
What proportion of females are smokers?
We use the Female row total (400), not the grand total: 80/400 = 0.20, so 20% of females are smokers.
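The "given" filter can be sketched as a small Python helper that divides each count in a row by that row's total. The function name `conditional_distribution` and the dictionary layout are assumptions made for this illustration.

```python
# Counts from the two-way table: rows are gender, columns are smoking status.
table = {
    "Male": {"Smoker": 120, "Non-Smoker": 280},
    "Female": {"Smoker": 80, "Non-Smoker": 320},
}


def conditional_distribution(row):
    """Distribution of smoking status within one gender (one row of the table)."""
    row_total = sum(row.values())  # divide by the row total, not the grand total
    return {status: count / row_total for status, count in row.items()}


given_male = conditional_distribution(table["Male"])
given_female = conditional_distribution(table["Female"])

print(given_male)    # {'Smoker': 0.3, 'Non-Smoker': 0.7}
print(given_female)  # {'Smoker': 0.2, 'Non-Smoker': 0.8}
```

Each conditional distribution sums to 1 on its own, because it describes only the observations inside the "given" category.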
Identifying Association
How to Detect Association
Compare the conditional distributions across categories. If they are nearly identical, the data show no evidence of an association. If they differ substantially, an association likely exists.
✓ No Association
Conditional distributions are similar:
The distributions are nearly identical → no association
✗ Association Exists
Conditional distributions are different: in the table above, 30% of males smoke compared with 20% of females.
The proportions differ → suggests association between gender and smoking
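As a rough sketch, the comparison can be automated by checking how far apart the conditional smoking rates are. The helper names and the `tolerance` cutoff of 0.05 are arbitrary assumptions chosen for illustration, and the second table contains hypothetical counts; this is an informal check, not a formal test of association.

```python
def smoker_rate(row):
    """Conditional proportion of smokers within one row of the table."""
    return row["Smoker"] / (row["Smoker"] + row["Non-Smoker"])


def looks_associated(table, tolerance=0.05):
    """Return True when the conditional smoker rates differ by more than `tolerance`.

    `tolerance` is an arbitrary illustrative cutoff, not a statistical threshold.
    """
    rates = [smoker_rate(row) for row in table.values()]
    return max(rates) - min(rates) > tolerance


# Counts from the lesson's table: 30% of males vs. 20% of females smoke.
lesson_table = {
    "Male": {"Smoker": 120, "Non-Smoker": 280},
    "Female": {"Smoker": 80, "Non-Smoker": 320},
}

# Hypothetical counts where both groups smoke at the same 25% rate.
hypothetical_table = {
    "Male": {"Smoker": 100, "Non-Smoker": 300},
    "Female": {"Smoker": 50, "Non-Smoker": 150},
}

print(looks_associated(lesson_table))        # True
print(looks_associated(hypothetical_table))  # False
```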
Why Use Proportions, Not Counts?
Raw counts can be misleading when group sizes differ. If we have 1000 males and only 100 females, comparing "number of smokers" doesn't make sense. Proportions level the playing field for fair comparison.
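A short sketch makes the point concrete. The group sizes (1000 males, 100 females) come from the paragraph above, while the smoker counts are hypothetical numbers invented for this illustration.

```python
# Group sizes from the text; smoker counts are hypothetical, for illustration only.
groups = {
    "Male": {"smokers": 200, "total": 1000},
    "Female": {"smokers": 30, "total": 100},
}

# Raw counts: males appear to smoke far more.
counts = {g: d["smokers"] for g, d in groups.items()}

# Proportions: each count is divided by its own group size.
rates = {g: d["smokers"] / d["total"] for g, d in groups.items()}

print(counts)  # {'Male': 200, 'Female': 30}
print(rates)   # {'Male': 0.2, 'Female': 0.3}
```

By count, males have more smokers (200 vs. 30), but by proportion females smoke at a higher rate (30% vs. 20%) in this made-up data.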
Simpson's Paradox
Definition
Simpson's Paradox occurs when an association between two variables reverses or disappears once a third (lurking) variable is taken into account.
LogicLens: Classic Example
Hospital Survival Rates
What happened? Hospital A gets more "easy" cases, inflating its overall rate. The lurking variable (case difficulty) reverses the conclusion when accounted for.
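The reversal can be reproduced with a small sketch. All counts below are hypothetical, chosen only so that Hospital A receives mostly easy cases; they are not data from the lesson, and the names `data` and `overall_rate` are assumptions made for this example.

```python
# Hypothetical (survived, total) counts per hospital and case difficulty.
data = {
    "Hospital A": {"easy": (810, 900), "hard": (30, 100)},
    "Hospital B": {"easy": (95, 100), "hard": (360, 900)},
}


def overall_rate(cases):
    """Survival rate after pooling all case difficulties together."""
    survived = sum(s for s, _ in cases.values())
    total = sum(t for _, t in cases.values())
    return survived / total


for hospital, cases in data.items():
    print(f"{hospital} overall: {overall_rate(cases):.1%}")
    for difficulty, (survived, total) in cases.items():
        print(f"  {difficulty}: {survived / total:.1%}")
```

With these made-up counts, Hospital A has the higher overall survival rate (84.0% vs. 45.5%), yet Hospital B does better within each difficulty level (95.0% vs. 90.0% for easy cases, 40.0% vs. 30.0% for hard cases). The pooled comparison mostly reflects which hospital sees more easy cases.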
Key Takeaway
Always consider whether there could be a lurking variable that might change the story. Aggregated data can hide important patterns in subgroups.