# Analytical Pitfalls — Detailed Examples

## Simpson's Paradox

**What it is:** A trend that appears in aggregated data reverses when you segment by a key variable.

**Example:**

- Overall: Treatment A has 80% success, Treatment B has 81% -> "B is better"
- But segmented by severity:
  - Mild cases: A=90%, B=85% -> A is better
  - Severe cases: A=70%, B=65% -> A is better
- Paradox: A is better in BOTH groups, but B looks better overall because B got a larger share of the easier mild cases

**How to catch:** Always segment by obvious confounders (user type, time period, source, severity) before concluding.

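The reversal is easy to reproduce from raw counts. A minimal sketch, using hypothetical counts chosen to match the segment rates above:

```python
# Hypothetical (successes, total) counts per treatment and severity segment.
data = {
    "A": {"mild": (45, 50), "severe": (35, 50)},
    "B": {"mild": (68, 80), "severe": (13, 20)},
}

def rate(successes, total):
    return successes / total

for treatment, segments in data.items():
    wins = sum(s for s, _ in segments.values())
    n = sum(t for _, t in segments.values())
    per_segment = {seg: round(rate(*counts), 2) for seg, counts in segments.items()}
    print(treatment, per_segment, "overall:", round(wins / n, 2))
```

A wins every segment (0.90 vs 0.85, 0.70 vs 0.65) yet loses overall (0.80 vs 0.81), because B's denominator is dominated by the easy mild cases.
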
---

## Survivorship Bias

**What it is:** Drawing conclusions only from "survivors" while ignoring those who dropped out.

**Example:**

- "Users who completed onboarding have 80% retention!"
- Problem: You're only looking at users who already demonstrated commitment by completing onboarding
- The 60% who abandoned onboarding aren't in your "user" dataset

**How to catch:** Ask "Who is NOT in this dataset that should be?" Include churned users, failed attempts, non-converters.

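The gap shows up as soon as the dropped-out users go back into the denominator. A minimal sketch with hypothetical funnel counts, under the simplifying assumption that everyone who abandoned onboarding churned:

```python
# Hypothetical funnel counts.
signups = 1000
completed_onboarding = 400   # the other 600 abandoned during onboarding
retained = 320               # retained users, all of whom completed onboarding

survivor_retention = retained / completed_onboarding  # conditioned on surviving onboarding
cohort_retention = retained / signups                 # describes every signup

print(f"survivors: {survivor_retention:.0%}, full cohort: {cohort_retention:.0%}")
```

The "80% retention" headline is true only for survivors; across the whole signup cohort it is 32%.
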
---

## Comparing Unequal Periods

**What it is:** Comparing metrics across time periods of different lengths or characteristics.

**Examples:**

- February (28 days) vs January (31 days) revenue
- Holiday week vs normal week traffic
- Q4 (holiday season) vs Q1 for e-commerce

**How to catch:**

- Normalize to per-day, per-user, or per-session
- Compare the same period last year (YoY), not sequential months
- Flag seasonal factors explicitly

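Normalizing to a per-day rate can flip the conclusion outright. A minimal sketch with hypothetical monthly totals:

```python
# Hypothetical (total_revenue, days_in_month) pairs.
months = {"jan": (31_000, 31), "feb": (28_560, 28)}

per_day = {month: total / days for month, (total, days) in months.items()}
print(per_day)  # January wins on raw total, February wins per day
```

January's raw total (31,000) beats February's (28,560), but per day February is ahead (1,020 vs 1,000): the shorter month was actually the better one.
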
---

## p-Hacking (Multiple Comparisons)

**What it is:** Running many statistical tests until finding a "significant" result, then reporting only that one.

**Example:**

- Test 20 different user segments for a conversion difference
- At alpha=0.05, expect about 1 "significant" result by chance alone (20 x 0.05 = 1)
- Report: "Segment X shows significant improvement!" (cherry-picked)

**How to catch:**

- Apply Bonferroni correction (divide alpha by the number of tests)
- Pre-register hypotheses before looking at data
- Report ALL tests run, not just significant ones

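The expected false positive is easy to simulate: under a true null, p-values are uniform on [0, 1], so at alpha = 0.05 roughly 1 in 20 tests "succeeds" by chance. A minimal sketch comparing the naive threshold to the Bonferroni-corrected one:

```python
import random

random.seed(1)
n_tests, alpha = 20, 0.05

# p-values for 20 comparisons where the null is true (uniform under the null).
p_values = [random.random() for _ in range(n_tests)]

naive_hits = sum(p < alpha for p in p_values)
bonferroni_hits = sum(p < alpha / n_tests for p in p_values)  # corrected threshold 0.0025

print(f"naive 'significant' results: {naive_hits}, after Bonferroni: {bonferroni_hits}")
```

Averaged over many seeds, `naive_hits` comes out around 1 per batch; the Bonferroni threshold (0.05 / 20 = 0.0025) suppresses almost all of these chance hits.
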
---

## Spurious Correlation in Time Series

**What it is:** Two variables that both trend over time appear correlated, but the relationship is meaningless.

**Example:**

- "Revenue and employee count have a correlation of 0.95!"
- Both grew over time. Controlling for time, there's no relationship.
- Classic: "Ice cream sales correlate with drowning deaths" (both rise in summer)

**How to catch:**

- Detrend both series (e.g. by first-differencing) before correlating
- Check if the relationship holds within time periods
- Ask: "Is there a causal mechanism, or just a shared time trend?"

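A minimal sketch of the detrending check, using first differences and synthetic series that share nothing but an upward time trend:

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(0)
# Independent noise on top of a shared upward trend.
xs = [t + random.gauss(0, 1) for t in range(100)]
ys = [2 * t + random.gauss(0, 1) for t in range(100)]

raw = pearson(xs, ys)                        # near 1.0: pure trend
dx = [xs[i + 1] - xs[i] for i in range(99)]  # first differences remove the trend
dy = [ys[i + 1] - ys[i] for i in range(99)]
detrended = pearson(dx, dy)                  # near 0: no real relationship

print(round(raw, 3), round(detrended, 3))
```

The raw correlation is close to 1 purely because both series rise with `t`; after differencing, the correlation collapses toward zero.
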
---

## Aggregating Percentages

**What it is:** Averaging percentages instead of recalculating from the underlying totals.

**Example:**

- Store A: 10/100 = 10% conversion
- Store B: 5/10 = 50% conversion
- Wrong: "Average conversion is 30%"
- Right: 15/110 = 13.6% conversion

**How to catch:** Never average percentages computed over different denominators. Sum the numerators, sum the denominators, recalculate.

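The two calculations side by side, using the store numbers above:

```python
from statistics import mean

# (conversions, visitors) per store, from the example above.
stores = {"A": (10, 100), "B": (5, 10)}

wrong = mean(c / n for c, n in stores.values())  # simple average of the two rates
right = sum(c for c, _ in stores.values()) / sum(n for _, n in stores.values())

print(f"wrong (averaged rates): {wrong:.1%}")   # 30.0%
print(f"right (pooled counts):  {right:.1%}")   # 13.6%
```

The averaged figure gives tiny Store B the same weight as Store A, which has ten times the traffic; pooling the counts weights each store by its denominator.
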
---

## Selection Bias in A/B Tests

**What it is:** Treatment and control groups differ systematically before the treatment is applied.

**Examples:**

- Users who opted into the new feature vs those who didn't
- Early adopters (Monday signups) vs late-week (Friday signups)
- Users who saw the experiment (page loaded fast enough) vs those who didn't

**How to catch:**

- Verify pre-experiment metrics are balanced
- Use intention-to-treat analysis
- Check for differential attrition

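One cheap automated balance check, not named in the list above but widely used alongside it, is a sample-ratio-mismatch (SRM) test: if the observed assignment split deviates far from the planned split, assignment itself is biased and the metric comparison can't be trusted. A minimal sketch as a normal-approximation z-test, with hypothetical counts:

```python
from math import sqrt

def srm_z(n_treatment: int, n_control: int, planned: float = 0.5) -> float:
    """z-statistic for the observed treatment share vs the planned split."""
    n = n_treatment + n_control
    expected = n * planned
    sd = sqrt(n * planned * (1 - planned))
    return (n_treatment - expected) / sd

# A 50/50 experiment that somehow collected 4.5% more treatment users.
z = srm_z(10_450, 10_000)
print(round(z, 2))  # |z| above ~3 is a red flag before reading any metric
```

A perfectly balanced split gives z = 0; here the 450-user surplus pushes |z| past 3, which under a true 50/50 assignment would be vanishingly unlikely.
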
---

## Confusing Causation

**What it is:** Assuming X causes Y when the relationship might be: Y causes X, Z causes both, or it's coincidental.

**Example:**

- "Power users have higher retention"
- Did power usage cause retention? Or did retained users become power users over time? Or does a third factor (job role) drive both?

**How to catch:**

- Can you run an experiment? (randomize treatment)
- Is there a natural experiment? (policy change, feature rollout)
- At minimum: control for obvious confounders