# Backtesting Methodology Reference

## Table of Contents

1. Core Testing Techniques
2. Stress Testing Methods
3. Parameter Sensitivity Analysis
4. Slippage and Friction Modeling
5. Sample Size Guidelines
6. Market Regime Analysis
7. Common Pitfalls and Biases

## 1. Core Testing Techniques

### "Beat Ideas to Death" Approach

**Core principle**: Add friction and punishment to find strategies that break the least, not those that profit the most on paper.

**Key techniques**:
- Multiple stop loss variations
- Different profit targets
- Realistic + exaggerated commissions
- Worst-case fills
- Extended time periods
- Multiple market regimes

### The 80/20 Rule for R&D Time

- 20% generating and codifying ideas
- 80% stress testing and trying to break them

## 2. Stress Testing Methods

### Execution Friction Tests

**Required friction additions**:
- Realistic commissions (actual broker rates)
- Pessimistic slippage (1.5-2× typical)
- Worst-case entry fills (ask + 1-2 ticks for long entries; mirror for shorts)
- Worst-case exit fills (bid - 1-2 ticks for long exits; mirror for shorts)
- Order rejection scenarios
- Partial fills

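
The worst-case fill rules above can be sketched as follows. This is a minimal illustration for long trades only; `Quote`, `worst_case_fill`, and `round_trip_pnl` are hypothetical names, the per-share commission model is an assumption, and order rejections and partial fills are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    bid: float
    ask: float


def worst_case_fill(quote, side, tick_size, ticks=2):
    """Pessimistic fill for a long trade: buy above the ask, sell below the bid."""
    if side == "buy":
        return quote.ask + ticks * tick_size
    if side == "sell":
        return quote.bid - ticks * tick_size
    raise ValueError(f"unknown side: {side!r}")


def round_trip_pnl(entry, exit_quote, shares, tick_size,
                   commission_per_share, ticks=2):
    """Net P&L of a long round trip with worst-case fills on both legs."""
    buy = worst_case_fill(entry, "buy", tick_size, ticks)
    sell = worst_case_fill(exit_quote, "sell", tick_size, ticks)
    gross = (sell - buy) * shares
    # Commission is charged once on entry and once on exit.
    return gross - 2 * commission_per_share * shares
```

Running the same trade list with `ticks=0` and then `ticks=2` gives a quick read on how much of the paper edge depends on optimistic fills.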
### Parameter Robustness Tests

Test across multiple configurations:
- Entry timing variations (±15-30 minutes)
- Stop loss distances (50%, 75%, 100%, 125%, 150% of baseline)
- Profit targets (80%, 90%, 100%, 110%, 120% of baseline)
- Position sizing rules
- Filter thresholds

**Goal**: Find a performance "plateau", where small parameter changes do not drastically alter results.

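
One way to enumerate the stop loss and profit target variations listed above is a simple grid over the baseline multipliers. The sketch below assumes the baseline stop and target are plain numbers (for example, dollars or points); `parameter_grid` and `worst_relative_drop` are illustrative helpers, not part of any library.

```python
from itertools import product

# Multipliers taken from the list above (fractions of the baseline value).
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)
TARGET_MULTIPLIERS = (0.80, 0.90, 1.00, 1.10, 1.20)


def parameter_grid(base_stop, base_target):
    """Return every (stop, target) pair from the multiplier lists."""
    return [
        (base_stop * s, base_target * t)
        for s, t in product(STOP_MULTIPLIERS, TARGET_MULTIPLIERS)
    ]


def worst_relative_drop(results, baseline_result):
    """Largest fractional shortfall of any variation versus the baseline."""
    if baseline_result <= 0:
        raise ValueError("baseline result must be positive")
    return max((baseline_result - r) / baseline_result for r in results)
```

A small `worst_relative_drop` across the grid suggests a plateau; what counts as "small" is a judgment call the source does not quantify.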
### Time-Based Robustness

**Minimum requirements**:
- Test across at least 5 years, preferably 10
- Include multiple market regimes:
  - Bull markets
  - Bear markets
  - High volatility periods
  - Low volatility periods
  - Trending markets
  - Range-bound markets

**Year-by-year analysis**: The strategy should show positive expectancy in the majority of years and should not rely on 1-2 exceptional years.

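
A year-by-year check can be as simple as grouping trades by exit year and averaging the P&L per trade. The sketch below assumes each trade is an `(exit_date, pnl)` tuple; that layout and both function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def expectancy_by_year(trades):
    """Map year -> mean P&L per trade, for trades given as (exit_date, pnl)."""
    by_year = defaultdict(list)
    for exit_date, pnl in trades:
        by_year[exit_date.year].append(pnl)
    return {year: mean(pnls) for year, pnls in sorted(by_year.items())}


def share_of_positive_years(yearly):
    """Fraction of years whose expectancy is above zero."""
    return sum(1 for value in yearly.values() if value > 0) / len(yearly)


trades = [
    (date(2020, 1, 5), 100.0),
    (date(2020, 6, 1), -50.0),
    (date(2021, 3, 1), -20.0),
    (date(2022, 2, 1), 30.0),
]
yearly = expectancy_by_year(trades)
```

Here two of the three years are positive, so the strategy passes the "majority of years" test on this toy data, though three years is far short of the minimum period recommended above.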
## 3. Parameter Sensitivity Analysis

### Heat Map Analysis

Create 2D heat maps varying two parameters simultaneously:
- Profit target (rows) × Stop loss (columns)
- Entry time (rows) × Exit time (columns)
- Volatility filter (rows) × Volume filter (columns)

**Interpretation**:
- Robust strategies show "plateaus" of consistent performance
- Fragile strategies show "spikes" or narrow optimal ranges
- Avoid strategies with performance cliffs at parameter boundaries

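
The grid behind a heat map, and a rough test for "spikes", might look like the sketch below. The `metric` callable stands in for a full backtest run and is hypothetical, as is the rule of flagging a cell that exceeds twice the mean of its neighbors; the source does not define a numeric spike threshold.

```python
def build_heat_map(row_values, col_values, metric):
    """Evaluate metric(row, col) for every combination; returns a list of rows."""
    return [[metric(r, c) for c in col_values] for r in row_values]


def neighbor_mean(grid, row, col):
    """Mean of the up-to-8 cells surrounding (row, col); grid needs 2+ cells."""
    values = [
        grid[r][c]
        for r in range(max(0, row - 1), min(len(grid), row + 2))
        for c in range(max(0, col - 1), min(len(grid[0]), col + 2))
        if (r, c) != (row, col)
    ]
    return sum(values) / len(values)


def find_spikes(grid, ratio=2.0):
    """Cells whose value exceeds `ratio` times the mean of their neighbors."""
    spikes = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            mean = neighbor_mean(grid, r, c)
            if mean > 0 and value > ratio * mean:
                spikes.append((r, c))
    return spikes
```

An empty result from `find_spikes` is consistent with a plateau. A flagged cell is a candidate for the narrow optimum the interpretation notes warn about.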
### Walk-Forward Analysis

1. Optimize parameters on a training period (e.g., Years 1-2)
2. Test with those parameters on the following validation period (e.g., Year 3)
3. Roll both windows forward and repeat
4. Compare in-sample vs out-of-sample performance

**Warning signs**:
- Out-of-sample performance below 50% of in-sample performance
- Frequent need to re-optimize parameters
- Parameters that change dramatically between periods

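
The rolling split in steps 1-3 can be sketched as below, using whole years as the unit. The window lengths mirror the example in the steps (two years of training, one of validation); the function names are illustrative, and the actual optimization and backtest calls are left out.

```python
def walk_forward_windows(years, train_len=2, test_len=1):
    """Return (train_years, test_years) pairs, rolling forward by test_len."""
    windows = []
    start = 0
    while start + train_len + test_len <= len(years):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        windows.append((train, test))
        start += test_len
    return windows


def oos_efficiency(in_sample, out_of_sample):
    """Out-of-sample result as a fraction of the in-sample result."""
    if in_sample <= 0:
        raise ValueError("in-sample performance must be positive")
    return out_of_sample / in_sample


windows = walk_forward_windows([2018, 2019, 2020, 2021, 2022])
```

With five years of data this yields three train/validation pairs. An `oos_efficiency` below 0.5 in a window corresponds to the first warning sign above.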
## 4. Slippage and Friction Modeling

### Realistic Slippage Assumptions

**By market capitalization**:
- Mega cap (>$200B): 0.01-0.02%
- Large cap ($10B-$200B): 0.02-0.05%
- Mid cap ($2B-$10B): 0.05-0.10%
- Small cap ($300M-$2B): 0.10-0.20%
- Micro cap (<$300M): 0.20-0.50%+

**By order type**:
- Market orders: Higher slippage
- Limit orders: Lower slippage but potential non-fills
- Stop orders: Significant slippage in volatile conditions

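
The market capitalization tiers above translate naturally into a lookup table. In this sketch the slippage values are stored as fractions (0.0001 = 0.01%), and a cap that falls exactly on a boundary is assigned to the larger tier; that boundary rule is an assumption, since the source ranges share endpoints.

```python
# (minimum market cap in USD, (low, high) slippage as a fraction of price)
SLIPPAGE_TIERS = [
    (200e9, (0.0001, 0.0002)),  # mega cap
    (10e9, (0.0002, 0.0005)),   # large cap
    (2e9, (0.0005, 0.0010)),    # mid cap
    (300e6, (0.0010, 0.0020)),  # small cap
    (0.0, (0.0020, 0.0050)),    # micro cap (source notes this can run higher)
]


def slippage_range(market_cap_usd):
    """Return the (low, high) slippage estimate for a given market cap."""
    if market_cap_usd < 0:
        raise ValueError("market cap cannot be negative")
    for floor, bounds in SLIPPAGE_TIERS:
        if market_cap_usd >= floor:
            return bounds
    raise AssertionError("unreachable: the last tier has a floor of 0")
```

For example, `slippage_range(5e9)` returns the mid cap range. Order type effects (market, limit, stop) are not captured by this table.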
### Conservative Testing Approach

Use 1.5-2× typical slippage estimates for stress testing:
- If typical slippage is 0.05%, test with 0.075-0.10%
- If typical slippage is 0.10%, test with 0.15-0.20%

**Rationale**: Strategies that survive pessimistic assumptions often perform better in live trading than the stressed backtest suggests, because real frictions tend to be smaller than the ones assumed.

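
The arithmetic above, and its effect on a strategy's edge, can be worked through in a few lines. This sketch assumes slippage is paid once on entry and once on exit and that the gross edge is expressed per round trip; both are illustrative assumptions, as is the 0.30% gross edge used in the example.

```python
def stressed_slippage(typical, multipliers=(1.5, 2.0)):
    """Scale a typical slippage estimate by each stress multiplier."""
    return tuple(typical * m for m in multipliers)


def net_edge(gross_edge, slippage_per_side):
    """Edge per round trip after paying slippage on entry and on exit."""
    return gross_edge - 2 * slippage_per_side


typical = 0.0005                        # 0.05% per side
low, high = stressed_slippage(typical)  # 0.075% and 0.10% per side
gross = 0.0030                          # hypothetical 0.30% gross edge

baseline_net = net_edge(gross, typical)  # about 0.20%
stressed_net = net_edge(gross, high)     # about 0.10%
```

In this example doubling the slippage halves the net edge but leaves it positive. A strategy whose `stressed_net` turns negative would fail the conservative test.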
## 5. Sample Size Guidelines

### Minimum Trade Requirements

**Statistical significance thresholds**:
- Absolute minimum: 30 trades
- Preferred minimum: 100 trades
- High confidence: 200+ trades

**Why large samples matter**:
- Reduces the impact of outliers
- Provides statistical confidence
- Helps separate a true edge from luck

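
The trade count tiers above can be checked mechanically, and a one-sample t-statistic on per-trade P&L is one common way to put a number on "edge vs luck". The t-statistic is an addition for illustration, not something the thresholds above prescribe, and it assumes roughly independent trades. The tier labels and function names are likewise hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def sample_tier(n_trades):
    """Classify a trade count against the thresholds listed above."""
    if n_trades >= 200:
        return "high confidence"
    if n_trades >= 100:
        return "preferred minimum"
    if n_trades >= 30:
        return "absolute minimum"
    return "insufficient"


def t_statistic(pnls):
    """Mean P&L divided by its standard error (sample stdev / sqrt(n))."""
    n = len(pnls)
    if n < 2:
        raise ValueError("need at least two trades")
    spread = stdev(pnls)
    if spread == 0:
        raise ValueError("P&L values are identical; t-statistic is undefined")
    return mean(pnls) / (spread / sqrt(n))
```

For the same average P&L and volatility, the t-statistic grows with the square root of the number of trades, which is why larger samples give more confidence.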
### Time Period Considerations

- **Minimum testing period**: 5 years
- **Preferred testing period**: 10+ years

**Must include**:
- At least one full market cycle
- Multiple volatility regimes
- Different Federal Reserve policy environments

## 6. Market Regime Analysis

### Regime Classification

**Volatility-based regimes**:
- Low volatility: VIX <15
- Normal volatility: VIX 15-25
- High volatility: VIX 25-35
- Extreme volatility: VIX >35

**Trend-based regimes**:
- Strong uptrend: Market up 10% or more over 6 months
- Moderate uptrend: Market up 5% to 10% over 6 months
- Sideways: Market between -5% and +5% over 6 months
- Downtrend: Market down more than 5% over 6 months

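
The two classifications above map directly onto threshold checks. Because the listed ranges share endpoints (for example, VIX 25 appears in both "normal" and "high"), the boundary handling below is an assumption: a value on a shared boundary goes to the higher volatility bucket, and a return of exactly +5% counts as sideways.

```python
def volatility_regime(vix):
    """Classify a VIX level into the four volatility regimes."""
    if vix < 15:
        return "low"
    if vix < 25:
        return "normal"
    if vix <= 35:
        return "high"
    return "extreme"


def trend_regime(six_month_return):
    """Classify a 6-month market return given as a fraction (0.10 = +10%)."""
    if six_month_return >= 0.10:
        return "strong uptrend"
    if six_month_return > 0.05:
        return "moderate uptrend"
    if six_month_return >= -0.05:
        return "sideways"
    return "downtrend"
```

Tagging each trade with both labels at entry time makes it possible to compute expectancy per regime.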
### Performance Requirements by Regime

**Robust strategy characteristics**:
- Positive expectancy in the majority of regimes
- Acceptable (not necessarily best) performance in all regimes
- No catastrophic failures in any single regime
- A clear understanding of which regimes cause weakness

## 7. Common Pitfalls and Biases

### Survivorship Bias

**Issue**: Testing only on currently trading stocks ignores delisted and bankrupt companies, which inflates results.

**Solution**: Use survivorship-bias-free datasets that include historical delistings.

### Look-Ahead Bias

**Issue**: Using information that was not available at the time of the trade.

**Examples**:
- Using end-of-day (EOD) data for intraday decisions
- Using the next day's open for decisions made at today's close
- Calculating indicators with future data points

**Prevention**: Strict timestamp control and data alignment checks.

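
Two small habits cover most of the examples above: compute indicators from trailing data only, and fill a signal no earlier than the bar after the one that produced it. The sketch below assumes daily bars held in plain lists of equal length; `trailing_sma` and `next_bar_fills` are illustrative names.

```python
def trailing_sma(prices, window):
    """Simple moving average that uses only the current and earlier bars."""
    averages = []
    for i in range(len(prices)):
        if i + 1 < window:
            averages.append(None)  # not enough history yet
        else:
            averages.append(sum(prices[i + 1 - window:i + 1]) / window)
    return averages


def next_bar_fills(signals, opens):
    """Pair the signal from bar i with the open of bar i + 1.

    The final signal has no following bar, so it produces no fill.
    """
    return [(signals[i], opens[i + 1]) for i in range(len(signals) - 1)]
```

A signal generated at today's close is therefore executed at tomorrow's open, never at a price that was already known when the signal was computed.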
### Curve-Fitting (Over-Optimization)

**Warning signs**:
- Too many parameters (more than 5-7)
- Highly specific parameter values (e.g., RSI = 37.3)
- Perfect backtest results
- Large performance drop in the validation period

**Prevention techniques**:
- Limit parameters to essential ones only
- Use round numbers when possible
- Require out-of-sample testing
- Analyze parameter sensitivity

### Sample Selection Bias

**Issue**: Testing only on hand-picked examples (e.g., known market leaders).

**Problem**: Ignoring the stocks that met the criteria but failed creates a false impression of strategy quality.

**Solution**: Test on ALL historical examples meeting the criteria, not just successful outcomes.

### Hindsight Bias

**Issue**: Using outcome knowledge to influence decisions.

**Prevention for systematic trading**:
- Define all rules in advance
- No manual intervention based on hindsight
- Test rules across all cases, not cherry-picked examples

### Data Mining Bias

**Issue**: Testing hundreds of strategies until one "works" by random chance.

**Risk**: With enough attempts, random data will produce seemingly profitable patterns.

**Mitigation**:
- Form a hypothesis before testing
- Require economic logic for the edge
- Apply a Bonferroni correction for multiple comparisons
- Demand stricter significance thresholds (p < 0.01 instead of p < 0.05)

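
The Bonferroni correction mentioned above divides the significance level by the number of strategies tested. The sketch below shows the adjusted threshold and the number of false positives expected by chance when none of the strategies has a real edge; the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(n_tests, alpha):
    """Expected number of chance 'discoveries' when no strategy has an edge."""
    return n_tests * alpha


def passes_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that survives the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]
```

Testing 200 strategies at p < 0.05 would produce about 10 apparent winners by chance alone, and the corrected threshold in that case is 0.05 / 200 = 0.00025.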