veeramanikandanr48_backtest…/references/methodology.md

# Backtesting Methodology Reference

## Table of Contents

1. Core Testing Techniques
2. Stress Testing Methods
3. Parameter Sensitivity Analysis
4. Slippage and Friction Modeling
5. Sample Size Guidelines
6. Market Regime Analysis
7. Common Pitfalls and Biases

## 1. Core Testing Techniques

### "Beat Ideas to Death" Approach

**Core principle**: Add friction and punishment to find strategies that break the least, not those that profit the most on paper.

**Key techniques**:
- Multiple stop loss variations
- Different profit targets
- Realistic + exaggerated commissions
- Worst-case fills
- Extended time periods
- Multiple market regimes

### The 80/20 Rule for R&D Time

- 20% generating and codifying ideas
- 80% stress testing and trying to break them

## 2. Stress Testing Methods

### Execution Friction Tests

**Required friction additions**:
- Realistic commissions (actual broker rates)
- Pessimistic slippage (1.5-2x typical)
- Worst-case entry fills (ask + 1-2 ticks)
- Worst-case exit fills (bid - 1-2 ticks)
- Order rejection scenarios
- Partial fills

### Parameter Robustness Tests

Test across multiple configurations:
- Entry timing variations (±15-30 minutes)
- Stop loss distances (50%, 75%, 100%, 125%, 150% of baseline)
- Profit targets (80%, 90%, 100%, 110%, 120% of baseline)
- Position sizing rules
- Filter thresholds

**Goal**: Find "plateau" performance where small parameter changes don't drastically alter results.

### Time-Based Robustness

**Minimum requirements**:
- Test across at least 5-10 years
- Include multiple market regimes:
  - Bull markets
  - Bear markets
  - High volatility periods
  - Low volatility periods
  - Trending markets
  - Range-bound markets

**Year-by-year analysis**: Strategy should show positive expectancy in majority of years, not rely on 1-2 exceptional years.

## 3. Parameter Sensitivity Analysis

### Heat Map Analysis

Create 2D heat maps varying two parameters simultaneously:
- Profit target (rows) × Stop loss (columns)
- Entry time (rows) × Exit time (columns)
- Volatility filter (rows) × Volume filter (columns)

**Interpretation**:
- Robust strategies show "plateaus" of consistent performance
- Fragile strategies show "spikes" or narrow optimal ranges
- Avoid strategies with performance cliffs at parameter boundaries

### Walk-Forward Analysis

1. Optimize parameters on training period (e.g., Year 1-2)
2. Test with those parameters on validation period (Year 3)
3. Roll forward and repeat
4. Compare in-sample vs out-of-sample performance

**Warning signs**:
- Out-of-sample performance <50% of in-sample
- Frequent need to re-optimize parameters
- Parameters that change dramatically between periods

## 4. Slippage and Friction Modeling

### Realistic Slippage Assumptions

**By market capitalization**:
- Mega cap (>$200B): 0.01-0.02%
- Large cap ($10B-$200B): 0.02-0.05%
- Mid cap ($2B-$10B): 0.05-0.10%
- Small cap ($300M-$2B): 0.10-0.20%
- Micro cap (<$300M): 0.20-0.50%+

**By order type**:
- Market orders: Higher slippage
- Limit orders: Lower slippage but potential non-fills
- Stop orders: Significant slippage in volatile conditions

### Conservative Testing Approach

Use 1.5-2x typical slippage estimates for stress testing:
- If typical slippage is 0.05%, test with 0.075-0.10%
- If typical is 0.10%, test with 0.15-0.20%

**Rationale**: Strategies that survive pessimistic assumptions often perform better in practice than in backtests.

## 5. Sample Size Guidelines

### Minimum Trade Requirements

**Statistical significance thresholds**:
- Absolute minimum: 30 trades
- Preferred minimum: 100 trades
- High confidence: 200+ trades

**Why large samples matter**:
- Reduces impact of outliers
- Provides statistical confidence
- Reveals true edge vs luck

### Time Period Considerations

**Minimum testing period**: 5 years
**Preferred testing period**: 10+ years

**Must include**:
- At least one full market cycle
- Multiple volatility regimes
- Different Federal Reserve policy environments

## 6. Market Regime Analysis

### Regime Classification

**Volatility-based regimes**:
- Low volatility: VIX <15
- Normal volatility: VIX 15-25
- High volatility: VIX 25-35
- Extreme volatility: VIX >35

**Trend-based regimes**:
- Strong uptrend: Market +10%+ over 6 months
- Moderate uptrend: Market +5% to +10% over 6 months
- Sideways: Market -5% to +5% over 6 months
- Downtrend: Market <-5% over 6 months

### Performance Requirements by Regime

**Robust strategy characteristics**:
- Positive expectancy in majority of regimes
- Acceptable (not necessarily best) in all regimes
- No catastrophic failures in any single regime
- Understanding of which regime causes weakness

## 7. Common Pitfalls and Biases

### Survivorship Bias

**Issue**: Testing only on currently-trading stocks ignores delisted/bankrupt companies.

**Solution**: Use survivorship-bias-free datasets that include historical delistings.

### Look-Ahead Bias

**Issue**: Using information not available at the time of trade.

**Examples**:
- Using EOD data for intraday decisions
- Using next-day's open for today's close decisions
- Calculating indicators with future data points

**Prevention**: Strict timestamp control and data alignment checks.

### Curve-Fitting (Over-Optimization)

**Warning signs**:
- Too many parameters (>5-7)
- Highly specific parameter values (e.g., RSI = 37.3)
- Perfect backtest results
- Large performance drop in validation period

**Prevention techniques**:
- Limit parameters to essential ones only
- Use round numbers when possible
- Require out-of-sample testing
- Analyze parameter sensitivity

### Sample Selection Bias

**Issue**: Testing only on hand-picked examples (e.g., known market leaders).

**Problem**: Ignoring all stocks that met criteria but failed creates false impression of strategy quality.

**Solution**: Test on ALL historical examples meeting the criteria, not just successful outcomes.

### Hindsight Bias

**Issue**: Using outcome knowledge to influence decisions.

**Prevention for systematic trading**:
- Define all rules in advance
- No manual intervention based on hindsight
- Test rules across all cases, not cherry-picked examples

### Data Mining Bias

**Issue**: Testing hundreds of strategies until finding one that "works" by random chance.

**Risk**: With enough attempts, random data will produce seemingly profitable patterns.

**Mitigation**:
- Have hypothesis before testing
- Require economic logic for the edge
- Use Bonferroni correction for multiple comparisons
- Demand higher significance thresholds (p < 0.01 instead of p < 0.05)