Initial commit with translated description

references/failed_tests.md (new file, 236 lines)
# Learning from Failed Backtests

## Table of Contents

1. Why Failed Ideas Are Valuable
2. Common Failure Patterns
3. Case Study Framework
4. Red Flags Checklist
## 1. Why Failed Ideas Are Valuable

### The Value of Failures

**Key insights**:
- Failed tests save capital by preventing live implementation
- Failure patterns reveal which assumptions don't hold
- Understanding what doesn't work narrows the search space
- Failed tests build experience in recognizing fragile strategies

### Documentation Discipline

**Record for each failed idea**:
- The hypothesis being tested
- Why you thought it would work
- What the data showed
- Specific breaking points
- Lessons learned

**Purpose**: Build a library of "anti-patterns" to avoid repeating mistakes.
## 2. Common Failure Patterns

### Pattern 1: Parameter Sensitivity

**Symptom**: Strategy only works with very specific parameter values.

**Example scenario**:
- Strategy is profitable with a stop loss at exactly 2.5%
- Increasing it to 3% or decreasing it to 2% causes a significant performance drop
- No "plateau" of stable performance

**Why it fails**: Real markets are noisy; if small parameter changes break the strategy, it likely captured noise, not signal.

**Lesson**: Seek strategies with stable performance across parameter ranges.
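One way to detect this fragility is a simple sweep around the baseline parameter. The sketch below assumes a hypothetical `run_backtest` function (a placeholder for whatever engine you use) and checks whether nearby stop-loss values remain acceptable:

```python
# Hypothetical sketch: sweep stop-loss values around a baseline and check
# for a performance "plateau" rather than a single profitable spike.

def run_backtest(stop_loss_pct):
    # Placeholder returning annualized return (%) per stop loss.
    # Replace with your actual backtest engine.
    results = {2.0: -1.0, 2.25: 0.2, 2.5: 12.0, 2.75: 0.5, 3.0: -2.0}
    return results[stop_loss_pct]

def has_plateau(param_values, min_return=0.0):
    """True only if ALL nearby parameter values stay acceptable (a plateau);
    False when performance collapses away from one specific value."""
    returns = [run_backtest(p) for p in param_values]
    return all(r >= min_return for r in returns)

sweep = [2.0, 2.25, 2.5, 2.75, 3.0]
print(has_plateau(sweep))  # False: only 2.5% works, a classic fragility spike
```

A robust strategy would keep `has_plateau` true over a wide band of values, not just at the optimized point.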
### Pattern 2: Regime-Specific Performance

**Symptom**: Strategy works brilliantly in some years, terribly in others.

**Example scenario**:
- Great performance in 2017-2019 (low-volatility bull market)
- Catastrophic losses in 2020 (high volatility)
- Poor performance in 2022 (downtrend)

**Why it fails**: The strategy depends on specific market conditions and is not robust enough for diverse environments.

**Lesson**: Require acceptable (not necessarily best) performance across all regimes.
### Pattern 3: Slippage Sensitivity

**Symptom**: Strategy becomes unprofitable when realistic trading costs are added.

**Example scenario**:
- Backtest shows a 0.5% average gain per trade
- Adding 0.1% slippage per side (0.2% round trip) eliminates the profits
- Strategy requires unrealistic fills to be profitable

**Why it fails**: The edge is too small to survive real-world friction.

**Lesson**: The edge must be large enough to survive pessimistic assumptions about costs.
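The scenario's arithmetic can be made explicit. This is a minimal sketch using the illustrative numbers above (0.5% gross edge, 0.1% slippage per side); `net_edge_per_trade` and the 0.1% commission figure are assumptions for illustration, not real strategy data:

```python
# Round-trip friction arithmetic from the scenario above.

def net_edge_per_trade(gross_edge_pct, slippage_per_side_pct,
                       commission_round_trip_pct=0.0):
    """Gross edge minus round-trip slippage (two sides) and commissions."""
    round_trip_slippage = 2 * slippage_per_side_pct
    return gross_edge_pct - round_trip_slippage - commission_round_trip_pct

# 0.5% gross gain, 0.1% slippage per side, 0.1% total commissions
net = net_edge_per_trade(0.5, 0.1, 0.1)
print(f"{net:.2f}% net edge per trade")  # 0.20%: most of the edge is gone

# Stress test at 2x slippage, per the conservative-testing guideline
stressed = net_edge_per_trade(0.5, 0.2, 0.1)
print(f"stressed edge: {stressed:.1e}%")  # effectively zero: edge eliminated
```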
### Pattern 4: Sample Size Issues

**Symptom**: Strong results based on a small number of trades.

**Example scenario**:
- Backtest shows an 80% win rate
- Only 15 total trades in 5 years
- A few different outcomes would dramatically change the results

**Why it fails**: Insufficient data to distinguish edge from luck.

**Lesson**: Require a minimum of 100 trades for meaningful conclusions, preferably 200+.
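A quick way to quantify this is a confidence interval on the win rate. The sketch below uses the Wilson score interval (a choice of ours; any binomial interval works) to show how little 15 trades actually pin down:

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a win rate; stdlib only."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# An 80% win rate on 15 trades vs the same rate on 200 trades
print(wilson_interval(12, 15))   # roughly (0.55, 0.93): very wide
print(wilson_interval(160, 200)) # roughly (0.74, 0.85): much tighter
```

With 15 trades the "80% win rate" is statistically compatible with anything from a coin flip's neighborhood to near-perfection, which is exactly why small samples can't separate edge from luck.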
### Pattern 5: Look-Ahead Bias

**Symptom**: Perfect or near-perfect backtest results.

**Example scenario**:
- Strategy shows a 95%+ win rate
- Unrealistically good entry/exit timing
- Performance too good to be realistic

**Why it fails**: The strategy is likely using information that was not available at the time of the trade.

**Lesson**: Be suspicious of "too good to be true" results; audit data alignment carefully.
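A common concrete form of this bug is acting on a signal in the same bar whose close produced it. A minimal sketch of the mistake and its usual fix (lagging the signal by one bar), with made-up prices:

```python
# Classic look-ahead mistake: trading bar t on an indicator computed from
# bar t's close, which is unknown during bar t. Fix: lag the signal.

closes = [100, 102, 101, 105, 107, 104]

# Signal computed from each bar's close (only known AFTER that close)
signal = [c > 101 for c in closes]

# WRONG: pairing signal[t] with bar t uses the same bar's close
biased = list(zip(closes, signal))

# RIGHT: shift the signal so bar t trades on information from bar t-1
lagged = [None] + signal[:-1]
unbiased = list(zip(closes, lagged))

print(unbiased[1])  # (102, False): bar 1 trades on bar 0's signal only
```

In a dataframe workflow the same fix is typically a one-bar `shift` applied to the signal column before computing positions.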
### Pattern 6: Over-Optimization (Curve Fitting)

**Symptom**: Complex strategy with many parameters shows excellent in-sample results but poor out-of-sample.

**Example scenario**:
- Strategy uses 8-10 different indicators with specific thresholds
- In-sample performance: 40% annual return
- Out-of-sample performance: -5% annual return
- Parameters needed constant re-optimization

**Why it fails**: Fitted to historical noise rather than genuine market structure.

**Lesson**: Prefer simple strategies with fewer parameters; demand strong out-of-sample results.
## 3. Case Study Framework

### Template for Documenting Failed Ideas

Use this framework when a backtest fails:

#### 1. Initial Hypothesis
- **What edge were you trying to capture?**
- **Why did you think this would work?**
- **What was the logical basis?**

#### 2. Implementation Details
- **Entry rules** (specific and complete)
- **Exit rules** (stop loss, profit target, time-based)
- **Position sizing**
- **Filters or conditions**

#### 3. Test Results
- **Basic metrics**:
  - Total trades
  - Win rate
  - Average win/loss
  - Max drawdown
  - Annual returns by year
- **Parameter sensitivity**:
  - How results changed with parameter variations
  - Whether a "plateau" of stable performance existed
- **Regime analysis**:
  - Performance in different market conditions
  - Which regimes caused problems

#### 4. Breaking Points
- **What specifically caused the strategy to fail?**
  - Slippage too high?
  - Parameter sensitivity?
  - Regime-specific?
  - Insufficient sample size?

#### 5. Lessons Learned
- **What assumptions were wrong?**
- **What would you test differently next time?**
- **Are there salvageable elements?**
### Example: Failed Momentum Reversal Strategy

#### 1. Initial Hypothesis
Tried to capture mean reversion after strong momentum moves. Hypothesis: stocks that gap up 5%+ on earnings often pull back 2-3% before continuing, providing a short-term reversal opportunity.

#### 2. Implementation
- Entry: Short when a stock gaps up 5%+ on earnings at the market open
- Exit: Cover at 2% profit or 3% stop loss
- Holding period: Maximum 3 days
- Filters: Market cap >$2B, average volume >500K shares

#### 3. Test Results
- 67 trades over 5 years
- Win rate: 58%
- Avg win: 2.1%, avg loss: 3.2%
- Max drawdown: 18%
- 2019-2021: Profitable
- 2022-2023: Significant losses

#### 4. Breaking Points
- Strategy failed during strong momentum environments (2021 meme stocks)
- Stop losses were hit frequently during continued upward momentum
- Gap-ups that continued higher immediately caused outsized losses
- Small sample size (67 trades) provided low statistical confidence
- Slippage on short entries during high volatility eliminated the thin edge

#### 5. Lessons Learned
- Mean reversion strategies are vulnerable during momentum regimes
- Need a regime filter (e.g., only trade during high VIX or a weak market)
- A 5-year test is insufficient for momentum strategies; need 10+ years
- Edge too small (2% target vs. 3% stop) to survive slippage
- Better approach: wait for an actual pullback, then enter, rather than fading immediately
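Plugging the reported statistics into the standard expectancy formula shows the edge was negative even before friction:

```python
# Expectancy check using the case study's reported numbers.
win_rate = 0.58
avg_win = 2.1    # percent
avg_loss = 3.2   # percent

# expectancy = P(win) * avg_win - P(loss) * avg_loss
expectancy = win_rate * avg_win - (1 - win_rate) * avg_loss
print(f"{expectancy:.3f}% per trade")  # -0.126%: negative before any friction
```

A 58% win rate sounds respectable, but losers averaging 1.5x the size of winners flip the sign of the expectancy, which is consistent with the breaking points listed above.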
## 4. Red Flags Checklist

Use this checklist when evaluating any backtest:

### Data Quality Issues
- [ ] Has survivorship bias been addressed?
- [ ] Are delisted stocks included in the test?
- [ ] Is data alignment correct (no look-ahead bias)?
- [ ] Are corporate actions (splits, dividends) handled correctly?

### Sample Size Concerns
- [ ] At least 100 trades? (Preferably 200+)
- [ ] At least 5 years of data? (Preferably 10+)
- [ ] Includes a full market cycle?
- [ ] Tested across multiple market regimes?

### Parameter Robustness
- [ ] Does the strategy work with nearby parameter values?
- [ ] Are there "plateaus" of stable performance?
- [ ] Minimal parameters (ideally <5)?
- [ ] Parameters based on logical reasoning, not pure optimization?

### Execution Realism
- [ ] Realistic commissions included?
- [ ] Slippage modeled conservatively (1.5-2x typical)?
- [ ] Worst-case fills considered?
- [ ] Order rejection/partial fills addressed?

### Performance Characteristics
- [ ] Positive expectancy in the majority of years?
- [ ] Acceptable performance in all major regimes?
- [ ] No catastrophic drawdowns (>50%)?
- [ ] Edge large enough to survive friction?

### Bias Prevention
- [ ] Strategy defined before testing?
- [ ] Hypothesis has economic logic?
- [ ] Results aren't "too good to be true"?
- [ ] Out-of-sample testing performed?
- [ ] No cherry-picking of examples?

### Tool Limitations
- [ ] Aware of the testing platform's interpolation methods?
- [ ] Understand how the platform handles low-liquidity situations?
- [ ] Know the quirks specific to your data provider?

**If more than 2-3 items aren't checked, the backtest requires additional work before considering live implementation.**
references/methodology.md (new file, 227 lines)
# Backtesting Methodology Reference

## Table of Contents

1. Core Testing Techniques
2. Stress Testing Methods
3. Parameter Sensitivity Analysis
4. Slippage and Friction Modeling
5. Sample Size Guidelines
6. Market Regime Analysis
7. Common Pitfalls and Biases
## 1. Core Testing Techniques

### "Beat Ideas to Death" Approach

**Core principle**: Add friction and punishment to find strategies that break the least, not those that profit the most on paper.

**Key techniques**:
- Multiple stop loss variations
- Different profit targets
- Realistic and exaggerated commissions
- Worst-case fills
- Extended time periods
- Multiple market regimes

### The 80/20 Rule for R&D Time

- 20% generating and codifying ideas
- 80% stress testing and trying to break them
## 2. Stress Testing Methods

### Execution Friction Tests

**Required friction additions**:
- Realistic commissions (actual broker rates)
- Pessimistic slippage (1.5-2x typical)
- Worst-case entry fills (ask + 1-2 ticks)
- Worst-case exit fills (bid - 1-2 ticks)
- Order rejection scenarios
- Partial fills

### Parameter Robustness Tests

Test across multiple configurations:
- Entry timing variations (±15-30 minutes)
- Stop loss distances (50%, 75%, 100%, 125%, 150% of baseline)
- Profit targets (80%, 90%, 100%, 110%, 120% of baseline)
- Position sizing rules
- Filter thresholds

**Goal**: Find "plateau" performance where small parameter changes don't drastically alter results.
### Time-Based Robustness

**Minimum requirements**:
- Test across at least 5-10 years
- Include multiple market regimes:
  - Bull markets
  - Bear markets
  - High volatility periods
  - Low volatility periods
  - Trending markets
  - Range-bound markets

**Year-by-year analysis**: The strategy should show positive expectancy in the majority of years, not rely on 1-2 exceptional years.
## 3. Parameter Sensitivity Analysis

### Heat Map Analysis

Create 2D heat maps varying two parameters simultaneously:
- Profit target (rows) × Stop loss (columns)
- Entry time (rows) × Exit time (columns)
- Volatility filter (rows) × Volume filter (columns)

**Interpretation**:
- Robust strategies show "plateaus" of consistent performance
- Fragile strategies show "spikes" or narrow optimal ranges
- Avoid strategies with performance cliffs at parameter boundaries
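A minimal sketch of building such a grid, assuming a hypothetical `run_backtest(profit_target, stop_loss)` placeholder; the real version would call your backtest engine once per cell and hand the grid to a plotting library:

```python
# Hypothetical profit-target x stop-loss grid. Robust strategies show a
# block of similar values (a plateau); fragile ones a single bright cell.

def run_backtest(profit_target, stop_loss):
    # Placeholder returning a smooth, plateau-like surface for illustration.
    return round(10 - abs(profit_target - 2.0) - abs(stop_loss - 1.5), 2)

targets = [1.5, 2.0, 2.5]  # rows
stops = [1.0, 1.5, 2.0]    # columns

grid = {(pt, sl): run_backtest(pt, sl) for pt in targets for sl in stops}

for pt in targets:
    row = "  ".join(f"{grid[(pt, sl)]:5.2f}" for sl in stops)
    print(f"PT {pt}: {row}")
```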
### Walk-Forward Analysis

1. Optimize parameters on a training period (e.g., Years 1-2)
2. Test with those parameters on a validation period (Year 3)
3. Roll forward and repeat
4. Compare in-sample vs. out-of-sample performance

**Warning signs**:
- Out-of-sample performance <50% of in-sample
- Frequent need to re-optimize parameters
- Parameters that change dramatically between periods
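The four steps above can be sketched as a rolling loop. `optimize` and `evaluate` are hypothetical stand-ins for your own fitting and scoring routines:

```python
# Schematic walk-forward loop over yearly slices.

years = list(range(2015, 2024))

def optimize(train_years):
    # Placeholder: fit parameters on the training window.
    return {"stop": 2.5, "target": 5.0}

def evaluate(params, test_year):
    # Placeholder: out-of-sample annual return (%) for one year.
    return 8.0

results = []
train_len = 2  # e.g., optimize on 2 years, validate on the next 1
for i in range(len(years) - train_len):
    train = years[i:i + train_len]
    test = years[i + train_len]
    params = optimize(train)           # in-sample fit
    results.append((test, evaluate(params, test)))  # out-of-sample score

print(results[0])  # (2017, 8.0): first validation year and its OOS return
```

Comparing the out-of-sample scores in `results` against the in-sample performance of each `optimize` call is what surfaces the warning signs listed above.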
## 4. Slippage and Friction Modeling

### Realistic Slippage Assumptions

**By market capitalization**:
- Mega cap (>$200B): 0.01-0.02%
- Large cap ($10B-$200B): 0.02-0.05%
- Mid cap ($2B-$10B): 0.05-0.10%
- Small cap ($300M-$2B): 0.10-0.20%
- Micro cap (<$300M): 0.20-0.50%+

**By order type**:
- Market orders: Higher slippage
- Limit orders: Lower slippage but potential non-fills
- Stop orders: Significant slippage in volatile conditions

### Conservative Testing Approach

Use 1.5-2x typical slippage estimates for stress testing:
- If typical slippage is 0.05%, test with 0.075-0.10%
- If typical is 0.10%, test with 0.15-0.20%

**Rationale**: Strategies that survive pessimistic assumptions often perform better in practice than in backtests.
## 5. Sample Size Guidelines

### Minimum Trade Requirements

**Statistical significance thresholds**:
- Absolute minimum: 30 trades
- Preferred minimum: 100 trades
- High confidence: 200+ trades

**Why large samples matter**:
- Reduces the impact of outliers
- Provides statistical confidence
- Reveals true edge vs. luck

### Time Period Considerations

**Minimum testing period**: 5 years
**Preferred testing period**: 10+ years

**Must include**:
- At least one full market cycle
- Multiple volatility regimes
- Different Federal Reserve policy environments
## 6. Market Regime Analysis

### Regime Classification

**Volatility-based regimes**:
- Low volatility: VIX <15
- Normal volatility: VIX 15-25
- High volatility: VIX 25-35
- Extreme volatility: VIX >35

**Trend-based regimes**:
- Strong uptrend: Market +10% or more over 6 months
- Moderate uptrend: Market +5% to +10% over 6 months
- Sideways: Market -5% to +5% over 6 months
- Downtrend: Market below -5% over 6 months
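These buckets translate directly into a small classifier. One assumption on our part: the text leaves the boundary values (e.g., VIX exactly 15 or 25) unspecified, so this sketch assigns them to the milder bucket:

```python
# Regime buckets from the classification above.

def volatility_regime(vix):
    if vix < 15:
        return "low"
    if vix <= 25:
        return "normal"
    if vix <= 35:
        return "high"
    return "extreme"

def trend_regime(six_month_return_pct):
    if six_month_return_pct >= 10:
        return "strong uptrend"
    if six_month_return_pct >= 5:
        return "moderate uptrend"
    if six_month_return_pct >= -5:
        return "sideways"
    return "downtrend"

print(volatility_regime(28), trend_regime(-12))  # high downtrend
```

Tagging every backtest trade with these two labels makes the per-regime breakdowns in the next subsection a simple group-by.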
### Performance Requirements by Regime

**Robust strategy characteristics**:
- Positive expectancy in the majority of regimes
- Acceptable (not necessarily best) in all regimes
- No catastrophic failures in any single regime
- Understanding of which regime causes weakness
## 7. Common Pitfalls and Biases

### Survivorship Bias

**Issue**: Testing only on currently-trading stocks ignores delisted and bankrupt companies.

**Solution**: Use survivorship-bias-free datasets that include historical delistings.

### Look-Ahead Bias

**Issue**: Using information not available at the time of the trade.

**Examples**:
- Using end-of-day data for intraday decisions
- Using the next day's open for today's close decisions
- Calculating indicators with future data points

**Prevention**: Strict timestamp control and data alignment checks.

### Curve-Fitting (Over-Optimization)

**Warning signs**:
- Too many parameters (>5-7)
- Highly specific parameter values (e.g., RSI = 37.3)
- Perfect backtest results
- Large performance drop in the validation period

**Prevention techniques**:
- Limit parameters to essential ones only
- Use round numbers when possible
- Require out-of-sample testing
- Analyze parameter sensitivity

### Sample Selection Bias

**Issue**: Testing only on hand-picked examples (e.g., known market leaders).

**Problem**: Ignoring all the stocks that met the criteria but failed creates a false impression of strategy quality.

**Solution**: Test on ALL historical examples meeting the criteria, not just successful outcomes.

### Hindsight Bias

**Issue**: Using outcome knowledge to influence decisions.

**Prevention for systematic trading**:
- Define all rules in advance
- No manual intervention based on hindsight
- Test rules across all cases, not cherry-picked examples
### Data Mining Bias

**Issue**: Testing hundreds of strategies until finding one that "works" by random chance.

**Risk**: With enough attempts, random data will produce seemingly profitable patterns.

**Mitigation**:
- Have a hypothesis before testing
- Require economic logic for the edge
- Use a Bonferroni correction for multiple comparisons
- Demand higher significance thresholds (p < 0.01 instead of p < 0.05)
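The Bonferroni correction mentioned above is simple arithmetic: when screening m candidate strategies, require each individual test to clear alpha/m so the family-wise error rate stays at or below alpha. The 100-strategy figure below is purely illustrative:

```python
# Bonferroni correction for multiple comparisons.

def bonferroni_threshold(alpha, num_tests):
    """Per-test p-value threshold keeping family-wise error rate <= alpha."""
    return alpha / num_tests

# Screening 100 candidate strategies at a 5% family-wise error rate:
print(f"{bonferroni_threshold(0.05, 100):.4g}")  # 0.0005 per individual test
```

A strategy that only reaches p = 0.03 looks significant in isolation, but after testing 100 ideas it falls far short of the corrected 0.0005 bar, which is the point of the mitigation list above.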