Initial commit with translated description
206
SKILL.md
Normal file
@@ -0,0 +1,206 @@
---
name: backtest-expert
description: "Expert guidance for systematic backtesting of trading strategies."
---

# Backtest Expert

Systematic approach to backtesting trading strategies, based on professional methodology that prioritizes robustness over optimistic results.

## Core Philosophy

**Goal**: Find strategies that "break the least", not strategies that "profit the most" on paper.

**Principle**: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.

## When to Use This Skill

Use this skill when:
- Developing or validating systematic trading strategies
- Evaluating whether a trading idea is robust enough for live implementation
- Troubleshooting why a backtest might be misleading
- Learning proper backtesting methodology
- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
- Assessing parameter sensitivity and regime dependence
- Setting realistic expectations for slippage and execution costs

## Backtesting Workflow

### 1. State the Hypothesis

Define the edge in one sentence.

**Example**: "Stocks that gap up >3% on earnings and pull back to the previous day's close within the first hour provide a mean-reversion opportunity."

If you can't articulate the edge clearly, don't proceed to testing.

### 2. Codify Rules with Zero Discretion

Define with complete specificity:
- **Entry**: Exact conditions, timing, price type
- **Exit**: Stop loss, profit target, time-based exit
- **Position sizing**: Fixed dollar amount, % of portfolio, volatility-adjusted
- **Filters**: Market cap, volume, sector, volatility conditions
- **Universe**: What instruments are eligible

**Critical**: No subjective judgment allowed. Every decision must be rule-based and unambiguous.

### 3. Run Initial Backtest

Test over:
- **Minimum 5 years** (preferably 10+)
- **Multiple market regimes** (bull, bear, high/low volatility)
- **Realistic costs**: Commissions + conservative slippage

Examine initial results for basic viability. If the strategy is fundamentally broken, revisit the hypothesis.

### 4. Stress Test the Strategy

This is where 80% of testing time should be spent.

**Parameter sensitivity**:
- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
- Vary entry/exit timing by ±15-30 minutes
- Look for "plateaus" of stable performance, not narrow spikes

**Execution friction**:
- Increase slippage to 1.5-2x typical estimates
- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
- Add realistic order rejection scenarios
- Test with pessimistic commission structures

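The worst-case fill rule above can be sketched in a few lines. This is a minimal illustration, not a platform API: `Quote`, `worst_case_fill`, and `round_trip_pnl` are hypothetical names, and the one-tick penalty and flat per-share commission are assumptions to adapt to your own venue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    bid: float
    ask: float
    tick: float  # minimum price increment (assumed known per instrument)


def worst_case_fill(quote: Quote, side: str, extra_ticks: int = 1) -> float:
    """Pessimistic fill: buys pay above the ask, sells receive below the bid."""
    if side == "buy":
        return quote.ask + extra_ticks * quote.tick
    if side == "sell":
        return quote.bid - extra_ticks * quote.tick
    raise ValueError(f"unknown side: {side!r}")


def round_trip_pnl(entry: Quote, exit_: Quote, shares: int,
                   commission_per_share: float) -> float:
    """Long round trip with worst-case fills and commission on both legs."""
    buy = worst_case_fill(entry, "buy")
    sell = worst_case_fill(exit_, "sell")
    return (sell - buy) * shares - 2 * commission_per_share * shares
```

Comparing this P&L with the mid-price P&L for the same trade shows how much of the paper edge is consumed by friction alone.
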
**Time robustness**:
- Analyze year-by-year performance
- Require positive expectancy in the majority of years
- Ensure the strategy doesn't rely on 1-2 exceptional periods
- Test in different market regimes separately

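A year-by-year check along these lines might look like the sketch below. It assumes trades are available as `(year, return_pct)` pairs; the function names and the "share of total return from the best year" measure are illustrative choices, not part of the skill.

```python
from collections import defaultdict


def yearly_expectancy(trades):
    """Mean return per trade for each year; `trades` holds (year, return_pct) pairs."""
    by_year = defaultdict(list)
    for year, ret in trades:
        by_year[year].append(ret)
    return {year: sum(rets) / len(rets) for year, rets in sorted(by_year.items())}


def time_robustness(trades):
    """Summarize two checks: majority of years positive, reliance on the best year."""
    trades = list(trades)
    per_year = yearly_expectancy(trades)
    positive_years = sum(1 for mean in per_year.values() if mean > 0)

    yearly_totals = defaultdict(float)
    for year, ret in trades:
        yearly_totals[year] += ret
    total = sum(yearly_totals.values())
    # Share of the summed return contributed by the single best year.
    best_share = max(yearly_totals.values()) / total if total > 0 else None

    return {
        "positive_years": positive_years,
        "total_years": len(per_year),
        "majority_positive": positive_years > len(per_year) / 2,
        "best_year_share": best_share,
    }
```

A `best_year_share` close to or above 1.0 means the other years contributed little or lost money, which is the "1-2 exceptional periods" pattern to avoid.
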
**Sample size**:
- Absolute minimum: 30 trades
- Preferred: 100+ trades
- High confidence: 200+ trades

### 5. Out-of-Sample Validation

**Walk-forward analysis**:
1. Optimize on training period (e.g., Year 1-3)
2. Test on validation period (Year 4)
3. Roll forward and repeat
4. Compare in-sample vs out-of-sample performance

**Warning signs**:
- Out-of-sample performance below 50% of in-sample
- Need for frequent parameter re-optimization
- Parameters change dramatically between periods

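The rolling split and the 50% warning sign can be sketched as follows. The function names are hypothetical, the window lengths mirror the Year 1-3 / Year 4 example, and "performance" is whatever single metric you compare (annual return, expectancy, and so on).

```python
def walk_forward_windows(years, train_len=3, test_len=1):
    """Yield (train_years, test_years) pairs, rolling forward by `test_len` each step."""
    start = 0
    while start + train_len + test_len <= len(years):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        yield train, test
        start += test_len


def degraded(in_sample, out_of_sample, min_ratio=0.5):
    """Flag the warning sign: out-of-sample below `min_ratio` of in-sample."""
    if in_sample <= 0:
        return True  # nothing worth preserving in-sample
    return out_of_sample < min_ratio * in_sample


# Example: six years of data give three train/test splits.
for train, test in walk_forward_windows(list(range(2015, 2021))):
    print(train, "->", test)
```

Parameters are optimized only on each `train` slice and then frozen for the matching `test` slice, so every out-of-sample number comes from data the optimizer never saw.
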
### 6. Evaluate Results

**Questions to answer**:
- Does edge survive pessimistic assumptions?
- Is performance stable across parameter variations?
- Does strategy work in multiple market regimes?
- Is sample size sufficient for statistical confidence?
- Are results realistic, not "too good to be true"?

**Decision criteria**:
- ✅ **Deploy**: Survives all stress tests with acceptable performance
- 🔄 **Refine**: Core logic sound but needs parameter adjustment
- ❌ **Abandon**: Fails stress tests or relies on fragile assumptions

## Key Testing Principles

### Punish the Strategy

Add friction everywhere:
- Commissions higher than reality
- Slippage 1.5-2x typical
- Worst-case fills
- Order rejections
- Partial fills

**Rationale**: Strategies that survive pessimistic assumptions often outperform in live trading.

### Seek Plateaus, Not Peaks

Look for parameter ranges where performance is stable, not optimal values that create performance spikes.

**Good**: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
**Bad**: Strategy only works with stop loss at exactly 2.13%

Stable performance indicates genuine edge; narrow optima suggest curve-fitting.

### Test All Cases, Not Cherry-Picked Examples

**Wrong approach**: Study hand-picked "market leaders" that worked
**Right approach**: Test every stock that met criteria, including those that failed

Selective examples create survivorship bias and overestimate strategy quality.

### Separate Idea Generation from Validation

**Intuition**: Useful for generating hypotheses
**Validation**: Must be purely data-driven

Never let attachment to an idea influence interpretation of test results.

## Common Failure Patterns

Recognize these patterns early to save time:

1. **Parameter sensitivity**: Only works with exact parameter values
2. **Regime-specific**: Great in some years, terrible in others
3. **Slippage sensitivity**: Unprofitable when realistic costs added
4. **Small sample**: Too few trades for statistical confidence
5. **Look-ahead bias**: "Too good to be true" results
6. **Over-optimization**: Many parameters, poor out-of-sample results

See `references/failed_tests.md` for detailed examples and diagnostic framework.

## Available Reference Documentation

### Methodology Reference
**File**: `references/methodology.md`

**When to read**: For detailed guidance on specific testing techniques.

**Contents**:
- Stress testing methods
- Parameter sensitivity analysis
- Slippage and friction modeling
- Sample size requirements
- Market regime classification
- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)

### Failed Tests Reference
**File**: `references/failed_tests.md`

**When to read**: When strategy fails tests, or learning from past mistakes.

**Contents**:
- Why failures are valuable
- Common failure patterns with examples
- Case study documentation framework
- Red flags checklist for evaluating backtests

## Critical Reminders

**Time allocation**: Spend 20% generating ideas, 80% trying to break them.

**Context-free requirement**: If a strategy requires "perfect context" to work, it's not robust enough for systematic trading.

**Red flag**: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.

**Tool limitations**: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).

**Statistical significance**: Small edges require large sample sizes to prove. A 5% edge per trade needs 100+ trades to distinguish it from luck.

## Discretionary vs Systematic Differences

This skill focuses on **systematic/quantitative** backtesting where:
- All rules are codified in advance
- No discretion or "feel" in execution
- Testing happens on all historical examples, not cherry-picked cases
- Context (news, macro) is deliberately stripped out

Discretionary traders study setups differently; this skill may not apply to setups requiring subjective judgment.
6
_meta.json
Normal file
@@ -0,0 +1,6 @@
{
  "ownerId": "kn7agf701n3afzzbq8ge0wa8k1809wm4",
  "slug": "backtest-expert",
  "version": "0.1.0",
  "publishedAt": 1769870095738
}
236
references/failed_tests.md
Normal file
@@ -0,0 +1,236 @@
# Learning from Failed Backtests

## Table of Contents

1. Why Failed Ideas Are Valuable
2. Common Failure Patterns
3. Case Study Framework
4. Red Flags Checklist

## 1. Why Failed Ideas Are Valuable

### The Value of Failures

**Key insights**:
- Failed tests save capital by preventing live implementation
- Failure patterns reveal which assumptions don't hold
- Understanding what doesn't work narrows the search space
- Failed tests build experience in recognizing fragile strategies

### Documentation Discipline

**Record for each failed idea**:
- The hypothesis being tested
- Why you thought it would work
- What the data showed
- Specific breaking points
- Lessons learned

**Purpose**: Build a library of "anti-patterns" to avoid repeating mistakes.

## 2. Common Failure Patterns

### Pattern 1: Parameter Sensitivity

**Symptom**: Strategy only works with very specific parameter values.

**Example scenario**:
- Strategy profitable with stop loss at exactly 2.5%
- Increasing to 3% or decreasing to 2% causes significant performance drop
- No "plateau" of stable performance

**Why it fails**: Real markets have noise; if small changes break the strategy, it likely captured noise, not signal.

**Lesson**: Seek strategies with stable performance across parameter ranges.

### Pattern 2: Regime-Specific Performance

**Symptom**: Strategy works brilliantly in some years, terribly in others.

**Example scenario**:
- Great performance in 2017-2019 (low volatility bull market)
- Catastrophic losses in 2020 (high volatility)
- Poor performance in 2022 (downtrend)

**Why it fails**: Strategy dependent on specific market conditions, not robust enough for diverse environments.

**Lesson**: Require acceptable (not necessarily best) performance across all regimes.

### Pattern 3: Slippage Sensitivity

**Symptom**: Strategy becomes unprofitable when realistic trading costs are added.

**Example scenario**:
- Backtest shows 0.5% average gain per trade
- Adding 0.1% slippage per side (0.2% round-trip) removes 40% of that gain; at the 2x stress level (0.4% round-trip) only 0.1% remains before commissions
- Strategy requires unrealistic fills to be profitable

**Why it fails**: Edge too small to survive real-world friction.

**Lesson**: Edge must be large enough to survive pessimistic assumptions about costs.

### Pattern 4: Sample Size Issues

**Symptom**: Strong results based on small number of trades.

**Example scenario**:
- Backtest shows 80% win rate
- Only 15 total trades in 5 years
- A few different outcomes would dramatically change results

**Why it fails**: Insufficient data to distinguish edge from luck.

**Lesson**: Require minimum 100 trades for meaningful conclusions, preferably 200+.

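One way to make "a few different outcomes would dramatically change results" concrete is to put a confidence interval around the win rate. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not something this reference prescribes; reading "80% of 15 trades" as 12 wins is also an assumption.

```python
import math


def wilson_interval(wins, trades, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if trades <= 0:
        raise ValueError("trades must be positive")
    p = wins / trades
    denom = 1 + z * z / trades
    center = (p + z * z / (2 * trades)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trades + z * z / (4 * trades * trades)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(wins=12, trades=15)
print(f"win rate 80% on 15 trades: 95% interval {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from roughly 55% to 93%, so the same 15 trades are consistent with anything from a marginal strategy to an exceptional one.
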
### Pattern 5: Look-Ahead Bias

**Symptom**: Perfect or near-perfect backtest results.

**Example scenario**:
- Strategy shows 95%+ win rate
- Unrealistically good entry/exit timing
- Performance too good to be realistic

**Why it fails**: Likely using information not available at time of trade.

**Lesson**: Be suspicious of "too good to be true" results; audit data alignment carefully.

### Pattern 6: Over-Optimization (Curve Fitting)

**Symptom**: Complex strategy with many parameters shows excellent in-sample results but poor out-of-sample.

**Example scenario**:
- Strategy uses 8-10 different indicators with specific thresholds
- In-sample performance: 40% annual return
- Out-of-sample performance: -5% annual return
- Parameters needed constant re-optimization

**Why it fails**: Fitted to historical noise rather than genuine market structure.

**Lesson**: Prefer simple strategies with fewer parameters; demand strong out-of-sample results.

## 3. Case Study Framework

### Template for Documenting Failed Ideas

Use this framework when a backtest fails:

#### 1. Initial Hypothesis
- **What edge were you trying to capture?**
- **Why did you think this would work?**
- **What was the logical basis?**

#### 2. Implementation Details
- **Entry rules** (specific and complete)
- **Exit rules** (stop loss, profit target, time-based)
- **Position sizing**
- **Filters or conditions**

#### 3. Test Results
- **Basic metrics**:
  - Total trades
  - Win rate
  - Average win/loss
  - Max drawdown
  - Annual returns by year

- **Parameter sensitivity**:
  - How results changed with parameter variations
  - Whether "plateau" of stable performance existed

- **Regime analysis**:
  - Performance in different market conditions
  - Which regimes caused problems

#### 4. Breaking Points
- **What specifically caused the strategy to fail?**
  - Slippage too high?
  - Parameter sensitivity?
  - Regime-specific?
  - Insufficient sample size?

#### 5. Lessons Learned
- **What assumptions were wrong?**
- **What would you test differently next time?**
- **Are there salvageable elements?**

### Example: Failed Momentum Reversal Strategy

#### 1. Initial Hypothesis
Tried to capture mean reversion after strong momentum moves. Hypothesis: Stocks that gap up 5%+ on earnings often pull back 2-3% before continuing, providing a short-term reversal opportunity.

#### 2. Implementation
- Entry: Short when stock gaps up 5%+ on earnings at market open
- Exit: Cover at 2% profit or 3% stop loss
- Holding period: Maximum 3 days
- Filters: Market cap >$2B, average volume >500K shares

#### 3. Test Results
- 67 trades over 5 years
- Win rate: 58%
- Avg win: 2.1%, Avg loss: 3.2%
- Max drawdown: 18%
- 2019-2021: Profitable
- 2022-2023: Significant losses

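Taking the reported figures at face value, per-trade expectancy and the breakeven win rate follow directly from the win rate and the average win and loss. The helper names below are illustrative only.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected return per trade, in the same units as avg_win / avg_loss."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss


def breakeven_win_rate(avg_win, avg_loss):
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# Figures from the test results above: 58% wins, +2.1% average win, -3.2% average loss.
print(f"expectancy per trade: {expectancy(0.58, 2.1, 3.2):+.3f}%")
print(f"breakeven win rate:   {breakeven_win_rate(2.1, 3.2):.1%}")
```

With these inputs the expectancy is about -0.13% per trade before any slippage, and the breakeven win rate of about 60% sits above the observed 58%.
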
#### 4. Breaking Points
- Strategy failed during strong momentum environments (2021 meme stocks)
- Stop losses hit frequently during continued upward momentum
- Gap-ups that continued higher immediately caused outsized losses
- Small sample size (67 trades) provided low statistical confidence
- Slippage on short entries during high volatility eliminated thin edge

#### 5. Lessons Learned
- Mean reversion strategies vulnerable during momentum regimes
- Need regime filter (e.g., only trade during high VIX or weak market)
- 5-year test insufficient for momentum strategies; need 10+ years
- Edge too small (2% target vs 3% stop) to survive slippage
- Better approach: Wait for actual pullback, then enter, rather than fade immediately

## 4. Red Flags Checklist

Use this checklist when evaluating any backtest:

### Data Quality Issues
- [ ] Has survivorship bias been addressed?
- [ ] Are delisted stocks included in test?
- [ ] Is data alignment correct (no look-ahead bias)?
- [ ] Are corporate actions (splits, dividends) handled correctly?

### Sample Size Concerns
- [ ] At least 100 trades? (Preferably 200+)
- [ ] At least 5 years of data? (Preferably 10+)
- [ ] Includes full market cycle?
- [ ] Tested across multiple market regimes?

### Parameter Robustness
- [ ] Does strategy work with nearby parameter values?
- [ ] Are there "plateaus" of stable performance?
- [ ] Minimal parameters (ideally <5)?
- [ ] Parameters based on logical reasoning, not pure optimization?

### Execution Realism
- [ ] Realistic commissions included?
- [ ] Slippage modeled conservatively (1.5-2x typical)?
- [ ] Worst-case fills considered?
- [ ] Order rejection/partial fills addressed?

### Performance Characteristics
- [ ] Positive expectancy in majority of years?
- [ ] Acceptable performance in all major regimes?
- [ ] No catastrophic drawdowns (>50%)?
- [ ] Edge large enough to survive friction?

### Bias Prevention
- [ ] Strategy defined before testing?
- [ ] Hypothesis has economic logic?
- [ ] Results aren't "too good to be true"?
- [ ] Out-of-sample testing performed?
- [ ] No cherry-picking of examples?

### Tool Limitations
- [ ] Aware of testing platform's interpolation methods?
- [ ] Understand how platform handles low-liquidity situations?
- [ ] Know quirks specific to data provider?

**If more than two or three items remain unchecked, the backtest requires additional work before considering live implementation.**
227
references/methodology.md
Normal file
@@ -0,0 +1,227 @@
# Backtesting Methodology Reference

## Table of Contents

1. Core Testing Techniques
2. Stress Testing Methods
3. Parameter Sensitivity Analysis
4. Slippage and Friction Modeling
5. Sample Size Guidelines
6. Market Regime Analysis
7. Common Pitfalls and Biases

## 1. Core Testing Techniques

### "Beat Ideas to Death" Approach

**Core principle**: Add friction and punishment to find strategies that break the least, not those that profit the most on paper.

**Key techniques**:
- Multiple stop loss variations
- Different profit targets
- Realistic + exaggerated commissions
- Worst-case fills
- Extended time periods
- Multiple market regimes

### The 80/20 Rule for R&D Time

- 20% generating and codifying ideas
- 80% stress testing and trying to break them

## 2. Stress Testing Methods

### Execution Friction Tests

**Required friction additions**:
- Realistic commissions (actual broker rates)
- Pessimistic slippage (1.5-2x typical)
- Worst-case entry fills (ask + 1-2 ticks)
- Worst-case exit fills (bid - 1-2 ticks)
- Order rejection scenarios
- Partial fills

### Parameter Robustness Tests

Test across multiple configurations:
- Entry timing variations (±15-30 minutes)
- Stop loss distances (50%, 75%, 100%, 125%, 150% of baseline)
- Profit targets (80%, 90%, 100%, 110%, 120% of baseline)
- Position sizing rules
- Filter thresholds

**Goal**: Find "plateau" performance where small parameter changes don't drastically alter results.

### Time-Based Robustness

**Minimum requirements**:
- Test across at least 5 years (preferably 10+)
- Include multiple market regimes:
  - Bull markets
  - Bear markets
  - High volatility periods
  - Low volatility periods
  - Trending markets
  - Range-bound markets

**Year-by-year analysis**: Strategy should show positive expectancy in the majority of years, not rely on 1-2 exceptional years.

## 3. Parameter Sensitivity Analysis

### Heat Map Analysis

Create 2D heat maps varying two parameters simultaneously:
- Profit target (rows) × Stop loss (columns)
- Entry time (rows) × Exit time (columns)
- Volatility filter (rows) × Volume filter (columns)

**Interpretation**:
- Robust strategies show "plateaus" of consistent performance
- Fragile strategies show "spikes" or narrow optimal ranges
- Avoid strategies with performance cliffs at parameter boundaries

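The plateau-versus-spike reading of a heat map can also be checked numerically. The sketch below is one possible heuristic, not an established test: a grid cell counts as part of a plateau when it is profitable and every adjacent cell retains at least `min_ratio` of its performance. The 0.7 threshold and the sample grids are arbitrary, hypothetical values.

```python
def neighbors(grid, row, col):
    """Yield the values of the up-to-8 cells adjacent to (row, col)."""
    rows, cols = len(grid), len(grid[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr or dc) and 0 <= r < rows and 0 <= c < cols:
                yield grid[r][c]


def plateau_cells(grid, min_ratio=0.7):
    """Cells that are profitable and whose neighbors keep >= min_ratio of their value."""
    cells = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > 0 and all(n >= min_ratio * value for n in neighbors(grid, r, c)):
                cells.append((r, c))
    return cells


# Rows: profit target settings, columns: stop loss settings (hypothetical results).
spike = [[0.1, 0.1, 0.1],
         [0.1, 2.0, 0.1],
         [0.1, 0.1, 0.1]]
plateau = [[1.0, 1.1, 1.0],
           [1.1, 1.2, 1.1],
           [1.0, 1.1, 1.0]]
print((1, 1) in plateau_cells(spike))    # the isolated peak is rejected
print((1, 1) in plateau_cells(plateau))  # the stable center is accepted
```

Choosing parameters from the middle of a plateau region, and not from the single best cell, is the numeric counterpart of preferring plateaus over spikes.
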
### Walk-Forward Analysis

1. Optimize parameters on training period (e.g., Year 1-2)
2. Test with those parameters on validation period (Year 3)
3. Roll forward and repeat
4. Compare in-sample vs out-of-sample performance

**Warning signs**:
- Out-of-sample performance <50% of in-sample
- Frequent need to re-optimize parameters
- Parameters that change dramatically between periods

## 4. Slippage and Friction Modeling

### Realistic Slippage Assumptions

**By market capitalization**:
- Mega cap (>$200B): 0.01-0.02%
- Large cap ($10B-$200B): 0.02-0.05%
- Mid cap ($2B-$10B): 0.05-0.10%
- Small cap ($300M-$2B): 0.10-0.20%
- Micro cap (<$300M): 0.20-0.50%+

**By order type**:
- Market orders: Higher slippage
- Limit orders: Lower slippage but potential non-fills
- Stop orders: Significant slippage in volatile conditions

### Conservative Testing Approach

Use 1.5-2x typical slippage estimates for stress testing:
- If typical slippage is 0.05%, test with 0.075-0.10%
- If typical is 0.10%, test with 0.15-0.20%

**Rationale**: Strategies that survive pessimistic assumptions often perform better in practice than in backtests.

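The market-cap table and the 1.5-2x stress rule translate into a small lookup. This is a sketch under stated assumptions: the names are hypothetical, a cap that sits exactly on a boundary is assigned to the larger-cap band, and the micro-cap upper figure is treated as a floor because the table lists it as "0.50%+".

```python
# (market-cap floor in USD, typical slippage low %, typical slippage high %)
TYPICAL_SLIPPAGE_PCT = [
    (200e9, 0.01, 0.02),  # mega cap
    (10e9, 0.02, 0.05),   # large cap
    (2e9, 0.05, 0.10),    # mid cap
    (300e6, 0.10, 0.20),  # small cap
    (0.0, 0.20, 0.50),    # micro cap (upper figure is open-ended in the table)
]


def typical_slippage_pct(market_cap_usd):
    """Return the (low, high) typical slippage band, in percent, for a market cap."""
    if market_cap_usd < 0:
        raise ValueError("market cap must be non-negative")
    for floor, low, high in TYPICAL_SLIPPAGE_PCT:
        if market_cap_usd >= floor:
            return low, high


def stress_range(typical_pct, low_mult=1.5, high_mult=2.0):
    """Pessimistic slippage range to test with, given a typical estimate."""
    return typical_pct * low_mult, typical_pct * high_mult


low, high = typical_slippage_pct(5e9)  # a mid-cap name
print("typical band:", low, high)
print("stress range on the upper figure:", stress_range(high))
```
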
## 5. Sample Size Guidelines

### Minimum Trade Requirements

**Statistical significance thresholds**:
- Absolute minimum: 30 trades
- Preferred minimum: 100 trades
- High confidence: 200+ trades

**Why large samples matter**:
- Reduces impact of outliers
- Provides statistical confidence
- Reveals true edge vs luck

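A rough way to see where such thresholds come from is the normal-approximation estimate n ≥ (z·σ/μ)², where μ is the mean return per trade and σ its standard deviation. The sketch below assumes independent trades and a simple z-test on the mean; the function name and the example inputs are hypothetical, and real trade returns often violate these assumptions.

```python
import math


def trades_needed(mean_edge, stdev, z=1.96):
    """Approximate trades needed for the mean edge to sit z standard errors above zero."""
    if mean_edge <= 0 or stdev <= 0:
        raise ValueError("mean_edge and stdev must be positive")
    return math.ceil((z * stdev / mean_edge) ** 2)


# Hypothetical strategy: +0.5% mean return per trade, 3% standard deviation.
print(trades_needed(mean_edge=0.5, stdev=3.0))  # 139
```

For this hypothetical strategy the estimate is 139 trades, which falls between the preferred minimum and the high-confidence threshold above; halving the edge quadruples the requirement.
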
### Time Period Considerations

**Minimum testing period**: 5 years
**Preferred testing period**: 10+ years

**Must include**:
- At least one full market cycle
- Multiple volatility regimes
- Different Federal Reserve policy environments

## 6. Market Regime Analysis

### Regime Classification

**Volatility-based regimes**:
- Low volatility: VIX <15
- Normal volatility: VIX 15-25
- High volatility: VIX 25-35
- Extreme volatility: VIX >35

**Trend-based regimes**:
- Strong uptrend: Market +10%+ over 6 months
- Moderate uptrend: Market +5% to +10% over 6 months
- Sideways: Market -5% to +5% over 6 months
- Downtrend: Market <-5% over 6 months

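The thresholds above map directly onto two small classifiers. The bands overlap at their edges (for example, VIX 25 appears in both "normal" and "high"), so the boundary handling below is an assumption: shared edges go to the higher band, except VIX 35, which stays "high" because "extreme" is defined as strictly above 35. Function and label names are illustrative.

```python
def volatility_regime(vix):
    """Classify a VIX level using the volatility bands above."""
    if vix < 15:
        return "low"
    if vix < 25:
        return "normal"
    if vix <= 35:
        return "high"
    return "extreme"


def trend_regime(six_month_return_pct):
    """Classify a trailing 6-month market return (in percent) using the trend bands above."""
    if six_month_return_pct >= 10:
        return "strong_uptrend"
    if six_month_return_pct >= 5:
        return "moderate_uptrend"
    if six_month_return_pct >= -5:
        return "sideways"
    return "downtrend"


print(volatility_regime(18.2), trend_regime(7.5))
```

Tagging each trade with both labels at entry time makes it straightforward to group results by regime and compare expectancy across the groups.
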
### Performance Requirements by Regime

**Robust strategy characteristics**:
- Positive expectancy in majority of regimes
- Acceptable (not necessarily best) in all regimes
- No catastrophic failures in any single regime
- Understanding of which regime causes weakness

## 7. Common Pitfalls and Biases

### Survivorship Bias

**Issue**: Testing only on currently-trading stocks ignores delisted/bankrupt companies.

**Solution**: Use survivorship-bias-free datasets that include historical delistings.

### Look-Ahead Bias

**Issue**: Using information not available at the time of trade.

**Examples**:
- Using EOD data for intraday decisions
- Using the next day's open for decisions made at today's close
- Calculating indicators with future data points

**Prevention**: Strict timestamp control and data alignment checks.

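A common alignment safeguard is to lag every signal by one bar, so that a decision acted on at bar t uses only data through bar t-1. The sketch below illustrates this with a trailing moving average; the "close above average" rule and the function names are hypothetical and stand in for whatever indicator the strategy uses.

```python
def sma(values, window):
    """Trailing simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def lagged_signals(closes, window=3):
    """Signal usable at bar t, computed only from closes up to bar t-1."""
    if not closes:
        return []
    avg = sma(closes, window)
    raw = [a is not None and c > a for c, a in zip(closes, avg)]
    return [False] + raw[:-1]  # shift by one bar so no same-bar close leaks in


closes = [10, 11, 12, 13, 12, 11]
print(lagged_signals(closes))
```

A quick self-check for look-ahead is to change the most recent close and confirm that no signal at or before that bar changes.
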
### Curve-Fitting (Over-Optimization)

**Warning signs**:
- Too many parameters (>5-7)
- Highly specific parameter values (e.g., RSI = 37.3)
- Perfect backtest results
- Large performance drop in validation period

**Prevention techniques**:
- Limit parameters to essential ones only
- Use round numbers when possible
- Require out-of-sample testing
- Analyze parameter sensitivity

### Sample Selection Bias

**Issue**: Testing only on hand-picked examples (e.g., known market leaders).

**Problem**: Ignoring all stocks that met criteria but failed creates false impression of strategy quality.

**Solution**: Test on ALL historical examples meeting the criteria, not just successful outcomes.

### Hindsight Bias

**Issue**: Using outcome knowledge to influence decisions.

**Prevention for systematic trading**:
- Define all rules in advance
- No manual intervention based on hindsight
- Test rules across all cases, not cherry-picked examples

### Data Mining Bias

**Issue**: Testing hundreds of strategies until finding one that "works" by random chance.

**Risk**: With enough attempts, random data will produce seemingly profitable patterns.

**Mitigation**:
- Have hypothesis before testing
- Require economic logic for the edge
- Use Bonferroni correction for multiple comparisons
- Demand higher significance thresholds (p < 0.01 instead of p < 0.05)

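The Bonferroni correction divides the significance level by the number of strategies tested, so each one must clear a stricter bar. A minimal sketch, with hypothetical strategy names and p-values:

```python
def bonferroni_alpha(alpha, num_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


def survivors(p_values, alpha=0.05):
    """Names of strategies whose p-value clears the corrected threshold."""
    if not p_values:
        return []
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [name for name, p in p_values.items() if p < threshold]


# Five strategies tested: the corrected threshold is 0.05 / 5 = 0.01.
tested = {"gap_fade": 0.04, "breakout": 0.004, "pairs": 0.2, "drift": 0.012, "carry": 0.6}
print(survivors(tested))  # ['breakout']
```

Note that `num_tests` should count every variant that was tried, including the ones discarded along the way, not only those kept for the final comparison.
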