---
name: backtest-expert
description: "Expert guidance for systematic backtesting of trading strategies."
---
# Backtest Expert
Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.
## Core Philosophy
**Goal**: Find strategies that "break the least", not strategies that "profit the most" on paper.
**Principle**: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.
## When to Use This Skill
Use this skill when:
- Developing or validating systematic trading strategies
- Evaluating whether a trading idea is robust enough for live implementation
- Troubleshooting why a backtest might be misleading
- Learning proper backtesting methodology
- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
- Assessing parameter sensitivity and regime dependence
- Setting realistic expectations for slippage and execution costs
## Backtesting Workflow
### 1. State the Hypothesis
Define the edge in one sentence.
**Example**: "Stocks that gap up >3% on earnings and pull back to previous day's close within first hour provide mean-reversion opportunity."
If you can't articulate the edge clearly, don't proceed to testing.
### 2. Codify Rules with Zero Discretion
Define with complete specificity:
- **Entry**: Exact conditions, timing, price type
- **Exit**: Stop loss, profit target, time-based exit
- **Position sizing**: Fixed dollar amount, % of portfolio, or volatility-adjusted
- **Filters**: Market cap, volume, sector, volatility conditions
- **Universe**: What instruments are eligible
**Critical**: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
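One way to enforce zero discretion is to hold every rule in an immutable structure that the backtest reads from. The sketch below is illustrative only: the `StrategyRules` class, its field names, and the risk-based sizing formula are assumptions, not part of this skill's reference material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyRules:
    """Hypothetical rule set: every field is fixed before testing starts."""
    min_gap_pct: float         # entry: minimum earnings gap, in percent
    entry_window_minutes: int  # entry: pullback must occur this soon after the open
    stop_loss_pct: float       # exit: stop distance below entry, in percent
    profit_target_pct: float   # exit: target distance above entry, in percent
    max_hold_minutes: int      # exit: time-based exit
    risk_per_trade_pct: float  # sizing: percent of equity lost if the stop is hit
    min_avg_volume: int        # filter/universe: minimum average daily volume


def position_size(rules: StrategyRules, equity: float, entry_price: float) -> int:
    """Whole shares such that a stop-out loses risk_per_trade_pct of equity."""
    risk_dollars = equity * rules.risk_per_trade_pct / 100
    stop_distance = entry_price * rules.stop_loss_pct / 100
    return int(risk_dollars // stop_distance)


# Example values are made up for illustration.
rules = StrategyRules(3.0, 60, 2.0, 4.0, 240, 0.5, 1_000_000)
shares = position_size(rules, equity=100_000, entry_price=50.0)
```

Because the dataclass is frozen, a rule cannot be changed partway through a run; a new variant requires constructing a new `StrategyRules` instance.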
### 3. Run Initial Backtest
Test over:
- **Minimum 5 years** (preferably 10+)
- **Multiple market regimes** (bull, bear, high/low volatility)
- **Realistic costs**: Commissions + conservative slippage
Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.
### 4. Stress Test the Strategy
This is where 80% of testing time should be spent.
**Parameter sensitivity**:
- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
- Vary entry/exit timing by ±15-30 minutes
- Look for "plateaus" of stable performance, not narrow spikes
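The stop-loss and profit-target multipliers above can be expanded into a full test grid. This is a minimal sketch; the function name `sensitivity_grid` and the baseline values are hypothetical.

```python
from itertools import product


def sensitivity_grid(base_stop, base_target,
                     stop_mults=(0.5, 0.75, 1.0, 1.25, 1.5),
                     target_mults=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Return every (stop, target) pair to backtest around the baseline."""
    return [(round(base_stop * s, 4), round(base_target * t, 4))
            for s, t in product(stop_mults, target_mults)]


# Baseline of a 2% stop and 4% target is an illustrative assumption.
grid = sensitivity_grid(base_stop=2.0, base_target=4.0)
```

Each pair in `grid` would be run through the same backtest, and the resulting metrics compared across neighbouring cells rather than ranked by the single best cell.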
**Execution friction**:
- Increase slippage to 1.5-2x typical estimates
- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
- Add realistic order rejection scenarios
- Test with pessimistic commission structures
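The worst-case fill rule (buy at ask plus one tick, sell at bid minus one tick) can be sketched as follows. The function names, the default tick size of 0.01, and the quote values are assumptions for illustration.

```python
def worst_case_fill(side, bid, ask, tick=0.01, extra_ticks=1):
    """Pessimistic fill price: cross the spread and give up extra ticks."""
    if side == "buy":
        return round(ask + extra_ticks * tick, 10)
    if side == "sell":
        return round(bid - extra_ticks * tick, 10)
    raise ValueError(f"unknown side: {side!r}")


def round_trip_pnl(entry_bid, entry_ask, exit_bid, exit_ask, shares, tick=0.01):
    """Long round trip P&L with worst-case fills on both legs (before commissions)."""
    buy = worst_case_fill("buy", entry_bid, entry_ask, tick)
    sell = worst_case_fill("sell", exit_bid, exit_ask, tick)
    return (sell - buy) * shares


# Quotes are made up: enter at 10.00 x 10.02, exit at 10.50 x 10.52.
pnl = round_trip_pnl(10.00, 10.02, 10.50, 10.52, shares=100)
```

Raising `extra_ticks` is one simple way to approximate the 1.5-2x slippage stress described above.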
**Time robustness**:
- Analyze year-by-year performance
- Require positive expectancy in majority of years
- Ensure strategy doesn't rely on 1-2 exceptional periods
- Test in different market regimes separately
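A year-by-year breakdown can be computed directly from a trade log. The sketch below assumes trades are available as `(year, pnl)` tuples; that format and the function names are hypothetical.

```python
from collections import defaultdict


def expectancy_by_year(trades):
    """Map each year to its average P&L per trade."""
    buckets = defaultdict(list)
    for year, pnl in trades:
        buckets[year].append(pnl)
    return {year: sum(p) / len(p) for year, p in sorted(buckets.items())}


def majority_positive(by_year):
    """True if more than half of the years have positive expectancy."""
    positive = sum(1 for v in by_year.values() if v > 0)
    return positive > len(by_year) / 2


# Made-up trade log for illustration.
trades = [(2019, 120), (2019, -40), (2020, -30), (2020, -10),
          (2021, 60), (2021, 20), (2022, 15), (2022, 5)]
by_year = expectancy_by_year(trades)
ok = majority_positive(by_year)
```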
**Sample size**:
- Absolute minimum: 30 trades
- Preferred: 100+ trades
- High confidence: 200+ trades
### 5. Out-of-Sample Validation
**Walk-forward analysis**:
1. Optimize on training period (e.g., Year 1-3)
2. Test on validation period (Year 4)
3. Roll forward and repeat
4. Compare in-sample vs out-of-sample performance
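The rolling train/validate split can be generated mechanically. This is a minimal sketch, assuming whole calendar years as the unit and a hypothetical helper named `walk_forward_windows`.

```python
def walk_forward_windows(years, train_len=3, test_len=1):
    """Return (train_years, test_years) pairs, rolling forward by test_len."""
    windows = []
    start = 0
    while start + train_len + test_len <= len(years):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        windows.append((train, test))
        start += test_len
    return windows


# Six years of data yield three train/test windows.
windows = walk_forward_windows(list(range(2015, 2021)))
```

Parameters are optimized only on each `train` slice and then frozen before being scored on the matching `test` slice.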
**Warning signs**:
- Out-of-sample performance below 50% of in-sample performance
- Need frequent parameter re-optimization
- Parameters change dramatically between periods
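The first warning sign can be checked per walk-forward window by comparing the same metric in and out of sample. The sketch below is an assumption about how to operationalize it; the function names and metric values are hypothetical, and the metric could be Sharpe ratio, expectancy, or any other consistent measure.

```python
def oos_efficiency(in_sample_metric, out_of_sample_metric):
    """Ratio of out-of-sample to in-sample performance for one window."""
    if in_sample_metric <= 0:
        raise ValueError("in-sample metric must be positive to form a ratio")
    return out_of_sample_metric / in_sample_metric


def degradation_flags(pairs, threshold=0.5):
    """True for each window whose out-of-sample result falls below the threshold."""
    return [oos_efficiency(i, o) < threshold for i, o in pairs]


# Made-up (in-sample, out-of-sample) metric pairs for three windows.
flags = degradation_flags([(1.8, 1.2), (2.0, 0.7), (1.5, 0.9)])
```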
### 6. Evaluate Results
**Questions to answer**:
- Does edge survive pessimistic assumptions?
- Is performance stable across parameter variations?
- Does strategy work in multiple market regimes?
- Is sample size sufficient for statistical confidence?
- Are results realistic, not "too good to be true"?
**Decision criteria**:
- **Deploy**: Survives all stress tests with acceptable performance
- 🔄 **Refine**: Core logic sound but needs parameter adjustment
- **Abandon**: Fails stress tests or relies on fragile assumptions
## Key Testing Principles
### Punish the Strategy
Add friction everywhere:
- Commissions higher than reality
- Slippage 1.5-2x typical
- Worst-case fills
- Order rejections
- Partial fills
**Rationale**: Strategies that survive pessimistic assumptions often perform better in live trading than their backtest suggests.
### Seek Plateaus, Not Peaks
Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
**Good**: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
**Bad**: Strategy only works with stop loss at exactly 2.13%
Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
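One simple way to tell a plateau from a spike is to find the longest run of adjacent parameter values that stay profitable. This is a sketch under assumptions: `longest_plateau` is a hypothetical helper, and the sweep results are made up to mirror the good and bad cases above.

```python
def longest_plateau(results, min_metric=0.0):
    """results: (param, metric) pairs sorted by param.

    Returns (start_param, end_param, length) of the longest run of
    consecutive parameter values whose metric exceeds min_metric.
    """
    best = (None, None, 0)
    run_start, run_len = None, 0
    for param, metric in results:
        if metric > min_metric:
            if run_len == 0:
                run_start = param
            run_len += 1
            if run_len > best[2]:
                best = (run_start, param, run_len)
        else:
            run_len = 0
    return best


# Stop-loss sweep with a broad profitable band (illustrative numbers).
stable = longest_plateau([(1.0, -0.2), (1.5, 0.4), (2.0, 0.5),
                          (2.5, 0.45), (3.0, 0.3), (3.5, -0.1)])
# Sweep where only one exact value works.
spike = longest_plateau([(2.0, -0.3), (2.13, 1.2), (2.25, -0.4)])
```

A run length of one, as in `spike`, is the pattern that suggests curve-fitting.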
### Test All Cases, Not Cherry-Picked Examples
**Wrong approach**: Study hand-picked "market leaders" that worked
**Right approach**: Test every stock that met criteria, including those that failed
Selective examples create survivorship bias and overestimate strategy quality.
### Separate Idea Generation from Validation
**Intuition**: Useful for generating hypotheses
**Validation**: Must be purely data-driven
Never let attachment to an idea influence interpretation of test results.
## Common Failure Patterns
Recognize these patterns early to save time:
1. **Parameter sensitivity**: Only works with exact parameter values
2. **Regime-specific**: Great in some years, terrible in others
3. **Slippage sensitivity**: Unprofitable when realistic costs added
4. **Small sample**: Too few trades for statistical confidence
5. **Look-ahead bias**: "Too good to be true" results
6. **Over-optimization**: Many parameters, poor out-of-sample results
See `references/failed_tests.md` for detailed examples and diagnostic framework.
## Available Reference Documentation
### Methodology Reference
**File**: `references/methodology.md`
**When to read**: For detailed guidance on specific testing techniques.
**Contents**:
- Stress testing methods
- Parameter sensitivity analysis
- Slippage and friction modeling
- Sample size requirements
- Market regime classification
- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)
### Failed Tests Reference
**File**: `references/failed_tests.md`
**When to read**: When strategy fails tests, or learning from past mistakes.
**Contents**:
- Why failures are valuable
- Common failure patterns with examples
- Case study documentation framework
- Red flags checklist for evaluating backtests
## Critical Reminders
**Time allocation**: Spend 20% generating ideas, 80% trying to break them.
**Context-free requirement**: If strategy requires "perfect context" to work, it's not robust enough for systematic trading.
**Red flag**: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
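A quick screen for the red flag above can be run on the per-trade P&L list. The helper names below are hypothetical, and only the 90% win-rate limit comes from this document; what counts as a "minimal" drawdown is left to the reader.

```python
def win_rate(pnls):
    """Fraction of trades with positive P&L."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def max_drawdown(pnls):
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def needs_audit(pnls, win_rate_limit=0.90):
    """True if the win rate is high enough to warrant a look-ahead audit."""
    return win_rate(pnls) > win_rate_limit


# Made-up trade results for illustration.
pnls = [10, -5, 20, -15, -10, 30]
drawdown = max_drawdown(pnls)
suspicious = needs_audit([1] * 19 + [-1])
```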
**Tool limitations**: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
**Statistical significance**: Small edges require large sample sizes to prove. A 5% edge per trade needs 100+ trades to distinguish it from luck.
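The link between edge size and sample size can be approximated with a one-sample t-statistic on per-trade returns. This is a standard textbook approximation, not a method taken from this skill's references; it assumes independent trades, and the per-trade standard deviation used below is an illustrative assumption.

```python
import math
import statistics


def t_statistic(trade_returns):
    """t-statistic of the mean trade return against zero."""
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades")
    mean = statistics.fmean(trade_returns)
    sd = statistics.stdev(trade_returns)  # sample standard deviation
    if sd == 0:
        raise ValueError("zero variance: t-statistic is undefined")
    return mean / (sd / math.sqrt(n))


def trades_needed(mean_edge, per_trade_std, z=2.0):
    """Approximate trade count for the mean edge to sit z standard errors from zero."""
    return math.ceil((z * per_trade_std / mean_edge) ** 2)


# Made-up returns for illustration.
t = t_statistic([0.01, 0.03, 0.02, 0.04, 0.0])
# A 5% edge with an assumed 25% per-trade standard deviation, at z = 2.
n_required = trades_needed(mean_edge=0.05, per_trade_std=0.25)
```

Under those assumed inputs the required count comes out at roughly 100 trades, in line with the guideline above; noisier strategies need more.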
## Discretionary vs Systematic Differences
This skill focuses on **systematic/quantitative** backtesting where:
- All rules are codified in advance
- No discretion or "feel" in execution
- Testing happens on all historical examples, not cherry-picked cases
- Context (news, macro) is deliberately stripped out
Discretionary traders study setups differently, so this skill may not apply to setups requiring subjective judgment.