Initial commit with translated description

2026-03-29 14:34:36 +08:00
commit f727ce26b6
4 changed files with 675 additions and 0 deletions
--- a/SKILL.md
+++ b/SKILL.md
@@ -0,0 +1,206 @@
+---
+name: backtest-expert
+description: "Expert guidance for systematic backtesting of trading strategies."
+---
+
+# Backtest Expert
+
+Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.
+
+## Core Philosophy
+
+**Goal**: Find strategies that "break the least", not strategies that "profit the most" on paper.
+
+**Principle**: Add friction, stress test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.
+
+## When to Use This Skill
+
+Use this skill when:
+- Developing or validating systematic trading strategies
+- Evaluating whether a trading idea is robust enough for live implementation  
+- Troubleshooting why a backtest might be misleading
+- Learning proper backtesting methodology
+- Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
+- Assessing parameter sensitivity and regime dependence
+- Setting realistic expectations for slippage and execution costs
+
+## Backtesting Workflow
+
+### 1. State the Hypothesis
+
+Define the edge in one sentence.
+
+**Example**: "Stocks that gap up more than 3% on earnings and pull back to the previous day's close within the first hour offer a mean-reversion opportunity."
+
+If you can't articulate the edge clearly, don't proceed to testing.
+
+### 2. Codify Rules with Zero Discretion
+
+Define with complete specificity:
+- **Entry**: Exact conditions, timing, price type
+- **Exit**: Stop loss, profit target, time-based exit
+- **Position sizing**: Fixed dollar amount, % of portfolio, or volatility-adjusted
+- **Filters**: Market cap, volume, sector, volatility conditions
+- **Universe**: What instruments are eligible
+
+**Critical**: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
+
+### 3. Run Initial Backtest
+
+Test over:
+- **Minimum 5 years** (preferably 10+)
+- **Multiple market regimes** (bull, bear, high/low volatility)
+- **Realistic costs**: Commissions + conservative slippage
+
+Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.
+
+### 4. Stress Test the Strategy
+
+This is where 80% of testing time should be spent.
+
+**Parameter sensitivity**:
+- Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
+- Test profit target at 80%, 90%, 100%, 110%, 120% of baseline  
+- Vary entry/exit timing by ±15-30 minutes
+- Look for "plateaus" of stable performance, not narrow spikes
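The multiplier sweep above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `sweep_parameter`, `is_plateau`, the `max_drop` rule, and the two toy "strategies" are hypothetical stand-ins for a real backtest function that returns a single performance metric.

```python
from typing import Callable, Dict, Sequence

# Multipliers taken from the stop-loss list above.
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)


def sweep_parameter(
    run_backtest: Callable[[float], float],
    baseline: float,
    multipliers: Sequence[float] = STOP_MULTIPLIERS,
) -> Dict[float, float]:
    """Run the backtest once per scaled parameter value and collect the metric."""
    return {round(baseline * m, 6): run_backtest(baseline * m) for m in multipliers}


def is_plateau(results: Dict[float, float], max_drop: float = 0.5) -> bool:
    """Assumed rule: every variant is profitable and keeps at least
    (1 - max_drop) of the best variant's metric."""
    best = max(results.values())
    return best > 0 and all(v > 0 and v >= best * (1 - max_drop) for v in results.values())


# Toy stand-ins for a real backtest (illustration only).
def robust_strategy(stop_pct: float) -> float:
    return 10.0 - abs(stop_pct - 2.0)  # performance decays gently around 2%


def fragile_strategy(stop_pct: float) -> float:
    return 10.0 if abs(stop_pct - 2.0) < 1e-9 else -3.0  # works only at exactly 2%


robust = sweep_parameter(robust_strategy, baseline=2.0)
fragile = sweep_parameter(fragile_strategy, baseline=2.0)
print(is_plateau(robust), is_plateau(fragile))  # True False
```

The 50% tolerance is arbitrary; the point is that the pass/fail decision looks at the whole range of variants rather than at the best one.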
+
+**Execution friction**:
+- Increase slippage to 1.5-2x typical estimates
+- Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
+- Add realistic order rejection scenarios
+- Test with pessimistic commission structures
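A small sketch of the worst-case fill rule above (buy at ask + 1 tick, sell at bid - 1 tick). The function names, the 0.01 tick size, and the example quote are assumptions made for illustration.

```python
def worst_case_fill(side: str, bid: float, ask: float,
                    tick: float = 0.01, extra_ticks: int = 1) -> float:
    """Pessimistic fill price: buy above the ask, sell below the bid."""
    if side == "buy":
        return ask + extra_ticks * tick
    if side == "sell":
        return bid - extra_ticks * tick
    raise ValueError(f"unknown side: {side!r}")


def round_trip_cost_pct(bid: float, ask: float,
                        tick: float = 0.01, extra_ticks: int = 1) -> float:
    """Cost of buying and immediately selling at worst-case fills, in % of mid price."""
    buy = worst_case_fill("buy", bid, ask, tick, extra_ticks)
    sell = worst_case_fill("sell", bid, ask, tick, extra_ticks)
    mid = (bid + ask) / 2
    return (buy - sell) / mid * 100


# Hypothetical quote: 99.98 bid / 100.02 ask.
print(round(worst_case_fill("buy", 99.98, 100.02), 2))   # 100.03
print(round(worst_case_fill("sell", 99.98, 100.02), 2))  # 99.97
print(round(round_trip_cost_pct(99.98, 100.02), 4))      # 0.06
```

Comparing the round-trip cost with the strategy's average gain per trade shows quickly whether the edge can absorb pessimistic fills.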
+
+**Time robustness**:
+- Analyze year-by-year performance
+- Require positive expectancy in majority of years
+- Ensure strategy doesn't rely on 1-2 exceptional periods
+- Test in different market regimes separately
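One possible way to run the year-by-year check, assuming trades are available as `(year, return_pct)` pairs. The trade list below is made up for illustration, and `best_year_share` is a hypothetical helper for spotting reliance on a single exceptional year.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

Trade = Tuple[int, float]  # (calendar year, return in %)


def expectancy_by_year(trades: List[Trade]) -> Dict[int, float]:
    """Average return per trade (in %) for each calendar year."""
    by_year = defaultdict(list)
    for year, return_pct in trades:
        by_year[year].append(return_pct)
    return {year: mean(returns) for year, returns in sorted(by_year.items())}


def majority_of_years_positive(per_year: Dict[int, float]) -> bool:
    """True when more than half of the years show positive expectancy."""
    positive = sum(1 for value in per_year.values() if value > 0)
    return positive > len(per_year) / 2


def best_year_share(trades: List[Trade]) -> Optional[float]:
    """Fraction of the total summed return contributed by the best year.
    Returns None when the total is not positive."""
    totals = defaultdict(float)
    for year, return_pct in trades:
        totals[year] += return_pct
    total = sum(totals.values())
    return max(totals.values()) / total if total > 0 else None


# Made-up trades for illustration.
trades = [
    (2019, 1.0), (2019, -0.5), (2019, 0.8),
    (2020, -1.2), (2020, 0.4),
    (2021, 0.9), (2021, 0.7),
    (2022, 0.3), (2022, -0.1),
]
per_year = expectancy_by_year(trades)
print(majority_of_years_positive(per_year))  # True
print(round(best_year_share(trades), 2))     # 0.7
```

In this toy data most years are positive, yet one year supplies about 70% of the total return, which is the kind of concentration the checklist above warns about.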
+
+**Sample size**:
+- Absolute minimum: 30 trades
+- Preferred: 100+ trades
+- High confidence: 200+ trades
+
+### 5. Out-of-Sample Validation
+
+**Walk-forward analysis**:
+1. Optimize on training period (e.g., Year 1-3)
+2. Test on validation period (Year 4)
+3. Roll forward and repeat
+4. Compare in-sample vs out-of-sample performance
+
+**Warning signs**:
+- Out-of-sample <50% of in-sample performance
+- Need frequent parameter re-optimization
+- Parameters change dramatically between periods
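The rolling procedure and the 50% warning threshold above can be sketched as below. The window lengths follow the example in the steps (three training years, one validation year); the function names and the sample years are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def walk_forward_windows(
    years: Sequence[int], train_len: int = 3, test_len: int = 1
) -> List[Tuple[List[int], List[int]]]:
    """Return (training years, validation years) pairs, rolling forward by test_len."""
    windows = []
    start = 0
    while start + train_len + test_len <= len(years):
        train = list(years[start:start + train_len])
        test = list(years[start + train_len:start + train_len + test_len])
        windows.append((train, test))
        start += test_len
    return windows


def oos_warning(in_sample: float, out_of_sample: float, min_ratio: float = 0.5) -> bool:
    """Flag a window whose out-of-sample metric is below min_ratio of in-sample."""
    if in_sample <= 0:
        return True  # nothing worth validating if in-sample is not even positive
    return out_of_sample / in_sample < min_ratio


windows = walk_forward_windows(list(range(2015, 2023)))
for train, test in windows:
    print(train, "->", test)

print(oos_warning(in_sample=20.0, out_of_sample=8.0))   # True  (40% of in-sample)
print(oos_warning(in_sample=20.0, out_of_sample=12.0))  # False (60% of in-sample)
```

Parameters are re-optimized on each training slice only; the validation year must never feed back into the optimization.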
+
+### 6. Evaluate Results
+
+**Questions to answer**:
+- Does edge survive pessimistic assumptions?
+- Is performance stable across parameter variations?
+- Does strategy work in multiple market regimes?
+- Is sample size sufficient for statistical confidence?
+- Are results realistic, not "too good to be true"?
+
+**Decision criteria**:
+- ✅ **Deploy**: Survives all stress tests with acceptable performance
+- 🔄 **Refine**: Core logic sound but needs parameter adjustment
+- ❌ **Abandon**: Fails stress tests or relies on fragile assumptions
+
+## Key Testing Principles
+
+### Punish the Strategy
+
+Add friction everywhere:
+- Commissions higher than reality
+- Slippage 1.5-2x typical
+- Worst-case fills
+- Order rejections
+- Partial fills
+
+**Rationale**: Strategies that survive pessimistic assumptions often perform better in live trading than their stress-tested backtests suggest.
+
+### Seek Plateaus, Not Peaks
+
+Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
+
+**Good**: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
+**Bad**: Strategy only works with stop loss at exactly 2.13%
+
+Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
+
+### Test All Cases, Not Cherry-Picked Examples
+
+**Wrong approach**: Study hand-picked "market leaders" that worked
+**Right approach**: Test every stock that met criteria, including those that failed
+
+Selective examples create survivorship bias and overestimate strategy quality.
+
+### Separate Idea Generation from Validation
+
+**Intuition**: Useful for generating hypotheses
+**Validation**: Must be purely data-driven
+
+Never let attachment to an idea influence interpretation of test results.
+
+## Common Failure Patterns
+
+Recognize these patterns early to save time:
+
+1. **Parameter sensitivity**: Only works with exact parameter values
+2. **Regime-specific**: Great in some years, terrible in others  
+3. **Slippage sensitivity**: Unprofitable when realistic costs added
+4. **Small sample**: Too few trades for statistical confidence
+5. **Look-ahead bias**: "Too good to be true" results
+6. **Over-optimization**: Many parameters, poor out-of-sample results
+
+See `references/failed_tests.md` for detailed examples and diagnostic framework.
+
+## Available Reference Documentation
+
+### Methodology Reference
+**File**: `references/methodology.md`
+
+**When to read**: For detailed guidance on specific testing techniques.
+
+**Contents**:
+- Stress testing methods
+- Parameter sensitivity analysis  
+- Slippage and friction modeling
+- Sample size requirements
+- Market regime classification
+- Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)
+
+### Failed Tests Reference
+**File**: `references/failed_tests.md`
+
+**When to read**: When a strategy fails tests, or when learning from past mistakes.
+
+**Contents**:
+- Why failures are valuable
+- Common failure patterns with examples
+- Case study documentation framework
+- Red flags checklist for evaluating backtests
+
+## Critical Reminders
+
+**Time allocation**: Spend 20% generating ideas, 80% trying to break them.
+
+**Context-free requirement**: If strategy requires "perfect context" to work, it's not robust enough for systematic trading.
+
+**Red flag**: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
+
+**Tool limitations**: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
+
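+**Statistical significance**: Small edges require large sample sizes to prove. As a rough floor, even a 5% edge per trade needs 100+ trades to distinguish from luck, and the required number grows quickly as the edge shrinks relative to the variability of per-trade returns.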
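A back-of-the-envelope sketch of why small edges need many trades. It assumes independent trades and a normal approximation, and asks how many trades are needed before an average return equal to the assumed edge sits `z` standard errors above zero. The function name and the example numbers are illustrative assumptions, not figures from the methodology.

```python
import math


def trades_needed(mean_edge_pct: float, std_per_trade_pct: float, z: float = 1.96) -> int:
    """Smallest n with mean / (std / sqrt(n)) >= z, i.e. n >= (z * std / mean) ** 2."""
    if mean_edge_pct <= 0:
        raise ValueError("edge must be positive")
    return math.ceil((z * std_per_trade_pct / mean_edge_pct) ** 2)


# Hypothetical strategies, both with 5% standard deviation of per-trade returns.
print(trades_needed(mean_edge_pct=1.0, std_per_trade_pct=5.0))  # 97
print(trades_needed(mean_edge_pct=0.5, std_per_trade_pct=5.0))  # 385
```

Halving the edge roughly quadruples the required sample, because the trade count scales with the square of the ratio of variability to edge.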
+
+## Discretionary vs Systematic Differences
+
+This skill focuses on **systematic/quantitative** backtesting where:
+- All rules are codified in advance
+- No discretion or "feel" in execution  
+- Testing happens on all historical examples, not cherry-picked cases
+- Context (news, macro) is deliberately stripped out
+
+Discretionary traders study setups differently; this skill may not apply to setups requiring subjective judgment.
--- a/_meta.json
+++ b/_meta.json
@@ -0,0 +1,6 @@
+{
+  "ownerId": "kn7agf701n3afzzbq8ge0wa8k1809wm4",
+  "slug": "backtest-expert",
+  "version": "0.1.0",
+  "publishedAt": 1769870095738
+}
--- a/references/failed_tests.md
+++ b/references/failed_tests.md
@@ -0,0 +1,236 @@
+# Learning from Failed Backtests
+
+## Table of Contents
+
+1. Why Failed Ideas Are Valuable
+2. Common Failure Patterns
+3. Case Study Framework
+4. Red Flags Checklist
+
+## 1. Why Failed Ideas Are Valuable
+
+### The Value of Failures
+
+**Key insights**:
+- Failed tests save capital by preventing live implementation
+- Failure patterns reveal which assumptions don't hold
+- Understanding what doesn't work narrows the search space
+- Failed tests build experience in recognizing fragile strategies
+
+### Documentation Discipline
+
+**Record for each failed idea**:
+- The hypothesis being tested
+- Why you thought it would work
+- What the data showed
+- Specific breaking points
+- Lessons learned
+
+**Purpose**: Build a library of "anti-patterns" to avoid repeating mistakes.
+
+## 2. Common Failure Patterns
+
+### Pattern 1: Parameter Sensitivity
+
+**Symptom**: Strategy only works with very specific parameter values.
+
+**Example scenario**:
+- Strategy profitable with stop loss at exactly 2.5%
+- Increasing to 3% or decreasing to 2% causes significant performance drop
+- No "plateau" of stable performance
+
+**Why it fails**: Real markets have noise; if small changes break the strategy, it likely captured noise, not signal.
+
+**Lesson**: Seek strategies with stable performance across parameter ranges.
+
+### Pattern 2: Regime-Specific Performance
+
+**Symptom**: Strategy works brilliantly in some years, terribly in others.
+
+**Example scenario**:
+- Great performance in 2017-2019 (low volatility bull market)
+- Catastrophic losses in 2020 (high volatility)
+- Poor performance in 2022 (downtrend)
+
+**Why it fails**: Strategy dependent on specific market conditions, not robust enough for diverse environments.
+
+**Lesson**: Require acceptable (not necessarily best) performance across all regimes.
+
+### Pattern 3: Slippage Sensitivity
+
+**Symptom**: Strategy becomes unprofitable when realistic trading costs added.
+
+**Example scenario**:
+- Backtest shows 0.5% average gain per trade
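+- Adding 0.1% slippage per side (0.2% round-trip) cuts the gain to 0.3% per trade
+- At 2x stress-test slippage (0.4% round-trip) only 0.1% remains, which commissions can erase
+- Strategy requires unrealistic fills to be profitable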
+
+**Why it fails**: Edge too small to survive real-world friction.
+
+**Lesson**: Edge must be large enough to survive pessimistic assumptions about costs.
+
+### Pattern 4: Sample Size Issues
+
+**Symptom**: Strong results based on small number of trades.
+
+**Example scenario**:
+- Backtest shows 80% win rate
+- Only 15 total trades in 5 years
+- A few different outcomes would dramatically change results
+
+**Why it fails**: Insufficient data to distinguish edge from luck.
+
+**Lesson**: Require minimum 100 trades for meaningful conclusions, preferably 200+.
+
+### Pattern 5: Look-Ahead Bias
+
+**Symptom**: Perfect or near-perfect backtest results.
+
+**Example scenario**:
+- Strategy shows 95%+ win rate
+- Unrealistically good entry/exit timing
+- Performance too good to be realistic
+
+**Why it fails**: Likely using information not available at time of trade.
+
+**Lesson**: Be suspicious of "too good to be true" results; audit data alignment carefully.
+
+### Pattern 6: Over-Optimization (Curve Fitting)
+
+**Symptom**: Complex strategy with many parameters shows excellent in-sample results but poor out-of-sample.
+
+**Example scenario**:
+- Strategy uses 8-10 different indicators with specific thresholds
+- In-sample performance: 40% annual return
+- Out-of-sample performance: -5% annual return
+- Parameters needed constant re-optimization
+
+**Why it fails**: Fitted to historical noise rather than genuine market structure.
+
+**Lesson**: Prefer simple strategies with fewer parameters; demand strong out-of-sample results.
+
+## 3. Case Study Framework
+
+### Template for Documenting Failed Ideas
+
+Use this framework when a backtest fails:
+
+#### 1. Initial Hypothesis
+- **What edge were you trying to capture?**
+- **Why did you think this would work?**
+- **What was the logical basis?**
+
+#### 2. Implementation Details
+- **Entry rules** (specific and complete)
+- **Exit rules** (stop loss, profit target, time-based)
+- **Position sizing**
+- **Filters or conditions**
+
+#### 3. Test Results
+- **Basic metrics**:
+  - Total trades
+  - Win rate
+  - Average win/loss
+  - Max drawdown
+  - Annual returns by year
+  
+- **Parameter sensitivity**:
+  - How results changed with parameter variations
+  - Whether "plateau" of stable performance existed
+
+- **Regime analysis**:
+  - Performance in different market conditions
+  - Which regimes caused problems
+
+#### 4. Breaking Points
+- **What specifically caused the strategy to fail?**
+  - Slippage too high?
+  - Parameter sensitivity?
+  - Regime-specific?
+  - Insufficient sample size?
+
+#### 5. Lessons Learned
+- **What assumptions were wrong?**
+- **What would you test differently next time?**
+- **Are there salvageable elements?**
+
+### Example: Failed Momentum Reversal Strategy
+
+#### 1. Initial Hypothesis
+Tried to capture mean reversion after strong momentum moves. Hypothesis: Stocks that gap up 5%+ on earnings often pull back 2-3% before continuing, providing short-term reversal opportunity.
+
+#### 2. Implementation
+- Entry: Short when stock gaps up 5%+ on earnings at market open
+- Exit: Cover at 2% profit or 3% stop loss
+- Holding period: Maximum 3 days
+- Filters: Market cap >$2B, average volume >500K shares
+
+#### 3. Test Results
+- 67 trades over 5 years
+- Win rate: 58%
+- Avg win: 2.1%, Avg loss: 3.2%
+- Max drawdown: 18%
+- 2019-2021: Profitable
+- 2022-2023: Significant losses
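Taking the reported win rate and average win/loss at face value, the expectancy per trade can be worked out directly. The helper names below are illustrative; the formula is the standard win-rate-weighted average.

```python
def expectancy(win_rate: float, avg_win_pct: float, avg_loss_pct: float) -> float:
    """Expected return per trade in %, with avg_loss_pct given as a positive number."""
    return win_rate * avg_win_pct - (1 - win_rate) * avg_loss_pct


def breakeven_win_rate(avg_win_pct: float, avg_loss_pct: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return avg_loss_pct / (avg_win_pct + avg_loss_pct)


# Figures from the test results above: 58% win rate, 2.1% avg win, 3.2% avg loss.
print(round(expectancy(0.58, 2.1, 3.2), 3))     # -0.126
print(round(breakeven_win_rate(2.1, 3.2), 3))   # 0.604
```

On these numbers the strategy loses about 0.13% per trade before any slippage is added, and would need a win rate above roughly 60% just to break even.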
+
+#### 4. Breaking Points
+- Strategy failed during strong momentum environments (2021 meme stocks)
+- Stop losses hit frequently during continued upward momentum
+- Gap-ups that continued higher immediately caused outsized losses
+- Small sample size (67 trades) provided low statistical confidence
+- Slippage on short entries during high volatility eliminated thin edge
+
+#### 5. Lessons Learned
+- Mean reversion strategies vulnerable during momentum regimes
+- Need regime filter (e.g., only trade during high VIX or weak market)
+- 5-year test insufficient for a strategy this sensitive to momentum regimes; need 10+ years
+- Payoff asymmetry unfavorable (a 2% target vs a 3% stop needs a win rate above 60% to break even; observed 58%), leaving no room for slippage
+- Better approach: Wait for actual pullback, then enter, rather than fade immediately
+
+## 4. Red Flags Checklist
+
+Use this checklist when evaluating any backtest:
+
+### Data Quality Issues
+- [ ] Has survivorship bias been addressed?
+- [ ] Are delisted stocks included in test?
+- [ ] Is data alignment correct (no look-ahead bias)?
+- [ ] Are corporate actions (splits, dividends) handled correctly?
+
+### Sample Size Concerns
+- [ ] At least 100 trades? (Preferably 200+)
+- [ ] At least 5 years of data? (Preferably 10+)
+- [ ] Includes full market cycle?
+- [ ] Tested across multiple market regimes?
+
+### Parameter Robustness
+- [ ] Does strategy work with nearby parameter values?
+- [ ] Are there "plateaus" of stable performance?
+- [ ] Minimal parameters (ideally <5)?
+- [ ] Parameters based on logical reasoning, not pure optimization?
+
+### Execution Realism
+- [ ] Realistic commissions included?
+- [ ] Slippage modeled conservatively (1.5-2x typical)?
+- [ ] Worst-case fills considered?
+- [ ] Order rejection/partial fills addressed?
+
+### Performance Characteristics
+- [ ] Positive expectancy in majority of years?
+- [ ] Acceptable performance in all major regimes?
+- [ ] No catastrophic drawdowns (>50%)?
+- [ ] Edge large enough to survive friction?
+
+### Bias Prevention
+- [ ] Strategy defined before testing?
+- [ ] Hypothesis has economic logic?
+- [ ] Results aren't "too good to be true"?
+- [ ] Out-of-sample testing performed?
+- [ ] No cherry-picking of examples?
+
+### Tool Limitations
+- [ ] Aware of testing platform's interpolation methods?
+- [ ] Understand how platform handles low-liquidity situations?
+- [ ] Know quirks specific to data provider?
+
+**If more than two or three items remain unchecked, the backtest requires additional work before live implementation is considered.**
--- a/references/methodology.md
+++ b/references/methodology.md
@@ -0,0 +1,227 @@
+# Backtesting Methodology Reference
+
+## Table of Contents
+
+1. Core Testing Techniques
+2. Stress Testing Methods
+3. Parameter Sensitivity Analysis
+4. Slippage and Friction Modeling
+5. Sample Size Guidelines
+6. Market Regime Analysis
+7. Common Pitfalls and Biases
+
+## 1. Core Testing Techniques
+
+### "Beat Ideas to Death" Approach
+
+**Core principle**: Add friction and punishment to find strategies that break the least, not those that profit the most on paper.
+
+**Key techniques**:
+- Multiple stop loss variations
+- Different profit targets
+- Realistic + exaggerated commissions
+- Worst-case fills
+- Extended time periods
+- Multiple market regimes
+
+### The 80/20 Rule for R&D Time
+
+- 20% generating and codifying ideas
+- 80% stress testing and trying to break them
+
+## 2. Stress Testing Methods
+
+### Execution Friction Tests
+
+**Required friction additions**:
+- Realistic commissions (actual broker rates)
+- Pessimistic slippage (1.5-2x typical)
+- Worst-case entry fills (ask + 1-2 ticks)
+- Worst-case exit fills (bid - 1-2 ticks)
+- Order rejection scenarios
+- Partial fills
+
+### Parameter Robustness Tests
+
+Test across multiple configurations:
+- Entry timing variations (±15-30 minutes)
+- Stop loss distances (50%, 75%, 100%, 125%, 150% of baseline)
+- Profit targets (80%, 90%, 100%, 110%, 120% of baseline)
+- Position sizing rules
+- Filter thresholds
+
+**Goal**: Find "plateau" performance where small parameter changes don't drastically alter results.
+
+### Time-Based Robustness
+
+**Minimum requirements**:
+- Test across at least 5-10 years
+- Include multiple market regimes:
+  - Bull markets
+  - Bear markets
+  - High volatility periods
+  - Low volatility periods
+  - Trending markets
+  - Range-bound markets
+
+**Year-by-year analysis**: Strategy should show positive expectancy in majority of years, not rely on 1-2 exceptional years.
+
+## 3. Parameter Sensitivity Analysis
+
+### Heat Map Analysis
+
+Create 2D heat maps varying two parameters simultaneously:
+- Profit target (rows) × Stop loss (columns)
+- Entry time (rows) × Exit time (columns)
+- Volatility filter (rows) × Volume filter (columns)
+
+**Interpretation**:
+- Robust strategies show "plateaus" of consistent performance
+- Fragile strategies show "spikes" or narrow optimal ranges
+- Avoid strategies with performance cliffs at parameter boundaries
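A minimal sketch of the two-parameter grid and one possible plateau measure: the average of a cell's neighbours divided by the cell itself. The metric functions are toy stand-ins for a real backtest, and `neighbor_ratio` is a hypothetical helper, not a standard statistic.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def build_heat_map(
    metric: Callable[[float, float], float],
    row_values: Sequence[float],
    col_values: Sequence[float],
) -> Grid:
    """Evaluate the metric for every (row, column) parameter pair."""
    return [[metric(r, c) for c in col_values] for r in row_values]


def neighbor_ratio(grid: Grid, i: int, j: int) -> float:
    """Mean of the (up to 8) neighbouring cells divided by cell (i, j).
    Near 1.0 suggests a plateau; near 0 suggests an isolated spike."""
    neighbors = [
        grid[a][b]
        for a in range(max(0, i - 1), min(len(grid), i + 2))
        for b in range(max(0, j - 1), min(len(grid[0]), j + 2))
        if (a, b) != (i, j)
    ]
    return (sum(neighbors) / len(neighbors)) / grid[i][j]


# Toy metrics: profit target (rows) x stop loss (columns), both in %.
def plateau_metric(target: float, stop: float) -> float:
    return 10.0 - 0.5 * abs(target - 4.0) - 0.5 * abs(stop - 2.0)


def spike_metric(target: float, stop: float) -> float:
    return 10.0 if (target, stop) == (4.0, 2.0) else 1.0


targets, stops = [3.0, 4.0, 5.0], [1.0, 2.0, 3.0]
plateau_grid = build_heat_map(plateau_metric, targets, stops)
spike_grid = build_heat_map(spike_metric, targets, stops)
print(neighbor_ratio(plateau_grid, 1, 1))  # 0.925
print(neighbor_ratio(spike_grid, 1, 1))    # 0.1
```

Both grids share the same best cell, but only the first keeps most of its performance when either parameter is nudged.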
+
+### Walk-Forward Analysis
+
+1. Optimize parameters on training period (e.g., Year 1-2)
+2. Test with those parameters on validation period (Year 3)
+3. Roll forward and repeat
+4. Compare in-sample vs out-of-sample performance
+
+**Warning signs**:
+- Out-of-sample performance <50% of in-sample
+- Frequent need to re-optimize parameters
+- Parameters that change dramatically between periods
+
+## 4. Slippage and Friction Modeling
+
+### Realistic Slippage Assumptions
+
+**By market capitalization**:
+- Mega cap (>$200B): 0.01-0.02%
+- Large cap ($10B-$200B): 0.02-0.05%
+- Mid cap ($2B-$10B): 0.05-0.10%
+- Small cap ($300M-$2B): 0.10-0.20%
+- Micro cap (<$300M): 0.20-0.50%+
+
+**By order type**:
+- Market orders: Higher slippage
+- Limit orders: Lower slippage but potential non-fills
+- Stop orders: Significant slippage in volatile conditions
+
+### Conservative Testing Approach
+
+Use 1.5-2x typical slippage estimates for stress testing:
+- If typical slippage is 0.05%, test with 0.075-0.10%
+- If typical is 0.10%, test with 0.15-0.20%
+
+**Rationale**: Strategies that survive pessimistic assumptions often perform better in practice than in backtests.
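The 1.5-2x stress rule can be applied mechanically. In the sketch below, the slippage table mirrors the market-cap ranges listed above; the function names and the 0.5% gross edge are hypothetical values chosen for illustration.

```python
from typing import Dict, Tuple

# Typical one-way slippage ranges in %, copied from the market-cap list above.
TYPICAL_SLIPPAGE_PCT: Dict[str, Tuple[float, float]] = {
    "mega": (0.01, 0.02),
    "large": (0.02, 0.05),
    "mid": (0.05, 0.10),
    "small": (0.10, 0.20),
    "micro": (0.20, 0.50),
}


def stressed_slippage(typical_pct: float,
                      multipliers: Tuple[float, float] = (1.5, 2.0)) -> Tuple[float, float]:
    """Scale a typical slippage estimate to the pessimistic stress-test range."""
    low, high = multipliers
    return (typical_pct * low, typical_pct * high)


def net_edge(gross_edge_pct: float, slippage_per_side_pct: float,
             commission_round_trip_pct: float = 0.0) -> float:
    """Average gain per trade after paying slippage on entry and exit."""
    return gross_edge_pct - 2 * slippage_per_side_pct - commission_round_trip_pct


low, high = stressed_slippage(0.05)
print(round(low, 4), round(high, 4))               # 0.075 0.1

# Hypothetical strategy with a 0.5% gross edge and 0.1% typical slippage per side.
print(round(net_edge(0.5, 0.1), 4))                # 0.3
print(round(net_edge(0.5, 0.1 * 2.0), 4))          # 0.1
```

An edge that is comfortable under typical slippage can shrink to almost nothing at the 2x level, which is the case the stress test is designed to expose.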
+
+## 5. Sample Size Guidelines
+
+### Minimum Trade Requirements
+
+**Statistical significance thresholds**:
+- Absolute minimum: 30 trades
+- Preferred minimum: 100 trades
+- High confidence: 200+ trades
+
+**Why large samples matter**:
+- Reduces impact of outliers
+- Provides statistical confidence
+- Reveals true edge vs luck
+
+### Time Period Considerations
+
+**Minimum testing period**: 5 years
+**Preferred testing period**: 10+ years
+
+**Must include**:
+- At least one full market cycle
+- Multiple volatility regimes
+- Different Federal Reserve policy environments
+
+## 6. Market Regime Analysis
+
+### Regime Classification
+
+**Volatility-based regimes**:
+- Low volatility: VIX <15
+- Normal volatility: VIX 15-25
+- High volatility: VIX 25-35
+- Extreme volatility: VIX >35
+
+**Trend-based regimes**:
+- Strong uptrend: Market +10%+ over 6 months
+- Moderate uptrend: Market +5% to +10% over 6 months
+- Sideways: Market -5% to +5% over 6 months
+- Downtrend: Market <-5% over 6 months
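The two classifications above translate into simple threshold functions. The listed ranges share their boundary values (15, 25, 5), so this sketch assumes each boundary belongs to the higher band, except VIX 35, which stays "high" because "extreme" is defined as strictly above 35. The label strings are illustrative.

```python
def volatility_regime(vix: float) -> str:
    """Classify a VIX level using the volatility thresholds listed above."""
    if vix < 15:
        return "low"
    if vix < 25:
        return "normal"
    if vix <= 35:
        return "high"
    return "extreme"


def trend_regime(six_month_return_pct: float) -> str:
    """Classify the market's 6-month return (in %) using the trend thresholds above."""
    if six_month_return_pct >= 10:
        return "strong_uptrend"
    if six_month_return_pct >= 5:
        return "moderate_uptrend"
    if six_month_return_pct >= -5:
        return "sideways"
    return "downtrend"


print(volatility_regime(12.0), trend_regime(12.0))   # low strong_uptrend
print(volatility_regime(28.0), trend_regime(-8.0))   # high downtrend
```

Tagging each trade with both labels at entry time makes it straightforward to group results by regime afterwards.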
+
+### Performance Requirements by Regime
+
+**Robust strategy characteristics**:
+- Positive expectancy in majority of regimes
+- Acceptable (not necessarily best) in all regimes
+- No catastrophic failures in any single regime
+- Understanding of which regime causes weakness
+
+## 7. Common Pitfalls and Biases
+
+### Survivorship Bias
+
+**Issue**: Testing only on currently-trading stocks ignores delisted/bankrupt companies.
+
+**Solution**: Use survivorship-bias-free datasets that include historical delistings.
+
+### Look-Ahead Bias
+
+**Issue**: Using information not available at the time of trade.
+
+**Examples**:
+- Using EOD data for intraday decisions
+- Using next-day's open for today's close decisions
+- Calculating indicators with future data points
+
+**Prevention**: Strict timestamp control and data alignment checks.
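One way to audit data alignment is a perturbation test: change only the data after bar `t` and confirm that no signal up to bar `t` moves. The sketch below is an assumed approach with toy signal functions; `leaky_signal` is deliberately wrong to show what the check catches.

```python
from typing import Callable, List, Optional, Sequence

Signal = Optional[bool]


def clean_signal(closes: Sequence[float], window: int = 3) -> List[Signal]:
    """Bar t compares its close with the mean of the last `window` closes up to t."""
    out: List[Signal] = []
    for t in range(len(closes)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            recent = closes[t - window + 1:t + 1]
            out.append(closes[t] > sum(recent) / window)
    return out


def leaky_signal(closes: Sequence[float]) -> List[Signal]:
    """Deliberately buggy: the centred average peeks at the next bar's close."""
    out: List[Signal] = []
    for t in range(len(closes)):
        if t < 1 or t + 1 >= len(closes):
            out.append(None)
        else:
            out.append(closes[t] > sum(closes[t - 1:t + 2]) / 3)
    return out


def uses_future_data(
    signal_fn: Callable[[Sequence[float]], List[Signal]],
    closes: Sequence[float],
    bump: float = 1_000_000.0,
) -> bool:
    """True if changing data after bar t alters any signal at or before bar t."""
    baseline = signal_fn(closes)
    for t in range(len(closes) - 1):
        perturbed = list(closes[:t + 1]) + [c + bump for c in closes[t + 1:]]
        if signal_fn(perturbed)[:t + 1] != baseline[:t + 1]:
            return True
    return False


closes = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
print(uses_future_data(clean_signal, closes))  # False
print(uses_future_data(leaky_signal, closes))  # True
```

A passing check does not prove the absence of look-ahead bias (a leak may not flip a boolean signal on a particular dataset), but a failing one is conclusive.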
+
+### Curve-Fitting (Over-Optimization)
+
+**Warning signs**:
+- Too many parameters (>5-7)
+- Highly specific parameter values (e.g., RSI = 37.3)
+- Perfect backtest results
+- Large performance drop in validation period
+
+**Prevention techniques**:
+- Limit parameters to essential ones only
+- Use round numbers when possible
+- Require out-of-sample testing
+- Analyze parameter sensitivity
+
+### Sample Selection Bias
+
+**Issue**: Testing only on hand-picked examples (e.g., known market leaders).
+
+**Problem**: Ignoring all stocks that met criteria but failed creates false impression of strategy quality.
+
+**Solution**: Test on ALL historical examples meeting the criteria, not just successful outcomes.
+
+### Hindsight Bias
+
+**Issue**: Using outcome knowledge to influence decisions.
+
+**Prevention for systematic trading**:
+- Define all rules in advance
+- No manual intervention based on hindsight
+- Test rules across all cases, not cherry-picked examples
+
+### Data Mining Bias
+
+**Issue**: Testing hundreds of strategies until finding one that "works" by random chance.
+
+**Risk**: With enough attempts, random data will produce seemingly profitable patterns.
+
+**Mitigation**:
+- Have hypothesis before testing
+- Require economic logic for the edge
+- Use Bonferroni correction for multiple comparisons
+- Demand higher significance thresholds (p < 0.01 instead of p < 0.05)
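A short sketch of the multiple-comparisons problem and the Bonferroni adjustment. The false-positive calculation assumes independent tests, and the p-values are made up for illustration.

```python
from typing import List, Sequence


def prob_any_false_positive(alpha: float, num_tests: int) -> float:
    """Chance that at least one of num_tests independent null strategies passes at alpha."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / num_tests


def significant_after_correction(
    p_values: Sequence[float], num_tests: int, alpha: float = 0.05
) -> List[bool]:
    """Compare each p-value with the Bonferroni-adjusted threshold.
    num_tests should count every strategy tried, not only the ones reported."""
    threshold = bonferroni_threshold(alpha, num_tests)
    return [p <= threshold for p in p_values]


print(round(prob_any_false_positive(0.05, 100), 3))   # 0.994
print(significant_after_correction([0.0004, 0.003, 0.04], num_tests=100))
# [True, False, False]
```

With 100 strategies tested, a result at p = 0.003 no longer counts as evidence, even though it clears the stricter p < 0.01 bar on its own.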