Initial commit with translated description

2026-03-29 08:32:46 +08:00
commit 6637dd34f9
11 changed files with 2613 additions and 0 deletions
--- a/SKILL.md
+++ b/SKILL.md
@@ -0,0 +1,651 @@
+---
+name: token-optimizer
+description: "通过智能模型路由、心跳优化、预算跟踪和原生2026.2.15功能（会话修剪、引导大小限制、缓存TTL对齐）降低OpenClaw令牌使用和API成本。在令牌成本高、遇到API速率限制或大规模托管多个代理时使用。4个可执行脚本（context_optimizer、model_router、heartbeat_optimizer、token_tracker）仅为本地使用——无网络请求、无子进程调用、无系统修改。"
+version: 3.0.0
+homepage: https://github.com/Asif2BD/OpenClaw-Token-Optimizer
+source: https://github.com/Asif2BD/OpenClaw-Token-Optimizer
+author: Asif2BD
+security:
+  verified: true
+  auditor: Oracle (Matrix Zion)
+  audit_date: 2026-02-18
+  scripts_no_network: true
+  scripts_no_code_execution: true
+  scripts_no_subprocess: true
+  scripts_data_local_only: true
+  reference_files_describe_external_services: true
+  optimize_sh_is_convenience_wrapper: true
+  optimize_sh_only_calls_bundled_python_scripts: true
+---
+
+# Token Optimizer
+
+Comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. Combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.
+
+## Quick Start
+
+**Immediate actions** (no config changes needed):
+
+1. **Generate optimized AGENTS.md (BIGGEST WIN!):**
+   ```bash
+   python3 scripts/context_optimizer.py generate-agents
+   # Creates AGENTS.md.optimized — review and replace your current AGENTS.md
+   ```
+
+2. **Check what context you ACTUALLY need:**
+   ```bash
+   python3 scripts/context_optimizer.py recommend "hi, how are you?"
+   # Shows: Only 2 files needed (not 50+!)
+   ```
+
+3. **Install optimized heartbeat:**
+   ```bash
+   cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
+   ```
+
+4. **Enforce cheaper models for casual chat:**
+   ```bash
+   python3 scripts/model_router.py "thanks!"
+   # Single-provider Anthropic setup: Use Sonnet, not Opus
+   # Multi-provider setup (OpenRouter/Together): Use Haiku for max savings
+   ```
+
+5. **Check current token budget:**
+   ```bash
+   python3 scripts/token_tracker.py check
+   ```
+
+**Expected savings:** 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).
+
+## Core Capabilities
+
+### 0. Lazy Skill Loading (NEW in v3.0 — BIGGEST WIN!)
+
+**The single highest-impact optimization available.** Most agents burn 3,000–15,000 tokens per session loading skill files they never use. Stop that first.
+
+**The pattern:**
+
+1. Create a lightweight `SKILLS.md` catalog in your workspace (~300 tokens — list of skills + when to load them)
+2. Only load individual SKILL.md files when a task actually needs them
+3. Apply the same logic to memory files — load MEMORY.md at startup, daily logs only on demand
+
+**Token savings:**
+
+| Library size | Before (eager) | After (lazy) | Savings |
+|---|---|---|---|
+| 5 skills | ~3,000 tokens | ~600 tokens | **80%** |
+| 10 skills | ~6,500 tokens | ~750 tokens | **88%** |
+| 20 skills | ~13,000 tokens | ~900 tokens | **93%** |
+
+**Quick implementation in AGENTS.md:**
+
+```markdown
+## Skills
+
+At session start: Read SKILLS.md (the index only — ~300 tokens).
+Load individual skill files ONLY when a task requires them.
+Never load all skills upfront.
+```
+
+**Full implementation (with catalog template + optimizer script):**
+
+```bash
+clawhub install openclaw-skill-lazy-loader
+```
+
+The companion skill `openclaw-skill-lazy-loader` includes a `SKILLS.md.template`, an `AGENTS.md.template` lazy-loading section, and a `context_optimizer.py` CLI that recommends exactly which skills to load for any given task.
+
+**Lazy loading handles context loading costs. The remaining capabilities below handle runtime costs.** Together they cover the full token lifecycle.
+
+---
+
+### 1. Context Optimization (NEW!)
+
+**Biggest token saver** — Only load files you actually need, not everything upfront.
+
+**Problem:** Default OpenClaw loads ALL context files every session:
+- SOUL.md, AGENTS.md, USER.md, TOOLS.md, MEMORY.md
+- docs/**/*.md (hundreds of files)
+- memory/2026-*.md (daily logs)
+- Total: Often 50K+ tokens before user even speaks!
+
+**Solution:** Lazy loading based on prompt complexity.
+
+**Usage:**
+```bash
+python3 scripts/context_optimizer.py recommend "<user prompt>"
+```
+
+**Examples:**
+```bash
+# Simple greeting → minimal context (2 files only!)
+context_optimizer.py recommend "hi"
+→ Load: SOUL.md, IDENTITY.md
+→ Skip: Everything else
+→ Savings: ~80% of context
+
+# Standard work → selective loading
+context_optimizer.py recommend "write a function"
+→ Load: SOUL.md, IDENTITY.md, memory/TODAY.md
+→ Skip: docs, old memory, knowledge base
+→ Savings: ~50% of context
+
+# Complex task → full context
+context_optimizer.py recommend "analyze our entire architecture"
+→ Load: SOUL.md, IDENTITY.md, MEMORY.md, memory/TODAY+YESTERDAY.md
+→ Conditionally load: Relevant docs only
+→ Savings: ~30% of context
+```
+
+**Output format:**
+```json
+{
+  "complexity": "simple",
+  "context_level": "minimal",
+  "recommended_files": ["SOUL.md", "IDENTITY.md"],
+  "file_count": 2,
+  "savings_percent": 80,
+  "skip_patterns": ["docs/**/*.md", "memory/20*.md"]
+}
+```
+
+**Integration pattern:**
+Before loading context for a new session:
+```python
+from context_optimizer import recommend_context_bundle
+
+user_prompt = "thanks for your help"
+recommendation = recommend_context_bundle(user_prompt)
+
+if recommendation["context_level"] == "minimal":
+    # Load only SOUL.md + IDENTITY.md
+    # Skip everything else
+    # Save ~80% tokens!
+```
+
+**Generate optimized AGENTS.md:**
+```bash
+context_optimizer.py generate-agents
+# Creates AGENTS.md.optimized with lazy loading instructions
+# Review and replace your current AGENTS.md
+```
+
+**Expected savings:** 50-80% reduction in context tokens.
+
+### 2. Smart Model Routing (ENHANCED!)
+
+Automatically classify tasks and route to appropriate model tiers.
+
+**NEW: Communication pattern enforcement** — Never waste Opus tokens on "hi" or "thanks"!
+
+**Usage:**
+```bash
+python3 scripts/model_router.py "<user prompt>" [current_model] [force_tier]
+```
+
+**Examples:**
+```bash
+# Communication (NEW!) → ALWAYS Haiku
+python3 scripts/model_router.py "thanks!"
+python3 scripts/model_router.py "hi"
+python3 scripts/model_router.py "ok got it"
+→ Enforced: Haiku (NEVER Sonnet/Opus for casual chat)
+
+# Simple task → suggests Haiku
+python3 scripts/model_router.py "read the log file"
+
+# Medium task → suggests Sonnet
+python3 scripts/model_router.py "write a function to parse JSON"
+
+# Complex task → suggests Opus
+python3 scripts/model_router.py "design a microservices architecture"
+```
+
+**Patterns enforced to Haiku (NEVER Sonnet/Opus):**
+
+*Communication:*
+- Greetings: hi, hey, hello, yo
+- Thanks: thanks, thank you, thx
+- Acknowledgments: ok, sure, got it, understood
+- Short responses: yes, no, yep, nope
+- Single words or very short phrases
+
+*Background tasks:*
+- Heartbeat checks: "check email", "monitor servers"
+- Cronjobs: "scheduled task", "periodic check", "reminder"
+- Document parsing: "parse CSV", "extract data from log", "read JSON"
+- Log scanning: "scan error logs", "process logs"
+
+**Integration pattern:**
+```python
+from model_router import route_task
+
+user_prompt = "show me the config"
+routing = route_task(user_prompt)
+
+if routing["should_switch"]:
+    # Use routing["recommended_model"]
+    # Save routing["cost_savings_percent"]
+```
+
+**Customization:**
+Edit `ROUTING_RULES` or `COMMUNICATION_PATTERNS` in `scripts/model_router.py` to adjust patterns and keywords.
+
+### 3. Heartbeat Optimization
+
+Reduce API calls from heartbeat polling with smart interval tracking:
+
+**Setup:**
+```bash
+# Copy template to workspace
+cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
+
+# Plan which checks should run
+python3 scripts/heartbeat_optimizer.py plan
+```
+
+**Commands:**
+```bash
+# Check if specific type should run now
+heartbeat_optimizer.py check email
+heartbeat_optimizer.py check calendar
+
+# Record that a check was performed
+heartbeat_optimizer.py record email
+
+# Update check interval (seconds)
+heartbeat_optimizer.py interval email 7200  # 2 hours
+
+# Reset state
+heartbeat_optimizer.py reset
+```
+
+**How it works:**
+- Tracks last check time for each type (email, calendar, weather, etc.)
+- Enforces minimum intervals before re-checking
+- Respects quiet hours (23:00-08:00) — skips all checks
+- Returns `HEARTBEAT_OK` when nothing needs attention (saves tokens)
+
+**Default intervals:**
+- Email: 60 minutes
+- Calendar: 2 hours
+- Weather: 4 hours
+- Social: 2 hours
+- Monitoring: 30 minutes
+
+**Integration in HEARTBEAT.md:**
+```markdown
+## Email Check
+Run only if: `heartbeat_optimizer.py check email` → `should_check: true`
+After checking: `heartbeat_optimizer.py record email`
+```
+
+**Expected savings:** 50% reduction in heartbeat API calls.
+
+**Model enforcement:** Heartbeat should ALWAYS use Haiku — see updated `HEARTBEAT.template.md` for model override instructions.
+
+### 4. Cronjob Optimization (NEW!)
+
+**Problem:** Cronjobs often default to expensive models (Sonnet/Opus) even for routine tasks.
+
+**Solution:** Always specify Haiku for 90% of scheduled tasks.
+
+**See:** `assets/cronjob-model-guide.md` for comprehensive guide with examples.
+
+**Quick reference:**
+
+| Task Type | Model | Example |
+|-----------|-------|---------|
+| Monitoring/alerts | Haiku | Check server health, disk space |
+| Data parsing | Haiku | Extract CSV/JSON/logs |
+| Reminders | Haiku | Daily standup, backup reminders |
+| Simple reports | Haiku | Status summaries |
+| Content generation | Sonnet | Blog summaries (quality matters) |
+| Deep analysis | Sonnet | Weekly insights |
+| Complex reasoning | Never use Opus for cronjobs |
+
+**Example (good):**
+```bash
+# Parse daily logs with Haiku
+cron add --schedule "0 2 * * *" \
+  --payload '{
+    "kind":"agentTurn",
+    "message":"Parse yesterday error logs and summarize",
+    "model":"anthropic/claude-haiku-4"
+  }' \
+  --sessionTarget isolated
+```
+
+**Example (bad):**
+```bash
+# ❌ Using Opus for simple check (60x more expensive!)
+cron add --schedule "*/15 * * * *" \
+  --payload '{
+    "kind":"agentTurn",
+    "message":"Check email",
+    "model":"anthropic/claude-opus-4"
+  }' \
+  --sessionTarget isolated
+```
+
+**Savings:** Using Haiku instead of Opus for 10 daily cronjobs = **$17.70/month saved per agent**.
+
+**Integration with model_router:**
+```bash
+# Test if your cronjob should use Haiku
+model_router.py "parse daily error logs"
+# → Output: Haiku (background task pattern detected)
+```
+
+### 5. Token Budget Tracking
+
+Monitor usage and alert when approaching limits:
+
+**Setup:**
+```bash
+# Check current daily usage
+python3 scripts/token_tracker.py check
+
+# Get model suggestions
+python3 scripts/token_tracker.py suggest general
+
+# Reset daily tracking
+python3 scripts/token_tracker.py reset
+```
+
+**Output format:**
+```json
+{
+  "date": "2026-02-06",
+  "cost": 2.50,
+  "tokens": 50000,
+  "limit": 5.00,
+  "percent_used": 50,
+  "status": "ok",
+  "alert": null
+}
+```
+
+**Status levels:**
+- `ok`: Below 80% of daily limit
+- `warning`: 80-99% of daily limit
+- `exceeded`: Over daily limit
+
+**Integration pattern:**
+Before starting expensive operations, check budget:
+```python
+import json
+import subprocess
+
+result = subprocess.run(
+    ["python3", "scripts/token_tracker.py", "check"],
+    capture_output=True, text=True
+)
+budget = json.loads(result.stdout)
+
+if budget["status"] == "exceeded":
+    # Switch to cheaper model or defer non-urgent work
+    use_model = "anthropic/claude-haiku-4"
+elif budget["status"] == "warning":
+    # Use balanced model
+    use_model = "anthropic/claude-sonnet-4-5"
+```
+
+**Customization:**
+Edit `daily_limit_usd` and `warn_threshold` parameters in function calls.
+
+### 6. Multi-Provider Strategy
+
+See `references/PROVIDERS.md` for comprehensive guide on:
+- Alternative providers (OpenRouter, Together.ai, Google AI Studio)
+- Cost comparison tables
+- Routing strategies by task complexity
+- Fallback chains for rate-limited scenarios
+- API key management
+
+**Quick reference:**
+
+| Provider | Model | Cost/MTok | Use Case |
+|----------|-------|-----------|----------|
+| Anthropic | Haiku 4 | $0.25 | Simple tasks |
+| Anthropic | Sonnet 4.5 | $3.00 | Balanced default |
+| Anthropic | Opus 4 | $15.00 | Complex reasoning |
+| OpenRouter | Gemini 2.5 Flash | $0.075 | Bulk operations |
+| Google AI | Gemini 2.0 Flash Exp | FREE | Dev/testing |
+| Together | Llama 3.3 70B | $0.18 | Open alternative |
+
+## Configuration Patches
+
+See `assets/config-patches.json` for advanced optimizations:
+
+**Implemented by this skill:**
+- ✅ Heartbeat optimization (fully functional)
+- ✅ Token budget tracking (fully functional)
+- ✅ Model routing logic (fully functional)
+
+**Native OpenClaw 2026.2.15 — apply directly:**
+- ✅ Session pruning (`contextPruning: cache-ttl`) — auto-trims old tool results after Anthropic cache TTL expires
+- ✅ Bootstrap size limits (`bootstrapMaxChars` / `bootstrapTotalMaxChars`) — caps workspace file injection size
+- ✅ Cache retention long (`cacheRetention: "long"` for Opus) — amortizes cache write costs
+
+**Requires OpenClaw core support:**
+- ⏳ Prompt caching (Anthropic API feature — verify current status)
+- ⏳ Lazy context loading (use `context_optimizer.py` script today)
+- ⏳ Multi-provider fallback (partially supported)
+
+**Apply config patches:**
+```bash
+# Example: Enable multi-provider fallback
+gateway config.patch --patch '{"providers": [...]}'
+```
+
+## Native OpenClaw Diagnostics (2026.2.15+)
+
+OpenClaw 2026.2.15 added built-in commands that complement this skill's Python scripts. Use these first for quick diagnostics before reaching for the scripts.
+
+### Context breakdown
+```
+/context list    → token count per injected file (shows exactly what's eating your prompt)
+/context detail  → full breakdown including tools, skills, and system prompt sections
+```
+**Use before applying `bootstrap_size_limits`** — see which files are oversized, then set `bootstrapMaxChars` accordingly.
+
+### Per-response usage tracking
+```
+/usage tokens    → append token count to every reply
+/usage full      → append tokens + cost estimate to every reply
+/usage cost      → show cumulative cost summary from session logs
+/usage off       → disable usage footer
+```
+**Combine with `token_tracker.py`** — `/usage cost` gives session totals; `token_tracker.py` tracks daily budget.
+
+### Session status
+```
+/status          → model, context %, last response tokens, estimated cost
+```
+
+---
+
+## Cache TTL Heartbeat Alignment (NEW in v1.4.0)
+
+**The problem:** Anthropic charges ~3.75x more for cache *writes* than cache *reads*. If your agent goes idle and the 1h cache TTL expires, the next request re-writes the entire prompt cache — expensive.
+
+**The fix:** Set heartbeat interval to **55min** (just under the 1h TTL). The heartbeat keeps the cache warm, so every subsequent request pays cache-read rates instead.
+
+```bash
+# Get optimal interval for your cache TTL
+python3 scripts/heartbeat_optimizer.py cache-ttl
+# → recommended_interval: 55min (3300s)
+# → explanation: keeps 1h Anthropic cache warm
+
+# Custom TTL (e.g., if you've configured 2h cache)
+python3 scripts/heartbeat_optimizer.py cache-ttl 7200
+# → recommended_interval: 115min
+```
+
+**Apply to your OpenClaw config:**
+```json
+{
+  "agents": {
+    "defaults": {
+      "heartbeat": {
+        "every": "55m"
+      }
+    }
+  }
+}
+```
+
+**Who benefits:** Anthropic API key users only. OAuth profiles already default to 1h heartbeat (OpenClaw smart default). API key profiles default to 30min — bumping to 55min is both cheaper (fewer calls) and cache-warm.
+
+---
+
+## Deployment Patterns
+
+### For Personal Use
+1. Install optimized `HEARTBEAT.md`
+2. Run budget checks before expensive operations
+3. Manually route complex tasks to Opus only when needed
+
+**Expected savings:** 20-30%
+
+### For Managed Hosting (xCloud, etc.)
+1. Default all agents to Haiku
+2. Route user interactions to Sonnet
+3. Reserve Opus for explicitly complex requests
+4. Use Gemini Flash for background operations
+5. Implement daily budget caps per customer
+
+**Expected savings:** 40-60%
+
+### For High-Volume Deployments
+1. Use multi-provider fallback (OpenRouter + Together.ai)
+2. Implement aggressive routing (80% Gemini, 15% Haiku, 5% Sonnet)
+3. Deploy local Ollama for offline/cheap operations
+4. Batch heartbeat checks (every 2-4 hours, not 30 min)
+
+**Expected savings:** 70-90%
+
+## Integration Examples
+
+### Workflow: Smart Task Handling
+```bash
+# 1. User sends message
+user_msg="debug this error in the logs"
+
+# 2. Route to appropriate model
+routing=$(python3 scripts/model_router.py "$user_msg")
+model=$(echo $routing | jq -r .recommended_model)
+
+# 3. Check budget before proceeding
+budget=$(python3 scripts/token_tracker.py check)
+status=$(echo $budget | jq -r .status)
+
+if [ "$status" = "exceeded" ]; then
+    # Use cheapest model regardless of routing
+    model="anthropic/claude-haiku-4"
+fi
+
+# 4. Process with selected model
+# (OpenClaw handles this via config or override)
+```
+
+### Workflow: Optimized Heartbeat
+```markdown
+## HEARTBEAT.md
+
+# Plan what to check
+result=$(python3 scripts/heartbeat_optimizer.py plan)
+should_run=$(echo $result | jq -r .should_run)
+
+if [ "$should_run" = "false" ]; then
+    echo "HEARTBEAT_OK"
+    exit 0
+fi
+
+# Run only planned checks
+planned=$(echo $result | jq -r '.planned[].type')
+
+for check in $planned; do
+    case $check in
+        email) check_email ;;
+        calendar) check_calendar ;;
+    esac
+    python3 scripts/heartbeat_optimizer.py record $check
+done
+```
+
+## Troubleshooting
+
+**Issue:** Scripts fail with "module not found"
+- **Fix:** Ensure Python 3.7+ is installed. Scripts use only stdlib.
+
+**Issue:** State files not persisting
+- **Fix:** Check that `~/.openclaw/workspace/memory/` directory exists and is writable.
+
+**Issue:** Budget tracking shows $0.00
+- **Fix:** `token_tracker.py` needs integration with OpenClaw's `session_status` tool. Currently tracks manually recorded usage.
+
+**Issue:** Routing suggests wrong model tier
+- **Fix:** Customize `ROUTING_RULES` in `model_router.py` for your specific patterns.
+
+## Maintenance
+
+**Daily:**
+- Check budget status: `token_tracker.py check`
+
+**Weekly:**
+- Review routing accuracy (are suggestions correct?)
+- Adjust heartbeat intervals based on activity
+
+**Monthly:**
+- Compare costs before/after optimization
+- Review and update `PROVIDERS.md` with new options
+
+## Cost Estimation
+
+**Example: 100K tokens/day workload**
+
+Without skill:
+- 50K context tokens + 50K conversation tokens = 100K total
+- All Sonnet: 100K × $3/MTok = **$0.30/day = $9/month**
+
+| Strategy | Context | Model | Daily Cost | Monthly | Savings |
+|----------|---------|-------|-----------|---------|---------|
+| Baseline (no optimization) | 50K | Sonnet | $0.30 | $9.00 | 0% |
+| Context opt only | 10K (-80%) | Sonnet | $0.18 | $5.40 | 40% |
+| Model routing only | 50K | Mixed | $0.18 | $5.40 | 40% |
+| **Both (this skill)** | **10K** | **Mixed** | **$0.09** | **$2.70** | **70%** |
+| Aggressive + Gemini | 10K | Gemini | $0.03 | $0.90 | **90%** |
+
+**Key insight:** Context optimization (50K → 10K tokens) saves MORE than model routing!
+
+**xCloud hosting scenario** (100 customers, 50K tokens/customer/day):
+- Baseline (all Sonnet, full context): $450/month
+- With token-optimizer: $135/month
+- **Savings: $315/month per 100 customers (70%)**
+
+## Resources
+
+### Scripts (4 total)
+- **`context_optimizer.py`** — Context loading optimization and lazy loading (NEW!)
+- **`model_router.py`** — Task classification, model suggestions, and communication enforcement (ENHANCED!)
+- **`heartbeat_optimizer.py`** — Interval management and check scheduling
+- **`token_tracker.py`** — Budget monitoring and alerts
+
+### References
+- `PROVIDERS.md` — Alternative AI providers, pricing, and routing strategies
+
+### Assets (3 total)
+- **`HEARTBEAT.template.md`** — Drop-in optimized heartbeat template with Haiku enforcement (ENHANCED!)
+- **`cronjob-model-guide.md`** — Complete guide for choosing models in cronjobs (NEW!)
+- **`config-patches.json`** — Advanced configuration examples
+
+## Future Enhancements
+
+Ideas for extending this skill:
+1. **Auto-routing integration** — Hook into OpenClaw message pipeline
+2. **Real-time usage tracking** — Parse session_status automatically
+3. **Cost forecasting** — Predict monthly spend based on recent usage
+4. **Provider health monitoring** — Track API latency and failures
+5. **A/B testing** — Compare quality across different routing strategies