652 lines
20 KiB
Markdown
652 lines
20 KiB
Markdown
---
|
||
name: token-optimizer
|
||
description: "通过智能模型路由、心跳优化、预算跟踪和原生2026.2.15功能(会话修剪、引导大小限制、缓存TTL对齐)降低OpenClaw令牌使用和API成本。在令牌成本高、遇到API速率限制或大规模托管多个代理时使用。4个可执行脚本(context_optimizer、model_router、heartbeat_optimizer、token_tracker)仅为本地使用——无网络请求、无子进程调用、无系统修改。"
|
||
version: 3.0.0
|
||
homepage: https://github.com/Asif2BD/OpenClaw-Token-Optimizer
|
||
source: https://github.com/Asif2BD/OpenClaw-Token-Optimizer
|
||
author: Asif2BD
|
||
security:
|
||
verified: true
|
||
auditor: Oracle (Matrix Zion)
|
||
audit_date: 2026-02-18
|
||
scripts_no_network: true
|
||
scripts_no_code_execution: true
|
||
scripts_no_subprocess: true
|
||
scripts_data_local_only: true
|
||
reference_files_describe_external_services: true
|
||
optimize_sh_is_convenience_wrapper: true
|
||
optimize_sh_only_calls_bundled_python_scripts: true
|
||
---
|
||
|
||
# Token Optimizer
|
||
|
||
Comprehensive toolkit for reducing token usage and API costs in OpenClaw deployments. Combines smart model routing, optimized heartbeat intervals, usage tracking, and multi-provider strategies.
|
||
|
||
## Quick Start
|
||
|
||
**Immediate actions** (no config changes needed):
|
||
|
||
1. **Generate optimized AGENTS.md (BIGGEST WIN!):**
|
||
```bash
|
||
python3 scripts/context_optimizer.py generate-agents
|
||
# Creates AGENTS.md.optimized — review and replace your current AGENTS.md
|
||
```
|
||
|
||
2. **Check what context you ACTUALLY need:**
|
||
```bash
|
||
python3 scripts/context_optimizer.py recommend "hi, how are you?"
|
||
# Shows: Only 2 files needed (not 50+!)
|
||
```
|
||
|
||
3. **Install optimized heartbeat:**
|
||
```bash
|
||
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
|
||
```
|
||
|
||
4. **Enforce cheaper models for casual chat:**
|
||
```bash
|
||
python3 scripts/model_router.py "thanks!"
|
||
# Single-provider Anthropic setup: Use Sonnet, not Opus
|
||
# Multi-provider setup (OpenRouter/Together): Use Haiku for max savings
|
||
```
|
||
|
||
5. **Check current token budget:**
|
||
```bash
|
||
python3 scripts/token_tracker.py check
|
||
```
|
||
|
||
**Expected savings:** 50-80% reduction in token costs for typical workloads (context optimization is the biggest factor!).
|
||
|
||
## Core Capabilities
|
||
|
||
### 0. Lazy Skill Loading (NEW in v3.0 — BIGGEST WIN!)
|
||
|
||
**The single highest-impact optimization available.** Most agents burn 3,000–15,000 tokens per session loading skill files they never use. Stop that first.
|
||
|
||
**The pattern:**
|
||
|
||
1. Create a lightweight `SKILLS.md` catalog in your workspace (~300 tokens — list of skills + when to load them)
|
||
2. Only load individual SKILL.md files when a task actually needs them
|
||
3. Apply the same logic to memory files — load MEMORY.md at startup, daily logs only on demand
|
||
|
||
**Token savings:**
|
||
|
||
| Library size | Before (eager) | After (lazy) | Savings |
|
||
|---|---|---|---|
|
||
| 5 skills | ~3,000 tokens | ~600 tokens | **80%** |
|
||
| 10 skills | ~6,500 tokens | ~750 tokens | **88%** |
|
||
| 20 skills | ~13,000 tokens | ~900 tokens | **93%** |
|
||
|
||
**Quick implementation in AGENTS.md:**
|
||
|
||
```markdown
|
||
## Skills
|
||
|
||
At session start: Read SKILLS.md (the index only — ~300 tokens).
|
||
Load individual skill files ONLY when a task requires them.
|
||
Never load all skills upfront.
|
||
```
|
||
|
||
**Full implementation (with catalog template + optimizer script):**
|
||
|
||
```bash
|
||
clawhub install openclaw-skill-lazy-loader
|
||
```
|
||
|
||
The companion skill `openclaw-skill-lazy-loader` includes a `SKILLS.md.template`, an `AGENTS.md.template` lazy-loading section, and a `context_optimizer.py` CLI that recommends exactly which skills to load for any given task.
|
||
|
||
**Lazy loading handles context loading costs. The remaining capabilities below handle runtime costs.** Together they cover the full token lifecycle.
|
||
|
||
---
|
||
|
||
### 1. Context Optimization (NEW!)
|
||
|
||
**Biggest token saver** — Only load files you actually need, not everything upfront.
|
||
|
||
**Problem:** Default OpenClaw loads ALL context files every session:
|
||
- SOUL.md, AGENTS.md, USER.md, TOOLS.md, MEMORY.md
|
||
- docs/**/*.md (hundreds of files)
|
||
- memory/2026-*.md (daily logs)
|
||
- Total: Often 50K+ tokens before user even speaks!
|
||
|
||
**Solution:** Lazy loading based on prompt complexity.
|
||
|
||
**Usage:**
|
||
```bash
|
||
python3 scripts/context_optimizer.py recommend "<user prompt>"
|
||
```
|
||
|
||
**Examples:**
|
||
```bash
|
||
# Simple greeting → minimal context (2 files only!)
|
||
context_optimizer.py recommend "hi"
|
||
→ Load: SOUL.md, IDENTITY.md
|
||
→ Skip: Everything else
|
||
→ Savings: ~80% of context
|
||
|
||
# Standard work → selective loading
|
||
context_optimizer.py recommend "write a function"
|
||
→ Load: SOUL.md, IDENTITY.md, memory/TODAY.md
|
||
→ Skip: docs, old memory, knowledge base
|
||
→ Savings: ~50% of context
|
||
|
||
# Complex task → full context
|
||
context_optimizer.py recommend "analyze our entire architecture"
|
||
→ Load: SOUL.md, IDENTITY.md, MEMORY.md, memory/TODAY+YESTERDAY.md
|
||
→ Conditionally load: Relevant docs only
|
||
→ Savings: ~30% of context
|
||
```
|
||
|
||
**Output format:**
|
||
```json
|
||
{
|
||
"complexity": "simple",
|
||
"context_level": "minimal",
|
||
"recommended_files": ["SOUL.md", "IDENTITY.md"],
|
||
"file_count": 2,
|
||
"savings_percent": 80,
|
||
"skip_patterns": ["docs/**/*.md", "memory/20*.md"]
|
||
}
|
||
```
|
||
|
||
**Integration pattern:**
|
||
Before loading context for a new session:
|
||
```python
|
||
from context_optimizer import recommend_context_bundle
|
||
|
||
user_prompt = "thanks for your help"
|
||
recommendation = recommend_context_bundle(user_prompt)
|
||
|
||
if recommendation["context_level"] == "minimal":
|
||
# Load only SOUL.md + IDENTITY.md
|
||
# Skip everything else
|
||
# Save ~80% tokens!
|
||
```
|
||
|
||
**Generate optimized AGENTS.md:**
|
||
```bash
|
||
context_optimizer.py generate-agents
|
||
# Creates AGENTS.md.optimized with lazy loading instructions
|
||
# Review and replace your current AGENTS.md
|
||
```
|
||
|
||
**Expected savings:** 50-80% reduction in context tokens.
|
||
|
||
### 2. Smart Model Routing (ENHANCED!)
|
||
|
||
Automatically classify tasks and route to appropriate model tiers.
|
||
|
||
**NEW: Communication pattern enforcement** — Never waste Opus tokens on "hi" or "thanks"!
|
||
|
||
**Usage:**
|
||
```bash
|
||
python3 scripts/model_router.py "<user prompt>" [current_model] [force_tier]
|
||
```
|
||
|
||
**Examples:**
|
||
```bash
|
||
# Communication (NEW!) → ALWAYS Haiku
|
||
python3 scripts/model_router.py "thanks!"
|
||
python3 scripts/model_router.py "hi"
|
||
python3 scripts/model_router.py "ok got it"
|
||
→ Enforced: Haiku (NEVER Sonnet/Opus for casual chat)
|
||
|
||
# Simple task → suggests Haiku
|
||
python3 scripts/model_router.py "read the log file"
|
||
|
||
# Medium task → suggests Sonnet
|
||
python3 scripts/model_router.py "write a function to parse JSON"
|
||
|
||
# Complex task → suggests Opus
|
||
python3 scripts/model_router.py "design a microservices architecture"
|
||
```
|
||
|
||
**Patterns enforced to Haiku (NEVER Sonnet/Opus):**
|
||
|
||
*Communication:*
|
||
- Greetings: hi, hey, hello, yo
|
||
- Thanks: thanks, thank you, thx
|
||
- Acknowledgments: ok, sure, got it, understood
|
||
- Short responses: yes, no, yep, nope
|
||
- Single words or very short phrases
|
||
|
||
*Background tasks:*
|
||
- Heartbeat checks: "check email", "monitor servers"
|
||
- Cronjobs: "scheduled task", "periodic check", "reminder"
|
||
- Document parsing: "parse CSV", "extract data from log", "read JSON"
|
||
- Log scanning: "scan error logs", "process logs"
|
||
|
||
**Integration pattern:**
|
||
```python
|
||
from model_router import route_task
|
||
|
||
user_prompt = "show me the config"
|
||
routing = route_task(user_prompt)
|
||
|
||
if routing["should_switch"]:
|
||
# Use routing["recommended_model"]
|
||
# Save routing["cost_savings_percent"]
|
||
```
|
||
|
||
**Customization:**
|
||
Edit `ROUTING_RULES` or `COMMUNICATION_PATTERNS` in `scripts/model_router.py` to adjust patterns and keywords.
|
||
|
||
### 3. Heartbeat Optimization
|
||
|
||
Reduce API calls from heartbeat polling with smart interval tracking:
|
||
|
||
**Setup:**
|
||
```bash
|
||
# Copy template to workspace
|
||
cp assets/HEARTBEAT.template.md ~/.openclaw/workspace/HEARTBEAT.md
|
||
|
||
# Plan which checks should run
|
||
python3 scripts/heartbeat_optimizer.py plan
|
||
```
|
||
|
||
**Commands:**
|
||
```bash
|
||
# Check if specific type should run now
|
||
heartbeat_optimizer.py check email
|
||
heartbeat_optimizer.py check calendar
|
||
|
||
# Record that a check was performed
|
||
heartbeat_optimizer.py record email
|
||
|
||
# Update check interval (seconds)
|
||
heartbeat_optimizer.py interval email 7200 # 2 hours
|
||
|
||
# Reset state
|
||
heartbeat_optimizer.py reset
|
||
```
|
||
|
||
**How it works:**
|
||
- Tracks last check time for each type (email, calendar, weather, etc.)
|
||
- Enforces minimum intervals before re-checking
|
||
- Respects quiet hours (23:00-08:00) — skips all checks
|
||
- Returns `HEARTBEAT_OK` when nothing needs attention (saves tokens)
|
||
|
||
**Default intervals:**
|
||
- Email: 60 minutes
|
||
- Calendar: 2 hours
|
||
- Weather: 4 hours
|
||
- Social: 2 hours
|
||
- Monitoring: 30 minutes
|
||
|
||
**Integration in HEARTBEAT.md:**
|
||
```markdown
|
||
## Email Check
|
||
Run only if: `heartbeat_optimizer.py check email` → `should_check: true`
|
||
After checking: `heartbeat_optimizer.py record email`
|
||
```
|
||
|
||
**Expected savings:** 50% reduction in heartbeat API calls.
|
||
|
||
**Model enforcement:** Heartbeat should ALWAYS use Haiku — see updated `HEARTBEAT.template.md` for model override instructions.
|
||
|
||
### 4. Cronjob Optimization (NEW!)
|
||
|
||
**Problem:** Cronjobs often default to expensive models (Sonnet/Opus) even for routine tasks.
|
||
|
||
**Solution:** Always specify Haiku for 90% of scheduled tasks.
|
||
|
||
**See:** `assets/cronjob-model-guide.md` for comprehensive guide with examples.
|
||
|
||
**Quick reference:**
|
||
|
||
| Task Type | Model | Example |
|
||
|-----------|-------|---------|
|
||
| Monitoring/alerts | Haiku | Check server health, disk space |
|
||
| Data parsing | Haiku | Extract CSV/JSON/logs |
|
||
| Reminders | Haiku | Daily standup, backup reminders |
|
||
| Simple reports | Haiku | Status summaries |
|
||
| Content generation | Sonnet | Blog summaries (quality matters) |
|
||
| Deep analysis | Sonnet | Weekly insights |
|
||
| Complex reasoning | Never use Opus for cronjobs |
|
||
|
||
**Example (good):**
|
||
```bash
|
||
# Parse daily logs with Haiku
|
||
cron add --schedule "0 2 * * *" \
|
||
--payload '{
|
||
"kind":"agentTurn",
|
||
"message":"Parse yesterday error logs and summarize",
|
||
"model":"anthropic/claude-haiku-4"
|
||
}' \
|
||
--sessionTarget isolated
|
||
```
|
||
|
||
**Example (bad):**
|
||
```bash
|
||
# ❌ Using Opus for simple check (60x more expensive!)
|
||
cron add --schedule "*/15 * * * *" \
|
||
--payload '{
|
||
"kind":"agentTurn",
|
||
"message":"Check email",
|
||
"model":"anthropic/claude-opus-4"
|
||
}' \
|
||
--sessionTarget isolated
|
||
```
|
||
|
||
**Savings:** Using Haiku instead of Opus for 10 daily cronjobs = **$17.70/month saved per agent**.
|
||
|
||
**Integration with model_router:**
|
||
```bash
|
||
# Test if your cronjob should use Haiku
|
||
model_router.py "parse daily error logs"
|
||
# → Output: Haiku (background task pattern detected)
|
||
```
|
||
|
||
### 5. Token Budget Tracking
|
||
|
||
Monitor usage and alert when approaching limits:
|
||
|
||
**Setup:**
|
||
```bash
|
||
# Check current daily usage
|
||
python3 scripts/token_tracker.py check
|
||
|
||
# Get model suggestions
|
||
python3 scripts/token_tracker.py suggest general
|
||
|
||
# Reset daily tracking
|
||
python3 scripts/token_tracker.py reset
|
||
```
|
||
|
||
**Output format:**
|
||
```json
|
||
{
|
||
"date": "2026-02-06",
|
||
"cost": 2.50,
|
||
"tokens": 50000,
|
||
"limit": 5.00,
|
||
"percent_used": 50,
|
||
"status": "ok",
|
||
"alert": null
|
||
}
|
||
```
|
||
|
||
**Status levels:**
|
||
- `ok`: Below 80% of daily limit
|
||
- `warning`: 80-99% of daily limit
|
||
- `exceeded`: Over daily limit
|
||
|
||
**Integration pattern:**
|
||
Before starting expensive operations, check budget:
|
||
```python
|
||
import json
|
||
import subprocess
|
||
|
||
result = subprocess.run(
|
||
["python3", "scripts/token_tracker.py", "check"],
|
||
capture_output=True, text=True
|
||
)
|
||
budget = json.loads(result.stdout)
|
||
|
||
if budget["status"] == "exceeded":
|
||
# Switch to cheaper model or defer non-urgent work
|
||
use_model = "anthropic/claude-haiku-4"
|
||
elif budget["status"] == "warning":
|
||
# Use balanced model
|
||
use_model = "anthropic/claude-sonnet-4-5"
|
||
```
|
||
|
||
**Customization:**
|
||
Edit `daily_limit_usd` and `warn_threshold` parameters in function calls.
|
||
|
||
### 6. Multi-Provider Strategy
|
||
|
||
See `references/PROVIDERS.md` for comprehensive guide on:
|
||
- Alternative providers (OpenRouter, Together.ai, Google AI Studio)
|
||
- Cost comparison tables
|
||
- Routing strategies by task complexity
|
||
- Fallback chains for rate-limited scenarios
|
||
- API key management
|
||
|
||
**Quick reference:**
|
||
|
||
| Provider | Model | Cost/MTok | Use Case |
|
||
|----------|-------|-----------|----------|
|
||
| Anthropic | Haiku 4 | $0.25 | Simple tasks |
|
||
| Anthropic | Sonnet 4.5 | $3.00 | Balanced default |
|
||
| Anthropic | Opus 4 | $15.00 | Complex reasoning |
|
||
| OpenRouter | Gemini 2.5 Flash | $0.075 | Bulk operations |
|
||
| Google AI | Gemini 2.0 Flash Exp | FREE | Dev/testing |
|
||
| Together | Llama 3.3 70B | $0.18 | Open alternative |
|
||
|
||
## Configuration Patches
|
||
|
||
See `assets/config-patches.json` for advanced optimizations:
|
||
|
||
**Implemented by this skill:**
|
||
- ✅ Heartbeat optimization (fully functional)
|
||
- ✅ Token budget tracking (fully functional)
|
||
- ✅ Model routing logic (fully functional)
|
||
|
||
**Native OpenClaw 2026.2.15 — apply directly:**
|
||
- ✅ Session pruning (`contextPruning: cache-ttl`) — auto-trims old tool results after Anthropic cache TTL expires
|
||
- ✅ Bootstrap size limits (`bootstrapMaxChars` / `bootstrapTotalMaxChars`) — caps workspace file injection size
|
||
- ✅ Cache retention long (`cacheRetention: "long"` for Opus) — amortizes cache write costs
|
||
|
||
**Requires OpenClaw core support:**
|
||
- ⏳ Prompt caching (Anthropic API feature — verify current status)
|
||
- ⏳ Lazy context loading (use `context_optimizer.py` script today)
|
||
- ⏳ Multi-provider fallback (partially supported)
|
||
|
||
**Apply config patches:**
|
||
```bash
|
||
# Example: Enable multi-provider fallback
|
||
gateway config.patch --patch '{"providers": [...]}'
|
||
```
|
||
|
||
## Native OpenClaw Diagnostics (2026.2.15+)
|
||
|
||
OpenClaw 2026.2.15 added built-in commands that complement this skill's Python scripts. Use these first for quick diagnostics before reaching for the scripts.
|
||
|
||
### Context breakdown
|
||
```
|
||
/context list → token count per injected file (shows exactly what's eating your prompt)
|
||
/context detail → full breakdown including tools, skills, and system prompt sections
|
||
```
|
||
**Use before applying `bootstrap_size_limits`** — see which files are oversized, then set `bootstrapMaxChars` accordingly.
|
||
|
||
### Per-response usage tracking
|
||
```
|
||
/usage tokens → append token count to every reply
|
||
/usage full → append tokens + cost estimate to every reply
|
||
/usage cost → show cumulative cost summary from session logs
|
||
/usage off → disable usage footer
|
||
```
|
||
**Combine with `token_tracker.py`** — `/usage cost` gives session totals; `token_tracker.py` tracks daily budget.
|
||
|
||
### Session status
|
||
```
|
||
/status → model, context %, last response tokens, estimated cost
|
||
```
|
||
|
||
---
|
||
|
||
## Cache TTL Heartbeat Alignment (NEW in v1.4.0)
|
||
|
||
**The problem:** Anthropic charges ~3.75x more for cache *writes* than cache *reads*. If your agent goes idle and the 1h cache TTL expires, the next request re-writes the entire prompt cache — expensive.
|
||
|
||
**The fix:** Set heartbeat interval to **55min** (just under the 1h TTL). The heartbeat keeps the cache warm, so every subsequent request pays cache-read rates instead.
|
||
|
||
```bash
|
||
# Get optimal interval for your cache TTL
|
||
python3 scripts/heartbeat_optimizer.py cache-ttl
|
||
# → recommended_interval: 55min (3300s)
|
||
# → explanation: keeps 1h Anthropic cache warm
|
||
|
||
# Custom TTL (e.g., if you've configured 2h cache)
|
||
python3 scripts/heartbeat_optimizer.py cache-ttl 7200
|
||
# → recommended_interval: 115min
|
||
```
|
||
|
||
**Apply to your OpenClaw config:**
|
||
```json
|
||
{
|
||
"agents": {
|
||
"defaults": {
|
||
"heartbeat": {
|
||
"every": "55m"
|
||
}
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
**Who benefits:** Anthropic API key users only. OAuth profiles already default to 1h heartbeat (OpenClaw smart default). API key profiles default to 30min — bumping to 55min is both cheaper (fewer calls) and cache-warm.
|
||
|
||
---
|
||
|
||
## Deployment Patterns
|
||
|
||
### For Personal Use
|
||
1. Install optimized `HEARTBEAT.md`
|
||
2. Run budget checks before expensive operations
|
||
3. Manually route complex tasks to Opus only when needed
|
||
|
||
**Expected savings:** 20-30%
|
||
|
||
### For Managed Hosting (xCloud, etc.)
|
||
1. Default all agents to Haiku
|
||
2. Route user interactions to Sonnet
|
||
3. Reserve Opus for explicitly complex requests
|
||
4. Use Gemini Flash for background operations
|
||
5. Implement daily budget caps per customer
|
||
|
||
**Expected savings:** 40-60%
|
||
|
||
### For High-Volume Deployments
|
||
1. Use multi-provider fallback (OpenRouter + Together.ai)
|
||
2. Implement aggressive routing (80% Gemini, 15% Haiku, 5% Sonnet)
|
||
3. Deploy local Ollama for offline/cheap operations
|
||
4. Batch heartbeat checks (every 2-4 hours, not 30 min)
|
||
|
||
**Expected savings:** 70-90%
|
||
|
||
## Integration Examples
|
||
|
||
### Workflow: Smart Task Handling
|
||
```bash
|
||
# 1. User sends message
|
||
user_msg="debug this error in the logs"
|
||
|
||
# 2. Route to appropriate model
|
||
routing=$(python3 scripts/model_router.py "$user_msg")
|
||
model=$(echo $routing | jq -r .recommended_model)
|
||
|
||
# 3. Check budget before proceeding
|
||
budget=$(python3 scripts/token_tracker.py check)
|
||
status=$(echo $budget | jq -r .status)
|
||
|
||
if [ "$status" = "exceeded" ]; then
|
||
# Use cheapest model regardless of routing
|
||
model="anthropic/claude-haiku-4"
|
||
fi
|
||
|
||
# 4. Process with selected model
|
||
# (OpenClaw handles this via config or override)
|
||
```
|
||
|
||
### Workflow: Optimized Heartbeat
|
||
```markdown
|
||
## HEARTBEAT.md
|
||
|
||
# Plan what to check
|
||
result=$(python3 scripts/heartbeat_optimizer.py plan)
|
||
should_run=$(echo $result | jq -r .should_run)
|
||
|
||
if [ "$should_run" = "false" ]; then
|
||
echo "HEARTBEAT_OK"
|
||
exit 0
|
||
fi
|
||
|
||
# Run only planned checks
|
||
planned=$(echo $result | jq -r '.planned[].type')
|
||
|
||
for check in $planned; do
|
||
case $check in
|
||
email) check_email ;;
|
||
calendar) check_calendar ;;
|
||
esac
|
||
python3 scripts/heartbeat_optimizer.py record $check
|
||
done
|
||
```
|
||
|
||
## Troubleshooting
|
||
|
||
**Issue:** Scripts fail with "module not found"
|
||
- **Fix:** Ensure Python 3.7+ is installed. Scripts use only stdlib.
|
||
|
||
**Issue:** State files not persisting
|
||
- **Fix:** Check that `~/.openclaw/workspace/memory/` directory exists and is writable.
|
||
|
||
**Issue:** Budget tracking shows $0.00
|
||
- **Fix:** `token_tracker.py` needs integration with OpenClaw's `session_status` tool. Currently tracks manually recorded usage.
|
||
|
||
**Issue:** Routing suggests wrong model tier
|
||
- **Fix:** Customize `ROUTING_RULES` in `model_router.py` for your specific patterns.
|
||
|
||
## Maintenance
|
||
|
||
**Daily:**
|
||
- Check budget status: `token_tracker.py check`
|
||
|
||
**Weekly:**
|
||
- Review routing accuracy (are suggestions correct?)
|
||
- Adjust heartbeat intervals based on activity
|
||
|
||
**Monthly:**
|
||
- Compare costs before/after optimization
|
||
- Review and update `PROVIDERS.md` with new options
|
||
|
||
## Cost Estimation
|
||
|
||
**Example: 100K tokens/day workload**
|
||
|
||
Without skill:
|
||
- 50K context tokens + 50K conversation tokens = 100K total
|
||
- All Sonnet: 100K × $3/MTok = **$0.30/day = $9/month**
|
||
|
||
| Strategy | Context | Model | Daily Cost | Monthly | Savings |
|
||
|----------|---------|-------|-----------|---------|---------|
|
||
| Baseline (no optimization) | 50K | Sonnet | $0.30 | $9.00 | 0% |
|
||
| Context opt only | 10K (-80%) | Sonnet | $0.18 | $5.40 | 40% |
|
||
| Model routing only | 50K | Mixed | $0.18 | $5.40 | 40% |
|
||
| **Both (this skill)** | **10K** | **Mixed** | **$0.09** | **$2.70** | **70%** |
|
||
| Aggressive + Gemini | 10K | Gemini | $0.03 | $0.90 | **90%** |
|
||
|
||
**Key insight:** Context optimization (50K → 10K tokens) saves MORE than model routing!
|
||
|
||
**xCloud hosting scenario** (100 customers, 50K tokens/customer/day):
|
||
- Baseline (all Sonnet, full context): $450/month
|
||
- With token-optimizer: $135/month
|
||
- **Savings: $315/month per 100 customers (70%)**
|
||
|
||
## Resources
|
||
|
||
### Scripts (4 total)
|
||
- **`context_optimizer.py`** — Context loading optimization and lazy loading (NEW!)
|
||
- **`model_router.py`** — Task classification, model suggestions, and communication enforcement (ENHANCED!)
|
||
- **`heartbeat_optimizer.py`** — Interval management and check scheduling
|
||
- **`token_tracker.py`** — Budget monitoring and alerts
|
||
|
||
### References
|
||
- `PROVIDERS.md` — Alternative AI providers, pricing, and routing strategies
|
||
|
||
### Assets (3 total)
|
||
- **`HEARTBEAT.template.md`** — Drop-in optimized heartbeat template with Haiku enforcement (ENHANCED!)
|
||
- **`cronjob-model-guide.md`** — Complete guide for choosing models in cronjobs (NEW!)
|
||
- **`config-patches.json`** — Advanced configuration examples
|
||
|
||
## Future Enhancements
|
||
|
||
Ideas for extending this skill:
|
||
1. **Auto-routing integration** — Hook into OpenClaw message pipeline
|
||
2. **Real-time usage tracking** — Parse session_status automatically
|
||
3. **Cost forecasting** — Predict monthly spend based on recent usage
|
||
4. **Provider health monitoring** — Track API latency and failures
|
||
5. **A/B testing** — Compare quality across different routing strategies
|