skills/robbyczgw-cla_web-search-plus

Fork 0

Files

zlei9 60e6461707 Initial commit with translated description

2026-03-29 13:18:55 +08:00

20 KiB

Raw Blame History

Changelog - Web Search Plus

[2.9.2] - 2026-03-27

Fixed

Replaced hardcoded temporary cache path examples with portable $TMP_DIR placeholders in TROUBLESHOOTING.md

[2.9.0] - 2026-03-12

✨ New Provider: Querit (Multilingual AI Search)

Querit.ai is a Singapore-based multilingual AI search API purpose-built for LLMs and RAG pipelines. 300 billion page index, 20+ countries, 10+ languages.

Added Querit as the 7th search provider via https://api.querit.ai/v1/search
Configure via QUERIT_API_KEY — optional, gracefully skipped if not set
Routing score: research * 0.65 + rag * 0.35 + recency * 0.45 — favored for multilingual and real-time queries
Handles Querit's quirky error_code=200 responses as success (not an error)
Handles IncompleteRead as transient/retryable failure
Live-tested with 10 benchmark queries ✅

🔧 Fixed: Fallback chain dies on unconfigured provider

sys.exit(1) in validate_api_key() raised SystemExit (inherits from BaseException), which bypassed the except Exception fallback loop and killed the entire process instead of trying the next provider
Replaced with catchable ProviderConfigError — fallback chain now continues correctly through all configured providers

🔧 Fixed: Perplexity citations are generic placeholders

Previously extracted citation URLs via regex from the answer text, resulting in generic "Source 1" / "Source 2" labels
Now uses the structured data["citations"] array from the Perplexity API response directly — results have readable titles
Regex extraction kept as fallback when API doesn't return a citations field

✨ Improved: German locale routing patterns

Added German-language signal patterns for local and news queries
Improves auto-routing for queries like "aktuelle Nachrichten", "beste Restaurants Graz", "KI Regulierung Europa"

📝 Documentation

Added Querit to README provider tables, routing examples, and API key setup section
Added querit_api_key to config.example.json
Updated SKILL.md provider mentions and env metadata
Bumped package version to 2.9.0

[2.8.6] - 2026-03-03

Changed

Documented Perplexity Sonar Pro usage and refreshed release docs.

[2.8.5] - 2026-02-20

✨ Feature: Perplexity freshness filter

Added freshness parameter to Perplexity provider (day, week, month, year)
Maps to Perplexity's native search_recency_filter parameter
Example: python3 scripts/search.py -p perplexity -q "latest AI news" --freshness day
Consistent with freshness support in Serper and Brave providers

[2.8.4] - 2026-02-20

🔒 Security Fix: SSRF protection in setup wizard

Fixed: setup.py SearXNG connection test had no SSRF protection (unlike search.py)
Before: Operator could be tricked into probing internal networks during setup
After: Same IP validation as search.py — blocks private IPs, cloud metadata, loopback
Credit: ClawHub security scanner

[2.8.3] - 2026-02-20

🐛 Critical Fix: Perplexity results empty

Fixed: Perplexity provider returned 0 results because the AI-synthesized answer wasn't mapped into the results array
Before: Only extracted URLs from the answer text were returned as results (often 0)
After: The full answer is now the primary result (title, snippet with cleaned text), extracted source URLs follow as additional results
Impact: Perplexity queries now always return at least 1 result with the synthesized answer

[2.8.0] - 2026-02-20

🆕 New Provider: Perplexity (AI-Synthesized Answers)

Added Perplexity as the 6th search provider via Kilo Gateway — the first provider that returns direct answers with citations instead of just links:

Features

AI-Synthesized Answers: Get a complete answer, not a list of links
Inline Citations: Every claim backed by [1][2][3] source references
Real-Time Web Search: Perplexity searches the web live, reads pages, and summarizes
Zero Extra Config: Works through Kilo Gateway with your existing KILOCODE_API_KEY
Model: perplexity/sonar-pro (best quality, supports complex queries)

Auto-Routing Signals

New direct-answer intent detection routes to Perplexity for:

Status queries: "status of", "current state of", "what is the status"
Local info: "events in [city]", "things to do in", "what's happening in"
Direct questions: "what is", "who is", "when did", "how many"
Current affairs: "this week", "this weekend", "right now", "today"

Usage Examples

# Auto-routed
python3 scripts/search.py -q "events in Graz Austria this weekend"  # → Perplexity
python3 scripts/search.py -q "what is the current status of Ethereum"  # → Perplexity

# Explicit
python3 scripts/search.py -p perplexity -q "latest AI regulation news"

Configuration

Requires KILOCODE_API_KEY environment variable (Kilo Gateway account). No additional API key needed — Perplexity is accessed through Kilo's unified API.

export KILOCODE_API_KEY="your-kilo-key"

🔧 Routing Rebalance

Major overhaul of the auto-routing confidence scoring to fix Serper dominance:

Problem

Serper (Google) was winning ~90% of queries due to:

High recency multiplier boosting Serper on any query with dates/years
Default provider priority placing Serper first in ties
Research and discovery signals not strong enough to override

Changes

Lowered Serper recency multiplier — date mentions no longer auto-route to Google
Strengthened research signals for Tavily:
- Added: "status of", "what happened with", "how does X compare"
- Boosted weights for comparison patterns (4.0 → 5.0)
Strengthened discovery signals for Exa:
- Added: "events in", "things to do in", "startups similar to"
- Boosted weights for local discovery patterns
Updated provider priority order: tavily → exa → perplexity → serper → you → searxng
- Serper moved from 1st to 4th in tie-breaking
- Research/discovery providers now win on ambiguous queries

Routing Test Results

Query	Before	After	✓
"latest OpenClaw version Feb 2026"	Serper	Serper	✅
"Ethereum Pectra upgrade status"	Serper	Tavily	✅
"events in Graz this weekend"	Serper	Perplexity	✅
"compare SearXNG vs Brave for AI agents"	Serper	Tavily	✅
"Sam Altman OpenAI news this week"	Serper	Serper	✅
"find startups similar to Kilo Code"	Serper	Exa	✅

📊 Updated Provider Comparison

Feature	Serper	Tavily	Exa	Perplexity	You.com	SearXNG
Speed	⚡⚡⚡	⚡⚡	⚡⚡	⚡⚡	⚡⚡⚡	⚡
Direct Answers	✗	✗	✗	✓✓	✗	✗
Citations	✗	✗	✗	✓	✗	✗
Local Events	✓	✗	✓	✓✓	✗	✓
Research	✗	✓✓	✓	✓	✓	✗
Discovery	✗	✗	✓✓	✗	✗	✗
Self-Hosted	✗	✗	✗	✗	✗	✓

[2.7.0] - 2026-02-14

✨ Added

Provider cooldown tracking in .cache/provider_health.json
Exponential cooldown on provider failures: 1m → 5m → 25m → 1h (cap)
Retry strategy for transient failures (timeout, 429, 503): up to 2 retries with backoff 1s → 3s → 9s
Smarter cache keys hashed from full request context (query/provider/max_results + locale, freshness, time_range, topic, search_engines, include_news, and related params)
Cross-provider result deduplication by normalized URL during fallback merge

🔧 Changed

Cooldown providers are skipped in routing while their cooldown is active
Provider health is reset automatically after successful requests
Fallback output now includes dedup metadata:
- deduplicated: true|false
- metadata.dedup_count

[2.6.5] - 2026-02-11

🆕 File-Based Result Caching

Added local caching to save API costs on repeated searches:

Features

Automatic Caching: Search results cached locally by default
1-Hour TTL: Results expire after 3600 seconds (configurable)
Cache Indicators: Response includes cached: true/false and cache_age_seconds
Zero-Cost Repeats: Cached requests don't hit APIs

New CLI Options

--cache-ttl SECONDS — Custom cache TTL (default: 3600)
--no-cache — Bypass cache, always fetch fresh
--clear-cache — Delete all cached results
--cache-stats — Show cache statistics (entries, size, age)

Configuration

Cache directory: .cache/ in skill directory
Environment variable: WSP_CACHE_DIR to override location
Cache key: Based on query + provider + max_results (SHA256)

Usage Examples

# First request costs API credits
python3 scripts/search.py -q "AI startups"

# Second request is FREE (uses cache)
python3 scripts/search.py -q "AI startups"

# Force fresh results
python3 scripts/search.py -q "AI startups" --no-cache

# View stats
python3 scripts/search.py --cache-stats

# Clear everything
python3 scripts/search.py --clear-cache

Technical Details

Cache files: JSON with metadata (_cache_timestamp, _cache_key, etc.)
Automatic cleanup of expired entries on access
Graceful handling of corrupted cache files

[2.6.1] - 2026-02-04

Privacy cleanup: removed hardcoded paths and personal info from docs

[2.5.0] - 2026-02-03

🆕 New Provider: SearXNG (Privacy-First Meta-Search)

Added SearXNG as the 5th search provider, focused on privacy and self-hosted search:

Features

Privacy-Preserving: No tracking, no profiling — your searches stay private
Multi-Source Aggregation: Queries 70+ upstream engines (Google, Bing, DuckDuckGo, etc.)
$0 API Cost: Self-hosted = unlimited queries with no API fees
Diverse Results: Get perspectives from multiple search engines in one query
Customizable: Choose which engines to use, set SafeSearch levels, language preferences

Auto-Routing Signals

New privacy/multi-source intent detection routes to SearXNG for:

Privacy queries: "private", "anonymous", "without tracking", "no tracking"
Multi-source: "aggregate results", "multiple sources", "diverse perspectives"
Budget/free: "free search", "no api cost", "self-hosted search"
German: "privat", "anonym", "ohne tracking", "verschiedene quellen"

Usage Examples

# Auto-routed
python3 scripts/search.py -q "search privately without tracking"  # → SearXNG

# Explicit
python3 scripts/search.py -p searxng -q "linux distros"
python3 scripts/search.py -p searxng -q "AI news" --engines "google,bing,duckduckgo"
python3 scripts/search.py -p searxng -q "privacy tools" --searxng-safesearch 2

Configuration

{
  "searxng": {
    "instance_url": "https://your-instance.example.com",
    "safesearch": 0,
    "engines": null,
    "language": "en"
  }
}

Setup

SearXNG requires a self-hosted instance with JSON format enabled:

# Docker setup (5 minutes)
docker run -d -p 8080:8080 searxng/searxng

# Enable JSON in settings.yml:
# search:
#   formats: [html, json]

# Set instance URL
export SEARXNG_INSTANCE_URL="http://localhost:8080"

See: https://docs.searxng.org/admin/installation.html

📊 Updated Provider Comparison

Feature	Serper	Tavily	Exa	You.com	SearXNG
Privacy-First	✗	✗	✗	✗	✓✓
Self-Hosted	✗	✗	✗	✗	✓
API Cost	$$	$$	$$	$	FREE
Multi-Engine	✗	✗	✗	✗	✓ (70+)

🔧 Technical Changes

Added search_searxng() function with full error handling
Added PRIVACY_SIGNALS to QueryAnalyzer for auto-routing
Updated setup wizard with SearXNG option (instance URL validation)
Updated config.example.json with searxng section
New CLI args: --searxng-url, --searxng-safesearch, --engines, --categories

[2.4.4] - 2026-02-03

📝 Documentation: Provider Count Fix

Fixed: "You can use 1, 2, or all 3" → "1, 2, 3, or all 4" (we have 4 providers now!)
Impact: Accurate documentation for setup wizard

[2.4.3] - 2026-02-03

📝 Documentation: Updated README

Added: "NEW in v2.4.2" badge for You.com in SKILL.md
Impact: ClawHub README now properly highlights You.com as new feature

[2.4.2] - 2026-02-03

🐛 Critical Fix: You.com API Configuration

Fixed: Incorrect hostname (api.ydc-index.io → ydc-index.io)
Fixed: Incorrect header name (X-API-Key → X-API-KEY uppercase)
Impact: You.com now works correctly - was giving 403 Forbidden before
Status: ✅ Fully tested and working

[2.4.1] - 2026-02-03

🐛 Bugfix: You.com URL Encoding

Fixed: URL encoding for You.com queries - spaces and special characters now properly encoded
Impact: Queries with spaces (e.g., "OpenClaw AI framework") work correctly now
Technical: Added urllib.parse.quote for parameter encoding

[2.4.0] - 2026-02-03

🆕 New Provider: You.com

Added You.com as the 4th search provider, optimized for RAG applications and real-time information:

Features

LLM-Ready Snippets: Pre-extracted, query-aware text excerpts perfect for feeding into AI models
Unified Web + News: Get both web pages and news articles in a single API call
Live Crawling: Fetch full page content on-demand in Markdown format (--livecrawl)
Automatic News Classification: Intelligently includes news results based on query intent
Freshness Controls: Filter by recency (day, week, month, year, or date range)
SafeSearch Support: Content filtering (off, moderate, strict)

Auto-Routing Signals

New RAG/Real-time intent detection routes to You.com for:

RAG context queries: "summarize", "key points", "tldr", "context for"
Real-time info: "latest news", "current status", "right now", "what's happening"
Information synthesis: "updates on", "situation", "main takeaways"

Usage Examples

# Auto-routed
python3 scripts/search.py -q "summarize key points about AI regulation"  # → You.com

# Explicit
python3 scripts/search.py -p you -q "climate change" --livecrawl all
python3 scripts/search.py -p you -q "tech news" --freshness week

Configuration

{
  "you": {
    "country": "US",
    "language": "en",
    "safesearch": "moderate",
    "include_news": true
  }
}

API Key Setup

export YOU_API_KEY="your-key"  # Get from https://api.you.com

📊 Updated Provider Comparison

Feature	Serper	Tavily	Exa	You.com
Speed	⚡⚡⚡	⚡⚡	⚡⚡	⚡⚡⚡
News Integration	✓	✗	✗	✓
RAG-Optimized	✗	✓	✗	✓✓
Full Page Content	✗	✓	✓	✓

[2.1.5] - 2026-01-27

📝 Documentation

Added warning about NOT using Tavily/Serper/Exa in core OpenClaw config
Core OpenClaw only supports brave as the built-in provider
This skill's providers must be used via environment variables and scripts, not openclaw.json

[2.1.0] - 2026-01-23

🧠 Intelligent Multi-Signal Routing

Completely overhauled auto-routing with sophisticated query analysis:

Intent Classification

Shopping Intent: Detects price patterns ("how much", "cost of"), purchase signals ("buy", "order"), deal keywords, and product+brand combinations
Research Intent: Identifies explanation patterns ("how does", "why does"), analysis signals ("pros and cons", "compare"), learning keywords, and complex multi-clause queries
Discovery Intent: Recognizes similarity patterns ("similar to", "alternatives"), company discovery signals, URL/domain detection, and academic patterns

Linguistic Pattern Detection

"How much" / "price of" → Shopping (Serper)
"How does" / "Why does" / "Explain" → Research (Tavily)
"Companies like" / "Similar to" / "Alternatives" → Discovery (Exa)
Product + Brand name combos → Shopping (Serper)
URLs and domains in query → Similar search (Exa)

Query Analysis Features

Complexity scoring: Long, multi-clause queries get routed to research providers
URL detection: Automatic detection of URLs/domains triggers Exa similar search
Brand recognition: Tech brands (Apple, Samsung, Sony, etc.) with product terms → shopping
Recency signals: "latest", "2026", "breaking" boost news mode

Confidence Scoring

HIGH (70-100%): Strong signal match, very reliable routing
MEDIUM (40-69%): Good match, should work well
LOW (0-39%): Ambiguous query, using fallback provider
Confidence based on absolute signal strength + relative margin over alternatives

Enhanced Debug Mode

python3 scripts/search.py --explain-routing -q "your query"

Now shows:

Routing decision with confidence level
All provider scores
Top matched signals with weights
Query analysis (complexity, URL detection, recency focus)
All matched patterns per provider

🔧 Technical Changes

QueryAnalyzer Class

New QueryAnalyzer class with:

SHOPPING_SIGNALS: 25+ weighted patterns for shopping intent
RESEARCH_SIGNALS: 30+ weighted patterns for research intent
DISCOVERY_SIGNALS: 20+ weighted patterns for discovery intent
LOCAL_NEWS_SIGNALS: 25+ patterns for local/news queries
BRAND_PATTERNS: Tech brand detection regex

Signal Weighting

Multi-word phrases get higher weights (e.g., "how much" = 4.0 vs "price" = 3.0)
Strong signals: price patterns (4.0), similarity patterns (5.0), URLs (5.0)
Medium signals: product terms (2.5), learning keywords (2.5)
Bonus scoring: Product+brand combo (+3.0), complex query (+2.5)

Improved Output Format

{
  "routing": {
    "auto_routed": true,
    "provider": "serper",
    "confidence": 0.78,
    "confidence_level": "high",
    "reason": "high_confidence_match",
    "top_signals": [{"matched": "price", "weight": 3.0}],
    "scores": {"serper": 7.0, "tavily": 0.0, "exa": 0.0}
  }
}

📚 Documentation Updates

SKILL.md: Complete rewrite with signal tables and confidence scoring guide
README.md: Updated with intelligent routing examples and confidence levels
FAQ: Updated to explain multi-signal analysis

🧪 Test Results

Query	Provider	Confidence	Signals
"how much does iPhone 16 cost"	Serper	68%	"how much", brand+product
"how does quantum entanglement work"	Tavily	86% HIGH	"how does", "what are", "implications"
"startups similar to Notion"	Exa	76% HIGH	"similar to", "Series A"
"companies like stripe.com"	Exa	100% HIGH	URL detected, "companies like"
"MacBook Pro M3 specs review"	Serper	70% HIGH	brand+product, "specs", "review"
"Tesla"	Serper	0% LOW	No signals (fallback)
"arxiv papers on transformers"	Exa	58%	"arxiv"
"latest AI news 2026"	Serper	77% HIGH	"latest", "news", "2026"

[2.0.0] - 2026-01-23

🎉 Major Features

Smart Auto-Routing

Automatic provider selection based on query analysis
No need to manually choose provider - just search!
Intelligent keyword matching for routing decisions
Pattern detection for query types (shopping, research, discovery)
Scoring system for provider selection

User Configuration

config.json: Full control over auto-routing behavior
Configurable keyword mappings: Add your own routing keywords
Provider priority: Set tie-breaker order
Disable providers: Turn off providers you don't have API keys for
Enable/disable auto-routing: Opt-in or opt-out as needed

Debugging Tools

--explain-routing flag: See exactly why a provider was selected
Detailed routing metadata in JSON responses
Shows matched keywords and routing scores

📚 Documentation

README.md: Complete auto-routing guide with examples
SKILL.md: Detailed routing logic and configuration reference
FAQ section: Common questions about auto-routing
Configuration examples: Pre-built configs for common use cases

[1.0.x] - Initial Release

Multi-provider search: Serper, Tavily, Exa
Manual provider selection with -p flag
Unified JSON output format
Provider-specific options (--depth, --category, --similar-url, etc.)
Domain filtering for Tavily/Exa
Date filtering for Exa

20 KiB Raw Blame History

Changelog - Web Search Plus

[2.9.2] - 2026-03-27

Fixed

[2.9.0] - 2026-03-12

✨ New Provider: Querit (Multilingual AI Search)

🔧 Fixed: Fallback chain dies on unconfigured provider

🔧 Fixed: Perplexity citations are generic placeholders

✨ Improved: German locale routing patterns

📝 Documentation

[2.8.6] - 2026-03-03

Changed

[2.8.5] - 2026-02-20

✨ Feature: Perplexity freshness filter

[2.8.4] - 2026-02-20

🔒 Security Fix: SSRF protection in setup wizard

[2.8.3] - 2026-02-20

🐛 Critical Fix: Perplexity results empty

[2.8.0] - 2026-02-20

🆕 New Provider: Perplexity (AI-Synthesized Answers)

Features

Auto-Routing Signals

Usage Examples

Configuration

🔧 Routing Rebalance

Problem

Changes

Routing Test Results

📊 Updated Provider Comparison

[2.7.0] - 2026-02-14

✨ Added

🔧 Changed

[2.6.5] - 2026-02-11

🆕 File-Based Result Caching

Features

New CLI Options

Configuration

Usage Examples

Technical Details

[2.6.1] - 2026-02-04

[2.5.0] - 2026-02-03

🆕 New Provider: SearXNG (Privacy-First Meta-Search)

Features

Auto-Routing Signals

Usage Examples

Configuration

Setup

📊 Updated Provider Comparison

🔧 Technical Changes

[2.4.4] - 2026-02-03

📝 Documentation: Provider Count Fix

[2.4.3] - 2026-02-03

📝 Documentation: Updated README

[2.4.2] - 2026-02-03

🐛 Critical Fix: You.com API Configuration

[2.4.1] - 2026-02-03

🐛 Bugfix: You.com URL Encoding

[2.4.0] - 2026-02-03

🆕 New Provider: You.com

Features

Auto-Routing Signals

Usage Examples

Configuration

API Key Setup

📊 Updated Provider Comparison

[2.1.5] - 2026-01-27

📝 Documentation

[2.1.0] - 2026-01-23

🧠 Intelligent Multi-Signal Routing

Intent Classification

Linguistic Pattern Detection

Query Analysis Features

Confidence Scoring

Enhanced Debug Mode

🔧 Technical Changes

QueryAnalyzer Class

Signal Weighting

Improved Output Format

📚 Documentation Updates

🧪 Test Results

20 KiB

Raw Blame History