Initial commit with translated description

2026-03-29 13:18:55 +08:00
commit 60e6461707
11 changed files with 5944 additions and 0 deletions

`CHANGELOG.md` (new file, 536 lines)
# Changelog - Web Search Plus
## [2.9.2] - 2026-03-27
### Fixed
- Replaced hardcoded temporary cache path examples with portable `$TMP_DIR` placeholders in `TROUBLESHOOTING.md`
## [2.9.0] - 2026-03-12
### ✨ New Provider: Querit (Multilingual AI Search)
[Querit.ai](https://querit.ai) is a Singapore-based multilingual AI search API purpose-built for LLMs and RAG pipelines. 300 billion page index, 20+ countries, 10+ languages.
- Added **Querit** as the 7th search provider via `https://api.querit.ai/v1/search`
- Configure via `QUERIT_API_KEY` — optional, gracefully skipped if not set
- Routing score: `research * 0.65 + rag * 0.35 + recency * 0.45` — favored for multilingual and real-time queries
- Handles Querit's quirky `error_code=200` responses as success (not an error)
- Handles `IncompleteRead` as transient/retryable failure
- Live-tested with 10 benchmark queries ✅
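The routing score above is a weighted blend of intent signals. A minimal sketch, assuming normalized signal values in the 0–1 range (the signal extraction itself lives in `scripts/search.py` and is not shown; the function name is illustrative):

```python
# Hypothetical sketch of the Querit routing score described above.
# Only the weights (0.65 / 0.35 / 0.45) come from the changelog entry.

def querit_score(research: float, rag: float, recency: float) -> float:
    """Combine intent signals into a single routing score."""
    return research * 0.65 + rag * 0.35 + recency * 0.45

# A query with strong research and recency signals outscores a
# pure-RAG query, matching the multilingual/real-time bias.
```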
### 🔧 Fixed: Fallback chain dies on unconfigured provider
- `sys.exit(1)` in `validate_api_key()` raised `SystemExit` (inherits from `BaseException`), which bypassed the `except Exception` fallback loop and killed the entire process instead of trying the next provider
- Replaced with catchable `ProviderConfigError` — fallback chain now continues correctly through all configured providers
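The distinction matters because `SystemExit` inherits from `BaseException`, not `Exception`, so a bare `except Exception` never sees it. A minimal reproduction of the bug and the fix, with illustrative names (`search_with_fallback` is not the real function):

```python
class ProviderConfigError(Exception):
    """Raised instead of sys.exit(1) so the fallback loop can catch it."""

def validate_api_key(key):
    if not key:
        # Before: sys.exit(1) raised SystemExit, which subclasses
        # BaseException and sails straight past `except Exception`.
        raise ProviderConfigError("provider not configured")
    return key

def search_with_fallback(providers):
    for name, key in providers:
        try:
            validate_api_key(key)
            return name  # pretend the search succeeded
        except Exception:
            continue  # now also covers unconfigured providers
    return None

# With sys.exit(1), the process would have died at the first
# unconfigured provider; now "tavily" gets its turn.
```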
### 🔧 Fixed: Perplexity citations are generic placeholders
- Previously extracted citation URLs via regex from the answer text, resulting in generic "Source 1" / "Source 2" labels
- Now uses the structured `data["citations"]` array from the Perplexity API response directly — results have readable titles
- Regex extraction kept as fallback when API doesn't return a `citations` field
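A sketch of that citation logic, assuming the response shape named in this entry (`data["citations"]`, plus an `answer` field for the fallback path); everything else is illustrative:

```python
import re

# Preferred path: structured citations array from the API response.
# Fallback path: regex over the answer text (the old behavior).
URL_RE = re.compile(r"https?://\S+")

def extract_citations(data: dict) -> list:
    citations = data.get("citations")
    if citations:
        return list(citations)
    return URL_RE.findall(data.get("answer", ""))
```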
### ✨ Improved: German locale routing patterns
- Added German-language signal patterns for local and news queries
- Improves auto-routing for queries like `"aktuelle Nachrichten"`, `"beste Restaurants Graz"`, `"KI Regulierung Europa"`
### 📝 Documentation
- Added Querit to README provider tables, routing examples, and API key setup section
- Added `querit_api_key` to `config.example.json`
- Updated `SKILL.md` provider mentions and env metadata
- Bumped package version to `2.9.0`
## [2.8.6] - 2026-03-03
### Changed
- Documented Perplexity Sonar Pro usage and refreshed release docs.
## [2.8.5] - 2026-02-20
### ✨ Feature: Perplexity freshness filter
- Added `freshness` parameter to Perplexity provider (`day`, `week`, `month`, `year`)
- Maps to Perplexity's native `search_recency_filter` parameter
- Example: `python3 scripts/search.py -p perplexity -q "latest AI news" --freshness day`
- Consistent with freshness support in Serper and Brave providers
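The mapping is a thin parameter translation. A sketch under stated assumptions: `perplexity_params` and every field name except `search_recency_filter` are illustrative, not the skill's real API:

```python
# Valid freshness values per the changelog entry above.
FRESHNESS_VALUES = {"day", "week", "month", "year"}

def perplexity_params(query: str, freshness=None) -> dict:
    params = {"query": query}
    if freshness:
        if freshness not in FRESHNESS_VALUES:
            raise ValueError(
                f"freshness must be one of {sorted(FRESHNESS_VALUES)}")
        # --freshness maps to Perplexity's native parameter name.
        params["search_recency_filter"] = freshness
    return params
```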
## [2.8.4] - 2026-02-20
### 🔒 Security Fix: SSRF protection in setup wizard
- **Fixed:** `setup.py` SearXNG connection test had no SSRF protection (unlike `search.py`)
- **Before:** Operator could be tricked into probing internal networks during setup
- **After:** Same IP validation as `search.py` — blocks private IPs, cloud metadata, loopback
- **Credit:** ClawHub security scanner
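The kind of check involved can be sketched with the standard `ipaddress` module; the actual rules in `search.py`/`setup.py` may differ. Note that link-local filtering covers the cloud metadata address `169.254.169.254`:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs that resolve to non-public address space."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        ip = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    # Blocks private ranges, loopback, link-local (cloud metadata),
    # and reserved space.
    return not (ip.is_private or ip.is_loopback
                or ip.is_link_local or ip.is_reserved)
```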
## [2.8.3] - 2026-02-20
### 🐛 Critical Fix: Perplexity results empty
- **Fixed:** Perplexity provider returned 0 results because the AI-synthesized answer wasn't mapped into the results array
- **Before:** Only extracted URLs from the answer text were returned as results (often 0)
- **After:** The full answer is now the primary result (title, snippet with cleaned text), extracted source URLs follow as additional results
- **Impact:** Perplexity queries now always return at least 1 result with the synthesized answer
## [2.8.0] - 2026-02-20
### 🆕 New Provider: Perplexity (AI-Synthesized Answers)
Added Perplexity as the 6th search provider via Kilo Gateway — the first provider that returns **direct answers with citations** instead of just links:
#### Features
- **AI-Synthesized Answers**: Get a complete answer, not a list of links
- **Inline Citations**: Every claim backed by `[1][2][3]` source references
- **Real-Time Web Search**: Perplexity searches the web live, reads pages, and summarizes
- **Zero Extra Config**: Works through Kilo Gateway with your existing `KILOCODE_API_KEY`
- **Model**: `perplexity/sonar-pro` (best quality, supports complex queries)
#### Auto-Routing Signals
New direct-answer intent detection routes to Perplexity for:
- Status queries: "status of", "current state of", "what is the status"
- Local info: "events in [city]", "things to do in", "what's happening in"
- Direct questions: "what is", "who is", "when did", "how many"
- Current affairs: "this week", "this weekend", "right now", "today"
#### Usage Examples
```bash
# Auto-routed
python3 scripts/search.py -q "events in Graz Austria this weekend" # → Perplexity
python3 scripts/search.py -q "what is the current status of Ethereum" # → Perplexity
# Explicit
python3 scripts/search.py -p perplexity -q "latest AI regulation news"
```
#### Configuration
Requires `KILOCODE_API_KEY` environment variable (Kilo Gateway account).
No additional API key needed — Perplexity is accessed through Kilo's unified API.
```bash
export KILOCODE_API_KEY="your-kilo-key"
```
### 🔧 Routing Rebalance
Major overhaul of the auto-routing confidence scoring to fix Serper dominance:
#### Problem
Serper (Google) was winning ~90% of queries due to:
- High recency multiplier boosting Serper on any query with dates/years
- Default provider priority placing Serper first in ties
- Research and discovery signals not strong enough to override
#### Changes
- **Lowered Serper recency multiplier** — date mentions no longer auto-route to Google
- **Strengthened research signals** for Tavily:
- Added: "status of", "what happened with", "how does X compare"
- Boosted weights for comparison patterns (4.0 → 5.0)
- **Strengthened discovery signals** for Exa:
- Added: "events in", "things to do in", "startups similar to"
- Boosted weights for local discovery patterns
- **Updated provider priority order**: `tavily → exa → perplexity → serper → you → searxng`
- Serper moved from 1st to 4th in tie-breaking
- Research/discovery providers now win on ambiguous queries
#### Routing Test Results
| Query | Before | After | ✓ |
|-------|--------|-------|---|
| "latest OpenClaw version Feb 2026" | Serper | Serper | ✅ |
| "Ethereum Pectra upgrade status" | Serper | **Tavily** | ✅ |
| "events in Graz this weekend" | Serper | **Perplexity** | ✅ |
| "compare SearXNG vs Brave for AI agents" | Serper | **Tavily** | ✅ |
| "Sam Altman OpenAI news this week" | Serper | Serper | ✅ |
| "find startups similar to Kilo Code" | Serper | **Exa** | ✅ |
### 📊 Updated Provider Comparison
| Feature | Serper | Tavily | Exa | Perplexity | You.com | SearXNG |
|---------|:------:|:------:|:---:|:----------:|:-------:|:-------:|
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ | ⚡ |
| Direct Answers | ✗ | ✗ | ✗ | ✓✓ | ✗ | ✗ |
| Citations | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Local Events | ✓ | ✗ | ✓ | ✓✓ | ✗ | ✓ |
| Research | ✗ | ✓✓ | ✓ | ✓ | ✓ | ✗ |
| Discovery | ✗ | ✗ | ✓✓ | ✗ | ✗ | ✗ |
| Self-Hosted | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
## [2.7.0] - 2026-02-14
### ✨ Added
- Provider cooldown tracking in `.cache/provider_health.json`
- Exponential cooldown on provider failures: **1m → 5m → 25m → 1h (cap)**
- Retry strategy for transient failures (timeout, 429, 503): up to 2 retries with backoff **1s → 3s → 9s**
- Smarter cache keys hashed from full request context (query/provider/max_results + locale, freshness, time_range, topic, search_engines, include_news, and related params)
- Cross-provider result deduplication by normalized URL during fallback merge
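The two backoff schedules quoted above can be expressed as small helpers; the function names are illustrative, the numbers come from this entry:

```python
def cooldown_seconds(failures: int) -> int:
    """Exponential provider cooldown: 1m -> 5m -> 25m -> 1h cap."""
    return min(60 * 5 ** (failures - 1), 3600)

def retry_delay(attempt: int) -> int:
    """Retry backoff for transient errors: 1s -> 3s -> 9s."""
    return 3 ** attempt  # attempt = 0, 1, 2
```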
### 🔧 Changed
- Cooldown providers are skipped in routing while their cooldown is active
- Provider health is reset automatically after successful requests
- Fallback output now includes dedup metadata:
- `deduplicated: true|false`
- `metadata.dedup_count`
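One plausible reading of "deduplication by normalized URL" (the skill's actual normalization rules may differ): treat the scheme, a leading `www.`, and a trailing slash as insignificant when merging fallback results:

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    # Ignore scheme (http vs https), "www." prefix, trailing slash.
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")  # Python 3.9+
    return host + parts.path.rstrip("/")

def dedup_results(results):
    seen, unique = set(), []
    for r in results:
        key = normalize_url(r["url"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```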
## [2.6.5] - 2026-02-11
### 🆕 File-Based Result Caching
Added local caching to save API costs on repeated searches:
#### Features
- **Automatic Caching**: Search results cached locally by default
- **1-Hour TTL**: Results expire after 3600 seconds (configurable)
- **Cache Indicators**: Response includes `cached: true/false` and `cache_age_seconds`
- **Zero-Cost Repeats**: Cached requests don't hit APIs
#### New CLI Options
- `--cache-ttl SECONDS` — Custom cache TTL (default: 3600)
- `--no-cache` — Bypass cache, always fetch fresh
- `--clear-cache` — Delete all cached results
- `--cache-stats` — Show cache statistics (entries, size, age)
#### Configuration
- **Cache directory**: `.cache/` in skill directory
- **Environment variable**: `WSP_CACHE_DIR` to override location
- **Cache key**: Based on query + provider + max_results (SHA256)
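A sketch of that cache-key scheme; the exact payload layout is an assumption, only the inputs (query, provider, max_results) and SHA-256 come from this entry:

```python
import hashlib
import json

def cache_key(query: str, provider: str, max_results: int) -> str:
    # Serialize with sorted keys so equivalent requests hash identically.
    payload = json.dumps(
        {"q": query, "p": provider, "n": max_results},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```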
#### Usage Examples
```bash
# First request costs API credits
python3 scripts/search.py -q "AI startups"
# Second request is FREE (uses cache)
python3 scripts/search.py -q "AI startups"
# Force fresh results
python3 scripts/search.py -q "AI startups" --no-cache
# View stats
python3 scripts/search.py --cache-stats
# Clear everything
python3 scripts/search.py --clear-cache
```
#### Technical Details
- Cache files: JSON with metadata (_cache_timestamp, _cache_key, etc.)
- Automatic cleanup of expired entries on access
- Graceful handling of corrupted cache files
## [2.6.1] - 2026-02-04
- Privacy cleanup: removed hardcoded paths and personal info from docs
## [2.5.0] - 2026-02-03
### 🆕 New Provider: SearXNG (Privacy-First Meta-Search)
Added SearXNG as the 5th search provider, focused on privacy and self-hosted search:
#### Features
- **Privacy-Preserving**: No tracking, no profiling — your searches stay private
- **Multi-Source Aggregation**: Queries 70+ upstream engines (Google, Bing, DuckDuckGo, etc.)
- **$0 API Cost**: Self-hosted = unlimited queries with no API fees
- **Diverse Results**: Get perspectives from multiple search engines in one query
- **Customizable**: Choose which engines to use, set SafeSearch levels, language preferences
#### Auto-Routing Signals
New privacy/multi-source intent detection routes to SearXNG for:
- Privacy queries: "private", "anonymous", "without tracking", "no tracking"
- Multi-source: "aggregate results", "multiple sources", "diverse perspectives"
- Budget/free: "free search", "no api cost", "self-hosted search"
- German: "privat", "anonym", "ohne tracking", "verschiedene quellen"
#### Usage Examples
```bash
# Auto-routed
python3 scripts/search.py -q "search privately without tracking" # → SearXNG
# Explicit
python3 scripts/search.py -p searxng -q "linux distros"
python3 scripts/search.py -p searxng -q "AI news" --engines "google,bing,duckduckgo"
python3 scripts/search.py -p searxng -q "privacy tools" --searxng-safesearch 2
```
#### Configuration
```json
{
  "searxng": {
    "instance_url": "https://your-instance.example.com",
    "safesearch": 0,
    "engines": null,
    "language": "en"
  }
}
```
#### Setup
SearXNG requires a self-hosted instance with JSON format enabled:
```bash
# Docker setup (5 minutes)
docker run -d -p 8080:8080 searxng/searxng
# Enable JSON in settings.yml:
# search:
# formats: [html, json]
# Set instance URL
export SEARXNG_INSTANCE_URL="http://localhost:8080"
```
See: https://docs.searxng.org/admin/installation.html
### 📊 Updated Provider Comparison
| Feature | Serper | Tavily | Exa | You.com | SearXNG |
|---------|:------:|:------:|:---:|:-------:|:-------:|
| Privacy-First | ✗ | ✗ | ✗ | ✗ | ✓✓ |
| Self-Hosted | ✗ | ✗ | ✗ | ✗ | ✓ |
| API Cost | $$ | $$ | $$ | $ | **FREE** |
| Multi-Engine | ✗ | ✗ | ✗ | ✗ | ✓ (70+) |
### 🔧 Technical Changes
- Added `search_searxng()` function with full error handling
- Added `PRIVACY_SIGNALS` to QueryAnalyzer for auto-routing
- Updated setup wizard with SearXNG option (instance URL validation)
- Updated config.example.json with searxng section
- New CLI args: `--searxng-url`, `--searxng-safesearch`, `--engines`, `--categories`
---
## [2.4.4] - 2026-02-03
### 📝 Documentation: Provider Count Fix
- **Fixed:** "You can use 1, 2, or all 3" → "1, 2, 3, or all 4" (we have 4 providers now!)
- **Impact:** Accurate documentation for setup wizard
## [2.4.3] - 2026-02-03
### 📝 Documentation: Updated README
- **Added:** "NEW in v2.4.2" badge for You.com in SKILL.md
- **Impact:** ClawHub README now properly highlights You.com as new feature
## [2.4.2] - 2026-02-03
### 🐛 Critical Fix: You.com API Configuration
- **Fixed:** Incorrect hostname (`api.ydc-index.io` → `ydc-index.io`)
- **Fixed:** Incorrect header name (`X-API-Key` → `X-API-KEY`, uppercase)
- **Impact:** You.com now works correctly (previously returned 403 Forbidden)
- **Status:** ✅ Fully tested and working
## [2.4.1] - 2026-02-03
### 🐛 Bugfix: You.com URL Encoding
- **Fixed:** URL encoding for You.com queries - spaces and special characters now properly encoded
- **Impact:** Queries with spaces (e.g., "OpenClaw AI framework") work correctly now
- **Technical:** Added `urllib.parse.quote` for parameter encoding
## [2.4.0] - 2026-02-03
### 🆕 New Provider: You.com
Added You.com as the 4th search provider, optimized for RAG applications and real-time information:
#### Features
- **LLM-Ready Snippets**: Pre-extracted, query-aware text excerpts perfect for feeding into AI models
- **Unified Web + News**: Get both web pages and news articles in a single API call
- **Live Crawling**: Fetch full page content on-demand in Markdown format (`--livecrawl`)
- **Automatic News Classification**: Intelligently includes news results based on query intent
- **Freshness Controls**: Filter by recency (day, week, month, year, or date range)
- **SafeSearch Support**: Content filtering (off, moderate, strict)
#### Auto-Routing Signals
New RAG/Real-time intent detection routes to You.com for:
- RAG context queries: "summarize", "key points", "tldr", "context for"
- Real-time info: "latest news", "current status", "right now", "what's happening"
- Information synthesis: "updates on", "situation", "main takeaways"
#### Usage Examples
```bash
# Auto-routed
python3 scripts/search.py -q "summarize key points about AI regulation" # → You.com
# Explicit
python3 scripts/search.py -p you -q "climate change" --livecrawl all
python3 scripts/search.py -p you -q "tech news" --freshness week
```
#### Configuration
```json
{
  "you": {
    "country": "US",
    "language": "en",
    "safesearch": "moderate",
    "include_news": true
  }
}
```
#### API Key Setup
```bash
export YOU_API_KEY="your-key" # Get from https://api.you.com
```
### 📊 Updated Provider Comparison
| Feature | Serper | Tavily | Exa | You.com |
|---------|:------:|:------:|:---:|:-------:|
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ |
| News Integration | ✓ | ✗ | ✗ | ✓ |
| RAG-Optimized | ✗ | ✓ | ✗ | ✓✓ |
| Full Page Content | ✗ | ✓ | ✓ | ✓ |
---
## [2.1.5] - 2026-01-27
### 📝 Documentation
- Added warning about NOT using Tavily/Serper/Exa in core OpenClaw config
- Core OpenClaw only supports `brave` as the built-in provider
- This skill's providers must be used via environment variables and scripts, not `openclaw.json`
## [2.1.0] - 2026-01-23
### 🧠 Intelligent Multi-Signal Routing
Completely overhauled auto-routing with sophisticated query analysis:
#### Intent Classification
- **Shopping Intent**: Detects price patterns ("how much", "cost of"), purchase signals ("buy", "order"), deal keywords, and product+brand combinations
- **Research Intent**: Identifies explanation patterns ("how does", "why does"), analysis signals ("pros and cons", "compare"), learning keywords, and complex multi-clause queries
- **Discovery Intent**: Recognizes similarity patterns ("similar to", "alternatives"), company discovery signals, URL/domain detection, and academic patterns
#### Linguistic Pattern Detection
- "How much" / "price of" → Shopping (Serper)
- "How does" / "Why does" / "Explain" → Research (Tavily)
- "Companies like" / "Similar to" / "Alternatives" → Discovery (Exa)
- Product + Brand name combos → Shopping (Serper)
- URLs and domains in query → Similar search (Exa)
#### Query Analysis Features
- **Complexity scoring**: Long, multi-clause queries get routed to research providers
- **URL detection**: Automatic detection of URLs/domains triggers Exa similar search
- **Brand recognition**: Tech brands (Apple, Samsung, Sony, etc.) with product terms → shopping
- **Recency signals**: "latest", "2026", "breaking" boost news mode
#### Confidence Scoring
- **HIGH (70-100%)**: Strong signal match, very reliable routing
- **MEDIUM (40-69%)**: Good match, should work well
- **LOW (0-39%)**: Ambiguous query, using fallback provider
- Confidence based on absolute signal strength + relative margin over alternatives
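One way to combine absolute strength with relative margin, as described above; the weights and saturation point are illustrative, not the values used by `QueryAnalyzer`:

```python
def confidence(scores: dict, saturation: float = 10.0) -> float:
    """Blend absolute signal strength with margin over the runner-up."""
    ranked = sorted(scores.values(), reverse=True)
    best = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    if best <= 0:
        return 0.0  # no signals at all -> LOW confidence fallback
    strength = min(best / saturation, 1.0)  # absolute component
    margin = (best - second) / best         # relative component
    return round(0.6 * strength + 0.4 * margin, 2)
```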
#### Enhanced Debug Mode
```bash
python3 scripts/search.py --explain-routing -q "your query"
```
Now shows:
- Routing decision with confidence level
- All provider scores
- Top matched signals with weights
- Query analysis (complexity, URL detection, recency focus)
- All matched patterns per provider
### 🔧 Technical Changes
#### QueryAnalyzer Class
New `QueryAnalyzer` class with:
- `SHOPPING_SIGNALS`: 25+ weighted patterns for shopping intent
- `RESEARCH_SIGNALS`: 30+ weighted patterns for research intent
- `DISCOVERY_SIGNALS`: 20+ weighted patterns for discovery intent
- `LOCAL_NEWS_SIGNALS`: 25+ patterns for local/news queries
- `BRAND_PATTERNS`: Tech brand detection regex
#### Signal Weighting
- Multi-word phrases get higher weights (e.g., "how much" = 4.0 vs "price" = 3.0)
- Strong signals: price patterns (4.0), similarity patterns (5.0), URLs (5.0)
- Medium signals: product terms (2.5), learning keywords (2.5)
- Bonus scoring: Product+brand combo (+3.0), complex query (+2.5)
#### Improved Output Format
```json
{
  "routing": {
    "auto_routed": true,
    "provider": "serper",
    "confidence": 0.78,
    "confidence_level": "high",
    "reason": "high_confidence_match",
    "top_signals": [{"matched": "price", "weight": 3.0}],
    "scores": {"serper": 7.0, "tavily": 0.0, "exa": 0.0}
  }
}
```
### 📚 Documentation Updates
- **SKILL.md**: Complete rewrite with signal tables and confidence scoring guide
- **README.md**: Updated with intelligent routing examples and confidence levels
- **FAQ**: Updated to explain multi-signal analysis
### 🧪 Test Results
| Query | Provider | Confidence | Signals |
|-------|----------|------------|---------|
| "how much does iPhone 16 cost" | Serper | 68% | "how much", brand+product |
| "how does quantum entanglement work" | Tavily | 86% HIGH | "how does", "what are", "implications" |
| "startups similar to Notion" | Exa | 76% HIGH | "similar to", "Series A" |
| "companies like stripe.com" | Exa | 100% HIGH | URL detected, "companies like" |
| "MacBook Pro M3 specs review" | Serper | 70% HIGH | brand+product, "specs", "review" |
| "Tesla" | Serper | 0% LOW | No signals (fallback) |
| "arxiv papers on transformers" | Exa | 58% | "arxiv" |
| "latest AI news 2026" | Serper | 77% HIGH | "latest", "news", "2026" |
---
## [2.0.0] - 2026-01-23
### 🎉 Major Features
#### Smart Auto-Routing
- **Automatic provider selection** based on query analysis
- No need to manually choose provider - just search!
- Intelligent keyword matching for routing decisions
- Pattern detection for query types (shopping, research, discovery)
- Scoring system for provider selection
#### User Configuration
- **config.json**: Full control over auto-routing behavior
- **Configurable keyword mappings**: Add your own routing keywords
- **Provider priority**: Set tie-breaker order
- **Disable providers**: Turn off providers you don't have API keys for
- **Enable/disable auto-routing**: Opt-in or opt-out as needed
#### Debugging Tools
- **--explain-routing** flag: See exactly why a provider was selected
- Detailed routing metadata in JSON responses
- Shows matched keywords and routing scores
### 📚 Documentation
- **README.md**: Complete auto-routing guide with examples
- **SKILL.md**: Detailed routing logic and configuration reference
- **FAQ section**: Common questions about auto-routing
- **Configuration examples**: Pre-built configs for common use cases
---
## [1.0.x] - Initial Release
- Multi-provider search: Serper, Tavily, Exa
- Manual provider selection with `-p` flag
- Unified JSON output format
- Provider-specific options (--depth, --category, --similar-url, etc.)
- Domain filtering for Tavily/Exa
- Date filtering for Exa

`FAQ.md` (new file, 263 lines)
# Frequently Asked Questions
## Caching (NEW in v2.7.0!)
### How does caching work?
Search results are automatically cached locally for 1 hour (3600 seconds). When you make the same query again, you get instant results at $0 API cost. The cache key is based on: query text + provider + max_results.
### Where are cached results stored?
In `.cache/` directory inside the skill folder by default. Override with `WSP_CACHE_DIR` environment variable:
```bash
export WSP_CACHE_DIR="/path/to/custom/cache"
```
### How do I see cache stats?
```bash
python3 scripts/search.py --cache-stats
```
This shows total entries, size, oldest/newest entries, and breakdown by provider.
### How do I clear the cache?
```bash
python3 scripts/search.py --clear-cache
```
### Can I change the cache TTL?
Yes! Default is 3600 seconds (1 hour). Set a custom TTL per request:
```bash
python3 scripts/search.py -q "query" --cache-ttl 7200 # 2 hours
```
### How do I skip the cache?
Use `--no-cache` to always fetch fresh results:
```bash
python3 scripts/search.py -q "query" --no-cache
```
### How do I know if a result was cached?
The response includes:
- `"cached": true/false` — whether result came from cache
- `"cache_age_seconds": 1234` — how old the cached result is (when cached)
---
## General
### How does auto-routing decide which provider to use?
Multi-signal analysis scores each provider based on: price patterns, explanation phrases, similarity keywords, URLs, product+brand combos, and query complexity. Highest score wins. Use `--explain-routing` to see the decision breakdown.
### What if it picks the wrong provider?
Override with `-p serper/tavily/exa`. Check `--explain-routing` to understand why it chose differently.
### What does "low confidence" mean?
Query is ambiguous (e.g., "Tesla" could be cars, stock, or company). Falls back to Serper. Results may vary.
### Can I disable a provider?
Yes! In config.json: `"disabled_providers": ["exa"]`
---
## API Keys
### Which API keys do I need?
At minimum ONE key (or SearXNG instance). You can use just Serper, just Tavily, just Exa, just You.com, or just SearXNG. Missing keys = that provider is skipped.
### Where do I get API keys?
- Serper: https://serper.dev (2,500 free queries, no credit card)
- Tavily: https://tavily.com (1,000 free searches/month)
- Exa: https://exa.ai (1,000 free searches/month)
- You.com: https://api.you.com (Limited free tier for testing)
- SearXNG: Self-hosted, no key needed! https://docs.searxng.org/admin/installation.html
### How do I set API keys?
Two options (both auto-load):
**Option A: .env file**
```bash
export SERPER_API_KEY="your-key"
```
**Option B: config.json** (v2.2.1+)
```json
{ "serper": { "api_key": "your-key" } }
```
---
## Routing Details
### How do I know which provider handled my search?
Check `routing.provider` in JSON output, or `[🔍 Searched with: Provider]` in chat responses.
### Why does it sometimes choose Serper for research questions?
If the query has brand/product signals (e.g., "how does Tesla FSD work"), shopping intent may outweigh research intent. Override with `-p tavily`.
### What's the confidence threshold?
Default: 0.3 (30%). Below this = low confidence, uses fallback. Adjustable in config.json.
---
## You.com Specific
### When should I use You.com over other providers?
You.com excels at:
- **RAG applications**: Pre-extracted snippets ready for LLM consumption
- **Real-time information**: Current events, breaking news, status updates
- **Combined sources**: Web + news results in a single API call
- **Summarization tasks**: "What's the latest on...", "Key points about..."
### What's the livecrawl feature?
You.com can fetch full page content on-demand. Use `--livecrawl web` for web results, `--livecrawl news` for news articles, or `--livecrawl all` for both. Content is returned in Markdown format.
### Does You.com include news automatically?
Yes! You.com's intelligent classification automatically includes relevant news results when your query has news intent. You can also use `--include-news` to explicitly enable it.
---
## SearXNG Specific
### Do I need my own SearXNG instance?
Yes! SearXNG is self-hosted. Most public instances disable the JSON API to prevent bot abuse. You need to run your own instance with JSON format enabled. See: https://docs.searxng.org/admin/installation.html
### How do I set up SearXNG?
Docker is the easiest way:
```bash
docker run -d -p 8080:8080 searxng/searxng
```
Then enable JSON in `settings.yml`:
```yaml
search:
  formats:
    - html
    - json
```
### Why am I getting "403 Forbidden"?
The JSON API is disabled on your instance. Enable it in `settings.yml` under `search.formats`.
### What's the API cost for SearXNG?
**$0!** SearXNG is free and open-source. You only pay for hosting (~$5/month VPS). Unlimited queries.
### When should I use SearXNG?
- **Privacy-sensitive queries**: No tracking, no profiling
- **Budget-conscious**: $0 API cost
- **Diverse results**: Aggregates 70+ search engines
- **Self-hosted requirements**: Full control over your search infrastructure
- **Fallback provider**: When paid APIs are rate-limited
### Can I limit which search engines SearXNG uses?
Yes! Use `--engines google,bing,duckduckgo` to specify engines, or configure defaults in `config.json`.
---
## Provider Selection
### Which provider should I use?
| Query Type | Best Provider | Why |
|------------|---------------|-----|
| **Shopping** ("buy laptop", "cheap shoes") | **Serper** | Google Shopping, price comparisons, local stores |
| **Research** ("how does X work?", "explain Y") | **Tavily** | Deep research, academic quality, full-page content |
| **Startups/Papers** ("companies like X", "arxiv papers") | **Exa** | Semantic/neural search, startup discovery |
| **RAG/Real-time** ("summarize latest", "current events") | **You.com** | LLM-ready snippets, combined web+news |
| **Privacy** ("search without tracking") | **SearXNG** | No tracking, multi-source, self-hosted |
**Tip:** Enable auto-routing and let the skill choose automatically! 🎯
### Do I need all 5 providers?
**No!** All providers are optional. You can use:
- **1 provider** (e.g., just Serper for everything)
- **2-3 providers** (e.g., Serper + You.com for most needs)
- **All 5** (maximum flexibility + fallback options)
### How much do the APIs cost?
| Provider | Free Tier | Paid Plan |
|----------|-----------|-----------|
| **Serper** | 2,500 queries/mo | $50/mo (5,000 queries) |
| **Tavily** | 1,000 queries/mo | $150/mo (10,000 queries) |
| **Exa** | 1,000 queries/mo | $1,000/mo (100,000 queries) |
| **You.com** | Limited free | ~$10/mo (varies by usage) |
| **SearXNG** | **FREE** ✅ | Only VPS cost (~$5/mo if self-hosting) |
**Budget tip:** Use SearXNG as primary + others as fallback for specialized queries!
### How private is SearXNG really?
| Setup | Privacy Level |
|-------|---------------|
| **Self-hosted (your VPS)** | ⭐⭐⭐⭐⭐ You control everything |
| **Self-hosted (Docker local)** | ⭐⭐⭐⭐⭐ Fully private |
| **Public instance** | ⭐⭐⭐ Depends on operator's logging policy |
**Best practice:** Self-host if privacy is critical.
### Which provider has the best results?
| Metric | Winner |
|--------|--------|
| **Most accurate for facts** | Serper (Google) |
| **Best for research depth** | Tavily |
| **Best for semantic queries** | Exa |
| **Best for RAG/AI context** | You.com |
| **Most diverse sources** | SearXNG (70+ engines) |
| **Most private** | SearXNG (self-hosted) |
**Recommendation:** Enable multiple providers + auto-routing for best overall experience.
### How does auto-routing work?
The skill analyzes your query for keywords and patterns:
```python
"buy cheap laptop" Serper (shopping signals)
"how does AI work?" Tavily (research/explanation)
"companies like X" Exa (semantic/similar)
"summarize latest news" You.com (RAG/real-time)
"search privately" SearXNG (privacy signals)
```
**Confidence threshold:** Only routes if confidence > 30%. Otherwise uses default provider.
**Override:** Use `-p provider` to force a specific provider.
---
## Production Use
### Can I use this in production?
**Yes!** Web-search-plus is production-ready:
- ✅ Error handling with automatic fallback
- ✅ Rate limit protection
- ✅ Timeout handling (30s per provider)
- ✅ API key security (.env + config.json gitignored)
- ✅ 5 providers for redundancy
**Tip:** Monitor API usage to avoid exceeding free tiers!
### What if I run out of API credits?
1. **Fallback chain:** Other enabled providers automatically take over
2. **Use SearXNG:** Switch to self-hosted (unlimited queries)
3. **Upgrade plan:** Paid tiers have higher limits
4. **Rate limit:** Use `disabled_providers` to skip exhausted APIs temporarily
---
## Updates
### How do I update to the latest version?
**Via ClawHub (recommended):**
```bash
clawhub update web-search-plus --registry "https://www.clawhub.ai" --no-input
```
**Manually:**
```bash
cd /path/to/workspace/skills/web-search-plus/
git pull origin main
python3 scripts/setup.py # Re-run to configure new features
```
### Where can I report bugs or request features?
- **GitHub Issues:** https://github.com/robbyczgw-cla/web-search-plus/issues
- **ClawHub:** https://www.clawhub.ai/skills/web-search-plus

`README.md` (new file, 800 lines)
# Web Search Plus
> Unified multi-provider web search with **Intelligent Auto-Routing** — uses multi-signal analysis to automatically select between **Serper**, **Tavily**, **Querit**, **Exa**, **Perplexity (Sonar Pro)**, **You.com**, and **SearXNG** with confidence scoring.
[![ClawHub](https://img.shields.io/badge/ClawHub-web--search--plus-blue)](https://clawhub.ai)
[![Version](https://img.shields.io/badge/version-2.9.0-green)](https://clawhub.ai)
[![GitHub](https://img.shields.io/badge/GitHub-web--search--plus-blue)](https://github.com/robbyczgw-cla/web-search-plus)
---
## 🧠 Features (v2.9.0)
**Intelligent Multi-Signal Routing** — The skill uses sophisticated query analysis:
- **Intent Classification**: Shopping vs Research vs Discovery vs RAG/Real-time vs Privacy
- **Linguistic Patterns**: "how much" (price) vs "how does" (research) vs "privately" (privacy)
- **Entity Detection**: Product+brand combos, URLs, domains
- **Complexity Analysis**: Long queries favor research providers
- **Confidence Scoring**: Know how reliable the routing decision is
```bash
python3 scripts/search.py -q "how much does iPhone 16 cost" # → Serper (68% confidence)
python3 scripts/search.py -q "how does quantum entanglement work" # → Tavily (86% HIGH)
python3 scripts/search.py -q "startups similar to Notion" # → Exa (76% HIGH)
python3 scripts/search.py -q "companies like stripe.com" # → Exa (100% HIGH - URL detected)
python3 scripts/search.py -q "summarize key points on AI" # → You.com (68% MEDIUM - RAG intent)
python3 scripts/search.py -q "search privately without tracking" # → SearXNG (74% HIGH - privacy intent)
```
---
## 🔍 When to Use Which Provider
### Built-in Brave Search (OpenClaw default)
- ✅ General web searches
- ✅ Privacy-focused
- ✅ Quick lookups
- ✅ Default fallback
### Serper (Google Results)
- 🛍 **Product specs, prices, shopping**
- 📍 **Local businesses, places**
- 🎯 **"Google it" - explicit Google results**
- 📰 **Shopping/images needed**
- 🏆 **Knowledge Graph data**
### Tavily (AI-Optimized Research)
- 📚 **Research questions, deep dives**
- 🔬 **Complex multi-part queries**
- 📄 **Need full page content** (not just snippets)
- 🎓 **Academic/technical research**
- 🔒 **Domain filtering** (trusted sources)
### Querit (Multilingual AI Search)
- 🌏 **Multilingual AI search** across 10+ languages
- ⚡ **Fast real-time answers** with ~400ms latency
- 🗺️ **International / cross-language queries**
- 📰 **Recency-aware results** for current information
- 🤖 **Good fit for AI workflows** with clean metadata
### Exa (Neural Semantic Search)
- 🔗 **Find similar pages**
- 🏢 **Company/startup discovery**
- 📝 **Research papers**
- 💻 **GitHub projects**
- 📅 **Date-specific content**
### Perplexity (Sonar Pro via Kilo Gateway)
- ✅ **Direct answers** (great for “who/what/define”)
- 🧾 **Cited, answer-first output**
- 🕒 **Current events / “as of” questions**
- 🔑 Auth via `KILOCODE_API_KEY` (routes to `https://api.kilo.ai`)
### You.com (RAG/Real-time)
- 🤖 **RAG applications** (LLM-ready snippets)
- 📰 **Combined web + news** (single API call)
- ⚡ **Real-time information** (current events)
- 📋 **Summarization context** ("What's the latest...")
- 🔄 **Live crawling** (full page content on demand)
### SearXNG (Privacy-First/Self-Hosted)
- 🔒 **Privacy-preserving search** (no tracking)
- 🌐 **Multi-source aggregation** (70+ engines)
- 💰 **$0 API cost** (self-hosted)
- 🎯 **Diverse perspectives** (results from multiple engines)
- 🏠 **Self-hosted environments** (full control)
---
## Table of Contents
- [Quick Start](#quick-start)
- [Smart Auto-Routing](#smart-auto-routing)
- [Configuration Guide](#configuration-guide)
- [Provider Deep Dives](#provider-deep-dives)
- [Usage Examples](#usage-examples)
- [Workflow Examples](#workflow-examples)
- [Optimization Tips](#optimization-tips)
- [FAQ & Troubleshooting](#faq--troubleshooting)
- [API Reference](#api-reference)
---
## Quick Start
### Option A: Interactive Setup (Recommended)
```bash
# Run the setup wizard - it guides you through everything
python3 scripts/setup.py
```
The wizard explains each provider, collects your API keys, and creates `config.json` automatically.
### Option B: Manual Setup
```bash
# 1. Set up at least one API key (or SearXNG instance)
export SERPER_API_KEY="your-key" # https://serper.dev
export TAVILY_API_KEY="your-key" # https://tavily.com
export QUERIT_API_KEY="your-key" # https://querit.ai
export EXA_API_KEY="your-key" # https://exa.ai
export KILOCODE_API_KEY="your-key" # enables Perplexity Sonar Pro via https://api.kilo.ai
export YOU_API_KEY="your-key" # https://api.you.com
export SEARXNG_INSTANCE_URL="https://your-instance.example.com" # Self-hosted
# 2. Run a search (auto-routed!)
python3 scripts/search.py -q "best laptop 2024"
```
### Run a Search
```bash
# Auto-routed to best provider
python3 scripts/search.py -q "best laptop 2024"
# Or specify a provider explicitly
python3 scripts/search.py -p serper -q "iPhone 16 specs"
python3 scripts/search.py -p tavily -q "quantum computing explained" --depth advanced
python3 scripts/search.py -p querit -q "latest AI policy updates in Germany"
python3 scripts/search.py -p exa -q "AI startups 2024" --category company
python3 scripts/search.py -p perplexity -q "Who is the president of Austria?"
```
---
## Smart Auto-Routing
### How It Works
When you don't specify a provider, the skill analyzes your query and routes it to the best provider:
| Query Contains | Routes To | Example |
|---------------|-----------|---------|
| "price", "buy", "shop", "cost" | **Serper** | "iPhone 16 price" |
| "near me", "restaurant", "hotel" | **Serper** | "pizza near me" |
| "weather", "news", "latest" | **Serper** | "weather Berlin" |
| "how does", "explain", "what is" | **Tavily** | "how does TCP work" |
| "research", "study", "analyze" | **Tavily** | "climate research" |
| "tutorial", "guide", "learn" | **Tavily** | "python tutorial" |
| multilingual, current status, latest updates | **Querit** | "latest AI policy updates in Germany" |
| "similar to", "companies like" | **Exa** | "companies like Stripe" |
| "startup", "Series A" | **Exa** | "AI startups Series A" |
| "github", "research paper" | **Exa** | "LLM papers arxiv" |
| "private", "anonymous", "no tracking" | **SearXNG** | "search privately" |
| "multiple sources", "aggregate" | **SearXNG** | "results from all engines" |
### Examples
```bash
# These are all auto-routed to the optimal provider:
python3 scripts/search.py -q "MacBook Pro M3 price" # → Serper
python3 scripts/search.py -q "how does HTTPS work" # → Tavily
python3 scripts/search.py -q "latest AI policy updates in Germany" # → Querit
python3 scripts/search.py -q "startups like Notion" # → Exa
python3 scripts/search.py -q "best sushi restaurant near me" # → Serper
python3 scripts/search.py -q "explain attention mechanism" # → Tavily
python3 scripts/search.py -q "alternatives to Figma" # → Exa
python3 scripts/search.py -q "search privately without tracking" # → SearXNG
```
### Result Caching (introduced in v2.7.x)
Search results are **automatically cached** for 1 hour to save API costs:
```bash
# First request: fetches from API ($)
python3 scripts/search.py -q "AI startups 2024"
# Second request: uses cache (FREE!)
python3 scripts/search.py -q "AI startups 2024"
# Output includes: "cached": true
# Bypass cache (force fresh results)
python3 scripts/search.py -q "AI startups 2024" --no-cache
# View cache stats
python3 scripts/search.py --cache-stats
# Clear all cached results
python3 scripts/search.py --clear-cache
# Custom TTL (in seconds, default: 3600 = 1 hour)
python3 scripts/search.py -q "query" --cache-ttl 7200
```
**Cache location:** `.cache/` in skill directory (override with `WSP_CACHE_DIR` environment variable)
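A hypothetical sketch of how cached lookups might work (the real cache layout may differ). The key covers provider, query, and max_results — which is why a repeat query only hits the cache when all three match exactly:

```python
# Illustrative cache lookup: key = hash of (provider, query, max_results),
# one JSON file per entry, expired by file modification time vs. TTL.
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path(".cache")  # overridable via WSP_CACHE_DIR in the real script

def cache_key(provider: str, query: str, max_results: int) -> str:
    payload = json.dumps([provider, query, max_results])
    return hashlib.sha256(payload.encode()).hexdigest()

def get_cached(provider: str, query: str, max_results: int, ttl: int = 3600):
    path = CACHE_DIR / f"{cache_key(provider, query, max_results)}.json"
    if not path.exists():
        return None  # cache miss
    if time.time() - path.stat().st_mtime > ttl:
        return None  # entry expired
    return json.loads(path.read_text())
```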
### Debug Auto-Routing
See exactly why a provider was selected:
```bash
python3 scripts/search.py --explain-routing -q "best laptop to buy"
```
Output:
```json
{
"query": "best laptop to buy",
"selected_provider": "serper",
"reason": "matched_keywords (score=2)",
"matched_keywords": ["buy", "best"],
"available_providers": ["serper", "tavily", "exa"]
}
```
### Routing Info in Results
Every search result includes routing information:
```json
{
"provider": "serper",
"query": "iPhone 16 price",
"results": [...],
"routing": {
"auto_routed": true,
"selected_provider": "serper",
"reason": "matched_keywords (score=1)",
"matched_keywords": ["price"]
}
}
```
---
## Configuration Guide
### Environment Variables
Create a `.env` file or set these in your shell:
```bash
# Required: Set at least one
export SERPER_API_KEY="your-serper-key"
export TAVILY_API_KEY="your-tavily-key"
export EXA_API_KEY="your-exa-key"
```
### Config File (config.json)
The `config.json` file lets you customize auto-routing and provider defaults:
```json
{
"defaults": {
"provider": "serper",
"max_results": 5
},
"auto_routing": {
"enabled": true,
"fallback_provider": "serper",
"provider_priority": ["serper", "tavily", "exa"],
"disabled_providers": [],
"keyword_mappings": {
"serper": ["price", "buy", "shop", "cost", "deal", "near me", "weather"],
"tavily": ["how does", "explain", "research", "what is", "tutorial"],
"exa": ["similar to", "companies like", "alternatives", "startup", "github"]
}
},
"serper": {
"country": "us",
"language": "en"
},
"tavily": {
"depth": "basic",
"topic": "general"
},
"exa": {
"type": "neural"
}
}
```
### Configuration Examples
#### Example 1: Disable Exa (Only Use Serper + Tavily)
```json
{
"auto_routing": {
"disabled_providers": ["exa"]
}
}
```
#### Example 2: Make Tavily the Default
```json
{
"auto_routing": {
"fallback_provider": "tavily"
}
}
```
#### Example 3: Add Custom Keywords
```json
{
"auto_routing": {
"keyword_mappings": {
"serper": [
"price", "buy", "shop", "amazon", "ebay", "walmart",
"deal", "discount", "coupon", "sale", "cheap"
],
"tavily": [
"how does", "explain", "research", "what is",
"coursera", "udemy", "learn", "course", "certification"
],
"exa": [
"similar to", "companies like", "competitors",
"YC company", "funded startup", "Series A", "Series B"
]
}
}
}
```
#### Example 4: German Locale for Serper
```json
{
"serper": {
"country": "de",
"language": "de"
}
}
```
#### Example 5: Disable Auto-Routing
```json
{
"auto_routing": {
"enabled": false
},
"defaults": {
"provider": "serper"
}
}
```
#### Example 6: Research-Heavy Config
```json
{
"auto_routing": {
"fallback_provider": "tavily",
"provider_priority": ["tavily", "serper", "exa"]
},
"tavily": {
"depth": "advanced",
"include_raw_content": true
}
}
```
---
## Provider Deep Dives
### Serper (Google Search API)
**What it is:** Direct access to Google Search results via API — the same results you'd see on google.com.
#### Strengths
| Strength | Description |
|----------|-------------|
| 🎯 **Accuracy** | Google's search quality, knowledge graph, featured snippets |
| 🛒 **Shopping** | Product prices, reviews, shopping results |
| 📍 **Local** | Business listings, maps, places |
| 📰 **News** | Real-time news with Google News integration |
| 🖼 **Images** | Google Images search |
| ⚡ **Speed** | Fastest response times (~200-400ms) |
#### Best Use Cases
- ✅ Product specifications and comparisons
- ✅ Shopping and price lookups
- ✅ Local business searches ("restaurants near me")
- ✅ Quick factual queries (weather, conversions, definitions)
- ✅ News headlines and current events
- ✅ Image searches
- ✅ When you need "what Google shows"
#### Getting Your API Key
1. Go to [serper.dev](https://serper.dev)
2. Sign up with email or Google
3. Copy your API key from the dashboard
4. Set `SERPER_API_KEY` environment variable
---
### Tavily (Research Search)
**What it is:** AI-optimized search engine built for research and RAG applications — returns synthesized answers plus full content.
#### Strengths
| Strength | Description |
|----------|-------------|
| 📚 **Research Quality** | Optimized for comprehensive, accurate research |
| 💬 **AI Answers** | Returns synthesized answers, not just links |
| 📄 **Full Content** | Can return complete page content (raw_content) |
| 🎯 **Domain Filtering** | Include/exclude specific domains |
| 🔬 **Deep Mode** | Advanced search for thorough research |
| 📰 **Topic Modes** | Specialized for general vs news content |
#### Best Use Cases
- ✅ Research questions requiring synthesized answers
- ✅ Academic or technical deep dives
- ✅ When you need actual page content (not just snippets)
- ✅ Multi-source information comparison
- ✅ Domain-specific research (filter to authoritative sources)
- ✅ News research with context
- ✅ RAG/LLM applications
#### Getting Your API Key
1. Go to [tavily.com](https://tavily.com)
2. Sign up and verify email
3. Navigate to API Keys section
4. Generate and copy your key
5. Set `TAVILY_API_KEY` environment variable
---
### Exa (Neural Search)
**What it is:** Neural/semantic search engine that understands meaning, not just keywords — finds conceptually similar content.
#### Strengths
| Strength | Description |
|----------|-------------|
| 🧠 **Semantic Understanding** | Finds results by meaning, not keywords |
| 🔗 **Similar Pages** | Find pages similar to a reference URL |
| 🏢 **Company Discovery** | Excellent for finding startups, companies |
| 📑 **Category Filters** | Filter by type (company, paper, tweet, etc.) |
| 📅 **Date Filtering** | Precise date range searches |
| 🎓 **Academic** | Great for research papers and technical content |
#### Best Use Cases
- ✅ Conceptual queries ("companies building X")
- ✅ Finding similar companies or pages
- ✅ Startup and company discovery
- ✅ Research paper discovery
- ✅ Finding GitHub projects
- ✅ Date-filtered searches for recent content
- ✅ When keyword matching fails
#### Getting Your API Key
1. Go to [exa.ai](https://exa.ai)
2. Sign up with email or Google
3. Navigate to API section in dashboard
4. Copy your API key
5. Set `EXA_API_KEY` environment variable
---
### SearXNG (Privacy-First Meta-Search)
**What it is:** Open-source, self-hosted meta-search engine that aggregates results from 70+ search engines without tracking.
#### Strengths
| Strength | Description |
|----------|-------------|
| 🔒 **Privacy-First** | No tracking, no profiling, no data collection |
| 🌐 **Multi-Engine** | Aggregates Google, Bing, DuckDuckGo, and 70+ more |
| 💰 **Free** | $0 API cost (self-hosted, unlimited queries) |
| 🎯 **Diverse Results** | Get perspectives from multiple search engines |
| ⚙ **Customizable** | Choose which engines to use, SafeSearch, language |
| 🏠 **Self-Hosted** | Full control over your search infrastructure |
#### Best Use Cases
- ✅ Privacy-sensitive searches (no tracking)
- ✅ When you want diverse results from multiple engines
- ✅ Budget-conscious (no API fees)
- ✅ Self-hosted/air-gapped environments
- ✅ Fallback when paid APIs are rate-limited
- ✅ When "aggregate everything" is the goal
#### Setting Up Your Instance
```bash
# Docker (recommended, 5 minutes)
docker run -d -p 8080:8080 searxng/searxng
# Enable JSON API in settings.yml:
# search:
# formats: [html, json]
```
1. See [docs.searxng.org](https://docs.searxng.org/admin/installation.html)
2. Deploy via Docker, pip, or your preferred method
3. Enable JSON format in `settings.yml`
4. Set `SEARXNG_INSTANCE_URL` environment variable
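Once JSON is enabled, the instance can be queried directly over HTTP. A minimal client sketch — SearXNG's JSON response returns each hit with `title`, `url`, and `content`:

```python
# Minimal client for a self-hosted SearXNG instance's JSON API.
# Assumes `json` is enabled under search.formats in settings.yml.
import json
import urllib.parse
import urllib.request

def build_search_url(instance_url: str, query: str) -> str:
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{instance_url.rstrip('/')}/search?{params}"

def parse_results(data: dict, max_results: int = 5) -> list:
    # SearXNG returns each hit with "title", "url", and "content"
    return [
        {"title": r.get("title"), "url": r.get("url"), "snippet": r.get("content")}
        for r in data.get("results", [])[:max_results]
    ]

def searxng_search(instance_url: str, query: str, max_results: int = 5) -> list:
    with urllib.request.urlopen(build_search_url(instance_url, query), timeout=10) as resp:
        return parse_results(json.load(resp), max_results)
```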
---
## Usage Examples
### Auto-Routed Searches (Recommended)
```bash
# Just search — the skill picks the best provider
python3 scripts/search.py -q "Tesla Model 3 price"
python3 scripts/search.py -q "how do neural networks learn"
python3 scripts/search.py -q "YC startups like Stripe"
python3 scripts/search.py -q "search privately without tracking"
```
### Serper Options
```bash
# Different search types
python3 scripts/search.py -p serper -q "gaming monitor" --type shopping
python3 scripts/search.py -p serper -q "coffee shop" --type places
python3 scripts/search.py -p serper -q "AI news" --type news
# With time filter
python3 scripts/search.py -p serper -q "OpenAI news" --time-range day
# Include images
python3 scripts/search.py -p serper -q "iPhone 16 Pro" --images
# Different locale
python3 scripts/search.py -p serper -q "Wetter Wien" --country at --language de
```
### Tavily Options
```bash
# Deep research mode
python3 scripts/search.py -p tavily -q "quantum computing applications" --depth advanced
# With full page content
python3 scripts/search.py -p tavily -q "transformer architecture" --raw-content
# Domain filtering
python3 scripts/search.py -p tavily -q "AI research" --include-domains arxiv.org nature.com
```
### Exa Options
```bash
# Category filtering
python3 scripts/search.py -p exa -q "AI startups Series A" --category company
python3 scripts/search.py -p exa -q "attention mechanism" --category "research paper"
# Date filtering
python3 scripts/search.py -p exa -q "YC companies" --start-date 2024-01-01
# Find similar pages
python3 scripts/search.py -p exa --similar-url "https://stripe.com" --category company
```
### SearXNG Options
```bash
# Basic search
python3 scripts/search.py -p searxng -q "linux distros"
# Specific engines only
python3 scripts/search.py -p searxng -q "AI news" --engines "google,bing,duckduckgo"
# SafeSearch (0=off, 1=moderate, 2=strict)
python3 scripts/search.py -p searxng -q "privacy tools" --searxng-safesearch 2
# With time filter
python3 scripts/search.py -p searxng -q "open source projects" --time-range week
# Custom instance URL
python3 scripts/search.py -p searxng -q "test" --searxng-url "http://localhost:8080"
```
---
## Workflow Examples
### 🛒 Product Research Workflow
```bash
# Step 1: Get product specs (auto-routed to Serper)
python3 scripts/search.py -q "MacBook Pro M3 Max specs"
# Step 2: Check prices (auto-routed to Serper)
python3 scripts/search.py -q "MacBook Pro M3 Max price comparison"
# Step 3: In-depth reviews (auto-routed to Tavily)
python3 scripts/search.py -q "detailed MacBook Pro M3 Max review"
```
### 📚 Academic Research Workflow
```bash
# Step 1: Understand the topic (auto-routed to Tavily)
python3 scripts/search.py -q "explain transformer architecture in deep learning"
# Step 2: Find recent papers (Exa)
python3 scripts/search.py -p exa -q "transformer improvements" --category "research paper" --start-date 2024-01-01
# Step 3: Find implementations (Exa)
python3 scripts/search.py -p exa -q "transformer implementation" --category github
```
### 🏢 Competitive Analysis Workflow
```bash
# Step 1: Find competitors (auto-routed to Exa)
python3 scripts/search.py -q "companies like Notion"
# Step 2: Find similar products (Exa)
python3 scripts/search.py -p exa --similar-url "https://notion.so" --category company
# Step 3: Deep dive comparison (Tavily)
python3 scripts/search.py -p tavily -q "Notion vs Coda comparison" --depth advanced
```
---
## Optimization Tips
### Cost Optimization
| Tip | Savings |
|-----|---------|
| Use SearXNG for routine queries | **$0 API cost** |
| Use auto-routing (defaults to Serper, cheapest paid) | Best value |
| Use Tavily `basic` before `advanced` | ~50% cost reduction |
| Set appropriate `max_results` | Linear cost savings |
| Use Exa only for semantic queries | Avoid waste |
### Performance Optimization
| Tip | Impact |
|-----|--------|
| Serper is fastest (~200ms) | Use for time-sensitive queries |
| Tavily `basic` faster than `advanced` | ~2x faster |
| Lower `max_results` = faster response | Linear improvement |
---
## FAQ & Troubleshooting
### General Questions
**Q: Do I need API keys for every provider?**
> No. You only need keys for providers you want to use. Auto-routing skips providers without keys.
**Q: Which provider should I start with?**
> Serper — it's the fastest, cheapest, and has the largest free tier (2,500 queries).
**Q: Can I use multiple providers in one workflow?**
> Yes! That's the recommended approach. See [Workflow Examples](#workflow-examples).
**Q: How do I reduce API costs?**
> Use auto-routing (defaults to cheapest), start with lower `max_results`, use Tavily `basic` before `advanced`.
### Auto-Routing Questions
**Q: Why did my query go to the wrong provider?**
> Use `--explain-routing` to debug. Add custom keywords to config.json if needed.
**Q: Can I add my own keywords?**
> Yes! Edit `config.json` → `auto_routing.keyword_mappings`.
**Q: How does keyword scoring work?**
> Multi-word phrases get higher weights. "companies like" (2 words) scores higher than "like" (1 word).
**Q: What if no keywords match?**
> Uses the fallback provider (default: Serper).
**Q: Can I force a specific provider?**
> Yes, use `-p` with any provider name: `-p serper`, `-p tavily`, `-p querit`, `-p exa`, `-p perplexity`, `-p you`, or `-p searxng`.
### Troubleshooting
**Error: "Missing API key"**
```bash
# Check if key is set
echo $SERPER_API_KEY
# Set it
export SERPER_API_KEY="your-key"
```
**Error: "API Error (401)"**
> Your API key is invalid or expired. Generate a new one.
**Error: "API Error (429)"**
> Rate limited. Wait and retry, or upgrade your plan.
**Empty results?**
> Try a different provider, broaden your query, or remove restrictive filters.
**Slow responses?**
> Reduce `max_results`, use Tavily `basic`, or use Serper (fastest).
---
## API Reference
### Output Format
All providers return unified JSON:
```json
{
"provider": "serper|tavily|querit|exa|perplexity|you|searxng",
"query": "original search query",
"results": [
{
"title": "Page Title",
"url": "https://example.com/page",
"snippet": "Content excerpt...",
"score": 0.95,
"date": "2024-01-15",
"raw_content": "Full page content (Tavily only)"
}
],
"images": ["url1", "url2"],
"answer": "Synthesized answer",
"knowledge_graph": { },
"routing": {
"auto_routed": true,
"selected_provider": "serper",
"reason": "matched_keywords (score=1)",
"matched_keywords": ["price"]
}
}
```
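As an illustration of the mapping, a hypothetical normalizer for a Serper-style response might look like this (field names on the raw side are assumptions about that provider's payload, shown for illustration only):

```python
# Hypothetical normalizer: raw provider payload -> unified schema above.
# Raw field names ("organic", "link", "answerBox", ...) are assumptions.
def normalize_serper(query: str, raw: dict) -> dict:
    return {
        "provider": "serper",
        "query": query,
        "results": [
            {"title": r.get("title"), "url": r.get("link"), "snippet": r.get("snippet")}
            for r in raw.get("organic", [])
        ],
        "answer": raw.get("answerBox", {}).get("answer"),
        "knowledge_graph": raw.get("knowledgeGraph", {}),
    }
```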
### CLI Options Reference
| Option | Providers | Description |
|--------|-----------|-------------|
| `-q, --query` | All | Search query |
| `-p, --provider` | All | Provider: auto, serper, tavily, querit, exa, perplexity, you, searxng |
| `-n, --max-results` | All | Max results (default: 5) |
| `--auto` | All | Force auto-routing |
| `--explain-routing` | All | Debug auto-routing |
| `--images` | Serper, Tavily | Include images |
| `--country` | Serper, You | Country code (default: us) |
| `--language` | Serper, SearXNG | Language code (default: en) |
| `--type` | Serper | search/news/images/videos/places/shopping |
| `--time-range` | Serper, SearXNG | hour/day/week/month/year |
| `--depth` | Tavily | basic/advanced |
| `--topic` | Tavily | general/news |
| `--raw-content` | Tavily | Include full page content |
| `--querit-base-url` | Querit | Override Querit API base URL |
| `--querit-base-path` | Querit | Override Querit API path |
| `--exa-type` | Exa | neural/keyword |
| `--category` | Exa | company/research paper/news/pdf/github/tweet |
| `--start-date` | Exa | Start date (YYYY-MM-DD) |
| `--end-date` | Exa | End date (YYYY-MM-DD) |
| `--similar-url` | Exa | Find similar pages |
| `--searxng-url` | SearXNG | Instance URL |
| `--searxng-safesearch` | SearXNG | 0=off, 1=moderate, 2=strict |
| `--engines` | SearXNG | Specific engines (google,bing,duckduckgo) |
| `--categories` | SearXNG | Search categories (general,images,news) |
| `--include-domains` | Tavily, Exa | Only these domains |
| `--exclude-domains` | Tavily, Exa | Exclude these domains |
| `--compact` | All | Compact JSON output |
---
## License
MIT
---
## Links
- [Serper](https://serper.dev) — Google Search API
- [Tavily](https://tavily.com) — AI Research Search
- [Exa](https://exa.ai) — Neural Search
- [ClawHub](https://clawhub.ai) — OpenClaw Skills

---
name: web-search-plus
version: 2.9.2
description: "Unified search skill with intelligent auto-routing."
tags: [search, web-search, serper, tavily, querit, exa, perplexity, you, searxng, google, multilingual-search, research, semantic-search, auto-routing, multi-provider, shopping, rag, free-tier, privacy, self-hosted, kilo]
metadata: {"openclaw":{"requires":{"bins":["python3","bash"],"env":{"SERPER_API_KEY":"optional","TAVILY_API_KEY":"optional","QUERIT_API_KEY":"optional","EXA_API_KEY":"optional","YOU_API_KEY":"optional","SEARXNG_INSTANCE_URL":"optional","KILOCODE_API_KEY":"optional — required for Perplexity provider (via Kilo Gateway)"},"note":"Only ONE provider key needed. All are optional."}}}
---
# Web Search Plus
**Stop choosing search providers. Let the skill do it for you.**
This skill connects you to 7 search providers (Serper, Tavily, Querit, Exa, Perplexity, You.com, SearXNG) and automatically picks the best one for each query. Shopping question? → Google results. Research question? → Deep research engine. Need a direct answer? → AI-synthesized with citations. Want privacy? → Self-hosted option.
---
## ✨ What Makes This Different?
- **Just search** — No need to think about which provider to use
- **Smart routing** — Analyzes your query and picks the best provider automatically
- **7 providers, 1 interface** — Google results, research engines, neural search, AI answers with citations, RAG-optimized, and privacy-first all in one
- **Works with just 1 key** — Start with any single provider, add more later
- **Free options available** — SearXNG is completely free (self-hosted)
---
## 🚀 Quick Start
```bash
# Interactive setup (recommended for first run)
python3 scripts/setup.py
# Or manual: copy config and add your keys
cp config.example.json config.json
```
The wizard explains each provider, collects API keys, and configures defaults.
---
## 🔑 API Keys
You only need **ONE** key to get started. Add more providers later for better coverage.
| Provider | Free Tier | Best For | Sign Up |
|----------|-----------|----------|---------|
| **Serper** | 2,500/mo | Shopping, prices, local, news | [serper.dev](https://serper.dev) |
| **Tavily** | 1,000/mo | Research, explanations, academic | [tavily.com](https://tavily.com) |
| **Querit** | Contact sales/free tier varies | Multilingual AI search, international updates | [querit.ai](https://querit.ai) |
| **Exa** | 1,000/mo | "Similar to X", startups, papers | [exa.ai](https://exa.ai) |
| **Perplexity** | Via Kilo | Direct answers with citations | [kilo.ai](https://kilo.ai) |
| **You.com** | Limited | Real-time info, AI/RAG context | [api.you.com](https://api.you.com) |
| **SearXNG** | **FREE** ✅ | Privacy, multi-source, $0 cost | Self-hosted |
**Setting your keys:**
```bash
# Option A: .env file (recommended)
export SERPER_API_KEY="your-key"
export TAVILY_API_KEY="your-key"
export QUERIT_API_KEY="your-key"
# Option B: config.json
{ "serper": { "api_key": "your-key" } }
```
---
## 🎯 When to Use Which Provider
| I want to... | Provider | Example Query |
|--------------|----------|---------------|
| Find product prices | **Serper** | "iPhone 16 Pro Max price" |
| Find restaurants/stores nearby | **Serper** | "best pizza near me" |
| Understand how something works | **Tavily** | "how does HTTPS encryption work" |
| Do deep research | **Tavily** | "climate change research 2024" |
| Search across languages / international updates | **Querit** | "latest AI policy updates in Germany" |
| Find companies like X | **Exa** | "startups similar to Notion" |
| Find research papers | **Exa** | "transformer architecture papers" |
| Get a direct answer with sources | **Perplexity** | "events in Berlin this weekend" |
| Know the current status of something | **Perplexity** | "what is the status of Ethereum upgrades" |
| Get real-time info | **You.com** | "latest AI regulation news" |
| Search without being tracked | **SearXNG** | anything, privately |
**Pro tip:** Just search normally! Auto-routing handles most queries correctly. Override with `-p provider` when needed.
---
## 🧠 How Auto-Routing Works
The skill looks at your query and picks the best provider:
```bash
"iPhone 16 price" → Serper (shopping keywords)
"how does quantum computing work" → Tavily (research question)
"latest AI policy updates in Germany" → Querit (multilingual + recency)
"companies like stripe.com" → Exa (URL detected, similarity)
"events in Graz this weekend" → Perplexity (local + direct answer)
"latest news on AI" → You.com (real-time intent)
"search privately" → SearXNG (privacy keywords)
```
**What if it picks wrong?** Override it: `python3 scripts/search.py -p tavily -q "your query"`
**Debug routing:** `python3 scripts/search.py --explain-routing -q "your query"`
---
## 📖 Usage Examples
### Let Auto-Routing Choose (Recommended)
```bash
python3 scripts/search.py -q "Tesla Model 3 price"
python3 scripts/search.py -q "explain machine learning"
python3 scripts/search.py -q "latest AI policy updates in Germany"
python3 scripts/search.py -q "startups like Figma"
```
### Force a Specific Provider
```bash
python3 scripts/search.py -p serper -q "weather Berlin"
python3 scripts/search.py -p tavily -q "quantum computing" --depth advanced
python3 scripts/search.py -p querit -q "latest AI policy updates in Germany"
python3 scripts/search.py -p exa --similar-url "https://stripe.com" --category company
python3 scripts/search.py -p you -q "breaking tech news" --include-news
python3 scripts/search.py -p searxng -q "linux distros" --engines "google,bing"
```
---
## ⚙ Configuration
```json
{
"auto_routing": {
"enabled": true,
"fallback_provider": "serper",
"confidence_threshold": 0.3,
"disabled_providers": []
},
"serper": {"country": "us", "language": "en"},
"tavily": {"depth": "advanced"},
"exa": {"type": "neural"},
"you": {"country": "US", "include_news": true},
"searxng": {"instance_url": "https://your-instance.example.com"}
}
```
---
## 📊 Provider Comparison
| Feature | Serper | Tavily | Exa | Perplexity | You.com | SearXNG |
|---------|:------:|:------:|:---:|:----------:|:-------:|:-------:|
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ | ⚡⚡ |
| Direct Answers | ✗ | ✗ | ✗ | ✓✓ | ✗ | ✗ |
| Citations | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Factual Accuracy | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Semantic Understanding | ⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐ |
| Full Page Content | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Shopping/Local | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Find Similar Pages | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RAG-Optimized | ✗ | ✓ | ✗ | ✗ | ✓✓ | ✗ |
| Privacy-First | ✗ | ✗ | ✗ | ✗ | ✗ | ✓✓ |
| API Cost | $$ | $$ | $$ | Via Kilo | $ | **FREE** |
---
## ❓ Common Questions
### Do I need API keys for all providers?
**No.** You only need keys for providers you want to use. Start with one (Serper recommended), add more later.
### Which provider should I start with?
**Serper** — fastest, cheapest, largest free tier (2,500 queries/month), and handles most queries well.
### What if I run out of free queries?
The skill automatically falls back to your other configured providers. Or switch to SearXNG (unlimited, self-hosted).
### How much does this cost?
- **Free tiers:** 2,500 (Serper) + 1,000 (Tavily) + 1,000 (Exa) = 4,500+ free searches/month
- **SearXNG:** Completely free (just ~$5/mo if you self-host on a VPS)
- **Paid plans:** Start around $10-50/month depending on provider
### Is SearXNG really private?
**Yes, if self-hosted.** You control the server, no tracking, no profiling. Public instances depend on the operator's policy.
### How do I set up SearXNG?
```bash
# Docker (5 minutes)
docker run -d -p 8080:8080 searxng/searxng
```
Then enable JSON API in `settings.yml`. See [docs.searxng.org](https://docs.searxng.org/admin/installation.html).
### Why did it route my query to the "wrong" provider?
Sometimes queries are ambiguous. Use `--explain-routing` to see why, then override with `-p provider` if needed.
---
## 🔄 Automatic Fallback
If one provider fails (rate limit, timeout, error), the skill automatically tries the next provider. You'll see `routing.fallback_used: true` in the response when this happens.
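In outline, the chain works like this (a simplified sketch, not the actual script). Note the catchable `ProviderConfigError`: raising `SystemExit` here would escape the `except Exception` handler and kill the whole chain — the bug fixed in v2.9.0:

```python
# Simplified sketch of the provider fallback chain.
class ProviderConfigError(Exception):
    """Raised when a provider has no API key configured (catchable,
    unlike SystemExit, so the chain can move on to the next provider)."""

def search_with_fallback(query, providers, search_fns):
    errors = {}
    for name in providers:
        try:
            result = search_fns[name](query)
            result.setdefault("routing", {})["fallback_used"] = name != providers[0]
            return result
        except Exception as exc:  # ProviderConfigError, rate limits, timeouts...
            errors[name] = str(exc)
    raise RuntimeError(f"All providers failed: {errors}")
```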
---
## 📤 Output Format
```json
{
"provider": "serper",
"query": "iPhone 16 price",
"results": [{"title": "...", "url": "...", "snippet": "...", "score": 0.95}],
"routing": {
"auto_routed": true,
"provider": "serper",
"confidence": 0.78,
"confidence_level": "high"
}
}
```
---
## ⚠ Important Note
**Tavily, Serper, and Exa are NOT core OpenClaw providers.**
❌ Don't modify `~/.openclaw/openclaw.json` for these
✅ Use this skill's scripts — keys auto-load from `.env`
---
## 🔒 Security
**SearXNG SSRF Protection:** The SearXNG instance URL is validated with defense-in-depth:
- Enforces `http`/`https` schemes only
- Blocks cloud metadata endpoints (169.254.169.254, metadata.google.internal)
- Resolves hostnames and blocks private/internal IPs (loopback, RFC1918, link-local, reserved)
- Operators who intentionally self-host on private networks can set `SEARXNG_ALLOW_PRIVATE=1`
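The checks above can be sketched as follows (illustrative, not the exact implementation):

```python
# Illustrative sketch of the SSRF checks described above.
import ipaddress
import os
import socket
from urllib.parse import urlparse

BLOCKED_HOSTS = {"metadata.google.internal"}

def validate_instance_url(url: str) -> None:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("only http/https schemes are allowed")
    host = parsed.hostname or ""
    if host in BLOCKED_HOSTS:
        raise ValueError("cloud metadata endpoint blocked")
    if os.environ.get("SEARXNG_ALLOW_PRIVATE") == "1":
        return  # operator explicitly allows private networks
    # Resolve and reject loopback, RFC1918, link-local (incl. 169.254.169.254),
    # and reserved ranges.
    for info in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_reserved:
            raise ValueError(f"private/internal address blocked: {ip}")
```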
## 📚 More Documentation
- **[FAQ.md](FAQ.md)** — Detailed answers to more questions
- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)** — Fix common errors
- **[README.md](README.md)** — Full technical reference
---
## 🔗 Quick Links
- [Serper](https://serper.dev) — Google Search API
- [Tavily](https://tavily.com) — AI Research Search
- [Exa](https://exa.ai) — Neural Search
- [Perplexity](https://www.perplexity.ai) — AI-Synthesized Answers (via [Kilo Gateway](https://kilo.ai))
- [You.com](https://api.you.com) — RAG/Real-time Search
- [SearXNG](https://docs.searxng.org) — Privacy-First Meta-Search

# Troubleshooting Guide
## Caching Issues (v2.7.0+)
### Cache not working / always fetching fresh
**Symptoms:**
- Every request hits the API
- `"cached": false` even for repeated queries
**Solutions:**
1. Check cache directory exists and is writable:
```bash
ls -la .cache/ # Should exist in skill directory
```
2. Verify `--no-cache` isn't being passed
3. Check disk space isn't full
4. Ensure query is EXACTLY the same (including provider and max_results)
### Stale results from cache
**Symptoms:**
- Getting outdated information
- Cache TTL seems too long
**Solutions:**
1. Use `--no-cache` to force fresh results
2. Reduce TTL: `--cache-ttl 1800` (30 minutes)
3. Clear cache: `python3 scripts/search.py --clear-cache`
### Cache growing too large
**Symptoms:**
- Disk space filling up
- Many .json files in `.cache/`
**Solutions:**
1. Clear cache periodically:
```bash
python3 scripts/search.py --clear-cache
```
2. Set up a cron job to clear weekly
3. Use a smaller TTL so entries expire faster
### "Permission denied" when caching
**Symptoms:**
- Cache write errors in stderr
- Searches work but don't cache
**Solutions:**
1. Check directory permissions: `chmod 755 .cache/`
2. Use custom cache dir: `export WSP_CACHE_DIR="$TMP_DIR/wsp-cache"`
---
## Common Issues
### "No API key found" error
**Symptoms:**
```
Error: No API key found for serper
```
**Solutions:**
1. Check `.env` exists in skill folder with `export VAR=value` format
2. Keys auto-load from skill's `.env` since v2.2.0
3. Or set in system environment: `export SERPER_API_KEY="..."`
4. Verify key format in config.json:
```json
{ "serper": { "api_key": "your-key" } }
```
**Priority order:** config.json > .env > environment variable
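The precedence above can be sketched as a small resolver. The function name and argument shapes are illustrative, not the skill's actual internals:

```python
import os
from typing import Optional

def resolve_api_key(provider: str, config: dict, dotenv_vars: dict) -> Optional[str]:
    """Resolve a key using the documented precedence:
    config.json > .env > system environment. (Illustrative sketch.)"""
    env_var = f"{provider.upper()}_API_KEY"          # e.g. SERPER_API_KEY
    return (
        config.get(provider, {}).get("api_key")      # 1. config.json
        or dotenv_vars.get(env_var)                  # 2. parsed .env file
        or os.environ.get(env_var)                   # 3. system environment
    )
```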
---
### Getting empty results
**Symptoms:**
- Search returns no results
- `"results": []` in JSON output
**Solutions:**
1. Check API key is valid (try the provider's web dashboard)
2. Try a different provider with `-p`
3. Some queries have no results (very niche topics)
4. Check if provider is rate-limited
5. Verify internet connectivity
**Debug:**
```bash
python3 scripts/search.py -q "test query" --verbose
```
---
### Rate limited
**Symptoms:**
```
Error: 429 Too Many Requests
Error: Rate limit exceeded
```
**Good news:** Since v2.2.5, automatic fallback kicks in! If one provider hits rate limits, the script automatically tries the next provider.
**Solutions:**
1. Wait for rate limit to reset (usually 1 hour or end of day)
2. Use a different provider: `-p tavily` instead of `-p serper`
3. Check free tier limits:
- Serper: 2,500 free total
- Tavily: 1,000/month free
- Exa: 1,000/month free
4. Upgrade to paid tier for higher limits
5. Use SearXNG (self-hosted, unlimited)
**Fallback info:** The response includes `routing.fallback_used: true` whenever a fallback provider handled the query.
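The v2.2.5+ fallback behaves roughly like the loop below. `ProviderConfigError` mirrors the catchable error introduced in v2.9.0 (replacing `sys.exit(1)`, whose `SystemExit` escaped the `except Exception` handler); `run_search` is a hypothetical stand-in for the real per-provider call:

```python
class ProviderConfigError(Exception):
    """Catchable 'provider not configured' error (unlike SystemExit)."""

def search_with_fallback(query, providers, run_search):
    """Try each provider in order; return (provider, results) from the
    first one that succeeds."""
    errors = {}
    for provider in providers:
        try:
            return provider, run_search(provider, query)
        except ProviderConfigError as exc:   # no API key -> try the next one
            errors[provider] = f"not configured: {exc}"
        except Exception as exc:             # 429, timeout, bad JSON, ...
            errors[provider] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

Because `ProviderConfigError` subclasses `Exception`, an unconfigured provider is skipped like any other transient failure instead of killing the process.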
---
### SearXNG: "403 Forbidden"
**Symptoms:**
```
Error: 403 Forbidden
Error: JSON format not allowed
```
**Cause:** Most public SearXNG instances disable JSON API to prevent bot abuse.
**Solution:** Self-host your own instance:
```bash
docker run -d -p 8080:8080 searxng/searxng
```
Then enable JSON in `settings.yml`:
```yaml
search:
formats:
- html
- json # Add this!
```
Restart the container and update your config:
```json
{
"searxng": {
"instance_url": "http://localhost:8080"
}
}
```
---
### SearXNG: Slow responses
**Symptoms:**
- SearXNG takes 2-5 seconds
- Other providers are faster
**Explanation:** This is expected behavior. SearXNG queries 70+ upstream engines in parallel, which takes longer than direct API calls.
**Trade-off:** Slower but privacy-preserving + multi-source + $0 cost.
**Solutions:**
1. Accept the trade-off for privacy benefits
2. Limit engines for faster results:
```bash
python3 scripts/search.py -p searxng -q "query" --engines "google,bing"
```
3. Use SearXNG as fallback (put last in priority list)
---
### Auto-routing picks wrong provider
**Symptoms:**
- Query about research goes to Serper
- Query about shopping goes to Tavily
**Debug:**
```bash
python3 scripts/search.py --explain-routing -q "your query"
```
This shows the full analysis:
```json
{
"query": "how much does iPhone 16 Pro cost",
"routing_decision": {
"provider": "serper",
"confidence": 0.68,
"reason": "moderate_confidence_match"
},
"scores": {"serper": 7.0, "tavily": 0.0, "exa": 0.0},
"top_signals": [
{"matched": "how much", "weight": 4.0},
{"matched": "brand + product detected", "weight": 3.0}
]
}
```
**Solutions:**
1. Override with explicit provider: `-p tavily`
2. Rephrase query to be more explicit about intent
3. Adjust `confidence_threshold` in config.json (default: 0.3)
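Conceptually, routing sums keyword weights per provider and falls back when the signal is weak. The sketch below is illustrative — the weights, the confidence formula, and the real router's multi-signal analysis all differ:

```python
def route(query, keyword_weights, threshold=0.3, fallback="serper"):
    """Pick the provider whose keyword phrases best match the query.
    keyword_weights: {provider: {phrase: weight}} with made-up numbers."""
    q = query.lower()
    scores = {
        provider: sum(w for phrase, w in phrases.items() if phrase in q)
        for provider, phrases in keyword_weights.items()
    }
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = scores[best] / total if total else 0.0
    if scores[best] == 0 or confidence < threshold:
        return fallback, confidence          # weak signal -> fallback provider
    return best, confidence

weights = {
    "serper": {"price": 4.0, "buy": 3.0, "near me": 3.0},
    "tavily": {"how does": 4.0, "explain": 3.0},
}
assert route("how does TCP work", weights)[0] == "tavily"
assert route("zzz unrelated zzz", weights)[0] == "serper"   # falls back
```

Raising `confidence_threshold` makes the router more conservative (more queries go to the fallback provider); lowering it makes keyword matches win more often.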
---
### Config not loading
**Symptoms:**
- Changes to config.json not applied
- Using default values instead
**Solutions:**
1. Check JSON syntax (use a validator)
2. Ensure file is in skill directory: `/path/to/skills/web-search-plus/config.json`
3. Check file permissions
4. Run setup wizard to regenerate:
```bash
python3 scripts/setup.py --reset
```
**Validate JSON:**
```bash
python3 -m json.tool config.json
```
---
### Python dependencies missing
**Symptoms:**
```
ModuleNotFoundError: No module named 'requests'
```
**Solution:**
```bash
pip3 install requests
```
Or install all dependencies:
```bash
pip3 install -r requirements.txt
```
---
### Timeout errors
**Symptoms:**
```
Error: Request timeout after 30s
```
**Causes:**
- Slow network connection
- Provider API issues
- SearXNG instance overloaded
**Solutions:**
1. Try again (temporary issue)
2. Switch provider: `-p serper`
3. Check your internet connection
4. If using SearXNG, check instance health
---
### Duplicate results
**Symptoms:**
- Same result appears multiple times
- Results overlap between providers
**Solution:** This is expected when using auto-fallback or multiple providers. The skill doesn't deduplicate across providers.
For single-provider results:
```bash
python3 scripts/search.py -p serper -q "query"
```
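If you do merge output from several providers yourself, a simple post-processing step is to keep the first result per normalized URL. A sketch (assumes each result dict has a `url` field, as in the skill's JSON output):

```python
from urllib.parse import urlsplit

def dedupe_results(results):
    """Keep the first result per normalized URL (www. prefix, trailing
    slash, and http/https differences are ignored)."""
    seen, unique = set(), []
    for result in results:
        parts = urlsplit(result["url"])
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        key = (host, parts.path.rstrip("/"))
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```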
---
## Debug Mode
For detailed debugging:
```bash
# Verbose output
python3 scripts/search.py -q "query" --verbose
# Show routing decision
python3 scripts/search.py -q "query" --explain-routing
# Dry run (no actual search)
python3 scripts/search.py -q "query" --dry-run
# Test specific provider
python3 scripts/search.py -p tavily -q "query" --verbose
```
---
## Getting Help
**Still stuck?**
1. Check the full documentation in `README.md`
2. Run the setup wizard: `python3 scripts/setup.py`
3. Review `FAQ.md` for common questions
4. Open an issue: https://github.com/robbyczgw-cla/web-search-plus/issues

6
_meta.json Normal file
View File

@@ -0,0 +1,6 @@
{
"ownerId": "kn73gpe8xz2630jrknkb3ya96h7zb84h",
"slug": "web-search-plus",
"version": "2.9.2",
"publishedAt": 1774629265049
}

265
config.example.json Normal file
View File

@@ -0,0 +1,265 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$comment": "Web Search Plus configuration — intelligent routing and provider settings",
"defaults": {
"provider": "serper",
"max_results": 5
},
"auto_routing": {
"enabled": true,
"fallback_provider": "serper",
"provider_priority": [
"tavily",
"querit",
"exa",
"perplexity",
"serper",
"you",
"searxng"
],
"disabled_providers": [],
"confidence_threshold": 0.3,
"keyword_mappings": {
"serper": [
"price",
"buy",
"shop",
"shopping",
"cost",
"deal",
"sale",
"purchase",
"cheap",
"expensive",
"store",
"product",
"review",
"specs",
"specification",
"where to buy",
"near me",
"local",
"restaurant",
"hotel",
"weather",
"news",
"latest",
"breaking",
"map",
"directions",
"phone number",
"preis",
"kaufen",
"bestellen",
"günstig",
"billig",
"teuer",
"kosten",
"angebot",
"rabatt",
"händler",
"geschäft",
"laden",
"test",
"bewertung",
"technische daten",
"spezifikationen",
"wo kaufen",
"in der nähe",
"wetter",
"nachrichten",
"aktuell",
"neu"
],
"tavily": [
"how does",
"how to",
"explain",
"research",
"what is",
"why does",
"analyze",
"compare",
"study",
"academic",
"detailed",
"comprehensive",
"in-depth",
"understand",
"learn",
"tutorial",
"guide",
"overview",
"history of",
"background",
"context",
"implications",
"pros and cons",
"wie funktioniert",
"erklärung",
"erklären",
"was ist",
"warum",
"analyse",
"vergleich",
"vergleichen",
"studie",
"verstehen",
"lernen",
"anleitung",
"überblick",
"hintergrund",
"vor- und nachteile"
],
"exa": [
"similar to",
"companies like",
"find sites like",
"alternatives to",
"competitors",
"startup",
"github",
"paper",
"research paper",
"arxiv",
"pdf",
"academic paper",
"similar pages",
"related sites",
"who else",
"other companies",
"comparable to",
"ähnlich wie",
"firmen wie",
"alternativen zu",
"konkurrenten",
"vergleichbar mit",
"andere unternehmen"
],
"you": [
"rag",
"context for",
"summarize",
"brief",
"quick overview",
"tldr",
"key points",
"key facts",
"main points",
"main takeaways",
"latest news",
"latest updates",
"current events",
"current situation",
"current status",
"right now",
"as of today",
"up to date",
"real time",
"what's happening",
"what's the latest",
"updates on",
"status of",
"zusammenfassung",
"aktuelle nachrichten",
"neueste updates"
],
"searxng": [
"private",
"privately",
"anonymous",
"anonymously",
"without tracking",
"no tracking",
"privacy",
"privacy-focused",
"privacy-first",
"duckduckgo alternative",
"private search",
"aggregate results",
"multiple sources",
"diverse results",
"diverse perspectives",
"meta search",
"all engines",
"free search",
"no api cost",
"self-hosted search",
"zero cost",
"privat",
"anonym",
"ohne tracking",
"datenschutz",
"verschiedene quellen",
"aus mehreren quellen",
"alle suchmaschinen",
"kostenlose suche",
"keine api kosten"
],
"querit": [
"multilingual",
"current status",
"latest updates",
"status of",
"real-time",
"summarize",
"global search",
"cross-language",
"international",
"aktuell",
"zusammenfassung"
],
"perplexity": [
"what is",
"current status",
"status of",
"what happened with",
"events in",
"things to do in"
]
}
},
"serper": {
"country": "us",
"language": "en",
"type": "search",
"autocorrect": true,
"include_images": false
},
"tavily": {
"depth": "advanced",
"topic": "general",
"max_results": 8
},
"exa": {
"type": "neural",
"category": null,
"include_domains": [],
"exclude_domains": []
},
"you": {
"country": "US",
"language": "en",
"safesearch": "moderate",
"include_news": true
},
"searxng": {
"$comment": "SearXNG requires a self-hosted instance. No API key needed, just your instance URL.",
"instance_url": null,
"safesearch": 0,
"engines": null,
"language": "en"
},
"querit_api_key": "",
"querit": {
"base_url": "https://api.querit.ai",
"base_path": "/v1/search",
"timeout": 10
},
"perplexity": {
"api_url": "https://api.kilo.ai/api/gateway/chat/completions",
"model": "perplexity/sonar-pro"
}
}

88
package.json Normal file
View File

@@ -0,0 +1,88 @@
{
"name": "@openclaw/web-search-plus",
  "version": "2.9.2",
"description": "Unified search skill with Intelligent Auto-Routing. Uses multi-signal analysis (intent classification, linguistic patterns, URL/brand detection) to automatically select between Serper (Google), Tavily (Research), Querit (Multilingual AI Search), Exa (Neural), Perplexity (AI Answers), You.com (RAG/Real-time), and SearXNG (Privacy/Self-hosted) with confidence scoring.",
"keywords": [
"openclaw",
"skill",
"search",
"web-search",
"serper",
"tavily",
"exa",
"you",
"you.com",
"google-search",
"research",
"semantic-search",
"ai-agent",
"auto-routing",
"smart-routing",
"multi-provider",
"shopping",
"product-search",
"similar-sites",
"company-discovery",
"rag",
"real-time",
"free-tier",
"api-aggregator",
"querit",
"multilingual-search"
],
"author": "robbyczgw-cla",
"license": "MIT",
"repository": {
"type": "git",
"url": "https://github.com/robbyczgw-cla/web-search-plus.git"
},
"homepage": "https://clawhub.ai/robbyczgw-cla/web-search-plus",
"bugs": {
"url": "https://github.com/robbyczgw-cla/web-search-plus/issues"
},
"openclaw": {
"skill": true,
"triggers": [
"search",
"find",
"look up",
"research"
],
"capabilities": [
"web-search",
"image-search",
"semantic-search",
"multi-provider"
],
"providers": [
"serper",
"tavily",
"querit",
"exa",
"perplexity",
"you",
"searxng"
],
"requirements": {
"bins": [
"python3",
"bash"
],
"env": {
"SERPER_API_KEY": "optional",
"TAVILY_API_KEY": "optional",
"EXA_API_KEY": "optional",
"YOU_API_KEY": "optional",
"SEARXNG_INSTANCE_URL": "optional",
"QUERIT_API_KEY": "optional",
"KILOCODE_API_KEY": "optional"
}
}
},
"files": [
"SKILL.md",
"README.md",
"scripts/",
".env.example"
]
}

2940
scripts/search.py Normal file

File diff suppressed because it is too large Load Diff

453
scripts/setup.py Normal file
View File

@@ -0,0 +1,453 @@
#!/usr/bin/env python3
"""
Web Search Plus - Interactive Setup Wizard
==========================================
Runs on first use (when no config.json exists) to configure providers and API keys.
Creates config.json with your settings. API keys are stored locally only.
Usage:
python3 scripts/setup.py # Interactive setup
python3 scripts/setup.py --reset # Reset and reconfigure
"""
import json
import os
import sys
from pathlib import Path
# ANSI colors for terminal output
class Colors:
HEADER = '\033[95m'
BLUE = '\033[94m'
CYAN = '\033[96m'
GREEN = '\033[92m'
YELLOW = '\033[93m'
RED = '\033[91m'
BOLD = '\033[1m'
DIM = '\033[2m'
RESET = '\033[0m'
def color(text: str, c: str) -> str:
"""Wrap text in color codes."""
return f"{c}{text}{Colors.RESET}"
def print_header():
"""Print the setup wizard header."""
print()
print(color("╔════════════════════════════════════════════════════════════╗", Colors.CYAN))
print(color("║ 🔍 Web Search Plus - Setup Wizard ║", Colors.CYAN))
print(color("╚════════════════════════════════════════════════════════════╝", Colors.CYAN))
print()
print(color("This wizard will help you configure your search providers.", Colors.DIM))
print(color("API keys are stored locally in config.json (gitignored).", Colors.DIM))
print()
def print_provider_info():
"""Print information about each provider."""
print(color("📚 Available Providers:", Colors.BOLD))
print()
providers = [
{
"name": "Serper",
"emoji": "🔎",
"best_for": "Google results, shopping, local businesses, news",
            "free_tier": "2,500 free queries (one-time)",
"signup": "https://serper.dev",
"strengths": ["Fastest response times", "Product prices & specs", "Knowledge Graph", "Local business data"]
},
{
"name": "Tavily",
"emoji": "📖",
"best_for": "Research, explanations, in-depth analysis",
"free_tier": "1,000 queries/month",
"signup": "https://tavily.com",
"strengths": ["AI-synthesized answers", "Full page content", "Domain filtering", "Academic research"]
},
{
"name": "Exa",
"emoji": "🧠",
"best_for": "Semantic search, finding similar content, discovery",
"free_tier": "1,000 queries/month",
"signup": "https://exa.ai",
"strengths": ["Neural/semantic understanding", "Similar page discovery", "Startup/company finder", "Date filtering"]
},
{
"name": "You.com",
"emoji": "🤖",
"best_for": "RAG applications, real-time info, LLM-ready snippets",
"free_tier": "Limited free tier",
"signup": "https://api.you.com",
"strengths": ["LLM-ready snippets", "Combined web + news", "Live page crawling", "Real-time information"]
},
{
"name": "SearXNG",
"emoji": "🔒",
"best_for": "Privacy-first search, multi-source aggregation, $0 API cost",
"free_tier": "FREE (self-hosted)",
"signup": "https://docs.searxng.org/admin/installation.html",
"strengths": ["Privacy-preserving (no tracking)", "70+ search engines", "Self-hosted = $0 API cost", "Diverse results"]
}
]
for p in providers:
print(f" {p['emoji']} {color(p['name'], Colors.BOLD)}")
print(f" Best for: {color(p['best_for'], Colors.GREEN)}")
print(f" Free tier: {p['free_tier']}")
print(f" Sign up: {color(p['signup'], Colors.BLUE)}")
print()
def ask_yes_no(prompt: str, default: bool = True) -> bool:
"""Ask a yes/no question."""
suffix = "[Y/n]" if default else "[y/N]"
while True:
response = input(f"{prompt} {color(suffix, Colors.DIM)}: ").strip().lower()
if response == "":
return default
if response in ("y", "yes"):
return True
if response in ("n", "no"):
return False
print(color(" Please enter 'y' or 'n'", Colors.YELLOW))
def ask_choice(prompt: str, options: list, default: str = None) -> str:
"""Ask user to choose from a list of options."""
print(f"\n{prompt}")
for i, opt in enumerate(options, 1):
        marker = color("→", Colors.GREEN) if opt == default else " "
print(f" {marker} {i}. {opt}")
while True:
hint = f" [default: {default}]" if default else ""
response = input(f"Enter number (1-{len(options)}){color(hint, Colors.DIM)}: ").strip()
if response == "" and default:
return default
try:
idx = int(response)
if 1 <= idx <= len(options):
return options[idx - 1]
except ValueError:
pass
print(color(f" Please enter a number between 1 and {len(options)}", Colors.YELLOW))
def ask_api_key(provider: str, signup_url: str) -> str:
"""Ask for an API key with validation."""
print()
print(f" {color(f'Get your {provider} API key:', Colors.DIM)} {color(signup_url, Colors.BLUE)}")
while True:
key = input(f" Enter your {provider} API key: ").strip()
if not key:
print(color(" ⚠️ No key entered. This provider will be disabled.", Colors.YELLOW))
return None
# Basic validation
if len(key) < 10:
print(color(" ⚠️ Key seems too short. Please check and try again.", Colors.YELLOW))
continue
# Mask key for confirmation
masked = key[:4] + "..." + key[-4:] if len(key) > 12 else key[:2] + "..."
print(color(f" ✓ Key saved: {masked}", Colors.GREEN))
return key
def ask_searxng_instance(docs_url: str) -> str:
"""Ask for SearXNG instance URL with connection test."""
print()
print(f" {color('SearXNG is self-hosted. You need your own instance.', Colors.DIM)}")
print(f" {color('Setup guide:', Colors.DIM)} {color(docs_url, Colors.BLUE)}")
print()
print(f" {color('Example URLs:', Colors.DIM)}")
print(f" • http://localhost:8080 (local Docker)")
print(f" • https://searx.your-domain.com (self-hosted)")
print()
while True:
url = input(f" Enter your SearXNG instance URL: ").strip()
if not url:
print(color(" ⚠️ No URL entered. SearXNG will be disabled.", Colors.YELLOW))
return None
# Basic URL validation
if not url.startswith(("http://", "https://")):
print(color(" ⚠️ URL must start with http:// or https://", Colors.YELLOW))
continue
# SSRF protection: validate URL before connecting
try:
import ipaddress
import socket
from urllib.parse import urlparse as _urlparse
_parsed = _urlparse(url)
_hostname = _parsed.hostname or ""
_blocked = {"169.254.169.254", "metadata.google.internal", "metadata.internal"}
if _hostname in _blocked:
print(color(f" ❌ Blocked: {_hostname} is a cloud metadata endpoint.", Colors.RED))
continue
if not os.environ.get("SEARXNG_ALLOW_PRIVATE", "").strip() == "1":
_resolved = socket.getaddrinfo(_hostname, _parsed.port or 80, proto=socket.IPPROTO_TCP)
for _fam, _t, _p, _cn, _sa in _resolved:
_ip = ipaddress.ip_address(_sa[0])
if _ip.is_loopback or _ip.is_private or _ip.is_link_local or _ip.is_reserved:
print(color(f" ❌ Blocked: {_hostname} resolves to private IP {_ip}.", Colors.RED))
print(color(f" Set SEARXNG_ALLOW_PRIVATE=1 if intentional.", Colors.DIM))
raise ValueError("private_ip")
except ValueError as _ve:
if str(_ve) == "private_ip":
continue
raise
except socket.gaierror:
print(color(f" ❌ Cannot resolve hostname: {_hostname}", Colors.RED))
continue
# Test connection
print(color(f" Testing connection to {url}...", Colors.DIM))
try:
import urllib.request
import urllib.error
test_url = f"{url.rstrip('/')}/search?q=test&format=json"
req = urllib.request.Request(
test_url,
headers={"User-Agent": "ClawdBot-WebSearchPlus/2.5", "Accept": "application/json"}
)
with urllib.request.urlopen(req, timeout=10) as response:
data = response.read().decode("utf-8")
import json
result = json.loads(data)
# Check if it looks like SearXNG JSON response
if "results" in result or "query" in result:
print(color(f" ✓ Connection successful! SearXNG instance is working.", Colors.GREEN))
return url.rstrip("/")
else:
print(color(f" ⚠️ Connected but response doesn't look like SearXNG JSON.", Colors.YELLOW))
if ask_yes_no(" Use this URL anyway?", default=False):
return url.rstrip("/")
except urllib.error.HTTPError as e:
if e.code == 403:
print(color(f" ⚠️ JSON API is disabled (403 Forbidden).", Colors.YELLOW))
print(color(f" Enable JSON in settings.yml: search.formats: [html, json]", Colors.DIM))
else:
print(color(f" ⚠️ HTTP error: {e.code} {e.reason}", Colors.YELLOW))
if ask_yes_no(" Try a different URL?", default=True):
continue
return None
except urllib.error.URLError as e:
print(color(f" ⚠️ Cannot reach instance: {e.reason}", Colors.YELLOW))
if ask_yes_no(" Try a different URL?", default=True):
continue
return None
except Exception as e:
print(color(f" ⚠️ Error: {e}", Colors.YELLOW))
if ask_yes_no(" Try a different URL?", default=True):
continue
return None
def ask_result_count() -> int:
"""Ask for default result count."""
options = ["3 (fast, minimal)", "5 (balanced - recommended)", "10 (comprehensive)"]
choice = ask_choice("Default number of results per search?", options, "5 (balanced - recommended)")
if "3" in choice:
return 3
elif "10" in choice:
return 10
return 5
def run_setup(skill_dir: Path, force_reset: bool = False):
"""Run the interactive setup wizard."""
config_path = skill_dir / "config.json"
example_path = skill_dir / "config.example.json"
# Check if config already exists
if config_path.exists() and not force_reset:
print(color("✓ config.json already exists!", Colors.GREEN))
print()
if not ask_yes_no("Do you want to reconfigure?", default=False):
print(color("Setup cancelled. Your existing config is unchanged.", Colors.DIM))
return False
print()
print_header()
print_provider_info()
# Load example config as base
if example_path.exists():
with open(example_path) as f:
config = json.load(f)
else:
config = {
"defaults": {"provider": "serper", "max_results": 5},
"auto_routing": {"enabled": True, "fallback_provider": "serper"},
"serper": {},
"tavily": {},
"exa": {}
}
# Remove any existing API keys from example
    for provider in ["serper", "tavily", "exa", "you", "searxng", "querit"]:
if provider in config:
config[provider].pop("api_key", None)
enabled_providers = []
# ===== Question 1: Which providers to enable =====
    print(color("─" * 60, Colors.DIM))
print(color("\n📋 Step 1: Choose Your Providers\n", Colors.BOLD))
print("Select which search providers you want to enable.")
print(color("(You need at least one API key to use this skill)", Colors.DIM))
print()
providers_info = {
"serper": ("Serper", "https://serper.dev", "Google results, shopping, local"),
"tavily": ("Tavily", "https://tavily.com", "Research, explanations, analysis"),
"exa": ("Exa", "https://exa.ai", "Semantic search, similar content"),
"you": ("You.com", "https://api.you.com", "RAG applications, real-time info"),
"searxng": ("SearXNG", "https://docs.searxng.org/admin/installation.html", "Privacy-first, self-hosted, $0 cost")
}
for provider, (name, url, desc) in providers_info.items():
print(f" {color(name, Colors.BOLD)}: {desc}")
# Special handling for SearXNG
if provider == "searxng":
print(color(" Note: SearXNG requires a self-hosted instance (no API key needed)", Colors.DIM))
if ask_yes_no(f" Do you have a SearXNG instance?", default=False):
instance_url = ask_searxng_instance(url)
if instance_url:
if "searxng" not in config:
config["searxng"] = {}
config["searxng"]["instance_url"] = instance_url
enabled_providers.append(provider)
else:
print(color(f"{name} disabled (no instance URL)", Colors.DIM))
else:
print(color(f"{name} skipped (no instance)", Colors.DIM))
else:
if ask_yes_no(f" Enable {name}?", default=True):
# ===== Question 2: API key for each enabled provider =====
api_key = ask_api_key(name, url)
if api_key:
config[provider]["api_key"] = api_key
enabled_providers.append(provider)
else:
print(color(f"{name} disabled (no API key)", Colors.DIM))
else:
print(color(f"{name} disabled", Colors.DIM))
print()
if not enabled_providers:
print()
print(color("⚠️ No providers enabled!", Colors.RED))
print("You need at least one API key to use web-search-plus.")
print("Run this setup again when you have an API key.")
return False
# ===== Question 3: Default provider =====
    print(color("─" * 60, Colors.DIM))
print(color("\n⚙️ Step 2: Default Settings\n", Colors.BOLD))
if len(enabled_providers) > 1:
default_provider = ask_choice(
"Which provider should be the default for general queries?",
enabled_providers,
enabled_providers[0]
)
else:
default_provider = enabled_providers[0]
print(f"Default provider: {color(default_provider, Colors.GREEN)} (only one enabled)")
config["defaults"]["provider"] = default_provider
config["auto_routing"]["fallback_provider"] = default_provider
# ===== Question 4: Auto-routing =====
print()
print(color("Auto-routing", Colors.BOLD) + " automatically picks the best provider for each query:")
print(color("'iPhone price' → Serper (shopping intent)", Colors.DIM))
print(color("'how does TCP work' → Tavily (research intent)", Colors.DIM))
print(color("'companies like Stripe' → Exa (discovery intent)", Colors.DIM))
print()
auto_routing = ask_yes_no("Enable auto-routing?", default=True)
config["auto_routing"]["enabled"] = auto_routing
if not auto_routing:
print(color(f" → All queries will use {default_provider}", Colors.DIM))
# ===== Question 5: Result count =====
print()
max_results = ask_result_count()
config["defaults"]["max_results"] = max_results
# Set disabled providers
all_providers = ["serper", "tavily", "exa", "you", "searxng"]
disabled = [p for p in all_providers if p not in enabled_providers]
config["auto_routing"]["disabled_providers"] = disabled
# ===== Save config =====
print()
    print(color("─" * 60, Colors.DIM))
print(color("\n💾 Saving Configuration\n", Colors.BOLD))
with open(config_path, 'w') as f:
json.dump(config, f, indent=2)
print(color(f"✓ Configuration saved to: {config_path}", Colors.GREEN))
print()
# ===== Summary =====
print(color("📋 Configuration Summary:", Colors.BOLD))
print(f" Enabled providers: {', '.join(enabled_providers)}")
print(f" Default provider: {default_provider}")
print(f" Auto-routing: {'enabled' if auto_routing else 'disabled'}")
print(f" Results per search: {max_results}")
print()
# ===== Test suggestion =====
print(color("🚀 Ready to search! Try:", Colors.BOLD))
print(color(f" python3 scripts/search.py -q \"your query here\"", Colors.CYAN))
print()
return True
def check_first_run(skill_dir: Path) -> bool:
"""Check if this is the first run (no config.json)."""
config_path = skill_dir / "config.json"
return not config_path.exists()
def main():
# Determine skill directory
script_path = Path(__file__).resolve()
skill_dir = script_path.parent.parent
# Check for --reset flag
force_reset = "--reset" in sys.argv
# Check for --check flag (just check if setup needed)
if "--check" in sys.argv:
if check_first_run(skill_dir):
print("Setup required: config.json not found")
sys.exit(1)
else:
print("Setup complete: config.json exists")
sys.exit(0)
# Run setup
success = run_setup(skill_dir, force_reset)
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()

20
test-auto-routing.sh Normal file
View File

@@ -0,0 +1,20 @@
#!/bin/bash
# Test Auto-Routing Feature
# Tests various query types to verify routing works correctly
# Load from environment or .env file
if [ -f .env ]; then
source .env
fi
# Check required keys
if [ -z "$SERPER_API_KEY" ]; then
echo "Error: SERPER_API_KEY not set. Copy .env.example to .env and add your keys."
exit 1
fi
echo "Testing auto-routing..."
python3 scripts/search.py -q "buy iPhone 15 price" --auto
python3 scripts/search.py -q "how does quantum computing work" --auto
python3 scripts/search.py -q "companies like Stripe" --auto