Initial commit with translated description
536
CHANGELOG.md
Normal file
@@ -0,0 +1,536 @@
# Changelog - Web Search Plus

## [2.9.2] - 2026-03-27

### Fixed
- Replaced hardcoded temporary cache path examples with portable `$TMP_DIR` placeholders in `TROUBLESHOOTING.md`

## [2.9.0] - 2026-03-12

### ✨ New Provider: Querit (Multilingual AI Search)

[Querit.ai](https://querit.ai) is a Singapore-based multilingual AI search API purpose-built for LLMs and RAG pipelines. It offers a 300-billion-page index and supports 20+ countries and 10+ languages.

- Added **Querit** as the 7th search provider via `https://api.querit.ai/v1/search`
- Configure via `QUERIT_API_KEY` — optional, gracefully skipped if not set
- Routing score: `research * 0.65 + rag * 0.35 + recency * 0.45` — favored for multilingual and real-time queries
- Handles Querit's quirky `error_code=200` responses as success (not an error)
- Handles `IncompleteRead` as transient/retryable failure
- Live-tested with 10 benchmark queries ✅

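As a rough illustration of the routing score above, the sketch below applies the three quoted weights to some signal strengths. The function name and the input values are hypothetical; only the weights come from the changelog entry.

```python
def querit_routing_score(research: float, rag: float, recency: float) -> float:
    """Combine three signal strengths using the weights quoted in the changelog."""
    return research * 0.65 + rag * 0.35 + recency * 0.45


# Hypothetical signal strengths for a multilingual, time-sensitive query.
score = querit_routing_score(research=2.0, rag=1.0, recency=3.0)
print(round(score, 2))
```

Because recency carries a weight of 0.45, a query with strong recency signals can raise the score noticeably even when its research signal is modest.
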
### 🔧 Fixed: Fallback chain dies on unconfigured provider

- `sys.exit(1)` in `validate_api_key()` raised `SystemExit` (inherits from `BaseException`), which bypassed the `except Exception` fallback loop and killed the entire process instead of trying the next provider
- Replaced with catchable `ProviderConfigError` — fallback chain now continues correctly through all configured providers

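The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not the skill's actual code: `ProviderConfigError` and `validate_api_key()` are named in the changelog, while `search_with_fallback()`, the `keys` dictionary, and the demo key are assumptions made for illustration.

```python
class ProviderConfigError(Exception):
    """Raised when a provider has no API key configured."""


def validate_api_key(provider: str, keys: dict) -> str:
    key = keys.get(provider)
    if not key:
        # Before the fix this spot called sys.exit(1), which raises SystemExit.
        # SystemExit derives from BaseException, so `except Exception` never sees it.
        raise ProviderConfigError(f"{provider}: no API key configured")
    return key


def search_with_fallback(providers: list, keys: dict):
    """Return the first usable provider plus the errors collected on the way."""
    errors = []
    for provider in providers:
        try:
            validate_api_key(provider, keys)
            return provider, errors
        except Exception as exc:  # catches ProviderConfigError, not SystemExit
            errors.append(str(exc))
    return None, errors


# SystemExit is not an Exception subclass, which is why the old code escaped the loop.
assert not issubclass(SystemExit, Exception)

chosen, errors = search_with_fallback(["querit", "serper"], {"serper": "demo-key"})
print(chosen, errors)
```

With a catchable exception, the unconfigured first provider is recorded as an error and the loop moves on to the next one.
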
### 🔧 Fixed: Perplexity citations are generic placeholders

- Previously extracted citation URLs via regex from the answer text, resulting in generic "Source 1" / "Source 2" labels
- Now uses the structured `data["citations"]` array from the Perplexity API response directly — results have readable titles
- Regex extraction kept as fallback when API doesn't return a `citations` field

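A minimal sketch of the "structured field first, regex as fallback" approach described above. It assumes `data["citations"]` is a list of URL strings; the function name and the URL pattern are hypothetical and not taken from the skill's source.

```python
import re

# Hypothetical pattern: stops at whitespace and common closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_citations(data: dict, answer_text: str) -> list:
    """Prefer the structured citations array; fall back to URLs found in the text."""
    citations = data.get("citations")
    if citations:
        return list(citations)
    found = []
    for url in URL_PATTERN.findall(answer_text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in found:  # keep first occurrence, preserve order
            found.append(url)
    return found


structured = extract_citations({"citations": ["https://a.example/x"]}, "ignored")
fallback = extract_citations({}, "See https://a.example/x, and (https://b.example/y).")
print(structured, fallback)
```
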
### ✨ Improved: German locale routing patterns

- Added German-language signal patterns for local and news queries
- Improves auto-routing for queries like `"aktuelle Nachrichten"`, `"beste Restaurants Graz"`, `"KI Regulierung Europa"`

### 📝 Documentation

- Added Querit to README provider tables, routing examples, and API key setup section
- Added `querit_api_key` to `config.example.json`
- Updated `SKILL.md` provider mentions and env metadata
- Bumped package version to `2.9.0`

## [2.8.6] - 2026-03-03

### Changed
- Documented Perplexity Sonar Pro usage and refreshed release docs.

## [2.8.5] - 2026-02-20

### ✨ Feature: Perplexity freshness filter

- Added `freshness` parameter to Perplexity provider (`day`, `week`, `month`, `year`)
- Maps to Perplexity's native `search_recency_filter` parameter
- Example: `python3 scripts/search.py -p perplexity -q "latest AI news" --freshness day`
- Consistent with freshness support in Serper and Brave providers

## [2.8.4] - 2026-02-20

### 🔒 Security Fix: SSRF protection in setup wizard

- **Fixed:** `setup.py` SearXNG connection test had no SSRF protection (unlike `search.py`)
- **Before:** Operator could be tricked into probing internal networks during setup
- **After:** Same IP validation as `search.py` — blocks private IPs, cloud metadata, loopback
- **Credit:** ClawHub security scanner

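The kind of IP validation described above can be sketched with the standard library. This is an illustrative assumption about the approach, not the code in `search.py`; the function name `is_safe_url` is hypothetical. It resolves the hostname and rejects any address that is private, loopback, link-local (which covers the usual cloud metadata address `169.254.169.254`), or reserved.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Return False for URLs that resolve to internal or otherwise non-public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as unsafe
    for info in infos:
        address = info[4][0].split("%")[0]  # strip any IPv6 zone identifier
        ip = ipaddress.ip_address(address)
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True


print(is_safe_url("http://127.0.0.1:8080"))
print(is_safe_url("http://169.254.169.254/latest/meta-data"))
```

Checking every resolved address, rather than only the first, avoids accepting a hostname that maps to both a public and an internal IP.
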
## [2.8.3] - 2026-02-20

### 🐛 Critical Fix: Perplexity results empty

- **Fixed:** Perplexity provider returned 0 results because the AI-synthesized answer wasn't mapped into the results array
- **Before:** Only extracted URLs from the answer text were returned as results (often 0)
- **After:** The full answer is now the primary result (title, snippet with cleaned text), extracted source URLs follow as additional results
- **Impact:** Perplexity queries now always return at least 1 result with the synthesized answer

## [2.8.0] - 2026-02-20

### 🆕 New Provider: Perplexity (AI-Synthesized Answers)

Added Perplexity as the 6th search provider via Kilo Gateway — the first provider that returns **direct answers with citations** instead of just links:

#### Features
- **AI-Synthesized Answers**: Get a complete answer, not a list of links
- **Inline Citations**: Every claim backed by `[1][2][3]` source references
- **Real-Time Web Search**: Perplexity searches the web live, reads pages, and summarizes
- **Zero Extra Config**: Works through Kilo Gateway with your existing `KILOCODE_API_KEY`
- **Model**: `perplexity/sonar-pro` (best quality, supports complex queries)

#### Auto-Routing Signals
New direct-answer intent detection routes to Perplexity for:
- Status queries: "status of", "current state of", "what is the status"
- Local info: "events in [city]", "things to do in", "what's happening in"
- Direct questions: "what is", "who is", "when did", "how many"
- Current affairs: "this week", "this weekend", "right now", "today"

#### Usage Examples
```bash
# Auto-routed
python3 scripts/search.py -q "events in Graz Austria this weekend"  # → Perplexity
python3 scripts/search.py -q "what is the current status of Ethereum"  # → Perplexity

# Explicit
python3 scripts/search.py -p perplexity -q "latest AI regulation news"
```

#### Configuration
Requires `KILOCODE_API_KEY` environment variable (Kilo Gateway account).
No additional API key needed — Perplexity is accessed through Kilo's unified API.

```bash
export KILOCODE_API_KEY="your-kilo-key"
```

### 🔧 Routing Rebalance

Major overhaul of the auto-routing confidence scoring to fix Serper dominance:

#### Problem
Serper (Google) was winning ~90% of queries due to:
- High recency multiplier boosting Serper on any query with dates/years
- Default provider priority placing Serper first in ties
- Research and discovery signals not strong enough to override

#### Changes
- **Lowered Serper recency multiplier** — date mentions no longer auto-route to Google
- **Strengthened research signals** for Tavily:
  - Added: "status of", "what happened with", "how does X compare"
  - Boosted weights for comparison patterns (4.0 → 5.0)
- **Strengthened discovery signals** for Exa:
  - Added: "events in", "things to do in", "startups similar to"
  - Boosted weights for local discovery patterns
- **Updated provider priority order**: `tavily → exa → perplexity → serper → you → searxng`
  - Serper moved from 1st to 4th in tie-breaking
  - Research/discovery providers now win on ambiguous queries

#### Routing Test Results

| Query | Before | After | ✓ |
|-------|--------|-------|---|
| "latest OpenClaw version Feb 2026" | Serper | Serper | ✅ |
| "Ethereum Pectra upgrade status" | Serper | **Tavily** | ✅ |
| "events in Graz this weekend" | Serper | **Perplexity** | ✅ |
| "compare SearXNG vs Brave for AI agents" | Serper | **Tavily** | ✅ |
| "Sam Altman OpenAI news this week" | Serper | Serper | ✅ |
| "find startups similar to Kilo Code" | Serper | **Exa** | ✅ |

### 📊 Updated Provider Comparison

| Feature | Serper | Tavily | Exa | Perplexity | You.com | SearXNG |
|---------|:------:|:------:|:---:|:----------:|:-------:|:-------:|
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ | ⚡ |
| Direct Answers | ✗ | ✗ | ✗ | ✓✓ | ✗ | ✗ |
| Citations | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Local Events | ✓ | ✗ | ✓ | ✓✓ | ✗ | ✓ |
| Research | ✗ | ✓✓ | ✓ | ✓ | ✓ | ✗ |
| Discovery | ✗ | ✗ | ✓✓ | ✗ | ✗ | ✗ |
| Self-Hosted | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |

## [2.7.0] - 2026-02-14

### ✨ Added
- Provider cooldown tracking in `.cache/provider_health.json`
- Exponential cooldown on provider failures: **1m → 5m → 25m → 1h (cap)**
- Retry strategy for transient failures (timeout, 429, 503): up to 2 retries with backoff **1s → 3s → 9s**
- Smarter cache keys hashed from full request context (query/provider/max_results + locale, freshness, time_range, topic, search_engines, include_news, and related params)
- Cross-provider result deduplication by normalized URL during fallback merge

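The cooldown schedule above (1m → 5m → 25m → 1h cap) is consistent with multiplying by 5 on each consecutive failure and clamping at one hour. The sketch below works under that assumption; the constant and function names are hypothetical, and the real implementation may compute the schedule differently.

```python
COOLDOWN_BASE_SECONDS = 60    # first failure: 1 minute
COOLDOWN_FACTOR = 5           # each further failure multiplies the cooldown by 5
COOLDOWN_CAP_SECONDS = 3600   # never wait longer than 1 hour


def cooldown_seconds(consecutive_failures: int) -> int:
    """Cooldown length after the given number of consecutive failures."""
    if consecutive_failures <= 0:
        return 0
    raw = COOLDOWN_BASE_SECONDS * COOLDOWN_FACTOR ** (consecutive_failures - 1)
    return min(raw, COOLDOWN_CAP_SECONDS)


schedule = [cooldown_seconds(n) for n in range(1, 6)]
print(schedule)  # [60, 300, 1500, 3600, 3600]
```

The fourth step would be 7500 seconds without the cap, which is why the documented schedule flattens at one hour.
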
### 🔧 Changed
- Cooldown providers are skipped in routing while their cooldown is active
- Provider health is reset automatically after successful requests
- Fallback output now includes dedup metadata:
  - `deduplicated: true|false`
  - `metadata.dedup_count`

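To make the dedup metadata concrete, here is a small sketch of merging results from two providers by normalized URL. The normalization rules (ignore scheme, lowercase host, drop a leading `www.` and a trailing slash) and the function names are assumptions for illustration; only the `deduplicated` and `metadata.dedup_count` fields come from the changelog.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalization: scheme-insensitive, lowercase host, no www., no trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def merge_results(primary: list, fallback: list) -> dict:
    """Merge two result lists, keeping the first result seen for each normalized URL."""
    seen = set()
    merged = []
    dedup_count = 0
    for result in primary + fallback:
        key = normalize_url(result["url"])
        if key in seen:
            dedup_count += 1
            continue
        seen.add(key)
        merged.append(result)
    return {
        "results": merged,
        "deduplicated": dedup_count > 0,
        "metadata": {"dedup_count": dedup_count},
    }


output = merge_results(
    [{"url": "https://www.Example.com/page/"}],
    [{"url": "http://example.com/page"}, {"url": "https://example.com/other"}],
)
print(output["deduplicated"], output["metadata"]["dedup_count"])
```
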
## [2.6.5] - 2026-02-11

### 🆕 File-Based Result Caching

Added local caching to save API costs on repeated searches:

#### Features
- **Automatic Caching**: Search results cached locally by default
- **1-Hour TTL**: Results expire after 3600 seconds (configurable)
- **Cache Indicators**: Response includes `cached: true/false` and `cache_age_seconds`
- **Zero-Cost Repeats**: Cached requests don't hit APIs

#### New CLI Options
- `--cache-ttl SECONDS` — Custom cache TTL (default: 3600)
- `--no-cache` — Bypass cache, always fetch fresh
- `--clear-cache` — Delete all cached results
- `--cache-stats` — Show cache statistics (entries, size, age)

#### Configuration
- **Cache directory**: `.cache/` in skill directory
- **Environment variable**: `WSP_CACHE_DIR` to override location
- **Cache key**: Based on query + provider + max_results (SHA256)

#### Usage Examples
```bash
# First request costs API credits
python3 scripts/search.py -q "AI startups"

# Second request is FREE (uses cache)
python3 scripts/search.py -q "AI startups"

# Force fresh results
python3 scripts/search.py -q "AI startups" --no-cache

# View stats
python3 scripts/search.py --cache-stats

# Clear everything
python3 scripts/search.py --clear-cache
```

#### Technical Details
- Cache files: JSON with metadata (_cache_timestamp, _cache_key, etc.)
- Automatic cleanup of expired entries on access
- Graceful handling of corrupted cache files

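The cache key and TTL check described in this release could look roughly like the sketch below. It assumes the key is the SHA256 hex digest of a JSON payload built from query, provider, and max_results; the payload layout and both function names are hypothetical. `_cache_timestamp` is the metadata field named in the technical details.

```python
import hashlib
import json
import time

DEFAULT_TTL_SECONDS = 3600  # 1-hour default from the changelog


def cache_key(query: str, provider: str, max_results: int) -> str:
    """Derive a stable SHA256 key from the request fields (payload layout is assumed)."""
    payload = json.dumps(
        {"query": query, "provider": provider, "max_results": max_results},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_fresh(entry: dict, ttl: int = DEFAULT_TTL_SECONDS, now: float = None) -> bool:
    """A cached entry is usable while its age is below the TTL."""
    now = time.time() if now is None else now
    return (now - entry["_cache_timestamp"]) < ttl


key = cache_key("AI startups", "serper", 5)
entry = {"_cache_key": key, "_cache_timestamp": 1000.0}
print(len(key), is_fresh(entry, now=1000.0 + 3599), is_fresh(entry, now=1000.0 + 3600))
```

Serializing with `sort_keys=True` keeps the key stable regardless of the order in which the fields are assembled.
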
## [2.6.1] - 2026-02-04

- Privacy cleanup: removed hardcoded paths and personal info from docs

## [2.5.0] - 2026-02-03

### 🆕 New Provider: SearXNG (Privacy-First Meta-Search)

Added SearXNG as the 5th search provider, focused on privacy and self-hosted search:

#### Features
- **Privacy-Preserving**: No tracking, no profiling — your searches stay private
- **Multi-Source Aggregation**: Queries 70+ upstream engines (Google, Bing, DuckDuckGo, etc.)
- **$0 API Cost**: Self-hosted = unlimited queries with no API fees
- **Diverse Results**: Get perspectives from multiple search engines in one query
- **Customizable**: Choose which engines to use, set SafeSearch levels, language preferences

#### Auto-Routing Signals
New privacy/multi-source intent detection routes to SearXNG for:
- Privacy queries: "private", "anonymous", "without tracking", "no tracking"
- Multi-source: "aggregate results", "multiple sources", "diverse perspectives"
- Budget/free: "free search", "no api cost", "self-hosted search"
- German: "privat", "anonym", "ohne tracking", "verschiedene quellen"

#### Usage Examples
```bash
# Auto-routed
python3 scripts/search.py -q "search privately without tracking"  # → SearXNG

# Explicit
python3 scripts/search.py -p searxng -q "linux distros"
python3 scripts/search.py -p searxng -q "AI news" --engines "google,bing,duckduckgo"
python3 scripts/search.py -p searxng -q "privacy tools" --searxng-safesearch 2
```

#### Configuration
```json
{
  "searxng": {
    "instance_url": "https://your-instance.example.com",
    "safesearch": 0,
    "engines": null,
    "language": "en"
  }
}
```

#### Setup
SearXNG requires a self-hosted instance with JSON format enabled:
```bash
# Docker setup (5 minutes)
docker run -d -p 8080:8080 searxng/searxng

# Enable JSON in settings.yml:
# search:
#   formats: [html, json]

# Set instance URL
export SEARXNG_INSTANCE_URL="http://localhost:8080"
```

See: https://docs.searxng.org/admin/installation.html

### 📊 Updated Provider Comparison

| Feature | Serper | Tavily | Exa | You.com | SearXNG |
|---------|:------:|:------:|:---:|:-------:|:-------:|
| Privacy-First | ✗ | ✗ | ✗ | ✗ | ✓✓ |
| Self-Hosted | ✗ | ✗ | ✗ | ✗ | ✓ |
| API Cost | $$ | $$ | $$ | $ | **FREE** |
| Multi-Engine | ✗ | ✗ | ✗ | ✗ | ✓ (70+) |

### 🔧 Technical Changes

- Added `search_searxng()` function with full error handling
- Added `PRIVACY_SIGNALS` to QueryAnalyzer for auto-routing
- Updated setup wizard with SearXNG option (instance URL validation)
- Updated config.example.json with searxng section
- New CLI args: `--searxng-url`, `--searxng-safesearch`, `--engines`, `--categories`

---

## [2.4.4] - 2026-02-03

### 📝 Documentation: Provider Count Fix

- **Fixed:** "You can use 1, 2, or all 3" → "1, 2, 3, or all 4" (we have 4 providers now!)
- **Impact:** Accurate documentation for setup wizard

## [2.4.3] - 2026-02-03

### 📝 Documentation: Updated README

- **Added:** "NEW in v2.4.2" badge for You.com in SKILL.md
- **Impact:** ClawHub README now properly highlights You.com as new feature

## [2.4.2] - 2026-02-03

### 🐛 Critical Fix: You.com API Configuration

- **Fixed:** Incorrect hostname (`api.ydc-index.io` → `ydc-index.io`)
- **Fixed:** Incorrect header name (`X-API-Key` → `X-API-KEY` uppercase)
- **Impact:** You.com now works correctly - was giving 403 Forbidden before
- **Status:** ✅ Fully tested and working

## [2.4.1] - 2026-02-03

### 🐛 Bugfix: You.com URL Encoding

- **Fixed:** URL encoding for You.com queries - spaces and special characters now properly encoded
- **Impact:** Queries with spaces (e.g., "OpenClaw AI framework") work correctly now
- **Technical:** Added `urllib.parse.quote` for parameter encoding

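For reference, this is how `urllib.parse.quote` encodes the example query from the entry above. The base URL and the `query` parameter name are placeholders chosen for illustration, not the actual You.com endpoint.

```python
from urllib.parse import quote

query = "OpenClaw AI framework"
encoded = quote(query)
print(encoded)  # OpenClaw%20AI%20framework

# safe="" also escapes characters such as "/" that quote() leaves alone by default.
special = quote("C++ & Rust?", safe="")
print(special)  # C%2B%2B%20%26%20Rust%3F

# Placeholder endpoint, used only to show where the encoded value goes.
url = f"https://api.example.com/search?query={encoded}"
print(url)
```
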
## [2.4.0] - 2026-02-03

### 🆕 New Provider: You.com

Added You.com as the 4th search provider, optimized for RAG applications and real-time information:

#### Features
- **LLM-Ready Snippets**: Pre-extracted, query-aware text excerpts perfect for feeding into AI models
- **Unified Web + News**: Get both web pages and news articles in a single API call
- **Live Crawling**: Fetch full page content on-demand in Markdown format (`--livecrawl`)
- **Automatic News Classification**: Intelligently includes news results based on query intent
- **Freshness Controls**: Filter by recency (day, week, month, year, or date range)
- **SafeSearch Support**: Content filtering (off, moderate, strict)

#### Auto-Routing Signals
New RAG/Real-time intent detection routes to You.com for:
- RAG context queries: "summarize", "key points", "tldr", "context for"
- Real-time info: "latest news", "current status", "right now", "what's happening"
- Information synthesis: "updates on", "situation", "main takeaways"

#### Usage Examples
```bash
# Auto-routed
python3 scripts/search.py -q "summarize key points about AI regulation"  # → You.com

# Explicit
python3 scripts/search.py -p you -q "climate change" --livecrawl all
python3 scripts/search.py -p you -q "tech news" --freshness week
```

#### Configuration
```json
{
  "you": {
    "country": "US",
    "language": "en",
    "safesearch": "moderate",
    "include_news": true
  }
}
```

#### API Key Setup
```bash
export YOU_API_KEY="your-key"  # Get from https://api.you.com
```

### 📊 Updated Provider Comparison

| Feature | Serper | Tavily | Exa | You.com |
|---------|:------:|:------:|:---:|:-------:|
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ |
| News Integration | ✓ | ✗ | ✗ | ✓ |
| RAG-Optimized | ✗ | ✓ | ✗ | ✓✓ |
| Full Page Content | ✗ | ✓ | ✓ | ✓ |

---

## [2.1.5] - 2026-01-27

### 📝 Documentation

- Added warning about NOT using Tavily/Serper/Exa in core OpenClaw config
- Core OpenClaw only supports `brave` as the built-in provider
- This skill's providers must be used via environment variables and scripts, not `openclaw.json`

## [2.1.0] - 2026-01-23

### 🧠 Intelligent Multi-Signal Routing

Completely overhauled auto-routing with sophisticated query analysis:

#### Intent Classification
- **Shopping Intent**: Detects price patterns ("how much", "cost of"), purchase signals ("buy", "order"), deal keywords, and product+brand combinations
- **Research Intent**: Identifies explanation patterns ("how does", "why does"), analysis signals ("pros and cons", "compare"), learning keywords, and complex multi-clause queries
- **Discovery Intent**: Recognizes similarity patterns ("similar to", "alternatives"), company discovery signals, URL/domain detection, and academic patterns

#### Linguistic Pattern Detection
- "How much" / "price of" → Shopping (Serper)
- "How does" / "Why does" / "Explain" → Research (Tavily)
- "Companies like" / "Similar to" / "Alternatives" → Discovery (Exa)
- Product + Brand name combos → Shopping (Serper)
- URLs and domains in query → Similar search (Exa)

#### Query Analysis Features
- **Complexity scoring**: Long, multi-clause queries get routed to research providers
- **URL detection**: Automatic detection of URLs/domains triggers Exa similar search
- **Brand recognition**: Tech brands (Apple, Samsung, Sony, etc.) with product terms → shopping
- **Recency signals**: "latest", "2026", "breaking" boost news mode

#### Confidence Scoring
- **HIGH (70-100%)**: Strong signal match, very reliable routing
- **MEDIUM (40-69%)**: Good match, should work well
- **LOW (0-39%)**: Ambiguous query, using fallback provider
- Confidence based on absolute signal strength + relative margin over alternatives

#### Enhanced Debug Mode
```bash
python3 scripts/search.py --explain-routing -q "your query"
```

Now shows:
- Routing decision with confidence level
- All provider scores
- Top matched signals with weights
- Query analysis (complexity, URL detection, recency focus)
- All matched patterns per provider

### 🔧 Technical Changes

#### QueryAnalyzer Class
New `QueryAnalyzer` class with:
- `SHOPPING_SIGNALS`: 25+ weighted patterns for shopping intent
- `RESEARCH_SIGNALS`: 30+ weighted patterns for research intent
- `DISCOVERY_SIGNALS`: 20+ weighted patterns for discovery intent
- `LOCAL_NEWS_SIGNALS`: 25+ patterns for local/news queries
- `BRAND_PATTERNS`: Tech brand detection regex

#### Signal Weighting
- Multi-word phrases get higher weights (e.g., "how much" = 4.0 vs "price" = 3.0)
- Strong signals: price patterns (4.0), similarity patterns (5.0), URLs (5.0)
- Medium signals: product terms (2.5), learning keywords (2.5)
- Bonus scoring: Product+brand combo (+3.0), complex query (+2.5)

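The weighted-signal idea can be sketched as follows. This is a toy version, not the `QueryAnalyzer` implementation: the pattern tables hold only a handful of entries, the research weight is a hypothetical value, and the `route()` helper is invented for illustration. The weights for "how much" (4.0), "price" (3.0), and similarity patterns (5.0) are the ones quoted above.

```python
import re

# Tiny illustrative tables; the real class is described as having 20-30+ patterns each.
SHOPPING_SIGNALS = {r"\bhow much\b": 4.0, r"\bprice\b": 3.0}
RESEARCH_SIGNALS = {r"\bhow does\b": 4.0}  # hypothetical weight
DISCOVERY_SIGNALS = {r"\bsimilar to\b": 5.0}

PROVIDER_SIGNALS = {
    "serper": SHOPPING_SIGNALS,
    "tavily": RESEARCH_SIGNALS,
    "exa": DISCOVERY_SIGNALS,
}


def score(query: str, signals: dict) -> float:
    """Sum the weights of every pattern that matches the query."""
    text = query.lower()
    return sum(weight for pattern, weight in signals.items() if re.search(pattern, text))


def route(query: str):
    """Pick the provider with the highest score; ties go to the first provider listed."""
    scores = {name: score(query, signals) for name, signals in PROVIDER_SIGNALS.items()}
    return max(scores, key=scores.get), scores


provider, scores = route("how much is the price of a laptop")
print(provider, scores)
```

With no matching signals every score is 0.0 and the first provider in the table wins, which mirrors the fallback behaviour described for ambiguous queries.
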
#### Improved Output Format
```json
{
  "routing": {
    "auto_routed": true,
    "provider": "serper",
    "confidence": 0.78,
    "confidence_level": "high",
    "reason": "high_confidence_match",
    "top_signals": [{"matched": "price", "weight": 3.0}],
    "scores": {"serper": 7.0, "tavily": 0.0, "exa": 0.0}
  }
}
```

### 📚 Documentation Updates

- **SKILL.md**: Complete rewrite with signal tables and confidence scoring guide
- **README.md**: Updated with intelligent routing examples and confidence levels
- **FAQ**: Updated to explain multi-signal analysis

### 🧪 Test Results

| Query | Provider | Confidence | Signals |
|-------|----------|------------|---------|
| "how much does iPhone 16 cost" | Serper | 68% | "how much", brand+product |
| "how does quantum entanglement work" | Tavily | 86% HIGH | "how does", "what are", "implications" |
| "startups similar to Notion" | Exa | 76% HIGH | "similar to", "Series A" |
| "companies like stripe.com" | Exa | 100% HIGH | URL detected, "companies like" |
| "MacBook Pro M3 specs review" | Serper | 70% HIGH | brand+product, "specs", "review" |
| "Tesla" | Serper | 0% LOW | No signals (fallback) |
| "arxiv papers on transformers" | Exa | 58% | "arxiv" |
| "latest AI news 2026" | Serper | 77% HIGH | "latest", "news", "2026" |

---

## [2.0.0] - 2026-01-23

### 🎉 Major Features

#### Smart Auto-Routing
- **Automatic provider selection** based on query analysis
- No need to manually choose provider - just search!
- Intelligent keyword matching for routing decisions
- Pattern detection for query types (shopping, research, discovery)
- Scoring system for provider selection

#### User Configuration
- **config.json**: Full control over auto-routing behavior
- **Configurable keyword mappings**: Add your own routing keywords
- **Provider priority**: Set tie-breaker order
- **Disable providers**: Turn off providers you don't have API keys for
- **Enable/disable auto-routing**: Opt-in or opt-out as needed

#### Debugging Tools
- **--explain-routing** flag: See exactly why a provider was selected
- Detailed routing metadata in JSON responses
- Shows matched keywords and routing scores

### 📚 Documentation

- **README.md**: Complete auto-routing guide with examples
- **SKILL.md**: Detailed routing logic and configuration reference
- **FAQ section**: Common questions about auto-routing
- **Configuration examples**: Pre-built configs for common use cases

---

## [1.0.x] - Initial Release

- Multi-provider search: Serper, Tavily, Exa
- Manual provider selection with `-p` flag
- Unified JSON output format
- Provider-specific options (--depth, --category, --similar-url, etc.)
- Domain filtering for Tavily/Exa
- Date filtering for Exa

263
FAQ.md
Normal file
@@ -0,0 +1,263 @@
# Frequently Asked Questions

## Caching (NEW in v2.7.0!)

### How does caching work?
Search results are automatically cached locally for 1 hour (3600 seconds). When you make the same query again, you get instant results at $0 API cost. The cache key is based on query text + provider + max_results; since v2.7.0 it also covers the rest of the request context (locale, freshness, time range, and related parameters).

### Where are cached results stored?
In `.cache/` directory inside the skill folder by default. Override with `WSP_CACHE_DIR` environment variable:
```bash
export WSP_CACHE_DIR="/path/to/custom/cache"
```

### How do I see cache stats?
```bash
python3 scripts/search.py --cache-stats
```
This shows total entries, size, oldest/newest entries, and breakdown by provider.

### How do I clear the cache?
```bash
python3 scripts/search.py --clear-cache
```

### Can I change the cache TTL?
Yes! Default is 3600 seconds (1 hour). Set a custom TTL per request:
```bash
python3 scripts/search.py -q "query" --cache-ttl 7200  # 2 hours
```

### How do I skip the cache?
Use `--no-cache` to always fetch fresh results:
```bash
python3 scripts/search.py -q "query" --no-cache
```

### How do I know if a result was cached?
The response includes:
- `"cached": true/false` — whether result came from cache
- `"cache_age_seconds": 1234` — how old the cached result is (when cached)

---

## General

### How does auto-routing decide which provider to use?
Multi-signal analysis scores each provider based on: price patterns, explanation phrases, similarity keywords, URLs, product+brand combos, and query complexity. Highest score wins. Use `--explain-routing` to see the decision breakdown.

### What if it picks the wrong provider?
Override with `-p serper/tavily/exa`. Check `--explain-routing` to understand why it chose differently.

### What does "low confidence" mean?
Query is ambiguous (e.g., "Tesla" could be cars, stock, or company). Falls back to Serper. Results may vary.

### Can I disable a provider?
Yes! In config.json: `"disabled_providers": ["exa"]`

---

## API Keys

### Which API keys do I need?
At minimum ONE key (or SearXNG instance). You can use just Serper, just Tavily, just Exa, just You.com, or just SearXNG. Missing keys = that provider is skipped.

### Where do I get API keys?
- Serper: https://serper.dev (2,500 free queries, no credit card)
- Tavily: https://tavily.com (1,000 free searches/month)
- Exa: https://exa.ai (1,000 free searches/month)
- You.com: https://api.you.com (Limited free tier for testing)
- SearXNG: Self-hosted, no key needed! https://docs.searxng.org/admin/installation.html

### How do I set API keys?
Two options (both auto-load):

**Option A: .env file**
```bash
export SERPER_API_KEY="your-key"
```

**Option B: config.json** (v2.2.1+)
```json
{ "serper": { "api_key": "your-key" } }
```

---

## Routing Details

### How do I know which provider handled my search?
Check `routing.provider` in JSON output, or `[🔍 Searched with: Provider]` in chat responses.

### Why does it sometimes choose Serper for research questions?
If the query has brand/product signals (e.g., "how does Tesla FSD work"), shopping intent may outweigh research intent. Override with `-p tavily`.

### What's the confidence threshold?
Default: 0.3 (30%). Below this = low confidence, uses fallback. Adjustable in config.json.

---

## You.com Specific

### When should I use You.com over other providers?
You.com excels at:
- **RAG applications**: Pre-extracted snippets ready for LLM consumption
- **Real-time information**: Current events, breaking news, status updates
- **Combined sources**: Web + news results in a single API call
- **Summarization tasks**: "What's the latest on...", "Key points about..."

### What's the livecrawl feature?
You.com can fetch full page content on-demand. Use `--livecrawl web` for web results, `--livecrawl news` for news articles, or `--livecrawl all` for both. Content is returned in Markdown format.

### Does You.com include news automatically?
Yes! You.com's intelligent classification automatically includes relevant news results when your query has news intent. You can also use `--include-news` to explicitly enable it.

---

## SearXNG Specific

### Do I need my own SearXNG instance?
Yes! SearXNG is self-hosted. Most public instances disable the JSON API to prevent bot abuse. You need to run your own instance with JSON format enabled. See: https://docs.searxng.org/admin/installation.html

### How do I set up SearXNG?
Docker is the easiest way:
```bash
docker run -d -p 8080:8080 searxng/searxng
```
Then enable JSON in `settings.yml`:
```yaml
search:
  formats:
    - html
    - json
```

### Why am I getting "403 Forbidden"?
The JSON API is disabled on your instance. Enable it in `settings.yml` under `search.formats`.

### What's the API cost for SearXNG?
**$0!** SearXNG is free and open-source. You only pay for hosting (~$5/month VPS). Unlimited queries.

### When should I use SearXNG?
- **Privacy-sensitive queries**: No tracking, no profiling
- **Budget-conscious**: $0 API cost
- **Diverse results**: Aggregates 70+ search engines
- **Self-hosted requirements**: Full control over your search infrastructure
- **Fallback provider**: When paid APIs are rate-limited

### Can I limit which search engines SearXNG uses?
Yes! Use `--engines google,bing,duckduckgo` to specify engines, or configure defaults in `config.json`.

---

## Provider Selection
|
||||
|
||||
### Which provider should I use?
|
||||
|
||||
| Query Type | Best Provider | Why |
|
||||
|------------|---------------|-----|
|
||||
| **Shopping** ("buy laptop", "cheap shoes") | **Serper** | Google Shopping, price comparisons, local stores |
|
||||
| **Research** ("how does X work?", "explain Y") | **Tavily** | Deep research, academic quality, full-page content |
|
||||
| **Startups/Papers** ("companies like X", "arxiv papers") | **Exa** | Semantic/neural search, startup discovery |
|
||||
| **RAG/Real-time** ("summarize latest", "current events") | **You.com** | LLM-ready snippets, combined web+news |
|
||||
| **Privacy** ("search without tracking") | **SearXNG** | No tracking, multi-source, self-hosted |
|
||||
|
||||
**Tip:** Enable auto-routing and let the skill choose automatically! 🎯
|
||||
|
||||
### Do I need all 5 providers?
|
||||
**No!** All providers are optional. You can use:
|
||||
- **1 provider** (e.g., just Serper for everything)
|
||||
- **2-3 providers** (e.g., Serper + You.com for most needs)
|
||||
- **All 5** (maximum flexibility + fallback options)
|
||||
|
||||
### How much do the APIs cost?

| Provider | Free Tier | Paid Plan |
|----------|-----------|-----------|
| **Serper** | 2,500 queries/mo | $50/mo (5,000 queries) |
| **Tavily** | 1,000 queries/mo | $150/mo (10,000 queries) |
| **Exa** | 1,000 queries/mo | $1,000/mo (100,000 queries) |
| **You.com** | Limited free | ~$10/mo (varies by usage) |
| **SearXNG** | **FREE** ✅ | Only VPS cost (~$5/mo if self-hosting) |

**Budget tip:** Use SearXNG as primary + others as fallback for specialized queries!

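
To compare the paid plans on equal footing, it can help to convert them to a per-query price. The sketch below only does that arithmetic; the figures are copied from the pricing table in this FAQ, the function name is illustrative, and it assumes the monthly quota is fully used.

```python
# Paid-plan figures copied from the pricing table: (USD per month, included queries).
PAID_PLANS = {
    "Serper": (50, 5_000),
    "Tavily": (150, 10_000),
    "Exa": (1_000, 100_000),
}


def cost_per_query(price_usd: float, queries: int) -> float:
    """Effective USD cost of one query, assuming the whole quota is used."""
    return price_usd / queries


for name, (price, queries) in PAID_PLANS.items():
    print(f"{name}: ${cost_per_query(price, queries):.3f} per query")
```

Plans with usage-based pricing (You.com) or infrastructure-only cost (SearXNG) are left out because the table gives no fixed quota for them.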
### How private is SearXNG really?

| Setup | Privacy Level |
|-------|---------------|
| **Self-hosted (your VPS)** | ⭐⭐⭐⭐⭐ You control everything |
| **Self-hosted (Docker local)** | ⭐⭐⭐⭐⭐ Fully private |
| **Public instance** | ⭐⭐⭐ Depends on operator's logging policy |

**Best practice:** Self-host if privacy is critical.

### Which provider has the best results?

| Metric | Winner |
|--------|--------|
| **Most accurate for facts** | Serper (Google) |
| **Best for research depth** | Tavily |
| **Best for semantic queries** | Exa |
| **Best for RAG/AI context** | You.com |
| **Most diverse sources** | SearXNG (70+ engines) |
| **Most private** | SearXNG (self-hosted) |

**Recommendation:** Enable multiple providers + auto-routing for best overall experience.

### How does auto-routing work?
The skill analyzes your query for keywords and patterns:

```text
"buy cheap laptop"       → Serper (shopping signals)
"how does AI work?"      → Tavily (research/explanation)
"companies like X"       → Exa (semantic/similar)
"summarize latest news"  → You.com (RAG/real-time)
"search privately"       → SearXNG (privacy signals)
```

**Confidence threshold:** Only routes if confidence > 30%. Otherwise uses default provider.

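
The threshold rule can be sketched in a few lines. This is an illustration only, not the skill's actual code: the function name, the shape of `scores`, and the definition of confidence as the top score's share of all scores are assumptions; only the 30% cutoff and the fall-back to a default provider come from the text above.

```python
CONFIDENCE_THRESHOLD = 0.30  # from the FAQ: route only if confidence > 30%


def choose_provider(scores: dict, default: str = "serper") -> tuple:
    """Return (provider, confidence) for a mapping of provider -> signal score.

    Assumption: confidence is the winning score divided by the sum of all
    scores. The real skill may compute it differently.
    """
    total = sum(scores.values())
    if total <= 0:
        return default, 0.0  # no signals matched at all
    best = max(scores, key=scores.get)
    confidence = scores[best] / total
    if confidence > CONFIDENCE_THRESHOLD:
        return best, confidence
    return default, confidence  # too ambiguous, use the default provider


print(choose_provider({"serper": 3, "tavily": 1}))  # ('serper', 0.75)
```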

**Override:** Use `-p provider` to force a specific provider.

---

## Production Use

### Can I use this in production?
**Yes!** Web-search-plus is production-ready:
- ✅ Error handling with automatic fallback
- ✅ Rate limit protection
- ✅ Timeout handling (30s per provider)
- ✅ API key security (.env + config.json gitignored)
- ✅ 5 providers for redundancy

**Tip:** Monitor API usage to avoid exceeding free tiers!

### What if I run out of API credits?
1. **Fallback chain:** Other enabled providers automatically take over
2. **Use SearXNG:** Switch to self-hosted (unlimited queries)
3. **Upgrade plan:** Paid tiers have higher limits
4. **Rate limit:** Use `disabled_providers` to skip exhausted APIs temporarily

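
The fallback chain in step 1 amounts to trying providers in order until one succeeds. The sketch below shows that idea under stated assumptions: `ProviderError`, `search_with_fallback`, and the `(name, search_fn)` pair format are hypothetical names for illustration, and only the `disabled_providers` concept comes from the skill's configuration.

```python
class ProviderError(Exception):
    """Hypothetical error for an unconfigured, rate-limited, or failing provider."""


def search_with_fallback(query, providers, disabled=()):
    """Try each (name, search_fn) pair in order and return the first success."""
    errors = {}
    for name, search_fn in providers:
        if name in disabled:
            continue  # same idea as `disabled_providers` in config.json
        try:
            return {"provider": name, "results": search_fn(query)}
        except Exception as exc:  # a catchable failure moves on to the next provider
            errors[name] = str(exc)
    raise ProviderError(f"all providers failed: {errors}")


def exhausted(query):
    raise ProviderError("quota exceeded")


def working(query):
    return [f"result for {query}"]


print(search_with_fallback("test", [("serper", exhausted), ("searxng", working)]))
```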
---

## Updates

### How do I update to the latest version?

**Via ClawHub (recommended):**
```bash
clawhub update web-search-plus --registry "https://www.clawhub.ai" --no-input
```

**Manually:**
```bash
cd /path/to/workspace/skills/web-search-plus/
git pull origin main
python3 scripts/setup.py  # Re-run to configure new features
```

### Where can I report bugs or request features?
- **GitHub Issues:** https://github.com/robbyczgw-cla/web-search-plus/issues
- **ClawHub:** https://www.clawhub.ai/skills/web-search-plus

800
README.md
Normal file
@@ -0,0 +1,800 @@
# Web Search Plus

> Unified multi-provider web search with **Intelligent Auto-Routing** — uses multi-signal analysis to automatically select between **Serper**, **Tavily**, **Querit**, **Exa**, **Perplexity (Sonar Pro)**, **You.com**, and **SearXNG** with confidence scoring.

---

## 🧠 Features (v2.9.0)

**Intelligent Multi-Signal Routing** — The skill uses sophisticated query analysis:

- **Intent Classification**: Shopping vs Research vs Discovery vs RAG/Real-time vs Privacy
- **Linguistic Patterns**: "how much" (price) vs "how does" (research) vs "privately" (privacy)
- **Entity Detection**: Product+brand combos, URLs, domains
- **Complexity Analysis**: Long queries favor research providers
- **Confidence Scoring**: Know how reliable the routing decision is

```bash
python3 scripts/search.py -q "how much does iPhone 16 cost"       # → Serper (68% confidence)
python3 scripts/search.py -q "how does quantum entanglement work" # → Tavily (86% HIGH)
python3 scripts/search.py -q "startups similar to Notion"         # → Exa (76% HIGH)
python3 scripts/search.py -q "companies like stripe.com"          # → Exa (100% HIGH - URL detected)
python3 scripts/search.py -q "summarize key points on AI"         # → You.com (68% MEDIUM - RAG intent)
python3 scripts/search.py -q "search privately without tracking"  # → SearXNG (74% HIGH - privacy intent)
```

---

## 🔍 When to Use Which Provider

### Built-in Brave Search (OpenClaw default)
- ✅ General web searches
- ✅ Privacy-focused
- ✅ Quick lookups
- ✅ Default fallback

### Serper (Google Results)
- 🛍 **Product specs, prices, shopping**
- 📍 **Local businesses, places**
- 🎯 **"Google it" - explicit Google results**
- 📰 **Shopping/images needed**
- 🏆 **Knowledge Graph data**

### Tavily (AI-Optimized Research)
- 📚 **Research questions, deep dives**
- 🔬 **Complex multi-part queries**
- 📄 **Need full page content** (not just snippets)
- 🎓 **Academic/technical research**
- 🔒 **Domain filtering** (trusted sources)

### Querit (Multilingual AI Search)
- 🌏 **Multilingual AI search** across 10+ languages
- ⚡ **Fast real-time answers** with ~400ms latency
- 🗺️ **International / cross-language queries**
- 📰 **Recency-aware results** for current information
- 🤖 **Good fit for AI workflows** with clean metadata

### Exa (Neural Semantic Search)
- 🔗 **Find similar pages**
- 🏢 **Company/startup discovery**
- 📝 **Research papers**
- 💻 **GitHub projects**
- 📅 **Date-specific content**

### Perplexity (Sonar Pro via Kilo Gateway)
- ⚡ **Direct answers** (great for “who/what/define”)
- 🧾 **Cited, answer-first output**
- 🕒 **Current events / “as of” questions**
- 🔑 Auth via `KILOCODE_API_KEY` (routes to `https://api.kilo.ai`)

### You.com (RAG/Real-time)
- 🤖 **RAG applications** (LLM-ready snippets)
- 📰 **Combined web + news** (single API call)
- ⚡ **Real-time information** (current events)
- 📋 **Summarization context** ("What's the latest...")
- 🔄 **Live crawling** (full page content on demand)

### SearXNG (Privacy-First/Self-Hosted)
- 🔒 **Privacy-preserving search** (no tracking)
- 🌐 **Multi-source aggregation** (70+ engines)
- 💰 **$0 API cost** (self-hosted)
- 🎯 **Diverse perspectives** (results from multiple engines)
- 🏠 **Self-hosted environments** (full control)

---

## Table of Contents

- [Quick Start](#quick-start)
- [Smart Auto-Routing](#smart-auto-routing)
- [Configuration Guide](#configuration-guide)
- [Provider Deep Dives](#provider-deep-dives)
- [Usage Examples](#usage-examples)
- [Workflow Examples](#workflow-examples)
- [Optimization Tips](#optimization-tips)
- [FAQ & Troubleshooting](#faq--troubleshooting)
- [API Reference](#api-reference)

---

## Quick Start

### Option A: Interactive Setup (Recommended)

```bash
# Run the setup wizard - it guides you through everything
python3 scripts/setup.py
```

The wizard explains each provider, collects your API keys, and creates `config.json` automatically.

### Option B: Manual Setup

```bash
# 1. Set up at least one API key (or SearXNG instance)
export SERPER_API_KEY="your-key"    # https://serper.dev
export TAVILY_API_KEY="your-key"    # https://tavily.com
export QUERIT_API_KEY="your-key"    # https://querit.ai
export EXA_API_KEY="your-key"       # https://exa.ai
export KILOCODE_API_KEY="your-key"  # enables Perplexity Sonar Pro via https://api.kilo.ai
export YOU_API_KEY="your-key"       # https://api.you.com
export SEARXNG_INSTANCE_URL="https://your-instance.example.com"  # Self-hosted

# 2. Run a search (auto-routed!)
python3 scripts/search.py -q "best laptop 2024"
```

### Run a Search

```bash
# Auto-routed to best provider
python3 scripts/search.py -q "best laptop 2024"

# Or specify a provider explicitly
python3 scripts/search.py -p serper -q "iPhone 16 specs"
python3 scripts/search.py -p tavily -q "quantum computing explained" --depth advanced
python3 scripts/search.py -p querit -q "latest AI policy updates in Germany"
python3 scripts/search.py -p exa -q "AI startups 2024" --category company
python3 scripts/search.py -p perplexity -q "Who is the president of Austria?"
```

---

## Smart Auto-Routing

### How It Works

When you don't specify a provider, the skill analyzes your query and routes it to the best provider:

| Query Contains | Routes To | Example |
|---------------|-----------|---------|
| "price", "buy", "shop", "cost" | **Serper** | "iPhone 16 price" |
| "near me", "restaurant", "hotel" | **Serper** | "pizza near me" |
| "weather", "news", "latest" | **Serper** | "weather Berlin" |
| "how does", "explain", "what is" | **Tavily** | "how does TCP work" |
| "research", "study", "analyze" | **Tavily** | "climate research" |
| "tutorial", "guide", "learn" | **Tavily** | "python tutorial" |
| multilingual, current status, latest updates | **Querit** | "latest AI policy updates in Germany" |
| "similar to", "companies like" | **Exa** | "companies like Stripe" |
| "startup", "Series A" | **Exa** | "AI startups Series A" |
| "github", "research paper" | **Exa** | "LLM papers arxiv" |
| "private", "anonymous", "no tracking" | **SearXNG** | "search privately" |
| "multiple sources", "aggregate" | **SearXNG** | "results from all engines" |

### Examples

```bash
# These are all auto-routed to the optimal provider:
python3 scripts/search.py -q "MacBook Pro M3 price"                 # → Serper
python3 scripts/search.py -q "how does HTTPS work"                  # → Tavily
python3 scripts/search.py -q "latest AI policy updates in Germany"  # → Querit
python3 scripts/search.py -q "startups like Notion"                 # → Exa
python3 scripts/search.py -q "best sushi restaurant near me"        # → Serper
python3 scripts/search.py -q "explain attention mechanism"          # → Tavily
python3 scripts/search.py -q "alternatives to Figma"                # → Exa
python3 scripts/search.py -q "search privately without tracking"    # → SearXNG
```

### Result Caching (introduced in v2.7.x)

Search results are **automatically cached** for 1 hour to save API costs:

```bash
# First request: fetches from API ($)
python3 scripts/search.py -q "AI startups 2024"

# Second request: uses cache (FREE!)
python3 scripts/search.py -q "AI startups 2024"
# Output includes: "cached": true

# Bypass cache (force fresh results)
python3 scripts/search.py -q "AI startups 2024" --no-cache

# View cache stats
python3 scripts/search.py --cache-stats

# Clear all cached results
python3 scripts/search.py --clear-cache

# Custom TTL (in seconds, default: 3600 = 1 hour)
python3 scripts/search.py -q "query" --cache-ttl 7200
```

**Cache location:** `.cache/` in skill directory (override with `WSP_CACHE_DIR` environment variable)

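
A time-to-live cache of this kind can be approximated with one JSON file per query. The sketch below is not the skill's implementation: the key scheme (SHA-256 of provider and query), the file layout, and all function names are assumptions; only the 3600-second default, the `.cache/` default directory, and the `WSP_CACHE_DIR` override are taken from the text above.

```python
import hashlib
import json
import os
import time
from pathlib import Path


def cache_dir() -> Path:
    """Cache directory; WSP_CACHE_DIR overrides the default `.cache/`."""
    return Path(os.environ.get("WSP_CACHE_DIR", ".cache"))


def cache_key(provider: str, query: str) -> str:
    """Stable file name for a provider/query pair (hypothetical key scheme)."""
    return hashlib.sha256(f"{provider}:{query}".encode("utf-8")).hexdigest()


def cache_put(provider, query, payload, now=None):
    """Store a JSON-serializable payload together with its storage time."""
    directory = cache_dir()
    directory.mkdir(parents=True, exist_ok=True)
    entry = {"stored_at": time.time() if now is None else now, "payload": payload}
    path = directory / f"{cache_key(provider, query)}.json"
    path.write_text(json.dumps(entry), encoding="utf-8")


def cache_get(provider, query, ttl=3600, now=None):
    """Return the cached payload, or None if it is missing or older than `ttl` seconds."""
    path = cache_dir() / f"{cache_key(provider, query)}.json"
    if not path.exists():
        return None
    entry = json.loads(path.read_text(encoding="utf-8"))
    now = time.time() if now is None else now
    if now - entry["stored_at"] > ttl:
        return None  # expired; the caller should fetch fresh results
    return entry["payload"]
```

Passing `now` explicitly is only there to make expiry easy to test; normal callers would omit it.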
### Debug Auto-Routing

See exactly why a provider was selected:

```bash
python3 scripts/search.py --explain-routing -q "best laptop to buy"
```

Output:
```json
{
  "query": "best laptop to buy",
  "selected_provider": "serper",
  "reason": "matched_keywords (score=2)",
  "matched_keywords": ["buy", "best"],
  "available_providers": ["serper", "tavily", "exa"]
}
```

### Routing Info in Results

Every search result includes routing information:

```json
{
  "provider": "serper",
  "query": "iPhone 16 price",
  "results": [...],
  "routing": {
    "auto_routed": true,
    "selected_provider": "serper",
    "reason": "matched_keywords (score=1)",
    "matched_keywords": ["price"]
  }
}
```

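
A caller that consumes this JSON can read the `routing` object to report how a provider was chosen. The helper below is a small sketch; the function name is made up, and it assumes the field names shown in the sample output (with an empty `results` list standing in for real results).

```python
import json


def summarize_routing(raw_json: str) -> str:
    """Describe which provider answered and why, based on the `routing` field."""
    data = json.loads(raw_json)
    routing = data.get("routing", {})
    if not routing.get("auto_routed"):
        return f"{data['provider']} (selected explicitly)"
    keywords = ", ".join(routing.get("matched_keywords", []))
    return f"{routing['selected_provider']} (auto-routed; matched: {keywords})"


sample = json.dumps({
    "provider": "serper",
    "query": "iPhone 16 price",
    "results": [],
    "routing": {
        "auto_routed": True,
        "selected_provider": "serper",
        "reason": "matched_keywords (score=1)",
        "matched_keywords": ["price"],
    },
})
print(summarize_routing(sample))  # serper (auto-routed; matched: price)
```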
---

## Configuration Guide

### Environment Variables

Create a `.env` file or set these in your shell:

```bash
# Required: Set at least one
export SERPER_API_KEY="your-serper-key"
export TAVILY_API_KEY="your-tavily-key"
export EXA_API_KEY="your-exa-key"
```

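
Because every key is optional, a quick way to see which providers are usable is to check which variables are set. The sketch below uses the environment variable names listed under Quick Start; the mapping and function name are illustrative and not part of the skill's API.

```python
import os

# Environment variable per provider, as listed in the Quick Start section.
PROVIDER_ENV_VARS = {
    "serper": "SERPER_API_KEY",
    "tavily": "TAVILY_API_KEY",
    "querit": "QUERIT_API_KEY",
    "exa": "EXA_API_KEY",
    "perplexity": "KILOCODE_API_KEY",
    "you": "YOU_API_KEY",
    "searxng": "SEARXNG_INSTANCE_URL",
}


def configured_providers(env=None):
    """Return providers whose variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var, "").strip()]


print(configured_providers({"SERPER_API_KEY": "k", "EXA_API_KEY": " "}))  # ['serper']
```

Calling `configured_providers()` without arguments inspects the current process environment.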
### Config File (config.json)

The `config.json` file lets you customize auto-routing and provider defaults:

```json
{
  "defaults": {
    "provider": "serper",
    "max_results": 5
  },

  "auto_routing": {
    "enabled": true,
    "fallback_provider": "serper",
    "provider_priority": ["serper", "tavily", "exa"],
    "disabled_providers": [],
    "keyword_mappings": {
      "serper": ["price", "buy", "shop", "cost", "deal", "near me", "weather"],
      "tavily": ["how does", "explain", "research", "what is", "tutorial"],
      "exa": ["similar to", "companies like", "alternatives", "startup", "github"]
    }
  },

  "serper": {
    "country": "us",
    "language": "en"
  },

  "tavily": {
    "depth": "basic",
    "topic": "general"
  },

  "exa": {
    "type": "neural"
  }
}
```

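
A partial `config.json` is only useful if missing keys fall back to defaults. Whether the skill merges configs exactly this way is an assumption; the sketch below shows one common approach, a recursive merge of user overrides onto defaults, using a hypothetical `merge_config` helper and a trimmed-down default dictionary.

```python
import copy

# Trimmed-down defaults for illustration; the real defaults may contain more keys.
DEFAULT_CONFIG = {
    "defaults": {"provider": "serper", "max_results": 5},
    "auto_routing": {
        "enabled": True,
        "fallback_provider": "serper",
        "disabled_providers": [],
    },
}


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict where nested keys in `override` replace those in `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists are replaced whole
    return merged


user_config = {"auto_routing": {"disabled_providers": ["exa"]}}
print(merge_config(DEFAULT_CONFIG, user_config)["auto_routing"])
```

Lists such as `disabled_providers` are replaced rather than concatenated here, which keeps the user's file authoritative for that key.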
### Configuration Examples

#### Example 1: Disable Exa (Only Use Serper + Tavily)

```json
{
  "auto_routing": {
    "disabled_providers": ["exa"]
  }
}
```

#### Example 2: Make Tavily the Default

```json
{
  "auto_routing": {
    "fallback_provider": "tavily"
  }
}
```

#### Example 3: Add Custom Keywords

```json
{
  "auto_routing": {
    "keyword_mappings": {
      "serper": [
        "price", "buy", "shop", "amazon", "ebay", "walmart",
        "deal", "discount", "coupon", "sale", "cheap"
      ],
      "tavily": [
        "how does", "explain", "research", "what is",
        "coursera", "udemy", "learn", "course", "certification"
      ],
      "exa": [
        "similar to", "companies like", "competitors",
        "YC company", "funded startup", "Series A", "Series B"
      ]
    }
  }
}
```

#### Example 4: German Locale for Serper

```json
{
  "serper": {
    "country": "de",
    "language": "de"
  }
}
```

#### Example 5: Disable Auto-Routing

```json
{
  "auto_routing": {
    "enabled": false
  },
  "defaults": {
    "provider": "serper"
  }
}
```

#### Example 6: Research-Heavy Config

```json
{
  "auto_routing": {
    "fallback_provider": "tavily",
    "provider_priority": ["tavily", "serper", "exa"]
  },
  "tavily": {
    "depth": "advanced",
    "include_raw_content": true
  }
}
```

---

## Provider Deep Dives

### Serper (Google Search API)

**What it is:** Direct access to Google Search results via API — the same results you'd see on google.com.

#### Strengths
| Strength | Description |
|----------|-------------|
| 🎯 **Accuracy** | Google's search quality, knowledge graph, featured snippets |
| 🛒 **Shopping** | Product prices, reviews, shopping results |
| 📍 **Local** | Business listings, maps, places |
| 📰 **News** | Real-time news with Google News integration |
| 🖼 **Images** | Google Images search |
| ⚡ **Speed** | Fastest response times (~200-400ms) |

#### Best Use Cases
- ✅ Product specifications and comparisons
- ✅ Shopping and price lookups
- ✅ Local business searches ("restaurants near me")
- ✅ Quick factual queries (weather, conversions, definitions)
- ✅ News headlines and current events
- ✅ Image searches
- ✅ When you need "what Google shows"

#### Getting Your API Key
1. Go to [serper.dev](https://serper.dev)
2. Sign up with email or Google
3. Copy your API key from the dashboard
4. Set `SERPER_API_KEY` environment variable

---

### Tavily (Research Search)

**What it is:** AI-optimized search engine built for research and RAG applications — returns synthesized answers plus full content.

#### Strengths
| Strength | Description |
|----------|-------------|
| 📚 **Research Quality** | Optimized for comprehensive, accurate research |
| 💬 **AI Answers** | Returns synthesized answers, not just links |
| 📄 **Full Content** | Can return complete page content (raw_content) |
| 🎯 **Domain Filtering** | Include/exclude specific domains |
| 🔬 **Deep Mode** | Advanced search for thorough research |
| 📰 **Topic Modes** | Specialized for general vs news content |

#### Best Use Cases
- ✅ Research questions requiring synthesized answers
- ✅ Academic or technical deep dives
- ✅ When you need actual page content (not just snippets)
- ✅ Multi-source information comparison
- ✅ Domain-specific research (filter to authoritative sources)
- ✅ News research with context
- ✅ RAG/LLM applications

#### Getting Your API Key
1. Go to [tavily.com](https://tavily.com)
2. Sign up and verify email
3. Navigate to API Keys section
4. Generate and copy your key
5. Set `TAVILY_API_KEY` environment variable

---

### Exa (Neural Search)

**What it is:** Neural/semantic search engine that understands meaning, not just keywords — finds conceptually similar content.

#### Strengths
| Strength | Description |
|----------|-------------|
| 🧠 **Semantic Understanding** | Finds results by meaning, not keywords |
| 🔗 **Similar Pages** | Find pages similar to a reference URL |
| 🏢 **Company Discovery** | Excellent for finding startups, companies |
| 📑 **Category Filters** | Filter by type (company, paper, tweet, etc.) |
| 📅 **Date Filtering** | Precise date range searches |
| 🎓 **Academic** | Great for research papers and technical content |

#### Best Use Cases
- ✅ Conceptual queries ("companies building X")
- ✅ Finding similar companies or pages
- ✅ Startup and company discovery
- ✅ Research paper discovery
- ✅ Finding GitHub projects
- ✅ Date-filtered searches for recent content
- ✅ When keyword matching fails

#### Getting Your API Key
1. Go to [exa.ai](https://exa.ai)
2. Sign up with email or Google
3. Navigate to API section in dashboard
4. Copy your API key
5. Set `EXA_API_KEY` environment variable

---

### SearXNG (Privacy-First Meta-Search)

**What it is:** Open-source, self-hosted meta-search engine that aggregates results from 70+ search engines without tracking.

#### Strengths
| Strength | Description |
|----------|-------------|
| 🔒 **Privacy-First** | No tracking, no profiling, no data collection |
| 🌐 **Multi-Engine** | Aggregates Google, Bing, DuckDuckGo, and 70+ more |
| 💰 **Free** | $0 API cost (self-hosted, unlimited queries) |
| 🎯 **Diverse Results** | Get perspectives from multiple search engines |
| ⚙ **Customizable** | Choose which engines to use, SafeSearch, language |
| 🏠 **Self-Hosted** | Full control over your search infrastructure |

#### Best Use Cases
- ✅ Privacy-sensitive searches (no tracking)
- ✅ When you want diverse results from multiple engines
- ✅ Budget-conscious (no API fees)
- ✅ Self-hosted/air-gapped environments
- ✅ Fallback when paid APIs are rate-limited
- ✅ When "aggregate everything" is the goal

#### Setting Up Your Instance
```bash
# Docker (recommended, 5 minutes)
docker run -d -p 8080:8080 searxng/searxng

# Enable JSON API in settings.yml:
# search:
#   formats: [html, json]
```

1. See [docs.searxng.org](https://docs.searxng.org/admin/installation.html)
2. Deploy via Docker, pip, or your preferred method
3. Enable JSON format in `settings.yml`
4. Set `SEARXNG_INSTANCE_URL` environment variable

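
Once the JSON format is enabled, an instance can be queried over plain HTTP. The sketch below only builds the request URL and makes no network call; the helper name is hypothetical, while `q`, `format`, `engines`, `safesearch`, and `time_range` are query parameters of SearXNG's `/search` endpoint. How the skill itself builds its requests is not shown in this README.

```python
from urllib.parse import urlencode


def build_searxng_url(instance_url, query, engines=None, safesearch=None, time_range=None):
    """Build a SearXNG JSON search URL for a self-hosted instance."""
    params = {"q": query, "format": "json"}
    if engines:
        params["engines"] = ",".join(engines)  # e.g. google,bing,duckduckgo
    if safesearch is not None:
        params["safesearch"] = safesearch  # 0=off, 1=moderate, 2=strict
    if time_range:
        params["time_range"] = time_range
    return f"{instance_url.rstrip('/')}/search?{urlencode(params)}"


print(build_searxng_url("http://localhost:8080", "linux distros", engines=["google", "bing"]))
# http://localhost:8080/search?q=linux+distros&format=json&engines=google%2Cbing
```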
---

## Usage Examples

### Auto-Routed Searches (Recommended)

```bash
# Just search — the skill picks the best provider
python3 scripts/search.py -q "Tesla Model 3 price"
python3 scripts/search.py -q "how do neural networks learn"
python3 scripts/search.py -q "YC startups like Stripe"
python3 scripts/search.py -q "search privately without tracking"
```

### Serper Options

```bash
# Different search types
python3 scripts/search.py -p serper -q "gaming monitor" --type shopping
python3 scripts/search.py -p serper -q "coffee shop" --type places
python3 scripts/search.py -p serper -q "AI news" --type news

# With time filter
python3 scripts/search.py -p serper -q "OpenAI news" --time-range day

# Include images
python3 scripts/search.py -p serper -q "iPhone 16 Pro" --images

# Different locale
python3 scripts/search.py -p serper -q "Wetter Wien" --country at --language de
```

### Tavily Options

```bash
# Deep research mode
python3 scripts/search.py -p tavily -q "quantum computing applications" --depth advanced

# With full page content
python3 scripts/search.py -p tavily -q "transformer architecture" --raw-content

# Domain filtering
python3 scripts/search.py -p tavily -q "AI research" --include-domains arxiv.org nature.com
```

### Exa Options

```bash
# Category filtering
python3 scripts/search.py -p exa -q "AI startups Series A" --category company
python3 scripts/search.py -p exa -q "attention mechanism" --category "research paper"

# Date filtering
python3 scripts/search.py -p exa -q "YC companies" --start-date 2024-01-01

# Find similar pages
python3 scripts/search.py -p exa --similar-url "https://stripe.com" --category company
```

### SearXNG Options

```bash
# Basic search
python3 scripts/search.py -p searxng -q "linux distros"

# Specific engines only
python3 scripts/search.py -p searxng -q "AI news" --engines "google,bing,duckduckgo"

# SafeSearch (0=off, 1=moderate, 2=strict)
python3 scripts/search.py -p searxng -q "privacy tools" --searxng-safesearch 2

# With time filter
python3 scripts/search.py -p searxng -q "open source projects" --time-range week

# Custom instance URL
python3 scripts/search.py -p searxng -q "test" --searxng-url "http://localhost:8080"
```

---

## Workflow Examples

### 🛒 Product Research Workflow

```bash
# Step 1: Get product specs (auto-routed to Serper)
python3 scripts/search.py -q "MacBook Pro M3 Max specs"

# Step 2: Check prices (auto-routed to Serper)
python3 scripts/search.py -q "MacBook Pro M3 Max price comparison"

# Step 3: In-depth reviews (auto-routed to Tavily)
python3 scripts/search.py -q "detailed MacBook Pro M3 Max review"
```

### 📚 Academic Research Workflow

```bash
# Step 1: Understand the topic (auto-routed to Tavily)
python3 scripts/search.py -q "explain transformer architecture in deep learning"

# Step 2: Find recent papers (Exa)
python3 scripts/search.py -p exa -q "transformer improvements" --category "research paper" --start-date 2024-01-01

# Step 3: Find implementations (Exa)
python3 scripts/search.py -p exa -q "transformer implementation" --category github
```

### 🏢 Competitive Analysis Workflow

```bash
# Step 1: Find competitors (auto-routed to Exa)
python3 scripts/search.py -q "companies like Notion"

# Step 2: Find similar products (Exa)
python3 scripts/search.py -p exa --similar-url "https://notion.so" --category company

# Step 3: Deep dive comparison (Tavily)
python3 scripts/search.py -p tavily -q "Notion vs Coda comparison" --depth advanced
```

---

## Optimization Tips

### Cost Optimization

| Tip | Savings |
|-----|---------|
| Use SearXNG for routine queries | **$0 API cost** |
| Use auto-routing (defaults to Serper, cheapest paid) | Best value |
| Use Tavily `basic` before `advanced` | ~50% cost reduction |
| Set appropriate `max_results` | Linear cost savings |
| Use Exa only for semantic queries | Avoid waste |

### Performance Optimization

| Tip | Impact |
|-----|--------|
| Serper is fastest (~200ms) | Use for time-sensitive queries |
| Tavily `basic` faster than `advanced` | ~2x faster |
| Lower `max_results` = faster response | Linear improvement |

---

## FAQ & Troubleshooting

### General Questions

**Q: Do I need API keys for every provider?**
> No. You only need keys for providers you want to use. Auto-routing skips providers without keys.

**Q: Which provider should I start with?**
> Serper — it's the fastest, cheapest, and has the largest free tier (2,500 queries).

**Q: Can I use multiple providers in one workflow?**
> Yes! That's the recommended approach. See [Workflow Examples](#workflow-examples).

**Q: How do I reduce API costs?**
> Use auto-routing (defaults to cheapest), start with lower `max_results`, use Tavily `basic` before `advanced`.

### Auto-Routing Questions

**Q: Why did my query go to the wrong provider?**
> Use `--explain-routing` to debug. Add custom keywords to config.json if needed.

**Q: Can I add my own keywords?**
> Yes! Edit `config.json` → `auto_routing.keyword_mappings`.

**Q: How does keyword scoring work?**
> Multi-word phrases get higher weights. "companies like" (2 words) scores higher than "like" (1 word).

**Q: What if no keywords match?**
> Uses the fallback provider (default: Serper).

**Q: Can I force a specific provider?**
> Yes, use `-p` with any provider name, for example `-p serper`, `-p tavily`, or `-p exa`.

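
The phrase-weighting answer above can be illustrated with a small scorer. This is a sketch under assumptions, not the skill's real algorithm: it weights each matched phrase by its word count and uses plain substring matching, and the function name is hypothetical. The `keyword_mappings` shape follows `config.json`.

```python
def score_query(query: str, keyword_mappings: dict) -> dict:
    """Score each provider; every matched phrase adds its word count.

    Assumption: weight == number of words in the phrase, so "companies like"
    contributes 2 while a single word contributes 1.
    """
    text = query.lower()
    scores = {}
    for provider, phrases in keyword_mappings.items():
        scores[provider] = sum(
            len(phrase.split()) for phrase in phrases if phrase.lower() in text
        )
    return scores


mappings = {
    "serper": ["price", "buy"],
    "exa": ["companies like", "similar to"],
}
print(score_query("companies like Stripe", mappings))  # {'serper': 0, 'exa': 2}
```

Substring matching is deliberately naive here: "buy" would also match inside "buyer", which a production scorer would probably guard against with word boundaries.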
### Troubleshooting

**Error: "Missing API key"**
```bash
# Check if key is set
echo $SERPER_API_KEY

# Set it
export SERPER_API_KEY="your-key"
```

**Error: "API Error (401)"**
> Your API key is invalid or expired. Generate a new one.

**Error: "API Error (429)"**
> Rate limited. Wait and retry, or upgrade your plan.

**Empty results?**
> Try a different provider, broaden your query, or remove restrictive filters.

**Slow responses?**
> Reduce `max_results`, use Tavily `basic`, or use Serper (fastest).

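
For the 429 case, "wait and retry" is usually done with exponential backoff. The sketch below is a generic pattern rather than code from this skill: `RateLimitError` is a hypothetical stand-in for whatever a provider client raises on HTTP 429, and the delays (1s, 2s, 4s) are arbitrary defaults.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 response."""


def retry_on_rate_limit(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, retrying with doubling delays while it is rate limited."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Usage with a fake call that is rate limited twice before succeeding.
attempts = {"count": 0}


def flaky_search():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429")
    return ["ok"]


delays = []
print(retry_on_rate_limit(flaky_search, sleep=delays.append), delays)  # ['ok'] [1.0, 2.0]
```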
---

## API Reference

### Output Format

All providers return unified JSON:

```json
{
  "provider": "serper|tavily|querit|exa|perplexity|you|searxng",
  "query": "original search query",
  "results": [
    {
      "title": "Page Title",
      "url": "https://example.com/page",
      "snippet": "Content excerpt...",
      "score": 0.95,
      "date": "2024-01-15",
      "raw_content": "Full page content (Tavily only)"
    }
  ],
  "images": ["url1", "url2"],
  "answer": "Synthesized answer",
  "knowledge_graph": { },
  "routing": {
    "auto_routed": true,
    "selected_provider": "serper",
    "reason": "matched_keywords (score=1)",
    "matched_keywords": ["price"]
  }
}
```

### CLI Options Reference

| Option | Providers | Description |
|--------|-----------|-------------|
| `-q, --query` | All | Search query |
| `-p, --provider` | All | Provider: auto, serper, tavily, querit, exa, perplexity, you, searxng |
| `-n, --max-results` | All | Max results (default: 5) |
| `--auto` | All | Force auto-routing |
| `--explain-routing` | All | Debug auto-routing |
| `--images` | Serper, Tavily | Include images |
| `--country` | Serper, You | Country code (default: us) |
| `--language` | Serper, SearXNG | Language code (default: en) |
| `--type` | Serper | search/news/images/videos/places/shopping |
| `--time-range` | Serper, SearXNG | hour/day/week/month/year |
| `--depth` | Tavily | basic/advanced |
| `--topic` | Tavily | general/news |
| `--raw-content` | Tavily | Include full page content |
| `--querit-base-url` | Querit | Override Querit API base URL |
| `--querit-base-path` | Querit | Override Querit API path |
| `--exa-type` | Exa | neural/keyword |
| `--category` | Exa | company/research paper/news/pdf/github/tweet |
| `--start-date` | Exa | Start date (YYYY-MM-DD) |
| `--end-date` | Exa | End date (YYYY-MM-DD) |
| `--similar-url` | Exa | Find similar pages |
| `--searxng-url` | SearXNG | Instance URL |
| `--searxng-safesearch` | SearXNG | 0=off, 1=moderate, 2=strict |
| `--engines` | SearXNG | Specific engines (google,bing,duckduckgo) |
| `--categories` | SearXNG | Search categories (general,images,news) |
| `--include-domains` | Tavily, Exa | Only these domains |
| `--exclude-domains` | Tavily, Exa | Exclude these domains |
| `--compact` | All | Compact JSON output |

---

## License

MIT

---

## Links

- [Serper](https://serper.dev) — Google Search API
- [Tavily](https://tavily.com) — AI Research Search
- [Exa](https://exa.ai) — Neural Search
- [ClawHub](https://clawhub.ai) — OpenClaw Skills

258
SKILL.md
Normal file
@@ -0,0 +1,258 @@
---
|
||||
name: web-search-plus
|
||||
version: 2.9.2
|
||||
description: "Unified search skill with intelligent auto-routing."
|
||||
tags: [search, web-search, serper, tavily, querit, exa, perplexity, you, searxng, google, multilingual-search, research, semantic-search, auto-routing, multi-provider, shopping, rag, free-tier, privacy, self-hosted, kilo]
|
||||
metadata: {"openclaw":{"requires":{"bins":["python3","bash"],"env":{"SERPER_API_KEY":"optional","TAVILY_API_KEY":"optional","QUERIT_API_KEY":"optional","EXA_API_KEY":"optional","YOU_API_KEY":"optional","SEARXNG_INSTANCE_URL":"optional","KILOCODE_API_KEY":"optional — required for Perplexity provider (via Kilo Gateway)"},"note":"Only ONE provider key needed. All are optional."}}}
|
||||
---
|
||||
|
||||
# Web Search Plus
|
||||
|
||||
**Stop choosing search providers. Let the skill do it for you.**
|
||||
|
||||
This skill connects you to 7 search providers (Serper, Tavily, Querit, Exa, Perplexity, You.com, SearXNG) and automatically picks the best one for each query. Shopping question? → Google results. Research question? → Deep research engine. Need a direct answer? → AI-synthesized with citations. Want privacy? → Self-hosted option.
|
||||
|
||||
---
|
||||
|
||||
## ✨ What Makes This Different?
|
||||
|
||||
- **Just search** — No need to think about which provider to use
|
||||
- **Smart routing** — Analyzes your query and picks the best provider automatically
|
||||
- **7 providers, 1 interface** — Google results, research engines, neural search, AI answers with citations, RAG-optimized, and privacy-first all in one
|
||||
- **Works with just 1 key** — Start with any single provider, add more later
|
||||
- **Free options available** — SearXNG is completely free (self-hosted)
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Quick Start
|
||||
|
||||
```bash
|
||||
# Interactive setup (recommended for first run)
|
||||
python3 scripts/setup.py
|
||||
|
||||
# Or manual: copy config and add your keys
|
||||
cp config.example.json config.json
|
||||
```
|
||||
|
||||
The wizard explains each provider, collects API keys, and configures defaults.
|
||||
|
||||
---
|
||||
|
||||
## 🔑 API Keys
|
||||
|
||||
You only need **ONE** key to get started. Add more providers later for better coverage.
|
||||
|
||||
| Provider | Free Tier | Best For | Sign Up |
|----------|-----------|----------|---------|
| **Serper** | 2,500/mo | Shopping, prices, local, news | [serper.dev](https://serper.dev) |
| **Tavily** | 1,000/mo | Research, explanations, academic | [tavily.com](https://tavily.com) |
| **Querit** | Contact sales/free tier varies | Multilingual AI search, international updates | [querit.ai](https://querit.ai) |
| **Exa** | 1,000/mo | "Similar to X", startups, papers | [exa.ai](https://exa.ai) |
| **Perplexity** | Via Kilo | Direct answers with citations | [kilo.ai](https://kilo.ai) |
| **You.com** | Limited | Real-time info, AI/RAG context | [api.you.com](https://api.you.com) |
| **SearXNG** | **FREE** ✅ | Privacy, multi-source, $0 cost | Self-hosted |

|
||||
**Setting your keys:**
|
||||
|
||||
```bash
# Option A: .env file (recommended)
export SERPER_API_KEY="your-key"
export TAVILY_API_KEY="your-key"
export QUERIT_API_KEY="your-key"

# Option B: config.json (JSON, not shell), for example:
#   { "serper": { "api_key": "your-key" } }
```

|
||||
---
|
||||
|
||||
## 🎯 When to Use Which Provider
|
||||
|
||||
| I want to... | Provider | Example Query |
|--------------|----------|---------------|
| Find product prices | **Serper** | "iPhone 16 Pro Max price" |
| Find restaurants/stores nearby | **Serper** | "best pizza near me" |
| Understand how something works | **Tavily** | "how does HTTPS encryption work" |
| Do deep research | **Tavily** | "climate change research 2024" |
| Search across languages / international updates | **Querit** | "latest AI policy updates in Germany" |
| Find companies like X | **Exa** | "startups similar to Notion" |
| Find research papers | **Exa** | "transformer architecture papers" |
| Get a direct answer with sources | **Perplexity** | "events in Berlin this weekend" |
| Know the current status of something | **Perplexity** | "what is the status of Ethereum upgrades" |
| Get real-time info | **You.com** | "latest AI regulation news" |
| Search without being tracked | **SearXNG** | anything, privately |

|
||||
**Pro tip:** Just search normally! Auto-routing handles most queries correctly. Override with `-p provider` when needed.
|
||||
|
||||
---
|
||||
|
||||
## 🧠 How Auto-Routing Works
|
||||
|
||||
The skill looks at your query and picks the best provider:
|
||||
|
||||
```text
|
||||
"iPhone 16 price" → Serper (shopping keywords)
|
||||
"how does quantum computing work" → Tavily (research question)
|
||||
"latest AI policy updates in Germany" → Querit (multilingual + recency)
|
||||
"companies like stripe.com" → Exa (URL detected, similarity)
|
||||
"events in Graz this weekend" → Perplexity (local + direct answer)
|
||||
"latest news on AI" → You.com (real-time intent)
|
||||
"search privately" → SearXNG (privacy keywords)
|
||||
```
|
||||
|
||||
**What if it picks wrong?** Override it: `python3 scripts/search.py -p tavily -q "your query"`
|
||||
|
||||
**Debug routing:** `python3 scripts/search.py --explain-routing -q "your query"`
|
||||
|
||||
---
|
||||
|
||||
## 📖 Usage Examples
|
||||
|
||||
### Let Auto-Routing Choose (Recommended)
|
||||
|
||||
```bash
|
||||
python3 scripts/search.py -q "Tesla Model 3 price"
|
||||
python3 scripts/search.py -q "explain machine learning"
|
||||
python3 scripts/search.py -q "latest AI policy updates in Germany"
|
||||
python3 scripts/search.py -q "startups like Figma"
|
||||
```
|
||||
|
||||
### Force a Specific Provider
|
||||
|
||||
```bash
|
||||
python3 scripts/search.py -p serper -q "weather Berlin"
|
||||
python3 scripts/search.py -p tavily -q "quantum computing" --depth advanced
|
||||
python3 scripts/search.py -p querit -q "latest AI policy updates in Germany"
|
||||
python3 scripts/search.py -p exa --similar-url "https://stripe.com" --category company
|
||||
python3 scripts/search.py -p you -q "breaking tech news" --include-news
|
||||
python3 scripts/search.py -p searxng -q "linux distros" --engines "google,bing"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## ⚙ Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"auto_routing": {
|
||||
"enabled": true,
|
||||
"fallback_provider": "serper",
|
||||
"confidence_threshold": 0.3,
|
||||
"disabled_providers": []
|
||||
},
|
||||
"serper": {"country": "us", "language": "en"},
|
||||
"tavily": {"depth": "advanced"},
|
||||
"exa": {"type": "neural"},
|
||||
"you": {"country": "US", "include_news": true},
|
||||
"searxng": {"instance_url": "https://your-instance.example.com"}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Provider Comparison
|
||||
|
||||
| Feature | Serper | Tavily | Exa | Perplexity | You.com | SearXNG |
|---------|:------:|:------:|:---:|:----------:|:-------:|:-------:|
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡ | ⚡⚡⚡ | ⚡⚡ |
| Direct Answers | ✗ | ✗ | ✗ | ✓✓ | ✗ | ✗ |
| Citations | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Factual Accuracy | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Semantic Understanding | ⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐ |
| Full Page Content | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Shopping/Local | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Find Similar Pages | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| RAG-Optimized | ✗ | ✓ | ✗ | ✗ | ✓✓ | ✗ |
| Privacy-First | ✗ | ✗ | ✗ | ✗ | ✗ | ✓✓ |
| API Cost | $$ | $$ | $$ | Via Kilo | $ | **FREE** |

|
||||
---
|
||||
|
||||
## ❓ Common Questions
|
||||
|
||||
### Do I need API keys for all providers?
|
||||
**No.** You only need keys for providers you want to use. Start with one (Serper recommended), add more later.
|
||||
|
||||
### Which provider should I start with?
|
||||
**Serper** — fastest, cheapest, largest free tier (2,500 queries/month), and handles most queries well.
|
||||
|
||||
### What if I run out of free queries?
|
||||
The skill automatically falls back to your other configured providers. Or switch to SearXNG (unlimited, self-hosted).
|
||||
|
||||
### How much does this cost?
|
||||
- **Free tiers:** 2,500 (Serper) + 1,000 (Tavily) + 1,000 (Exa) = 4,500+ free searches/month
|
||||
- **SearXNG:** Free software with no API fees; expect roughly $5/mo in hosting if you run it on a VPS
|
||||
- **Paid plans:** Start around $10-50/month depending on provider
|
||||
|
||||
### Is SearXNG really private?
|
||||
**Yes, if self-hosted.** You control the server, no tracking, no profiling. Public instances depend on the operator's policy.
|
||||
|
||||
### How do I set up SearXNG?
|
||||
```bash
|
||||
# Docker (5 minutes)
|
||||
docker run -d -p 8080:8080 searxng/searxng
|
||||
```
|
||||
Then enable JSON API in `settings.yml`. See [docs.searxng.org](https://docs.searxng.org/admin/installation.html).
|
||||
|
||||
### Why did it route my query to the "wrong" provider?
|
||||
Sometimes queries are ambiguous. Use `--explain-routing` to see why, then override with `-p provider` if needed.
|
||||
|
||||
---
|
||||
|
||||
## 🔄 Automatic Fallback
|
||||
|
||||
If one provider fails (rate limit, timeout, error), the skill automatically tries the next provider. You'll see `routing.fallback_used: true` in the response when this happens.
|
||||
|
||||
---
|
||||
|
||||
## 📤 Output Format
|
||||
|
||||
```json
|
||||
{
|
||||
"provider": "serper",
|
||||
"query": "iPhone 16 price",
|
||||
"results": [{"title": "...", "url": "...", "snippet": "...", "score": 0.95}],
|
||||
"routing": {
|
||||
"auto_routed": true,
|
||||
"provider": "serper",
|
||||
"confidence": 0.78,
|
||||
"confidence_level": "high"
|
||||
}
|
||||
}
|
||||
```
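As a rough illustration of consuming this output from Python, the sketch below parses a response string and pulls out the fields shown above. The helper name `summarize_response` is hypothetical, and treating a missing `fallback_used` key as `False` is an assumption.

```python
import json


def summarize_response(raw_json):
    """Return (provider, fallback_used, [(title, url), ...]) from a response string."""
    response = json.loads(raw_json)
    routing = response.get("routing", {})
    hits = [
        (item.get("title", ""), item.get("url", ""))
        for item in response.get("results", [])
    ]
    # Assumption: an absent "fallback_used" key means no fallback happened
    return response.get("provider"), bool(routing.get("fallback_used", False)), hits


sample = json.dumps({
    "provider": "serper",
    "query": "iPhone 16 price",
    "results": [
        {"title": "Example", "url": "https://example.com", "snippet": "...", "score": 0.95}
    ],
    "routing": {
        "auto_routed": True,
        "provider": "serper",
        "confidence": 0.78,
        "confidence_level": "high",
    },
})
provider, fallback_used, hits = summarize_response(sample)
print(provider, fallback_used, hits)
```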
|
||||
|
||||
---
|
||||
|
||||
## ⚠ Important Note
|
||||
|
||||
**Tavily, Serper, and Exa are NOT core OpenClaw providers.**
|
||||
|
||||
❌ Don't modify `~/.openclaw/openclaw.json` for these
|
||||
✅ Use this skill's scripts — keys auto-load from `.env`
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Security
|
||||
|
||||
**SearXNG SSRF Protection:** The SearXNG instance URL is validated with defense-in-depth:
|
||||
- Enforces `http`/`https` schemes only
|
||||
- Blocks cloud metadata endpoints (169.254.169.254, metadata.google.internal)
|
||||
- Resolves hostnames and blocks private/internal IPs (loopback, RFC1918, link-local, reserved)
|
||||
- Operators who intentionally self-host on private networks can set `SEARXNG_ALLOW_PRIVATE=1`
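The checks listed above could look roughly like the following sketch. It is an illustration under assumptions, not the skill's actual validator: the function name `validate_instance_url`, the `allow_private` and `resolve` parameters, and the choice to keep blocking metadata endpoints even when private addresses are allowed are all hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Cloud metadata endpoints named in the list above
BLOCKED_HOSTS = {"169.254.169.254", "metadata.google.internal"}


def validate_instance_url(url, allow_private=False, resolve=socket.getaddrinfo):
    """Return url if it passes the checks, otherwise raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no hostname")
    if host.lower() in BLOCKED_HOSTS:
        raise ValueError("cloud metadata endpoint is blocked")
    if allow_private:
        # Assumed to mirror SEARXNG_ALLOW_PRIVATE=1 for intentional private hosting
        return url
    # Resolve the hostname and inspect every address it maps to
    for info in resolve(host, None):
        address = ipaddress.ip_address(info[4][0])
        if (address.is_private or address.is_loopback
                or address.is_link_local or address.is_reserved):
            raise ValueError(f"{host} resolves to a non-public address: {address}")
    return url
```

Passing the resolver as a parameter is only a convenience for testing without network access; a real validator would also need to consider DNS rebinding between validation and the actual request.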
|
||||
|
||||
## 📚 More Documentation
|
||||
|
||||
- **[FAQ.md](FAQ.md)** — Detailed answers to more questions
|
||||
- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)** — Fix common errors
|
||||
- **[README.md](README.md)** — Full technical reference
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Quick Links
|
||||
|
||||
- [Serper](https://serper.dev) — Google Search API
|
||||
- [Tavily](https://tavily.com) — AI Research Search
|
||||
- [Exa](https://exa.ai) — Neural Search
|
||||
- [Perplexity](https://www.perplexity.ai) — AI-Synthesized Answers (via [Kilo Gateway](https://kilo.ai))
|
||||
- [You.com](https://api.you.com) — RAG/Real-time Search
|
||||
- [SearXNG](https://docs.searxng.org) — Privacy-First Meta-Search
|
||||
315
TROUBLESHOOTING.md
Normal file
@@ -0,0 +1,315 @@
|
||||
# Troubleshooting Guide
|
||||
|
||||
## Caching Issues (v2.7.0+)
|
||||
|
||||
### Cache not working / always fetching fresh
|
||||
|
||||
**Symptoms:**
|
||||
- Every request hits the API
|
||||
- `"cached": false` even for repeated queries
|
||||
|
||||
**Solutions:**
|
||||
1. Check cache directory exists and is writable:
|
||||
```bash
|
||||
ls -la .cache/ # Should exist in skill directory
|
||||
```
|
||||
2. Verify `--no-cache` isn't being passed
|
||||
3. Check disk space isn't full
|
||||
4. Ensure query is EXACTLY the same (including provider and max_results)
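To see why a small difference in the request causes a cache miss, here is a sketch of how a cache key might be derived. The real key scheme in `scripts/search.py` is not shown in this guide, so the `cache_key` function and the choice of SHA-256 over a JSON payload are assumptions.

```python
import hashlib
import json


def cache_key(query, provider, max_results):
    """Build a deterministic key from the parameters that identify a request."""
    payload = json.dumps(
        {"query": query, "provider": provider, "max_results": max_results},
        sort_keys=True,  # stable field order, so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


same = cache_key("iPhone 16 price", "serper", 5) == cache_key("iPhone 16 price", "serper", 5)
different = cache_key("iPhone 16 price", "serper", 5) == cache_key("iPhone 16 price", "serper", 10)
print(same, different)
```

Under a scheme like this, changing only `max_results`, the provider, or the capitalization of the query produces a different key and therefore a fresh API call.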
|
||||
|
||||
### Stale results from cache
|
||||
|
||||
**Symptoms:**
|
||||
- Getting outdated information
|
||||
- Cache TTL seems too long
|
||||
|
||||
**Solutions:**
|
||||
1. Use `--no-cache` to force fresh results
|
||||
2. Reduce TTL: `--cache-ttl 1800` (30 minutes)
|
||||
3. Clear cache: `python3 scripts/search.py --clear-cache`
|
||||
|
||||
### Cache growing too large
|
||||
|
||||
**Symptoms:**
|
||||
- Disk space filling up
|
||||
- Many .json files in `.cache/`
|
||||
|
||||
**Solutions:**
|
||||
1. Clear cache periodically:
|
||||
```bash
|
||||
python3 scripts/search.py --clear-cache
|
||||
```
|
||||
2. Set up a cron job to clear weekly
|
||||
3. Use a smaller TTL so entries expire faster
|
||||
|
||||
### "Permission denied" when caching
|
||||
|
||||
**Symptoms:**
|
||||
- Cache write errors in stderr
|
||||
- Searches work but don't cache
|
||||
|
||||
**Solutions:**
|
||||
1. Check directory permissions: `chmod 755 .cache/`
|
||||
2. Use custom cache dir: `export WSP_CACHE_DIR="$TMP_DIR/wsp-cache"`
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### "No API key found" error
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
Error: No API key found for serper
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
1. Check `.env` exists in skill folder with `export VAR=value` format
|
||||
2. Keys auto-load from skill's `.env` since v2.2.0
|
||||
3. Or set in system environment: `export SERPER_API_KEY="..."`
|
||||
4. Verify key format in config.json:
|
||||
```json
|
||||
{ "serper": { "api_key": "your-key" } }
|
||||
```
|
||||
|
||||
**Priority order:** config.json > .env > environment variable
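The priority order can be sketched as below. The helper names `parse_env_file` and `resolve_api_key` are hypothetical, and the `{PROVIDER}_API_KEY` naming is an assumption that fits Serper, Tavily, Exa, You.com, and Querit but not Perplexity (which uses `KILOCODE_API_KEY`) or SearXNG (which uses an instance URL).

```python
def parse_env_file(text):
    """Parse lines in `export VAR=value` (or `VAR=value`) format into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, sep, value = line.partition("=")
        if sep:
            values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(provider, config, env_file_values, environ):
    """Apply the order config.json > .env > environment variable."""
    from_config = config.get(provider, {}).get("api_key")
    if from_config:
        return from_config
    variable = f"{provider.upper()}_API_KEY"
    return env_file_values.get(variable) or environ.get(variable)


env_values = parse_env_file('# keys\nexport SERPER_API_KEY="from-dotenv"\nTAVILY_API_KEY=plain\n')
key = resolve_api_key(
    "serper", {"serper": {"api_key": "from-config"}}, env_values, {"SERPER_API_KEY": "from-os"}
)
print(key)
```

In real use `environ` would be `os.environ`, and `config` would be the parsed contents of `config.json`.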
|
||||
|
||||
---
|
||||
|
||||
### Getting empty results
|
||||
|
||||
**Symptoms:**
|
||||
- Search returns no results
|
||||
- `"results": []` in JSON output
|
||||
|
||||
**Solutions:**
|
||||
1. Check API key is valid (try the provider's web dashboard)
|
||||
2. Try a different provider with `-p`
|
||||
3. Some queries have no results (very niche topics)
|
||||
4. Check if provider is rate-limited
|
||||
5. Verify internet connectivity
|
||||
|
||||
**Debug:**
|
||||
```bash
|
||||
python3 scripts/search.py -q "test query" --verbose
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Rate limited
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
Error: 429 Too Many Requests
|
||||
Error: Rate limit exceeded
|
||||
```
|
||||
|
||||
**Good news:** Since v2.2.5, automatic fallback kicks in! If one provider hits rate limits, the script automatically tries the next provider.
|
||||
|
||||
**Solutions:**
|
||||
1. Wait for rate limit to reset (usually 1 hour or end of day)
|
||||
2. Use a different provider: `-p tavily` instead of `-p serper`
|
||||
3. Check free tier limits:
|
||||
- Serper: 2,500 free total
|
||||
- Tavily: 1,000/month free
|
||||
- Exa: 1,000/month free
|
||||
4. Upgrade to paid tier for higher limits
|
||||
5. Use SearXNG (self-hosted, unlimited)
|
||||
|
||||
**Fallback info:** Response will include `routing.fallback_used: true` when fallback was used.
|
||||
|
||||
---
|
||||
|
||||
### SearXNG: "403 Forbidden"
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
Error: 403 Forbidden
|
||||
Error: JSON format not allowed
|
||||
```
|
||||
|
||||
**Cause:** Most public SearXNG instances disable JSON API to prevent bot abuse.
|
||||
|
||||
**Solution:** Self-host your own instance:
|
||||
```bash
|
||||
docker run -d -p 8080:8080 searxng/searxng
|
||||
```
|
||||
|
||||
Then enable JSON in `settings.yml`:
|
||||
```yaml
|
||||
search:
|
||||
formats:
|
||||
- html
|
||||
- json # Add this!
|
||||
```
|
||||
|
||||
Restart the container and update your config:
|
||||
```json
|
||||
{
|
||||
"searxng": {
|
||||
"instance_url": "http://localhost:8080"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### SearXNG: Slow responses
|
||||
|
||||
**Symptoms:**
|
||||
- SearXNG takes 2-5 seconds
|
||||
- Other providers are faster
|
||||
|
||||
**Explanation:** This is expected behavior. SearXNG queries 70+ upstream engines in parallel, which takes longer than direct API calls.
|
||||
|
||||
**Trade-off:** Slower but privacy-preserving + multi-source + $0 cost.
|
||||
|
||||
**Solutions:**
|
||||
1. Accept the trade-off for privacy benefits
|
||||
2. Limit engines for faster results:
|
||||
```bash
|
||||
python3 scripts/search.py -p searxng -q "query" --engines "google,bing"
|
||||
```
|
||||
3. Use SearXNG as fallback (put last in priority list)
|
||||
|
||||
---
|
||||
|
||||
### Auto-routing picks wrong provider
|
||||
|
||||
**Symptoms:**
|
||||
- Query about research goes to Serper
|
||||
- Query about shopping goes to Tavily
|
||||
|
||||
**Debug:**
|
||||
```bash
|
||||
python3 scripts/search.py --explain-routing -q "your query"
|
||||
```
|
||||
|
||||
This shows the full analysis:
|
||||
```json
|
||||
{
|
||||
"query": "how much does iPhone 16 Pro cost",
|
||||
"routing_decision": {
|
||||
"provider": "serper",
|
||||
"confidence": 0.68,
|
||||
"reason": "moderate_confidence_match"
|
||||
},
|
||||
"scores": {"serper": 7.0, "tavily": 0.0, "exa": 0.0},
|
||||
"top_signals": [
|
||||
{"matched": "how much", "weight": 4.0},
|
||||
{"matched": "brand + product detected", "weight": 3.0}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
1. Override with explicit provider: `-p tavily`
|
||||
2. Rephrase query to be more explicit about intent
|
||||
3. Adjust `confidence_threshold` in config.json (default: 0.3)
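To make the effect of `confidence_threshold` concrete, here is a deliberately simplified routing sketch. The real router uses weighted multi-signal analysis that is not reproduced here; the `route` function, the tiny keyword lists, the share-of-total confidence formula, and the `reason` strings are all assumptions for illustration.

```python
KEYWORDS = {
    "serper": ["price", "buy", "near me"],
    "tavily": ["how does", "explain", "research"],
    "exa": ["similar to", "companies like"],
}


def route(query, threshold=0.3, fallback="serper"):
    """Pick the provider with the most keyword hits, or the fallback if unsure."""
    text = query.lower()
    scores = {
        provider: sum(1.0 for keyword in keywords if keyword in text)
        for provider, keywords in KEYWORDS.items()
    }
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    # Confidence here is the winner's share of all matched signals
    confidence = scores[best] / total if total else 0.0
    if confidence < threshold:
        return {"provider": fallback, "confidence": confidence, "reason": "below_threshold"}
    return {"provider": best, "confidence": confidence, "reason": "keyword_match"}


print(route("iPhone 16 price"))
print(route("hello world"))
```

In a scheme like this, raising the threshold sends more ambiguous queries to `fallback_provider`, while lowering it lets weaker matches choose the provider.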
|
||||
|
||||
---
|
||||
|
||||
### Config not loading
|
||||
|
||||
**Symptoms:**
|
||||
- Changes to config.json not applied
|
||||
- Using default values instead
|
||||
|
||||
**Solutions:**
|
||||
1. Check JSON syntax (use a validator)
|
||||
2. Ensure file is in skill directory: `/path/to/skills/web-search-plus/config.json`
|
||||
3. Check file permissions
|
||||
4. Run setup wizard to regenerate:
|
||||
```bash
|
||||
python3 scripts/setup.py --reset
|
||||
```
|
||||
|
||||
**Validate JSON:**
|
||||
```bash
|
||||
python3 -m json.tool config.json
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Python dependencies missing
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
ModuleNotFoundError: No module named 'requests'
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
pip3 install requests
|
||||
```
|
||||
|
||||
Or install all dependencies:
|
||||
```bash
|
||||
pip3 install -r requirements.txt
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Timeout errors
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
Error: Request timeout after 30s
|
||||
```
|
||||
|
||||
**Causes:**
|
||||
- Slow network connection
|
||||
- Provider API issues
|
||||
- SearXNG instance overloaded
|
||||
|
||||
**Solutions:**
|
||||
1. Try again (temporary issue)
|
||||
2. Switch provider: `-p serper`
|
||||
3. Check your internet connection
|
||||
4. If using SearXNG, check instance health
|
||||
|
||||
---
|
||||
|
||||
### Duplicate results
|
||||
|
||||
**Symptoms:**
|
||||
- Same result appears multiple times
|
||||
- Results overlap between providers
|
||||
|
||||
**Solution:** This is expected when using auto-fallback or multiple providers. The skill doesn't deduplicate across providers.
|
||||
|
||||
For single-provider results:
|
||||
```bash
|
||||
python3 scripts/search.py -p serper -q "query"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Debug Mode
|
||||
|
||||
For detailed debugging:
|
||||
|
||||
```bash
|
||||
# Verbose output
|
||||
python3 scripts/search.py -q "query" --verbose
|
||||
|
||||
# Show routing decision
|
||||
python3 scripts/search.py -q "query" --explain-routing
|
||||
|
||||
# Dry run (no actual search)
|
||||
python3 scripts/search.py -q "query" --dry-run
|
||||
|
||||
# Test specific provider
|
||||
python3 scripts/search.py -p tavily -q "query" --verbose
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Getting Help
|
||||
|
||||
**Still stuck?**
|
||||
|
||||
1. Check the full documentation in `README.md`
|
||||
2. Run the setup wizard: `python3 scripts/setup.py`
|
||||
3. Review `FAQ.md` for common questions
|
||||
4. Open an issue: https://github.com/robbyczgw-cla/web-search-plus/issues
|
||||
6
_meta.json
Normal file
@@ -0,0 +1,6 @@
|
||||
{
|
||||
"ownerId": "kn73gpe8xz2630jrknkb3ya96h7zb84h",
|
||||
"slug": "web-search-plus",
|
||||
"version": "2.9.2",
|
||||
"publishedAt": 1774629265049
|
||||
}
|
||||
265
config.example.json
Normal file
@@ -0,0 +1,265 @@
|
||||
{
|
||||
"$schema": "https://json-schema.org/draft/2020-12/schema",
|
||||
"$comment": "Web Search Plus configuration — intelligent routing and provider settings",
|
||||
"defaults": {
|
||||
"provider": "serper",
|
||||
"max_results": 5
|
||||
},
|
||||
"auto_routing": {
|
||||
"enabled": true,
|
||||
"fallback_provider": "serper",
|
||||
"provider_priority": [
|
||||
"tavily",
|
||||
"querit",
|
||||
"exa",
|
||||
"perplexity",
|
||||
"serper",
|
||||
"you",
|
||||
"searxng"
|
||||
],
|
||||
"disabled_providers": [],
|
||||
"confidence_threshold": 0.3,
|
||||
"keyword_mappings": {
|
||||
"serper": [
|
||||
"price",
|
||||
"buy",
|
||||
"shop",
|
||||
"shopping",
|
||||
"cost",
|
||||
"deal",
|
||||
"sale",
|
||||
"purchase",
|
||||
"cheap",
|
||||
"expensive",
|
||||
"store",
|
||||
"product",
|
||||
"review",
|
||||
"specs",
|
||||
"specification",
|
||||
"where to buy",
|
||||
"near me",
|
||||
"local",
|
||||
"restaurant",
|
||||
"hotel",
|
||||
"weather",
|
||||
"news",
|
||||
"latest",
|
||||
"breaking",
|
||||
"map",
|
||||
"directions",
|
||||
"phone number",
|
||||
"preis",
|
||||
"kaufen",
|
||||
"bestellen",
|
||||
"günstig",
|
||||
"billig",
|
||||
"teuer",
|
||||
"kosten",
|
||||
"angebot",
|
||||
"rabatt",
|
||||
"shop",
|
||||
"händler",
|
||||
"geschäft",
|
||||
"laden",
|
||||
"test",
|
||||
"bewertung",
|
||||
"technische daten",
|
||||
"spezifikationen",
|
||||
"wo kaufen",
|
||||
"in der nähe",
|
||||
"wetter",
|
||||
"nachrichten",
|
||||
"aktuell",
|
||||
"neu"
|
||||
],
|
||||
"tavily": [
|
||||
"how does",
|
||||
"how to",
|
||||
"explain",
|
||||
"research",
|
||||
"what is",
|
||||
"why does",
|
||||
"analyze",
|
||||
"compare",
|
||||
"study",
|
||||
"academic",
|
||||
"detailed",
|
||||
"comprehensive",
|
||||
"in-depth",
|
||||
"understand",
|
||||
"learn",
|
||||
"tutorial",
|
||||
"guide",
|
||||
"overview",
|
||||
"history of",
|
||||
"background",
|
||||
"context",
|
||||
"implications",
|
||||
"pros and cons",
|
||||
"wie funktioniert",
|
||||
"erklärung",
|
||||
"erklären",
|
||||
"was ist",
|
||||
"warum",
|
||||
"analyse",
|
||||
"vergleich",
|
||||
"vergleichen",
|
||||
"studie",
|
||||
"verstehen",
|
||||
"lernen",
|
||||
"anleitung",
|
||||
"tutorial",
|
||||
"überblick",
|
||||
"hintergrund",
|
||||
"vor- und nachteile"
|
||||
],
|
||||
"exa": [
|
||||
"similar to",
|
||||
"companies like",
|
||||
"find sites like",
|
||||
"alternatives to",
|
||||
"competitors",
|
||||
"startup",
|
||||
"github",
|
||||
"paper",
|
||||
"research paper",
|
||||
"arxiv",
|
||||
"pdf",
|
||||
"academic paper",
|
||||
"similar pages",
|
||||
"related sites",
|
||||
"who else",
|
||||
"other companies",
|
||||
"comparable to",
|
||||
"ähnlich wie",
|
||||
"firmen wie",
|
||||
"alternativen zu",
|
||||
"konkurrenten",
|
||||
"vergleichbar mit",
|
||||
"andere unternehmen"
|
||||
],
|
||||
"you": [
|
||||
"rag",
|
||||
"context for",
|
||||
"summarize",
|
||||
"brief",
|
||||
"quick overview",
|
||||
"tldr",
|
||||
"key points",
|
||||
"key facts",
|
||||
"main points",
|
||||
"main takeaways",
|
||||
"latest news",
|
||||
"latest updates",
|
||||
"current events",
|
||||
"current situation",
|
||||
"current status",
|
||||
"right now",
|
||||
"as of today",
|
||||
"up to date",
|
||||
"real time",
|
||||
"what's happening",
|
||||
"what's the latest",
|
||||
"updates on",
|
||||
"status of",
|
||||
"zusammenfassung",
|
||||
"aktuelle nachrichten",
|
||||
"neueste updates"
|
||||
],
|
||||
"searxng": [
|
||||
"private",
|
||||
"privately",
|
||||
"anonymous",
|
||||
"anonymously",
|
||||
"without tracking",
|
||||
"no tracking",
|
||||
"privacy",
|
||||
"privacy-focused",
|
||||
"privacy-first",
|
||||
"duckduckgo alternative",
|
||||
"private search",
|
||||
"aggregate results",
|
||||
"multiple sources",
|
||||
"diverse results",
|
||||
"diverse perspectives",
|
||||
"meta search",
|
||||
"all engines",
|
||||
"free search",
|
||||
"no api cost",
|
||||
"self-hosted search",
|
||||
"zero cost",
|
||||
"privat",
|
||||
"anonym",
|
||||
"ohne tracking",
|
||||
"datenschutz",
|
||||
"verschiedene quellen",
|
||||
"aus mehreren quellen",
|
||||
"alle suchmaschinen",
|
||||
"kostenlose suche",
|
||||
"keine api kosten"
|
||||
],
|
||||
"querit": [
|
||||
"multilingual",
|
||||
"current status",
|
||||
"latest updates",
|
||||
"status of",
|
||||
"real-time",
|
||||
"summarize",
|
||||
"global search",
|
||||
"cross-language",
|
||||
"international",
|
||||
"aktuell",
|
||||
"zusammenfassung"
|
||||
],
|
||||
"perplexity": [
|
||||
"what is",
|
||||
"current status",
|
||||
"status of",
|
||||
"what happened with",
|
||||
"events in",
|
||||
"things to do in"
|
||||
]
|
||||
}
|
||||
},
|
||||
"serper": {
|
||||
"country": "us",
|
||||
"language": "en",
|
||||
"type": "search",
|
||||
"autocorrect": true,
|
||||
"include_images": false
|
||||
},
|
||||
"tavily": {
|
||||
"depth": "advanced",
|
||||
"topic": "general",
|
||||
"max_results": 8
|
||||
},
|
||||
"exa": {
|
||||
"type": "neural",
|
||||
"category": null,
|
||||
"include_domains": [],
|
||||
"exclude_domains": []
|
||||
},
|
||||
"you": {
|
||||
"country": "US",
|
||||
"language": "en",
|
||||
"safesearch": "moderate",
|
||||
"include_news": true
|
||||
},
|
||||
"searxng": {
|
||||
"$comment": "SearXNG requires a self-hosted instance. No API key needed, just your instance URL.",
|
||||
"instance_url": null,
|
||||
"safesearch": 0,
|
||||
"engines": null,
|
||||
"language": "en"
|
||||
},
|
||||
"querit_api_key": "",
|
||||
"querit": {
|
||||
"base_url": "https://api.querit.ai",
|
||||
"base_path": "/v1/search",
|
||||
"timeout": 10
|
||||
},
|
||||
"perplexity": {
|
||||
"api_url": "https://api.kilo.ai/api/gateway/chat/completions",
|
||||
"model": "perplexity/sonar-pro"
|
||||
}
|
||||
}
|
||||
88
package.json
Normal file
@@ -0,0 +1,88 @@
|
||||
{
|
||||
"name": "@openclaw/web-search-plus",
|
||||
"version": "2.9.0",
|
||||
"description": "Unified search skill with Intelligent Auto-Routing. Uses multi-signal analysis (intent classification, linguistic patterns, URL/brand detection) to automatically select between Serper (Google), Tavily (Research), Querit (Multilingual AI Search), Exa (Neural), Perplexity (AI Answers), You.com (RAG/Real-time), and SearXNG (Privacy/Self-hosted) with confidence scoring.",
|
||||
"keywords": [
|
||||
"openclaw",
|
||||
"skill",
|
||||
"search",
|
||||
"web-search",
|
||||
"serper",
|
||||
"tavily",
|
||||
"exa",
|
||||
"you",
|
||||
"you.com",
|
||||
"google-search",
|
||||
"research",
|
||||
"semantic-search",
|
||||
"ai-agent",
|
||||
"auto-routing",
|
||||
"smart-routing",
|
||||
"multi-provider",
|
||||
"shopping",
|
||||
"product-search",
|
||||
"similar-sites",
|
||||
"company-discovery",
|
||||
"rag",
|
||||
"real-time",
|
||||
"free-tier",
|
||||
"api-aggregator",
|
||||
"querit",
|
||||
"multilingual-search"
|
||||
],
|
||||
"author": "robbyczgw-cla",
|
||||
"license": "MIT",
|
||||
"repository": {
|
||||
"type": "git",
|
||||
"url": "https://github.com/robbyczgw-cla/web-search-plus.git"
|
||||
},
|
||||
"homepage": "https://clawhub.ai/robbyczgw-cla/web-search-plus",
|
||||
"bugs": {
|
||||
"url": "https://github.com/robbyczgw-cla/web-search-plus/issues"
|
||||
},
|
||||
"openclaw": {
|
||||
"skill": true,
|
||||
"triggers": [
|
||||
"search",
|
||||
"find",
|
||||
"look up",
|
||||
"research"
|
||||
],
|
||||
"capabilities": [
|
||||
"web-search",
|
||||
"image-search",
|
||||
"semantic-search",
|
||||
"multi-provider"
|
||||
],
|
||||
"providers": [
|
||||
"serper",
|
||||
"tavily",
|
||||
"querit",
|
||||
"exa",
|
||||
"perplexity",
|
||||
"you",
|
||||
"searxng"
|
||||
],
|
||||
"requirements": {
|
||||
"bins": [
|
||||
"python3",
|
||||
"bash"
|
||||
],
|
||||
"env": {
|
||||
"SERPER_API_KEY": "optional",
|
||||
"TAVILY_API_KEY": "optional",
|
||||
"EXA_API_KEY": "optional",
|
||||
"YOU_API_KEY": "optional",
|
||||
"SEARXNG_INSTANCE_URL": "optional",
|
||||
"QUERIT_API_KEY": "optional",
|
||||
"KILOCODE_API_KEY": "optional"
|
||||
}
|
||||
}
|
||||
},
|
||||
"files": [
|
||||
"SKILL.md",
|
||||
"README.md",
|
||||
"scripts/",
|
||||
".env.example"
|
||||
]
|
||||
}
|
||||
2940
scripts/search.py
Normal file
File diff suppressed because it is too large
453
scripts/setup.py
Normal file
@@ -0,0 +1,453 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Web Search Plus - Interactive Setup Wizard
|
||||
==========================================
|
||||
|
||||
Runs on first use (when no config.json exists) to configure providers and API keys.
|
||||
Creates config.json with your settings. API keys are stored locally only.
|
||||
|
||||
Usage:
|
||||
python3 scripts/setup.py # Interactive setup
|
||||
python3 scripts/setup.py --reset # Reset and reconfigure
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# ANSI colors for terminal output
|
||||
class Colors:
|
||||
HEADER = '\033[95m'
|
||||
BLUE = '\033[94m'
|
||||
CYAN = '\033[96m'
|
||||
GREEN = '\033[92m'
|
||||
YELLOW = '\033[93m'
|
||||
RED = '\033[91m'
|
||||
BOLD = '\033[1m'
|
||||
DIM = '\033[2m'
|
||||
RESET = '\033[0m'
|
||||
|
||||
def color(text: str, c: str) -> str:
|
||||
"""Wrap text in color codes."""
|
||||
return f"{c}{text}{Colors.RESET}"
|
||||
|
||||
def print_header():
|
||||
"""Print the setup wizard header."""
|
||||
print()
|
||||
print(color("╔════════════════════════════════════════════════════════════╗", Colors.CYAN))
|
||||
print(color("║ 🔍 Web Search Plus - Setup Wizard ║", Colors.CYAN))
|
||||
print(color("╚════════════════════════════════════════════════════════════╝", Colors.CYAN))
|
||||
print()
|
||||
print(color("This wizard will help you configure your search providers.", Colors.DIM))
|
||||
print(color("API keys are stored locally in config.json (gitignored).", Colors.DIM))
|
||||
print()
|
||||
|
||||
def print_provider_info():
    """Print information about each provider."""
    print(color("📚 Available Providers:", Colors.BOLD))
    print()

    providers = [
        {
            "name": "Serper",
            "emoji": "🔎",
            "best_for": "Google results, shopping, local businesses, news",
            "free_tier": "2,500 queries/month",
            "signup": "https://serper.dev",
            "strengths": ["Fastest response times", "Product prices & specs", "Knowledge Graph", "Local business data"]
        },
        {
            "name": "Tavily",
            "emoji": "📖",
            "best_for": "Research, explanations, in-depth analysis",
            "free_tier": "1,000 queries/month",
            "signup": "https://tavily.com",
            "strengths": ["AI-synthesized answers", "Full page content", "Domain filtering", "Academic research"]
        },
        {
            "name": "Exa",
            "emoji": "🧠",
            "best_for": "Semantic search, finding similar content, discovery",
            "free_tier": "1,000 queries/month",
            "signup": "https://exa.ai",
            "strengths": ["Neural/semantic understanding", "Similar page discovery", "Startup/company finder", "Date filtering"]
        },
        {
            "name": "You.com",
            "emoji": "🤖",
            "best_for": "RAG applications, real-time info, LLM-ready snippets",
            "free_tier": "Limited free tier",
            "signup": "https://api.you.com",
            "strengths": ["LLM-ready snippets", "Combined web + news", "Live page crawling", "Real-time information"]
        },
        {
            "name": "SearXNG",
            "emoji": "🔒",
            "best_for": "Privacy-first search, multi-source aggregation, $0 API cost",
            "free_tier": "FREE (self-hosted)",
            "signup": "https://docs.searxng.org/admin/installation.html",
            "strengths": ["Privacy-preserving (no tracking)", "70+ search engines", "Self-hosted = $0 API cost", "Diverse results"]
        }
    ]

    for p in providers:
        print(f" {p['emoji']} {color(p['name'], Colors.BOLD)}")
        print(f" Best for: {color(p['best_for'], Colors.GREEN)}")
        print(f" Free tier: {p['free_tier']}")
        print(f" Sign up: {color(p['signup'], Colors.BLUE)}")
        print()

def ask_yes_no(prompt: str, default: bool = True) -> bool:
    """Ask a yes/no question."""
    suffix = "[Y/n]" if default else "[y/N]"
    while True:
        response = input(f"{prompt} {color(suffix, Colors.DIM)}: ").strip().lower()
        if response == "":
            return default
        if response in ("y", "yes"):
            return True
        if response in ("n", "no"):
            return False
        print(color(" Please enter 'y' or 'n'", Colors.YELLOW))

def ask_choice(prompt: str, options: list, default: str = None) -> str:
    """Ask user to choose from a list of options."""
    print(f"\n{prompt}")
    for i, opt in enumerate(options, 1):
        marker = color("→", Colors.GREEN) if opt == default else " "
        print(f" {marker} {i}. {opt}")

    while True:
        hint = f" [default: {default}]" if default else ""
        response = input(f"Enter number (1-{len(options)}){color(hint, Colors.DIM)}: ").strip()

        if response == "" and default:
            return default

        try:
            idx = int(response)
            if 1 <= idx <= len(options):
                return options[idx - 1]
        except ValueError:
            pass

        print(color(f" Please enter a number between 1 and {len(options)}", Colors.YELLOW))

def ask_api_key(provider: str, signup_url: str) -> str:
    """Ask for an API key with validation."""
    print()
    print(f" {color(f'Get your {provider} API key:', Colors.DIM)} {color(signup_url, Colors.BLUE)}")

    while True:
        key = input(f" Enter your {provider} API key: ").strip()

        if not key:
            print(color(" ⚠️ No key entered. This provider will be disabled.", Colors.YELLOW))
            return None

        # Basic validation
        if len(key) < 10:
            print(color(" ⚠️ Key seems too short. Please check and try again.", Colors.YELLOW))
            continue

        # Mask key for confirmation
        masked = key[:4] + "..." + key[-4:] if len(key) > 12 else key[:2] + "..."
        print(color(f" ✓ Key saved: {masked}", Colors.GREEN))
        return key


def ask_searxng_instance(docs_url: str) -> str:
    """Ask for SearXNG instance URL with connection test."""
    print()
    print(f" {color('SearXNG is self-hosted. You need your own instance.', Colors.DIM)}")
    print(f" {color('Setup guide:', Colors.DIM)} {color(docs_url, Colors.BLUE)}")
    print()
    print(f" {color('Example URLs:', Colors.DIM)}")
    print(f" • http://localhost:8080 (local Docker, requires SEARXNG_ALLOW_PRIVATE=1)")
    print(f" • https://searx.your-domain.com (self-hosted)")
    print()

    while True:
        url = input(f" Enter your SearXNG instance URL: ").strip()

        if not url:
            print(color(" ⚠️ No URL entered. SearXNG will be disabled.", Colors.YELLOW))
            return None

        # Basic URL validation
        if not url.startswith(("http://", "https://")):
            print(color(" ⚠️ URL must start with http:// or https://", Colors.YELLOW))
            continue

        # SSRF protection: validate URL before connecting
        try:
            import ipaddress
            import socket
            from urllib.parse import urlparse as _urlparse
            _parsed = _urlparse(url)
            _hostname = _parsed.hostname or ""
            _blocked = {"169.254.169.254", "metadata.google.internal", "metadata.internal"}
            if _hostname in _blocked:
                print(color(f" ❌ Blocked: {_hostname} is a cloud metadata endpoint.", Colors.RED))
                continue
            if not os.environ.get("SEARXNG_ALLOW_PRIVATE", "").strip() == "1":
                _resolved = socket.getaddrinfo(_hostname, _parsed.port or 80, proto=socket.IPPROTO_TCP)
                for _fam, _t, _p, _cn, _sa in _resolved:
                    _ip = ipaddress.ip_address(_sa[0])
                    if _ip.is_loopback or _ip.is_private or _ip.is_link_local or _ip.is_reserved:
                        print(color(f" ❌ Blocked: {_hostname} resolves to private IP {_ip}.", Colors.RED))
                        print(color(f" Set SEARXNG_ALLOW_PRIVATE=1 if intentional.", Colors.DIM))
                        raise ValueError("private_ip")
        except ValueError as _ve:
            if str(_ve) == "private_ip":
                continue
            raise
        except socket.gaierror:
            print(color(f" ❌ Cannot resolve hostname: {_hostname}", Colors.RED))
            continue

        # Test connection
        print(color(f" Testing connection to {url}...", Colors.DIM))
        try:
            import urllib.request
            import urllib.error

            test_url = f"{url.rstrip('/')}/search?q=test&format=json"
            req = urllib.request.Request(
                test_url,
                headers={"User-Agent": "ClawdBot-WebSearchPlus/2.5", "Accept": "application/json"}
            )

            with urllib.request.urlopen(req, timeout=10) as response:
                data = response.read().decode("utf-8")
                import json
                result = json.loads(data)

                # Check if it looks like SearXNG JSON response
                if "results" in result or "query" in result:
                    print(color(f" ✓ Connection successful! SearXNG instance is working.", Colors.GREEN))
                    return url.rstrip("/")
                else:
                    print(color(f" ⚠️ Connected but response doesn't look like SearXNG JSON.", Colors.YELLOW))
                    if ask_yes_no(" Use this URL anyway?", default=False):
                        return url.rstrip("/")

        except urllib.error.HTTPError as e:
            if e.code == 403:
                print(color(f" ⚠️ JSON API is disabled (403 Forbidden).", Colors.YELLOW))
                print(color(f" Enable JSON in settings.yml: search.formats: [html, json]", Colors.DIM))
            else:
                print(color(f" ⚠️ HTTP error: {e.code} {e.reason}", Colors.YELLOW))

            if ask_yes_no(" Try a different URL?", default=True):
                continue
            return None

        except urllib.error.URLError as e:
            print(color(f" ⚠️ Cannot reach instance: {e.reason}", Colors.YELLOW))
            if ask_yes_no(" Try a different URL?", default=True):
                continue
            return None

        except Exception as e:
            print(color(f" ⚠️ Error: {e}", Colors.YELLOW))
            if ask_yes_no(" Try a different URL?", default=True):
                continue
            return None

def ask_result_count() -> int:
    """Ask for default result count."""
    options = ["3 (fast, minimal)", "5 (balanced - recommended)", "10 (comprehensive)"]
    choice = ask_choice("Default number of results per search?", options, "5 (balanced - recommended)")

    if "3" in choice:
        return 3
    elif "10" in choice:
        return 10
    return 5

def run_setup(skill_dir: Path, force_reset: bool = False):
    """Run the interactive setup wizard."""
    config_path = skill_dir / "config.json"
    example_path = skill_dir / "config.example.json"

    # Check if config already exists
    if config_path.exists() and not force_reset:
        print(color("✓ config.json already exists!", Colors.GREEN))
        print()
        if not ask_yes_no("Do you want to reconfigure?", default=False):
            print(color("Setup cancelled. Your existing config is unchanged.", Colors.DIM))
            return False
        print()

    print_header()
    print_provider_info()

    # Load example config as base
    if example_path.exists():
        with open(example_path) as f:
            config = json.load(f)
    else:
        config = {
            "defaults": {"provider": "serper", "max_results": 5},
            "auto_routing": {"enabled": True, "fallback_provider": "serper"},
            "serper": {},
            "tavily": {},
            "exa": {}
        }

    # Remove any existing API keys from example
    for provider in ["serper", "tavily", "exa", "you"]:
        if provider in config:
            config[provider].pop("api_key", None)

    enabled_providers = []

    # ===== Question 1: Which providers to enable =====
    print(color("─" * 60, Colors.DIM))
    print(color("\n📋 Step 1: Choose Your Providers\n", Colors.BOLD))
    print("Select which search providers you want to enable.")
    print(color("(You need at least one API key or a SearXNG instance to use this skill)", Colors.DIM))
    print()

    providers_info = {
        "serper": ("Serper", "https://serper.dev", "Google results, shopping, local"),
        "tavily": ("Tavily", "https://tavily.com", "Research, explanations, analysis"),
        "exa": ("Exa", "https://exa.ai", "Semantic search, similar content"),
        "you": ("You.com", "https://api.you.com", "RAG applications, real-time info"),
        "searxng": ("SearXNG", "https://docs.searxng.org/admin/installation.html", "Privacy-first, self-hosted, $0 cost")
    }

    for provider, (name, url, desc) in providers_info.items():
        print(f" {color(name, Colors.BOLD)}: {desc}")

        # Special handling for SearXNG
        if provider == "searxng":
            print(color(" Note: SearXNG requires a self-hosted instance (no API key needed)", Colors.DIM))
            if ask_yes_no(f" Do you have a SearXNG instance?", default=False):
                instance_url = ask_searxng_instance(url)
                if instance_url:
                    if "searxng" not in config:
                        config["searxng"] = {}
                    config["searxng"]["instance_url"] = instance_url
                    enabled_providers.append(provider)
                else:
                    print(color(f" → {name} disabled (no instance URL)", Colors.DIM))
            else:
                print(color(f" → {name} skipped (no instance)", Colors.DIM))
        else:
            if ask_yes_no(f" Enable {name}?", default=True):
                # ===== Question 2: API key for each enabled provider =====
                api_key = ask_api_key(name, url)
                if api_key:
                    config.setdefault(provider, {})["api_key"] = api_key
                    enabled_providers.append(provider)
                else:
                    print(color(f" → {name} disabled (no API key)", Colors.DIM))
            else:
                print(color(f" → {name} disabled", Colors.DIM))
        print()

    if not enabled_providers:
        print()
        print(color("⚠️ No providers enabled!", Colors.RED))
        print("You need at least one API key or a SearXNG instance to use web-search-plus.")
        print("Run this setup again when you have one.")
        return False

    # ===== Question 3: Default provider =====
    print(color("─" * 60, Colors.DIM))
    print(color("\n⚙️ Step 2: Default Settings\n", Colors.BOLD))

    if len(enabled_providers) > 1:
        default_provider = ask_choice(
            "Which provider should be the default for general queries?",
            enabled_providers,
            enabled_providers[0]
        )
    else:
        default_provider = enabled_providers[0]
        print(f"Default provider: {color(default_provider, Colors.GREEN)} (only one enabled)")

    config["defaults"]["provider"] = default_provider
    config["auto_routing"]["fallback_provider"] = default_provider

    # ===== Question 4: Auto-routing =====
    print()
    print(color("Auto-routing", Colors.BOLD) + " automatically picks the best provider for each query:")
    print(color(" • 'iPhone price' → Serper (shopping intent)", Colors.DIM))
    print(color(" • 'how does TCP work' → Tavily (research intent)", Colors.DIM))
    print(color(" • 'companies like Stripe' → Exa (discovery intent)", Colors.DIM))
    print()

    auto_routing = ask_yes_no("Enable auto-routing?", default=True)
    config["auto_routing"]["enabled"] = auto_routing

    if not auto_routing:
        print(color(f" → All queries will use {default_provider}", Colors.DIM))

    # ===== Question 5: Result count =====
    print()
    max_results = ask_result_count()
    config["defaults"]["max_results"] = max_results

    # Set disabled providers
    all_providers = ["serper", "tavily", "exa", "you", "searxng"]
    disabled = [p for p in all_providers if p not in enabled_providers]
    config["auto_routing"]["disabled_providers"] = disabled

    # ===== Save config =====
    print()
    print(color("─" * 60, Colors.DIM))
    print(color("\n💾 Saving Configuration\n", Colors.BOLD))

    with open(config_path, 'w') as f:
        json.dump(config, f, indent=2)

    print(color(f"✓ Configuration saved to: {config_path}", Colors.GREEN))
    print()

    # ===== Summary =====
    print(color("📋 Configuration Summary:", Colors.BOLD))
    print(f" Enabled providers: {', '.join(enabled_providers)}")
    print(f" Default provider: {default_provider}")
    print(f" Auto-routing: {'enabled' if auto_routing else 'disabled'}")
    print(f" Results per search: {max_results}")
    print()

    # ===== Test suggestion =====
    print(color("🚀 Ready to search! Try:", Colors.BOLD))
    print(color(f" python3 scripts/search.py -q \"your query here\"", Colors.CYAN))
    print()

    return True

def check_first_run(skill_dir: Path) -> bool:
    """Check if this is the first run (no config.json)."""
    config_path = skill_dir / "config.json"
    return not config_path.exists()

def main():
    # Determine skill directory
    script_path = Path(__file__).resolve()
    skill_dir = script_path.parent.parent

    # Check for --reset flag
    force_reset = "--reset" in sys.argv

    # Check for --check flag (just check if setup needed)
    if "--check" in sys.argv:
        if check_first_run(skill_dir):
            print("Setup required: config.json not found")
            sys.exit(1)
        else:
            print("Setup complete: config.json exists")
            sys.exit(0)

    # Run setup
    success = run_setup(skill_dir, force_reset)
    sys.exit(0 if success else 1)

if __name__ == "__main__":
    main()
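The SSRF guard in `ask_searxng_instance` is written inline inside the prompt loop, which makes it hard to exercise without a terminal or a network. A minimal sketch of the same checks as a standalone helper might look like the following. The names `classify_host` and `BLOCKED_HOSTS` are hypothetical and not part of this commit, and the helper takes already-resolved addresses as input (an assumption made here so the sketch needs no DNS lookup).

```python
import ipaddress

# Hypothetical constant: mirrors the metadata hosts blocked in ask_searxng_instance.
BLOCKED_HOSTS = {"169.254.169.254", "metadata.google.internal", "metadata.internal"}


def classify_host(hostname, resolved_ips, allow_private=False):
    """Return None if the host is acceptable, otherwise a short reason string.

    resolved_ips is an iterable of IP address strings, for example the
    sockaddr[0] values returned by socket.getaddrinfo().
    """
    # Metadata endpoints are rejected even when private addresses are allowed,
    # matching the order of checks in the wizard.
    if hostname in BLOCKED_HOSTS:
        return "metadata_endpoint"
    if allow_private:
        return None
    for raw in resolved_ips:
        ip = ipaddress.ip_address(raw)
        if ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_reserved:
            return f"private_ip:{ip}"
    return None


if __name__ == "__main__":
    print(classify_host("searx.example.org", ["93.184.216.34"]))
    print(classify_host("localhost", ["127.0.0.1"]))
    print(classify_host("localhost", ["127.0.0.1"], allow_private=True))
```

In the wizard, `allow_private` would correspond to `SEARXNG_ALLOW_PRIVATE=1` in the environment. Returning a reason instead of raising `ValueError("private_ip")` is a design choice of this sketch, not something the commit does.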
20
test-auto-routing.sh
Normal file
@@ -0,0 +1,20 @@
#!/bin/bash

# Test Auto-Routing Feature
# Tests various query types to verify routing works correctly

# Load from environment or .env file
if [ -f .env ]; then
    set -a; source .env; set +a  # export the loaded keys so python3 can see them
fi

# Check required keys
if [ -z "$SERPER_API_KEY" ]; then
    echo "Error: SERPER_API_KEY not set. Copy .env.example to .env and add your keys."
    exit 1
fi

echo "Testing auto-routing..."
python3 scripts/search.py -q "buy iPhone 15 price" --auto
python3 scripts/search.py -q "how does quantum computing work" --auto
python3 scripts/search.py -q "companies like Stripe" --auto