Files

264 lines
9.1 KiB
Markdown
Raw Permalink Normal View History

# Frequently Asked Questions
## Caching (NEW in v2.7.0!)
### How does caching work?
Search results are automatically cached locally for 1 hour (3600 seconds). When you make the same query again, you get instant results at $0 API cost. The cache key is based on: query text + provider + max_results.
### Where are cached results stored?
In `.cache/` directory inside the skill folder by default. Override with `WSP_CACHE_DIR` environment variable:
```bash
export WSP_CACHE_DIR="/path/to/custom/cache"
```
### How do I see cache stats?
```bash
python3 scripts/search.py --cache-stats
```
This shows total entries, size, oldest/newest entries, and breakdown by provider.
### How do I clear the cache?
```bash
python3 scripts/search.py --clear-cache
```
### Can I change the cache TTL?
Yes! Default is 3600 seconds (1 hour). Set a custom TTL per request:
```bash
python3 scripts/search.py -q "query" --cache-ttl 7200 # 2 hours
```
### How do I skip the cache?
Use `--no-cache` to always fetch fresh results:
```bash
python3 scripts/search.py -q "query" --no-cache
```
### How do I know if a result was cached?
The response includes:
- `"cached": true/false` — whether result came from cache
- `"cache_age_seconds": 1234` — how old the cached result is (when cached)
---
## General
### How does auto-routing decide which provider to use?
Multi-signal analysis scores each provider based on: price patterns, explanation phrases, similarity keywords, URLs, product+brand combos, and query complexity. Highest score wins. Use `--explain-routing` to see the decision breakdown.
### What if it picks the wrong provider?
Override with `-p serper/tavily/exa`. Check `--explain-routing` to understand why it chose differently.
### What does "low confidence" mean?
Query is ambiguous (e.g., "Tesla" could be cars, stock, or company). Falls back to Serper. Results may vary.
### Can I disable a provider?
Yes! In config.json: `"disabled_providers": ["exa"]`
---
## API Keys
### Which API keys do I need?
At minimum ONE key (or SearXNG instance). You can use just Serper, just Tavily, just Exa, just You.com, or just SearXNG. Missing keys = that provider is skipped.
### Where do I get API keys?
- Serper: https://serper.dev (2,500 free queries, no credit card)
- Tavily: https://tavily.com (1,000 free searches/month)
- Exa: https://exa.ai (1,000 free searches/month)
- You.com: https://api.you.com (Limited free tier for testing)
- SearXNG: Self-hosted, no key needed! https://docs.searxng.org/admin/installation.html
### How do I set API keys?
Two options (both auto-load):
**Option A: .env file**
```bash
export SERPER_API_KEY="your-key"
```
**Option B: config.json** (v2.2.1+)
```json
{ "serper": { "api_key": "your-key" } }
```
---
## Routing Details
### How do I know which provider handled my search?
Check `routing.provider` in JSON output, or `[🔍 Searched with: Provider]` in chat responses.
### Why does it sometimes choose Serper for research questions?
If the query has brand/product signals (e.g., "how does Tesla FSD work"), shopping intent may outweigh research intent. Override with `-p tavily`.
### What's the confidence threshold?
Default: 0.3 (30%). Below this = low confidence, uses fallback. Adjustable in config.json.
---
## You.com Specific
### When should I use You.com over other providers?
You.com excels at:
- **RAG applications**: Pre-extracted snippets ready for LLM consumption
- **Real-time information**: Current events, breaking news, status updates
- **Combined sources**: Web + news results in a single API call
- **Summarization tasks**: "What's the latest on...", "Key points about..."
### What's the livecrawl feature?
You.com can fetch full page content on-demand. Use `--livecrawl web` for web results, `--livecrawl news` for news articles, or `--livecrawl all` for both. Content is returned in Markdown format.
### Does You.com include news automatically?
Yes! You.com's intelligent classification automatically includes relevant news results when your query has news intent. You can also use `--include-news` to explicitly enable it.
---
## SearXNG Specific
### Do I need my own SearXNG instance?
Yes! SearXNG is self-hosted. Most public instances disable the JSON API to prevent bot abuse. You need to run your own instance with JSON format enabled. See: https://docs.searxng.org/admin/installation.html
### How do I set up SearXNG?
Docker is the easiest way:
```bash
docker run -d -p 8080:8080 searxng/searxng
```
Then enable JSON in `settings.yml`:
```yaml
search:
formats:
- html
- json
```
### Why am I getting "403 Forbidden"?
The JSON API is disabled on your instance. Enable it in `settings.yml` under `search.formats`.
### What's the API cost for SearXNG?
**$0!** SearXNG is free and open-source. You only pay for hosting (~$5/month VPS). Unlimited queries.
### When should I use SearXNG?
- **Privacy-sensitive queries**: No tracking, no profiling
- **Budget-conscious**: $0 API cost
- **Diverse results**: Aggregates 70+ search engines
- **Self-hosted requirements**: Full control over your search infrastructure
- **Fallback provider**: When paid APIs are rate-limited
### Can I limit which search engines SearXNG uses?
Yes! Use `--engines google,bing,duckduckgo` to specify engines, or configure defaults in `config.json`.
---
## Provider Selection
### Which provider should I use?
| Query Type | Best Provider | Why |
|------------|---------------|-----|
| **Shopping** ("buy laptop", "cheap shoes") | **Serper** | Google Shopping, price comparisons, local stores |
| **Research** ("how does X work?", "explain Y") | **Tavily** | Deep research, academic quality, full-page content |
| **Startups/Papers** ("companies like X", "arxiv papers") | **Exa** | Semantic/neural search, startup discovery |
| **RAG/Real-time** ("summarize latest", "current events") | **You.com** | LLM-ready snippets, combined web+news |
| **Privacy** ("search without tracking") | **SearXNG** | No tracking, multi-source, self-hosted |
**Tip:** Enable auto-routing and let the skill choose automatically! 🎯
### Do I need all 5 providers?
**No!** All providers are optional. You can use:
- **1 provider** (e.g., just Serper for everything)
- **2-3 providers** (e.g., Serper + You.com for most needs)
- **All 5** (maximum flexibility + fallback options)
### How much do the APIs cost?
| Provider | Free Tier | Paid Plan |
|----------|-----------|-----------|
| **Serper** | 2,500 queries/mo | $50/mo (5,000 queries) |
| **Tavily** | 1,000 queries/mo | $150/mo (10,000 queries) |
| **Exa** | 1,000 queries/mo | $1,000/mo (100,000 queries) |
| **You.com** | Limited free | ~$10/mo (varies by usage) |
| **SearXNG** | **FREE** ✅ | Only VPS cost (~$5/mo if self-hosting) |
**Budget tip:** Use SearXNG as primary + others as fallback for specialized queries!
### How private is SearXNG really?
| Setup | Privacy Level |
|-------|---------------|
| **Self-hosted (your VPS)** | ⭐⭐⭐⭐⭐ You control everything |
| **Self-hosted (Docker local)** | ⭐⭐⭐⭐⭐ Fully private |
| **Public instance** | ⭐⭐⭐ Depends on operator's logging policy |
**Best practice:** Self-host if privacy is critical.
### Which provider has the best results?
| Metric | Winner |
|--------|--------|
| **Most accurate for facts** | Serper (Google) |
| **Best for research depth** | Tavily |
| **Best for semantic queries** | Exa |
| **Best for RAG/AI context** | You.com |
| **Most diverse sources** | SearXNG (70+ engines) |
| **Most private** | SearXNG (self-hosted) |
**Recommendation:** Enable multiple providers + auto-routing for best overall experience.
### How does auto-routing work?
The skill analyzes your query for keywords and patterns:
```python
"buy cheap laptop" → Serper (shopping signals)
"how does AI work?" → Tavily (research/explanation)
"companies like X" → Exa (semantic/similar)
"summarize latest news" → You.com (RAG/real-time)
"search privately" → SearXNG (privacy signals)
```
**Confidence threshold:** Only routes if confidence > 30%. Otherwise uses default provider.
**Override:** Use `-p provider` to force a specific provider.
---
## Production Use
### Can I use this in production?
**Yes!** Web-search-plus is production-ready:
- ✅ Error handling with automatic fallback
- ✅ Rate limit protection
- ✅ Timeout handling (30s per provider)
- ✅ API key security (.env + config.json gitignored)
- ✅ 5 providers for redundancy
**Tip:** Monitor API usage to avoid exceeding free tiers!
### What if I run out of API credits?
1. **Fallback chain:** Other enabled providers automatically take over
2. **Use SearXNG:** Switch to self-hosted (unlimited queries)
3. **Upgrade plan:** Paid tiers have higher limits
4. **Rate limit:** Use `disabled_providers` to skip exhausted APIs temporarily
---
## Updates
### How do I update to the latest version?
**Via ClawHub (recommended):**
```bash
clawhub update web-search-plus --registry "https://www.clawhub.ai" --no-input
```
**Manually:**
```bash
cd /path/to/workspace/skills/web-search-plus/
git pull origin main
python3 scripts/setup.py # Re-run to configure new features
```
### Where can I report bugs or request features?
- **GitHub Issues:** https://github.com/robbyczgw-cla/web-search-plus/issues
- **ClawHub:** https://www.clawhub.ai/skills/web-search-plus