Files
robbyczgw-cla_web-search-plus/FAQ.md

9.1 KiB

Frequently Asked Questions

Caching (NEW in v2.7.0!)

How does caching work?

Search results are automatically cached locally for 1 hour (3600 seconds). When you make the same query again, you get instant results at $0 API cost. The cache key is based on: query text + provider + max_results.

Where are cached results stored?

In .cache/ directory inside the skill folder by default. Override with WSP_CACHE_DIR environment variable:

export WSP_CACHE_DIR="/path/to/custom/cache"

How do I see cache stats?

python3 scripts/search.py --cache-stats

This shows total entries, size, oldest/newest entries, and breakdown by provider.

How do I clear the cache?

python3 scripts/search.py --clear-cache

Can I change the cache TTL?

Yes! Default is 3600 seconds (1 hour). Set a custom TTL per request:

python3 scripts/search.py -q "query" --cache-ttl 7200  # 2 hours

How do I skip the cache?

Use --no-cache to always fetch fresh results:

python3 scripts/search.py -q "query" --no-cache

How do I know if a result was cached?

The response includes:

  • "cached": true/false — whether result came from cache
  • "cache_age_seconds": 1234 — how old the cached result is (when cached)

General

How does auto-routing decide which provider to use?

Multi-signal analysis scores each provider based on: price patterns, explanation phrases, similarity keywords, URLs, product+brand combos, and query complexity. Highest score wins. Use --explain-routing to see the decision breakdown.

What if it picks the wrong provider?

Override with -p serper/tavily/exa. Check --explain-routing to understand why it chose differently.

What does "low confidence" mean?

Query is ambiguous (e.g., "Tesla" could be cars, stock, or company). Falls back to Serper. Results may vary.

Can I disable a provider?

Yes! In config.json: "disabled_providers": ["exa"]


API Keys

Which API keys do I need?

At minimum ONE key (or SearXNG instance). You can use just Serper, just Tavily, just Exa, just You.com, or just SearXNG. Missing keys = that provider is skipped.

Where do I get API keys?

How do I set API keys?

Two options (both auto-load):

Option A: .env file

export SERPER_API_KEY="your-key"

Option B: config.json (v2.2.1+)

{ "serper": { "api_key": "your-key" } }

Routing Details

Check routing.provider in JSON output, or [🔍 Searched with: Provider] in chat responses.

Why does it sometimes choose Serper for research questions?

If the query has brand/product signals (e.g., "how does Tesla FSD work"), shopping intent may outweigh research intent. Override with -p tavily.

What's the confidence threshold?

Default: 0.3 (30%). Below this = low confidence, uses fallback. Adjustable in config.json.


You.com Specific

When should I use You.com over other providers?

You.com excels at:

  • RAG applications: Pre-extracted snippets ready for LLM consumption
  • Real-time information: Current events, breaking news, status updates
  • Combined sources: Web + news results in a single API call
  • Summarization tasks: "What's the latest on...", "Key points about..."

What's the livecrawl feature?

You.com can fetch full page content on-demand. Use --livecrawl web for web results, --livecrawl news for news articles, or --livecrawl all for both. Content is returned in Markdown format.

Does You.com include news automatically?

Yes! You.com's intelligent classification automatically includes relevant news results when your query has news intent. You can also use --include-news to explicitly enable it.


SearXNG Specific

Do I need my own SearXNG instance?

Yes! SearXNG is self-hosted. Most public instances disable the JSON API to prevent bot abuse. You need to run your own instance with JSON format enabled. See: https://docs.searxng.org/admin/installation.html

How do I set up SearXNG?

Docker is the easiest way:

docker run -d -p 8080:8080 searxng/searxng

Then enable JSON in settings.yml:

search:
  formats:
    - html
    - json

Why am I getting "403 Forbidden"?

The JSON API is disabled on your instance. Enable it in settings.yml under search.formats.

What's the API cost for SearXNG?

$0! SearXNG is free and open-source. You only pay for hosting (~$5/month VPS). Unlimited queries.

When should I use SearXNG?

  • Privacy-sensitive queries: No tracking, no profiling
  • Budget-conscious: $0 API cost
  • Diverse results: Aggregates 70+ search engines
  • Self-hosted requirements: Full control over your search infrastructure
  • Fallback provider: When paid APIs are rate-limited

Can I limit which search engines SearXNG uses?

Yes! Use --engines google,bing,duckduckgo to specify engines, or configure defaults in config.json.


Provider Selection

Which provider should I use?

Query Type Best Provider Why
Shopping ("buy laptop", "cheap shoes") Serper Google Shopping, price comparisons, local stores
Research ("how does X work?", "explain Y") Tavily Deep research, academic quality, full-page content
Startups/Papers ("companies like X", "arxiv papers") Exa Semantic/neural search, startup discovery
RAG/Real-time ("summarize latest", "current events") You.com LLM-ready snippets, combined web+news
Privacy ("search without tracking") SearXNG No tracking, multi-source, self-hosted

Tip: Enable auto-routing and let the skill choose automatically! 🎯

Do I need all 5 providers?

No! All providers are optional. You can use:

  • 1 provider (e.g., just Serper for everything)
  • 2-3 providers (e.g., Serper + You.com for most needs)
  • All 5 (maximum flexibility + fallback options)

How much do the APIs cost?

Provider Free Tier Paid Plan
Serper 2,500 queries/mo $50/mo (5,000 queries)
Tavily 1,000 queries/mo $150/mo (10,000 queries)
Exa 1,000 queries/mo $1,000/mo (100,000 queries)
You.com Limited free ~$10/mo (varies by usage)
SearXNG FREE Only VPS cost (~$5/mo if self-hosting)

Budget tip: Use SearXNG as primary + others as fallback for specialized queries!

How private is SearXNG really?

Setup Privacy Level
Self-hosted (your VPS) You control everything
Self-hosted (Docker local) Fully private
Public instance Depends on operator's logging policy

Best practice: Self-host if privacy is critical.

Which provider has the best results?

Metric Winner
Most accurate for facts Serper (Google)
Best for research depth Tavily
Best for semantic queries Exa
Best for RAG/AI context You.com
Most diverse sources SearXNG (70+ engines)
Most private SearXNG (self-hosted)

Recommendation: Enable multiple providers + auto-routing for best overall experience.

How does auto-routing work?

The skill analyzes your query for keywords and patterns:

"buy cheap laptop"      Serper (shopping signals)
"how does AI work?"     Tavily (research/explanation)
"companies like X"      Exa (semantic/similar)
"summarize latest news"  You.com (RAG/real-time)
"search privately"      SearXNG (privacy signals)

Confidence threshold: Only routes if confidence > 30%. Otherwise uses default provider.

Override: Use -p provider to force a specific provider.


Production Use

Can I use this in production?

Yes! Web-search-plus is production-ready:

  • Error handling with automatic fallback
  • Rate limit protection
  • Timeout handling (30s per provider)
  • API key security (.env + config.json gitignored)
  • 5 providers for redundancy

Tip: Monitor API usage to avoid exceeding free tiers!

What if I run out of API credits?

  1. Fallback chain: Other enabled providers automatically take over
  2. Use SearXNG: Switch to self-hosted (unlimited queries)
  3. Upgrade plan: Paid tiers have higher limits
  4. Rate limit: Use disabled_providers to skip exhausted APIs temporarily

Updates

How do I update to the latest version?

Via ClawHub (recommended):

clawhub update web-search-plus --registry "https://www.clawhub.ai" --no-input

Manually:

cd /path/to/workspace/skills/web-search-plus/
git pull origin main
python3 scripts/setup.py  # Re-run to configure new features

Where can I report bugs or request features?