🌐 Web Pilot — OpenClaw Skill

A web search, page reading, and browser automation skill for OpenClaw. No API keys required.

Accessibility

This skill enables AI agents to read, navigate, and interact with the web on behalf of users — making it a powerful accessibility tool for people with visual impairments, motor disabilities, or cognitive challenges.

  • Screen reading on steroids — extracts clean, structured text from any webpage, stripping away visual clutter, ads, and navigation noise
  • Voice-driven browsing — when paired with an AI assistant, users can browse the web entirely through natural language ("scroll down", "click Sign In", "read me the Overview section")
  • Targeted content extraction — grab specific sections, search for text, or screenshot regions without needing to visually scan a page
  • Form interaction — fill inputs and submit forms via commands, removing the need for precise mouse/keyboard control
  • Cookie banner removal — automatically dismisses consent popups that are notoriously difficult for screen readers

Features

  • Web Search — Multi-engine (DuckDuckGo, Brave, Google) with pagination
  • Page Reader — Extract clean text from any URL with JS rendering
  • Persistent Browser — Visible or headless browser with 20+ actions
  • Cookie Auto-Dismiss — Automatically clears cookie consent banners
  • File Download — Download files with auto-detection, PDF text extraction
  • Output Formats — JSON, markdown, or plain text
  • Zero API Keys — Everything runs locally
  • Partial Screenshots — Capture viewport, full page, single elements, or ranges between two elements

Requirements

  • Python 3.8+
  • pip install requests beautifulsoup4 playwright Pillow
  • playwright install chromium
  • Optional: pip install pdfplumber for PDF text extraction

Installation

As an OpenClaw Skill

cp -r web-pilot/ $(dirname $(which openclaw))/../lib/node_modules/openclaw/skills/web-pilot

Standalone

git clone https://github.com/LiranUdi/web-pilot.git
cd web-pilot
pip install requests beautifulsoup4 playwright Pillow
playwright install chromium

Usage

1. Search the Web

python3 scripts/google_search.py "search term" --pages 3 --engine brave

Flag        Description                       Default
--pages N   Result pages (~10 results each)   1
--engine    duckduckgo, brave, or google      duckduckgo

Engine notes:

  • duckduckgo — Most reliable, no CAPTCHA
  • brave — More results per page, broader sources
  • google — Often blocked by CAPTCHA; last resort

2. Read a Page

python3 scripts/read_page.py "https://example.com" --max-chars 10000 --format markdown

Flag            Description                        Default
--max-chars N   Max characters to extract          50000
--visible       Show browser window                off
--format        json, markdown, or text            json
--no-dismiss    Skip cookie consent auto-dismiss   off

3. Persistent Browser Session

The browser session is a long-running process that stays open between commands, enabling stateful multi-step browsing.

# Open a page (flags: --headless, --proxy <url>, --user-agent <string>)
python3 scripts/browser_session.py open "https://example.com"
python3 scripts/browser_session.py open "https://example.com" --headless --user-agent "MyBot/1.0"

# Check current state
python3 scripts/browser_session.py status

# Navigate (returns response status, final URL, load time)
python3 scripts/browser_session.py navigate "https://other-site.com"

# Extract content in different formats
python3 scripts/browser_session.py extract --format markdown

# Scroll
python3 scripts/browser_session.py scroll down
python3 scripts/browser_session.py scroll up
python3 scripts/browser_session.py scroll "#section-id"   # scroll to element

# Wait
python3 scripts/browser_session.py wait 2                  # wait 2 seconds
python3 scripts/browser_session.py wait ".loading-done"    # wait for element

# Fill forms
python3 scripts/browser_session.py fill "input[name=q]" "search term"
python3 scripts/browser_session.py fill "input[name=q]" "search term" --submit

# Navigation history
python3 scripts/browser_session.py back
python3 scripts/browser_session.py forward
python3 scripts/browser_session.py reload

# Execute JavaScript
python3 scripts/browser_session.py eval "document.title"

# Extract all links
python3 scripts/browser_session.py links

# Screenshots
python3 scripts/browser_session.py screenshot /tmp/page.png              # viewport
python3 scripts/browser_session.py screenshot /tmp/full.png --full       # full page
python3 scripts/browser_session.py screenshot /tmp/el.png --element "h1" # single element
python3 scripts/browser_session.py screenshot /tmp/range.png --from "#Overview" --to "#end"  # range

# Export page as PDF (headless only)
python3 scripts/browser_session.py pdf /tmp/page.pdf

# Click elements
python3 scripts/browser_session.py click "Sign In"
python3 scripts/browser_session.py click "#submit-btn"

# Search for text in the page
python3 scripts/browser_session.py search "pricing"

# Tab management
python3 scripts/browser_session.py tab new "https://docs.example.com"
python3 scripts/browser_session.py tab list
python3 scripts/browser_session.py tab switch 0
python3 scripts/browser_session.py tab close 1

# Dismiss cookie banners
python3 scripts/browser_session.py dismiss-cookies

# Close
python3 scripts/browser_session.py close
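Range screenshots (--from/--to) are produced by capturing the full page and cropping it with Pillow (see Architecture below). As a rough illustration, the crop rectangle could be derived from the two elements' bounding boxes like this; the function name and the box shape ({"x", "y", "width", "height"}, as returned by Playwright's bounding_box()) are illustrative assumptions, not the skill's actual API:

```python
def range_crop_box(from_box: dict, to_box: dict, page_width: int) -> tuple:
    """Compute a (left, top, right, bottom) crop rectangle spanning from the
    top of the first element to the bottom of the second, full page width.

    Boxes follow Playwright's bounding_box() shape: x, y, width, height.
    """
    top = min(from_box["y"], to_box["y"])
    bottom = max(from_box["y"] + from_box["height"],
                 to_box["y"] + to_box["height"])
    return (0, int(top), page_width, int(bottom))
```

The resulting tuple is exactly what PIL's Image.crop() accepts, so the full-page screenshot can be cropped to the section in one call.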

4. Download Files

python3 scripts/download_file.py "https://example.com/report.pdf" --output ~/docs

Flag           Description         Default
--output DIR   Save directory      /tmp/downloads
--filename     Override filename   auto-detected

For PDFs, the result also includes extracted_text when pdfplumber or PyPDF2 is installed.
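The "auto-detected" filename comes from the response headers or, failing that, the URL (see Architecture). A minimal sketch of that kind of detection; the function name and the exact fallback order are assumptions, not the skill's actual implementation:

```python
import re
from typing import Optional
from urllib.parse import urlsplit, unquote

def detect_filename(url: str, content_disposition: Optional[str] = None) -> str:
    """Pick a filename from a Content-Disposition header if one is present,
    otherwise fall back to the last path segment of the URL."""
    if content_disposition:
        match = re.search(r'filename="?([^";]+)"?', content_disposition)
        if match:
            return match.group(1)
    name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    return name or "download"
```

For example, detect_filename("https://example.com/report.pdf") yields "report.pdf", while a header like 'attachment; filename="data.csv"' takes precedence over the URL.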

Architecture

  • Search — HTTP requests to DuckDuckGo/Brave/Google HTML endpoints
  • Page reading — Playwright + Chromium with read-only DOM TreeWalker
  • Browser sessions — Unix socket server with 4-byte length-prefix framing; a forked child keeps the browser alive while commands return immediately
  • Screenshots — Range mode uses full-page capture + PIL crop for pixel-perfect section captures
  • Cookie dismiss — Tries common selectors and button text patterns (Accept All, Got It, etc.)
  • Downloads — Streams to disk with auto filename detection from headers/URL
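The length-prefix framing used for browser sessions can be sketched as below. The exact wire format (byte order, JSON payloads) is an assumption for illustration, not taken from the skill's source:

```python
import json
import socket
import struct

def send_msg(sock: socket.socket, obj: dict) -> None:
    """Serialize obj as JSON and send it with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_msg(sock: socket.socket) -> dict:
    """Read the 4-byte length prefix, then exactly that many payload bytes."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Loop until n bytes have arrived; a single recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf
```

The prefix lets the server know exactly where each command ends on the stream, so the CLI can connect, send one command, read one reply, and exit while the forked child keeps the browser running.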

Browser Session Reference

Action                      Description
open <url>                  Launch browser (flags: --headless, --proxy, --user-agent)
navigate <url>              Go to URL (returns status code, final URL, load time)
extract                     Extract page content (--format json|markdown|text)
screenshot <path>           Capture (--full, --element <sel>, --from <sel> --to <sel>)
click <target>              Click by CSS selector, text, or button/link role
scroll <dir|sel>            Scroll down/up or to a CSS selector
wait <sec|sel>              Wait seconds or for element to appear
fill <sel> <val>            Fill input field (optional --submit)
back / forward / reload     Navigation history
eval <js>                   Execute JavaScript, return result
links                       Extract all links (href + text)
search <text>               Find text in page content
pdf <path>                  Export as PDF (headless only)
status                      Current URL, title, tab count
tab new|list|switch|close   Multi-tab management
dismiss-cookies             Clear cookie consent banners
close                       Shut down browser
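The dismiss-cookies action works by matching button labels against common consent phrases (see Architecture). A heuristic sketch of that kind of matching; the pattern list and function are illustrative, not the skill's actual selectors:

```python
# Illustrative label patterns; the skill's real list is not documented here.
COMMON_CONSENT_LABELS = (
    "accept all", "accept cookies", "got it", "i agree", "allow all",
)

def looks_like_consent_button(label: str) -> bool:
    """Heuristically decide whether a button label is a cookie-consent
    accept button, after collapsing whitespace and lowercasing."""
    normalized = " ".join(label.lower().split())
    return any(pattern in normalized for pattern in COMMON_CONSENT_LABELS)
```

In practice such text matching is combined with known CSS selectors for popular consent frameworks, since substring checks alone can misfire.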

For AI Agents (OpenClaw / LLM Integration)

Workflow Pattern

  1. Search → get URLs
  2. Read or Open → extract content
  3. Scroll/Click/Navigate/Tab → interact as needed
  4. Search → find specific info in page
  5. Screenshot → capture visual state (viewport, element, or range)
  6. Download → grab linked files
  7. Close → clean up

Important Notes

  • All output defaults to JSON to stdout; use --format for alternatives
  • browser_session.py is stateful — one session at a time, persists between commands
  • read_page.py is stateless — opens/closes browser each call
  • Cookie consent is auto-dismissed on open/navigate
  • Always close browser sessions when done
  • Pillow is required for range screenshots (--from/--to)
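Since every script emits JSON on stdout by default, an agent wrapper will typically parse that output defensively. A minimal sketch; the helper name and the failure behavior are assumptions, and the real output schema may differ per script:

```python
import json

def parse_tool_output(stdout: str) -> dict:
    """Parse a script's JSON stdout, raising a clear error on malformed output
    (e.g. a traceback or an HTML error page) instead of a bare JSONDecodeError."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool did not return valid JSON: {exc}") from exc
```

An agent would run, say, read_page.py via subprocess, pass its stdout through this helper, and fall back to --format text only when JSON parsing fails.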

Support

If this project is useful to you, consider buying me a coffee on Ko-fi.

License

MIT
