skills/liranudi_web-pilot

Fork 0

Files

zlei9 84d54a6ee9 Initial commit with translated description

2026-03-29 10:22:50 +08:00

8.6 KiB

Raw Blame History

🌐 Web Pilot — OpenClaw Skill

A web search, page reading, and browser automation skill for OpenClaw. No API keys required.

♿ Accessibility

This skill enables AI agents to read, navigate, and interact with the web on behalf of users — making it a powerful accessibility tool for people with visual impairments, motor disabilities, or cognitive challenges.

Screen reading on steroids — extracts clean, structured text from any webpage, stripping away visual clutter, ads, and navigation noise
Voice-driven browsing — when paired with an AI assistant, users can browse the web entirely through natural language ("scroll down", "click Sign In", "read me the Overview section")
Targeted content extraction — grab specific sections, search for text, or screenshot regions without needing to visually scan a page
Form interaction — fill inputs and submit forms via commands, removing the need for precise mouse/keyboard control
Cookie banner removal — automatically dismisses consent popups that are notoriously difficult for screen readers

Features

Web Search — Multi-engine (DuckDuckGo, Brave, Google) with pagination
Page Reader — Extract clean text from any URL with JS rendering
Persistent Browser — Visible or headless browser with 20+ actions
Cookie Auto-Dismiss — Automatically clears cookie consent banners
File Download — Download files with auto-detection, PDF text extraction
Output Formats — JSON, markdown, or plain text
Zero API Keys — Everything runs locally
Partial Screenshots — Capture viewport, full page, single elements, or ranges between two elements

Requirements

Python 3.8+
pip install requests beautifulsoup4 playwright Pillow
playwright install chromium
Optional: pip install pdfplumber for PDF text extraction

Installation

As an OpenClaw Skill

cp -r web-pilot/ $(dirname $(which openclaw))/../lib/node_modules/openclaw/skills/web-pilot

Standalone

git clone https://github.com/LiranUdi/web-pilot.git
cd web-pilot
pip install requests beautifulsoup4 playwright Pillow
playwright install chromium

Usage

1. Search the Web

python3 scripts/google_search.py "search term" --pages 3 --engine brave

Flag	Description	Default
`--pages N`	Result pages (~10 results each)	1
`--engine`	`duckduckgo`, `brave`, or `google`	duckduckgo

Engine notes:

duckduckgo — Most reliable, no CAPTCHA
brave — More results per page, broader sources
google — Often blocked by CAPTCHA; last resort

2. Read a Page

python3 scripts/read_page.py "https://example.com" --max-chars 10000 --format markdown

Flag	Description	Default
`--max-chars N`	Max characters to extract	50000
`--visible`	Show browser window	off
`--format`	`json`, `markdown`, or `text`	json
`--no-dismiss`	Skip cookie consent auto-dismiss	off

3. Persistent Browser Session

The browser session is a long-running process that stays open between commands, enabling stateful multi-step browsing.

# Open a page (flags: --headless, --proxy <url>, --user-agent <string>)
python3 scripts/browser_session.py open "https://example.com"
python3 scripts/browser_session.py open "https://example.com" --headless --user-agent "MyBot/1.0"

# Check current state
python3 scripts/browser_session.py status

# Navigate (returns response status, final URL, load time)
python3 scripts/browser_session.py navigate "https://other-site.com"

# Extract content in different formats
python3 scripts/browser_session.py extract --format markdown

# Scroll
python3 scripts/browser_session.py scroll down
python3 scripts/browser_session.py scroll up
python3 scripts/browser_session.py scroll "#section-id"   # scroll to element

# Wait
python3 scripts/browser_session.py wait 2                  # wait 2 seconds
python3 scripts/browser_session.py wait ".loading-done"    # wait for element

# Fill forms
python3 scripts/browser_session.py fill "input[name=q]" "search term"
python3 scripts/browser_session.py fill "input[name=q]" "search term" --submit

# Navigation history
python3 scripts/browser_session.py back
python3 scripts/browser_session.py forward
python3 scripts/browser_session.py reload

# Execute JavaScript
python3 scripts/browser_session.py eval "document.title"

# Extract all links
python3 scripts/browser_session.py links

# Screenshots
python3 scripts/browser_session.py screenshot /tmp/page.png              # viewport
python3 scripts/browser_session.py screenshot /tmp/full.png --full       # full page
python3 scripts/browser_session.py screenshot /tmp/el.png --element "h1" # single element
python3 scripts/browser_session.py screenshot /tmp/range.png --from "#Overview" --to "#end"  # range

# Export page as PDF (headless only)
python3 scripts/browser_session.py pdf /tmp/page.pdf

# Click elements
python3 scripts/browser_session.py click "Sign In"
python3 scripts/browser_session.py click "#submit-btn"

# Search for text in the page
python3 scripts/browser_session.py search "pricing"

# Tab management
python3 scripts/browser_session.py tab new "https://docs.example.com"
python3 scripts/browser_session.py tab list
python3 scripts/browser_session.py tab switch 0
python3 scripts/browser_session.py tab close 1

# Dismiss cookie banners
python3 scripts/browser_session.py dismiss-cookies

# Close
python3 scripts/browser_session.py close

4. Download Files

python3 scripts/download_file.py "https://example.com/report.pdf" --output ~/docs

Flag	Description	Default
`--output DIR`	Save directory	/tmp/downloads
`--filename`	Override filename	auto-detected

For PDFs, returns extracted_text if pdfplumber or PyPDF2 is installed.

Architecture

Search — HTTP requests to DuckDuckGo/Brave/Google HTML endpoints
Page reading — Playwright + Chromium with read-only DOM TreeWalker
Browser sessions — Unix socket server with 4-byte length-prefix framing; forked child keeps browser alive, commands return immediately
Screenshots — Range mode uses full-page capture + PIL crop for pixel-perfect section captures
Cookie dismiss — Tries common selectors and button text patterns (Accept All, Got It, etc.)
Downloads — Streams to disk with auto filename detection from headers/URL

Browser Session Reference

Action	Description
`open <url>`	Launch browser (flags: `--headless`, `--proxy`, `--user-agent`)
`navigate <url>`	Go to URL (returns status code, final URL, load time)
`extract`	Extract page content (`--format json\|markdown\|text`)
`screenshot <path>`	Capture (`--full`, `--element <sel>`, `--from <sel> --to <sel>`)
`click <target>`	Click by CSS selector, text, or button/link role
`scroll <dir\|sel>`	Scroll down/up or to a CSS selector
`wait <sec\|sel>`	Wait seconds or for element to appear
`fill <sel> <val>`	Fill input field (optional `--submit`)
`back` / `forward` / `reload`	Navigation history
`eval <js>`	Execute JavaScript, return result
`links`	Extract all links (href + text)
`search <text>`	Find text in page content
`pdf <path>`	Export as PDF (headless only)
`status`	Current URL, title, tab count
`tab new\|list\|switch\|close`	Multi-tab management
`dismiss-cookies`	Clear cookie consent banners
`close`	Shut down browser

For AI Agents (OpenClaw / LLM Integration)

Workflow Pattern

Search → get URLs
Read or Open → extract content
Scroll/Click/Navigate/Tab → interact as needed
Search → find specific info in page
Screenshot → capture visual state (viewport, element, or range)
Download → grab linked files
Close → clean up

Important Notes

All output defaults to JSON to stdout; use --format for alternatives
browser_session.py is stateful — one session at a time, persists between commands
read_page.py is stateless — opens/closes browser each call
Cookie consent is auto-dismissed on open/navigate
Always close browser sessions when done
Pillow is required for range screenshots (--from/--to)

Support

If this project is useful to you, consider buying me a coffee ☕

License

MIT

8.6 KiB Raw Blame History