commit 48cba2e57444f343c15c5f12ddb22ac29341a548 Author: zlei9 Date: Sun Mar 29 13:17:20 2026 +0800 Initial commit with translated description diff --git a/EXAMPLES.md b/EXAMPLES.md new file mode 100644 index 0000000..1c466ec --- /dev/null +++ b/EXAMPLES.md @@ -0,0 +1,109 @@ +# Browser Automation Examples + +Common browser automation workflows using the `browse` CLI. Each example demonstrates a distinct pattern using real commands. + +## Example 1: Extract Data from a Page + +**User request**: "Get the product details from example.com/product/123" + +```bash +browse open https://example.com/product/123 +browse snapshot # read page structure + element refs +browse get text "body" # extract all visible text content +browse stop +``` + +Parse the text output to extract structured data (name, price, description, etc.). + +For a specific section, use a CSS selector: + +```bash +browse get text ".product-details" # text from a specific container +``` + +**Note**: `browse get text` requires a CSS selector; use `"body"` for all page text. + +## Example 2: Fill and Submit a Form + +**User request**: "Fill out the contact form on example.com with my information" + +```bash +browse open https://example.com/contact +browse snapshot # find form fields and their refs +browse click @0-3 # click the Name input (ref from snapshot) +browse type "John Doe" +browse press Tab # move to next field +browse type "john@example.com" +browse fill "#message" "I would like to inquire about your services" --no-press-enter # fill without pressing Enter, so the form is not submitted early +browse snapshot # verify fields are filled +browse click @0-8 # click Submit button (ref from snapshot) +browse snapshot # confirm submission result +browse stop +``` + +**Key pattern**: Use `browse snapshot` before interacting to discover element refs, then `browse click ` and `browse type` to interact. 
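The "verify fields are filled" step in Example 2 can also be checked from a script rather than by reading a snapshot. The following is a minimal sketch, not part of the `browse` CLI: `field_contains` is a hypothetical helper name, and it assumes that `browse get value` prints the field's value somewhere in its standard output. The exact output format is not specified here, so the check is a plain substring match.

```bash
# Hypothetical helper (not a browse subcommand): succeed when the output of
# `browse get value <selector>` contains the expected text.
# Assumption: the field value appears verbatim in the command's stdout.
field_contains() {
  local selector="$1"
  local expected="$2"
  local actual
  # Fail early if the CLI call itself fails (for example, no active page).
  actual="$(browse get value "$selector")" || return 1
  # -F: fixed-string match, -q: quiet, --: allow patterns starting with "-".
  printf '%s\n' "$actual" | grep -qF -- "$expected"
}
```

For instance, `field_contains "#message" "inquire about your services" || echo "message field not filled"` could run just before the Submit click.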
+ +## Example 3: Multi-Step Navigation + +**User request**: "Get headlines from the first 3 pages of results on example.com/news" + +```bash +browse open https://example.com/news +browse snapshot # read page 1 content +browse get text ".headline" # extract headlines + +browse snapshot # find "Next" button ref +browse click @0-12 # click Next (ref from snapshot) +browse wait load # wait for page 2 to load +browse get text ".headline" # extract page 2 headlines + +browse snapshot # find Next again (ref may change) +browse click @0-15 # click Next +browse wait load +browse get text ".headline" # extract page 3 headlines + +browse stop +``` + +**Key pattern**: Re-run `browse snapshot` after each navigation because element refs change when the page updates. + +## Example 4: Escalate to Remote Mode + +**User request**: "Scrape pricing from competitor.com" (a site with Cloudflare protection) + +```bash +# Attempt 1: local mode +browse open https://competitor.com/pricing +browse snapshot +# Output shows: "Checking your browser..." (Cloudflare interstitial) +# or: page content is empty / access denied +browse stop +``` + +The agent detects bot protection and tells the user: + +> This site has Cloudflare bot detection. Browserbase remote mode can bypass this with anti-bot stealth and residential proxies. Want me to set it up? + +If the user agrees: + +```bash +# Set Browserbase credentials +export BROWSERBASE_API_KEY="bb_live_..." +export BROWSERBASE_PROJECT_ID="proj_..." 
+ +# Retry in remote mode +browse env remote +browse open https://competitor.com/pricing +browse snapshot # full page content now accessible +browse get text ".pricing-table" +browse stop +``` + +## Tips + +- **Snapshot first**: Always run `browse snapshot` before interacting — it gives you the accessibility tree with element refs +- **Use refs to click**: `browse click @0-5` is more reliable than trying to describe elements +- **Re-snapshot after actions**: Element refs change when the page updates +- **`get text` for data extraction**: Use `browse get text [selector]` to pull text content from specific elements +- **`stop` when done**: Always `browse stop` to clean up the browser session +- **Prefer snapshot over screenshot**: Snapshot is fast and structured; screenshot is slow and uses vision tokens. Only screenshot when you need visual context (layout, images, debugging) diff --git a/LICENSE.txt b/LICENSE.txt new file mode 100644 index 0000000..f2f4397 --- /dev/null +++ b/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2026 Browserbase, Inc. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/REFERENCE.md b/REFERENCE.md new file mode 100644 index 0000000..d77a4a8 --- /dev/null +++ b/REFERENCE.md @@ -0,0 +1,432 @@ +# Browser Automation CLI Reference + +Technical reference for the `browse` CLI tool. + +## Table of Contents + +- [Architecture](#architecture) +- [Command Reference](#command-reference) + - [Navigation](#navigation) + - [Page State](#page-state) + - [Interaction](#interaction) + - [Session Management](#session-management) + - [JavaScript Evaluation](#javascript-evaluation) + - [Viewport](#viewport) + - [Network Capture](#network-capture) +- [Configuration](#configuration) + - [Global Flags](#global-flags) + - [Environment Variables](#environment-variables) +- [Error Messages](#error-messages) + +## Architecture + +The browse CLI is a **daemon-based** command-line tool: + +- **Daemon process**: A background process manages the browser instance. Auto-starts on the first command (e.g., `browse open`), persists across commands, and stops with `browse stop`. +- **Local mode** (default): Launches a local Chrome/Chromium instance. +- **Remote mode** (Browserbase): Connects to a Browserbase cloud browser session when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set. +- **Accessibility-first**: Use `browse snapshot` to get the page's accessibility tree with element refs, then interact using those refs. + +## Command Reference + +### Navigation + +#### `open ` + +Navigate to a URL. Alias: `goto`. Auto-starts the daemon if not running. 
+ +```bash +browse open https://example.com +browse open https://example.com --wait networkidle # wait for all network requests to finish (useful for SPAs) +browse open https://example.com --wait domcontentloaded +``` + +The `--wait` flag controls when navigation is considered complete. Values: `load` (default), `domcontentloaded`, `networkidle`. Use `networkidle` for JavaScript-heavy pages that fetch data after initial load. + +#### `reload` + +Reload the current page. + +```bash +browse reload +``` + +#### `back` / `forward` + +Navigate browser history. + +```bash +browse back +browse forward +``` + +--- + +### Page State + +#### `snapshot` + +Get the accessibility tree with interactive element refs. This is the primary way to understand page structure. + +```bash +browse snapshot +browse snapshot --compact # tree only, no ref maps +``` + +Returns a text representation of the page with refs like `@0-5` that can be passed to `click`. Use `--compact` for shorter output when you only need the tree. + +#### `screenshot [path]` + +Take a visual screenshot. Slower than snapshot and uses vision tokens. + +```bash +browse screenshot # auto-generated path +browse screenshot ./capture.png # custom path +browse screenshot --full-page # capture entire scrollable page +``` + +#### `get [selector]` + +Get page properties. Available properties: `url`, `title`, `text`, `html`, `value`, `box`, `visible`, `checked`. 
+ +```bash +browse get url # current URL +browse get title # page title +browse get text "body" # all visible text (selector required) +browse get text ".product-info" # text within a CSS selector +browse get html "#main" # inner HTML of an element +browse get value "#email-input" # value of a form field +browse get box "#header" # bounding box (centroid coordinates) +browse get visible ".modal" # check if element is visible +browse get checked "#agree" # check if checkbox/radio is checked +``` + +**Note**: `get text` requires a CSS selector argument — use `"body"` for full page text. + +#### `refs` + +Show the cached ref map from the last `browse snapshot`. Useful for looking up element refs without re-running a full snapshot. + +```bash +browse refs +``` + +--- + +### Interaction + +#### `click ` + +Click an element by its ref from `browse snapshot` output. + +```bash +browse click @0-5 # click element with ref 0-5 +``` + +#### `click_xy ` + +Click at exact viewport coordinates. + +```bash +browse click_xy 500 300 +``` + +#### `hover ` + +Hover at viewport coordinates. + +```bash +browse hover 500 300 +``` + +#### `type ` + +Type text into the currently focused element. + +```bash +browse type "Hello, world!" +browse type "slow typing" --delay 100 # 100ms between keystrokes +browse type "human-like" --mistakes # simulate human typing with typos +``` + +#### `fill ` + +Fill an input element matching a CSS selector and press Enter. + +```bash +browse fill "#search" "browser automation" +browse fill "input[name=email]" "user@example.com" +browse fill "#search" "query" --no-press-enter # fill without pressing Enter +``` + +#### `select ` + +Select option(s) from a dropdown. + +```bash +browse select "#country" "United States" +browse select "#tags" "javascript" "typescript" # multi-select +``` + +#### `press ` + +Press a keyboard key or key combination. 
+ +```bash +browse press Enter +browse press Tab +browse press Escape +browse press Cmd+A # select all (Mac) +browse press Ctrl+C # copy (Linux/Windows) +``` + +#### `scroll ` + +Scroll at a given position by a given amount. + +```bash +browse scroll 500 300 0 -300 # scroll up at (500, 300) +browse scroll 500 300 0 500 # scroll down +``` + +#### `drag ` + +Drag from one viewport coordinate to another. + +```bash +browse drag 80 80 310 100 # drag with default 10 steps +browse drag 80 80 310 100 --steps 20 # more intermediate steps +browse drag 80 80 310 100 --delay 50 # 50ms between steps +browse drag 80 80 310 100 --button right # use right mouse button +browse drag 80 80 310 100 --xpath # return source/target XPaths +``` + +#### `highlight ` + +Highlight an element on the page for visual debugging. + +```bash +browse highlight "#submit-btn" # highlight for 2 seconds (default) +browse highlight ".nav" -d 5000 # highlight for 5 seconds +``` + +#### `is ` + +Check element state. Available checks: `visible`, `checked`. + +```bash +browse is visible ".modal" # returns { visible: true/false } +browse is checked "#agree" # returns { checked: true/false } +``` + +#### `wait [arg]` + +Wait for a condition. + +```bash +browse wait load # wait for page load +browse wait "selector" ".results" # wait for element to appear +browse wait timeout 3000 # wait 3 seconds +``` + +--- + +### Session Management + +#### `start` + +Start the browser daemon manually. Usually not needed — the daemon auto-starts on first command. + +```bash +browse start +``` + +#### `stop` + +Stop the browser daemon and close the browser. + +```bash +browse stop +browse stop --force # force kill if daemon is unresponsive +``` + +#### `status` + +Check whether the daemon is running, its connection details, and current environment. + +```bash +browse status +``` + +#### `env [local|remote]` + +Show or switch the browser environment. Without arguments, prints the current environment. 
With an argument, stops the running daemon and restarts in the specified environment. The switch is sticky — subsequent commands stay in the chosen environment until you switch again or run `browse stop`. + +```bash +browse env # print current environment +browse env local # switch to local Chrome +browse env remote # switch to Browserbase (requires API keys) +``` + +#### `newpage [url]` + +Create a new tab, optionally navigating to a URL. + +```bash +browse newpage # open blank tab +browse newpage https://example.com # open tab with URL +``` + +#### `pages` + +List all open tabs. + +```bash +browse pages +``` + +#### `tab_switch ` + +Switch to a tab by its index (from `browse pages`). + +```bash +browse tab_switch 1 +``` + +#### `tab_close [index]` + +Close a tab. Closes current tab if no index given. + +```bash +browse tab_close # close current tab +browse tab_close 2 # close tab at index 2 +``` + +--- + +### JavaScript Evaluation + +#### `eval ` + +Evaluate JavaScript in the page context. + +```bash +browse eval "document.title" +browse eval "document.querySelectorAll('a').length" +``` + +--- + +### Viewport + +#### `viewport ` + +Set the browser viewport size. + +```bash +browse viewport 1920 1080 +``` + +--- + +### Network Capture + +Capture network requests to the filesystem for inspection. + +#### `network on` + +Enable network request capture. Creates a temp directory where requests and responses are saved as JSON files. + +```bash +browse network on +``` + +#### `network off` + +Disable network capture. + +```bash +browse network off +``` + +#### `network path` + +Show the capture directory path. + +```bash +browse network path +``` + +#### `network clear` + +Clear all captured requests. + +```bash +browse network clear +``` + +--- + +## Configuration + +### Global Flags + +#### `--json` + +Output as JSON for all commands. Useful for structured, parseable output. 
+ +```bash +browse --json get url # returns {"url": "https://..."} +browse --json snapshot # returns JSON accessibility tree +``` + +#### `--session ` + +Run commands against a named session, enabling multiple concurrent browsers. + +```bash +browse --session work open https://a.com +browse --session personal open https://b.com +``` + +### Environment Variables + +| Variable | Required | Description | +|----------|----------|-------------| +| `BROWSERBASE_API_KEY` | For remote mode | API key from https://browserbase.com/settings | +| `BROWSERBASE_PROJECT_ID` | For remote mode | Project ID from Browserbase dashboard | + +When both are set, the CLI uses Browserbase remote sessions. Otherwise, it falls back to local Chrome. + +### Setting credentials + +```bash +export BROWSERBASE_API_KEY="bb_live_..." +export BROWSERBASE_PROJECT_ID="proj_..." +``` + +Get these values from https://browserbase.com/settings. + +--- + +## Error Messages + +**"No active page"** +- The daemon is running but has no page open. +- Fix: Run `browse open `. If the issue persists, run `browse stop` and retry. For zombie daemons: `pkill -f "browse.*daemon"`. + +**"Chrome not found"** / **"Could not find local Chrome installation"** +- Chrome/Chromium is not installed or not in a standard location. +- Fix: Install Chrome, or switch to remote with `browse env remote` (no local browser needed). + +**"Daemon not running"** +- No daemon process is active. Most commands auto-start the daemon, but `snapshot`, `click`, etc. require an active session. +- Fix: Run `browse open ` to start a session. + +**Element ref not found (e.g., "@0-5")** +- The ref from a previous snapshot is no longer valid (page changed). +- Fix: Run `browse snapshot` again to get fresh refs. + +**Timeout errors** +- The page took too long to load or an element didn't appear. +- Fix: Try `browse wait load` before interacting, or increase wait time. 
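The fixes listed for stale refs and timeouts come down to waiting and trying again. A minimal sketch of a generic retry wrapper in plain shell is shown here. `retry` is a hypothetical helper, not part of the `browse` CLI, and the attempt count and delay are illustrative values.

```bash
# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds between failed attempts. Returns 0 on the first success, 1 if every
# attempt fails.
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

A call such as `retry 3 2 browse wait "selector" ".results"` would try the wait up to three times, two seconds apart. This only helps if the wrapped command exits with a non-zero status on failure, which is assumed here rather than confirmed for every `browse` subcommand.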
diff --git a/SKILL.md b/SKILL.md new file mode 100644 index 0000000..c13f140 --- /dev/null +++ b/SKILL.md @@ -0,0 +1,161 @@ +--- +name: browser +description: "Automate web browser interactions through CLI commands using natural language." +compatibility: "Requires the browse CLI (`npm install -g @browserbasehq/browse-cli`). Optional: set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID for remote Browserbase sessions; falls back to local Chrome otherwise." +license: MIT +allowed-tools: Bash +metadata: + openclaw: + requires: + bins: + - browse + install: + - kind: node + package: "@browserbasehq/browse-cli" + bins: [browse] + homepage: https://github.com/browserbase/skills +--- + +# Browser Automation + +Automate browser interactions using the browse CLI with Claude. + +## Setup check + +Before running any browser commands, verify the CLI is available: + +```bash +which browse || npm install -g @browserbasehq/browse-cli +``` + +## Environment Selection (Local vs Remote) + +The CLI automatically selects between local and remote browser environments based on available configuration: + +### Local mode (default) +- Uses local Chrome; no API keys needed +- Best for: development, simple pages, trusted sites with no bot protection + +### Remote mode (Browserbase) +- Activated when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set +- Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence +- **Use remote mode when:** the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access +- Get credentials at https://browserbase.com/settings + +### When to choose which +- **Simple browsing** (docs, wikis, public APIs): local mode is fine +- **Protected sites** (login walls, CAPTCHAs, anti-scraping): use remote mode +- **If local mode fails** with bot detection or access denied: switch to remote mode + +## Commands + +All commands work identically in both modes. The daemon auto-starts on first command. 
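Because commands behave the same in both modes, it can help to confirm which mode the current credentials imply before starting a session. The sketch below mirrors the selection rule described under Environment Selection (remote only when both variables are set). It is not the CLI's actual implementation, and `expected_env` is a hypothetical name.

```bash
# Hypothetical helper: print the environment that the credentials imply.
# Mirrors the documented rule: remote only when both variables are present.
# Assumption: an empty value is treated the same as an unset variable.
expected_env() {
  if [ -n "${BROWSERBASE_API_KEY:-}" ] && [ -n "${BROWSERBASE_PROJECT_ID:-}" ]; then
    echo "remote"
  else
    echo "local"
  fi
}
```

The printed value can be compared by eye with what `browse env` reports. A mismatch suggests that an earlier `browse env local` or `browse env remote` switch is still in effect.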
+ +### Navigation +```bash +browse open # Go to URL (aliases: goto) +browse reload # Reload current page +browse back # Go back in history +browse forward # Go forward in history +``` + +### Page state (prefer snapshot over screenshot) +```bash +browse snapshot # Get accessibility tree with element refs (fast, structured) +browse screenshot [path] # Take visual screenshot (slow, uses vision tokens) +browse get url # Get current URL +browse get title # Get page title +browse get text # Get text content (use "body" for all text) +browse get html # Get HTML content of element +browse get value # Get form field value +``` + +Use `browse snapshot` as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use `browse screenshot` when you need visual context (layout, images, debugging). + +### Interaction +```bash +browse click # Click element by ref from snapshot (e.g., @0-5) +browse type # Type text into focused element +browse fill # Fill input and press Enter +browse select # Select dropdown option(s) +browse press # Press key (Enter, Tab, Escape, Cmd+A, etc.) +browse drag # Drag from one point to another +browse scroll # Scroll at coordinates +browse highlight # Highlight element on page +browse is visible # Check if element is visible +browse is checked # Check if element is checked +browse wait [arg] # Wait for: load, selector, timeout +``` + +### Session management +```bash +browse stop # Stop the browser daemon +browse status # Check daemon status (includes env) +browse env # Show current environment (local or remote) +browse env local # Switch to local Chrome +browse env remote # Switch to Browserbase (requires API keys) +browse pages # List all open tabs +browse tab_switch # Switch to tab by index +browse tab_close [index] # Close tab +``` + +### Typical workflow +1. `browse open ` — navigate to the page +2. 
`browse snapshot` — read the accessibility tree to understand page structure and get element refs +3. `browse click ` / `browse type ` / `browse fill ` — interact using refs from snapshot +4. `browse snapshot` — confirm the action worked +5. Repeat 3-4 as needed +6. `browse stop` — close the browser when done + +## Quick Example + +```bash +browse open https://example.com +browse snapshot # see page structure + element refs +browse click @0-5 # click element with ref 0-5 +browse get title +browse stop +``` + +## Mode Comparison + +| Feature | Local | Browserbase | +|---------|-------|-------------| +| Speed | Faster | Slightly slower | +| Setup | Chrome required | API key required | +| Stealth mode | No | Yes (custom Chromium, anti-bot fingerprinting) | +| CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) | +| Residential proxies | No | Yes (201 countries, geo-targeting) | +| Session persistence | No | Yes (cookies/auth persist across sessions) | +| Best for | Development/simple pages | Protected sites, bot detection, production scraping | + +## Best Practices + +1. **Always `browse open` first** before interacting +2. **Use `browse snapshot`** to check page state — it's fast and gives you element refs +3. **Only screenshot when visual context is needed** (layout checks, images, debugging) +4. **Use refs from snapshot** to click/interact — e.g., `browse click @0-5` +5. **`browse stop`** when done to clean up the browser session + +## Troubleshooting + +- **"No active page"**: Run `browse stop`, then check `browse status`. 
If it still says running, kill the zombie daemon with `pkill -f "browse.*daemon"`, then retry `browse open` +- **Chrome not found**: Install Chrome or use `browse env remote` +- **Action fails**: Run `browse snapshot` to see available elements and their refs +- **Browserbase fails**: Verify API key and project ID are set + +## Switching to Remote Mode + +Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it. + +Don't switch for simple sites (docs, wikis, public APIs, localhost). + +```bash +browse env remote # switch to Browserbase +browse env local # switch back to local Chrome +``` + +The switch is sticky until you run `browse stop` or switch again. + +For detailed examples, see [EXAMPLES.md](EXAMPLES.md). +For API reference, see [REFERENCE.md](REFERENCE.md). diff --git a/_meta.json b/_meta.json new file mode 100644 index 0000000..45d127f --- /dev/null +++ b/_meta.json @@ -0,0 +1,6 @@ +{ + "ownerId": "kn7f3h94x6dsndkdjph76br4pd803szg", + "slug": "browse", + "version": "2.0.2", + "publishedAt": 1772680539406 +} \ No newline at end of file