Initial commit with translated description

This commit is contained in:
2026-03-29 13:17:20 +08:00
commit 48cba2e574
5 changed files with 729 additions and 0 deletions

109
EXAMPLES.md Normal file
View File

@@ -0,0 +1,109 @@
# Browser Automation Examples
Common browser automation workflows using the `browse` CLI. Each example demonstrates a distinct pattern using real commands.
## Example 1: Extract Data from a Page
**User request**: "Get the product details from example.com/product/123"
```bash
browse open https://example.com/product/123
browse snapshot # read page structure + element refs
browse get text "body" # extract all visible text content
browse stop
```
Parse the text output to extract structured data (name, price, description, etc.).
For a specific section, use a CSS selector:
```bash
browse get text ".product-details" # text from a specific container
```
**Note**: `browse get text` requires a CSS selector — use `"body"` for all page text.
## Example 2: Fill and Submit a Form
**User request**: "Fill out the contact form on example.com with my information"
```bash
browse open https://example.com/contact
browse snapshot # find form fields and their refs
browse click @0-3 # click the Name input (ref from snapshot)
browse type "John Doe"
browse press Tab # move to next field
browse type "john@example.com"
browse fill "#message" "I would like to inquire about your services"
browse snapshot # verify fields are filled
browse click @0-8 # click Submit button (ref from snapshot)
browse snapshot # confirm submission result
browse stop
```
**Key pattern**: Use `browse snapshot` before interacting to discover element refs, then `browse click <ref>` and `browse type` to interact.
## Example 3: Multi-Step Navigation
**User request**: "Get headlines from the first 3 pages of results on example.com/news"
```bash
browse open https://example.com/news
browse snapshot # read page 1 content
browse get text ".headline" # extract headlines
browse snapshot # find "Next" button ref
browse click @0-12 # click Next (ref from snapshot)
browse wait load # wait for page 2 to load
browse get text ".headline" # extract page 2 headlines
browse snapshot # find Next again (ref may change)
browse click @0-15 # click Next
browse wait load
browse get text ".headline" # extract page 3 headlines
browse stop
```
**Key pattern**: Re-run `browse snapshot` after each navigation because element refs change when the page updates.
## Example 4: Escalate to Remote Mode
**User request**: "Scrape pricing from competitor.com" (a site with Cloudflare protection)
```bash
# Attempt 1: local mode
browse open https://competitor.com/pricing
browse snapshot
# Output shows: "Checking your browser..." (Cloudflare interstitial)
# or: page content is empty / access denied
browse stop
```
The agent detects bot protection and tells the user:
> This site has Cloudflare bot detection. Browserbase remote mode can bypass this with anti-bot stealth and residential proxies. Want me to set it up?
If the user agrees:
```bash
# Set Browserbase credentials
export BROWSERBASE_API_KEY="bb_live_..."
export BROWSERBASE_PROJECT_ID="proj_..."
# Retry in remote mode
browse env remote
browse open https://competitor.com/pricing
browse snapshot # full page content now accessible
browse get text ".pricing-table"
browse stop
```
## Tips
- **Snapshot first**: Always run `browse snapshot` before interacting — it gives you the accessibility tree with element refs
- **Use refs to click**: `browse click @0-5` is more reliable than trying to describe elements
- **Re-snapshot after actions**: Element refs change when the page updates
- **`get text` for data extraction**: Use `browse get text [selector]` to pull text content from specific elements
- **`stop` when done**: Always `browse stop` to clean up the browser session
- **Prefer snapshot over screenshot**: Snapshot is fast and structured; screenshot is slow and uses vision tokens. Only screenshot when you need visual context (layout, images, debugging)

21
LICENSE.txt Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Browserbase, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

432
REFERENCE.md Normal file
View File

@@ -0,0 +1,432 @@
# Browser Automation CLI Reference
Technical reference for the `browse` CLI tool.
## Table of Contents
- [Architecture](#architecture)
- [Command Reference](#command-reference)
- [Navigation](#navigation)
- [Page State](#page-state)
- [Interaction](#interaction)
- [Session Management](#session-management)
- [JavaScript Evaluation](#javascript-evaluation)
- [Viewport](#viewport)
- [Network Capture](#network-capture)
- [Configuration](#configuration)
- [Global Flags](#global-flags)
- [Environment Variables](#environment-variables)
- [Error Messages](#error-messages)
## Architecture
The browse CLI is a **daemon-based** command-line tool:
- **Daemon process**: A background process manages the browser instance. Auto-starts on the first command (e.g., `browse open`), persists across commands, and stops with `browse stop`.
- **Local mode** (default): Launches a local Chrome/Chromium instance.
- **Remote mode** (Browserbase): Connects to a Browserbase cloud browser session when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set.
- **Accessibility-first**: Use `browse snapshot` to get the page's accessibility tree with element refs, then interact using those refs.
## Command Reference
### Navigation
#### `open <url>`
Navigate to a URL. Alias: `goto`. Auto-starts the daemon if not running.
```bash
browse open https://example.com
browse open https://example.com --wait networkidle # wait for all network requests to finish (useful for SPAs)
browse open https://example.com --wait domcontentloaded
```
The `--wait` flag controls when navigation is considered complete. Values: `load` (default), `domcontentloaded`, `networkidle`. Use `networkidle` for JavaScript-heavy pages that fetch data after initial load.
#### `reload`
Reload the current page.
```bash
browse reload
```
#### `back` / `forward`
Navigate browser history.
```bash
browse back
browse forward
```
---
### Page State
#### `snapshot`
Get the accessibility tree with interactive element refs. This is the primary way to understand page structure.
```bash
browse snapshot
browse snapshot --compact # tree only, no ref maps
```
Returns a text representation of the page with refs like `@0-5` that can be passed to `click`. Use `--compact` for shorter output when you only need the tree.
#### `screenshot [path]`
Take a visual screenshot. Slower than snapshot and uses vision tokens.
```bash
browse screenshot # auto-generated path
browse screenshot ./capture.png # custom path
browse screenshot --full-page # capture entire scrollable page
```
#### `get <property> [selector]`
Get page properties. Available properties: `url`, `title`, `text`, `html`, `value`, `box`, `visible`, `checked`.
```bash
browse get url # current URL
browse get title # page title
browse get text "body" # all visible text (selector required)
browse get text ".product-info" # text within a CSS selector
browse get html "#main" # inner HTML of an element
browse get value "#email-input" # value of a form field
browse get box "#header" # bounding box (centroid coordinates)
browse get visible ".modal" # check if element is visible
browse get checked "#agree" # check if checkbox/radio is checked
```
**Note**: `get text` requires a CSS selector argument — use `"body"` for full page text.
#### `refs`
Show the cached ref map from the last `browse snapshot`. Useful for looking up element refs without re-running a full snapshot.
```bash
browse refs
```
---
### Interaction
#### `click <ref>`
Click an element by its ref from `browse snapshot` output.
```bash
browse click @0-5 # click element with ref 0-5
```
#### `click_xy <x> <y>`
Click at exact viewport coordinates.
```bash
browse click_xy 500 300
```
#### `hover <x> <y>`
Hover at viewport coordinates.
```bash
browse hover 500 300
```
#### `type <text>`
Type text into the currently focused element.
```bash
browse type "Hello, world!"
browse type "slow typing" --delay 100 # 100ms between keystrokes
browse type "human-like" --mistakes # simulate human typing with typos
```
#### `fill <selector> <value>`
Fill an input element matching a CSS selector and press Enter.
```bash
browse fill "#search" "browser automation"
browse fill "input[name=email]" "user@example.com"
browse fill "#search" "query" --no-press-enter # fill without pressing Enter
```
#### `select <selector> <values...>`
Select option(s) from a dropdown.
```bash
browse select "#country" "United States"
browse select "#tags" "javascript" "typescript" # multi-select
```
#### `press <key>`
Press a keyboard key or key combination.
```bash
browse press Enter
browse press Tab
browse press Escape
browse press Cmd+A # select all (Mac)
browse press Ctrl+C # copy (Linux/Windows)
```
#### `scroll <x> <y> <deltaX> <deltaY>`
Scroll at a given position by a given amount.
```bash
browse scroll 500 300 0 -300 # scroll up at (500, 300)
browse scroll 500 300 0 500 # scroll down
```
#### `drag <fromX> <fromY> <toX> <toY>`
Drag from one viewport coordinate to another.
```bash
browse drag 80 80 310 100 # drag with default 10 steps
browse drag 80 80 310 100 --steps 20 # more intermediate steps
browse drag 80 80 310 100 --delay 50 # 50ms between steps
browse drag 80 80 310 100 --button right # use right mouse button
browse drag 80 80 310 100 --xpath # return source/target XPaths
```
#### `highlight <selector>`
Highlight an element on the page for visual debugging.
```bash
browse highlight "#submit-btn" # highlight for 2 seconds (default)
browse highlight ".nav" -d 5000 # highlight for 5 seconds
```
#### `is <check> <selector>`
Check element state. Available checks: `visible`, `checked`.
```bash
browse is visible ".modal" # returns { visible: true/false }
browse is checked "#agree" # returns { checked: true/false }
```
#### `wait <type> [arg]`
Wait for a condition.
```bash
browse wait load # wait for page load
browse wait "selector" ".results" # wait for element to appear
browse wait timeout 3000 # wait 3 seconds
```
---
### Session Management
#### `start`
Start the browser daemon manually. Usually not needed — the daemon auto-starts on first command.
```bash
browse start
```
#### `stop`
Stop the browser daemon and close the browser.
```bash
browse stop
browse stop --force # force kill if daemon is unresponsive
```
#### `status`
Check whether the daemon is running, its connection details, and current environment.
```bash
browse status
```
#### `env [local|remote]`
Show or switch the browser environment. Without arguments, prints the current environment. With an argument, stops the running daemon and restarts in the specified environment. The switch is sticky — subsequent commands stay in the chosen environment until you switch again or run `browse stop`.
```bash
browse env # print current environment
browse env local # switch to local Chrome
browse env remote # switch to Browserbase (requires API keys)
```
#### `newpage [url]`
Create a new tab, optionally navigating to a URL.
```bash
browse newpage # open blank tab
browse newpage https://example.com # open tab with URL
```
#### `pages`
List all open tabs.
```bash
browse pages
```
#### `tab_switch <index>`
Switch to a tab by its index (from `browse pages`).
```bash
browse tab_switch 1
```
#### `tab_close [index]`
Close a tab. Closes current tab if no index given.
```bash
browse tab_close # close current tab
browse tab_close 2 # close tab at index 2
```
---
### JavaScript Evaluation
#### `eval <expression>`
Evaluate JavaScript in the page context.
```bash
browse eval "document.title"
browse eval "document.querySelectorAll('a').length"
```
---
### Viewport
#### `viewport <width> <height>`
Set the browser viewport size.
```bash
browse viewport 1920 1080
```
---
### Network Capture
Capture network requests to the filesystem for inspection.
#### `network on`
Enable network request capture. Creates a temp directory where requests and responses are saved as JSON files.
```bash
browse network on
```
#### `network off`
Disable network capture.
```bash
browse network off
```
#### `network path`
Show the capture directory path.
```bash
browse network path
```
#### `network clear`
Clear all captured requests.
```bash
browse network clear
```
---
## Configuration
### Global Flags
#### `--json`
Output as JSON for all commands. Useful for structured, parseable output.
```bash
browse --json get url # returns {"url": "https://..."}
browse --json snapshot # returns JSON accessibility tree
```
#### `--session <name>`
Run commands against a named session, enabling multiple concurrent browsers.
```bash
browse --session work open https://a.com
browse --session personal open https://b.com
```
### Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `BROWSERBASE_API_KEY` | For remote mode | API key from https://browserbase.com/settings |
| `BROWSERBASE_PROJECT_ID` | For remote mode | Project ID from Browserbase dashboard |
When both are set, the CLI uses Browserbase remote sessions. Otherwise, it falls back to local Chrome.
### Setting credentials
```bash
export BROWSERBASE_API_KEY="bb_live_..."
export BROWSERBASE_PROJECT_ID="proj_..."
```
Get these values from https://browserbase.com/settings.
---
## Error Messages
**"No active page"**
- The daemon is running but has no page open.
- Fix: Run `browse open <url>`. If the issue persists, run `browse stop` and retry. For zombie daemons: `pkill -f "browse.*daemon"`.
**"Chrome not found"** / **"Could not find local Chrome installation"**
- Chrome/Chromium is not installed or not in a standard location.
- Fix: Install Chrome, or switch to remote with `browse env remote` (no local browser needed).
**"Daemon not running"**
- No daemon process is active. Most commands auto-start the daemon, but `snapshot`, `click`, etc. require an active session.
- Fix: Run `browse open <url>` to start a session.
**Element ref not found (e.g., "@0-5")**
- The ref from a previous snapshot is no longer valid (page changed).
- Fix: Run `browse snapshot` again to get fresh refs.
**Timeout errors**
- The page took too long to load or an element didn't appear.
- Fix: Try `browse wait load` before interacting, or increase wait time.

161
SKILL.md Normal file
View File

@@ -0,0 +1,161 @@
---
name: browser
description: "使用自然语言通过CLI命令自动化网页浏览器交互。"
compatibility: "Requires the browse CLI (`npm install -g @browserbasehq/browse-cli`). Optional: set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID for remote Browserbase sessions; falls back to local Chrome otherwise."
license: MIT
allowed-tools: Bash
metadata:
openclaw:
requires:
bins:
- browse
install:
- kind: node
package: "@browserbasehq/browse-cli"
bins: [browse]
homepage: https://github.com/browserbase/skills
---
# Browser Automation
Automate browser interactions using the browse CLI with Claude.
## Setup check
Before running any browser commands, verify the CLI is available:
```bash
which browse || npm install -g @browserbasehq/browse-cli
```
## Environment Selection (Local vs Remote)
The CLI automatically selects between local and remote browser environments based on available configuration:
### Local mode (default)
- Uses local Chrome — no API keys needed
- Best for: development, simple pages, trusted sites with no bot protection
### Remote mode (Browserbase)
- Activated when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set
- Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence
- **Use remote mode when:** the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
- Get credentials at https://browserbase.com/settings
### When to choose which
- **Simple browsing** (docs, wikis, public APIs): local mode is fine
- **Protected sites** (login walls, CAPTCHAs, anti-scraping): use remote mode
- **If local mode fails** with bot detection or access denied: switch to remote mode
## Commands
All commands work identically in both modes. The daemon auto-starts on first command.
### Navigation
```bash
browse open <url> # Go to URL (aliases: goto)
browse reload # Reload current page
browse back # Go back in history
browse forward # Go forward in history
```
### Page state (prefer snapshot over screenshot)
```bash
browse snapshot # Get accessibility tree with element refs (fast, structured)
browse screenshot [path] # Take visual screenshot (slow, uses vision tokens)
browse get url # Get current URL
browse get title # Get page title
browse get text <selector> # Get text content (use "body" for all text)
browse get html <selector> # Get HTML content of element
browse get value <selector> # Get form field value
```
Use `browse snapshot` as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use `browse screenshot` when you need visual context (layout, images, debugging).
### Interaction
```bash
browse click <ref> # Click element by ref from snapshot (e.g., @0-5)
browse type <text> # Type text into focused element
browse fill <selector> <value> # Fill input and press Enter
browse select <selector> <values...> # Select dropdown option(s)
browse press <key> # Press key (Enter, Tab, Escape, Cmd+A, etc.)
browse drag <fromX> <fromY> <toX> <toY> # Drag from one point to another
browse scroll <x> <y> <deltaX> <deltaY> # Scroll at coordinates
browse highlight <selector> # Highlight element on page
browse is visible <selector> # Check if element is visible
browse is checked <selector> # Check if element is checked
browse wait <type> [arg] # Wait for: load, selector, timeout
```
### Session management
```bash
browse stop # Stop the browser daemon
browse status # Check daemon status (includes env)
browse env # Show current environment (local or remote)
browse env local # Switch to local Chrome
browse env remote # Switch to Browserbase (requires API keys)
browse pages # List all open tabs
browse tab_switch <index> # Switch to tab by index
browse tab_close [index] # Close tab
```
### Typical workflow
1. `browse open <url>` — navigate to the page
2. `browse snapshot` — read the accessibility tree to understand page structure and get element refs
3. `browse click <ref>` / `browse type <text>` / `browse fill <selector> <value>` — interact using refs from snapshot
4. `browse snapshot` — confirm the action worked
5. Repeat 3-4 as needed
6. `browse stop` — close the browser when done
## Quick Example
```bash
browse open https://example.com
browse snapshot # see page structure + element refs
browse click @0-5 # click element with ref 0-5
browse get title
browse stop
```
## Mode Comparison
| Feature | Local | Browserbase |
|---------|-------|-------------|
| Speed | Faster | Slightly slower |
| Setup | Chrome required | API key required |
| Stealth mode | No | Yes (custom Chromium, anti-bot fingerprinting) |
| CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) |
| Residential proxies | No | Yes (201 countries, geo-targeting) |
| Session persistence | No | Yes (cookies/auth persist across sessions) |
| Best for | Development/simple pages | Protected sites, bot detection, production scraping |
## Best Practices
1. **Always `browse open` first** before interacting
2. **Use `browse snapshot`** to check page state — it's fast and gives you element refs
3. **Only screenshot when visual context is needed** (layout checks, images, debugging)
4. **Use refs from snapshot** to click/interact — e.g., `browse click @0-5`
5. **`browse stop`** when done to clean up the browser session
## Troubleshooting
- **"No active page"**: Run `browse stop`, then check `browse status`. If it still says running, kill the zombie daemon with `pkill -f "browse.*daemon"`, then retry `browse open`
- **Chrome not found**: Install Chrome or use `browse env remote`
- **Action fails**: Run `browse snapshot` to see available elements and their refs
- **Browserbase fails**: Verify API key and project ID are set
## Switching to Remote Mode
Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.
Don't switch for simple sites (docs, wikis, public APIs, localhost).
```bash
browse env remote # switch to Browserbase
browse env local # switch back to local Chrome
```
The switch is sticky until you run `browse stop` or switch again.
For detailed examples, see [EXAMPLES.md](EXAMPLES.md).
For API reference, see [REFERENCE.md](REFERENCE.md).

6
_meta.json Normal file
View File

@@ -0,0 +1,6 @@
{
"ownerId": "kn7f3h94x6dsndkdjph76br4pd803szg",
"slug": "browse",
"version": "2.0.2",
"publishedAt": 1772680539406
}