Initial commit with translated description

2026-03-29 13:17:20 +08:00
commit 48cba2e574
5 changed files with 729 additions and 0 deletions
--- a/EXAMPLES.md
+++ b/EXAMPLES.md
@@ -0,0 +1,109 @@
 # Browser Automation Examples
 Common browser automation workflows using the `browse` CLI. Each example demonstrates a distinct pattern using real commands.
 ## Example 1: Extract Data from a Page
 **User request**: "Get the product details from example.com/product/123"
 ```bash
 browse open https://example.com/product/123
 browse snapshot                          # read page structure + element refs
 browse get text "body"                   # extract all visible text content
 browse stop
 ```
 Parse the text output to extract structured data (name, price, description, etc.).
 For a specific section, use a CSS selector:
 ```bash
 browse get text ".product-details"       # text from a specific container
 ```
 **Note**: `browse get text` requires a CSS selector — use `"body"` for all page text.
 ## Example 2: Fill and Submit a Form
 **User request**: "Fill out the contact form on example.com with my information"
 ```bash
 browse open https://example.com/contact
 browse snapshot                          # find form fields and their refs
 browse click @0-3                        # click the Name input (ref from snapshot)
 browse type "John Doe"
 browse press Tab                         # move to next field
 browse type "john@example.com"
 browse fill "#message" "I would like to inquire about your services"
 browse snapshot                          # verify fields are filled
 browse click @0-8                        # click Submit button (ref from snapshot)
 browse snapshot                          # confirm submission result
 browse stop
 ```
 **Key pattern**: Use `browse snapshot` before interacting to discover element refs, then `browse click <ref>` and `browse type` to interact.
 ## Example 3: Multi-Step Navigation
 **User request**: "Get headlines from the first 3 pages of results on example.com/news"
 ```bash
 browse open https://example.com/news
 browse snapshot                          # read page 1 content
 browse get text ".headline"              # extract headlines
 browse snapshot                          # find "Next" button ref
 browse click @0-12                       # click Next (ref from snapshot)
 browse wait load                         # wait for page 2 to load
 browse get text ".headline"              # extract page 2 headlines
 browse snapshot                          # find Next again (ref may change)
 browse click @0-15                       # click Next
 browse wait load
 browse get text ".headline"              # extract page 3 headlines
 browse stop
 ```
 **Key pattern**: Re-run `browse snapshot` after each navigation because element refs change when the page updates.
 ## Example 4: Escalate to Remote Mode
 **User request**: "Scrape pricing from competitor.com" (a site with Cloudflare protection)
 ```bash
 # Attempt 1: local mode
 browse open https://competitor.com/pricing
 browse snapshot
 # Output shows: "Checking your browser..." (Cloudflare interstitial)
 # or: page content is empty / access denied
 browse stop
 ```
 The agent detects bot protection and tells the user:
 > This site has Cloudflare bot detection. Browserbase remote mode can bypass this with anti-bot stealth and residential proxies. Want me to set it up?
 If the user agrees:
 ```bash
 # Set Browserbase credentials
 export BROWSERBASE_API_KEY="bb_live_..."
 export BROWSERBASE_PROJECT_ID="proj_..."
 # Retry in remote mode
 browse env remote
 browse open https://competitor.com/pricing
 browse snapshot                          # full page content now accessible
 browse get text ".pricing-table"
 browse stop
 ```
 ## Tips
 - **Snapshot first**: Always run `browse snapshot` before interacting — it gives you the accessibility tree with element refs
 - **Use refs to click**: `browse click @0-5` is more reliable than trying to describe elements
 - **Re-snapshot after actions**: Element refs change when the page updates
 - **`get text` for data extraction**: Use `browse get text [selector]` to pull text content from specific elements
 - **`stop` when done**: Always `browse stop` to clean up the browser session
 - **Prefer snapshot over screenshot**: Snapshot is fast and structured; screenshot is slow and uses vision tokens. Only screenshot when you need visual context (layout, images, debugging)
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -0,0 +1,21 @@
 MIT License
 Copyright (c) 2026 Browserbase, Inc.
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/REFERENCE.md
+++ b/REFERENCE.md
@@ -0,0 +1,432 @@
 # Browser Automation CLI Reference
 Technical reference for the `browse` CLI tool.
 ## Table of Contents
 - [Architecture](#architecture)
 - [Command Reference](#command-reference)
  - [Navigation](#navigation)
  - [Page State](#page-state)
  - [Interaction](#interaction)
  - [Session Management](#session-management)
  - [JavaScript Evaluation](#javascript-evaluation)
  - [Viewport](#viewport)
  - [Network Capture](#network-capture)
 - [Configuration](#configuration)
  - [Global Flags](#global-flags)
  - [Environment Variables](#environment-variables)
 - [Error Messages](#error-messages)
 ## Architecture
 The browse CLI is a **daemon-based** command-line tool:
 - **Daemon process**: A background process manages the browser instance. Auto-starts on the first command (e.g., `browse open`), persists across commands, and stops with `browse stop`.
 - **Local mode** (default): Launches a local Chrome/Chromium instance.
 - **Remote mode** (Browserbase): Connects to a Browserbase cloud browser session when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set.
 - **Accessibility-first**: Use `browse snapshot` to get the page's accessibility tree with element refs, then interact using those refs.
 ## Command Reference
 ### Navigation
 #### `open <url>`
 Navigate to a URL. Alias: `goto`. Auto-starts the daemon if not running.
 ```bash
 browse open https://example.com
 browse open https://example.com --wait networkidle   # wait for all network requests to finish (useful for SPAs)
 browse open https://example.com --wait domcontentloaded
 ```
 The `--wait` flag controls when navigation is considered complete. Values: `load` (default), `domcontentloaded`, `networkidle`. Use `networkidle` for JavaScript-heavy pages that fetch data after initial load.
 #### `reload`
 Reload the current page.
 ```bash
 browse reload
 ```
 #### `back` / `forward`
 Navigate browser history.
 ```bash
 browse back
 browse forward
 ```
 ---
 ### Page State
 #### `snapshot`
 Get the accessibility tree with interactive element refs. This is the primary way to understand page structure.
 ```bash
 browse snapshot
 browse snapshot --compact                # tree only, no ref maps
 ```
 Returns a text representation of the page with refs like `@0-5` that can be passed to `click`. Use `--compact` for shorter output when you only need the tree.
 #### `screenshot [path]`
 Take a visual screenshot. Slower than snapshot and uses vision tokens.
 ```bash
 browse screenshot                        # auto-generated path
 browse screenshot ./capture.png          # custom path
 browse screenshot --full-page            # capture entire scrollable page
 ```
 #### `get <property> [selector]`
 Get page properties. Available properties: `url`, `title`, `text`, `html`, `value`, `box`, `visible`, `checked`.
 ```bash
 browse get url                           # current URL
 browse get title                         # page title
 browse get text "body"                   # all visible text (selector required)
 browse get text ".product-info"          # text within a CSS selector
 browse get html "#main"                  # inner HTML of an element
 browse get value "#email-input"          # value of a form field
 browse get box "#header"                 # bounding box (centroid coordinates)
 browse get visible ".modal"              # check if element is visible
 browse get checked "#agree"              # check if checkbox/radio is checked
 ```
 **Note**: `get text` requires a CSS selector argument — use `"body"` for full page text.
 #### `refs`
 Show the cached ref map from the last `browse snapshot`. Useful for looking up element refs without re-running a full snapshot.
 ```bash
 browse refs
 ```
 ---
 ### Interaction
 #### `click <ref>`
 Click an element by its ref from `browse snapshot` output.
 ```bash
 browse click @0-5                        # click element with ref 0-5
 ```
 #### `click_xy <x> <y>`
 Click at exact viewport coordinates.
 ```bash
 browse click_xy 500 300
 ```
 #### `hover <x> <y>`
 Hover at viewport coordinates.
 ```bash
 browse hover 500 300
 ```
 #### `type <text>`
 Type text into the currently focused element.
 ```bash
 browse type "Hello, world!"
 browse type "slow typing" --delay 100    # 100ms between keystrokes
 browse type "human-like" --mistakes      # simulate human typing with typos
 ```
 #### `fill <selector> <value>`
 Fill an input element matching a CSS selector and press Enter.
 ```bash
 browse fill "#search" "browser automation"
 browse fill "input[name=email]" "user@example.com"
 browse fill "#search" "query" --no-press-enter   # fill without pressing Enter
 ```
 #### `select <selector> <values...>`
 Select option(s) from a dropdown.
 ```bash
 browse select "#country" "United States"
 browse select "#tags" "javascript" "typescript"    # multi-select
 ```
 #### `press <key>`
 Press a keyboard key or key combination.
 ```bash
 browse press Enter
 browse press Tab
 browse press Escape
 browse press Cmd+A                       # select all (Mac)
 browse press Ctrl+C                      # copy (Linux/Windows)
 ```
 #### `scroll <x> <y> <deltaX> <deltaY>`
 Scroll at a given position by a given amount.
 ```bash
 browse scroll 500 300 0 -300             # scroll up at (500, 300)
 browse scroll 500 300 0 500              # scroll down
 ```
 #### `drag <fromX> <fromY> <toX> <toY>`
 Drag from one viewport coordinate to another.
 ```bash
 browse drag 80 80 310 100                # drag with default 10 steps
 browse drag 80 80 310 100 --steps 20     # more intermediate steps
 browse drag 80 80 310 100 --delay 50     # 50ms between steps
 browse drag 80 80 310 100 --button right # use right mouse button
 browse drag 80 80 310 100 --xpath        # return source/target XPaths
 ```
 #### `highlight <selector>`
 Highlight an element on the page for visual debugging.
 ```bash
 browse highlight "#submit-btn"           # highlight for 2 seconds (default)
 browse highlight ".nav" -d 5000          # highlight for 5 seconds
 ```
 #### `is <check> <selector>`
 Check element state. Available checks: `visible`, `checked`.
 ```bash
 browse is visible ".modal"               # returns { visible: true/false }
 browse is checked "#agree"               # returns { checked: true/false }
 ```
 #### `wait <type> [arg]`
 Wait for a condition.
 ```bash
 browse wait load                         # wait for page load
 browse wait "selector" ".results"        # wait for element to appear
 browse wait timeout 3000                 # wait 3 seconds
 ```
 ---
 ### Session Management
 #### `start`
 Start the browser daemon manually. Usually not needed — the daemon auto-starts on first command.
 ```bash
 browse start
 ```
 #### `stop`
 Stop the browser daemon and close the browser.
 ```bash
 browse stop
 browse stop --force                      # force kill if daemon is unresponsive
 ```
 #### `status`
 Check whether the daemon is running, its connection details, and current environment.
 ```bash
 browse status
 ```
 #### `env [local|remote]`
 Show or switch the browser environment. Without arguments, prints the current environment. With an argument, stops the running daemon and restarts in the specified environment. The switch is sticky — subsequent commands stay in the chosen environment until you switch again or run `browse stop`.
 ```bash
 browse env                               # print current environment
 browse env local                         # switch to local Chrome
 browse env remote                        # switch to Browserbase (requires API keys)
 ```
 #### `newpage [url]`
 Create a new tab, optionally navigating to a URL.
 ```bash
 browse newpage                           # open blank tab
 browse newpage https://example.com       # open tab with URL
 ```
 #### `pages`
 List all open tabs.
 ```bash
 browse pages
 ```
 #### `tab_switch <index>`
 Switch to a tab by its index (from `browse pages`).
 ```bash
 browse tab_switch 1
 ```
 #### `tab_close [index]`
 Close a tab. Closes current tab if no index given.
 ```bash
 browse tab_close          # close current tab
 browse tab_close 2        # close tab at index 2
 ```
 ---
 ### JavaScript Evaluation
 #### `eval <expression>`
 Evaluate JavaScript in the page context.
 ```bash
 browse eval "document.title"
 browse eval "document.querySelectorAll('a').length"
 ```
 ---
 ### Viewport
 #### `viewport <width> <height>`
 Set the browser viewport size.
 ```bash
 browse viewport 1920 1080
 ```
 ---
 ### Network Capture
 Capture network requests to the filesystem for inspection.
 #### `network on`
 Enable network request capture. Creates a temp directory where requests and responses are saved as JSON files.
 ```bash
 browse network on
 ```
 #### `network off`
 Disable network capture.
 ```bash
 browse network off
 ```
 #### `network path`
 Show the capture directory path.
 ```bash
 browse network path
 ```
 #### `network clear`
 Clear all captured requests.
 ```bash
 browse network clear
 ```
 ---
 ## Configuration
 ### Global Flags
 #### `--json`
 Output as JSON for all commands. Useful for structured, parseable output.
 ```bash
 browse --json get url                    # returns {"url": "https://..."}
 browse --json snapshot                   # returns JSON accessibility tree
 ```
 #### `--session <name>`
 Run commands against a named session, enabling multiple concurrent browsers.
 ```bash
 browse --session work open https://a.com
 browse --session personal open https://b.com
 ```
 ### Environment Variables
 | Variable | Required | Description |
 |----------|----------|-------------|
 | `BROWSERBASE_API_KEY` | For remote mode | API key from https://browserbase.com/settings |
 | `BROWSERBASE_PROJECT_ID` | For remote mode | Project ID from Browserbase dashboard |
 When both are set, the CLI uses Browserbase remote sessions. Otherwise, it falls back to local Chrome.
 ### Setting credentials
 ```bash
 export BROWSERBASE_API_KEY="bb_live_..."
 export BROWSERBASE_PROJECT_ID="proj_..."
 ```
 Get these values from https://browserbase.com/settings.
 ---
 ## Error Messages
 **"No active page"**
 - The daemon is running but has no page open.
 - Fix: Run `browse open <url>`. If the issue persists, run `browse stop` and retry. For zombie daemons: `pkill -f "browse.*daemon"`.
 **"Chrome not found"** / **"Could not find local Chrome installation"**
 - Chrome/Chromium is not installed or not in a standard location.
 - Fix: Install Chrome, or switch to remote with `browse env remote` (no local browser needed).
 **"Daemon not running"**
 - No daemon process is active. Most commands auto-start the daemon, but `snapshot`, `click`, etc. require an active session.
 - Fix: Run `browse open <url>` to start a session.
 **Element ref not found (e.g., "@0-5")**
 - The ref from a previous snapshot is no longer valid (page changed).
 - Fix: Run `browse snapshot` again to get fresh refs.
 **Timeout errors**
 - The page took too long to load or an element didn't appear.
 - Fix: Try `browse wait load` before interacting, or increase wait time.
--- a/SKILL.md
+++ b/SKILL.md
@@ -0,0 +1,161 @@
 ---
 name: browser
 description: "使用自然语言通过CLI命令自动化网页浏览器交互。"
 compatibility: "Requires the browse CLI (`npm install -g @browserbasehq/browse-cli`). Optional: set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID for remote Browserbase sessions; falls back to local Chrome otherwise."
 license: MIT
 allowed-tools: Bash
 metadata:
  openclaw:
    requires:
      bins:
        - browse
    install:
      - kind: node
        package: "@browserbasehq/browse-cli"
        bins: [browse]
    homepage: https://github.com/browserbase/skills
 ---
 # Browser Automation
 Automate browser interactions using the browse CLI with Claude.
 ## Setup check
 Before running any browser commands, verify the CLI is available:
 ```bash
 which browse || npm install -g @browserbasehq/browse-cli
 ```
 ## Environment Selection (Local vs Remote)
 The CLI automatically selects between local and remote browser environments based on available configuration:
 ### Local mode (default)
 - Uses local Chrome — no API keys needed
 - Best for: development, simple pages, trusted sites with no bot protection
 ### Remote mode (Browserbase)
 - Activated when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set
 - Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence
 - **Use remote mode when:** the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
 - Get credentials at https://browserbase.com/settings
 ### When to choose which
 - **Simple browsing** (docs, wikis, public APIs): local mode is fine
 - **Protected sites** (login walls, CAPTCHAs, anti-scraping): use remote mode
 - **If local mode fails** with bot detection or access denied: switch to remote mode
 ## Commands
 All commands work identically in both modes. The daemon auto-starts on first command.
 ### Navigation
 ```bash
 browse open <url>                        # Go to URL (aliases: goto)
 browse reload                            # Reload current page
 browse back                              # Go back in history
 browse forward                           # Go forward in history
 ```
 ### Page state (prefer snapshot over screenshot)
 ```bash
 browse snapshot                          # Get accessibility tree with element refs (fast, structured)
 browse screenshot [path]                 # Take visual screenshot (slow, uses vision tokens)
 browse get url                           # Get current URL
 browse get title                         # Get page title
 browse get text <selector>               # Get text content (use "body" for all text)
 browse get html <selector>               # Get HTML content of element
 browse get value <selector>              # Get form field value
 ```
 Use `browse snapshot` as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use `browse screenshot` when you need visual context (layout, images, debugging).
 ### Interaction
 ```bash
 browse click <ref>                       # Click element by ref from snapshot (e.g., @0-5)
 browse type <text>                       # Type text into focused element
 browse fill <selector> <value>           # Fill input and press Enter
 browse select <selector> <values...>     # Select dropdown option(s)
 browse press <key>                       # Press key (Enter, Tab, Escape, Cmd+A, etc.)
 browse drag <fromX> <fromY> <toX> <toY>  # Drag from one point to another
 browse scroll <x> <y> <deltaX> <deltaY> # Scroll at coordinates
 browse highlight <selector>              # Highlight element on page
 browse is visible <selector>             # Check if element is visible
 browse is checked <selector>             # Check if element is checked
 browse wait <type> [arg]                 # Wait for: load, selector, timeout
 ```
 ### Session management
 ```bash
 browse stop                              # Stop the browser daemon
 browse status                            # Check daemon status (includes env)
 browse env                               # Show current environment (local or remote)
 browse env local                         # Switch to local Chrome
 browse env remote                        # Switch to Browserbase (requires API keys)
 browse pages                             # List all open tabs
 browse tab_switch <index>                # Switch to tab by index
 browse tab_close [index]                 # Close tab
 ```
 ### Typical workflow
 1. `browse open <url>` — navigate to the page
 2. `browse snapshot` — read the accessibility tree to understand page structure and get element refs
 3. `browse click <ref>` / `browse type <text>` / `browse fill <selector> <value>` — interact using refs from snapshot
 4. `browse snapshot` — confirm the action worked
 5. Repeat 3-4 as needed
 6. `browse stop` — close the browser when done
 ## Quick Example
 ```bash
 browse open https://example.com
 browse snapshot                          # see page structure + element refs
 browse click @0-5                        # click element with ref 0-5
 browse get title
 browse stop
 ```
 ## Mode Comparison
 | Feature | Local | Browserbase |
 |---------|-------|-------------|
 | Speed | Faster | Slightly slower |
 | Setup | Chrome required | API key required |
 | Stealth mode | No | Yes (custom Chromium, anti-bot fingerprinting) |
 | CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) |
 | Residential proxies | No | Yes (201 countries, geo-targeting) |
 | Session persistence | No | Yes (cookies/auth persist across sessions) |
 | Best for | Development/simple pages | Protected sites, bot detection, production scraping |
 ## Best Practices
 1. **Always `browse open` first** before interacting
 2. **Use `browse snapshot`** to check page state — it's fast and gives you element refs
 3. **Only screenshot when visual context is needed** (layout checks, images, debugging)
 4. **Use refs from snapshot** to click/interact — e.g., `browse click @0-5`
 5. **`browse stop`** when done to clean up the browser session
 ## Troubleshooting
 - **"No active page"**: Run `browse stop`, then check `browse status`. If it still says running, kill the zombie daemon with `pkill -f "browse.*daemon"`, then retry `browse open`
 - **Chrome not found**: Install Chrome or use `browse env remote`
 - **Action fails**: Run `browse snapshot` to see available elements and their refs
 - **Browserbase fails**: Verify API key and project ID are set
 ## Switching to Remote Mode
 Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.
 Don't switch for simple sites (docs, wikis, public APIs, localhost).
 ```bash
 browse env remote            # switch to Browserbase
 browse env local             # switch back to local Chrome
 ```
 The switch is sticky until you run `browse stop` or switch again.
 For detailed examples, see [EXAMPLES.md](EXAMPLES.md).
 For API reference, see [REFERENCE.md](REFERENCE.md).
--- a/_meta.json
+++ b/_meta.json
@@ -0,0 +1,6 @@
 {
  "ownerId": "kn7f3h94x6dsndkdjph76br4pd803szg",
  "slug": "browse",
  "version": "2.0.2",
  "publishedAt": 1772680539406
 }