--- name: browser description: "使用自然语言通过CLI命令自动化网页浏览器交互。" compatibility: "Requires the browse CLI (`npm install -g @browserbasehq/browse-cli`). Optional: set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID for remote Browserbase sessions; falls back to local Chrome otherwise." license: MIT allowed-tools: Bash metadata: openclaw: requires: bins: - browse install: - kind: node package: "@browserbasehq/browse-cli" bins: [browse] homepage: https://github.com/browserbase/skills --- # Browser Automation Automate browser interactions using the browse CLI with Claude. ## Setup check Before running any browser commands, verify the CLI is available: ```bash which browse || npm install -g @browserbasehq/browse-cli ``` ## Environment Selection (Local vs Remote) The CLI automatically selects between local and remote browser environments based on available configuration: ### Local mode (default) - Uses local Chrome — no API keys needed - Best for: development, simple pages, trusted sites with no bot protection ### Remote mode (Browserbase) - Activated when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set - Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence - **Use remote mode when:** the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access - Get credentials at https://browserbase.com/settings ### When to choose which - **Simple browsing** (docs, wikis, public APIs): local mode is fine - **Protected sites** (login walls, CAPTCHAs, anti-scraping): use remote mode - **If local mode fails** with bot detection or access denied: switch to remote mode ## Commands All commands work identically in both modes. The daemon auto-starts on first command. ### Navigation ```bash browse open # Go to URL (aliases: goto) browse reload # Reload current page browse back # Go back in history browse forward # Go forward in history ``` ### Page state (prefer snapshot over screenshot) ```bash browse snapshot # Get accessibility tree with element refs (fast, structured) browse screenshot [path] # Take visual screenshot (slow, uses vision tokens) browse get url # Get current URL browse get title # Get page title browse get text # Get text content (use "body" for all text) browse get html # Get HTML content of element browse get value # Get form field value ``` Use `browse snapshot` as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use `browse screenshot` when you need visual context (layout, images, debugging). ### Interaction ```bash browse click # Click element by ref from snapshot (e.g., @0-5) browse type # Type text into focused element browse fill # Fill input and press Enter browse select # Select dropdown option(s) browse press # Press key (Enter, Tab, Escape, Cmd+A, etc.) browse drag # Drag from one point to another browse scroll # Scroll at coordinates browse highlight # Highlight element on page browse is visible # Check if element is visible browse is checked # Check if element is checked browse wait [arg] # Wait for: load, selector, timeout ``` ### Session management ```bash browse stop # Stop the browser daemon browse status # Check daemon status (includes env) browse env # Show current environment (local or remote) browse env local # Switch to local Chrome browse env remote # Switch to Browserbase (requires API keys) browse pages # List all open tabs browse tab_switch # Switch to tab by index browse tab_close [index] # Close tab ``` ### Typical workflow 1. `browse open ` — navigate to the page 2. `browse snapshot` — read the accessibility tree to understand page structure and get element refs 3. `browse click ` / `browse type ` / `browse fill ` — interact using refs from snapshot 4. `browse snapshot` — confirm the action worked 5. Repeat 3-4 as needed 6. `browse stop` — close the browser when done ## Quick Example ```bash browse open https://example.com browse snapshot # see page structure + element refs browse click @0-5 # click element with ref 0-5 browse get title browse stop ``` ## Mode Comparison | Feature | Local | Browserbase | |---------|-------|-------------| | Speed | Faster | Slightly slower | | Setup | Chrome required | API key required | | Stealth mode | No | Yes (custom Chromium, anti-bot fingerprinting) | | CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) | | Residential proxies | No | Yes (201 countries, geo-targeting) | | Session persistence | No | Yes (cookies/auth persist across sessions) | | Best for | Development/simple pages | Protected sites, bot detection, production scraping | ## Best Practices 1. **Always `browse open` first** before interacting 2. **Use `browse snapshot`** to check page state — it's fast and gives you element refs 3. **Only screenshot when visual context is needed** (layout checks, images, debugging) 4. **Use refs from snapshot** to click/interact — e.g., `browse click @0-5` 5. **`browse stop`** when done to clean up the browser session ## Troubleshooting - **"No active page"**: Run `browse stop`, then check `browse status`. If it still says running, kill the zombie daemon with `pkill -f "browse.*daemon"`, then retry `browse open` - **Chrome not found**: Install Chrome or use `browse env remote` - **Action fails**: Run `browse snapshot` to see available elements and their refs - **Browserbase fails**: Verify API key and project ID are set ## Switching to Remote Mode Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it. Don't switch for simple sites (docs, wikis, public APIs, localhost). ```bash browse env remote # switch to Browserbase browse env local # switch back to local Chrome ``` The switch is sticky until you run `browse stop` or switch again. For detailed examples, see [EXAMPLES.md](EXAMPLES.md). For API reference, see [REFERENCE.md](REFERENCE.md).