skills/pkiv_browse

Fork 0

Files

zlei9 48cba2e574 Initial commit with translated description

2026-03-29 13:17:20 +08:00

11 KiB

Raw Blame History

Browser Automation CLI Reference

Technical reference for the browse CLI tool.

Architecture
Command Reference
Configuration
- Global Flags
- Environment Variables
Error Messages

Architecture

The browse CLI is a daemon-based command-line tool:

Daemon process: A background process manages the browser instance. Auto-starts on the first command (e.g., browse open), persists across commands, and stops with browse stop.
Local mode (default): Launches a local Chrome/Chromium instance.
Remote mode (Browserbase): Connects to a Browserbase cloud browser session when BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID are set.
Accessibility-first: Use browse snapshot to get the page's accessibility tree with element refs, then interact using those refs.

Command Reference

`open <url>`

Navigate to a URL. Alias: goto. Auto-starts the daemon if not running.

browse open https://example.com
browse open https://example.com --wait networkidle   # wait for all network requests to finish (useful for SPAs)
browse open https://example.com --wait domcontentloaded

The --wait flag controls when navigation is considered complete. Values: load (default), domcontentloaded, networkidle. Use networkidle for JavaScript-heavy pages that fetch data after initial load.

`reload`

Reload the current page.

browse reload

`back` / `forward`

Navigate browser history.

browse back
browse forward

Page State

`snapshot`

Get the accessibility tree with interactive element refs. This is the primary way to understand page structure.

browse snapshot
browse snapshot --compact                # tree only, no ref maps

Returns a text representation of the page with refs like @0-5 that can be passed to click. Use --compact for shorter output when you only need the tree.

`screenshot [path]`

Take a visual screenshot. Slower than snapshot and uses vision tokens.

browse screenshot                        # auto-generated path
browse screenshot ./capture.png          # custom path
browse screenshot --full-page            # capture entire scrollable page

`get <property> [selector]`

Get page properties. Available properties: url, title, text, html, value, box, visible, checked.

browse get url                           # current URL
browse get title                         # page title
browse get text "body"                   # all visible text (selector required)
browse get text ".product-info"          # text within a CSS selector
browse get html "#main"                  # inner HTML of an element
browse get value "#email-input"          # value of a form field
browse get box "#header"                 # bounding box (centroid coordinates)
browse get visible ".modal"              # check if element is visible
browse get checked "#agree"              # check if checkbox/radio is checked

Note: get text requires a CSS selector argument — use "body" for full page text.

`refs`

Show the cached ref map from the last browse snapshot. Useful for looking up element refs without re-running a full snapshot.

browse refs

Interaction

`click <ref>`

Click an element by its ref from browse snapshot output.

browse click @0-5                        # click element with ref 0-5

`click_xy <x> <y>`

Click at exact viewport coordinates.

browse click_xy 500 300

`hover <x> <y>`

Hover at viewport coordinates.

browse hover 500 300

`type <text>`

Type text into the currently focused element.

browse type "Hello, world!"
browse type "slow typing" --delay 100    # 100ms between keystrokes
browse type "human-like" --mistakes      # simulate human typing with typos

`fill <selector> <value>`

Fill an input element matching a CSS selector and press Enter.

browse fill "#search" "browser automation"
browse fill "input[name=email]" "user@example.com"
browse fill "#search" "query" --no-press-enter   # fill without pressing Enter

`select <selector> <values...>`

Select option(s) from a dropdown.

browse select "#country" "United States"
browse select "#tags" "javascript" "typescript"    # multi-select

`press <key>`

Press a keyboard key or key combination.

browse press Enter
browse press Tab
browse press Escape
browse press Cmd+A                       # select all (Mac)
browse press Ctrl+C                      # copy (Linux/Windows)

`scroll <x> <y> <deltaX> <deltaY>`

Scroll at a given position by a given amount.

browse scroll 500 300 0 -300             # scroll up at (500, 300)
browse scroll 500 300 0 500              # scroll down

`drag <fromX> <fromY> <toX> <toY>`

Drag from one viewport coordinate to another.

browse drag 80 80 310 100                # drag with default 10 steps
browse drag 80 80 310 100 --steps 20     # more intermediate steps
browse drag 80 80 310 100 --delay 50     # 50ms between steps
browse drag 80 80 310 100 --button right # use right mouse button
browse drag 80 80 310 100 --xpath        # return source/target XPaths

`highlight <selector>`

Highlight an element on the page for visual debugging.

browse highlight "#submit-btn"           # highlight for 2 seconds (default)
browse highlight ".nav" -d 5000          # highlight for 5 seconds

`is <check> <selector>`

Check element state. Available checks: visible, checked.

browse is visible ".modal"               # returns { visible: true/false }
browse is checked "#agree"               # returns { checked: true/false }

`wait <type> [arg]`

Wait for a condition.

browse wait load                         # wait for page load
browse wait "selector" ".results"        # wait for element to appear
browse wait timeout 3000                 # wait 3 seconds

Session Management

`start`

Start the browser daemon manually. Usually not needed — the daemon auto-starts on first command.

browse start

`stop`

Stop the browser daemon and close the browser.

browse stop
browse stop --force                      # force kill if daemon is unresponsive

`status`

Check whether the daemon is running, its connection details, and current environment.

browse status

`env [local|remote]`

Show or switch the browser environment. Without arguments, prints the current environment. With an argument, stops the running daemon and restarts in the specified environment. The switch is sticky — subsequent commands stay in the chosen environment until you switch again or run browse stop.

browse env                               # print current environment
browse env local                         # switch to local Chrome
browse env remote                        # switch to Browserbase (requires API keys)

`newpage [url]`

Create a new tab, optionally navigating to a URL.

browse newpage                           # open blank tab
browse newpage https://example.com       # open tab with URL

`pages`

List all open tabs.

browse pages

`tab_switch <index>`

Switch to a tab by its index (from browse pages).

browse tab_switch 1

`tab_close [index]`

Close a tab. Closes current tab if no index given.

browse tab_close          # close current tab
browse tab_close 2        # close tab at index 2

JavaScript Evaluation

`eval <expression>`

Evaluate JavaScript in the page context.

browse eval "document.title"
browse eval "document.querySelectorAll('a').length"

Viewport

`viewport <width> <height>`

Set the browser viewport size.

browse viewport 1920 1080

Network Capture

Capture network requests to the filesystem for inspection.

`network on`

Enable network request capture. Creates a temp directory where requests and responses are saved as JSON files.

browse network on

`network off`

Disable network capture.

browse network off

`network path`

Show the capture directory path.

browse network path

`network clear`

Clear all captured requests.

browse network clear

Configuration

Global Flags

`--json`

Output as JSON for all commands. Useful for structured, parseable output.

browse --json get url                    # returns {"url": "https://..."}
browse --json snapshot                   # returns JSON accessibility tree

`--session <name>`

Run commands against a named session, enabling multiple concurrent browsers.

browse --session work open https://a.com
browse --session personal open https://b.com

Environment Variables

Variable	Required	Description
`BROWSERBASE_API_KEY`	For remote mode	API key from https://browserbase.com/settings
`BROWSERBASE_PROJECT_ID`	For remote mode	Project ID from Browserbase dashboard

When both are set, the CLI uses Browserbase remote sessions. Otherwise, it falls back to local Chrome.

Setting credentials

export BROWSERBASE_API_KEY="bb_live_..."
export BROWSERBASE_PROJECT_ID="proj_..."

Get these values from https://browserbase.com/settings.

Error Messages

"No active page"

The daemon is running but has no page open.
Fix: Run browse open <url>. If the issue persists, run browse stop and retry. For zombie daemons: pkill -f "browse.*daemon".

"Chrome not found" / "Could not find local Chrome installation"

Chrome/Chromium is not installed or not in a standard location.
Fix: Install Chrome, or switch to remote with browse env remote (no local browser needed).

"Daemon not running"

No daemon process is active. Most commands auto-start the daemon, but snapshot, click, etc. require an active session.
Fix: Run browse open <url> to start a session.

Element ref not found (e.g., "@0-5")

The ref from a previous snapshot is no longer valid (page changed).
Fix: Run browse snapshot again to get fresh refs.

Timeout errors

The page took too long to load or an element didn't appear.
Fix: Try browse wait load before interacting, or increase wait time.

11 KiB Raw Blame History

Browser Automation CLI Reference

Table of Contents

Architecture

Command Reference

Navigation

open <url>

reload

back / forward

Page State

snapshot

screenshot [path]

get <property> [selector]

refs

Interaction

click <ref>

click_xy <x> <y>

hover <x> <y>

type <text>

fill <selector> <value>

select <selector> <values...>

press <key>

scroll <x> <y> <deltaX> <deltaY>

drag <fromX> <fromY> <toX> <toY>

highlight <selector>

is <check> <selector>

wait <type> [arg]