# https://github.com/vercel-labs/agent-browser Project Documentation

Generated: 2026-05-15 11:32:46 UTC

## Contents

- [Introduction to Agent Browser](#introduction)
- [Installation Guide](#installation-guide)
- [Element References System](#element-references)
- [Architecture Overview](#architecture-overview)
- [Daemon and CDP Protocol](#daemon-and-cdp)
- [Navigation Commands](#navigation-commands)
- [Interaction Commands](#interaction-commands)
- [State Inspection Commands](#state-inspection-commands)
- [Browser Engine Integration](#browser-engines)
- [Authentication and Session Persistence](#authentication)

<a id='introduction'></a>

## Introduction to Agent Browser

### Related Pages

Related topics: [Installation Guide](#installation-guide), [Architecture Overview](#architecture-overview)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)
- [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)
- [skill-data/core/references/commands.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md)
- [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)
- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [AGENTS.md](https://github.com/vercel-labs/agent-browser/blob/main/AGENTS.md)
- [examples/environments/app/page.tsx](https://github.com/vercel-labs/agent-browser/blob/main/examples/environments/app/page.tsx)
</details>

# Introduction to Agent Browser

Agent Browser is a high-performance, native Rust CLI tool designed for browser automation and AI agent integration. Unlike traditional browser automation frameworks that rely on Node.js wrappers or third-party libraries, Agent Browser communicates directly with Chrome/Chromium via the Chrome DevTools Protocol (CDP), providing a lightweight and reliable solution for web interaction tasks.
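
For orientation, CDP is a JSON message protocol: each command sent over the DevTools WebSocket carries a numeric `id`, a `method` in `Domain.command` form such as `Page.navigate`, and a `params` object, and the browser's reply echoes the same `id`. The following is a minimal sketch of building one such frame in Rust. The helper name `cdp_navigate` is hypothetical, and this is not agent-browser's own serialization code, which would normally rely on a JSON library.

```rust
/// Builds a CDP `Page.navigate` command frame as a JSON string.
/// Hypothetical helper for illustration: only `\` and `"` are escaped,
/// so a real client should use a proper JSON serializer instead.
fn cdp_navigate(id: u64, url: &str) -> String {
    // Escape backslashes first, then quotes, so escapes are not doubled.
    let escaped = url.replace('\\', "\\\\").replace('"', "\\\"");
    // `{{` and `}}` produce literal braces inside `format!`.
    format!(r#"{{"id":{id},"method":"Page.navigate","params":{{"url":"{escaped}"}}}}"#)
}
```

The browser's response to this frame would carry `"id":1`, which is how a client matches replies to outstanding requests.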

## Overview

Agent Browser serves as a bridge between AI agents and web browsers, enabling autonomous web navigation, interaction, and data extraction. It is compatible with a wide range of AI agent platforms including Cursor, Claude Code, Codex, Continue, and Windsurf.

| Aspect | Description |
|--------|-------------|
| **Language** | Rust (native CLI) |
| **Protocol** | Chrome DevTools Protocol (CDP) |
| **Dependencies** | No Playwright or Puppeteer dependency |
| **Browser** | Chrome/Chromium (Lightpanda selectable via `--engine`) |
| **License** | See repository LICENSE |

**Sources:** [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)

## Architecture

Agent Browser follows a modular architecture with distinct layers for CLI handling, native browser control, and extensible skills.

```mermaid
graph TD
    A[User / AI Agent] --> B[CLI Layer<br/>Rust Commands]
    B --> C[Native Actions Layer<br/>CDP Dispatcher]
    C --> D[Chrome/Chromium<br/>via CDP]
    
    E[Skills System] --> B
    E --> F[Core Skills]
    E --> G[Specialized Skills]
    
    G --> G1[Electron Apps]
    G --> G2[Slack Workspace]
    G --> G3[Exploratory Testing]
    G --> G4[Cloud Providers]
    
    H[Session Management] --> C
    H --> H1[Auth Vault]
    H --> H2[State Persistence]
    H --> H3[Video Recording]
```

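**Sources:** [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md), [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)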

## Core Concepts

### Accessibility-Tree Snapshots

Agent Browser generates accessibility-tree snapshots that provide structured, human-readable representations of web pages. Each interactive element receives a unique reference ID (e.g., `@e1`, `@e2`) that can be used for subsequent interactions.

Example snapshot output:
```
Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"
```

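**Sources:** [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md), [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)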
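
To make the numbering concrete, here is a minimal sketch of how refs could be assigned: a depth-first walk over a simplified tree that numbers nodes `@e1`, `@e2`, ... in visit order and indents children two spaces per level. The `AxNode` type and the `render`/`snapshot` functions are hypothetical; the real implementation reads Chrome's accessibility tree over CDP and applies filters such as `-i`, which this sketch does not model.

```rust
/// Hypothetical, simplified accessibility node.
struct AxNode {
    role: &'static str,
    name: &'static str,
    children: Vec<AxNode>,
}

/// Depth-first walk: numbers each node in visit order and indents
/// children two spaces per nesting level.
fn render(node: &AxNode, depth: usize, next_id: &mut usize, out: &mut Vec<String>) {
    *next_id += 1;
    let mut line = format!("{}@e{} [{}]", "  ".repeat(depth), *next_id, node.role);
    // Nodes without an accessible name (e.g. a form) get no quoted text.
    if !node.name.is_empty() {
        line.push_str(&format!(" \"{}\"", node.name));
    }
    out.push(line);
    for child in &node.children {
        render(child, depth + 1, next_id, out);
    }
}

/// Renders a list of top-level nodes into snapshot lines.
fn snapshot(roots: &[AxNode]) -> Vec<String> {
    let mut out = Vec::new();
    let mut next_id = 0;
    for root in roots {
        render(root, 0, &mut next_id, &mut out);
    }
    out
}
```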

### Element Reference Notation

Element references follow a consistent notation pattern:

```
@e1 [tag attribute="value"] "text content" placeholder="hint"
```

| Component | Description |
|-----------|-------------|
| `@e1` | Unique reference ID |
| `tag` | Element tag or accessibility role (e.g. `input`, `heading`, `link`) |
| `attribute="value"` | Key attributes |
| `"text content"` | Visible text |
| `placeholder="hint"` | Additional attributes |

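**Sources:** [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)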
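
Because the notation is regular, a script can split a snapshot line back into its parts. A minimal sketch, assuming attribute values contain no spaces or `]` characters and ignoring attributes that appear after the closing bracket; `RefLine` and `parse_ref_line` are hypothetical names, not part of agent-browser.

```rust
/// One parsed snapshot line (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct RefLine {
    id: String,                   // ref without the leading '@', e.g. "e3"
    tag: String,                  // first word inside the brackets
    attrs: Vec<(String, String)>, // key="value" pairs inside the brackets
    text: Option<String>,         // quoted text right after the brackets
}

/// Parses lines such as `@e5 [button type="submit"] "Continue"`.
/// Returns `None` for lines that are not element refs (e.g. `Page: ...`).
fn parse_ref_line(line: &str) -> Option<RefLine> {
    let rest = line.trim_start().strip_prefix('@')?;
    let (id, rest) = rest.split_once(' ')?;
    let (inside, after) = rest.strip_prefix('[')?.split_once(']')?;

    let mut parts = inside.split_whitespace();
    let tag = parts.next()?.to_string();
    let attrs = parts
        .filter_map(|p| {
            let (key, value) = p.split_once('=')?;
            Some((key.to_string(), value.trim_matches('"').to_string()))
        })
        .collect();

    // Only a quoted string immediately after the brackets counts as text.
    let text = after
        .trim_start()
        .strip_prefix('"')
        .and_then(|s| s.split_once('"'))
        .map(|(t, _)| t.to_string());

    Some(RefLine { id: id.to_string(), tag, attrs, text })
}
```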

## Command Reference

### Navigation Commands

| Command | Description |
|---------|-------------|
| `agent-browser open [url]` | Launch browser with optional navigation |
| `agent-browser back` | Navigate backward |
| `agent-browser forward` | Navigate forward |
| `agent-browser reload` | Reload current page |
| `agent-browser close` | Close browser |
| `agent-browser connect <port>` | Connect to existing browser via CDP |

**Sources:** [skill-data/core/references/commands.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md)

### Interaction Commands

| Command | Description |
|---------|-------------|
| `agent-browser click <ref>` | Click an element |
| `agent-browser fill <ref> <text>` | Type text into input |
| `agent-browser select <ref> <value>` | Select dropdown option |
| `agent-browser check <ref>` | Check a checkbox |
| `agent-browser scroll <direction> <pixels>` | Scroll page |

**Sources:** [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)

### Data Retrieval Commands

| Command | Description |
|---------|-------------|
| `agent-browser snapshot [-i]` | Get page snapshot (interactive only with `-i`) |
| `agent-browser screenshot [path]` | Capture screenshot |
| `agent-browser get text <ref>` | Get visible text |
| `agent-browser get attr <ref> <name>` | Get attribute value |
| `agent-browser get url` | Get current URL |
| `agent-browser get title` | Get page title |

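**Sources:** [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs), [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)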

### Network Control Commands

| Command | Description |
|---------|-------------|
| `agent-browser network route <url>` | Intercept network request |
| `agent-browser network unroute <url>` | Remove interception |
| `agent-browser network requests [--clear]` | View/clear network requests |
| `agent-browser network har <start\|stop> [path]` | Capture HAR file |

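**Sources:** [skill-data/core/references/commands.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md), [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)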

### Cookie and Storage Management

```bash
agent-browser cookies get           # View all cookies
agent-browser cookies set --url <url> --name <name> --value <val>
agent-browser cookies clear         # Clear all cookies
agent-browser storage local         # Manage localStorage
agent-browser storage session       # Manage sessionStorage
```

**Sources:** [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)

### Browser Settings Commands

| Command | Description |
|---------|-------------|
| `agent-browser set viewport <w> <h>` | Set viewport size |
| `agent-browser set device <name>` | Emulate device |
| `agent-browser set geo <lat> <lng>` | Set geolocation |
| `agent-browser set offline on\|off` | Toggle offline mode |
| `agent-browser set headers <json>` | Set custom headers |
| `agent-browser set media dark\|light` | Set color scheme |

**Sources:** [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)

## Sessions and State Management

Agent Browser supports multiple concurrent browser sessions with state persistence.

```mermaid
graph LR
    A[Session A] --> B[State File A]
    C[Session B] --> D[State File B]
    E[Auth Vault] --> A
    E[Auth Vault] --> C
```

**Key Features:**

- **Named Sessions**: `--session <name>` flag for multiple sessions
- **State Persistence**: Save and restore browser state
- **Auth Vault**: Secure credential storage
- **Video Recording**: Capture browser activity

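**Sources:** [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md), [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)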
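
One way to picture named sessions is as separate state containers keyed by the `--session` name, so cookies set in one session are invisible to another. The sketch below is a hypothetical in-memory model: the `Sessions` and `SessionState` types and the `"default"` fallback name are assumptions, not how agent-browser stores state on disk.

```rust
use std::collections::HashMap;

/// Per-session browser state (hypothetical; cookies only).
#[derive(Debug, Default)]
struct SessionState {
    cookies: HashMap<String, String>,
}

/// All sessions, keyed by the name passed to `--session`.
#[derive(Debug, Default)]
struct Sessions {
    by_name: HashMap<String, SessionState>,
}

impl Sessions {
    /// Returns the named session's state, creating it on first use.
    /// `None` stands for "no --session flag" and maps to an assumed "default" name.
    fn get(&mut self, name: Option<&str>) -> &mut SessionState {
        self.by_name
            .entry(name.unwrap_or("default").to_string())
            .or_default()
    }
}
```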

## Skills System

Agent Browser uses an extensible skills system that provides specialized workflows for different environments.

### Core Skills

```bash
agent-browser skills get core             # Core workflows and common patterns
agent-browser skills get core --full      # Include full command reference
```

### Specialized Skills

| Skill | Description | Command |
|-------|-------------|---------|
| **Electron** | Desktop app automation | `agent-browser skills get electron` |
| **Slack** | Workspace automation | `agent-browser skills get slack` |
| **Dogfood** | Exploratory testing/QA | `agent-browser skills get dogfood` |
| **Vercel Sandbox** | Cloud browser in microVMs | `agent-browser skills get vercel-sandbox` |
| **AgentCore** | AWS Bedrock cloud browsers | `agent-browser skills get agentcore` |

**Sources:** [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)

## React Developer Tools Integration

Agent Browser includes built-in React DevTools support for debugging React applications:

| Command | Description |
|---------|-------------|
| `agent-browser react_tree` | View React component tree |
| `agent-browser react_inspect` | Inspect component props/state |
| `agent-browser react_renders_start` | Track render counts |
| `agent-browser react_renders_stop` | Stop render tracking |

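**Sources:** [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs), [cli/src/react/suspense.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/react/suspense.rs)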

### Suspense Boundary Analysis

Agent Browser can analyze React Suspense boundaries with actionability scoring:

| Blocker Kind | Weight | Actionability |
|--------------|--------|---------------|
| ClientHook | 7 | 90% |
| RequestApi | 6 | 88% |
| ServerFetch | 5 | 82% |
| Cache | 4 | 74% |
| Stream | 3 | 60% |
| Unknown | 2 | 35% |
| Framework | 1 | 18% |

**Sources:** [cli/src/react/suspense.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/react/suspense.rs)
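
The table can be read as a ranking: blockers with a higher weight are the ones a developer is most likely able to act on. A minimal sketch that encodes the listed weights and actionability percentages and sorts blockers by weight; the `BlockerKind` enum mirrors the table's labels, but its shape and the `rank` helper are assumptions rather than the code in `suspense.rs`.

```rust
/// Blocker kinds from the table (names mirrored; representation assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockerKind {
    ClientHook,
    RequestApi,
    ServerFetch,
    Cache,
    Stream,
    Unknown,
    Framework,
}

impl BlockerKind {
    /// (weight, actionability in percent), copied from the table.
    fn score(self) -> (u8, u8) {
        match self {
            BlockerKind::ClientHook => (7, 90),
            BlockerKind::RequestApi => (6, 88),
            BlockerKind::ServerFetch => (5, 82),
            BlockerKind::Cache => (4, 74),
            BlockerKind::Stream => (3, 60),
            BlockerKind::Unknown => (2, 35),
            BlockerKind::Framework => (1, 18),
        }
    }
}

/// Orders blockers from most to least actionable (highest weight first).
fn rank(mut blockers: Vec<BlockerKind>) -> Vec<BlockerKind> {
    blockers.sort_by_key(|b| std::cmp::Reverse(b.score().0));
    blockers
}
```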

## Dashboard Interface

Agent Browser includes a web-based dashboard for visual browser management:

```mermaid
graph TD
    A[Dashboard] --> B[Controls Panel]
    A --> C[Result Panel]
    A --> D[Network Panel]
    A --> E[Extensions Panel]
    
    B --> B1[URL Input]
    B --> B2[Mode Selector]
    B --> B3[Action Controls]
    
    C --> C1[Screenshot View]
    C --> C2[Snapshot View]
    C --> C3[Step History]
    
    D --> D1[Request List]
    D --> D2[HAR Export]
    
    E --> E1[Extension List]
    E --> E2[Extension Details]
```

The dashboard is built with React and supports:

- Resizable panels for flexible layouts
- Theme switching (light/dark)
- Mobile-responsive design
- Real-time step history

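**Sources:** [examples/environments/app/page.tsx](https://github.com/vercel-labs/agent-browser/blob/main/examples/environments/app/page.tsx), [packages/dashboard/src/components/network-panel.tsx](https://github.com/vercel-labs/agent-browser/blob/main/packages/dashboard/src/components/network-panel.tsx), [packages/dashboard/src/components/extensions-panel.tsx](https://github.com/vercel-labs/agent-browser/blob/main/packages/dashboard/src/components/extensions-panel.tsx)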

## Best Practices

### 1. Always Snapshot Before Interacting

```bash
# CORRECT - Snapshot first to get refs
agent-browser open https://example.com
agent-browser snapshot -i          # Get refs first
agent-browser click @e1            # Use ref

# WRONG - Ref doesn't exist yet
agent-browser open https://example.com
agent-browser click @e1            # Will fail!
```

### 2. Re-snapshot After Navigation

Element references change when the page navigates. Always take a new snapshot after clicking links or navigating to new pages.
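
The reason is that a ref only has meaning relative to the snapshot that produced it. A minimal sketch of that lifecycle, using a hypothetical `RefTable` (the type, its methods, and the numeric node ids are illustrative assumptions): a snapshot fills the table, navigation empties it, and resolving a stale ref fails until the next snapshot.

```rust
use std::collections::HashMap;

/// Hypothetical ref table: maps `@eN` to an internal node id.
#[derive(Debug, Default)]
struct RefTable {
    refs: HashMap<String, u64>,
}

impl RefTable {
    /// A new snapshot replaces every previous ref.
    fn replace(&mut self, entries: &[(&str, u64)]) {
        self.refs = entries.iter().map(|&(r, id)| (r.to_string(), id)).collect();
    }

    /// Navigation invalidates all refs until the next snapshot.
    fn on_navigate(&mut self) {
        self.refs.clear();
    }

    /// Looks up a ref, failing with a hint when it is unknown or stale.
    fn resolve(&self, r: &str) -> Result<u64, String> {
        self.refs
            .get(r)
            .copied()
            .ok_or_else(|| format!("unknown ref {r}; take a new snapshot"))
    }
}
```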

### 3. Use Sessions for Complex Workflows

```bash
agent-browser --session my-session open https://example.com
agent-browser --session my-session snapshot -i
# ... perform actions ...
agent-browser --session my-session close
```

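**Sources:** [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)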

## Installation and Setup

### Prerequisites

- Chrome or Chromium (alternatively, `agent-browser install` downloads Chrome for Testing)
- Operating system: macOS, Linux, or Windows

### Installation

Refer to the repository's installation instructions for your platform. Agent Browser is distributed as a native binary with no runtime dependencies.

### Configuration Files

| File | Purpose |
|------|---------|
| `~/.agent-browser/` | Default config directory |
| Sessions | Stored in config directory |
| Auth Vault | Encrypted credential storage |

**Sources:** [AGENTS.md](https://github.com/vercel-labs/agent-browser/blob/main/AGENTS.md)

## Summary

Agent Browser provides a powerful, efficient, and AI-agent-friendly approach to browser automation. Its key differentiators include:

- **Native Rust implementation** for high performance
- **Direct CDP communication** without third-party dependencies
- **Accessibility-tree snapshots** for reliable element targeting
- **Session management** for complex multi-step workflows
- **Extensible skills system** for specialized environments
- **Built-in React DevTools** integration for debugging

These features make Agent Browser an ideal choice for AI agents, automated testing pipelines, and developer workflows requiring precise browser control.

---

<a id='installation-guide'></a>

## Installation Guide

### Related Pages

Related topics: [Introduction to Agent Browser](#introduction)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/vercel-labs/agent-browser/blob/main/README.md)
- [AGENTS.md](https://github.com/vercel-labs/agent-browser/blob/main/AGENTS.md)
- [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)
- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [cli/src/flags.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/flags.rs)
</details>

# Installation Guide

## Overview

The agent-browser project is a native Rust CLI tool designed for browser automation, providing AI agents with reliable web interaction capabilities. Unlike traditional browser automation tools that rely on Node.js wrappers, agent-browser delivers a fast, lightweight solution built directly in Rust with Chrome/Chromium support via Chrome DevTools Protocol (CDP). The installation process handles downloading the necessary Chrome browser binaries, setting up platform-specific binaries, and configuring dependencies for the dashboard UI.

Sources: [AGENTS.md](https://github.com/vercel-labs/agent-browser/blob/main/AGENTS.md)

## Prerequisites

### System Requirements

Before installing agent-browser, ensure your system meets the following requirements:

| Requirement | Details |
|-------------|---------|
| **Operating System** | macOS, Linux, or Windows (7 platform binaries built) |
| **Chrome/Chromium** | Required for browser automation functionality |
| **Rust Toolchain** | Required for building from source |
| **Node.js/pnpm** | Required for dashboard development |

The project builds all 7 platform binaries during CI/CD, including native binaries for different architectures. Chrome is downloaded directly from Chrome for Testing during the installation process, eliminating the need for system-installed Chrome browsers.

Sources: [AGENTS.md](https://github.com/vercel-labs/agent-browser/blob/main/AGENTS.md)

### Required Dependencies

| Dependency | Purpose | Installation Method |
|------------|---------|---------------------|
| Chrome/Chromium | Browser automation target | Auto-downloaded via `install` command |
| Cargo/Rust | Building CLI from source | [rustup.rs](https://rustup.rs) |
| pnpm | Dashboard package management | `npm install -g pnpm` |

## Installation Methods

### Method 1: npm Package Installation (Recommended)

The recommended installation method uses the npm registry for cross-platform compatibility:

```bash
npm install -g agent-browser
```

After installation, you must run the setup command to download Chrome binaries:

```bash
agent-browser install
```

Sources: [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)

### Method 2: Building from Source

For development or customization, build the CLI from source:

```bash
# Clone the repository
git clone https://github.com/vercel-labs/agent-browser.git
cd agent-browser

# Install dependencies and build
cd cli && cargo build --release
```

The Rust codebase architecture follows a modular structure:

```mermaid
graph TD
    A["cli/src/native/"] --> B["daemon/"]
    A --> C["actions/"]
    A --> D["browser/"]
    A --> E["CDP client/"]
    A --> F["snapshot/"]
    A --> G["state/"]
```

The `--engine` flag allows selecting between Chrome and Lightpanda browser engines, providing flexibility in automation scenarios.

Sources: [AGENTS.md](https://github.com/vercel-labs/agent-browser/blob/main/AGENTS.md)

### Method 3: Docker Installation

For containerized environments, Docker builds are supported:

```bash
# Build from the project's Dockerfile
docker build -t agent-browser -f docker/Dockerfile.build .
```

Docker installation is particularly useful for CI/CD pipelines and reproducible automation environments where system dependencies need to be isolated.

## Post-Installation Setup

### Chrome Binary Download

After installing the CLI package, you must download the Chrome binary:

```bash
agent-browser install
```

This command retrieves Chrome directly from Chrome for Testing, ensuring a compatible and up-to-date browser binary is available for all automation tasks. The `--download-path` flag can specify a custom location:

```bash
agent-browser --download-path /custom/path install
```

Sources: [cli/src/flags.rs:45-49](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/flags.rs)

### Verifying Installation

Verify the installation by checking the version and available commands:

```bash
agent-browser --version
agent-browser --help
```

The CLI provides comprehensive command documentation through the help system:

| Command | Description |
|---------|-------------|
| `agent-browser open <url>` | Open a URL in the browser |
| `agent-browser snapshot` | Capture accessibility tree with element refs |
| `agent-browser click @<ref>` | Click element by reference |
| `agent-browser skills get <name>` | Retrieve skill documentation |
| `agent-browser install` | Download Chrome binaries |

Sources: [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)

## Skill Documentation Loading

Agent-browser uses a skill-based documentation system that loads content dynamically based on the installed version:

```bash
# Load core workflows and common patterns
agent-browser skills get core

# Include full command reference and templates
agent-browser skills get core --full

# List all available skills
agent-browser skills list
```

Available specialized skills:

| Skill | Purpose |
|-------|---------|
| `electron` | Electron desktop apps (VS Code, Slack, Discord, Figma) |
| `slack` | Slack workspace automation |
| `dogfood` | Exploratory testing and QA |
| `vercel-sandbox` | Agent-browser inside Vercel Sandbox microVMs |
| `agentcore` | AWS Bedrock AgentCore cloud browsers |

Sources: [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)

## Platform-Specific Considerations

### macOS

On macOS, if Gatekeeper blocks the binary as coming from an unidentified developer, allow it in System Settings > Privacy & Security (System Preferences > Security & Privacy on older macOS versions), or remove the quarantine attribute:

```bash
xattr -d com.apple.quarantine /path/to/agent-browser
```

### Linux

On Linux, Chrome needs shared libraries such as GTK 3 and NSS. Install them with your package manager:

```bash
# Debian/Ubuntu
sudo apt-get install libgtk-3-0 libnss3

# Fedora
sudo dnf install gtk3 nss
```

### Windows

Windows installations configure the required runtime dependencies automatically. If you run agent-browser inside Windows Subsystem for Linux (WSL), treat it as a Linux environment and install the Linux dependencies listed above.

## Running Tests

After installation, verify the setup by running the test suite:

```bash
# Unit tests (fast, no Chrome required)
cd cli && cargo test

# End-to-end tests (requires Chrome installed)
cd cli && cargo test e2e -- --ignored --test-threads=1
```

The project contains approximately 320 unit tests and 18 e2e tests. E2E tests launch real headless Chrome instances and must run serially to avoid instance contention.

Sources: [AGENTS.md](https://github.com/vercel-labs/agent-browser/blob/main/AGENTS.md)

## Troubleshooting

### Chrome Download Failures

If the `install` command fails to download Chrome:

1. Check network connectivity to the Chrome for Testing download servers
2. Verify write permissions to the download directory
3. Use `--download-path` to specify an alternative location with proper permissions

### Permission Denied Errors

Ensure the agent-browser binary has execute permissions:

```bash
chmod +x /path/to/agent-browser
```

### Engine Selection

If Chrome automation fails, try specifying the engine explicitly:

```bash
agent-browser --engine chrome open https://example.com
```

The `--engine` flag supports Chrome (default) and Lightpanda engines for different automation scenarios.
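
A minimal sketch of how an `--engine` value might be parsed, assuming the accepted values are `chrome` (the default) and `lightpanda`. The `Engine` enum and `resolve_engine` helper are hypothetical and are not the CLI's actual flag parser.

```rust
use std::str::FromStr;

/// Browser engines named in this guide (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Engine {
    #[default]
    Chrome,
    Lightpanda,
}

impl FromStr for Engine {
    type Err = String;

    /// Case-insensitive match on the engine name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "chrome" => Ok(Engine::Chrome),
            "lightpanda" => Ok(Engine::Lightpanda),
            other => Err(format!("unsupported engine: {other}")),
        }
    }
}

/// An explicit `--engine` value wins; otherwise fall back to Chrome.
fn resolve_engine(flag: Option<&str>) -> Result<Engine, String> {
    flag.map_or(Ok(Engine::default()), |s| s.parse())
}
```

Rejecting unknown names with an error, rather than silently falling back to Chrome, makes a typo in the flag visible immediately.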

## Next Steps

After successful installation:

1. Load core skill documentation: `agent-browser skills get core --full`
2. Open a test URL: `agent-browser open https://example.com`
3. Capture a snapshot: `agent-browser snapshot -i`
4. Explore specialized skills for your use case

Sources: [skills/agent-browser/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md)

---

<a id='element-references'></a>

## Element References System

### Related Pages

Related topics: [State Inspection Commands](#state-inspection-commands), [Interaction Commands](#interaction-commands)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [cli/src/commands.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/commands.rs)
- [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)
- [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)
</details>

# Element References System

The Element References System is a core mechanism in agent-browser that provides short, human-readable identifiers for page elements during browser automation. Instead of relying on fragile CSS selectors or XPath expressions, each snapshot assigns reference IDs (such as `@e1`, `@e2`) that subsequent commands can use directly. Refs are tied to the snapshot that produced them: after navigation or a significant DOM change, take a new snapshot to get current refs.

## Overview

Element references serve as the primary interface between automation scripts and the browser's accessibility tree. When a snapshot is taken, each interactive element receives a reference ID that can be used in commands like `click`, `fill`, `type`, and `get` without requiring re-selection.

```mermaid
graph TD
    A[Browser Page] --> B[snapshot Command]
    B --> C[Accessibility Tree Traversal]
    C --> D[Element Identification]
    D --> E[Reference Assignment]
    E --> F[@e1 @e2 @e3 ...]
    F --> G[Automation Commands]
    G --> H[click @e1]
    G --> I[fill @e2]
    G --> J[get text @e3]
```

## Reference Notation Format

Element references follow a standardized notation format that encodes element metadata:

```
@e1 [tag type="value"] "text content" placeholder="hint"
│    │   │             │               │
│    │   │             │               └─ Additional attributes
│    │   │             └─ Visible text
│    │   └─ Key attributes shown
│    └─ HTML tag name
└─ Unique ref ID
```

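Sources: [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)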

### Reference Components

| Component | Description | Example |
|-----------|-------------|---------|
| `@eN` | Unique reference identifier | `@e1`, `@e42` |
| Tag | HTML element type | `button`, `input`, `link` |
| Type attribute | Element type classification | `type="email"`, `type="password"` |
| Text content | Visible text on element | `"Submit"`, `"Log in"` |
| Placeholder | Input placeholder text | `placeholder="Email"` |

## Common Reference Patterns

The snapshot system recognizes common element patterns and standardizes their reference notation:

```bash
@e1 [button] "Submit"                    # Button with text
@e2 [input type="email"]                 # Email input
@e3 [input type="password"]              # Password input
@e4 [a href="/page"] "Link Text"         # Anchor link
@e5 [select]                             # Dropdown
@e6 [textarea] placeholder="Message"     # Text area
@e7 [div class="modal"]                  # Container element
@e8 [img alt="Logo"]                     # Image with alt text
@e9 [checkbox] checked                   # Checked checkbox
@e10 [radio] selected                    # Selected radio button
```

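Sources: [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)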

## Snapshot Command Options

The `snapshot` command generates element references with various filtering and formatting options:

```bash
agent-browser snapshot                    # Full tree (verbose)
agent-browser snapshot -i                 # Interactive elements only (preferred)
agent-browser snapshot -i -u              # Include href URLs on links
agent-browser snapshot -i -c              # Compact mode (no empty structural nodes)
agent-browser snapshot -i -d 3            # Cap depth at 3 levels
agent-browser snapshot -s "#main"         # Scope to a CSS selector
agent-browser snapshot -i --json          # Machine-readable output
```

Sources: [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)

### Option Reference

| Option | Purpose | Use Case |
|--------|---------|----------|
| `-i` | Interactive elements only | Preferred for automation |
| `-u` | Include href URLs | When link destinations matter |
| `-c` | Compact output | Complex pages with many empty nodes |
| `-d N` | Depth limit | Focus on specific page sections |
| `-s SELECTOR` | CSS scope | Target specific page regions |
| `--json` | JSON format | Programmatic processing |
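
When driving the CLI from a Rust program, these flags can be assembled from a typed options struct. A minimal sketch: `SnapshotOpts` and `snapshot_args` are hypothetical helpers, and the argument order they produce is an assumption. The resulting vector could be passed to `std::process::Command::new("agent-browser").args(...)`.

```rust
/// Typed view of the `snapshot` options listed above (hypothetical helper).
#[derive(Debug, Default)]
struct SnapshotOpts<'a> {
    interactive: bool,      // -i
    urls: bool,             // -u
    compact: bool,          // -c
    depth: Option<u32>,     // -d N
    scope: Option<&'a str>, // -s SELECTOR
    json: bool,             // --json
}

/// Builds the argument list for `agent-browser snapshot ...`.
fn snapshot_args(opts: &SnapshotOpts) -> Vec<String> {
    let mut args = vec!["snapshot".to_string()];
    // Boolean switches are emitted only when enabled.
    for (enabled, flag) in [(opts.interactive, "-i"), (opts.urls, "-u"), (opts.compact, "-c")] {
        if enabled {
            args.push(flag.to_string());
        }
    }
    if let Some(depth) = opts.depth {
        args.extend(["-d".to_string(), depth.to_string()]);
    }
    if let Some(selector) = opts.scope {
        args.extend(["-s".to_string(), selector.to_string()]);
    }
    if opts.json {
        args.push("--json".to_string());
    }
    args
}
```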

## Element Reference Commands

Element references are used with various commands to interact with page elements:

### Direct Element Commands

```bash
agent-browser click @e1                   # Click element
agent-browser click @e1 --new-tab          # Click and open in new tab
agent-browser fill @e2 "text"             # Fill input field
agent-browser type @e2 "text"             # Type character by character
agent-browser press Enter                 # Press key on focused element
```

### State Inspection Commands

```bash
agent-browser get text @e1                # Get visible text
agent-browser get html @e1                # Get innerHTML
agent-browser get attr @e1 href           # Get specific attribute
agent-browser get value @e1               # Get input value
agent-browser get title                   # Get page title
agent-browser get url                     # Get current URL
agent-browser get count ".item"           # Count matching elements
```

### State Checking Commands

The `is` command verifies element states:

```bash
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
```

Sources: [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)

## Find Command and Locators

The `find` command provides an alternative to snapshot-based reference acquisition by locating elements using various criteria:

```bash
agent-browser find <locator> <value> <action> [text]
```

### Supported Locators

| Locator | Description | Example |
|---------|-------------|---------|
| `role` | ARIA role selector | `find role button click` |
| `text` | Text content match | `find text "Submit" click` |
| `label` | Label text association | `find label "Email" fill` |
| `placeholder` | Placeholder attribute | `find placeholder "Search"` |
| `alt` | Alt text (images) | `find alt "Logo" click` |
| `title` | Title attribute | `find title "Help" click` |
| `testid` | Test identifier | `find testid "submit-btn" click` |
| `first` | First matching selector | `find first button click` |
| `last` | Last matching selector | `find last link click` |
| `nth` | Nth matching element | `find nth 5 button click` |

Sources: [cli/src/commands.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/commands.rs)

### Find Command Options

| Option | Purpose |
|--------|---------|
| `--exact` | Perform exact string matching |
| `--name <name>` | Filter by accessible name (role locator) |
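
The `find` syntax is positional, so a small builder helps keep the argument order straight. A minimal sketch covering a few of the locators above; the `Locator` enum and `find_args` helper are hypothetical, and appending `--exact` at the end is an assumption about flag placement.

```rust
/// A subset of the `find` locators (hypothetical Rust representation).
enum Locator<'a> {
    Role(&'a str),
    Text(&'a str),
    Label(&'a str),
    Placeholder(&'a str),
    TestId(&'a str),
}

/// Builds `find <locator> <value> <action> [text] [--exact]`.
fn find_args(loc: &Locator, action: &str, text: Option<&str>, exact: bool) -> Vec<String> {
    let (kind, value) = match loc {
        Locator::Role(v) => ("role", *v),
        Locator::Text(v) => ("text", *v),
        Locator::Label(v) => ("label", *v),
        Locator::Placeholder(v) => ("placeholder", *v),
        Locator::TestId(v) => ("testid", *v),
    };
    let mut args: Vec<String> = ["find", kind, value, action]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Optional text argument, e.g. the value for a `fill` action.
    if let Some(t) = text {
        args.push(t.to_string());
    }
    if exact {
        args.push("--exact".to_string());
    }
    args
}
```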

## Action Dispatch System

Element reference commands are dispatched to handlers through the action routing system:

```mermaid
graph LR
    A[Command Input] --> B["dispatch(cmd, state)"]
    B --> C{Match Action}
    C -->|"click, fill, get, is, find"| D[handle_dispatch]
```

In the native daemon's action router, these element commands all map to a shared `handle_dispatch` handler:

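```rust
// Excerpt of match arms from the action router in cli/src/native/actions.rs;
// the enclosing `match` and the remaining arms are omitted.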
"click" => handle_dispatch(cmd, state).await,
"fill" => handle_dispatch(cmd, state).await,
"get" => handle_dispatch(cmd, state).await,
"is" => handle_dispatch(cmd, state).await,
"find" => handle_dispatch(cmd, state).await,
```

Sources: [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)

### Available Element Actions

| Action | Handler | Purpose |
|--------|---------|---------|
| `click` | `handle_dispatch` | Mouse click |
| `fill` | `handle_dispatch` | Fill input with text |
| `type` | `handle_dispatch` | Character-by-character typing |
| `press` | `handle_dispatch` | Keyboard press |
| `hover` | `handle_dispatch` | Mouse hover |
| `select` | `handle_dispatch` | Select dropdown option |
| `check` | `handle_dispatch` | Check checkbox/radio |
| `uncheck` | `handle_dispatch` | Uncheck checkbox |
| `focus` | `handle_dispatch` | Focus element |
| `blur` | `handle_dispatch` | Blur element |
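
The routing pattern itself is a string `match` whose arms funnel element commands into one shared handler. A self-contained, synchronous sketch of that pattern: `Cmd`, `State`, `execute`, and this `handle_dispatch` body are illustrative stand-ins (the real handler is async, drives the browser over CDP, and receives a parsed command), and `press` is left out because it acts on the focused element rather than on a ref.

```rust
/// Hypothetical command shape: an action name plus an optional `@eN` target.
struct Cmd {
    action: String,
    target: Option<String>,
}

/// Hypothetical daemon state; here it only records performed actions.
#[derive(Default)]
struct State {
    log: Vec<String>,
}

/// Stand-in for the shared element handler: requires a ref, then records
/// what it would do instead of talking to a browser.
fn handle_dispatch(cmd: &Cmd, state: &mut State) -> Result<(), String> {
    let target = cmd.target.as_deref().ok_or("missing element ref")?;
    state.log.push(format!("{} {}", cmd.action, target));
    Ok(())
}

/// Routes action names to the shared handler, following the same pattern
/// as the router excerpt; unknown names are rejected.
fn execute(cmd: &Cmd, state: &mut State) -> Result<(), String> {
    match cmd.action.as_str() {
        "click" | "fill" | "type" | "hover" | "select" | "check" | "uncheck" | "focus"
        | "blur" => handle_dispatch(cmd, state),
        other => Err(format!("unknown action: {other}")),
    }
}
```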

## Iframe Support

Element references automatically handle iframe content. When a snapshot is taken, iframe elements are resolved and their child accessibility trees are included inline:

```bash
agent-browser snapshot -i
# Output:
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
#   @e3 [input] "Card number"
#   @e4 [input] "Expiry"
#   @e5 [button] "Pay"
# @e6 [button] "Cancel"
```

References to elements inside iframes carry frame context, allowing direct interactions without manual frame switching:

```bash
agent-browser click @e3                    # Works inside iframe
agent-browser fill @e4 "12/25"
```

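Sources: [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)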
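
One way to get this behavior is for each ref to record the frame its element lives in, so a later command can be routed to that frame without an explicit frame switch. A minimal sketch with a hypothetical `RefTarget` type; the frame ids and per-frame node ids are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Where a ref points: the owning frame plus a node id within that frame.
/// Both fields are hypothetical illustrations.
#[derive(Debug, Clone, PartialEq)]
struct RefTarget {
    frame_id: String,
    node_id: u64,
}

/// Numbers nodes `@e1`, `@e2`, ... in document order across all frames
/// (iframe children already inlined), remembering each node's frame.
fn assign_refs(nodes: &[(&str, u64)]) -> HashMap<String, RefTarget> {
    nodes
        .iter()
        .enumerate()
        .map(|(i, &(frame, node_id))| {
            let target = RefTarget { frame_id: frame.to_string(), node_id };
            (format!("@e{}", i + 1), target)
        })
        .collect()
}
```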

## Best Practices

### Always Snapshot Before Interacting

```bash
# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i          # Get refs first
agent-browser click @e1            # Use ref

# WRONG
agent-browser open https://example.com
agent-browser click @e1            # Ref doesn't exist yet!
```

### Re-Snapshot After Navigation

```bash
agent-browser click @e5            # Navigates to new page
agent-browser snapshot -i          # Get new refs
agent-browser click @e1            # Use new refs
```

### Re-Snapshot After Dynamic Changes

```bash
agent-browser click @e1            # Opens dropdown
agent-browser snapshot -i          # See dropdown items
agent-browser click @e7            # Select item
```

### Snapshot Specific Regions

For complex pages, snapshot specific areas to reduce noise:

```bash
# Snapshot just a form
agent-browser snapshot @e9
```

## Session-Dependent References

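Element references are assigned per snapshot and can differ between sessions, so the same element on the same page may receive a different ref ID each time. The Slack skill's reference, for example, lists typical ref ranges for Slack UI elements and recommends locating them by label: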

| Element | Typical Ref Range | How to Find |
|---------|------------------|-------------|
| Home tab | e10-e20 | `snapshot -i \| grep "Home"` |
| DMs tab | e10-e20 | `snapshot -i \| grep "DMs"` |
| Activity tab | e10-e20 | `snapshot -i \| grep "Activity"` |
| Search | e5-e10 | `snapshot -i \| grep "Search"` |
| More unreads | e20-e30 | `snapshot -i \| grep "More unreads"` |
| Channel refs | e30+ | `snapshot -i \| grep "channel-name"` |

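Sources: [skill-data/slack/references/slack-tasks.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/slack/references/slack-tasks.md)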
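
Because the numbers drift, scripts are better off looking refs up by label than hard-coding them. A minimal sketch of the `snapshot -i | grep` pattern in Rust: `find_ref` is a hypothetical helper that returns the first `@eN` token on the first line containing the label, and it assumes the snapshot text has already been captured (for example from the command's stdout).

```rust
/// Returns the first `@eN` ref on the first snapshot line that contains
/// `label`, or `None` if no line matches.
fn find_ref(snapshot: &str, label: &str) -> Option<String> {
    snapshot
        .lines()
        .filter(|line| line.contains(label))
        .find_map(|line| {
            line.split_whitespace()
                .find(|tok| tok.starts_with("@e"))
                .map(str::to_string)
        })
}
```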

## Architecture Summary

```mermaid
graph TD
    subgraph "CLI Layer"
        A[User Command] --> B[commands.rs Parser]
        B --> C[Command Dispatch]
    end
    
    subgraph "Native Daemon"
        C --> D[actions.rs Router]
        D --> E[State Manager]
        E --> F[CDP Client]
    end
    
    subgraph "Browser Layer"
        F --> G[Chrome DevTools Protocol]
        G --> H[Accessibility Tree]
    end
    
    subgraph "Reference Generation"
        H --> I[Element ID Assignment]
        I --> J[@eN Reference Labels]
    end
    
    J --> K[Snapshot Output]
    K --> L[Automation Commands]
```

The Element References System provides the foundation for reliable browser automation by abstracting DOM complexity behind short, human-readable identifiers. Refs describe the page as it was when the snapshot was taken, so take a fresh snapshot after navigation or any dynamic change before using them.

---

<a id='architecture-overview'></a>

## Architecture Overview

### Related Pages

Related topics: [Daemon and CDP Protocol](#daemon-and-cdp), [Introduction to Agent Browser](#introduction)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [cli/src/native/mod.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/mod.rs)
- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [cli/src/native/stream/websocket.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/stream/websocket.rs)
- [cli/src/native/react/suspense.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/react/suspense.rs)
- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [cli/src/connection.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/connection.rs)
</details>

# Architecture Overview

agent-browser is a Rust-based browser automation framework that provides high-performance browser control through native CDP (Chrome DevTools Protocol) communication. The system is designed for AI agent integration, enabling reliable and observable browser automation.

## System Architecture

The architecture follows a layered approach with clear separation between the CLI interface, daemon process, and browser engine.

```mermaid
graph TB
    subgraph "Client Layer"
        CLI[CLI Interface]
        Dashboard[Web Dashboard]
    end

    subgraph "Daemon Layer"
        WS[WebSocket Server]
        Dispatcher[Action Dispatcher]
        State[State Manager]
    end

    subgraph "CDP Layer"
        CDP[CDP Client]
        Protocol[Protocol Handler]
    end

    subgraph "Browser Engine"
        Chrome[Chrome/Chromium]
        Lightpanda[Lightpanda]
    end

    CLI --> WS
    Dashboard --> WS
    WS --> Dispatcher
    Dispatcher --> CDP
    CDP --> Chrome
    CDP --> Lightpanda
    Dispatcher --> State
```

## Core Components

### Daemon Architecture

The browser automation daemon is the central coordinator that manages browser sessions and handles command dispatching. It runs as a persistent process that maintains browser state across multiple operations.

**Key Responsibilities:**

| Component | Responsibility |
|-----------|----------------|
| WebSocket Server | Accepts client connections with origin validation |
| Action Dispatcher | Routes commands to appropriate handlers |
| State Manager | Maintains session state and snapshots |
| CDP Client | Manages protocol-level communication |

Source: [cli/src/native/mod.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/mod.rs)

### Action Dispatch System

The action system provides a comprehensive set of browser automation commands. Actions are dispatched based on command type and handle specific browser operations.

**Action Categories:**

| Category | Commands |
|----------|----------|
| Navigation | `goto`, `back`, `forward`, `reload`, `waitforurl`, `waitforloadstate` |
| Interaction | `click`, `fill`, `press`, `select`, `check`, `uncheck`, `multiselect` |
| Content | `snapshot`, `innertext`, `innerhtml`, `gettext`, `getattribute` |
| State | `cookies_get`, `cookies_set`, `storage_get`, `storage_set` |
| Network | `route`, `unroute`, `requests`, `har` |
| React Debug | `react_tree`, `react_inspect`, `react_renders_start` |

Source: [cli/src/native/actions.rs:1-50](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)

### CDP Client Layer

The CDP (Chrome DevTools Protocol) client handles low-level communication with the browser engine. This abstraction allows the system to work with different browser engines through a unified interface.

**Supported Engines:**

| Engine | Selection Flag |
|--------|----------------|
| Chrome/Chromium | `--engine chrome` (default) |
| Lightpanda | `--engine lightpanda` |

Source: [cli/src/native/mod.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/mod.rs)

## Communication Protocol

### WebSocket Server

The daemon exposes a WebSocket server for client communication. Security is enforced through origin validation.

```mermaid
graph LR
    Client[Client App] -->|WebSocket| OriginCheck[Origin Check]
    OriginCheck -->|Allowed| Accept[Accept Connection]
    OriginCheck -->|Blocked| Reject[403 Forbidden]
```

**Origin Validation:**

The server validates the `Origin` header on incoming WebSocket requests. Connections from disallowed origins receive a `403 Forbidden` response before any data exchange occurs.

```rust
if !is_allowed_origin(origin.as_deref()) {
    return Err(reject); // Status: FORBIDDEN
}
```

Source: [cli/src/native/stream/websocket.rs:15-30](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/stream/websocket.rs)
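To make the check concrete, here is a minimal sketch of what an `is_allowed_origin` predicate could look like. The policy is an assumption for illustration only: it allows requests without an `Origin` header (for example from a command-line client) plus pages served from `localhost` or `127.0.0.1`. The real rules live in `websocket.rs` and may differ.

```rust
/// Hypothetical origin policy, for illustration only.
fn is_allowed_origin(origin: Option<&str>) -> bool {
    // No Origin header: the request did not come from a browser page.
    let Some(origin) = origin else {
        return true;
    };
    // Strip the scheme; anything that is not http(s) is rejected.
    let Some(rest) = origin
        .strip_prefix("http://")
        .or_else(|| origin.strip_prefix("https://"))
    else {
        return false;
    };
    // Drop an optional ":port" suffix, leaving only the host.
    let host = rest.split(':').next().unwrap_or("");
    // Exact host match, so "localhost.evil.example" is not accepted.
    matches!(host, "localhost" | "127.0.0.1")
}
```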

### Request/Response Flow

All commands follow a request-response pattern:

1. Client sends JSON command via WebSocket
2. Server validates origin
3. Dispatcher routes to appropriate handler
4. Handler executes CDP operation
5. Result returned as JSON response

## State Management

### Session State

The daemon maintains persistent state for each browser session:

| State Component | Description |
|-----------------|-------------|
| Tabs | Active tab list and current tab reference |
| Frame | Current frame hierarchy |
| Viewport | Window dimensions |
| Recording | Video recording status |

Source: [cli/src/native/stream/websocket.rs:5-15](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/stream/websocket.rs)

### Snapshot System

The snapshot system represents the page as an accessibility tree and labels elements with short references (`@e1`, `@e2`, etc.) that commands can target directly. These refs are valid only for the page state captured by that snapshot.

**Best Practice:** Always snapshot before interacting with elements, as refs change after navigation or dynamic content changes.

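Source: [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)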

## React Inspection System

For React-based applications, the daemon provides specialized inspection capabilities:

### Blocker Detection

The system identifies what is blocking each React Suspense boundary and classifies each blocker by kind, with a weight and an actionability score:

| Blocker Kind | Weight | Actionability |
|--------------|--------|---------------|
| ClientHook | 7 | 90 |
| RequestApi | 6 | 88 |
| ServerFetch | 5 | 82 |
| Cache | 4 | 74 |
| Stream | 3 | 60 |
| Unknown | 2 | 35 |
| Framework | 1 | 18 |

### Boundary Classification

| Boundary Kind | Description |
|---------------|-------------|
| RouteSegment | Next.js App Router route segment boundary |
| ExplicitSuspense | User-declared `<Suspense>` component |
| Component | Implicit boundary from component structure |

Source: [cli/src/native/react/suspense.rs:30-60](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/react/suspense.rs)
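The blocker table maps naturally onto a Rust enum. This hypothetical sketch encodes the weight and actionability values listed in the blocker table and orders blockers by weight, on the assumption that a higher weight marks a blocker worth examining first. The type and function names are illustrative, not the ones in `suspense.rs`.

```rust
use std::cmp::Reverse;

/// Blocker kinds from the table (names illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockerKind {
    ClientHook,
    RequestApi,
    ServerFetch,
    Cache,
    Stream,
    Unknown,
    Framework,
}

impl BlockerKind {
    /// Weight column of the table.
    fn weight(self) -> u8 {
        match self {
            BlockerKind::ClientHook => 7,
            BlockerKind::RequestApi => 6,
            BlockerKind::ServerFetch => 5,
            BlockerKind::Cache => 4,
            BlockerKind::Stream => 3,
            BlockerKind::Unknown => 2,
            BlockerKind::Framework => 1,
        }
    }

    /// Actionability column of the table.
    fn actionability(self) -> u8 {
        match self {
            BlockerKind::ClientHook => 90,
            BlockerKind::RequestApi => 88,
            BlockerKind::ServerFetch => 82,
            BlockerKind::Cache => 74,
            BlockerKind::Stream => 60,
            BlockerKind::Unknown => 35,
            BlockerKind::Framework => 18,
        }
    }
}

/// Sort blockers so the highest-weight ones come first.
fn rank_blockers(mut blockers: Vec<BlockerKind>) -> Vec<BlockerKind> {
    blockers.sort_by_key(|b| Reverse(b.weight()));
    blockers
}
```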

## CLI Architecture

The CLI provides both interactive and scripted access to browser automation:

### Command Structure

```
agent-browser <command> [args]
```

**Primary Command Groups:**

| Group | Purpose |
|-------|---------|
| `agent-browser open` | Navigate to URL |
| `agent-browser <action>` | Execute automation action |
| `agent-browser set` | Configure browser settings |
| `agent-browser network` | Manage network interception |
| `agent-browser state` | Save/load/restore sessions |
| `agent-browser tab` | Manage browser tabs |
| `agent-browser screenshot` | Capture page images |
| `agent-browser install` | Download Chrome |

Source: [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)

## Dashboard Architecture

The web-based dashboard provides visual monitoring and control:

```mermaid
graph TD
    Dashboard[Dashboard App] -->|API| Daemon
    Dashboard -->|Display| Results[screenshots/snapshots]
    Dashboard -->|Controls| Form[Control Form]
```

**Dashboard Features:**

- Resizable split view (controls + results)
- Responsive layout for mobile/desktop
- Real-time screenshot display with base64 encoding
- Snapshot viewer with step history
- Step-by-step playback of automation sequences

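Source: [packages/dashboard/src/components/extensions-panel.tsx](https://github.com/vercel-labs/agent-browser/blob/main/packages/dashboard/src/components/extensions-panel.tsx)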

## Installation and Dependencies

### Chrome Installation

The `install` command downloads Chrome directly from Chrome for Testing:

```bash
agent-browser install
```

This ensures the Chrome binary is available for CDP communication without requiring system-wide Chrome installation.

## Testing Architecture

### Unit Tests

Fast unit tests (about 320) that verify individual components without requiring Chrome:

```bash
cd cli && cargo test
```

### End-to-End Tests

Integration tests that launch real headless Chrome:

```bash
cd cli && cargo test e2e -- --ignored --test-threads=1
```

**Requirements:**

- Chrome must be installed
- Tests run serially to avoid browser instance contention

## Security Considerations

| Aspect | Implementation |
|--------|----------------|
| Origin Validation | WebSocket connections validated before acceptance |
| Session Isolation | Each session maintains separate state |
| Credential Storage | Authentication vault for secure credential handling |

## Summary

agent-browser implements a clean three-tier architecture:

1. **Client Layer** - CLI and dashboard provide user interfaces
2. **Daemon Layer** - Rust-based server handles command dispatch and state
3. **CDP Layer** - Engine-agnostic CDP client that supports both Chrome and Lightpanda

The design prioritizes reliability (snapshot-based element refs), observability (snapshots, screenshots, video recording), and extensibility (skill-based system for specialized automation tasks).

---

<a id='daemon-and-cdp'></a>

## Daemon and CDP Protocol

### Related Pages

Related topics: [Architecture Overview](#architecture-overview), [Browser Engine Integration](#browser-engines)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [cli/src/native/cdp/client.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/cdp/client.rs)
- [cli/src/native/cdp/chrome.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/cdp/chrome.rs)
- [cli/src/native/stream/mod.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/stream/mod.rs)
- [cli/src/native/stream/websocket.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/stream/websocket.rs)
- [cli/src/native/stream/cdp_loop.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/stream/cdp_loop.rs)
- [cli/cdp-protocol/browser_protocol.json](https://github.com/vercel-labs/agent-browser/blob/main/cli/cdp-protocol/browser_protocol.json)
- [cli/cdp-protocol/js_protocol.json](https://github.com/vercel-labs/agent-browser/blob/main/cli/cdp-protocol/js_protocol.json)
</details>

# Daemon and CDP Protocol

## Overview

The agent-browser project implements a native Rust-based browser automation daemon that communicates with Chrome/Chromium browsers via the Chrome DevTools Protocol (CDP). The architecture separates the automation logic from browser control through WebSocket-based CDP connections, enabling AI agents to interact with web pages through a CLI interface.

**Architecture Layer Diagram:**

```mermaid
graph TD
    A[CLI Interface] --> B[Action Dispatcher]
    B --> C[CDP Client]
    C --> D[WebSocket Stream]
    D --> E[CDP Loop Handler]
    E --> F[Chrome Browser Instance]
    
    G[CDP Protocol Files] -->|build-time codegen| H[Generated CDP Types]
    H --> C
```

## Daemon Architecture

### Native Daemon Components

The daemon lives in `cli/src/native/` and handles all browser automation tasks. The main components include:

| Component | Location | Purpose |
|-----------|----------|---------|
| Daemon | `cli/src/native/daemon/` | Process management and state coordination |
| Actions | `cli/src/native/actions.rs` | Command handlers for browser operations |
| Browser | `cli/src/native/browser/` | Browser instance lifecycle |
| CDP Client | `cli/src/native/cdp/client.rs` | Protocol communication |
| CDP Loop | `cli/src/native/stream/cdp_loop.rs` | Message processing loop |

Source: [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)

### Action Dispatch

The action handler maps command names to their implementation functions. Supported actions include:

```rust
let result = match action {
    "launch" => handle_launch(cmd, state).await,
    "navigate" => handle_navigate(cmd, state).await,
    "url" => handle_url(state).await,
    "cdp_url" => handle_cdp_url(state),
    "inspect" => handle_inspect(state).await,
    "title" => handle_title(state).await,
    "content" => handle_content(state).await,
    "evaluate" => handle_evaluate(cmd, state).await,
    "close" => handle_close(state).await,
    "snapshot" => handle_snapshot(cmd, state).await,
    "screenshot" => handle_screenshot(cmd, state).await,
    "click" => handle_click(cmd, state).await,
    "dblclick" => handle_dblclick(cmd, state).await,
    "fill" => handle_fill(cmd, state).await,
    "type" => handle_type(cmd, state).await,
    "press" => handle_press(cmd, state).await,
    "hover" => handle_hover(cmd, state).await,
    "scroll" => handle_scroll(cmd, state).await,
    // ... additional actions
};
```

Source: [cli/src/native/actions.rs:50-75](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
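The excerpt elides the final arm. A `match` on `&str` can never be exhaustive, so it needs a catch-all arm. The sketch below shows the shape of such a dispatcher with synchronous, hypothetical stand-in handlers; the real handlers are `async` and receive the command and daemon state.

```rust
/// Hypothetical synchronous stand-ins for the async handlers in actions.rs.
fn handle_navigate(url: &str) -> Result<String, String> {
    Ok(format!("navigated to {url}"))
}

fn handle_click(target: &str) -> Result<String, String> {
    Ok(format!("clicked {target}"))
}

/// Route an action name to its handler.
fn dispatch(action: &str, arg: &str) -> Result<String, String> {
    match action {
        "navigate" => handle_navigate(arg),
        "click" => handle_click(arg),
        // Required for exhaustiveness; reports unknown names instead of ignoring them.
        other => Err(format!("Unknown action: {other}")),
    }
}
```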

### Browser Engine Selection

The `--engine` flag selects between Chrome and Lightpanda browsers. Chrome is downloaded from Chrome for Testing via the `install` command.

## CDP Protocol Implementation

### Protocol Files

The CDP protocol definitions are stored in JSON format:

| File | Description |
|------|-------------|
| `browser_protocol.json` | Core browser domains (Page, Network, Runtime, etc.) |
| `js_protocol.json` | JavaScript debugging domains |

Source: [cli/cdp-protocol/browser_protocol.json](https://github.com/vercel-labs/agent-browser/blob/main/cli/cdp-protocol/browser_protocol.json)

### Auto-Generated Types

CDP types are generated at build time from the protocol JSON files and included from `OUT_DIR`:

```rust
/// Auto-generated CDP types from protocol JSON files in `cdp-protocol/`.
///
/// To populate: download `browser_protocol.json` and `js_protocol.json` from
/// <https://github.com/nicolo-ribaudo/nicolo-ribaudo.github.io/> (or any
/// Chromium source) into `cli/cdp-protocol/` and rebuild.
#[allow(clippy::upper_case_acronyms)]
pub mod generated {
    include!(concat!(env!("OUT_DIR"), "/cdp_generated.rs"));
}
```

Source: [cli/src/native/cdp/types.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/cdp/types.rs)

### CDP Client Structure

The CDP client manages communication with the browser:

```mermaid
graph LR
    A[Command] --> B[CDP Client]
    B --> C[WebSocket Writer]
    C --> D[Browser CDP Endpoint]
    
    E[Browser Events] --> F[WebSocket Reader]
    F --> G[Event Handler]
    G --> H[State Updates]
```

## WebSocket Communication

### Stream Module Architecture

The WebSocket communication is handled by the stream module located in `cli/src/native/stream/`:

| Module | File | Purpose |
|--------|------|---------|
| Stream Core | `cli/src/native/stream/mod.rs` | Stream trait definitions and utilities |
| WebSocket | `cli/src/native/stream/websocket.rs` | WebSocket connection handling |
| CDP Loop | `cli/src/native/stream/cdp_loop.rs` | CDP message processing loop |

### WebSocket Connection

The CDP client establishes and maintains a WebSocket connection to the Chrome DevTools endpoint:

```mermaid
sequenceDiagram
    participant CLI as CLI Command
    participant Client as CDP Client
    participant WS as WebSocket
    participant Chrome as Chrome Browser
    
    CLI->>Client: connect(url)
    Client->>WS: establish_connection()
    WS->>Chrome: WebSocket Handshake
    Chrome-->>WS: 101 Switching Protocols
    WS-->>Client: Connected
    
    loop Message Exchange
        CLI->>Client: send_command()
        Client->>WS: write_message()
        WS->>Chrome: CDP JSON Message
        Chrome-->>WS: CDP Response/Event
        WS-->>Client: read_message()
        Client-->>CLI: Result
    end
```

### CDP Loop Handler

The CDP loop processes incoming messages and manages the event queue:

- Handles CDP events from the browser
- Routes responses to pending command callbacks
- Manages connection state and reconnection logic

Source: [cli/src/native/stream/cdp_loop.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/stream/cdp_loop.rs)
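In CDP, every command carries an integer `id`, the browser's reply echoes that `id`, and events carry a `method` name with no `id`. A loop can therefore route replies through a map from pending ids to waiting callers. The sketch below assumes messages have already been parsed into a simplified, hypothetical `Incoming` enum and uses `std::sync::mpsc` channels; an async client would typically use oneshot channels instead, and none of these names come from `cdp_loop.rs`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified, already-parsed CDP message (real code would parse JSON).
enum Incoming {
    /// Reply to a command we sent; `id` matches the request id.
    Response { id: u64, result: String },
    /// Unsolicited notification, e.g. `Page.loadEventFired`.
    Event { method: String },
}

struct Router {
    next_id: u64,
    pending: HashMap<u64, Sender<String>>,
    events: Vec<String>,
}

impl Router {
    fn new() -> Self {
        Router { next_id: 1, pending: HashMap::new(), events: Vec::new() }
    }

    /// Allocate an id for an outgoing command and a receiver for its reply.
    fn register(&mut self) -> (u64, Receiver<String>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        (id, rx)
    }

    /// Deliver one incoming message to the right place.
    fn route(&mut self, msg: Incoming) {
        match msg {
            Incoming::Response { id, result } => {
                // Remove the entry so each reply is delivered exactly once.
                if let Some(tx) = self.pending.remove(&id) {
                    let _ = tx.send(result);
                }
            }
            Incoming::Event { method } => self.events.push(method),
        }
    }
}
```

Replies may arrive in any order; matching on `id` rather than arrival order is what keeps concurrent commands from receiving each other's results.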

## Browser Connection

### Connection Methods

The daemon supports multiple connection methods:

| Method | Command | Use Case |
|--------|---------|----------|
| Launch new browser | `agent-browser open` | Fresh browser instance |
| Connect to existing | `agent-browser connect 9222` | Attach to running browser |

```bash
# Launch with navigation
agent-browser open <url>

# Connect to running browser on specific port
agent-browser connect 9222

# Launch without navigation (clean slate)
agent-browser open
```

### CDP WebSocket URL

The CDP WebSocket URL can be retrieved programmatically:

```bash
agent-browser cdp_url
```

This returns the WebSocket debugger URL for programmatic browser attachment.

### Browser Version Info

The connection retrieves browser metadata:

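```rust
use serde::Deserialize;

/// Subset of the JSON returned by Chrome's `/json/version` endpoint.
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct BrowserVersionInfo {
    /// WebSocket URL for attaching a CDP client to the browser target.
    #[serde(rename = "webSocketDebuggerUrl")]
    pub web_socket_debugger_url: Option<String>,
    /// Product string; this key is capitalized in Chrome's response.
    #[serde(rename = "Browser")]
    pub browser: Option<String>,
}
```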

Source: [cli/src/native/cdp/types.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/cdp/types.rs)
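The `webSocketDebuggerUrl` field comes from Chrome's DevTools HTTP endpoint, `/json/version`, served on the debugging port. As a dependency-free illustration, assuming a browser listening locally, the sketch below builds that endpoint URL and pulls the field out of a response body with naive string scanning. `version_endpoint` and `extract_ws_url` are hypothetical helpers; real code should deserialize into `BrowserVersionInfo` instead.

```rust
/// Hypothetical helper: DevTools metadata URL for a locally running browser.
fn version_endpoint(port: u16) -> String {
    format!("http://127.0.0.1:{port}/json/version")
}

/// Naive extraction of `webSocketDebuggerUrl` from a `/json/version` body.
/// Assumes the value contains no escaped quotes; use a JSON parser in real code.
fn extract_ws_url(body: &str) -> Option<&str> {
    let key = "\"webSocketDebuggerUrl\"";
    let after_key = &body[body.find(key)? + key.len()..];
    // Skip the colon and surrounding whitespace, then the opening quote.
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}
```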

## CDP Protocol Domains

### Supported Domains

agent-browser works with the following CDP domains:

| Domain | Purpose | Key Commands |
|--------|---------|--------------|
| Page | Page navigation and loading | navigate, reload, getNavigationHistory, navigateToHistoryEntry |
| Runtime | JavaScript execution | evaluate, callFunctionOn |
| DOM | DOM inspection and manipulation | getDocument, describeNode |
| Input | User input simulation | dispatchMouseEvent, dispatchKeyEvent, insertText |
| Network | Network request interception | setRequestInterception, getResponseBody |
| Target | Browser target management | createTarget, attachToTarget |

### Browser Automation Actions

The following high-level actions are available via CDP:

```bash
# Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload

# DOM Interaction
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser type @e3 "input"
agent-browser hover @e4
agent-browser scroll down 500

# State Queries
agent-browser snapshot
agent-browser screenshot
agent-browser get text @e1
agent-browser get attr @e1 href

# JavaScript
agent-browser evaluate "document.title"
```

## Error Handling

### WebDriver Backend Limitations

When the daemon runs on the WebDriver backend, actions that backend does not implement fail with an explicit error instead of being silently ignored:

```rust
Err(anyhow::anyhow!(
    "Action '{}' is not supported on the WebDriver backend",
    action
))
```

### CDP Error Propagation

CDP errors are propagated through the action chain, enabling detailed error messages for debugging failed browser operations.

## Performance Considerations

### Session Management

- Each browser session maintains a persistent CDP connection
- Sessions can be named and persisted for multi-session workflows
- State persistence allows resuming automation tasks

### Network Idle Detection

The daemon supports waiting for network idle states:

```bash
agent-browser wait --load networkidle
```

This is useful for SPAs and other applications that keep loading content after the initial `load` event.
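A common definition of network idle is "no requests in flight for a quiet window"; Playwright, for example, uses 500 ms for its `networkidle` state. The tracker below is a hypothetical sketch of that rule, not agent-browser's implementation. It takes timestamps in milliseconds so it can be tested without a clock, and it expects to be fed CDP `Network` events.

```rust
/// Hypothetical network-idle tracker driven by CDP Network events.
struct IdleTracker {
    in_flight: usize,
    last_activity_ms: u64,
    quiet_ms: u64,
}

impl IdleTracker {
    fn new(quiet_ms: u64) -> Self {
        IdleTracker { in_flight: 0, last_activity_ms: 0, quiet_ms }
    }

    /// Feed on `Network.requestWillBeSent`.
    fn request_started(&mut self, now_ms: u64) {
        self.in_flight += 1;
        self.last_activity_ms = now_ms;
    }

    /// Feed on `Network.loadingFinished` or `Network.loadingFailed`.
    fn request_finished(&mut self, now_ms: u64) {
        self.in_flight = self.in_flight.saturating_sub(1);
        self.last_activity_ms = now_ms;
    }

    /// Idle once nothing is in flight and the quiet window has elapsed.
    fn is_idle(&self, now_ms: u64) -> bool {
        self.in_flight == 0 && now_ms.saturating_sub(self.last_activity_ms) >= self.quiet_ms
    }
}
```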

## Security Model

### Credential Management

The `set credentials` command supplies HTTP Basic authentication credentials for the session:

```bash
agent-browser set credentials <user> <pass>
```

### Cookie Management

Cookies can be imported from a file:

```bash
agent-browser cookies set --curl <file> [--domain <host>]
```

The file format (JSON, cURL, or a raw `Cookie` header) is auto-detected.

## Extension Points

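### Custom Scripts

Run arbitrary JavaScript in the page context: `addscript` injects a script into the current page, while `addinitscript` registers a script that runs on every new document before the page's own scripts.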

```bash
agent-browser addscript <script>
agent-browser addinitscript <script>
```

### Custom Styles

Inject CSS for visual testing:

```bash
agent-browser addstyle <css>
```

## Summary

The Daemon and CDP Protocol architecture enables agent-browser to provide a performant, Rust-native browser automation solution. By implementing direct CDP communication over WebSockets, the project avoids dependencies on Node.js wrappers like Playwright or Puppeteer while maintaining full compatibility with Chrome's DevTools Protocol capabilities.

The separation of concerns between the action dispatcher, CDP client, and WebSocket stream layers ensures maintainability and enables future extensions for additional browser engines and protocol features.

---

<a id='navigation-commands'></a>

## Navigation Commands

### Related Pages

Related topics: [Interaction Commands](#interaction-commands), [State Inspection Commands](#state-inspection-commands)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [cli/src/commands.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/commands.rs)
- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)
- [skill-data/core/references/commands.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md)
</details>

# Navigation Commands

Navigation Commands in agent-browser provide the fundamental mechanisms for controlling browser state, page loading, and session management. These commands enable AI agents and automated scripts to interact with web pages by controlling navigation flow, managing browser windows, and handling page lifecycle events.

## Overview

The Navigation Commands subsystem handles all operations related to:

- **Browser Launch and Shutdown** — Initialize and terminate browser instances
- **Page Navigation** — Navigate to URLs, handle history traversal, and manage SPA routing
- **Session Management** — Connect to existing browser instances via CDP protocol
- **Pre-navigation Setup** — Configure browser state before initial page load

```mermaid
graph TD
    A[User Command] --> B{Command Type}
    B -->|open/goto/navigate| C[Parse URL & Flags]
    B -->|back/forward/reload| D[History Action]
    B -->|pushstate| E[SPA Navigation]
    B -->|connect| F[CDP Connection]
    B -->|close| G[Cleanup Session]
    
    C --> H{URL Protocol?}
    H -->|http/https| I[Direct Navigation]
    H -->|about/data/file| I
    H -->|none specified| J[Prepend https://]
    
    I --> K[Execute Navigation]
    D --> L[Browser History API]
    E --> M[History PushState + Events]
    F --> N[Remote CDP Session]
    G --> O[Close All Tabs/Session]
    
    K --> P[Return Result JSON]
    L --> P
    M --> P
    N --> P
    O --> P
```

## Core Navigation Commands

### `open` — Launch Browser

Launches a new browser instance. When called without a URL, it opens about:blank and allows staging browser state before the first navigation.

**Usage:**

```bash
agent-browser open
agent-browser open <url>
```

| Variant | Behavior |
|---------|----------|
| `open` (no args) | Launch on about:blank; allows `network route`, `cookies set`, or `addinitscript` before first navigation |
| `open <url>` | Launch and immediately navigate to the specified URL |

**URL Auto-prepend Logic:**

The CLI automatically prepends `https://` if no protocol is specified. Supported protocols include:

| Protocol | Example |
|----------|---------|
| `https://` | `https://example.com` |
| `http://` | `http://localhost:3000` |
| `about:` | `about:blank`, `about:version` |
| `data:` | `data:text/html,<h1>Hello</h1>` |
| `file://` | `file:///path/to/page.html` |
| `chrome-extension://` | `chrome-extension://...` |
| `chrome://` | `chrome://version` |

```rust
let url_lower = url.to_lowercase();
let url = if url_lower.starts_with("http://")
    || url_lower.starts_with("https://")
    || url_lower.starts_with("about:")
    || url_lower.starts_with("data:")
    || url_lower.starts_with("file:")
    || url_lower.starts_with("chrome-extension://")
    || url_lower.starts_with("chrome://")
{
    url.to_string()
} else {
    format!("https://{}", url)
};
```

Source: [cli/src/commands.rs:35-50](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/commands.rs)

### `goto` / `navigate` — Navigate to URL

Aliases for navigation to a specific page. Both commands require a URL argument.

```bash
agent-browser goto https://example.com
agent-browser navigate example.com  # auto-prepends https://
```

Source: [cli/src/commands.rs:25-30](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/commands.rs)

### `pushstate` — SPA Client-side Navigation

Performs client-side navigation in Single Page Applications (SPA) using `history.pushState`. This command triggers the appropriate navigation events that modern frameworks like Next.js rely on.

```bash
agent-browser pushstate <url>
```

**Behavior:**

1. Calls `history.pushState` with the target URL
2. Dispatches `popstate` and `navigate` events
3. Auto-detects `window.next.router.push` for Next.js applications and triggers RSC fetch

```bash
agent-browser pushstate /dashboard
agent-browser pushstate /products/123
```

### `back` — Go Back

Navigates the current tab backward in browser history.

```bash
agent-browser back
```

### `forward` — Go Forward

Navigates the current tab forward in browser history.

```bash
agent-browser forward
```

### `reload` — Reload Page

Reloads the current page, respecting cache settings.

```bash
agent-browser reload
```

### `close` — Close Browser

Closes the browser instance and terminates the session.

```bash
agent-browser close
agent-browser close --all
```

| Flag | Behavior |
|------|----------|
| (default) | Close current session |
| `--all` | Close all browser sessions |

### `connect` — CDP Remote Connection

Connects to an existing browser instance via Chrome DevTools Protocol (CDP) port.

```bash
agent-browser connect <port>
```

```bash
agent-browser connect 9222  # Connect to browser on port 9222
```

## Pre-navigation Setup (One-turn Batch)

For scenarios requiring state staging before the first navigation (e.g., blocking scripts, setting cookies), agent-browser supports batch operations:

```bash
agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'
```

This pattern:

1. Opens browser on about:blank
2. Registers a network route to abort all script resources
3. Sets cookies from a curl-format cookie file
4. Navigates to the target URL

Source: [skill-data/core/references/commands.md:18-26](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md)
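Each `batch` argument is a JSON array of strings. When a script generates these arguments, escaping quotes by hand is error-prone, so a small encoder helps. The `batch_step` helper below is a hypothetical, std-only sketch that escapes quotes, backslashes, and control characters.

```rust
/// Hypothetical helper: encode one batch step as a JSON array of strings.
fn batch_step(args: &[&str]) -> String {
    let items: Vec<String> = args.iter().map(|a| json_string(a)).collect();
    format!("[{}]", items.join(","))
}

/// Minimal JSON string encoder.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

For example, `batch_step(&["open"])` yields `["open"]`, matching the first argument in the batch example.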

## Command Dispatch Architecture

The command dispatch system maps command strings to handler functions:

```rust
"open" | "goto" | "navigate" => handle_navigation(cmd, rest, flags, state).await,
"back" => handle_back(cmd, state).await,
"forward" => handle_forward(cmd, state).await,
"reload" => handle_reload(cmd, state).await,
"pushstate" => handle_pushstate(cmd, rest, state).await,
"close" | "quit" | "exit" => handle_close(cmd, rest, state).await,
"connect" => handle_connect(cmd, rest, state).await,
```

Source: [cli/src/native/actions.rs:30-45](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)

## Navigation Command Response Format

All navigation commands return a JSON response indicating success or failure:

**Success Response:**

```json
{
  "id": "session-id",
  "action": "navigate",
  "url": "https://example.com"
}
```

**Error Response (Missing URL):**

```json
{
  "error": "MissingArguments",
  "context": "goto",
  "message": "Expected URL argument"
}
```

## Flags for Navigation Commands

| Flag | Applies To | Purpose |
|------|------------|---------|
| `--headed` | `open` | Launch browser in headed (visible) mode |
| `--wait-until <event>` | `goto`, `navigate`, `open` | Wait for navigation event (load, domcontentloaded, networkidle) |
| `--provider <name>` | All navigation | Specify CDP provider (e.g., vercel-sandbox) |
| `--session <name>` | All commands | Use a specific named session |

```bash
agent-browser open --headed
agent-browser goto https://example.com --wait-until networkidle
```

## Common Usage Patterns

### Basic Navigation Flow

```bash
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser back
```

### Debugging with Pre-navigation Setup

```bash
# Block all scripts before the first navigation
agent-browser open
agent-browser network route "*" --abort --resource-type script
agent-browser goto https://example.com
```

### SPA Navigation with Framework Detection

```bash
agent-browser open https://my-nextjs-app.com
agent-browser snapshot -i           # Get refs before interacting
agent-browser click @e5             # Navigate to another route
agent-browser pushstate /new-route  # Trigger client-side navigation
agent-browser snapshot -i           # Refresh refs for the new route
```

## Session and Tab Management

Navigation commands operate within the context of a session. Each session can contain multiple tabs:

| Command | Purpose |
|---------|---------|
| `tab new [url]` | Open a new tab |
| `tab list` | List all open tabs |
| `tab <n>` | Switch to tab by index |
| `tab close` | Close current tab |

```bash
agent-browser tab new
agent-browser tab new https://example.com
agent-browser tab 2
agent-browser tab close
```

## Implementation Details

### URL Parsing in `commands.rs`

The navigation handler in `cli/src/commands.rs` performs the following steps:

1. **Argument Extraction** — Scans command arguments for the first non-flag value as URL
2. **Protocol Validation** — Checks if URL starts with a supported protocol scheme
3. **Auto-prepend** — Adds `https://` prefix if no protocol detected
4. **Command Construction** — Builds JSON command payload with action type and URL

```mermaid
sequenceDiagram
    User->>CLI: agent-browser goto example.com
    CLI->>Parser: Parse "goto" command
    Parser->>URL Validator: Check "example.com"
    URL Validator->>URL Validator: No protocol prefix
    URL Validator-->>Parser: Prepend https://
    Parser->>Builder: Build navigate command
    Builder->>Browser: Execute navigation
    Browser-->>User: JSON response
```
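The four steps can be sketched as two small functions. This is an illustrative reconstruction, not the code in `commands.rs`: `normalize` and `build_navigate` are hypothetical names, the JSON payload shape is assumed, and the scan is naive, since a flag value such as `networkidle` in `--wait-until networkidle` would be taken as the URL if it appeared before the real one.

```rust
/// Schemes passed through unchanged (mirrors the protocol table).
const SCHEMES: [&str; 7] = [
    "http://", "https://", "about:", "data:", "file:", "chrome-extension://", "chrome://",
];

/// Steps 2-3: keep known schemes (case-insensitive), otherwise prepend https://.
fn normalize(url: &str) -> String {
    let lower = url.to_lowercase();
    if SCHEMES.iter().any(|s| lower.starts_with(*s)) {
        url.to_string()
    } else {
        format!("https://{url}")
    }
}

/// Steps 1 and 4: take the first non-flag argument and build a JSON payload.
/// The URL is not JSON-escaped, so this assumes it has no quotes or backslashes.
fn build_navigate(args: &[&str]) -> Result<String, String> {
    let url = args
        .iter()
        .find(|a| !a.starts_with("--"))
        .ok_or("Expected URL argument")?;
    Ok(format!(r#"{{"action":"navigate","url":"{}"}}"#, normalize(url)))
}
```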

## Related Commands

| Command Category | Commands |
|------------------|----------|
| **State Inspection** | `snapshot`, `screenshot`, `get` |
| **Element Interaction** | `click`, `fill`, `type`, `press` |
| **Network Control** | `network route`, `cookies`, `storage` |
| **Browser Settings** | `set viewport`, `set geo`, `set offline` |

---

<a id='interaction-commands'></a>

## Interaction Commands

### Related Pages

Related topics: [Navigation Commands](#navigation-commands), [State Inspection Commands](#state-inspection-commands), [Element References System](#element-references)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [cli/src/native/interaction.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/interaction.rs)
- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [cli/src/commands.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/commands.rs)
- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)
</details>

# Interaction Commands

Interaction Commands are the core primitives that enable AI agents to programmatically control and manipulate web pages in the agent-browser system. These commands provide atomic operations for clicking elements, entering text, scrolling, and capturing page state through an accessibility-tree based reference system.

## Architecture Overview

The interaction system follows a command dispatch pattern where incoming commands are routed to appropriate handlers based on their operation type. The architecture separates concerns between command parsing, execution, and output formatting.

```mermaid
graph TD
    A[User/Agent Input] --> B[Command Parser]
    B --> C[actions.rs Dispatcher]
    C --> D[interaction.rs Handlers]
    D --> E[CDP Protocol Layer]
    E --> F[Browser Engine]
    F --> G[Page Response]
    G --> H[output.rs Formatter]
    H --> I[Terminal/Agent]
    
    C -.->|click, fill, type, scroll| D
    C -.->|mouse, keyboard| D
    C -.->|snapshot, screenshot| D
```

### Component Responsibilities

| Component | File | Purpose |
|-----------|------|---------|
| Command Dispatcher | `actions.rs` | Routes commands to handlers |
| Interaction Handlers | `interaction.rs` | Executes atomic browser operations |
| Output Formatter | `output.rs` | Formats and presents results |
| CDP Layer | Native | Chrome DevTools Protocol communication |

## Element Reference System

Interaction commands use an element reference system (`@e1`, `@e2`, etc.) to identify targets on the page. These references are obtained through snapshot operations and represent unique identifiers in the accessibility tree.

```mermaid
graph LR
    A[Page HTML] --> B[Accessibility Tree]
    B --> C[Snapshot Command]
    C --> D["@e1 button #quot;Submit#quot;"]
    C --> E["@e2 input #quot;Email#quot;"]
    D --> F["Click @e1"]
    E --> G["Fill @e2 #quot;text#quot;"]
```

**Reference Format:**
```
@e1 [tag type="value"] "text content" placeholder="hint"
│    │   │             │               │
│    │   │             │               └─ Additional attributes
│    │   │             └─ Visible text
│    │   └─ Key attributes shown
│    └─ Element role or tag name
└─ Unique ref ID
```

Sources: [skill-data/core/references/snapshot-refs.md:1-50]()
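The reference format can be illustrated with a small Rust parser for a single snapshot line. This is a sketch, not agent-browser's own code: the `SnapshotRef` struct and `parse_ref_line` function are hypothetical, and the parser assumes the simple quoting shown in the format diagram (no escaped quotes and no `]` inside attribute values).

```rust
/// One parsed line of `snapshot` output (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct SnapshotRef {
    id: String,           // e.g. "@e1"
    kind: String,         // first word inside the brackets, e.g. "button"
    attrs: String,        // rest of the bracketed text, e.g. type="submit"
    text: Option<String>, // quoted visible text right after the brackets, if any
}

/// Parse a line such as `@e5 [button type="submit"] "Continue"`.
/// Returns None when the line does not start with an `@e` reference.
fn parse_ref_line(line: &str) -> Option<SnapshotRef> {
    let line = line.trim();
    let (id, rest) = line.split_once(' ')?;
    if !id.starts_with("@e") {
        return None;
    }
    // Bracketed part: `[kind attrs...]`
    let rest = rest.trim_start().strip_prefix('[')?;
    let close = rest.find(']')?;
    let inside = &rest[..close];
    let after = rest[close + 1..].trim_start();
    let (kind, attrs) = match inside.split_once(' ') {
        Some((k, a)) => (k.to_string(), a.trim().to_string()),
        None => (inside.to_string(), String::new()),
    };
    // Optional quoted visible text directly after the brackets.
    let text = after
        .strip_prefix('"')
        .and_then(|s| s.find('"').map(|end| s[..end].to_string()));
    Some(SnapshotRef { id: id.to_string(), kind, attrs, text })
}

fn main() {
    let snapshot = "@e1 [heading] \"Log in\"\n  @e3 [input type=\"email\"] placeholder=\"Email\"\n  @e5 [button type=\"submit\"] \"Continue\"";
    for line in snapshot.lines() {
        if let Some(r) = parse_ref_line(line) {
            println!("{} -> {} {:?}", r.id, r.kind, r.text);
        }
    }
}
```

Because refs identify positions in one particular snapshot rather than stable selectors, a parser like this is only meaningful on the most recent snapshot output.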

## Core Interaction Commands

### Element Selection Commands

| Command | Description | Parameters |
|---------|-------------|------------|
| `find` | Find elements by locator | `<locator> <value> [action] [text]` |
| `count` | Count matching elements | `<selector>` |
| `is` | Check element state | `<what> <selector>` |

**Locators supported:** `role`, `text`, `label`, `placeholder`, `alt`, `title`, `testid`, `first`, `last`, `nth`

Sources: [cli/src/output.rs:1-20]()

### Mouse Commands

```mermaid
graph TD
    A[mouse] --> B["move #lt;x#gt; #lt;y#gt;"]
    A --> C["down [btn]"]
    A --> D["up [btn]"]
    A --> E["wheel #lt;dy#gt; [dx]"]
    
    B --> F[Dispatch mousemove event]
    C --> G[Dispatch mousedown event]
    D --> H[Dispatch mouseup event]
    E --> I[Dispatch wheel event]
```

| Command | Description |
|---------|-------------|
| `mouse move <x> <y>` | Move cursor to coordinates |
| `mouse down [btn]` | Press mouse button (default: left) |
| `mouse up [btn]` | Release mouse button |
| `mouse wheel <dy> [dx]` | Scroll wheel (delta Y/X) |

Sources: [cli/src/native/actions.rs:1-30]()
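The `mouse` subcommands map naturally onto the CDP `Input.dispatchMouseEvent` method, whose `type` field accepts `mouseMoved`, `mousePressed`, `mouseReleased` and `mouseWheel`. The Rust sketch below builds those parameter objects as JSON strings. The `MouseCmd` enum, the cursor-tracking `pos` argument, and the assumption that agent-browser sends exactly these fields are illustrative, not taken from `actions.rs`.

```rust
/// Hypothetical model of the `mouse` subcommands.
enum MouseCmd {
    Move { x: f64, y: f64 },
    Down { button: &'static str },
    Up { button: &'static str },
    Wheel { dy: f64, dx: f64 },
}

/// Build `Input.dispatchMouseEvent` params for one command.
/// `pos` holds the last cursor position: `Move` updates it, and
/// press, release and wheel events are dispatched at that position.
fn to_cdp_params(cmd: &MouseCmd, pos: &mut (f64, f64)) -> String {
    match cmd {
        MouseCmd::Move { x, y } => {
            *pos = (*x, *y);
            format!(r#"{{"type":"mouseMoved","x":{x},"y":{y}}}"#)
        }
        MouseCmd::Down { button } => format!(
            r#"{{"type":"mousePressed","x":{},"y":{},"button":"{}","clickCount":1}}"#,
            pos.0, pos.1, button
        ),
        MouseCmd::Up { button } => format!(
            r#"{{"type":"mouseReleased","x":{},"y":{},"button":"{}","clickCount":1}}"#,
            pos.0, pos.1, button
        ),
        MouseCmd::Wheel { dy, dx } => format!(
            r#"{{"type":"mouseWheel","x":{},"y":{},"deltaX":{},"deltaY":{}}}"#,
            pos.0, pos.1, dx, dy
        ),
    }
}

fn main() {
    let mut pos = (0.0, 0.0);
    let cmds = [
        MouseCmd::Move { x: 100.0, y: 50.0 },
        MouseCmd::Down { button: "left" },
        MouseCmd::Up { button: "left" },
        MouseCmd::Wheel { dy: 300.0, dx: 0.0 },
    ];
    for cmd in &cmds {
        println!("Input.dispatchMouseEvent {}", to_cdp_params(cmd, &mut pos));
    }
}
```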

### Keyboard Commands

| Command | Description | Example |
|---------|-------------|---------|
| `type` | Type text (with key events) | `type @e1 "hello"` |
| `press` | Press special key | `press Enter` |
| `setvalue` | Set input value directly | `setvalue @e1 "value"` |

**Special Keys:** `Enter`, `Tab`, `Escape`, `Backspace`, `ArrowUp`, `ArrowDown`, `ArrowLeft`, `ArrowRight`, `F1-F12`, `Control`, `Alt`, `Shift`

Sources: [cli/src/native/actions.rs:1-30]()
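Special keys have to be translated into the identifiers that CDP's `Input.dispatchKeyEvent` expects: a DOM `key` value, a physical `code`, and a Windows virtual key code. The lookup below is a hypothetical helper covering the keys listed above. The virtual key codes are the standard Windows values (13 for Enter, 112 to 123 for F1 to F12), but whether agent-browser uses this exact table is an assumption.

```rust
/// Hypothetical lookup: special key name -> (DOM key, DOM code, Windows virtual key code).
fn key_info(name: &str) -> Option<(String, String, u32)> {
    let fixed: Option<(&str, &str, u32)> = match name {
        "Enter" => Some(("Enter", "Enter", 13)),
        "Tab" => Some(("Tab", "Tab", 9)),
        "Escape" => Some(("Escape", "Escape", 27)),
        "Backspace" => Some(("Backspace", "Backspace", 8)),
        "ArrowLeft" => Some(("ArrowLeft", "ArrowLeft", 37)),
        "ArrowUp" => Some(("ArrowUp", "ArrowUp", 38)),
        "ArrowRight" => Some(("ArrowRight", "ArrowRight", 39)),
        "ArrowDown" => Some(("ArrowDown", "ArrowDown", 40)),
        "Shift" => Some(("Shift", "ShiftLeft", 16)),
        "Control" => Some(("Control", "ControlLeft", 17)),
        "Alt" => Some(("Alt", "AltLeft", 18)),
        _ => None,
    };
    if let Some((key, code, vk)) = fixed {
        return Some((key.to_string(), code.to_string(), vk));
    }
    // F1..F12 use key = code = "F<n>" and virtual key codes 112..=123.
    let n: u32 = name.strip_prefix('F')?.parse().ok()?;
    if (1..=12).contains(&n) {
        Some((name.to_string(), name.to_string(), 111 + n))
    } else {
        None
    }
}

fn main() {
    for name in ["Enter", "ArrowDown", "F5", "Hyper"] {
        match key_info(name) {
            Some((key, code, vk)) => println!("{name}: key={key} code={code} vk={vk}"),
            None => println!("{name}: not a recognized special key"),
        }
    }
}
```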

### Scroll Commands

| Command | Description |
|---------|-------------|
| `scroll down <px>` | Scroll down by pixels |
| `scroll up <px>` | Scroll up by pixels |
| `scroll left <px>` | Scroll left by pixels |
| `scroll right <px>` | Scroll right by pixels |

Sources: [skill-data/core/SKILL.md:1-50]()
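A `scroll` command can be expressed as a wheel event with signed deltas, following the DOM convention that positive `deltaY` scrolls down and positive `deltaX` scrolls right. This Rust sketch only computes the deltas; the function name and the assumption that scrolling is implemented through wheel events (as with `mouse wheel`) are illustrative.

```rust
/// Convert `scroll <direction> <px>` into (deltaX, deltaY) for a wheel event.
/// Positive deltaY scrolls down, positive deltaX scrolls right (DOM convention).
fn scroll_deltas(direction: &str, px: f64) -> Result<(f64, f64), String> {
    match direction {
        "down" => Ok((0.0, px)),
        "up" => Ok((0.0, -px)),
        "right" => Ok((px, 0.0)),
        "left" => Ok((-px, 0.0)),
        other => Err(format!("unknown scroll direction: {other}")),
    }
}

fn main() {
    for (dir, px) in [("down", 500.0), ("left", 120.0)] {
        println!("scroll {dir} {px} -> {:?}", scroll_deltas(dir, px));
    }
}
```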

### State Inspection Commands

```mermaid
graph TD
    A[get command] --> B{Property Type}
    B -->|attr| C[Get attribute value]
    B -->|value| D[Get input value]
    B -->|text| E[Get visible text]
    B -->|html| F[Get innerHTML]
    B -->|title| G[Get page title]
    B -->|url| H[Get current URL]
    B -->|box| I[Get bounding box]
    B -->|styles| J[Get computed styles]
```

| Command | Description |
|---------|-------------|
| `get text <ref>` | Get visible text of element |
| `get value <ref>` | Get input field value |
| `get attr <ref> <name>` | Get specific attribute |
| `get html <ref>` | Get innerHTML |
| `get title` | Get page title |
| `get url` | Get current URL |
| `get box <ref>` | Get bounding box coordinates |
| `get styles <ref>` | Get computed CSS styles |
| `get cdp-url` | Get CDP debugging URL |

Sources: [cli/src/output.rs:1-20]()
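Most `get` properties correspond to a single DOM expression evaluated against the resolved element, for example through CDP's `Runtime.callFunctionOn`. The hypothetical mapping below shows which DOM API each property plausibly reads. The variable name `el` and the exact expressions are assumptions, and `styles` and `cdp-url` are left out because they need more than a one-line expression.

```rust
/// Hypothetical mapping from `get <property>` to a JavaScript expression.
/// `el` stands for the element resolved from the `@eN` ref.
fn get_expression(property: &str, arg: Option<&str>) -> Result<String, String> {
    let js = match property {
        "text" => "el.innerText".to_string(),
        "html" => "el.innerHTML".to_string(),
        "value" => "el.value".to_string(),
        "box" => "JSON.stringify(el.getBoundingClientRect())".to_string(),
        "title" => "document.title".to_string(),
        "url" => "location.href".to_string(),
        "attr" => {
            let name = arg.ok_or("get attr needs an attribute name")?;
            // {:?} quotes the name, which is adequate for plain names like href.
            format!("el.getAttribute({name:?})")
        }
        other => return Err(format!("unsupported property: {other}")),
    };
    Ok(js)
}

fn main() {
    for (prop, arg) in [("text", None), ("attr", Some("href")), ("url", None)] {
        println!("get {prop} -> {:?}", get_expression(prop, arg));
    }
}
```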

## Click Variations

The click command supports several modifiers for different interaction patterns:

| Command | Description |
|---------|-------------|
| `click <ref>` | Standard left-click |
| `click <ref> --new-tab` | Click and open in new tab |
| `click <ref> --double` | Double-click |
| `click <ref> --right` | Right-click (context menu) |
| `tap <ref>` | Mobile-style tap (touch events) |

Sources: [skill-data/core/SKILL.md:1-50]()
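Click variants differ mainly in which mouse button is reported and in the `clickCount` carried by each press/release pair. The sketch below turns the `--right` and `--double` flags into an ordered list of CDP-style mouse events. It is a hypothetical illustration following the common pattern of sending a second press/release pair with `clickCount` 2 for a double-click; it does not model `--new-tab` or `tap`.

```rust
/// Expand click flags into an ordered list of mouse events (hypothetical).
fn click_events(flags: &[&str]) -> Vec<String> {
    let button = if flags.contains(&"--right") { "right" } else { "left" };
    // A double-click is two press/release pairs, the second with clickCount 2.
    let rounds = if flags.contains(&"--double") { 2 } else { 1 };
    let mut events = Vec::new();
    for count in 1..=rounds {
        for kind in ["mousePressed", "mouseReleased"] {
            events.push(format!("{kind} button={button} clickCount={count}"));
        }
    }
    events
}

fn main() {
    for event in click_events(&["--double"]) {
        println!("{event}");
    }
}
```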

## Form Input Commands

### Text Input

```mermaid
graph LR
    A[Input Commands] --> B[type]
    A --> C[fill]
    A --> D[setvalue]
    
    B --> E[Triggers keydown/keyup]
    C --> F[Direct value set]
    D --> G[Direct value assignment]
```

| Command | Description | Behavior |
|---------|-------------|----------|
| `fill <ref> <text>` | Fill input field | Replaces existing value, triggers input events |
| `type <ref> <text>` | Type text character by character | Triggers full key event sequence |
| `setvalue <ref> <value>` | Set value directly | Bypasses sanitization |

Sources: [cli/src/native/actions.rs:1-30]()
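The difference between `fill` and `type` can be sketched in terms of CDP calls: inserting the whole string at once (CDP's `Input.insertText`) versus sending a `keyDown`/`keyUp` pair per character through `Input.dispatchKeyEvent`, where the `keyDown` carries the character's text. Both plans below are hypothetical; in particular, how agent-browser clears the existing value before `fill` is not shown in the source and appears here only as a placeholder step.

```rust
/// Hypothetical call plan for `fill`: clear the field, then insert all text at once.
fn fill_plan(text: &str) -> Vec<String> {
    vec![
        "clear existing value".to_string(), // placeholder; the real mechanism is assumed
        format!("Input.insertText {text:?}"),
    ]
}

/// Hypothetical call plan for `type`: one keyDown/keyUp pair per character,
/// so page listeners for key events fire for every character.
fn type_plan(text: &str) -> Vec<String> {
    text.chars()
        .flat_map(|c| {
            [
                format!("Input.dispatchKeyEvent keyDown {c:?}"),
                format!("Input.dispatchKeyEvent keyUp {c:?}"),
            ]
        })
        .collect()
}

fn main() {
    println!("fill: {:?}", fill_plan("hello"));
    println!("type: {} events", type_plan("hello").len());
}
```

This is why `type` suits validation logic bound to key events, while `fill` needs far fewer protocol round trips.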

### Other Input Types

| Command | Target | Description |
|---------|--------|-------------|
| `check <ref>` | Checkbox | Check a checkbox |
| `uncheck <ref>` | Checkbox | Uncheck a checkbox |
| `select <ref> <value>` | Select | Select option by value |
| `upload <ref> <path>` | File input | Upload file |

Sources: [cli/src/native/actions.rs:1-30]()

## Wait and Timing

Wait commands control execution timing for dynamic content:

| Command | Description |
|---------|-------------|
| `wait <ms>` | Wait a fixed number of milliseconds |
| `wait --load load` | Wait for the page `load` event |
| `wait --load networkidle` | Wait until the page reaches the `networkidle` load state (no network activity for a short period) |

Sources: [skill-data/core/SKILL.md:1-50]()
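Conditional waits generally come down to polling a condition until it holds or a timeout expires. The generic Rust helper below shows that loop. `wait_until`, the timeout and the polling interval are hypothetical, and the condition closure stands in for whatever check (load event fired, network quiet) the real implementation performs.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `condition` every `interval` until it returns true or `timeout` elapses.
/// Returns the time waited on success, or an error message on timeout.
fn wait_until<F: FnMut() -> bool>(
    mut condition: F,
    timeout: Duration,
    interval: Duration,
) -> Result<Duration, String> {
    let start = Instant::now();
    loop {
        if condition() {
            return Ok(start.elapsed());
        }
        if start.elapsed() >= timeout {
            return Err(format!("timed out after {timeout:?}"));
        }
        sleep(interval);
    }
}

fn main() {
    let start = Instant::now();
    // Example condition that becomes true after roughly 50 ms.
    let result = wait_until(
        || start.elapsed() >= Duration::from_millis(50),
        Duration::from_secs(2),
        Duration::from_millis(10),
    );
    println!("{result:?}");
}
```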

## Command Chaining with Batches

Multiple commands can be executed in a single batch operation for efficiency:

```mermaid
graph TD
    A[Batch Command] --> B[Parse JSON Array]
    B --> C[Execute Sequentially]
    C --> D[Command 1]
    D --> E[Command 2]
    E --> F[Command N]
    F --> G[Return Combined Results]
```

Example batch command:
```bash
agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'
```

Sources: [skill-data/core/references/commands.md:1-30]()
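The batch flow (parse the steps, run them in order, return combined results) can be sketched as a small Rust function. The executor closure stands in for agent-browser's real command dispatch, and the `stop_on_error` switch is a hypothetical option that makes the design choice explicit; the source does not say whether `batch` halts at the first failing step.

```rust
/// Run each batch step in order and collect one result per executed step.
/// `exec` is a stand-in for the real command dispatcher (hypothetical).
fn run_batch<F>(steps: &[Vec<&str>], stop_on_error: bool, mut exec: F) -> Vec<Result<String, String>>
where
    F: FnMut(&[&str]) -> Result<String, String>,
{
    let mut results = Vec::new();
    for step in steps {
        let result = exec(step.as_slice());
        let failed = result.is_err();
        results.push(result);
        if failed && stop_on_error {
            break;
        }
    }
    results
}

fn main() {
    let steps = vec![
        vec!["open"],
        vec!["cookies", "clear"],
        vec!["navigate", "http://localhost:3000/target"],
    ];
    let results = run_batch(&steps, true, |args| Ok(format!("ran {}", args.join(" "))));
    for r in &results {
        println!("{r:?}");
    }
}
```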

## State Management

### Browser State Commands

| Command | Description |
|---------|-------------|
| `is <state> <ref>` | Check if element is `visible`, `enabled`, `checked` |
| `is open` | Check if browser is open |
| `is closed` | Check if browser is closed |

### Visibility and Enabled States

```mermaid
graph TD
    A[Check State] --> B{Element Type}
    B -->|Button/Input| C[Check: enabled]
    B -->|Checkbox| D[Check: checked]
    B -->|Any| E[Check: visible]
    
    C --> F[Return boolean]
    D --> F
    E --> F
```

Sources: [cli/src/output.rs:1-20]()

## Advanced Interactions

### React-Specific Commands

For React applications, specialized inspection commands are available:

| Command | Description |
|---------|-------------|
| `react_tree` | Get component tree |
| `react_inspect <ref>` | Inspect React component |
| `react_renders_start` | Start render tracking |
| `react_renders_stop` | Stop render tracking |

Sources: [cli/src/native/actions.rs:1-30]()

### Dialog Handling

```mermaid
graph TD
    A[Dialog Appears] --> B{dialog type}
    B -->|alert| C[handle_alert]
    B -->|confirm| D[handle_confirm]
    B -->|prompt| E[handle_prompt]
    
    C --> F["dialog accept"]
    D --> F
    D --> H["dialog dismiss"]
    E --> G["dialog accept #quot;input#quot;"]
```

| Command | Description |
|---------|-------------|
| `dialog accept [message]` | Accept dialog with optional message |
| `dialog dismiss` | Cancel/dismiss dialog |

Sources: [cli/src/native/actions.rs:1-30]()
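At the protocol level, accepting or dismissing a JavaScript dialog corresponds to CDP's `Page.handleJavaScriptDialog`, which takes an `accept` boolean and an optional `promptText` used for `prompt()` dialogs. The Rust sketch below maps the two `dialog` subcommands onto those parameters. The function is hypothetical, and Rust's `{:?}` formatting stands in for proper JSON string escaping.

```rust
/// Build `Page.handleJavaScriptDialog` params for a `dialog` subcommand (hypothetical).
/// `{:?}` is a stand-in for JSON string escaping.
fn dialog_params(action: &str, message: Option<&str>) -> Result<String, String> {
    match (action, message) {
        // Accept with text: the text answers a prompt() dialog.
        ("accept", Some(text)) => Ok(format!(r#"{{"accept":true,"promptText":{text:?}}}"#)),
        ("accept", None) => Ok(r#"{"accept":true}"#.to_string()),
        // Dismiss corresponds to pressing Cancel.
        ("dismiss", _) => Ok(r#"{"accept":false}"#.to_string()),
        (other, _) => Err(format!("unknown dialog action: {other}")),
    }
}

fn main() {
    println!("{:?}", dialog_params("accept", Some("Ada Lovelace")));
    println!("{:?}", dialog_params("dismiss", None));
}
```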

## Common Workflow Patterns

### Basic Navigation and Interaction

```bash
# 1. Open page
agent-browser open https://example.com

# 2. Take snapshot to get refs
agent-browser snapshot -i

# 3. Interact with elements
agent-browser click @e1
agent-browser fill @e2 "user@example.com"
agent-browser press Enter

# 4. Wait for response
agent-browser wait 1000
```

### Form Submission Flow

```bash
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "test@example.com"   # email input ref from the snapshot
agent-browser fill @e2 "secretpassword"     # password input ref
agent-browser click @e3                     # submit button ref
agent-browser wait --load networkidle
agent-browser screenshot result.png
```

### Error Handling Pattern

```bash
# Check whether the success message is visible (ref taken from the latest snapshot)
agent-browser is visible @e4

# If it is not, re-inspect the page and read the error message
agent-browser snapshot -i
agent-browser get text @e5
```

## Command Reference Summary

### Interaction Operations Matrix

| Category | Commands |
|----------|----------|
| **Mouse** | `click`, `mouse move/down/up/wheel`, `dblclick` |
| **Keyboard** | `type`, `press`, `setvalue` |
| **Scroll** | `scroll up/down/left/right` |
| **Forms** | `fill`, `check`, `uncheck`, `select`, `upload` |
| **Inspect** | `get text/value/attr/html/title/url/box/styles` |
| **State** | `find`, `count`, `is` |
| **Timing** | `wait` |

Sources: [cli/src/native/actions.rs:1-30]()
Sources: [cli/src/output.rs:1-20]()
Sources: [skill-data/core/SKILL.md:1-50]()

## Best Practices

1. **Always snapshot before interacting** - Element refs are obtained from snapshots and must be fetched after page load or navigation
2. **Re-snapshot after navigation** - New pages have new accessibility trees with different refs
3. **Use appropriate wait conditions** - Wait for `networkidle` when content loads dynamically
4. **Prefer `fill` over `type`** - `fill` is faster and more reliable for automated workflows
5. **Use `type` for form validation** - When you need key events to trigger validation logic

Sources: [skill-data/core/references/snapshot-refs.md:1-50]()

---

<a id='state-inspection-commands'></a>

## State Inspection Commands

### Related Pages

Related topics: [Interaction Commands](#interaction-commands), [Element References System](#element-references)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [cli/src/native/state.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/state.rs)
- [cli/src/commands.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/commands.rs)
- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)
- [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)
</details>

# State Inspection Commands

State Inspection Commands in agent-browser provide mechanisms to examine, retrieve, and manage browser state including cookies, web storage, session data, console errors, and DOM element properties. These commands enable debugging, state verification, and persistence of browser sessions across operations.

## Architecture Overview

State inspection in agent-browser operates through a layered architecture where the CLI command layer parses user input, the actions layer dispatches to appropriate handlers, and the browser backend (CDP/WebDriver) executes the actual state retrieval.

```mermaid
graph TD
    A[CLI Input] --> B[commands.rs Parser]
    B --> C[actions.rs Dispatcher]
    C --> D[State Handlers]
    C --> E[Storage Handlers]
    C --> F[Element Handlers]
    D --> G[Browser Backend<br/>Chrome CDP / WebDriver]
    E --> G
    F --> G
    G --> H[State Output]
    
    D -. includes .-> D1[cookies_get/set/clear]
    D -. includes .-> D2[state_save/load/list/clean]
    E -. includes .-> E1[storage_get/set/clear]
    F -. includes .-> F1[gettext/getattr/isvisible]
```

Sources: [cli/src/native/actions.rs:1-150]()

## Command Categories

State inspection commands are organized into five primary categories:

| Category | Purpose | Dispatcher actions (`actions.rs`) |
|----------|---------|----------|
| **Cookie Inspection** | Manage HTTP cookies | `cookies_get`, `cookies_set`, `cookies_clear` |
| **Web Storage** | Inspect localStorage/sessionStorage | `storage_get`, `storage_set`, `storage_clear` |
| **Session State** | Save/load browser sessions | `state_save`, `state_load`, `state_list`, `state_clean` |
| **Element Properties** | Query DOM element attributes | `gettext`, `getattribute`, `inputvalue`, `isvisible`, `isenabled`, `ischecked` |
| **Error Inspection** | Retrieve console errors | `errors` |

Sources: [cli/src/native/actions.rs:80-100]()

## Cookie Inspection

Cookies can be inspected and managed through the `cookies` command family.

### Get Cookies

Retrieves all cookies for the current domain:

```bash
agent-browser cookies get
```

### Set Cookie

Sets a cookie with explicit parameters:

```bash
agent-browser cookies set --url <url> --name <name> --value <value> [--domain <domain>] [--path <path>] [--httpOnly] [--secure] [--sameSite <strict|lax|none>] [--expires <timestamp>]
```

### Set Cookie from File

Auto-detects and imports cookies from JSON, cURL, or Cookie-header format:

```bash
agent-browser cookies set --curl <file> [--domain <host>]
```

### Clear Cookies

Removes all cookies:

```bash
agent-browser cookies clear
```

Sources: [cli/src/output.rs:1-50]()
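The auto-detection of cookie file formats could work roughly as follows: JSON starts with `[` or `{`, a cURL command starts with `curl`, and anything else is treated as a `Cookie:` header value made of `name=value` pairs separated by `;`. This Rust sketch is hypothetical; the detection rules are assumptions, and the cURL and JSON branches are only classified here, not parsed.

```rust
#[derive(Debug, PartialEq)]
enum CookieFormat {
    Json,
    Curl,
    Header,
}

/// Guess the format of an imported cookie file (heuristic, hypothetical).
fn detect_format(input: &str) -> CookieFormat {
    let t = input.trim_start();
    if t.starts_with('[') || t.starts_with('{') {
        CookieFormat::Json
    } else if t.starts_with("curl ") {
        CookieFormat::Curl
    } else {
        CookieFormat::Header
    }
}

/// Parse a `Cookie:` header value such as `a=1; b=2` into (name, value) pairs.
fn parse_cookie_header(input: &str) -> Vec<(String, String)> {
    let value = input.trim();
    let value = value.strip_prefix("Cookie:").unwrap_or(value);
    value
        .split(';')
        .filter_map(|pair| {
            let (name, val) = pair.trim().split_once('=')?;
            let name = name.trim();
            if name.is_empty() {
                None
            } else {
                Some((name.to_string(), val.trim().to_string()))
            }
        })
        .collect()
}

fn main() {
    let input = "Cookie: sid=abc123; theme=dark";
    println!("{:?}", detect_format(input));
    for (name, value) in parse_cookie_header(input) {
        println!("{name} = {value}");
    }
}
```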

## Web Storage Inspection

Web storage commands manage the browser's localStorage and sessionStorage.

### Storage Commands

| Command | Description |
|---------|-------------|
| `storage_get` | Retrieve value from localStorage or sessionStorage |
| `storage_set` | Set a key-value pair in storage |
| `storage_clear` | Clear all items from selected storage |

```bash
# Get storage value
agent-browser storage_get <local|session> <key>

# Set storage value
agent-browser storage_set <local|session> <key> <value>

# Clear storage
agent-browser storage_clear <local|session>
```

Sources: [cli/src/native/actions.rs:85-90]()

## Session State Management

agent-browser maintains persistent state in `~/.agent-browser` (or in `<tempdir>/agent-browser` when the home directory cannot be resolved).

### State Directory Structure

```mermaid
graph LR
    A[~/.agent-browser] --> B[sessions/]
    A --> C[auth/]
    A --> D[encryption.key]
    B --> E["#lt;session-id#gt;/"]
    E --> F[state.json]
    E --> G[screenshots/]
```

Sources: [cli/src/native/state.rs:80-95]()

### State Commands

| Command | Description |
|---------|-------------|
| `state_save` | Save current browser state to disk |
| `state_load` | Restore browser state from saved file |
| `state_list` | List all saved states |
| `state_clean` | Remove states older than specified days |
| `state_rename` | Rename an existing state |

```bash
# Save current state
agent-browser state save <path> [--name <name>]

# Load saved state
agent-browser state load <path>

# List all states
agent-browser state list

# Clean old states (default: 30 days)
agent-browser state clean [--days <n>]

# Rename a state
agent-browser state rename --path <path> --name <new_name>
```
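The age-based cleanup behind `state_clean` can be sketched as a filter over saved states and their modification times. This is a hypothetical helper: it takes `(name, modified)` pairs instead of reading the sessions directory, and uses the 30-day default only as an example value. A real implementation would read the times with `std::fs::metadata(path)?.modified()`.

```rust
use std::time::{Duration, SystemTime};

/// Names of saved states last modified more than `days` days before `now`.
/// Entries whose modification time lies in the future are treated as fresh.
fn stale_states(states: &[(String, SystemTime)], days: u64, now: SystemTime) -> Vec<String> {
    let max_age = Duration::from_secs(days * 24 * 60 * 60);
    states
        .iter()
        .filter(|(_, modified)| {
            now.duration_since(*modified)
                .map(|age| age > max_age)
                .unwrap_or(false)
        })
        .map(|(name, _)| name.clone())
        .collect()
}

fn main() {
    let now = SystemTime::now();
    let day = Duration::from_secs(24 * 60 * 60);
    let states = vec![
        ("old-login".to_string(), now - day * 45),
        ("today".to_string(), now - day),
    ];
    println!("would remove: {:?}", stale_states(&states, 30, now));
}
```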

### State Directory Resolution

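```rust
// Requires the `dirs` crate as a dependency (for `dirs::home_dir`).
use std::path::PathBuf;

/// Root directory for agent-browser's persistent state.
pub fn get_state_dir() -> PathBuf {
    if let Some(home) = dirs::home_dir() {
        // Normal case: ~/.agent-browser
        home.join(".agent-browser")
    } else {
        // Fallback when no home directory can be resolved: <tempdir>/agent-browser
        std::env::temp_dir().join("agent-browser")
    }
}

/// Directory holding per-session state: <state dir>/sessions
pub fn get_sessions_dir() -> PathBuf {
    get_state_dir().join("sessions")
}
```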

Sources: [cli/src/native/state.rs:80-90]()

## Element Property Inspection

Element inspection commands retrieve properties and states of DOM elements using element references obtained from snapshots.

### Get Text Content

Retrieves the visible text of an element:

```bash
agent-browser get text @e1
```

### Get HTML Content

Retrieves element innerHTML or innerText:

```bash
agent-browser innerhtml @e1
agent-browser innertext @e1
```

### Get Attributes

Retrieves any attribute value from an element:

```bash
agent-browser get attr @e1 href
agent-browser get attr @e1 src
```

### Get Input Value

Retrieves the current value of input elements:

```bash
agent-browser get value @e1
```

### Check Element State

Verify element state properties:

```bash
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
```

### Count Matching Elements

Count elements matching a selector:

```bash
agent-browser count ".item-class"
```

### Get Bounding Box

Retrieve element dimensions and position:

```bash
agent-browser get box @e1
```

### Get Styles

Retrieve computed CSS styles:

```bash
agent-browser get styles @e1
```

Sources: [cli/src/native/actions.rs:30-60]()

## Find Elements

The `find` command locates DOM elements using various locator strategies.

### Supported Locators

| Locator | Description | Example |
|---------|-------------|---------|
| `role` | Find by ARIA role | `find role button --exact` |
| `text` | Find by text content | `find text "Submit"` |
| `label` | Find form label | `find label "Email"` |
| `placeholder` | Find by placeholder | `find placeholder "Search..."` |
| `alt` | Find by alt attribute | `find alt "profile"` |
| `title` | Find by title attribute | `find title "Close"` |
| `testid` | Find by test ID | `find testid submit-btn` |
| `first` | First element matching selector | `find first ".item"` |
| `last` | Last element matching selector | `find last ".item"` |

### Find Command Syntax

```bash
agent-browser find <locator> <value> [action] [--exact] [--name <name>]
```

### Examples

```bash
# Find button by role and click
agent-browser find role button --exact click

# Find input by placeholder
agent-browser find placeholder "email" fill "test@example.com"

# Find link by text
agent-browser find text "Learn more"
```

Sources: [cli/src/commands.rs:150-200]()
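Text-based locators need a matching rule. A common convention, used for example by Playwright's text locators, is case-insensitive substring matching on whitespace-normalized text by default, and exact comparison when an exact flag is given. The Rust function below implements that convention as a hypothetical model of `find text <value> [--exact]`; agent-browser's actual matching rules are not shown in the cited source.

```rust
/// Hypothetical text-matching rule for `find text <value> [--exact]`.
/// Whitespace is normalized in both modes; without --exact the match is a
/// case-insensitive substring test, with --exact the strings must be equal.
fn text_matches(element_text: &str, query: &str, exact: bool) -> bool {
    let normalize = |s: &str| s.split_whitespace().collect::<Vec<_>>().join(" ");
    let (text, query) = (normalize(element_text), normalize(query));
    if exact {
        text == query
    } else {
        text.to_lowercase().contains(&query.to_lowercase())
    }
}

fn main() {
    let element = "  Learn   more about pricing ";
    println!("substring: {}", text_matches(element, "learn more", false));
    println!("exact:     {}", text_matches(element, "Learn more", true));
}
```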

## Console Error Inspection

Retrieve JavaScript errors logged to the browser console.

### Get Errors

```bash
agent-browser errors
```

Returns a list of all console errors captured during the session.

### Console Monitoring

Enable or disable console message capture:

```bash
agent-browser console enable
agent-browser console disable
```

## Snapshot-Based Inspection

Snapshots provide a hierarchical view of the page DOM with element references.

### Snapshot Modes

| Flag | Description |
|------|-------------|
| `-i` | Interactive elements only (preferred) |
| `-u` | Include href URLs on links |
| `-c` | Compact mode (no empty structural nodes) |
| `-d <n>` | Cap depth at n levels |
| `-s <selector>` | Scope to CSS selector |
| `--json` | Machine-readable JSON output |

### Snapshot Output Format

```
Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"
```

### Snapshot Workflow

```mermaid
graph TD
    A[Open Page] --> B[Snapshot -i]
    B --> C[Parse Element Refs]
    C --> D[Click @e3]
    D --> E[Snapshot -i]
    E --> F[Find Input Fields]
    F --> G["Fill @e3 #quot;email#quot;"]
    G --> H["Fill @e4 #quot;password#quot;"]
    H --> I[Click @e5]
```

Sources: [skill-data/core/SKILL.md:1-80]()
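The numbering of refs in snapshot output can be illustrated with a depth-first walk that hands out the next `@eN` to each node selected for output (here, nodes flagged `interactive`). The `Node` type, the `el` constructor and the selection rule are hypothetical. The point the sketch makes is that refs are assigned fresh on every snapshot, which is why they must be refreshed after navigation.

```rust
/// Minimal tree node for illustration (hypothetical type).
struct Node {
    role: &'static str,
    name: &'static str,
    interactive: bool,
    children: Vec<Node>,
}

fn el(role: &'static str, name: &'static str, interactive: bool, children: Vec<Node>) -> Node {
    Node { role, name, interactive, children }
}

/// Depth-first walk: each interactive node gets the next `@eN` ref in document order.
/// Indentation reflects how many ref'd ancestors a node has.
fn assign_refs(node: &Node, depth: usize, next: &mut usize, out: &mut Vec<String>) {
    if node.interactive {
        *next += 1;
        let label = if node.name.is_empty() { String::new() } else { format!(" \"{}\"", node.name) };
        out.push(format!("{}@e{} [{}]{}", "  ".repeat(depth), next, node.role, label));
    }
    let child_depth = depth + usize::from(node.interactive);
    for child in &node.children {
        assign_refs(child, child_depth, next, out);
    }
}

fn main() {
    let page = el("main", "", false, vec![
        el("link", "Home", true, vec![]),
        el("form", "", false, vec![
            el("textbox", "Email", true, vec![]),
            el("button", "Send", true, vec![]),
        ]),
    ]);
    let (mut next, mut out) = (0, Vec::new());
    assign_refs(&page, 0, &mut next, &mut out);
    println!("{}", out.join("\n"));
}
```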

## Complete Command Reference

### State Inspection Summary

| Command | Category | Description |
|---------|----------|-------------|
| `cookies get` | Cookie | List all cookies |
| `cookies set --name X --value Y` | Cookie | Set a cookie |
| `cookies clear` | Cookie | Clear all cookies |
| `storage_get <type> <key>` | Storage | Get storage value |
| `storage_set <type> <key> <val>` | Storage | Set storage value |
| `storage_clear <type>` | Storage | Clear storage |
| `state save <path>` | Session | Save browser state |
| `state load <path>` | Session | Load browser state |
| `state list` | Session | List saved states |
| `state clean [--days <n>]` | Session | Clean old states |
| `errors` | Console | Get console errors |
| `get text @eN` | Element | Get element text |
| `get attr @eN <attr>` | Element | Get attribute |
| `is visible @eN` | Element | Check visibility |
| `count <selector>` | Element | Count elements |

Sources: [cli/src/native/actions.rs:70-100]()

## Usage Patterns

### Inspecting Page State

```bash
# Full page inspection workflow
agent-browser open https://example.com
agent-browser snapshot -i           # Get element refs
agent-browser get title             # Page title
agent-browser get url               # Current URL
agent-browser errors                # Check for console errors
```

### Verifying Element State

```bash
agent-browser click @e1             # Click element
agent-browser wait 500             # Wait for response
agent-browser is visible @e2        # Verify visibility
agent-browser get text @e3          # Get text content
```

### Persisting Session State

```bash
agent-browser open https://app.example.com
agent-browser cookies set --name session --value abc123
agent-browser storage_set local user "john"
agent-browser state save ./my-session   # Persist state
# Later...
agent-browser state load ./my-session   # Restore state
```

## Summary

State Inspection Commands in agent-browser provide comprehensive capabilities for examining and managing browser state:

- **Cookie Management**: Full CRUD operations on HTTP cookies with file import support
- **Web Storage**: Access to localStorage and sessionStorage
- **Session Persistence**: Save, load, list, and clean browser sessions
- **Element Inspection**: Query text, attributes, states, and styles
- **Element Location**: Find elements by role, text, label, placeholder, and other attributes
- **Console Monitoring**: Capture and retrieve JavaScript errors

These commands work together with the snapshot system to enable precise browser automation workflows with full state observability.

---

<a id='browser-engines'></a>

## Browser Engine Integration

### Related Pages

Related topics: [Daemon and CDP Protocol](#daemon-and-cdp), [Installation Guide](#installation-guide)


<details>
<summary>Related Context Files</summary>

The following context files provide indirect information about browser engine integration patterns:

- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [cli/src/output.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/output.rs)
- [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)
- [skill-data/core/references/commands.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md)
- [packages/dashboard/src/components/session-tree.tsx](https://github.com/vercel-labs/agent-browser/blob/main/packages/dashboard/src/components/session-tree.tsx)

</details>

# Browser Engine Integration

> **Note:** This page requires the actual source files for browser engine implementations (`lightpanda.rs`, `discovery.rs`, `webdriver/mod.rs`, `safari.rs`, `ios.rs`) which are not available in the current context. The following represents partial analysis based on indirect evidence.

## Architecture Overview

Based on the available context, agent-browser uses a Chrome DevTools Protocol (CDP) based approach for browser automation:

```
┌─────────────────┐     CDP/WebSocket      ┌─────────────────┐
│  agent-browser  │ ──────────────────────▶│  Chrome/Chromium│
│      CLI        │                        │    Browser      │
└─────────────────┘                        └─────────────────┘
        │
        ├── Session Management
        ├── Element Reference System (@e1, @e2, ...)
        └── Command Dispatch
```

## Supported Browser Contexts

| Context Type | Implementation | Protocol |
|--------------|----------------|----------|
| Chrome/Chromium | CDP Native | WebSocket |
| Electron | CDP Native | WebSocket |
| Remote Debugging | `--remote-debugging-port` | CDP |
| Safari (iOS) | WebDriver | W3C WebDriver |

## Session Management

Sessions are managed through port-based connections:

```typescript
// From session-tree.tsx
interface Session {
  port: number;
  session: string;
  provider?: string;
  pending?: boolean;
}
```

Sessions can be connected via:

```bash
agent-browser connect 9222
```

## Command Dispatch Architecture

The CLI uses a dispatch pattern for handling browser commands:

```rust
// From cli/src/native/actions.rs (partial)
match subcmd.as_str() {
    "click" => handle_click(cmd, state).await,
    "fill" => handle_fill(cmd, state).await,
    "snapshot" => handle_snapshot(cmd, state).await,
    "screenshot" => handle_screenshot(cmd, state).await,
    "get" => handle_get(cmd, state).await,
    // ... additional commands
}
```

## Browser Engine Providers

Based on the codebase structure, agent-browser supports multiple browser engine providers:

| Provider | File Reference | Purpose |
|----------|----------------|---------|
| Lightpanda | `lightpanda.rs` | Lightweight browser engine |
| Safari | `safari.rs` | macOS/iOS Safari via WebDriver |
| iOS | `ios.rs` | iOS WebKit via WebDriver |
| Chrome CDP | `discovery.rs` | Auto-discovery of Chrome instances |

## CDP Discovery Mechanism

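The `discovery.rs` module is expected to handle automatic detection of browser instances. Its source was not available for this page, so the responsibilities below are inferred: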

- Scans for Chrome/Chromium processes
- Identifies remote debugging ports
- Matches browser version compatibility
- Establishes WebSocket connections
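One standard discovery step is easy to show. A Chrome instance started with `--remote-debugging-port` serves `http://127.0.0.1:<port>/json/version`, whose JSON response includes a `webSocketDebuggerUrl` field that a CDP client connects to. The Rust sketch below builds that URL for a port (as in `agent-browser connect 9222`) and extracts the field from a sample response body. It uses naive string scanning instead of a JSON parser to stay dependency-free, and whether `discovery.rs` works this way is an assumption.

```rust
/// Discovery endpoint served by Chrome when started with --remote-debugging-port.
fn version_endpoint(port: u16) -> String {
    format!("http://127.0.0.1:{port}/json/version")
}

/// Extract `webSocketDebuggerUrl` from a /json/version response body.
/// Naive scan for illustration; a real client should use a JSON parser.
fn ws_url(body: &str) -> Option<&str> {
    let key = "\"webSocketDebuggerUrl\"";
    let after_key = &body[body.find(key)? + key.len()..];
    let value = after_key
        .trim_start()
        .strip_prefix(':')?
        .trim_start()
        .strip_prefix('"')?;
    Some(&value[..value.find('"')?])
}

fn main() {
    // Sample body in the shape Chrome returns (values are made up).
    let body = r#"{"Browser": "Chrome/125.0.0.0", "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/abc"}"#;
    println!("GET {}", version_endpoint(9222));
    println!("connect to {:?}", ws_url(body));
}
```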

## WebDriver Integration

For non-Chrome browsers, WebDriver protocols are used:

```bash
# Safari WebDriver
agent-browser set driver safari

# iOS WebDriver  
agent-browser set driver ios
```

## Session State Management

| State | Description |
|-------|-------------|
| Active | Currently connected and responsive |
| Pending | Connection in progress |
| Closed | Session terminated |

## Command Reference for Engine Interaction

```bash
# Connect to specific port
agent-browser connect <port>

# Session operations
agent-browser session new
agent-browser session list
agent-browser session close

# Engine-specific settings
agent-browser set viewport <width> <height>
agent-browser set device <device-name>
agent-browser set geo <lat> <lng>
agent-browser set offline [on|off]
```

## Limitations

This page cannot provide complete documentation for browser engine integration without access to:

- `cli/src/native/cdp/lightpanda.rs`
- `cli/src/native/cdp/discovery.rs`
- `cli/src/native/webdriver/mod.rs`
- `cli/src/native/webdriver/safari.rs`
- `cli/src/native/webdriver/ios.rs`

These files are required for accurate implementation details about:

- CDP command serialization/deserialization
- WebDriver protocol mapping
- Browser-specific quirks handling
- Session lifecycle management

---

**Sources:** [cli/src/native/actions.rs:1-20](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
**Sources:** [packages/dashboard/src/components/session-tree.tsx:1-50](https://github.com/vercel-labs/agent-browser/blob/main/packages/dashboard/src/components/session-tree.tsx)
**Sources:** [skill-data/core/references/commands.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md)

---

<a id='authentication'></a>

## Authentication and Session Persistence

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [cli/src/native/actions.rs](https://github.com/vercel-labs/agent-browser/blob/main/cli/src/native/actions.rs)
- [skill-data/core/references/authentication.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/authentication.md)
- [skill-data/core/references/session-management.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/session-management.md)
- [skill-data/core/SKILL.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/SKILL.md)
- [skill-data/core/references/commands.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/commands.md)
- [skill-data/core/references/snapshot-refs.md](https://github.com/vercel-labs/agent-browser/blob/main/skill-data/core/references/snapshot-refs.md)
</details>

# Authentication and Session Persistence

This page documents the authentication workflows and session persistence mechanisms in agent-browser, covering how to handle login flows, save/restore authenticated states, manage credentials securely, and persist browser sessions across runs.

## Overview

agent-browser provides multiple layers of authentication and session persistence:

1. **Credential Management** — Store and retrieve login credentials via an encrypted auth vault
2. **State Persistence** — Save and restore full browser state (cookies, localStorage, sessionStorage)
3. **Session Management** — Auto-save/restore named sessions without manual file handling
4. **Profile Persistence** — Use Chrome user data directories for full browser profile persistence

These mechanisms build on the core CDP (Chrome DevTools Protocol) browser automation layer, which agent-browser uses to read and restore authentication artifacts such as cookies and web storage.

Sources: [cli/src/native/actions.rs:action_dispatch](../blob/main/cli/src/native/actions.rs) (dispatch table)

---

## Architecture

### Credential and State Flow

```mermaid
graph TD
    User[User / Agent] -->|agent-browser auth save| Vault[Auth Vault<br/>~/.agent-browser/vault/]
    User -->|agent-browser state save| StateFile[State File<br/>JSON]
    User -->|--session-name| AutoSave[Auto-Save Location<br/>~/.agent-browser/sessions/]
    User -->|--profile| ChromeProfile[Chrome Profile Dir<br/>User Data Directory]
    
    Vault -->|auth login| Browser[Browser Instance]
    StateFile -->|state load| Browser
    AutoSave -->|auto-restore| Browser
    ChromeProfile -->|attach| Browser
    
    Browser -->|cookies, localStorage| StateFile
    Browser -->|cookies, localStorage| AutoSave
    Browser -->|full state| ChromeProfile
```

### Action Handler Dispatch

The `actions.rs` module dispatches authentication and persistence commands to dedicated handlers:

| Action | Handler | Purpose |
|--------|---------|---------|
| `auth_save` | `handle_auth_save` | Store credentials in vault |
| `auth_list` | `handle_credentials_list` | List saved credentials |
| `auth_delete` | `handle_credentials_delete` | Remove credential |
| `auth_show` | `handle_auth_show` | Display credential details |
| `auth_login` | `handle_auth_login` | Fill + submit login form |
| `state_save` | `handle_state_save` | Serialize browser state to JSON |
| `state_load` | `handle_state_load` | Restore browser state from JSON |
| `state_list` | `handle_state_list` | List saved state files |
| `cookies_get` | `handle_cookies_get` | Read cookies |
| `cookies_set` | `handle_cookies_set` | Write cookies |
| `cookies_clear` | `handle_cookies_clear` | Clear all cookies |
| `storage_get` | `handle_storage_get` | Read localStorage/sessionStorage |
| `storage_set` | `handle_storage_set` | Write to storage |
| `storage_clear` | `handle_storage_clear` | Clear storage |

Sources: [cli/src/native/actions.rs:action_dispatch](../blob/main/cli/src/native/actions.rs) (dispatch table lines 35-75)

---

## Authentication Vault

The auth vault provides secure credential storage at `~/.agent-browser/vault/`. It encrypts credentials at rest using an optional `AGENT_BROWSER_ENCRYPTION_KEY`.

### Saving Credentials

```bash
agent-browser auth save my-app --url https://app.example.com/login \
  --username user@example.com --password-stdin
# (type password, press Ctrl+D)
```

The vault stores credentials keyed by a friendly name (`my-app` in this example). The `--url` flag associates the credential with a specific login page for targeted retrieval.
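Because `--password-stdin` reads the password from standard input, a script can also supply it without an interactive prompt. The sketch below is a hypothetical helper, not part of agent-browser: it assumes the password is already in an environment variable named `APP_PASSWORD` (an illustrative name, for example set by a CI secret store) and pipes it in, so the password is never typed and never passed as a command-line argument.

```bash
#!/usr/bin/env bash
# Sketch: save a vault credential without an interactive password prompt.
# APP_PASSWORD is a hypothetical variable name; use whatever your secret store sets.
vault_save_from_env() {
  local name=$1 url=$2 username=$3
  # Abort with a clear message instead of saving an empty password.
  : "${APP_PASSWORD:?APP_PASSWORD must be set}"
  # printf is a bash builtin, so the password never appears in another
  # process's argument list; it reaches agent-browser only via stdin.
  printf '%s' "$APP_PASSWORD" |
    agent-browser auth save "$name" --url "$url" \
      --username "$username" --password-stdin
}
```

For example, `vault_save_from_env my-app https://app.example.com/login user@example.com` saves the same credential as the interactive command above.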

### Using Saved Credentials

```bash
agent-browser open https://app.example.com/login
agent-browser auth login my-app
```

The `auth login` command fills the username and password fields, clicks the submit button, and waits for navigation — all automatically using the saved credential set.

Source: [skill-data/core/references/authentication.md:auth_vault](../blob/main/skill-data/core/references/authentication.md) (auth vault section)

### Credential Management Commands

| Command | Description |
|---------|-------------|
| `agent-browser auth save <name>` | Save credentials to vault |
| `agent-browser auth login <name>` | Fill + submit login form |
| `agent-browser auth list` | List all stored credentials |
| `agent-browser auth show <name>` | Display credential details |
| `agent-browser auth delete <name>` | Remove credential from vault |

---

## Basic Login Flow

For sites without saved credentials, a manual login flow uses the snapshot-and-interact pattern:

```bash
# 1. Navigate to login page
agent-browser open https://app.example.com/login
agent-browser wait --load networkidle

# 2. Get interactive elements
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Sign In"

# 3. Fill and submit
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --url "**/dashboard"

# 4. Verify success
agent-browser get url  # Should be dashboard, not login
```

The `--load networkidle` wait condition pauses until network activity has settled, so the login form has most likely finished rendering before the script tries to fill its fields.

Source: [skill-data/core/references/authentication.md:basic_login_flow](../blob/main/skill-data/core/references/authentication.md) (basic login flow section)
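Step 4 of the flow can be automated so a script fails fast when the login did not succeed. The helper below is a hypothetical sketch, not an agent-browser command: it assumes `agent-browser get url` prints just the current URL, and it treats any URL still containing `/login` as a failed login, which matches the example app here but may not fit every site.

```bash
#!/usr/bin/env bash
# Sketch: fail if the browser is still on the login page after submitting.
# The "/login" marker is an assumption taken from the example URLs above.
assert_logged_in() {
  local url
  url=$(agent-browser get url) || return 1
  if [[ "$url" == */login* ]]; then
    echo "login failed: still at $url" >&2
    return 1
  fi
  echo "logged in: $url"
}
```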

---

## State Persistence

State persistence serializes the browser's authentication artifacts to a JSON file for reuse across runs.

### State File Format

```json
{
  "cookies": [...],
  "localStorage": {...},
  "sessionStorage": {...},
  "origins": [...]
}
```

The state file contains cookies, Web Storage API data (`localStorage` and `sessionStorage`), and origin permissions.

### Saving State

```bash
# After successful login, save state
agent-browser open https://app.example.com/login
# ... complete login flow ...
agent-browser state save ./auth-state.json
```
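Because the state file holds session cookies and tokens, it is worth restricting its permissions as soon as it is written. A minimal sketch with a hypothetical helper name, assuming `state save` writes the file at exactly the path you pass:

```bash
#!/usr/bin/env bash
# Sketch: save browser state, then make the file readable by its owner only.
save_state_private() {
  local path=$1
  agent-browser state save "$path" || return 1
  # The file contains auth cookies/tokens: drop group and other permissions.
  chmod 600 "$path"
}
```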

### Loading State

```bash
# Restore authenticated session
agent-browser state load ./auth-state.json
agent-browser open https://app.example.com/dashboard
# Already logged in
```

### Inline State Loading

You can also use `--state` to load state at browser launch:

```bash
agent-browser --state ./auth-state.json open https://app.example.com/dashboard
```

Source: [skill-data/core/references/session-management.md:state_file_contents](../blob/main/skill-data/core/references/session-management.md) (state file contents section)

### State Management Commands

| Command | Description |
|---------|-------------|
| `agent-browser state save <path>` | Serialize current state to file |
| `agent-browser state load <path>` | Restore state from file |
| `agent-browser state list` | List saved state files |

---

## Session Auto-Persistence

Named sessions (`--session-name`) provide automatic save/restore without explicit file handling. State is persisted to `~/.agent-browser/sessions/<name>.json` and restored automatically on subsequent runs.

### Basic Usage

```bash
# First run: login once
agent-browser --session-name twitter open https://twitter.com
# ... complete login flow ...
agent-browser close  # state saved automatically

# Subsequent runs: state auto-restored
agent-browser --session-name twitter open https://twitter.com
# Already authenticated
```

### Session Environment Variable

```bash
export AGENT_BROWSER_SESSION_NAME=my-app
agent-browser open https://app.example.com
# State auto-saved on close, auto-restored on launch
```

### Session vs State File Comparison

| Feature | `--session-name` | `--state <file>` |
|---------|------------------|------------------|
| File management | Automatic | Manual |
| Location | `~/.agent-browser/sessions/` | User-specified |
| Reuse across machines | No (local path) | Yes (if file is shared) |
| Saving | Automatic on `agent-browser close` | Explicit `state save <path>` |
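Since `agent-browser close` saves a named session's state rather than deleting it, forcing a fresh login means removing the saved file. Assuming the `~/.agent-browser/sessions/<name>.json` layout described above, a hypothetical helper might look like this:

```bash
#!/usr/bin/env bash
# Sketch: forget the auto-saved state of a --session-name session.
# Assumes state lives at ~/.agent-browser/sessions/<name>.json (see above).
reset_named_session() {
  local name=$1
  local file="$HOME/.agent-browser/sessions/$name.json"
  if [[ -f "$file" ]]; then
    rm -- "$file"
    echo "removed $file"
  else
    echo "no saved state for session '$name'"
  fi
}
```

Run it while the session is closed; otherwise the next `agent-browser close` would presumably save the state again.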

---

## Chrome Profile Persistence

For full browser profile persistence (cookies, IndexedDB, service workers, cache, extensions), use `--profile` to point agent-browser at a Chrome user data directory.

```bash
# First run: login once
agent-browser --profile ~/.myapp-profile open https://app.example.com/login
# ... complete login flow ...

# All subsequent runs: already authenticated
agent-browser --profile ~/.myapp-profile open https://app.example.com/dashboard
```

Different profiles isolate authentication between projects or test users:

```bash
agent-browser --profile ~/.profiles/admin open https://app.example.com
agent-browser --profile ~/.profiles/viewer open https://app.example.com
```

Or set via environment variable:

```bash
export AGENT_BROWSER_PROFILE=~/.myapp-profile
agent-browser open https://app.example.com/dashboard
```

Profile persistence is the most complete form of state preservation, but the profile directory can grow large due to cache and IndexedDB storage.

Source: [skill-data/core/references/authentication.md:persistent_profiles](../blob/main/skill-data/core/references/authentication.md) (persistent profiles section)
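To keep an eye on that growth, a script can check the profile's disk usage before launching. This is a generic shell sketch with a hypothetical helper name; the 500 MB default limit is an arbitrary example, not an agent-browser setting.

```bash
#!/usr/bin/env bash
# Sketch: warn when a Chrome profile directory exceeds a size limit.
# The 500 MB default is an arbitrary example threshold.
check_profile_size() {
  local dir=$1 limit_mb=${2:-500} kb
  # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
  kb=$(du -sk -- "$dir" | cut -f1)
  if (( kb > limit_mb * 1024 )); then
    echo "warning: $dir uses $((kb / 1024)) MB (limit ${limit_mb} MB)" >&2
    return 1
  fi
}
```

For example, `check_profile_size ~/.myapp-profile || echo "consider pruning the profile"` before `agent-browser --profile ~/.myapp-profile open ...`.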

---

## OAuth and SSO Flows

OAuth flows require handling browser redirects between the app and the identity provider:

```bash
# Start OAuth flow
agent-browser open https://app.example.com/auth/google

# Wait for redirect to Google
agent-browser wait --url "**/accounts.google.com**"
agent-browser snapshot -i

# Fill Google credentials
agent-browser fill @e1 "user@gmail.com"
agent-browser click @e2  # Next button
agent-browser wait 2000
agent-browser snapshot -i
agent-browser fill @e3 "password"
agent-browser click @e4  # Sign in

# Wait for redirect back
agent-browser wait --url "**/app.example.com**"

# Save authenticated state
agent-browser state save ./oauth-state.json
```

The `wait --url` pattern is essential for OAuth flows, as it pauses execution until the expected redirect occurs.

Source: [skill-data/core/references/authentication.md:oauth_sso_flows](../blob/main/skill-data/core/references/authentication.md) (OAuth / SSO flows section)
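After the redirect back, a script can double-check which host it actually landed on before saving state. The sketch below uses hypothetical helper names, assumes `agent-browser get url` prints only the current URL, and parses it with plain parameter expansion; `app.example.com` is the placeholder host from the example above.

```bash
#!/usr/bin/env bash
# Sketch: print the host part of the browser's current URL.
current_host() {
  local url
  url=$(agent-browser get url) || return 1
  url=${url#*://}             # drop the scheme, e.g. "https://"
  url=${url%%/*}              # drop the path
  url=${url%%\?*}             # drop a query string that follows the host directly
  printf '%s\n' "${url%%:*}"  # drop an explicit port
}

# Only save state once the OAuth round trip is back on the expected host.
save_if_back_on() {
  local expected=$1 path=$2 host
  host=$(current_host) || return 1
  [[ "$host" == "$expected" ]] || { echo "unexpected host: $host" >&2; return 1; }
  agent-browser state save "$path"
}
```

In the flow above, `save_if_back_on app.example.com ./oauth-state.json` could replace the final `state save` line.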

---

## Two-Factor Authentication

For 2FA flows, use `--headed` mode to show the browser window so the user can enter the code:

```bash
# Login with credentials
agent-browser open https://app.example.com/login --headed
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3

# Wait for user to complete 2FA manually (browser is visible)
echo "Complete 2FA in the browser window..."
agent-browser wait --url "**/dashboard" --timeout 120000

# Save state after 2FA
agent-browser state save ./2fa-state.json
```

The `--timeout 120000` gives the user up to 2 minutes to complete the 2FA challenge.

Source: [skill-data/core/references/authentication.md:two_factor_authentication](../blob/main/skill-data/core/references/authentication.md) (two-factor authentication section)
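`wait --timeout` takes milliseconds, which is easy to get wrong by a factor of 1000. A small wrapper (hypothetical helper names, built only from the commands shown above) can accept minutes instead and print the prompt to stderr:

```bash
#!/usr/bin/env bash
# Sketch: pause for a human step (e.g. 2FA) with a timeout given in minutes.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

wait_for_manual_step() {
  local url_pattern=$1 minutes=${2:-2}
  echo "Complete the step in the browser window (${minutes} min limit)..." >&2
  agent-browser wait --url "$url_pattern" --timeout "$(minutes_to_ms "$minutes")"
}
```

`wait_for_manual_step "**/dashboard" 2` issues the same `--timeout 120000` wait as the example above.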

---

## Cookie-Based Authentication

### Reading Cookies

```bash
agent-browser cookies get
```

### Setting Cookies Manually

```bash
agent-browser cookies set session_token "abc123xyz" --domain "app.example.com"
agent-browser open https://app.example.com/dashboard
```

### Importing Cookies from cURL

Import cookies exported from browser DevTools:

```bash
agent-browser cookies set --curl cookies.txt --domain example.com
```

The `--curl` flag auto-detects JSON, cURL dump, or bare Cookie header formats.
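If you only have individual name/value pairs, you can assemble the bare Cookie header form yourself. This sketch assumes that form is the standard `name=value; name2=value2` syntax of an HTTP `Cookie` header value (RFC 6265), without the `Cookie:` prefix; the helper name is hypothetical, so confirm the auto-detection against your agent-browser version.

```bash
#!/usr/bin/env bash
# Sketch: join name=value pairs into a bare Cookie header string.
cookie_header_from_pairs() {
  local out="" pair
  for pair in "$@"; do
    out+="${out:+; }$pair"   # add the "; " separator only after the first pair
  done
  printf '%s\n' "$out"
}
```

For example, `cookie_header_from_pairs session_token=abc123xyz theme=dark > cookies.txt` followed by `agent-browser cookies set --curl cookies.txt --domain example.com`.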

### Clearing Cookies

```bash
agent-browser cookies clear
```

---

## Web Storage API

### Reading Storage

```bash
agent-browser storage get --type localStorage
agent-browser storage get --type sessionStorage
```

### Setting Storage Values

```bash
agent-browser storage set --type localStorage --key "auth_token" --value "xyz789"
```

### Clearing Storage

```bash
agent-browser storage clear --type localStorage
agent-browser storage clear --type sessionStorage
```
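Clearing cookies and both storage areas together is a quick way to put the current session back into a logged-out state, for example before re-testing a login flow. A minimal sketch composed only of the commands above (the helper name is hypothetical); note that these commands cover cookies and Web Storage only, not everything a `--profile` directory holds, such as IndexedDB.

```bash
#!/usr/bin/env bash
# Sketch: clear cookies plus localStorage and sessionStorage in one call.
clear_auth_artifacts() {
  agent-browser cookies clear || return 1
  local type
  for type in localStorage sessionStorage; do
    agent-browser storage clear --type "$type" || return 1
  done
}
```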

---

## Authentication Reuse Pattern

A common pattern saves login state once and reuses it across multiple runs:

```bash
#!/bin/bash
set -euo pipefail  # stop on the first failed step or unset variable (e.g. USERNAME/PASSWORD)
STATE_FILE="/tmp/auth-state.json"

if [[ -f "$STATE_FILE" ]]; then
    agent-browser state load "$STATE_FILE"
    agent-browser open https://app.example.com/dashboard
else
    # Perform login
    agent-browser open https://app.example.com/login
    agent-browser snapshot -i
    agent-browser fill @e1 "$USERNAME"
    agent-browser fill @e2 "$PASSWORD"
    agent-browser click @e3
    agent-browser wait --load networkidle

    # Save for future use
    agent-browser state save "$STATE_FILE"
fi
```

This script checks for existing state, loads it if present, otherwise performs login and saves the state.

Source: [skill-data/core/references/session-management.md:authenticated_session_reuse](../blob/main/skill-data/core/references/session-management.md) (authenticated session reuse section)
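Saved state eventually goes stale when the site's session cookies expire, yet the script above would keep loading it. One option, assuming a maximum age you pick per site (the 12-hour default below is arbitrary), is to treat old files as missing:

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the state file exists and is younger than N minutes.
state_is_fresh() {
  local file=$1 max_age_min=${2:-720}   # 720 min = 12 h, an arbitrary default
  [[ -f "$file" ]] || return 1
  # find prints the file only if it was modified more than max_age_min ago.
  [[ -z "$(find "$file" -mmin +"$max_age_min")" ]]
}
```

In the reuse script, `if state_is_fresh "$STATE_FILE" 720; then` would replace the plain `-f` test, so an old file triggers a fresh login instead of loading expired cookies.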

---

## Best Practices

### 1. Name Sessions Semantically

```bash
# Good: Clear purpose
agent-browser --session github-auth open https://github.com
agent-browser --session docs-scrape open https://docs.example.com

# Avoid: Generic names
agent-browser --session s1 open https://github.com
```

### 2. Always Clean Up

```bash
# Close sessions when done
agent-browser --session auth close
agent-browser --session scrape close
```
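To make cleanup happen even when a step fails, a script can register the `close` call in an `EXIT` trap. A sketch using a hypothetical `run_with_session` helper:

```bash
#!/usr/bin/env bash
# Sketch: run a command, then always close the named session afterwards,
# even if the command fails. The subshell scopes the EXIT trap to this call.
run_with_session() {
  local session=$1
  shift
  (
    trap 'agent-browser --session "$session" close' EXIT
    "$@"
  )
}
```

For example, `run_with_session scrape agent-browser --session scrape get text body` closes the `scrape` session whether or not `get text` succeeds.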

### 3. Handle State Files Securely

```bash
# Don't commit state files (they contain auth tokens!)
# "*-state.json" matches auth-state.json, oauth-state.json, 2fa-state.json
printf '%s\n' '*-state.json' 'auth.json' >> .gitignore

# Delete after use
rm /tmp/auth-state.json
```

### 4. Use Auth Vault for Credentials

Store credentials in the vault instead of hardcoding in scripts:

```bash
# Secure
agent-browser auth login my-app

# Insecure (leaks to shell history)
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
```

### 5. Timeout Long Sessions

```bash
# Set timeout for automated scripts
timeout 60 agent-browser --session long-task get text body
```
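GNU `timeout` exits with status 124 when it had to stop the command, which lets a script tell a hang apart from an ordinary failure. (`timeout` ships with GNU coreutils; on macOS it is typically installed as `gtimeout` via Homebrew's `coreutils` package.) A sketch with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: run a command with a deadline and report when it was killed.
# GNU timeout returns 124 if the time limit was hit.
run_with_deadline() {
  local secs=$1 rc
  shift
  timeout "$secs" "$@"
  rc=$?
  if (( rc == 124 )); then
    echo "timed out after ${secs}s: $*" >&2
  fi
  return "$rc"
}
```

For example, `run_with_deadline 60 agent-browser --session long-task get text body`.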

Source: [skill-data/core/references/session-management.md:best_practices](../blob/main/skill-data/core/references/session-management.md) (best practices section)

---

## Environment Variables

| Variable | Description |
|----------|-------------|
| `AGENT_BROWSER_SESSION_NAME` | Default session name for auto-persistence |
| `AGENT_BROWSER_PROFILE` | Default Chrome profile directory |
| `AGENT_BROWSER_ENCRYPTION_KEY` | Encryption key for auth vault (32 bytes, hex-encoded as 64 characters) |
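A key in that format (32 random bytes written as 64 hex characters) can be generated with standard tools. This sketch reads from `/dev/urandom` via `od`; `openssl rand -hex 32` produces the same kind of output. The helper name is hypothetical. Keep the key somewhere safe, since credentials encrypted under one key cannot be decrypted with a different one.

```bash
#!/usr/bin/env bash
# Sketch: print 32 random bytes as 64 lowercase hex characters.
gen_vault_key() {
  # od: -An no offsets, -tx1 one hex byte per field, -N32 read 32 bytes.
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
  echo
}
```

For example, `export AGENT_BROWSER_ENCRYPTION_KEY="$(gen_vault_key)"`.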

---

## Quick Reference

```bash
# Save credentials to vault
agent-browser auth save myapp --url https://app.example.com/login --username user@example.com --password-stdin

# Use saved credentials
agent-browser open https://app.example.com/login
agent-browser auth login myapp

# Save/restore state
agent-browser state save ./auth.json
agent-browser state load ./auth.json

# Auto-persist session
agent-browser --session-name myapp open https://app.example.com
# State auto-saved on close, auto-restored on next run

# Full profile persistence
agent-browser --profile ~/.myapp-profile open https://app.example.com
```

---

## Doramagic Pitfall Log

Project: vercel-labs/agent-browser

Summary: 38 potential pitfalls found, 7 of them high/blocking. Highest priority: installation pitfall, source evidence: Chrome 147.0 crashes with "trap int3" when running in docker.

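## 1. Installation pitfall · Source evidence: Chrome 147.0 crashes with "trap int3" when running in docker

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue: Chrome 147.0 crashes with "trap int3" when running in docker
- Impact on users: May block installation or the first run.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_9045278ef5e043dcadccf9288477813c | https://github.com/vercel-labs/agent-browser/issues/1339 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.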

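## 2. Installation pitfall · Source evidence: Detected: Trojan:Win32/Posilod.EB!cl

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue: Detected: Trojan:Win32/Posilod.EB!cl
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_edbde732c7a7410e96ad0fa301e4222d | https://github.com/vercel-labs/agent-browser/issues/1281 | The source discussion mentions windows-related conditions; re-verify before installing or trying it out.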

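## 3. Configuration pitfall · Source evidence: snapshot -s <selector> produces duplicate elements when AX tree contains virtual nodes without backendDOMNodeId

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue: snapshot -s <selector> produces duplicate elements when AX tree contains virtual nodes without backendDOMNodeId
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_ab39b89d157047e897e771d2572dfcdd | https://github.com/vercel-labs/agent-browser/issues/1338 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.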

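## 4. Runtime pitfall · Source evidence: Feature Request: Chrome Extension-based Connection for Seamless Login State Reuse

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified runtime-related issue: Feature Request: Chrome Extension-based Connection for Seamless Login State Reuse
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_0896b0b429c641f0b93ca9dcbbee6db8 | https://github.com/vercel-labs/agent-browser/issues/1319 | The source discussion mentions macos-related conditions; re-verify before installing or trying it out.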

## 5. Security/permissions pitfall · Failure mode: security_permissions: Dashboard privileged POST routes should reject cross-origin requests

- Severity: high
- Evidence strength: source_linked
- Finding: Developers should check this security_permissions risk before relying on the project: Dashboard privileged POST routes should reject cross-origin requests
- Impact on users: Developers may expose sensitive permissions or credentials: Dashboard privileged POST routes should reject cross-origin requests
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Dashboard privileged POST routes should reject cross-origin requests. Context: Source discussion did not expose a precise runtime context.
- Safeguard: Do not recommend enabling privileged or credential-bearing paths until the source-backed risk is reviewed: https://github.com/vercel-labs/agent-browser/issues/1345
- Evidence: failure_mode_cluster:github_issue | fmev_bc39fa851aecda51d6ae79863b570093 | https://github.com/vercel-labs/agent-browser/issues/1345 | Dashboard privileged POST routes should reject cross-origin requests

## 6. Security/permissions pitfall · Failure mode: security_permissions: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission

- Severity: high
- Evidence strength: source_linked
- Finding: Developers should check this security_permissions risk before relying on the project: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission
- Impact on users: Developers may expose sensitive permissions or credentials: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission. Context: Source discussion did not expose a precise runtime context.
- Safeguard: Do not recommend enabling privileged or credential-bearing paths until the source-backed risk is reviewed: https://github.com/vercel-labs/agent-browser/issues/1365
- Evidence: failure_mode_cluster:github_issue | fmev_50f6336937705c962c78ed48a466eb98 | https://github.com/vercel-labs/agent-browser/issues/1365 | `--auto-connect` fails too quickly when Chrome asks for remote debugging permission

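## 7. Security/permissions pitfall · Source evidence: Support XDG Base Directory paths for agent-browser state, config, and installs

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue: Support XDG Base Directory paths for agent-browser state, config, and installs
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_a2b8bb7035dd44e0a9e97dc78186f3b2 | https://github.com/vercel-labs/agent-browser/issues/1361 | The source discussion mentions linux-related conditions; re-verify before installing or trying it out.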

## 8. Installation pitfall · Failure mode: installation: After failed close, subsequent open reports success but returns stale content from prior URL

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: After failed close, subsequent open reports success but returns stale content from prior URL
- Impact on users: Developers may fail before the first successful local run: After failed close, subsequent open reports success but returns stale content from prior URL
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: After failed close, subsequent open reports success but returns stale content from prior URL. Context: Observed when using node, python, linux
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_fce1ca55e45e13ba327a52473c958037 | https://github.com/vercel-labs/agent-browser/issues/1367 | After failed close, subsequent open reports success but returns stale content from prior URL

## 9. Installation pitfall · Failure mode: installation: Chrome 147.0 crashes with "trap int3" when running in docker

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Chrome 147.0 crashes with "trap int3" when running in docker
- Impact on users: Developers may fail before the first successful local run: Chrome 147.0 crashes with "trap int3" when running in docker
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Chrome 147.0 crashes with "trap int3" when running in docker. Context: Observed when using docker, windows, linux
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_de7dc45e4f45905d10cb44680cd26da5 | https://github.com/vercel-labs/agent-browser/issues/1339 | Chrome 147.0 crashes with "trap int3" when running in docker, failure_mode_cluster:github_issue | fmev_e97d2c4c42c663165c2763023d5d79e3 | https://github.com/vercel-labs/agent-browser/issues/1339 | Chrome 147.0 crashes with "trap int3" when running in docker

## 10. Installation pitfall · Failure mode: installation: Detected: Trojan:Win32/Posilod.EB!cl

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Detected: Trojan:Win32/Posilod.EB!cl
- Impact on users: Developers may fail before the first successful local run: Detected: Trojan:Win32/Posilod.EB!cl
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Detected: Trojan:Win32/Posilod.EB!cl. Context: Observed when using windows
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_11d6daa01783b3f8d6cc4984b34591d9 | https://github.com/vercel-labs/agent-browser/issues/1281 | Detected: Trojan:Win32/Posilod.EB!cl

## 11. Installation pitfall · Failure mode: installation: Feature: `network throttle` for emulating slow connections / per-URL delay

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Feature: `network throttle` for emulating slow connections / per-URL delay
- Impact on users: Developers may fail before the first successful local run: Feature: `network throttle` for emulating slow connections / per-URL delay
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature: `network throttle` for emulating slow connections / per-URL delay. Context: Observed during installation or first-run setup.
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_af068ec0790d0398008062aef7b5d1a5 | https://github.com/vercel-labs/agent-browser/issues/1372 | Feature: `network throttle` for emulating slow connections / per-URL delay

## 12. Installation pitfall · Failure mode: installation: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills
- Impact on users: Developers may fail before the first successful local run: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills. Context: Observed when using node, playwright, windows
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_1ea0ed85aeff64de383d8fa15586474d | https://github.com/vercel-labs/agent-browser/issues/1351 | High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills

## 13. Installation pitfall · Failure mode: installation: Support XDG Base Directory paths for agent-browser state, config, and installs

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Support XDG Base Directory paths for agent-browser state, config, and installs
- Impact on users: Developers may fail before the first successful local run: Support XDG Base Directory paths for agent-browser state, config, and installs
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Support XDG Base Directory paths for agent-browser state, config, and installs. Context: Observed when using linux
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_5bd8394953d7b9c8f00eade661671801 | https://github.com/vercel-labs/agent-browser/issues/1361 | Support XDG Base Directory paths for agent-browser state, config, and installs

## 14. Installation pitfall · Failure mode: installation: Windows 11: --headed not surfacing window when invoked from non-TTY context (PowerShell -File...

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Windows 11: --headed not surfacing window when invoked from non-TTY context (PowerShell -File via bash 2>&1)
- Impact on users: Developers may fail before the first successful local run: Windows 11: --headed not surfacing window when invoked from non-TTY context (PowerShell -File via bash 2>&1)
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Windows 11: --headed not surfacing window when invoked from non-TTY context (PowerShell -File via bash 2>&1). Context: Observed when using node, python, windows
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_8b48c64a7c8bd1d363fa81928818b489 | https://github.com/vercel-labs/agent-browser/issues/1348 | Windows 11: --headed not surfacing window when invoked from non-TTY context (PowerShell -File via bash 2>&1)

## 15. Installation pitfall · Failure mode: installation: v0.27.0

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: v0.27.0
- Impact on users: Upgrade or migration may change expected behavior: v0.27.0
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: v0.27.0. Context: Observed when using node
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_release | fmev_c5cd290a3adea233428e19624c61cbc4 | https://github.com/vercel-labs/agent-browser/releases/tag/v0.27.0 | v0.27.0

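## 16. Installation pitfall · Source evidence: After failed close, subsequent open reports success but returns stale content from prior URL

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue: After failed close, subsequent open reports success but returns stale content from prior URL
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_20b2fb27e3744303957ee3b14657c6fb | https://github.com/vercel-labs/agent-browser/issues/1367 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.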

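## 17. Installation pitfall · Source evidence: Feature: `network throttle` for emulating slow connections / per-URL delay

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue: Feature: `network throttle` for emulating slow connections / per-URL delay
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the documentation must state the applicable version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_36122bb7094447a7a9b2239bd9c771d7 | https://github.com/vercel-labs/agent-browser/issues/1372 | Unverified usage condition surfaced by a github_issue source.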

## 18. Configuration pitfall · Failure mode: configuration: ERR_NO_SUPPORTED_PROXIES when proxy environment variables contain trailing slash

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: ERR_NO_SUPPORTED_PROXIES when proxy environment variables contain trailing slash
- Impact on users: Developers may misconfigure credentials, environment, or host setup: ERR_NO_SUPPORTED_PROXIES when proxy environment variables contain trailing slash
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: ERR_NO_SUPPORTED_PROXIES when proxy environment variables contain trailing slash. Context: Observed when using linux
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_781294606dea03b40a16f8364175701c | https://github.com/vercel-labs/agent-browser/issues/1349 | ERR_NO_SUPPORTED_PROXIES when proxy environment variables contain trailing slash

## 19. Configuration pitfall · Failure mode: configuration: Orphaned headless Chrome Helpers spin at high CPU under agent-browser-chrome temp profile

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Orphaned headless Chrome Helpers spin at high CPU under agent-browser-chrome temp profile
- Impact on users: Developers may misconfigure credentials, environment, or host setup: Orphaned headless Chrome Helpers spin at high CPU under agent-browser-chrome temp profile
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Orphaned headless Chrome Helpers spin at high CPU under agent-browser-chrome temp profile. Context: Observed when using macos
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_74962e644ba9c7d489e3bdece3e2a4fc | https://github.com/vercel-labs/agent-browser/issues/1371 | Orphaned headless Chrome Helpers spin at high CPU under agent-browser-chrome temp profile

## 20. Configuration pitfall · Failure mode: configuration: Per-session /api/command should require same-origin or token auth

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Per-session /api/command should require same-origin or token auth
- Impact on users: Developers may misconfigure credentials, environment, or host setup: Per-session /api/command should require same-origin or token auth
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Per-session /api/command should require same-origin or token auth. Context: Source discussion did not expose a precise runtime context.
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_6007658713bbd7305ceaadde537b784e | https://github.com/vercel-labs/agent-browser/issues/1344 | Per-session /api/command should require same-origin or token auth

## 21. Configuration pitfall · Failure mode: configuration: Support enabling WebAuthn for passkey authentication with a virtual authenticator

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Support enabling WebAuthn for passkey authentication with a virtual authenticator
- Impact on users: Developers may misconfigure credentials, environment, or host setup: Support enabling WebAuthn for passkey authentication with a virtual authenticator
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Support enabling WebAuthn for passkey authentication with a virtual authenticator. Context: Source discussion did not expose a precise runtime context.
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_6d37cbd32f509cb166bf7a0928a6b0b6 | https://github.com/vercel-labs/agent-browser/issues/688 | Support enabling WebAuthn for passkey authentication with a virtual authenticator

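## 22. Configuration pitfall · Source evidence: ERR_NO_SUPPORTED_PROXIES when proxy environment variables contain trailing slash

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue: ERR_NO_SUPPORTED_PROXIES when proxy environment variables contain trailing slash
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_32ddde976ec0445da607d0adffc5df4c | https://github.com/vercel-labs/agent-browser/issues/1349 | The source discussion mentions linux-related conditions; re-verify before installing or trying it out.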

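## 23. Configuration pitfall · Source evidence: Orphaned headless Chrome Helpers spin at high CPU under agent-browser-chrome temp profile

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue: Orphaned headless Chrome Helpers spin at high CPU under agent-browser-chrome temp profile
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source issue is still open; Pack Agent should re-verify whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_1f2a7d9ece1a4bb7bd3d903998370d73 | https://github.com/vercel-labs/agent-browser/issues/1371 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.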

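## 24. Configuration pitfall · Source evidence: `--cdp` eval/open silently target a secondary execution context when Chromium DevTools is open on the same target

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue: `--cdp` eval/open silently target a secondary execution context when Chromium DevTools is open on the same target
- Impact on users: May raise the cost of evaluation for new users and of production adoption.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the documentation must state the applicable version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_6ba5505f7fb14ad7a3bc2b5b88a0b59b | https://github.com/vercel-labs/agent-browser/issues/1373 | The source discussion mentions python-related conditions; re-verify before installing or trying it out.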

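## 25. Capability pitfall · Capability assessment depends on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- Impact on users: If the assumption does not hold, users do not get the promised capability.
- Recommended check: Turn the assumption into a downstream validation checklist.
- Safeguard: Assumptions must become validation items; do not state them as facts until validation results exist.
- Evidence: capability.assumptions | github_repo:1132001614 | https://github.com/vercel-labs/agent-browser | README/documentation is current enough for a first validation pass.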

## 26. Runtime pitfall · Failure mode: runtime: `--cdp` eval/open silently target a secondary execution context when Chromium DevTools is ope...

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: `--cdp` eval/open silently target a secondary execution context when Chromium DevTools is open on the same target
- Impact on users: Developers may hit a documented source-backed failure mode: `--cdp` eval/open silently target a secondary execution context when Chromium DevTools is open on the same target
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: `--cdp` eval/open silently target a secondary execution context when Chromium DevTools is open on the same target. Context: Observed when using python, macos
- Safeguard: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_373b5edde1d9ebc3a89e256d7531b186 | https://github.com/vercel-labs/agent-browser/issues/1373 | `--cdp` eval/open silently target a secondary execution context when Chromium DevTools is open on the same target

## 27. Maintenance pitfall · Failure mode: migration: Harden inspect-mode DevTools WebSocket handshakes

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this migration risk before relying on the project: Harden inspect-mode DevTools WebSocket handshakes
- Impact on users: Developers may hit a documented source-backed failure mode: Harden inspect-mode DevTools WebSocket handshakes
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Harden inspect-mode DevTools WebSocket handshakes. Context: observed during a version upgrade or migration.
- Safeguard: State this as source-backed community evidence, not as a Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_f6db656b77e427d890ba72a1ff380949 | https://github.com/vercel-labs/agent-browser/issues/1347 | Harden inspect-mode DevTools WebSocket handshakes

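## 28. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed was not recorded.
- Impact on users: New, abandoned, and actively maintained projects get lumped together, which lowers confidence in the recommendation.
- Recommended check: Add signals for recent GitHub commits, releases, and issue/PR responses.
- Safeguard: While maintenance activity is unknown, the recommendation must not be marked as high-trust.
- Evidence: evidence.maintainer_signals | github_repo:1132001614 | https://github.com/vercel-labs/agent-browser | last_activity_observed missing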

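## 29. Security/permissions pitfall · Downstream validation flagged risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: Downstream has already requested a review, so this must not be played down on the page.
- Recommended check: Add to the security/permissions governance review queue.
- Safeguard: While downstream risks remain, the review/recommendation must stay downgraded.
- Evidence: downstream_validation.risk_items | github_repo:1132001614 | https://github.com/vercel-labs/agent-browser | no_demo; severity=medium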

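## 30. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: The risk affects whether the project is suitable for ordinary users to install.
- Recommended check: Record the risk on the boundary card and confirm whether a manual review is needed.
- Safeguard: Scoring risks must appear on the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:1132001614 | https://github.com/vercel-labs/agent-browser | no_demo; severity=medium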

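## 31. Security/permissions pitfall · Source evidence: Dashboard privileged POST routes should reject cross-origin requests

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Dashboard privileged POST routes should reject cross-origin requests
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source issue is still open; Pack Agent must re-check whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable versions and review status.
- Evidence: community_evidence:github | cevd_6acd97eb554140c28938a0eb08e44c34 | https://github.com/vercel-labs/agent-browser/issues/1345 | Unverified usage conditions surfaced by a github_issue source.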
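The issue title asks that privileged dashboard POST routes reject cross-origin requests. A common way to do this is to compare the request's `Origin` header against an allowlist of the dashboard's own origins before any state-changing handler runs. The framework-agnostic Rust sketch below illustrates that check; the function name `is_allowed_origin`, the example origin, and the choice to reject requests with no `Origin` header are all assumptions for illustration, not agent-browser's code or the fix adopted for issue #1345.

```rust
/// Hypothetical guard for privileged POST routes: allow the request only if
/// its `Origin` header exactly matches one of the allowed origins
/// (scheme and host compared case-insensitively).
fn is_allowed_origin(origin_header: Option<&str>, allowed: &[&str]) -> bool {
    match origin_header {
        // "null" is what browsers send for opaque origins (sandboxed iframes,
        // file:// pages). A missing header is also rejected: a strict choice.
        Some("null") | None => false,
        Some(origin) => allowed.iter().any(|a| a.eq_ignore_ascii_case(origin.trim())),
    }
}

fn main() {
    // Hypothetical dashboard origin; substitute the real host and port.
    let allowed = ["http://localhost:4848"];
    for origin in [Some("http://localhost:4848"), Some("https://evil.example"), None] {
        println!("{:?} -> allowed: {}", origin, is_allowed_origin(origin, &allowed));
    }
}
```

Rejecting a missing `Origin` may break non-browser clients such as `curl`; a real deployment would need to decide whether those callers authenticate some other way.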

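## 32. Security/permissions pitfall · Source evidence: Harden inspect-mode DevTools WebSocket handshakes

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Harden inspect-mode DevTools WebSocket handshakes
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source issue is still open; Pack Agent must re-check whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable versions and review status.
- Evidence: community_evidence:github | cevd_ab6c062eedaf466d8f40864ca24bf8ea | https://github.com/vercel-labs/agent-browser/issues/1347 | Unverified usage conditions surfaced by a github_issue source.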

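## 33. Security/permissions pitfall · Source evidence: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source issue is still open; Pack Agent must re-check whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable versions and review status.
- Evidence: community_evidence:github | cevd_648ff78f18f34d51a44b9176d011738f | https://github.com/vercel-labs/agent-browser/issues/1351 | The source discussion mentions Node-related conditions; re-verify before installing or trialing.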

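## 34. Security/permissions pitfall · Source evidence: Support enabling WebAuthn for passkey authentication with a virtual authenticator

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Support enabling WebAuthn for passkey authentication with a virtual authenticator
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source suggests that a fix, workaround, or version change may already exist, so the manual must state which versions this applies to.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable versions and review status.
- Evidence: community_evidence:github | cevd_3a4a36591a7e45c1b85d35b020e63d5a | https://github.com/vercel-labs/agent-browser/issues/688 | Unverified usage conditions surfaced by a github_issue source.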
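For context on what the request involves at the protocol level: Chrome DevTools Protocol has a `WebAuthn` domain in which `WebAuthn.enable` followed by `WebAuthn.addVirtualAuthenticator` (with `options` such as `protocol`, `transport`, `hasResidentKey`, `hasUserVerification`, `isUserVerified`) attaches a software authenticator to a page target. The Rust sketch below only builds those two raw JSON messages as strings; the helper name `webauthn_setup_messages`, the message ids, and the option values (a CTAP2 platform authenticator that can store passkeys) are illustrative assumptions, and nothing here describes whether or how agent-browser exposes this feature.

```rust
/// Hypothetical helper: builds the raw CDP messages that enable the WebAuthn
/// domain and add a virtual CTAP2 "internal" (platform) authenticator that
/// supports resident keys (passkeys) and reports the user as verified.
/// These would be sent on the page target's CDP session, in this order.
fn webauthn_setup_messages(first_id: u64) -> [String; 2] {
    let enable = format!(
        r#"{{"id":{},"method":"WebAuthn.enable","params":{{}}}}"#,
        first_id
    );
    let add = format!(
        r#"{{"id":{},"method":"WebAuthn.addVirtualAuthenticator","params":{{"options":{{"protocol":"ctap2","transport":"internal","hasResidentKey":true,"hasUserVerification":true,"isUserVerified":true}}}}}}"#,
        first_id + 1
    );
    [enable, add]
}

fn main() {
    for msg in webauthn_setup_messages(1) {
        println!("{msg}");
    }
}
```

A real client would use a JSON library rather than hand-written format strings; they are used here only to keep the sketch dependency-free.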

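## 35. Security/permissions pitfall · Source evidence: Windows 11: --headed not surfacing window when invoked from non-TTY context (PowerShell -File via bash 2>&1)

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Windows 11: --headed not surfacing window when invoked from non-TTY context (PowerShell -File via bash 2>&1)
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source issue is still open; Pack Agent must re-check whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable versions and review status.
- Evidence: community_evidence:github | cevd_fda08c46f8b5454e8e93b061d6d3c992 | https://github.com/vercel-labs/agent-browser/issues/1348 | The source discussion mentions npm-related conditions; re-verify before installing or trialing.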

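## 36. Security/permissions pitfall · Source evidence: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission
- Impact on users: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source issue is still open; Pack Agent must re-check whether it still affects the current version.
- Safeguard: Do not inflate this into a definitive conclusion detached from the source link; state the applicable versions and review status.
- Evidence: community_evidence:github | cevd_21be9bb1198543e1839dd312b41a3f3c | https://github.com/vercel-labs/agent-browser/issues/1365 | Unverified usage conditions surfaced by a github_issue source.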
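The issue reports that `--auto-connect` gives up before the user has had time to answer Chrome's remote-debugging permission prompt. A generic mitigation for this kind of race is to keep retrying the connection attempt until an overall deadline passes, rather than failing on the first refusal. The standard-library-only Rust sketch below shows that pattern; `connect_with_deadline`, the timeout values, and the simulated prompt are hypothetical and do not reflect how agent-browser implements `--auto-connect`.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `attempt` until it succeeds or the overall
/// `deadline` would be exceeded, sleeping `interval` between tries.
/// Returns the last error if the deadline passes without success.
fn connect_with_deadline<T, E>(
    mut attempt: impl FnMut() -> Result<T, E>,
    deadline: Duration,
    interval: Duration,
) -> Result<T, E> {
    let start = Instant::now();
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            // Give up only if another sleep would overrun the deadline.
            Err(e) if start.elapsed() + interval >= deadline => return Err(e),
            Err(_) => sleep(interval),
        }
    }
}

fn main() {
    // Simulate a permission prompt that the user accepts on the third poll.
    let mut polls = 0;
    let result = connect_with_deadline(
        || {
            polls += 1;
            if polls < 3 { Err("permission pending") } else { Ok("connected") }
        },
        Duration::from_secs(2),
        Duration::from_millis(50),
    );
    println!("{result:?} after {polls} polls");
}
```

The deadline should be long enough for a human to notice and accept the prompt; a fixed-interval poll is used here for simplicity, though exponential backoff would work the same way.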

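## 37. Maintenance pitfall · Unknown issue/PR response quality

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- Impact on users: Users cannot tell whether anyone will respond when they run into a problem.
- Recommended check: Sample recent issues/PRs to see whether they go unaddressed for long periods.
- Safeguard: While issue/PR responsiveness is unknown, the maintenance risk must be flagged.
- Evidence: evidence.maintainer_signals | github_repo:1132001614 | https://github.com/vercel-labs/agent-browser | issue_or_pr_quality=unknown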

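## 38. Maintenance pitfall · Unclear release cadence

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- Impact on users: Install commands and documentation may lag behind the code, making it more likely that users hit pitfalls.
- Recommended check: Confirm that the latest release/tag matches the install commands in the README.
- Safeguard: When the release cadence is unknown or stale, installation instructions must be flagged as possibly out of date.
- Evidence: evidence.maintainer_signals | github_repo:1132001614 | https://github.com/vercel-labs/agent-browser | release_recency=unknown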

<!-- canonical_name: vercel-labs/agent-browser; human_manual_source: deepwiki_human_wiki -->
