agent-browser Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

agent-browser

Introduction to Agent Browser

Related topics: Installation Guide, Architecture Overview

Agent Browser is a high-performance, native Rust CLI tool designed for browser automation and AI agent integration. Unlike traditional browser automation frameworks that rely on Node.js wrappers or third-party libraries, Agent Browser communicates directly with Chrome/Chromium via the Chrome DevTools Protocol (CDP), providing a lightweight and reliable solution for web interaction tasks.

Overview

Agent Browser serves as a bridge between AI agents and web browsers, enabling autonomous web navigation, interaction, and data extraction. It is compatible with a wide range of AI agent platforms including Cursor, Claude Code, Codex, Continue, and Windsurf.

Aspect	Description
Language	Rust (native CLI)
Protocol	Chrome DevTools Protocol (CDP)
Dependencies	No Playwright or Puppeteer dependency
Platform	Chrome/Chromium
License	See repository LICENSE

Sources: skills/agent-browser/SKILL.md

Architecture

Agent Browser follows a modular architecture with distinct layers for CLI handling, native browser control, and extensible skills.

graph TD
    A[User / AI Agent] --> B[CLI Layer<br/>Rust Commands]
    B --> C[Native Actions Layer<br/>CDP Dispatcher]
    C --> D[Chrome/Chromium<br/>via CDP]
    
    E[Skills System] --> B
    E --> F[Core Skills]
    E --> G[Specialized Skills]
    
    G --> G1[Electron Apps]
    G --> G2[Slack Workspace]
    G --> G3[Exploratory Testing]
    G --> G4[Cloud Providers]
    
    H[Session Management] --> C
    H --> H1[Auth Vault]
    H --> H2[State Persistence]
    H --> H3[Video Recording]

Sources: skill-data/core/SKILL.md, skills/agent-browser/SKILL.md

Core Concepts

Accessibility-Tree Snapshots

Agent Browser generates accessibility-tree snapshots that provide structured, human-readable representations of web pages. Each interactive element receives a unique reference ID (e.g., @e1, @e2) that can be used for subsequent interactions.

Example snapshot output:

Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"

Sources: skill-data/core/references/snapshot-refs.md, skill-data/core/SKILL.md
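
A script that consumes a snapshot often needs to know which refs belong to which kind of element. The following is a minimal sketch that groups the refs from the example above by the first word inside the brackets. It assumes the text layout matches the example exactly; `refs_by_role` is a hypothetical helper, not part of agent-browser, and real output may differ between versions.

```python
import re
from collections import defaultdict

# Captures the ref and the first word inside the brackets,
# e.g. '  @e3 [input type="email"] ...' gives ('@e3', 'input').
REF_LINE = re.compile(r'^\s*(@e\d+)\s+\[(\w+)')

def refs_by_role(snapshot_text):
    """Group refs by the role/tag word that follows them."""
    grouped = defaultdict(list)
    for line in snapshot_text.splitlines():
        match = REF_LINE.match(line)
        if match:
            ref, role = match.groups()
            grouped[role].append(ref)
    return dict(grouped)

SNAPSHOT = '''Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"
'''

print(refs_by_role(SNAPSHOT)["input"])  # ['@e3', '@e4']
```

Header lines such as `Page:` and `URL:` do not start with a ref and are skipped.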

Element Reference Notation

Element references follow a consistent notation pattern:

@e1 [tag attribute="value"] "text content" placeholder="hint"

Component	Description
`@e1`	Unique reference ID
`tag`	Element role or HTML tag name (e.g. `heading`, `input`)
`attribute="value"`	Key attributes
`"text content"`	Visible text
`placeholder="hint"`	Additional attributes

Sources: skill-data/core/references/snapshot-refs.md

Command Reference

Navigation Commands

Command	Description
`agent-browser open [url]`	Launch browser with optional navigation
`agent-browser back`	Navigate backward
`agent-browser forward`	Navigate forward
`agent-browser reload`	Reload current page
`agent-browser close`	Close browser
`agent-browser connect <port>`	Connect to existing browser via CDP

Sources: skill-data/core/references/commands.md

Interaction Commands

Command	Description
`agent-browser click <ref>`	Click an element
`agent-browser fill <ref> <text>`	Type text into input
`agent-browser select <ref> <value>`	Select dropdown option
`agent-browser check <ref>`	Check a checkbox
`agent-browser scroll <direction> <pixels>`	Scroll page

Sources: cli/src/native/actions.rs

Data Retrieval Commands

Command	Description
`agent-browser snapshot [-i]`	Get page snapshot (interactive only with `-i`)
`agent-browser screenshot [path]`	Capture screenshot
`agent-browser get text <ref>`	Get visible text
`agent-browser get attr <ref> <name>`	Get attribute value
`agent-browser get url`	Get current URL
`agent-browser get title`	Get page title

Sources: cli/src/output.rs, cli/src/native/actions.rs

Network Control Commands

Command	Description
`agent-browser network route <url>`	Intercept network request
`agent-browser network unroute <url>`	Remove interception
`agent-browser network requests [--clear]`	View/clear network requests
`agent-browser network har <start|stop> [path]`	Capture HAR file

Sources: skill-data/core/references/commands.md, cli/src/output.rs

Cookies and Storage Commands

agent-browser cookies get           # View all cookies
agent-browser cookies set --url <url> --name <name> --value <val>
agent-browser cookies clear         # Clear all cookies
agent-browser storage local         # Manage localStorage
agent-browser storage session       # Manage sessionStorage

Sources: cli/src/output.rs

Browser Settings Commands

Command	Description
`agent-browser set viewport <w> <h>`	Set viewport size
`agent-browser set device <name>`	Emulate device
`agent-browser set geo <lat> <lng>`	Set geolocation
`agent-browser set offline on|off`	Toggle offline mode
`agent-browser set headers <json>`	Set custom headers
`agent-browser set media dark|light`	Set color scheme

Sources: cli/src/output.rs

Sessions and State Management

Agent Browser supports multiple concurrent browser sessions with state persistence.

graph LR
    A[Session A] --> B[State File A]
    C[Session B] --> D[State File B]
    E[Auth Vault] --> A
    E[Auth Vault] --> C

Key Features:

Named Sessions: --session <name> flag for multiple sessions
State Persistence: Save and restore browser state
Auth Vault: Secure credential storage
Video Recording: Capture browser activity

Sources: skill-data/core/SKILL.md, skills/agent-browser/SKILL.md

Skills System

Agent Browser uses an extensible skills system that provides specialized workflows for different environments.

Core Skills

agent-browser skills get core             # Core workflows and common patterns
agent-browser skills get core --full      # Include full command reference

Specialized Skills

Skill	Description	Command
Electron	Desktop app automation	`agent-browser skills get electron`
Slack	Workspace automation	`agent-browser skills get slack`
Dogfood	Exploratory testing/QA	`agent-browser skills get dogfood`
Vercel Sandbox	Cloud browser in microVMs	`agent-browser skills get vercel-sandbox`
AgentCore	AWS Bedrock cloud browsers	`agent-browser skills get agentcore`

Sources: skills/agent-browser/SKILL.md

React Developer Tools Integration

Agent Browser includes built-in React DevTools support for debugging React applications:

Command	Description
`agent-browser react_tree`	View React component tree
`agent-browser react_inspect`	Inspect component props/state
`agent-browser react_renders_start`	Track render counts
`agent-browser react_renders_stop`	Stop render tracking

Sources: cli/src/native/actions.rs, cli/src/react/suspense.rs

Suspense Boundary Analysis

Agent Browser can analyze React Suspense boundaries with actionability scoring:

Blocker Kind	Weight	Actionability
ClientHook	7	90%
RequestApi	6	88%
ServerFetch	5	82%
Cache	4	74%
Stream	3	60%
Unknown	2	35%
Framework	1	18%

Sources: cli/src/react/suspense.rs
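
To show how the table might be used when triaging several blockers, here is a small sketch that orders detected blocker kinds by weight. The values are copied from the table; treating a higher weight as "look at this first" is an assumption, since the source does not say how the CLI combines the two columns, and `rank_blockers` is a hypothetical helper.

```python
# kind -> (weight, actionability in percent), copied from the table above
BLOCKERS = {
    "ClientHook": (7, 90),
    "RequestApi": (6, 88),
    "ServerFetch": (5, 82),
    "Cache": (4, 74),
    "Stream": (3, 60),
    "Unknown": (2, 35),
    "Framework": (1, 18),
}

def rank_blockers(found_kinds):
    """Sort blocker kinds from highest to lowest weight (assumed priority)."""
    return sorted(found_kinds, key=lambda kind: BLOCKERS[kind][0], reverse=True)

print(rank_blockers(["Cache", "Framework", "ClientHook"]))
# ['ClientHook', 'Cache', 'Framework']
```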

Dashboard Interface

Agent Browser includes a web-based dashboard for visual browser management:

graph TD
    A[Dashboard] --> B[Controls Panel]
    A --> C[Result Panel]
    A --> D[Network Panel]
    A --> E[Extensions Panel]
    
    B --> B1[URL Input]
    B --> B2[Mode Selector]
    B --> B3[Action Controls]
    
    C --> C1[Screenshot View]
    C --> C2[Snapshot View]
    C --> C3[Step History]
    
    D --> D1[Request List]
    D --> D2[HAR Export]
    
    E --> E1[Extension List]
    E --> E2[Extension Details]

The dashboard is built with React and supports:

Resizable panels for flexible layouts
Theme switching (light/dark)
Mobile-responsive design
Real-time step history

Sources: examples/environments/app/page.tsx, packages/dashboard/src/components/network-panel.tsx, packages/dashboard/src/components/extensions-panel.tsx

Best Practices

1. Always Snapshot Before Interacting

# CORRECT - Snapshot first to get refs
agent-browser open https://example.com
agent-browser snapshot -i          # Get refs first
agent-browser click @e1            # Use ref

# WRONG - Ref doesn't exist yet
agent-browser open https://example.com
agent-browser click @e1            # Will fail!

2. Re-Snapshot After Navigation

Element references change when the page navigates. Always take a new snapshot after clicking links or navigating to new pages.

3. Use Sessions for Complex Workflows

agent-browser --session my-session open https://example.com
agent-browser --session my-session snapshot -i
# ... perform actions ...
agent-browser --session my-session close

Sources: skill-data/core/references/snapshot-refs.md
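
When a workflow is driven from a script, repeating the `--session` flag by hand is easy to get wrong. The sketch below builds the argument lists for the session workflow shown above. `session_command` is a hypothetical helper; it only assembles arguments and does not start a browser.

```python
import shlex

def session_command(session, *args):
    """Build the argv for one agent-browser call bound to a named session."""
    if not session:
        raise ValueError("session name must be non-empty")
    return ["agent-browser", "--session", session, *args]

workflow = [
    session_command("my-session", "open", "https://example.com"),
    session_command("my-session", "snapshot", "-i"),
    session_command("my-session", "close"),
]

for argv in workflow:
    print(shlex.join(argv))  # shell-safe rendering of each command
```

With the CLI installed, each list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the workflow.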

Installation and Setup

Prerequisites

Chrome or Chromium (a system install is optional; `agent-browser install` downloads Chrome from Chrome for Testing)
Operating system: macOS, Linux, or Windows

Installation

Refer to the repository's installation instructions for your platform. Agent Browser is distributed as a native binary with no runtime dependencies.

Configuration Files

File	Purpose
`~/.agent-browser/`	Default config directory
Sessions	Stored in config directory
Auth Vault	Encrypted credential storage

Sources: AGENTS.md

Summary

Agent Browser provides a powerful, efficient, and AI-agent-friendly approach to browser automation. Its key differentiators include:

Native Rust implementation for high performance
Direct CDP communication without third-party dependencies
Accessibility-tree snapshots for reliable element targeting
Session management for complex multi-step workflows
Extensible skills system for specialized environments
Built-in React DevTools integration for debugging

These features make Agent Browser an ideal choice for AI agents, automated testing pipelines, and developer workflows requiring precise browser control.

Source: https://github.com/vercel-labs/agent-browser / Human Manual

Installation Guide

Related topics: Introduction to Agent Browser

Overview

The agent-browser project is a native Rust CLI tool designed for browser automation, providing AI agents with reliable web interaction capabilities. Unlike traditional browser automation tools that rely on Node.js wrappers, agent-browser delivers a fast, lightweight solution built directly in Rust with Chrome/Chromium support via Chrome DevTools Protocol (CDP). The installation process handles downloading the necessary Chrome browser binaries, setting up platform-specific binaries, and configuring dependencies for the dashboard UI.

Sources: AGENTS.md

Prerequisites

System Requirements

Before installing agent-browser, ensure your system meets the following requirements:

Requirement	Details
Operating System	macOS, Linux, or Windows (7 platform binaries built)
Chrome/Chromium	Required for browser automation functionality
Rust Toolchain	Required for building from source
Node.js/pnpm	Required for dashboard development

The project builds all 7 platform binaries during CI/CD, including native binaries for different architectures. Chrome is downloaded directly from Chrome for Testing during the installation process, eliminating the need for system-installed Chrome browsers.

Sources: AGENTS.md

Required Dependencies

Dependency	Purpose	Installation Method
Chrome/Chromium	Browser automation target	Auto-downloaded via `install` command
Cargo/Rust	Building CLI from source	rustup.rs
pnpm	Dashboard package management	`npm install -g pnpm`

Installation Methods

Method 1: npm Package Installation (Recommended)

The recommended installation method uses the npm registry for cross-platform compatibility:

npm install -g agent-browser

After installation, you must run the setup command to download Chrome binaries:

agent-browser install

Sources: skills/agent-browser/SKILL.md

Method 2: Building from Source

For development or customization, build the CLI from source:

# Clone the repository
git clone https://github.com/vercel-labs/agent-browser.git
cd agent-browser

# Install dependencies and build
cd cli && cargo build --release

The Rust codebase architecture follows a modular structure:

graph TD
    A[cli/src/native/] --> B[daemon/]
    A --> C[actions/]
    A --> D[browser/]
    A --> E[CDP client/]
    A --> F[snapshot/]
    A --> G[state/]

The --engine flag allows selecting between Chrome and Lightpanda browser engines, providing flexibility in automation scenarios.

Sources: AGENTS.md

Method 3: Docker Installation

For containerized environments, Docker builds are supported:

# Build from the project's Dockerfile
docker build -t agent-browser -f docker/Dockerfile.build .

Docker installation is particularly useful for CI/CD pipelines and reproducible automation environments where system dependencies need to be isolated.

Post-Installation Setup

Chrome Binary Download

After installing the CLI package, you must download the Chrome binary:

agent-browser install

This command retrieves Chrome directly from Chrome for Testing, ensuring a compatible and up-to-date browser binary is available for all automation tasks. The --download-path flag can specify a custom location:

agent-browser --download-path /custom/path install

Sources: cli/src/flags.rs:45-49

Verifying Installation

Verify the installation by checking the version and available commands:

agent-browser --version
agent-browser --help

The CLI provides comprehensive command documentation through the help system:

Command	Description
`agent-browser open <url>`	Open a URL in the browser
`agent-browser snapshot`	Capture accessibility tree with element refs
`agent-browser click @<ref>`	Click element by reference
`agent-browser skills get <name>`	Retrieve skill documentation
`agent-browser install`	Download Chrome binaries

Sources: cli/src/output.rs

Skill Documentation Loading

Agent-browser uses a skill-based documentation system that loads content dynamically based on the installed version:

# Load core workflows and common patterns
agent-browser skills get core

# Include full command reference and templates
agent-browser skills get core --full

# List all available skills
agent-browser skills list

Available specialized skills:

Skill	Purpose
`electron`	Electron desktop apps (VS Code, Slack, Discord, Figma)
`slack`	Slack workspace automation
`dogfood`	Exploratory testing and QA
`vercel-sandbox`	Agent-browser inside Vercel Sandbox microVMs
`agentcore`	AWS Bedrock AgentCore cloud browsers

Sources: skills/agent-browser/SKILL.md

Platform-Specific Considerations

macOS

On macOS, if you encounter security prompts about unsigned applications, you may need to allow the application in System Settings > Privacy & Security (System Preferences > Security & Privacy on older macOS versions), or run:

xattr -d com.apple.quarantine /path/to/agent-browser

Linux

On Linux, Chrome depends on shared system libraries such as GTK and NSS. Install them via your package manager:

# Debian/Ubuntu
sudo apt-get install libgtk-3-0 libnss3

# Fedora
sudo dnf install gtk3 nss

Windows

Windows installations automatically configure the required runtime dependencies. Ensure Windows Subsystem for Linux (WSL) compatibility if running in hybrid environments.

Running Tests

After installation, verify the setup by running the test suite:

# Unit tests (fast, no Chrome required)
cd cli && cargo test

# End-to-end tests (requires Chrome installed)
cd cli && cargo test e2e -- --ignored --test-threads=1

The project contains approximately 320 unit tests and 18 e2e tests. E2E tests launch real headless Chrome instances and must run serially to avoid instance contention.

Sources: AGENTS.md

Troubleshooting

Chrome Download Failures

If the install command fails to download Chrome:

Check network connectivity to Chrome for Testing
Verify write permissions to the download directory
Use --download-path to specify an alternative location with proper permissions

Permission Denied Errors

Ensure the agent-browser binary has execute permissions:

chmod +x /path/to/agent-browser

Engine Selection

If Chrome automation fails, try specifying the engine explicitly:

agent-browser --engine chrome open https://example.com

The --engine flag supports Chrome (default) and Lightpanda engines for different automation scenarios.

Next Steps

After successful installation:

Load core skill documentation: agent-browser skills get core --full
Open a test URL: agent-browser open https://example.com
Capture a snapshot: agent-browser snapshot -i
Explore specialized skills for your use case

Sources: skills/agent-browser/SKILL.md, AGENTS.md

Element References System

Related topics: State Inspection Commands, Interaction Commands

The Element References System is a core mechanism in agent-browser that provides compact, human-readable identifiers for page elements during browser automation tasks. Instead of relying on fragile CSS selectors or XPath expressions, the system assigns unique reference IDs (such as @e1, @e2) at snapshot time. A ref stays valid until the page navigates or its content changes, after which a new snapshot is required.

Overview

Element references serve as the primary interface between automation scripts and the browser's accessibility tree. When a snapshot is taken, each interactive element receives a reference ID that can be used in commands like click, fill, type, and get without requiring re-selection.

graph TD
    A[Browser Page] --> B[snapshot Command]
    B --> C[Accessibility Tree Traversal]
    C --> D[Element Identification]
    D --> E[Reference Assignment]
    E --> F[@e1 @e2 @e3 ...]
    F --> G[Automation Commands]
    G --> H[click @e1]
    G --> I[fill @e2]
    G --> J[get text @e3]

Reference Notation Format

Element references follow a standardized notation format that encodes element metadata:

@e1 [tag type="value"] "text content" placeholder="hint"
│    │   │             │               │
│    │   │             │               └─ Additional attributes
│    │   │             └─ Visible text
│    │   └─ Key attributes shown
│    └─ HTML tag name
└─ Unique ref ID

Sources: skill-data/core/references/snapshot-refs.md
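
The notation above can be split into its parts with a few regular expressions. This is a sketch under the assumption that lines follow the diagram exactly. `parse_ref_line` is a hypothetical helper, bare flags such as `checked` are ignored, and escaped quotes inside the text are not handled.

```python
import re

LINE = re.compile(r'^\s*(@e\d+)\s+\[(\w+)([^\]]*)\](.*)$')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')
TEXT = re.compile(r'^\s*"([^"]*)"')

def parse_ref_line(line):
    """Split one snapshot line into ref, tag, attributes, text and extras."""
    match = LINE.match(line)
    if match is None:
        return None
    ref, tag, inner, rest = match.groups()
    text_match = TEXT.match(rest)
    text = text_match.group(1) if text_match else None
    # Whatever follows the quoted text holds the additional attributes.
    trailing = rest[text_match.end():] if text_match else rest
    return {
        "ref": ref,
        "tag": tag,
        "attrs": dict(ATTR.findall(inner)),
        "text": text,
        "extra": dict(ATTR.findall(trailing)),
    }

parsed = parse_ref_line('@e1 [tag type="value"] "text content" placeholder="hint"')
print(parsed["tag"], parsed["attrs"], parsed["text"], parsed["extra"])
```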

Reference Components

Component	Description	Example
`@eN`	Unique reference identifier	`@e1`, `@e42`
Tag	Element role or HTML tag	`button`, `input`, `link`
Type attribute	Element type classification	`type="email"`, `type="password"`
Text content	Visible text on element	`"Submit"`, `"Log in"`
Placeholder	Input placeholder text	`placeholder="Email"`

Common Reference Patterns

The snapshot system recognizes common element patterns and standardizes their reference notation:

@e1 [button] "Submit"                    # Button with text
@e2 [input type="email"]                 # Email input
@e3 [input type="password"]              # Password input
@e4 [a href="/page"] "Link Text"         # Anchor link
@e5 [select]                             # Dropdown
@e6 [textarea] placeholder="Message"     # Text area
@e7 [div class="modal"]                  # Container element
@e8 [img alt="Logo"]                     # Image with alt text
@e9 [checkbox] checked                   # Checked checkbox
@e10 [radio] selected                    # Selected radio button

Sources: skill-data/core/references/snapshot-refs.md

Snapshot Command Options

The snapshot command generates element references with various filtering and formatting options:

agent-browser snapshot                    # Full tree (verbose)
agent-browser snapshot -i                 # Interactive elements only (preferred)
agent-browser snapshot -i -u              # Include href URLs on links
agent-browser snapshot -i -c              # Compact mode (no empty structural nodes)
agent-browser snapshot -i -d 3            # Cap depth at 3 levels
agent-browser snapshot -s "#main"         # Scope to a CSS selector
agent-browser snapshot -i --json          # Machine-readable output

Sources: skill-data/core/SKILL.md

Option Reference

Option	Purpose	Use Case
`-i`	Interactive elements only	Preferred for automation
`-u`	Include href URLs	When link destinations matter
`-c`	Compact output	Complex pages with many empty nodes
`-d N`	Depth limit	Focus on specific page sections
`-s SELECTOR`	CSS scope	Target specific page regions
`--json`	JSON format	Programmatic processing
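
The options combine freely, so a script may want to assemble the `snapshot` call from named settings. The sketch below does this using only the flags listed in the table. `snapshot_args` and its parameter names are hypothetical, and the flag order is an assumption.

```python
def snapshot_args(interactive=True, urls=False, compact=False,
                  depth=None, selector=None, as_json=False):
    """Translate named options into an agent-browser snapshot argv."""
    argv = ["agent-browser", "snapshot"]
    if interactive:
        argv.append("-i")
    if urls:
        argv.append("-u")
    if compact:
        argv.append("-c")
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        argv += ["-d", str(depth)]
    if selector:
        argv += ["-s", selector]
    if as_json:
        argv.append("--json")
    return argv

print(snapshot_args(depth=3))
# ['agent-browser', 'snapshot', '-i', '-d', '3']
```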

Element Reference Commands

Element references are used with various commands to interact with page elements:

Direct Element Commands

agent-browser click @e1                   # Click element
agent-browser click @e1 --new-tab          # Click and open in new tab
agent-browser fill @e2 "text"             # Fill input field
agent-browser type @e2 "text"             # Type character by character
agent-browser press Enter                 # Press key on focused element

State Inspection Commands

agent-browser get text @e1                # Get visible text
agent-browser get html @e1                # Get innerHTML
agent-browser get attr @e1 href           # Get specific attribute
agent-browser get value @e1               # Get input value
agent-browser get title                   # Get page title
agent-browser get url                     # Get current URL
agent-browser get count ".item"           # Count matching elements

State Checking Commands

The is command verifies element states:

agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1

Sources: cli/src/output.rs

Find Command and Locators

The find command provides an alternative to snapshot-based reference acquisition by locating elements using various criteria:

agent-browser find <locator> <value> <action> [text]

Supported Locators

Locator	Description	Example
`role`	ARIA role selector	`find role button click`
`text`	Text content match	`find text "Submit" click`
`label`	Label text association	`find label "Email" fill "user@example.com"`
`placeholder`	Placeholder attribute	`find placeholder "Search" fill "query"`
`alt`	Alt text (images)	`find alt "Logo" click`
`title`	Title attribute	`find title "Help" click`
`testid`	Test identifier	`find testid "submit-btn" click`
`first`	First matching selector	`find first button click`
`last`	Last matching selector	`find last link click`
`nth`	Nth matching element	`find nth 5 button click`

Sources: cli/src/commands.rs

Find Command Options

Option	Purpose
`--exact`	Perform exact string matching
`--name <name>`	Filter by accessible name (role locator)
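
A wrapper script can catch malformed `find` calls before they reach the CLI. The following sketch validates the locator against the table above and assembles the arguments. `find_command` is a hypothetical helper, placing the options after the positional arguments is an assumption, and the `nth` locator is left out because it takes an extra index argument.

```python
# Locators from the table above that take a single value.
LOCATORS = {"role", "text", "label", "placeholder", "alt",
            "title", "testid", "first", "last"}

def find_command(locator, value, action, text=None, exact=False, name=None):
    """Build an agent-browser find argv, rejecting unknown locators."""
    if locator not in LOCATORS:
        raise ValueError(f"unknown locator: {locator!r}")
    if name is not None and locator != "role":
        raise ValueError("--name only applies to the role locator")
    argv = ["agent-browser", "find", locator, value, action]
    if text is not None:
        argv.append(text)
    if exact:
        argv.append("--exact")
    if name is not None:
        argv += ["--name", name]
    return argv

print(find_command("role", "button", "click", name="Submit"))
```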

Action Dispatch System

Element reference commands are dispatched to handlers through the action routing system:

graph LR
    A[Command Input] --> B["dispatch(\"click\", state)"]
    B --> C{Match Action}
    C -->|click| D[handle_click]
    C -->|fill| E[handle_fill]
    C -->|get| F[handle_get]
    C -->|is| G[handle_is]
    C -->|find| H[handle_find]

The action router matches on the command string. In the excerpt below, each listed action is forwarded to `handle_dispatch` in the native daemon:

"click" => handle_dispatch(cmd, state).await,
"fill" => handle_dispatch(cmd, state).await,
"get" => handle_dispatch(cmd, state).await,
"is" => handle_dispatch(cmd, state).await,
"find" => handle_dispatch(cmd, state).await,

Sources: cli/src/native/actions.rs
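
The routing idea can be modelled in a few lines. This is a conceptual sketch only: the handler names follow the diagram above, while the command fields (`action`, `ref`, `text`) and the response shape are hypothetical and are not taken from the Rust source.

```python
def handle_click(cmd, state):
    state["log"].append(("click", cmd["ref"]))
    return {"ok": True}

def handle_fill(cmd, state):
    state["log"].append(("fill", cmd["ref"], cmd["text"]))
    return {"ok": True}

# Lookup table playing the role of the match expression.
HANDLERS = {"click": handle_click, "fill": handle_fill}

def dispatch(cmd, state):
    """Route a command to its handler, or report an unknown action."""
    handler = HANDLERS.get(cmd["action"])
    if handler is None:
        return {"ok": False, "error": f"unknown action: {cmd['action']}"}
    return handler(cmd, state)

session_state = {"log": []}
print(dispatch({"action": "fill", "ref": "@e2", "text": "hello"}, session_state))
```

Keeping the table separate from `dispatch` means a new action only needs a new entry, which mirrors adding one arm to a match expression.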

Available Element Actions

Action	Handler	Purpose
`click`	`handle_dispatch`	Mouse click
`fill`	`handle_dispatch`	Fill input with text
`type`	`handle_dispatch`	Character-by-character typing
`press`	`handle_dispatch`	Keyboard press
`hover`	`handle_dispatch`	Mouse hover
`select`	`handle_dispatch`	Select dropdown option
`check`	`handle_dispatch`	Check checkbox/radio
`uncheck`	`handle_dispatch`	Uncheck checkbox
`focus`	`handle_dispatch`	Focus element
`blur`	`handle_dispatch`	Blur element

Iframe Support

Element references automatically handle iframe content. When a snapshot is taken, iframe elements are resolved and their child accessibility trees are included inline:

agent-browser snapshot -i
# Output:
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
#   @e3 [input] "Card number"
#   @e4 [input] "Expiry"
#   @e5 [button] "Pay"
# @e6 [button] "Cancel"

References to elements inside iframes carry frame context, allowing direct interactions without manual frame switching:

agent-browser click @e3                    # Works inside iframe
agent-browser fill @e4 "12/25"

Sources: skill-data/core/references/snapshot-refs.md
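
Because iframe children are indented under the iframe entry, a script can recover which refs live inside a frame from indentation alone. The sketch below does so for the example output above, with the leading `# ` comment markers removed. `parent_map` is a hypothetical helper and assumes children are always indented more deeply than their parent.

```python
import re

REF_LINE = re.compile(r'^(\s*)(@e\d+)\s+\[(\w+)')

def parent_map(snapshot_text):
    """Map each ref to its parent ref (or None) based on indentation."""
    parents = {}
    stack = []  # (indent, ref) pairs for the current ancestor chain
    for line in snapshot_text.splitlines():
        match = REF_LINE.match(line)
        if not match:
            continue
        indent, ref = len(match.group(1)), match.group(2)
        # Drop entries that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parents[ref] = stack[-1][1] if stack else None
        stack.append((indent, ref))
    return parents

SNAPSHOT = '''@e1 [heading] "Checkout"
@e2 [Iframe] "payment-frame"
  @e3 [input] "Card number"
  @e4 [input] "Expiry"
  @e5 [button] "Pay"
@e6 [button] "Cancel"
'''

parents = parent_map(SNAPSHOT)
in_frame = [ref for ref, parent in parents.items() if parent == "@e2"]
print(in_frame)  # ['@e3', '@e4', '@e5']
```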

Best Practices

Always Snapshot Before Interacting

# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i          # Get refs first
agent-browser click @e1            # Use ref

# WRONG
agent-browser open https://example.com
agent-browser click @e1            # Ref doesn't exist yet!

Re-Snapshot After Navigation

agent-browser click @e5            # Navigates to new page
agent-browser snapshot -i          # Get new refs
agent-browser click @e1            # Use new refs

Re-Snapshot After Dynamic Changes

agent-browser click @e1            # Opens dropdown
agent-browser snapshot -i          # See dropdown items
agent-browser click @e7            # Select item
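
The reason for re-snapshotting can be illustrated with a toy model in which refs belong to the snapshot that issued them. This is a conceptual sketch, not a description of how agent-browser stores refs internally; `RefTable` and its methods are hypothetical.

```python
class RefTable:
    """Toy model: refs are only valid for the snapshot that produced them."""

    def __init__(self):
        self.refs = {}

    def snapshot(self, elements):
        """Record a new snapshot, replacing all previously issued refs."""
        self.refs = {f"@e{i}": el for i, el in enumerate(elements, start=1)}
        return dict(self.refs)

    def page_changed(self):
        """Call after navigation or a DOM update: old refs become stale."""
        self.refs = {}

    def resolve(self, ref):
        if ref not in self.refs:
            raise LookupError(f"{ref} is stale or unknown; take a new snapshot")
        return self.refs[ref]

table = RefTable()
table.snapshot(["menu button"])
table.page_changed()  # clicking @e1 opened a dropdown
table.snapshot(["menu button", "item A", "item B"])
print(table.resolve("@e3"))  # item B
```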

Snapshot Specific Regions

For complex pages, snapshot specific areas to reduce noise:

# Snapshot just a form
agent-browser snapshot @e9

Session-Dependent References

Element references are session-dependent: the same element on the same page may receive a different reference ID in another session. The table below, drawn from the Slack skill, lists the ranges in which common Slack elements typically appear and a command for locating each one in the current session:

Element	Typical Ref Range	How to Find
Home tab	e10-e20	`snapshot -i | grep "Home"`
DMs tab	e10-e20	`snapshot -i | grep "DMs"`
Activity tab	e10-e20	`snapshot -i | grep "Activity"`
Search	e5-e10	`snapshot -i | grep "Search"`
More unreads	e20-e30	`snapshot -i | grep "More unreads"`
Channel refs	e30+	`snapshot -i | grep "channel-name"`

Sources: skill-data/slack/references/slack-tasks.md
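
Since ref numbers cannot be hard-coded, scripts should look them up by label on every run, much like the `grep` commands in the table. A minimal sketch follows; `find_ref` is a hypothetical helper and the snapshot excerpt is invented for illustration, so real Slack output will differ.

```python
import re

def find_ref(snapshot_text, label):
    """Return the ref on the first line that mentions label, or None."""
    for line in snapshot_text.splitlines():
        if label in line:
            match = re.search(r'@e\d+', line)
            if match:
                return match.group(0)
    return None

# Hypothetical snapshot excerpt used only to exercise the helper.
SNAPSHOT = '''@e12 [tab] "Home"
@e13 [tab] "DMs"
@e14 [tab] "Activity"
'''

print(find_ref(SNAPSHOT, "DMs"))  # @e13
```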

Architecture Summary

graph TD
    subgraph "CLI Layer"
        A[User Command] --> B[commands.rs Parser]
        B --> C[Command Dispatch]
    end
    
    subgraph "Native Daemon"
        C --> D[actions.rs Router]
        D --> E[State Manager]
        E --> F[CDP Client]
    end
    
    subgraph "Browser Layer"
        F --> G[Chrome DevTools Protocol]
        G --> H[Accessibility Tree]
    end
    
    subgraph "Reference Generation"
        H --> I[Element ID Assignment]
        I --> J[@eN Reference Labels]
    end
    
    J --> K[Snapshot Output]
    K --> L[Automation Commands]

The Element References System provides the foundation for reliable browser automation by abstracting DOM complexity behind human-readable identifiers. Refs are valid for the snapshot that produced them, so take a new snapshot after navigation or dynamic page changes.

Sources: skill-data/core/references/snapshot-refs.md

Architecture Overview

Related topics: Daemon and CDP Protocol, Introduction to Agent Browser

agent-browser is a Rust-based browser automation framework that provides high-performance browser control through native CDP (Chrome DevTools Protocol) communication. The system is designed for AI agent integration, enabling reliable and observable browser automation.

System Architecture

The architecture follows a layered approach with clear separation between the CLI interface, daemon process, and browser engine.

graph TB
    subgraph "Client Layer"
        CLI[CLI Interface]
        Dashboard[Web Dashboard]
    end

    subgraph "Daemon Layer"
        WS[WebSocket Server]
        Dispatcher[Action Dispatcher]
        State[State Manager]
    end

    subgraph "CDP Layer"
        CDP[CDP Client]
        Protocol[Protocol Handler]
    end

    subgraph "Browser Engine"
        Chrome[Chrome/Chromium]
        Lightpanda[Lightpanda]
    end

    CLI --> WS
    Dashboard --> WS
    WS --> Dispatcher
    Dispatcher --> CDP
    CDP --> Chrome
    CDP --> Lightpanda
    Dispatcher --> State

Core Components

Daemon Architecture

The browser automation daemon is the central coordinator that manages browser sessions and handles command dispatching. It runs as a persistent process that maintains browser state across multiple operations.

Key Responsibilities:

Component	Responsibility
WebSocket Server	Accepts client connections with origin validation
Action Dispatcher	Routes commands to appropriate handlers
State Manager	Maintains session state and snapshots
CDP Client	Manages protocol-level communication

Sources: cli/src/native/mod.rs

Action Dispatch System

The action system provides a comprehensive set of browser automation commands. Actions are dispatched based on command type and handle specific browser operations.

Action Categories:

Category	Commands
Navigation	`goto`, `back`, `forward`, `reload`, `waitforurl`, `waitforloadstate`
Interaction	`click`, `fill`, `press`, `select`, `check`, `uncheck`, `multiselect`
Content	`snapshot`, `innertext`, `innerhtml`, `gettext`, `getattribute`
State	`cookies_get`, `cookies_set`, `storage_get`, `storage_set`
Network	`route`, `unroute`, `requests`, `har`
React Debug	`react_tree`, `react_inspect`, `react_renders_start`

Sources: cli/src/native/actions.rs:1-50

CDP Client Layer

The CDP (Chrome DevTools Protocol) client handles low-level communication with the browser engine. This abstraction allows the system to work with different browser engines through a unified interface.

Supported Engines:

Engine	Selection Flag
Chrome/Chromium	`--engine chrome` (default)
Lightpanda	`--engine lightpanda`

Sources: cli/src/native/mod.rs

Communication Protocol

WebSocket Server

The daemon exposes a WebSocket server for client communication. Security is enforced through origin validation.

graph LR
    Client[Client App] -->|WebSocket| OriginCheck[Origin Check]
    OriginCheck -->|Allowed| Accept[Accept Connection]
    OriginCheck -->|Blocked| Reject[403 Forbidden]

Origin Validation:

The server validates the Origin header on incoming WebSocket requests. Connections from disallowed origins receive a 403 Forbidden response before any data exchange occurs.

if !is_allowed_origin(origin.as_deref()) {
    return Err(reject); // Status: FORBIDDEN
}

Sources: cli/src/native/stream/websocket.rs:15-30
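
The kind of check performed by `is_allowed_origin` can be sketched as follows. The allowlist here (loopback hosts only) and the decision to accept a missing Origin header are assumptions made for illustration; the actual rules live in the Rust source and may differ.

```python
from urllib.parse import urlsplit

# Assumption: only loopback origins are accepted.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_allowed_origin(origin):
    """Return True when the Origin header is absent or points at loopback."""
    if origin is None:
        return True  # assumption: non-browser clients send no Origin header
    host = urlsplit(origin).hostname
    return host in ALLOWED_HOSTS

print(is_allowed_origin("http://localhost:3000"))  # True
print(is_allowed_origin("https://evil.example"))   # False
```

Comparing the parsed hostname, rather than matching a substring of the raw header, avoids accepting origins such as `http://localhost.evil.example`.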

Request/Response Flow

All commands follow a request-response pattern:

Client sends JSON command via WebSocket
Server validates origin
Dispatcher routes to appropriate handler
Handler executes CDP operation
Result returned as JSON response
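
The request-response pattern can be sketched with plain JSON encoding. The message fields used here (`id`, `action`, `ok`) are hypothetical, chosen only to show how a client might pair a reply with its pending request; the daemon's actual wire format is not described in this manual.

```python
import itertools
import json

_ids = itertools.count(1)

def encode_request(action, **params):
    """Serialize one command; the field names are hypothetical."""
    return json.dumps({"id": next(_ids), "action": action, **params})

def decode_response(raw, expected_id):
    """Parse a reply and check that it answers the pending request."""
    message = json.loads(raw)
    if message.get("id") != expected_id:
        raise ValueError("response does not match the pending request")
    return message

request = encode_request("click", ref="@e1")
reply = decode_response('{"id": 1, "ok": true}', expected_id=1)
print(request, reply)
```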

State Management

Session State

The daemon maintains persistent state for each browser session:

State Component	Description
Tabs	Active tab list and current tab reference
Frame	Current frame hierarchy
Viewport	Window dimensions
Recording	Video recording status

Sources: cli/src/native/stream/websocket.rs:5-15

Snapshot System

The snapshot system provides an accessibility-tree based page representation with element references (@e1, @e2, etc.). A ref reliably identifies its element within the snapshot that produced it.

Best Practice: Always take a fresh snapshot before interacting with elements, because refs change after navigation or dynamic content changes.

Sources: skill-data/core/references/snapshot-refs.md

React Inspection System

For React-based applications, the daemon provides specialized inspection capabilities:

Blocker Detection

The system identifies React Suspense boundaries and classifies them by impact:

Blocker Kind	Weight	Actionability
ClientHook	7	90
RequestApi	6	88
ServerFetch	5	82
Cache	4	74
Stream	3	60
Unknown	2	35
Framework	1	18

Boundary Classification

Boundary Kind	Description
RouteSegment	Next.js/App Router segment boundary
ExplicitSuspense	User-declared `<Suspense>` component
Component	Implicit boundary from component structure

Sources: cli/src/native/react/suspense.rs:30-60
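
The weight and actionability values in the blocker table can be encoded directly. In the sketch below the numbers come from the table, but the enum layout and the `rank_blockers` helper are hypothetical; how the real code combines the two scores is not described in the source, so ranking by weight alone is an assumption.

```rust
/// Blocker kinds from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockerKind {
    ClientHook,
    RequestApi,
    ServerFetch,
    Cache,
    Stream,
    Unknown,
    Framework,
}

impl BlockerKind {
    /// Weight column of the table.
    fn weight(self) -> u8 {
        match self {
            BlockerKind::ClientHook => 7,
            BlockerKind::RequestApi => 6,
            BlockerKind::ServerFetch => 5,
            BlockerKind::Cache => 4,
            BlockerKind::Stream => 3,
            BlockerKind::Unknown => 2,
            BlockerKind::Framework => 1,
        }
    }

    /// Actionability column of the table.
    fn actionability(self) -> u8 {
        match self {
            BlockerKind::ClientHook => 90,
            BlockerKind::RequestApi => 88,
            BlockerKind::ServerFetch => 82,
            BlockerKind::Cache => 74,
            BlockerKind::Stream => 60,
            BlockerKind::Unknown => 35,
            BlockerKind::Framework => 18,
        }
    }
}

/// Assumption: blockers are reported from highest to lowest weight.
fn rank_blockers(mut kinds: Vec<BlockerKind>) -> Vec<BlockerKind> {
    kinds.sort_by(|a, b| b.weight().cmp(&a.weight()));
    kinds
}
```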

CLI Architecture

The CLI provides both interactive and scripted access to browser automation:

Command Structure

agent-browser <command> [args]

Primary Command Groups:

Group	Purpose
`agent-browser open`	Navigate to URL
`agent-browser <action>`	Execute automation action
`agent-browser set`	Configure browser settings
`agent-browser network`	Manage network interception
`agent-browser state`	Save/load/restore sessions
`agent-browser tab`	Manage browser tabs
`agent-browser screenshot`	Capture page images
`agent-browser install`	Download Chrome

Sources: cli/src/output.rs

Dashboard Architecture

The web-based dashboard provides visual monitoring and control:

graph TD
    Dashboard[Dashboard App] -->|API| Daemon
    Dashboard -->|Display| Results[screenshots/snapshots]
    Dashboard -->|Controls| Form[Control Form]

Dashboard Features:

Resizable split view (controls + results)
Responsive layout for mobile/desktop
Real-time screenshot display with base64 encoding
Snapshot viewer with step history
Step-by-step playback of automation sequences

Sources: packages/dashboard/src/components/extensions-panel.tsx

Installation and Dependencies

Chrome Installation

The install command downloads Chrome directly from Chrome for Testing:

agent-browser install

This ensures the Chrome binary is available for CDP communication without requiring a system-wide Chrome installation.

Testing Architecture

Unit Tests

Fast tests (~320) that verify individual components without a Chrome dependency:

cd cli && cargo test

End-to-End Tests

Integration tests that launch real headless Chrome:

cd cli && cargo test e2e -- --ignored --test-threads=1

Requirements:

Chrome must be installed
Tests run serially to avoid browser instance contention

Security Considerations

Aspect	Implementation
Origin Validation	WebSocket connections validated before acceptance
Session Isolation	Each session maintains separate state
Credential Storage	Authentication vault for secure credential handling

Summary

agent-browser implements a clean three-tier architecture:

Client Layer - CLI and dashboard provide user interfaces
Daemon Layer - Rust-based server handles command dispatch and state
CDP Layer - Browser-agnostic protocol client enables Chrome/Lightpanda support

The design prioritizes reliability (stable element refs), observability (snapshots, screenshots, video recording), and extensibility (skill-based system for specialized automation tasks).

Sources: cli/src/native/mod.rs

Daemon and CDP Protocol

Related topics: Architecture Overview, Browser Engine Integration

Overview

The agent-browser project implements a native Rust-based browser automation daemon that communicates with Chrome/Chromium browsers via the Chrome DevTools Protocol (CDP). The architecture separates the automation logic from browser control through WebSocket-based CDP connections, enabling AI agents to interact with web pages through a CLI interface.

Architecture Layer Diagram:

graph TD
    A[CLI Interface] --> B[Action Dispatcher]
    B --> C[CDP Client]
    C --> D[WebSocket Stream]
    D --> E[CDP Loop Handler]
    E --> F[Chrome Browser Instance]
    
    G[CDP Protocol Files] --> F
    H[Generated CDP Types] --> C

Daemon Architecture

Native Daemon Components

The daemon lives in cli/src/native/ and handles all browser automation tasks. The main components include:

Component	Location	Purpose
Daemon	`cli/src/native/daemon/`	Process management and state coordination
Actions	`cli/src/native/actions.rs`	Command handlers for browser operations
Browser	`cli/src/native/browser/`	Browser instance lifecycle
CDP Client	`cli/src/native/cdp/client.rs`	Protocol communication
CDP Loop	`cli/src/native/stream/cdp_loop.rs`	Message processing loop

Sources: cli/src/native/actions.rs

Action Dispatch

The action handler maps command names to their implementation functions. Supported actions include:

let result = match action {
    "launch" => handle_launch(cmd, state).await,
    "navigate" => handle_navigate(cmd, state).await,
    "url" => handle_url(state).await,
    "cdp_url" => handle_cdp_url(state),
    "inspect" => handle_inspect(state).await,
    "title" => handle_title(state).await,
    "content" => handle_content(state).await,
    "evaluate" => handle_evaluate(cmd, state).await,
    "close" => handle_close(state).await,
    "snapshot" => handle_snapshot(cmd, state).await,
    "screenshot" => handle_screenshot(cmd, state).await,
    "click" => handle_click(cmd, state).await,
    "dblclick" => handle_dblclick(cmd, state).await,
    "fill" => handle_fill(cmd, state).await,
    "type" => handle_type(cmd, state).await,
    "press" => handle_press(cmd, state).await,
    "hover" => handle_hover(cmd, state).await,
    "scroll" => handle_scroll(cmd, state).await,
    // ... additional actions
};

Sources: cli/src/native/actions.rs:50-75
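
To show the shape of this dispatch without an async runtime, here is a simplified, synchronous sketch. The `State` and `Command` types, the `command` helper, and the handler bodies are hypothetical stand-ins; only the idea of matching on the action name is taken from the excerpt above.

```rust
use std::collections::HashMap;

/// Minimal stand-in for daemon state; the real state type is not shown in the source.
#[derive(Default)]
struct State {
    current_url: Option<String>,
}

/// Hypothetical command shape: an action name plus string arguments.
struct Command {
    action: String,
    args: HashMap<String, String>,
}

/// Convenience constructor used only by this sketch.
fn command(action: &str, args: &[(&str, &str)]) -> Command {
    Command {
        action: action.to_string(),
        args: args.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

fn handle_navigate(cmd: &Command, state: &mut State) -> Result<String, String> {
    let url = cmd.args.get("url").ok_or("navigate requires a 'url' argument")?;
    state.current_url = Some(url.clone());
    Ok(format!("navigated to {url}"))
}

fn handle_url(state: &State) -> Result<String, String> {
    state.current_url.clone().ok_or_else(|| "no page is open".to_string())
}

/// Routes an action name to its handler; unknown actions become errors.
fn dispatch(cmd: &Command, state: &mut State) -> Result<String, String> {
    match cmd.action.as_str() {
        "navigate" => handle_navigate(cmd, state),
        "url" => handle_url(state),
        other => Err(format!("unknown action '{other}'")),
    }
}
```

Keeping the fallback arm explicit means a misspelled action surfaces as an error rather than being ignored.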

Browser Engine Selection

The --engine flag selects between Chrome and Lightpanda browsers. Chrome is downloaded from Chrome for Testing via the install command.

CDP Protocol Implementation

Protocol Files

The CDP protocol definitions are stored in JSON format:

File	Description
`browser_protocol.json`	Core browser domains (Page, Network, Runtime, etc.)
`js_protocol.json`	JavaScript debugging domains

Sources: cli/cdp-protocol/browser_protocol.json

Auto-Generated Types

CDP types are auto-generated from protocol JSON files:

/// Auto-generated CDP types from protocol JSON files in `cdp-protocol/`.
///
/// To populate: download `browser_protocol.json` and `js_protocol.json` from
/// <https://github.com/nicolo-ribaudo/nicolo-ribaudo.github.io/> (or any
/// Chromium source) into `cli/cdp-protocol/` and rebuild.
#[allow(clippy::upper_case_acronyms)]
pub mod generated {
    include!(concat!(env!("OUT_DIR"), "/cdp_generated.rs"));
}

Sources: cli/src/native/cdp/types.rs

CDP Client Structure

The CDP client manages communication with the browser:

graph LR
    A[Command] --> B[CDP Client]
    B --> C[WebSocket Writer]
    C --> D[Browser CDP Endpoint]
    
    E[Browser Events] --> F[WebSocket Reader]
    F --> G[Event Handler]
    G --> H[State Updates]

WebSocket Communication

Stream Module Architecture

The WebSocket communication is handled by the stream module located in cli/src/native/stream/:

Module	File	Purpose
Stream Core	`cli/src/native/stream/mod.rs`	Stream trait definitions and utilities
WebSocket	`cli/src/native/stream/websocket.rs`	WebSocket connection handling
CDP Loop	`cli/src/native/stream/cdp_loop.rs`	CDP message processing loop

WebSocket Connection

The WebSocket module establishes and maintains connections to the Chrome DevTools endpoint:

sequenceDiagram
    participant CLI as CLI Command
    participant Client as CDP Client
    participant WS as WebSocket
    participant Chrome as Chrome Browser
    
    CLI->>Client: connect(url)
    Client->>WS: establish_connection()
    WS->>Chrome: WebSocket Handshake
    Chrome-->>WS: 101 Switching Protocols
    WS-->>Client: Connected
    
    loop Message Exchange
        CLI->>Client: send_command()
        Client->>WS: write_message()
        WS->>Chrome: CDP JSON Message
        Chrome-->>WS: CDP Response/Event
        WS-->>Client: read_message()
        Client-->>CLI: Result
    end

CDP Loop Handler

The CDP loop processes incoming messages and manages the event queue:

Handles CDP events from the browser
Routes responses to pending command callbacks
Manages connection state and reconnection logic

Sources: cli/src/native/stream/cdp_loop.rs
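
The routing of responses to pending commands can be illustrated with a small stdlib-only sketch. In CDP, a response carries the `id` of the command it answers, while an event carries a `method` name and no `id`. The `Router` and `Incoming` types below are hypothetical, messages are reduced to plain strings, and reconnection logic is left out.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified incoming CDP message.
enum Incoming {
    /// Reply to a previously sent command, matched by `id`.
    Response { id: u64, result: String },
    /// Unsolicited browser event such as "Page.loadEventFired".
    Event { method: String },
}

#[derive(Default)]
struct Router {
    next_id: u64,
    pending: HashMap<u64, Sender<String>>,
    events: Vec<String>,
}

impl Router {
    /// Registers a command and returns its id plus a receiver for the reply.
    fn register(&mut self) -> (u64, Receiver<String>) {
        self.next_id += 1;
        let (tx, rx) = channel();
        self.pending.insert(self.next_id, tx);
        (self.next_id, rx)
    }

    /// Routes one incoming message to the waiting caller or the event queue.
    fn handle(&mut self, message: Incoming) {
        match message {
            Incoming::Response { id, result } => {
                if let Some(tx) = self.pending.remove(&id) {
                    // The caller may have given up waiting; a failed send is ignored.
                    let _ = tx.send(result);
                }
            }
            Incoming::Event { method } => self.events.push(method),
        }
    }
}
```

Removing the entry from `pending` when the response arrives keeps the map from growing, and a response with an unknown id is simply dropped.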

Browser Connection

Connection Methods

The daemon supports multiple connection methods:

Method	Command	Use Case
Launch new browser	`agent-browser open`	Fresh browser instance
Connect to existing	`agent-browser connect 9222`	Attach to running browser

# Launch with navigation
agent-browser open <url>

# Connect to running browser on specific port
agent-browser connect 9222

# Launch without navigation (clean slate)
agent-browser open

CDP WebSocket URL

The CDP WebSocket URL can be retrieved programmatically:

agent-browser get cdp-url

This returns the WebSocket debugger URL for programmatic browser attachment.

Browser Version Info

The connection retrieves browser metadata:

#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct BrowserVersionInfo {
    #[serde(rename = "webSocketDebuggerUrl")]
    pub web_socket_debugger_url: Option<String>,
    #[serde(rename = "Browser")]
    pub browser: Option<String>,
}

Sources: cli/src/native/cdp/types.rs

CDP Protocol Domains

Supported Domains

agent-browser supports the following CDP domains:

Domain	Purpose	Key CDP Methods
Page	Page navigation and loading	navigate, reload, navigateToHistoryEntry (history back/forward)
Runtime	JavaScript execution	evaluate, callFunctionOn
DOM	DOM manipulation	getDocument, describeNode
Input	User input simulation	dispatchMouseEvent, dispatchKeyEvent, insertText
Network	Network request interception	setRequestInterception, getResponseBody
Target	Browser target management	createTarget, attachToTarget

Browser Automation Actions

The following high-level actions are available via CDP:

# Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload

# DOM Interaction
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser type @e3 "input"
agent-browser hover @e4
agent-browser scroll down 500

# State Queries
agent-browser snapshot
agent-browser screenshot
agent-browser get text @e1
agent-browser get attr @e1 href

# JavaScript
agent-browser evaluate "document.title"

Error Handling

WebDriver Fallback

When the WebDriver backend is in use, the daemon returns a descriptive error for any action that backend does not support:

Err(anyhow::anyhow!(
    "Action '{}' is not supported on the WebDriver backend",
    action
))
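
A guard that produces this error could be sketched as follows. The `Backend` enum, the `ensure_supported` function, and especially the `CDP_ONLY_ACTIONS` list are hypothetical: the source does not say which actions are unavailable on WebDriver. A plain `String` error is used here in place of `anyhow` to keep the sketch dependency-free.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cdp,
    WebDriver,
}

/// Hypothetical list of actions that only work over CDP (illustrative only).
const CDP_ONLY_ACTIONS: &[&str] = &["cdp_url", "react_tree"];

/// Returns an error when `action` cannot run on the selected backend.
fn ensure_supported(backend: Backend, action: &str) -> Result<(), String> {
    if backend == Backend::WebDriver && CDP_ONLY_ACTIONS.contains(&action) {
        return Err(format!(
            "Action '{action}' is not supported on the WebDriver backend"
        ));
    }
    Ok(())
}
```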

CDP Error Propagation

CDP errors are propagated through the action chain, enabling detailed error messages for debugging failed browser operations.

Performance Considerations

Session Management

Each browser session maintains a persistent CDP connection
Sessions can be named and persisted for multi-session workflows
State persistence allows resuming automation tasks

Network Idle Detection

The daemon supports waiting for network idle states:

agent-browser wait --load networkidle

This is useful for SPAs and other applications that load content dynamically, where the load event alone does not mean the page is ready.

Security Model

Credential Management

The daemon provides a secure credential vault for browser authentication:

agent-browser set credentials <user> <pass>

Cookies can be set from various formats:

agent-browser cookies set --curl <file> [--domain <host>]

The command auto-detects JSON, cURL, and Cookie-header file formats.

Extension Points

Custom CDP Scripts

Execute arbitrary JavaScript in the browser context:

agent-browser addscript <script>
agent-browser addinitscript <script>

Custom Styles

Inject CSS for visual testing:

agent-browser addstyle <css>

Summary

The Daemon and CDP Protocol architecture enables agent-browser to provide a performant, Rust-native browser automation solution. By implementing direct CDP communication over WebSockets, the project avoids dependencies on Node.js wrappers like Playwright or Puppeteer while maintaining full compatibility with Chrome's DevTools Protocol capabilities.

The separation of concerns between the action dispatcher, CDP client, and WebSocket stream layers ensures maintainability and enables future extensions for additional browser engines and protocol features.

Sources: cli/src/native/actions.rs

Navigation Commands

Related topics: Interaction Commands, State Inspection Commands

Navigation Commands in agent-browser provide the fundamental mechanisms for controlling browser state, page loading, and session management. These commands enable AI agents and automated scripts to interact with web pages by controlling navigation flow, managing browser windows, and handling page lifecycle events.

The Navigation Commands subsystem handles all operations related to:

Browser Launch and Shutdown — Initialize and terminate browser instances
Page Navigation — Navigate to URLs, handle history traversal, and manage SPA routing
Session Management — Connect to existing browser instances via CDP protocol
Pre-navigation Setup — Configure browser state before initial page load

graph TD
    A[User Command] --> B{Command Type}
    B -->|open/goto/navigate| C[Parse URL & Flags]
    B -->|back/forward/reload| D[History Action]
    B -->|pushstate| E[SPA Navigation]
    B -->|connect| F[CDP Connection]
    B -->|close| G[Cleanup Session]
    
    C --> H{URL Protocol?}
    H -->|http/https| I[Direct Navigation]
    H -->|about/data/file| I
    H -->|none specified| J[Prepend https://]
    
    I --> K[Execute Navigation]
    D --> L[Browser History API]
    E --> M[History PushState + Events]
    F --> N[Remote CDP Session]
    G --> O[Close All Tabs/Session]
    
    K --> P[Return Result JSON]
    L --> P
    M --> P
    N --> P
    O --> P

open - Launch Browser

The open command launches a new browser instance. When called without a URL, it opens about:blank so that browser state can be staged before the first navigation.

Usage:

agent-browser open
agent-browser open <url>

Variant	Behavior
`open` (no args)	Launch on about:blank; allows `network route`, `cookies set`, or `addinitscript` before first navigation
`open <url>`	Launch and immediately navigate to the specified URL

URL Auto-prepend Logic:

The CLI automatically prepends https:// when the URL does not start with a recognized scheme; the check is case-insensitive. Recognized schemes include:

Protocol	Example
`https://`	`https://example.com`
`http://`	`http://localhost:3000`
`about:`	`about:blank`, `about:version`
`data:`	`data:text/html,<h1>Hello</h1>`
`file://`	`file:///path/to/page.html`
`chrome-extension://`	`chrome-extension://...`
`chrome://`	`chrome://version`

let url_lower = url.to_lowercase();
let url = if url_lower.starts_with("http://")
    || url_lower.starts_with("https://")
    || url_lower.starts_with("about:")
    || url_lower.starts_with("data:")
    || url_lower.starts_with("file:")
    || url_lower.starts_with("chrome-extension://")
    || url_lower.starts_with("chrome://")
{
    url.to_string()
} else {
    format!("https://{}", url)
};

Sources: cli/src/commands.rs:35-50

goto / navigate - Navigate to URL

The goto and navigate commands are aliases that navigate to a specific page. Both require a URL argument.

agent-browser goto https://example.com
agent-browser navigate example.com  # auto-prepends https://

Sources: cli/src/commands.rs:25-30

pushstate - SPA Client-side Navigation

The pushstate command performs client-side navigation in single-page applications (SPAs) using history.pushState. It triggers the navigation events that frameworks such as Next.js rely on.

agent-browser pushstate <url>

Behavior:

Calls history.pushState with the target URL
Dispatches popstate and navigate events
Auto-detects window.next.router.push for Next.js applications and triggers RSC fetch

agent-browser pushstate /dashboard
agent-browser pushstate /products/123

The back command navigates the current tab backward in browser history.

agent-browser back

The forward command navigates the current tab forward in browser history.

agent-browser forward

The reload command reloads the current page, respecting cache settings.

agent-browser reload

The close command closes the browser instance and terminates the session.

agent-browser close
agent-browser close --all

Flag	Behavior
(default)	Close current session
`--all`	Close all browser sessions

The connect command attaches to an existing browser instance through its Chrome DevTools Protocol (CDP) port.

agent-browser connect <port>

agent-browser connect 9222  # Connect to browser on port 9222

For scenarios requiring state staging before the first navigation (e.g., blocking scripts, setting cookies), agent-browser supports batch operations:

agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'

This pattern:

Opens browser on about:blank
Registers a network route to abort all script resources
Sets cookies from a curl-format cookie file
Navigates to the target URL

Sources: skill-data/core/references/commands.md:18-26

The command dispatch system maps command strings to handler functions:

"open" | "goto" | "navigate" => handle_navigation(cmd, rest, flags, state).await,
"back" => handle_back(cmd, state).await,
"forward" => handle_forward(cmd, state).await,
"reload" => handle_reload(cmd, state).await,
"pushstate" => handle_pushstate(cmd, rest, state).await,
"close" | "quit" | "exit" => handle_close(cmd, rest, state).await,
"connect" => handle_connect(cmd, rest, state).await,

Sources: cli/src/native/actions.rs:30-45

All navigation commands return a JSON response indicating success or failure:

Success Response:

{
  "id": "session-id",
  "action": "navigate",
  "url": "https://example.com"
}

Error Response (Missing URL):

{
  "error": "MissingArguments",
  "context": "goto",
  "message": "Expected URL argument"
}

Common Flags:

Flag	Applies To	Purpose
`--headed`	`open`	Launch browser in headed (visible) mode
`--wait-until <event>`	`goto`, `navigate`, `open`	Wait for navigation event (load, domcontentloaded, networkidle)
`--provider <name>`	All navigation	Specify CDP provider (e.g., vercel-sandbox)
`--session <name>`	All commands	Use a specific named session

# Launch in headed mode, then wait for network idle on navigation
agent-browser open --headed
agent-browser goto https://example.com --wait-until networkidle

# Basic navigation workflow
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser back

# Block all script resources before the first navigation
agent-browser open
agent-browser network route "*" --abort --resource-type script
agent-browser goto https://example.com

# SPA navigation in a Next.js app
agent-browser open https://my-nextjs-app.com
agent-browser click @e5  # Navigate to another route
agent-browser pushstate /new-route  # Trigger client-side navigation
agent-browser snapshot -i

Navigation commands operate within the context of a session. Each session can contain multiple tabs:

Command	Purpose
`tab new [url]`	Open a new tab
`tab list`	List all open tabs
`tab <n>`	Switch to tab by index
`tab close`	Close current tab

agent-browser tab new
agent-browser tab new https://example.com
agent-browser tab 2
agent-browser tab close

The navigation handler in cli/src/commands.rs performs the following steps:

Argument Extraction — Scans command arguments for the first non-flag value as URL
Protocol Validation — Checks if URL starts with a supported protocol scheme
Auto-prepend — Adds https:// prefix if no protocol detected
Command Construction — Builds JSON command payload with action type and URL
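
The steps above can be sketched as three small functions. The prefix list is taken from the snippet earlier in this section; the function names and the error text are hypothetical. As an assumption of this sketch, any argument that does not start with `--` counts as the URL, so a flag value placed before the URL (for example after `--wait-until`) would be misread. The real parser presumably handles that case.

```rust
/// Scheme prefixes that are left untouched, as listed in the CLI snippet.
const KNOWN_PREFIXES: &[&str] = &[
    "http://",
    "https://",
    "about:",
    "data:",
    "file:",
    "chrome-extension://",
    "chrome://",
];

/// Argument extraction: the first argument that is not a `--flag`.
fn first_non_flag<'a>(args: &[&'a str]) -> Option<&'a str> {
    args.iter().copied().find(|arg| !arg.starts_with("--"))
}

/// Protocol validation and auto-prepend, matched case-insensitively.
fn normalize_url(url: &str) -> String {
    let lower = url.to_lowercase();
    if KNOWN_PREFIXES.iter().any(|prefix| lower.starts_with(*prefix)) {
        url.to_string()
    } else {
        format!("https://{url}")
    }
}

/// Combines both steps; a missing URL becomes an error.
fn navigation_target(args: &[&str]) -> Result<String, String> {
    first_non_flag(args)
        .map(normalize_url)
        .ok_or_else(|| "Expected URL argument".to_string())
}
```

Note that the original casing of the URL is preserved; only the comparison is lowercased.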

sequenceDiagram
    User->>CLI: agent-browser goto example.com
    CLI->>Parser: Parse "goto" command
    Parser->>URL Validator: Check "example.com"
    URL Validator->>URL Validator: No protocol prefix
    URL Validator-->>Parser: Prepend https://
    Parser->>Builder: Build navigate command
    Builder->>Browser: Execute navigation
    Browser-->>User: JSON response

Related Commands:

Command Category	Commands
State Inspection	`snapshot`, `screenshot`, `get`
Element Interaction	`click`, `fill`, `type`, `press`
Network Control	`network route`, `cookies`, `storage`
Browser Settings	`set viewport`, `set geo`, `set offline`

Sources: cli/src/commands.rs:35-50

Interaction Commands

Related topics: Navigation Commands, State Inspection Commands, Element References System

Interaction Commands are the core primitives that enable AI agents to programmatically control and manipulate web pages in the agent-browser system. These commands provide atomic operations for clicking elements, entering text, scrolling, and capturing page state through an accessibility-tree based reference system.

Architecture Overview

The interaction system follows a command dispatch pattern where incoming commands are routed to appropriate handlers based on their operation type. The architecture separates concerns between command parsing, execution, and output formatting.

graph TD
    A[User/Agent Input] --> B[Command Parser]
    B --> C[actions.rs Dispatcher]
    C --> D[interaction.rs Handlers]
    D --> E[CDP Protocol Layer]
    E --> F[Browser Engine]
    F --> G[Page Response]
    G --> H[output.rs Formatter]
    H --> I[Terminal/Agent]
    
    C -.->|click, fill, type, scroll| D
    C -.->|mouse, keyboard| D
    C -.->|snapshot, screenshot| D

Component Responsibilities

Component	File	Purpose
Command Dispatcher	`actions.rs`	Routes commands to handlers
Interaction Handlers	`interaction.rs`	Executes atomic browser operations
Output Formatter	`output.rs`	Formats and presents results
CDP Layer	Native	Chrome DevTools Protocol communication

Element Reference System

Interaction commands use an element reference system (@e1, @e2, etc.) to identify targets on the page. These references are obtained through snapshot operations and represent unique identifiers in the accessibility tree.

graph LR
    A[Page HTML] --> B[Accessibility Tree]
    B --> C[Snapshot Command]
    C --> D[@e1 button "Submit"]
    C --> E[@e2 input "Email"]
    D --> F[Click @e1]
    E --> G[Fill @e2 "text"]

Reference Format:

@e1 [tag type="value"] "text content" placeholder="hint"
│    │   │             │               │
│    │   │             │               └─ Additional attributes
│    │   │             └─ Visible text
│    │   └─ Key attributes shown
│    └─ HTML tag name
└─ Unique ref ID

Sources: skill-data/core/references/snapshot-refs.md:1-50
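
For tooling that consumes snapshot output, a ref token such as `@e12` can be validated before it is passed to a command. The sketch below is illustrative; `parse_ref` and `split_snapshot_line` are hypothetical helpers, and the assumption that a ref is `@e` followed only by ASCII digits is based on the examples in this document.

```rust
/// Parses a ref token like "@e12" into its numeric part.
fn parse_ref(token: &str) -> Option<u32> {
    let digits = token.strip_prefix("@e")?;
    // Reject empty suffixes and anything that is not purely ASCII digits
    // (a bare `parse` would also accept a leading '+').
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Splits a snapshot line into its ref number and the remaining description.
fn split_snapshot_line(line: &str) -> Option<(u32, &str)> {
    let (token, rest) = line.trim_start().split_once(' ')?;
    Some((parse_ref(token)?, rest))
}
```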

Core Interaction Commands

Element Selection Commands

Command	Description	Parameters
`find`	Find elements by locator	`<locator> <value> [action] [text]`
`count`	Count matching elements	`<selector>`
`is`	Check element state	`<what> <selector>`

Locators supported: role, text, label, placeholder, alt, title, testid, first, last, nth

Sources: cli/src/output.rs:1-20

Mouse Commands

graph TD
    A[mouse] --> B[move <x> <y>]
    A --> C[down <btn>]
    A --> D[up <btn>]
    A --> E[wheel <dy> <dx>]
    
    B --> F[Dispatch mousemove event]
    C --> G[Dispatch mousedown event]
    D --> H[Dispatch mouseup event]
    E --> I[Dispatch wheel event]

Command	Description
`mouse move <x> <y>`	Move cursor to coordinates
`mouse down [btn]`	Press mouse button (default: left)
`mouse up [btn]`	Release mouse button
`mouse wheel <dy> [dx]`	Scroll wheel (delta Y/X)

Sources: cli/src/native/actions.rs:1-30

Keyboard Commands

Command	Description	Example
`type`	Type text (with key events)	`type @e1 "hello"`
`press`	Press special key	`press Enter`
`setvalue`	Set input value directly	`setvalue @e1 "value"`

Special Keys: Enter, Tab, Escape, Backspace, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, F1-F12, Control, Alt, Shift

Sources: cli/src/native/actions.rs:1-30

Scroll Commands

Command	Description
`scroll down <px>`	Scroll down by pixels
`scroll up <px>`	Scroll up by pixels
`scroll left <px>`	Scroll left by pixels
`scroll right <px>`	Scroll right by pixels

Sources: skill-data/core/SKILL.md:1-50
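
The four scroll directions map naturally onto a pair of pixel deltas. The following sketch is an assumption about how such a mapping could look, not the project's implementation; it follows the usual convention that positive Y scrolls down and positive X scrolls right. The `Direction` type and both functions are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Up,
    Down,
    Left,
    Right,
}

/// Parses the direction word used by the `scroll` command.
fn parse_direction(value: &str) -> Option<Direction> {
    match value {
        "up" => Some(Direction::Up),
        "down" => Some(Direction::Down),
        "left" => Some(Direction::Left),
        "right" => Some(Direction::Right),
        _ => None,
    }
}

/// Converts a direction and pixel count into (delta_x, delta_y).
fn scroll_delta(direction: Direction, pixels: i32) -> (i32, i32) {
    match direction {
        Direction::Up => (0, -pixels),
        Direction::Down => (0, pixels),
        Direction::Left => (-pixels, 0),
        Direction::Right => (pixels, 0),
    }
}
```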

State Inspection Commands

graph TD
    A[get command] --> B{Property Type}
    B -->|attr| C[Get attribute value]
    B -->|value| D[Get input value]
    B -->|text| E[Get visible text]
    B -->|html| F[Get innerHTML]
    B -->|title| G[Get page title]
    B -->|url| H[Get current URL]
    B -->|box| I[Get bounding box]
    B -->|styles| J[Get computed styles]

Command	Description
`get text <ref>`	Get visible text of element
`get value <ref>`	Get input field value
`get attr <ref> <name>`	Get specific attribute
`get html <ref>`	Get innerHTML
`get title`	Get page title
`get url`	Get current URL
`get box <ref>`	Get bounding box coordinates
`get styles <ref>`	Get computed CSS styles
`get cdp-url`	Get CDP debugging URL

Sources: cli/src/output.rs:1-20

Click Variations

The click command supports several modifiers for different interaction patterns:

Command	Description
`click <ref>`	Standard left-click
`click <ref> --new-tab`	Click and open in new tab
`click <ref> --double`	Double-click
`click <ref> --right`	Right-click (context menu)
`tap <ref>`	Mobile-style tap (touch events)

Sources: skill-data/core/SKILL.md:1-50

Form Input Commands

Text Input

graph LR
    A[Input Commands] --> B[type]
    A --> C[fill]
    A --> D[setvalue]
    
    B --> E[Triggers keydown/keyup]
    C --> F[Direct value set]
    D --> G[Direct value assignment]

Command	Description	Behavior
`fill <ref> <text>`	Fill input field	Replaces existing value, triggers input events
`type <ref> <text>`	Type text character by character	Triggers full key event sequence
`setvalue <ref> <value>`	Set value directly	Bypasses sanitization

Sources: cli/src/native/actions.rs:1-30

Other Input Types

Command	Target	Description
`check <ref>`	Checkbox	Check a checkbox
`uncheck <ref>`	Checkbox	Uncheck a checkbox
`select <ref> <value>`	Select	Select option by value
`upload <ref> <path>`	File input	Upload file

Sources: cli/src/native/actions.rs:1-30

Wait and Timing

Wait commands control execution timing for dynamic content:

Command	Description
`wait <ms>`	Wait for milliseconds
`wait --load`	Wait for page load event
`wait networkidle`	Wait for network to be idle
`wait --load networkidle`	Wait for the networkidle load state

Sources: skill-data/core/SKILL.md:1-50

Command Chaining with Batches

Multiple commands can be executed in a single batch operation for efficiency:

graph TD
    A[Batch Command] --> B[Parse JSON Array]
    B --> C[Execute Sequentially]
    C --> D[Command 1]
    D --> E[Command 2]
    E --> F[Command N]
    F --> G[Return Combined Results]

Example batch command:

agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'

Sources: skill-data/core/references/commands.md:1-30
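
The sequential execution shown in the diagram can be sketched with a generic runner. This is an illustration only: `run_batch` and the `execute` callback are hypothetical, commands are represented as already-parsed string vectors rather than JSON, and the choice to keep running after a failed command is an assumption because the source does not describe the failure behavior.

```rust
/// Runs each command in order and collects one result per command.
/// Assumption: a failing command does not stop the rest of the batch.
fn run_batch<F>(commands: &[Vec<String>], mut execute: F) -> Vec<Result<String, String>>
where
    F: FnMut(&[String]) -> Result<String, String>,
{
    commands
        .iter()
        .map(|command| execute(command.as_slice()))
        .collect()
}
```

Passing the executor as a closure keeps the runner independent of how a single command is carried out, which also makes it easy to exercise with a stub.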

State Management

Browser State Commands

Command	Description
`is <state> <ref>`	Check if element is `visible`, `enabled`, `checked`
`is open`	Check if browser is open
`is closed`	Check if browser is closed

Visibility and Enabled States

graph TD
    A[Check State] --> B{Element Type}
    B -->|Button/Input| C[Check: enabled]
    B -->|Checkbox| D[Check: checked]
    B -->|Any| E[Check: visible]
    
    C --> F[Return boolean]
    D --> F
    E --> F

Sources: cli/src/output.rs:1-20

Advanced Interactions

React-Specific Commands

For React applications, specialized inspection commands are available:

Command	Description
`react_tree`	Get component tree
`react_inspect <ref>`	Inspect React component
`react_renders_start`	Start render tracking
`react_renders_stop`	Stop render tracking

Sources: cli/src/native/actions.rs:1-30

Dialog Handling

graph TD
    A[Dialog Appears] --> B{dialog type}
    B -->|alert| C[handle_alert]
    B -->|confirm| D[handle_confirm]
    B -->|prompt| E[handle_prompt]
    
    C --> F[dialog accept --message "text"]
    D --> F
    E --> G[dialog accept "input"]
    G --> F

Command	Description
`dialog accept [message]`	Accept the dialog; the optional message supplies the input text for prompt dialogs
`dialog dismiss`	Cancel/dismiss dialog

Sources: cli/src/native/actions.rs:1-30

Common Workflow Patterns

# 1. Open page
agent-browser open https://example.com

# 2. Take snapshot to get refs
agent-browser snapshot -i

# 3. Interact with elements
agent-browser click @e1
agent-browser fill @e2 "user@example.com"
agent-browser press Enter

# 4. Wait for response
agent-browser wait 1000

Form Submission Flow

agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e_email "user@example.com"
agent-browser fill @e_password "secretpassword"
agent-browser click @e_submit
agent-browser wait --load networkidle
agent-browser screenshot result.png

Error Handling Pattern

# Check if operation succeeded
agent-browser is visible @e_success_message

# If failed, inspect state
agent-browser snapshot -i
agent-browser get text @e_error_message

Command Reference Summary

Interaction Operations Matrix

Category	Commands
Mouse	`click`, `mouse move/down/up/wheel`, `dblclick`
Keyboard	`type`, `press`, `setvalue`
Scroll	`scroll up/down/left/right`
Forms	`fill`, `check`, `uncheck`, `select`, `upload`
Inspect	`get text/value/attr/html/title/url/box/styles`
State	`find`, `count`, `is`
Timing	`wait`

Sources: cli/src/native/actions.rs:1-30, cli/src/output.rs:1-20, skill-data/core/SKILL.md:1-50

Best Practices

Always snapshot before interacting - Element refs are obtained from snapshots and must be fetched after page load or navigation
Re-snapshot after navigation - New pages have new accessibility trees with different refs
Use appropriate wait conditions - Wait for networkidle when content loads dynamically
Prefer fill over type - fill is faster and more reliable for automated workflows
Use type for form validation - When you need key events to trigger validation logic

Sources: skill-data/core/references/snapshot-refs.md:1-50

State Inspection Commands

Related topics: Interaction Commands, Element References System

State Inspection Commands in agent-browser provide mechanisms to examine, retrieve, and manage browser state including cookies, web storage, session data, console errors, and DOM element properties. These commands enable debugging, state verification, and persistence of browser sessions across operations.

Architecture Overview

State inspection in agent-browser operates through a layered architecture where the CLI command layer parses user input, the actions layer dispatches to appropriate handlers, and the browser backend (CDP/WebDriver) executes the actual state retrieval.

graph TD
    A[CLI Input] --> B[commands.rs Parser]
    B --> C[actions.rs Dispatcher]
    C --> D[State Handlers]
    C --> E[Storage Handlers]
    C --> F[Element Handlers]
    D --> G[Browser Backend<br/>Chrome CDP / WebDriver]
    E --> G
    F --> G
    G --> H[State Output]
    
    D -. includes .-> D1[cookies_get/set/clear]
    D -. includes .-> D2[state_save/load/list/clean]
    E -. includes .-> E1[storage_get/set/clear]
    F -. includes .-> F1[gettext/getattr/isvisible]

Sources: cli/src/native/actions.rs:1-150

Command Categories

State inspection commands are organized into five primary categories:

Category	Purpose	Commands
Cookie Inspection	Manage HTTP cookies	`cookies_get`, `cookies_set`, `cookies_clear`
Web Storage	Inspect localStorage/sessionStorage	`storage_get`, `storage_set`, `storage_clear`
Session State	Save/load browser sessions	`state_save`, `state_load`, `state_list`, `state_clean`
Element Properties	Query DOM element attributes	`gettext`, `getattribute`, `inputvalue`, `isvisible`, `isenabled`, `ischecked`
Error Inspection	Retrieve console errors	`errors`

Sources: cli/src/native/actions.rs:80-100
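
The grouping in the table can be expressed as a lookup from action name to category. The sketch below only restates the table; the `Category` enum and `categorize` function are hypothetical and are not taken from actions.rs.

```rust
/// The five categories from the table above (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Category {
    Cookie,
    Storage,
    Session,
    Element,
    Errors,
}

/// Map an action name from the table to its category.
/// Returns `None` for names the table does not list.
pub fn categorize(action: &str) -> Option<Category> {
    match action {
        "cookies_get" | "cookies_set" | "cookies_clear" => Some(Category::Cookie),
        "storage_get" | "storage_set" | "storage_clear" => Some(Category::Storage),
        "state_save" | "state_load" | "state_list" | "state_clean" => {
            Some(Category::Session)
        }
        "gettext" | "getattribute" | "inputvalue" | "isvisible" | "isenabled"
        | "ischecked" => Some(Category::Element),
        "errors" => Some(Category::Errors),
        _ => None,
    }
}
```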

Cookie Inspection

Cookies can be inspected and managed through the cookies command family.

Get Cookies

Retrieves all cookies for the current domain:

agent-browser cookies get

Set Cookie

Sets a cookie with explicit parameters:

agent-browser cookies set --url <url> --name <name> --value <value> [--domain <domain>] [--path <path>] [--httpOnly] [--secure] [--sameSite <strict|lax|none>] [--expires <timestamp>]

Set Cookie from File

Auto-detects the file format (JSON, cURL, or Cookie header) and imports the cookies it contains:

agent-browser cookies set --curl <file> [--domain <host>]

Clear Cookies

Removes all cookies:

agent-browser cookies clear

Sources: cli/src/output.rs:1-50
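
One way the format auto-detection for file imports could work is sketched below. The heuristics and the names `CookieFormat`, `detect_cookie_format`, and `parse_cookie_header` are assumptions made for illustration; the detection logic agent-browser actually uses is not shown in the cited sources.

```rust
/// Input formats named in the text above (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum CookieFormat {
    Json,
    Curl,
    CookieHeader,
}

/// Guess the format from the first non-whitespace characters.
/// Assumed heuristic: JSON starts with '[' or '{', a cURL command
/// starts with "curl ", and anything else is treated as a Cookie header.
pub fn detect_cookie_format(input: &str) -> CookieFormat {
    let trimmed = input.trim_start();
    if trimmed.starts_with('[') || trimmed.starts_with('{') {
        CookieFormat::Json
    } else if trimmed.starts_with("curl ") {
        CookieFormat::Curl
    } else {
        CookieFormat::CookieHeader
    }
}

/// Split a Cookie header ("Cookie: a=1; b=2" or just "a=1; b=2")
/// into name/value pairs. The "Cookie:" prefix match is case-sensitive here.
pub fn parse_cookie_header(header: &str) -> Vec<(String, String)> {
    let body = header.trim();
    let body = body.strip_prefix("Cookie:").unwrap_or(body);
    body.split(';')
        .filter_map(|pair| {
            // Split on the first '=' only, so values may contain '='.
            let (name, value) = pair.trim().split_once('=')?;
            Some((name.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}
```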

Web Storage Inspection

Web storage commands manage the browser's localStorage and sessionStorage.

Storage Commands

Command	Description
`storage_get`	Retrieve value from localStorage or sessionStorage
`storage_set`	Set a key-value pair in storage
`storage_clear`	Clear all items from selected storage

# Get storage value
agent-browser storage_get <local|session> <key>

# Set storage value
agent-browser storage_set <local|session> <key> <value>

# Clear storage
agent-browser storage_clear <local|session>

Sources: cli/src/native/actions.rs:85-90

Session State Management

agent-browser stores persistent state in ~/.agent-browser (or <tempdir>/agent-browser when the home directory cannot be resolved).

State Directory Structure

graph LR
    A[~/.agent-browser] --> B[sessions/]
    A --> C[auth/]
    A --> D[encryption.key]
    B --> E[<session-id>/]
    E --> F[state.json]
    E --> G[screenshots/]

Sources: cli/src/native/state.rs:80-95

State Commands

Command	Description
`state_save`	Save current browser state to disk
`state_load`	Restore browser state from saved file
`state_list`	List all saved states
`state_clean`	Remove states older than specified days
`state_rename`	Rename an existing state

# Save current state
agent-browser state_save <path> [--name <name>]

# Load saved state
agent-browser state_load <path>

# List all states
agent-browser state_list

# Clean old states (default: 30 days)
agent-browser state_clean [--days <n>]

# Rename a state
agent-browser state_rename --path <path> --name <new_name>
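
The age check behind state_clean can be sketched as a comparison between a state's modification time and a day threshold. `is_stale` is a hypothetical helper, and it is an assumption that the real command uses file modification times.

```rust
use std::time::{Duration, SystemTime};

/// Returns true when `modified` is more than `max_age_days` before `now`.
/// Timestamps in the future are never considered stale.
pub fn is_stale(modified: SystemTime, now: SystemTime, max_age_days: u64) -> bool {
    let max_age = Duration::from_secs(max_age_days * 24 * 60 * 60);
    match now.duration_since(modified) {
        Ok(age) => age > max_age,
        // `modified` is later than `now` (clock skew): keep the state.
        Err(_) => false,
    }
}
```

Passing `now` as a parameter keeps the check deterministic and easy to test.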

State Directory Resolution

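use std::path::PathBuf;

/// Returns the root state directory: ~/.agent-browser, or
/// <tempdir>/agent-browser when the home directory cannot be resolved.
pub fn get_state_dir() -> PathBuf {
    // `dirs::home_dir()` comes from the `dirs` crate.
    if let Some(home) = dirs::home_dir() {
        home.join(".agent-browser")
    } else {
        std::env::temp_dir().join("agent-browser")
    }
}

/// Returns the directory that holds per-session state.
pub fn get_sessions_dir() -> PathBuf {
    get_state_dir().join("sessions")
}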

Sources: cli/src/native/state.rs:80-90
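
Combining this resolution with the directory structure shown earlier, the path to one session's state.json could be built as follows. `session_state_path` is a hypothetical helper, and the session-id validation is an added assumption, not something the cited source confirms.

```rust
use std::path::{Path, PathBuf};

/// Build <state_dir>/sessions/<session_id>/state.json.
/// Returns `None` for ids that are empty or could escape the sessions
/// directory (assumed rule: no path separators and no "..").
pub fn session_state_path(state_dir: &Path, session_id: &str) -> Option<PathBuf> {
    let unsafe_id = session_id.is_empty()
        || session_id.contains('/')
        || session_id.contains('\\')
        || session_id.contains("..");
    if unsafe_id {
        return None;
    }
    Some(
        state_dir
            .join("sessions")
            .join(session_id)
            .join("state.json"),
    )
}
```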

Element Property Inspection

Element inspection commands retrieve properties and states of DOM elements using element references obtained from snapshots.

Get Text Content

Retrieves the visible text of an element:

agent-browser get text @e1

Get HTML Content

Retrieves an element's innerHTML:

agent-browser get html @e1

Get Attributes

Retrieves any attribute value from an element:

agent-browser get attr @e1 href
agent-browser get attr @e1 src

Get Input Value

Retrieves the current value of an input element:

agent-browser get value @e1

Check Element State

Verifies an element's state:

agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1

Count Matching Elements

Counts the elements matching a selector:

agent-browser count ".item-class"

Get Bounding Box

Retrieves an element's dimensions and position:

agent-browser get box @e1

Get Styles

Retrieves an element's computed CSS styles:

agent-browser get styles @e1

Sources: cli/src/native/actions.rs:30-60

Find Elements

The find command locates DOM elements using various locator strategies.

Supported Locators

Locator	Description	Example
`role`	Find by ARIA role	`find role button --exact`
`text`	Find by text content	`find text "Submit"`
`label`	Find by form label	`find label "Email"`
`placeholder`	Find by placeholder	`find placeholder "Search..."`
`alt`	Find by alt attribute	`find alt "profile"`
`title`	Find by title attribute	`find title "Close"`
`testid`	Find by test ID	`find testid submit-btn`
`first`	First element matching selector	`find first ".item"`
`last`	Last element matching selector	`find last ".item"`

Find Command Syntax

agent-browser find <locator> <value> [action] [--exact] [--name <name>]

Examples

# Find button by role and click
agent-browser find role button click --exact

# Find input by placeholder and fill it
agent-browser find placeholder "email" fill "user@example.com"

# Find link by text
agent-browser find text "Learn more"

Sources: cli/src/commands.rs:150-200
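
The syntax above can be read as three positional arguments (locator, value, optional action) plus flags. The parser below is a rough sketch of that reading and is not the code in commands.rs: `Locator`, `FindCommand`, and `parse_find` are hypothetical names, and treating everything after the action as action arguments is an assumption.

```rust
/// Locator strategies from the table above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Locator {
    Role, Text, Label, Placeholder, Alt, Title, TestId, First, Last,
}

impl std::str::FromStr for Locator {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "role" => Ok(Locator::Role),
            "text" => Ok(Locator::Text),
            "label" => Ok(Locator::Label),
            "placeholder" => Ok(Locator::Placeholder),
            "alt" => Ok(Locator::Alt),
            "title" => Ok(Locator::Title),
            "testid" => Ok(Locator::TestId),
            "first" => Ok(Locator::First),
            "last" => Ok(Locator::Last),
            other => Err(format!("unknown locator: {other}")),
        }
    }
}

/// A parsed `find` invocation (hypothetical structure).
#[derive(Debug, PartialEq, Eq)]
pub struct FindCommand {
    pub locator: Locator,
    pub value: String,
    pub action: Option<String>,
    pub action_args: Vec<String>,
    pub exact: bool,
    pub name: Option<String>,
}

/// Parse the arguments that follow `agent-browser find`.
pub fn parse_find(args: &[&str]) -> Result<FindCommand, String> {
    let mut positional: Vec<&str> = Vec::new();
    let mut exact = false;
    let mut name = None;
    let mut iter = args.iter().copied();
    while let Some(arg) = iter.next() {
        match arg {
            "--exact" => exact = true,
            // `--name` consumes the following argument as its value.
            "--name" => name = iter.next().map(str::to_string),
            other => positional.push(other),
        }
    }
    let locator: Locator = positional.first().ok_or("missing locator")?.parse()?;
    let value = positional.get(1).ok_or("missing value")?.to_string();
    let action = positional.get(2).map(|a| a.to_string());
    let action_args = positional.iter().skip(3).map(|a| a.to_string()).collect();
    Ok(FindCommand { locator, value, action, action_args, exact, name })
}
```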

Console Error Inspection

Retrieve JavaScript errors logged to the browser console.

Get Errors

agent-browser errors

Returns a list of all console errors captured during the session.

Console Monitoring

Enable or disable console message capture:

agent-browser console enable
agent-browser console disable

Snapshot-Based Inspection

Snapshots provide a hierarchical view of the page's accessibility tree, with a reference (@e1, @e2, ...) for each element.

Snapshot Modes

Flag	Description
`-i`	Interactive elements only (preferred)
`-u`	Include href URLs on links
`-c`	Compact mode (no empty structural nodes)
`-d <n>`	Cap depth at n levels
`-s <selector>`	Scope to CSS selector
`--json`	Machine-readable JSON output

Snapshot Output Format

Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"
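
An agent consuming the text output has to split each line into its ref, role, and remaining label or attributes. The parser below is a minimal sketch that assumes the line layout shown in the sample above (two-space indentation, `@eN`, then a bracketed role); `SnapshotNode` and `parse_snapshot_line` are hypothetical names, and the `--json` flag is the more robust choice for machine consumption.

```rust
/// One element line from a text snapshot (hypothetical structure).
#[derive(Debug, PartialEq, Eq)]
pub struct SnapshotNode {
    /// Nesting level, assuming two spaces of indentation per level.
    pub depth: usize,
    /// Element ref, e.g. "@e3".
    pub ref_id: String,
    /// First word inside the brackets, e.g. "button".
    pub role: String,
    /// Everything after the closing bracket, e.g. "\"Continue\"".
    pub rest: String,
}

/// Parse a line such as `  @e5 [button type="submit"] "Continue"`.
/// Returns `None` for header lines ("Page:", "URL:") and blank lines.
pub fn parse_snapshot_line(line: &str) -> Option<SnapshotNode> {
    let indent = line.len() - line.trim_start().len();
    let body = line.trim();
    if !body.starts_with("@e") {
        return None;
    }
    let (ref_id, after_ref) = body.split_once(' ')?;
    let open = after_ref.find('[')?;
    let close = after_ref.find(']')?;
    if close < open {
        return None;
    }
    let inside = &after_ref[open + 1..close];
    let role = inside.split_whitespace().next()?.to_string();
    let rest = after_ref[close + 1..].trim().to_string();
    Some(SnapshotNode {
        depth: indent / 2,
        ref_id: ref_id.to_string(),
        role,
        rest,
    })
}
```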

Snapshot Workflow

graph TD
    A[Open Page] --> B[Snapshot -i]
    B --> C[Parse Element Refs]
    C --> D[Click @e3]
    D --> E[Snapshot -i]
    E --> F[Find Input Fields]
    F --> G[Fill @e3 "email"]
    G --> H[Fill @e4 "password"]
    H --> I[Click @e5]

Sources: skill-data/core/SKILL.md:1-80

Complete Command Reference

State Inspection Summary

Command	Category	Description
`cookies get`	Cookie	List all cookies
`cookies set --name X --value Y`	Cookie	Set a cookie
`cookies clear`	Cookie	Clear all cookies
`storage_get <type> <key>`	Storage	Get storage value
`storage_set <type> <key> <val>`	Storage	Set storage value
`storage_clear <type>`	Storage	Clear storage
`state_save <path>`	Session	Save browser state
`state_load <path>`	Session	Load browser state
`state_list`	Session	List saved states
`state_clean [--days <n>]`	Session	Clean old states
`errors`	Console	Get console errors
`get text @eN`	Element	Get element text
`get attr @eN <attr>`	Element	Get attribute
`is visible @eN`	Element	Check visibility
`count <selector>`	Element	Count elements

Sources: cli/src/native/actions.rs:70-100

Usage Patterns

Inspecting Page State

# Full page inspection workflow
agent-browser open https://example.com
agent-browser snapshot -i           # Get element refs
agent-browser get title             # Page title
agent-browser get url               # Current URL
agent-browser errors                # Check for console errors

Verifying Element State

agent-browser click @e1             # Click element
agent-browser wait 500              # Wait for response
agent-browser is visible @e2        # Verify visibility
agent-browser get text @e3          # Get text content

Persisting Session State

agent-browser open https://app.example.com
agent-browser cookies set --name session --value abc123
agent-browser storage_set local user "john"
agent-browser state_save ./my-session   # Persist state
# Later...
agent-browser state_load ./my-session  # Restore state

Summary

State Inspection Commands in agent-browser provide comprehensive capabilities for examining and managing browser state:

Cookie Management: Get, set, and clear HTTP cookies, with import from JSON, cURL, or Cookie-header files
Web Storage: Access to localStorage and sessionStorage
Session Persistence: Save, load, list, rename, and clean saved browser states
Element Inspection: Query text, attributes, states, and styles
Element Location: Find elements by role, text, label, placeholder, and other attributes
Console Monitoring: Capture and retrieve JavaScript errors

These commands work together with the snapshot system to enable precise browser automation workflows with full state observability.

Sources: cli/src/native/actions.rs:1-150

Browser Engine Integration

Related topics: Daemon and CDP Protocol, Installation Guide

Note: The browser engine source files (lightpanda.rs, discovery.rs, webdriver/mod.rs, safari.rs, ios.rs) were not available when this page was written. The following is a partial analysis based on indirect evidence from the rest of the repository.

Architecture Overview

Based on the available context, agent-browser automates the browser over the Chrome DevTools Protocol (CDP):

┌─────────────────┐     CDP/WebSocket      ┌─────────────────┐
│  agent-browser  │ ──────────────────────▶│  Chrome/Chromium│
│      CLI        │                        │    Browser      │
└─────────────────┘                        └─────────────────┘
        │
        ├── Session Management
        ├── Element Reference System (@e1, @e2, ...)
        └── Command Dispatch

Supported Browser Contexts

Context Type	Implementation	Protocol
Chrome/Chromium	CDP Native	WebSocket
Electron	CDP Native	WebSocket
Remote Debugging	`--remote-debugging-port`	CDP
Safari (iOS)	WebDriver	W3C WebDriver

Session Management

Sessions are managed through port-based connections:

// From session-tree.tsx
interface Session {
  port: number;
  session: string;
  provider?: string;
  pending?: boolean;
}

Sessions can be connected via:

agent-browser connect 9222

Command Dispatch Architecture

The CLI uses a dispatch pattern for handling browser commands:

// From cli/src/native/actions.rs (partial)
match subcmd.as_str() {
    "click" => handle_click(cmd, state).await,
    "fill" => handle_fill(cmd, state).await,
    "snapshot" => handle_snapshot(cmd, state).await,
    "screenshot" => handle_screenshot(cmd, state).await,
    "get" => handle_get(cmd, state).await,
    // ... additional commands
}

Browser Engine Providers

Based on the codebase structure, agent-browser supports multiple browser engine providers:

Provider	File Reference	Purpose
Lightpanda	`lightpanda.rs`	Lightweight browser engine
Safari	`safari.rs`	macOS/iOS Safari via WebDriver
iOS	`ios.rs`	iOS WebKit via WebDriver
Chrome CDP	`discovery.rs`	Auto-discovery of Chrome instances

CDP Discovery Mechanism

The discovery.rs module handles automatic detection of browser instances:

Scans for Chrome/Chromium processes
Identifies remote debugging ports
Matches browser version compatibility
Establishes WebSocket connections
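
For the last two steps, Chrome started with `--remote-debugging-port` serves a JSON document at `/json/version` whose `webSocketDebuggerUrl` field is the WebSocket endpoint to connect to. The sketch below shows only the URL construction and field extraction. `version_endpoint` and `extract_ws_url` are hypothetical helpers, the string search stands in for a proper JSON parser, and nothing here is taken from discovery.rs.

```rust
/// URL of Chrome's version endpoint for a local debugging port.
pub fn version_endpoint(port: u16) -> String {
    format!("http://127.0.0.1:{port}/json/version")
}

/// Pull the `webSocketDebuggerUrl` value out of the endpoint's JSON body.
/// Simplified string search; a real client would use a JSON parser.
pub fn extract_ws_url(body: &str) -> Option<&str> {
    let key = "\"webSocketDebuggerUrl\"";
    let after_key = &body[body.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?;
    let value = after_colon.trim_start().strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}
```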

WebDriver Integration

For non-Chrome browsers, WebDriver protocols are used:

# Safari WebDriver
agent-browser set driver safari

# iOS WebDriver  
agent-browser set driver ios

Session State Management

State	Description
Active	Currently connected and responsive
Pending	Connection in progress
Closed	Session terminated

Command Reference for Engine Interaction

# Connect to specific port
agent-browser connect <port>

# Session operations
agent-browser session new
agent-browser session list
agent-browser session close

# Engine-specific settings
agent-browser set viewport <width> <height>
agent-browser set device <device-name>
agent-browser set geo <lat> <lng>
agent-browser set offline [on|off]

Limitations

This page cannot provide complete documentation for browser engine integration without access to the engine source files: lightpanda.rs, discovery.rs, webdriver/mod.rs, safari.rs, and ios.rs.

These files are required for accurate implementation details about:

CDP command serialization/deserialization
WebDriver protocol mapping
Browser-specific quirks handling
Session lifecycle management

Source: https://github.com/vercel-labs/agent-browser / Human Manual

Authentication and Session Persistence

This page documents the authentication workflows and session persistence mechanisms in agent-browser, covering how to handle login flows, save/restore authenticated states, manage credentials securely, and persist browser sessions across runs.

Overview

agent-browser provides multiple layers of authentication and session persistence:

Credential Management — Store and retrieve login credentials via an encrypted auth vault
State Persistence — Save and restore full browser state (cookies, localStorage, sessionStorage)
Session Management — Auto-save/restore named sessions without manual file handling
Profile Persistence — Use Chrome user data directories for full browser profile persistence

These mechanisms layer on top of the core CDP (Chrome DevTools Protocol) browser automation to serialize and deserialize authentication artifacts.

Sources: cli/src/native/actions.rs:action_dispatch (dispatch table)

Doramagic Pitfall Log

Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.

1. Installation risk: Chrome 147.0 crashes with "trap int3" when running in docker

Severity: high
Finding: Installation risk is backed by a source signal: Chrome 147.0 crashes with "trap int3" when running in docker. Treat it as a review item until the current version is checked.
User impact: First-time setup may fail or require extra isolation and rollback planning.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1339

2. Installation risk: Detected: Trojan:Win32/Posilod.EB!cl

Severity: high
Finding: Installation risk is backed by a source signal: Detected: Trojan:Win32/Posilod.EB!cl. Treat it as a review item until the current version is checked.
User impact: First-time setup may fail or require extra isolation and rollback planning.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1281

3. Configuration risk: snapshot -s <selector> produces duplicate elements when AX tree contains virtual nodes without backendDOMNodeId

Severity: high
Finding: Configuration risk is backed by a source signal: snapshot -s <selector> produces duplicate elements when AX tree contains virtual nodes without backendDOMNodeId. Treat it as a review item until the current version is checked.
User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1338

4. Project risk: Feature Request: Chrome Extension-based Connection for Seamless Login State Reuse

Severity: high
Finding: Project risk is backed by a source signal: Feature Request: Chrome Extension-based Connection for Seamless Login State Reuse. Treat it as a review item until the current version is checked.
User impact: The project should not be treated as fully validated until this signal is reviewed.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1319

5. Security or permission risk: Developers should check this security_permissions risk before relying on the project: Dashboard privileged POST routes should reject cross-origin requests

Severity: high
Finding: Developers should check this security_permissions risk before relying on the project: Dashboard privileged POST routes should reject cross-origin requests
User impact: Developers may expose sensitive permissions or credentials: Dashboard privileged POST routes should reject cross-origin requests
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Dashboard privileged POST routes should reject cross-origin requests. Context: Source discussion did not expose a precise runtime context.
Evidence: failure_mode_cluster:github_issue | fmev_bc39fa851aecda51d6ae79863b570093 | https://github.com/vercel-labs/agent-browser/issues/1345 | Dashboard privileged POST routes should reject cross-origin requests

6. Security or permission risk: Developers should check this security_permissions risk before relying on the project: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission

Severity: high
Finding: Developers should check this security_permissions risk before relying on the project: --auto-connect fails too quickly when Chrome asks for remote debugging permission
User impact: Developers may expose sensitive permissions or credentials: --auto-connect fails too quickly when Chrome asks for remote debugging permission
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: --auto-connect fails too quickly when Chrome asks for remote debugging permission. Context: Source discussion did not expose a precise runtime context.
Evidence: failure_mode_cluster:github_issue | fmev_50f6336937705c962c78ed48a466eb98 | https://github.com/vercel-labs/agent-browser/issues/1365 | --auto-connect fails too quickly when Chrome asks for remote debugging permission

7. Security or permission risk: Support XDG Base Directory paths for agent-browser state, config, and installs

Severity: high
Finding: Security or permission risk is backed by a source signal: Support XDG Base Directory paths for agent-browser state, config, and installs. Treat it as a review item until the current version is checked.
User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1361

8. Installation risk: Developers should check this installation risk before relying on the project: After failed close, subsequent open reports success but returns stale content from prior URL

Severity: medium
Finding: Developers should check this installation risk before relying on the project: After failed close, subsequent open reports success but returns stale content from prior URL
User impact: Developers may fail before the first successful local run: After failed close, subsequent open reports success but returns stale content from prior URL
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: After failed close, subsequent open reports success but returns stale content from prior URL. Context: Observed when using node, python, linux
Evidence: failure_mode_cluster:github_issue | fmev_fce1ca55e45e13ba327a52473c958037 | https://github.com/vercel-labs/agent-browser/issues/1367 | After failed close, subsequent open reports success but returns stale content from prior URL

9. Installation risk: Developers should check this installation risk before relying on the project: Chrome 147.0 crashes with "trap int3" when running in docker

Severity: medium
Finding: Developers should check this installation risk before relying on the project: Chrome 147.0 crashes with "trap int3" when running in docker
User impact: Developers may fail before the first successful local run: Chrome 147.0 crashes with "trap int3" when running in docker
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Chrome 147.0 crashes with "trap int3" when running in docker. Context: Observed when using docker, windows, linux
Evidence: failure_mode_cluster:github_issue | fmev_de7dc45e4f45905d10cb44680cd26da5 | https://github.com/vercel-labs/agent-browser/issues/1339 | Chrome 147.0 crashes with "trap int3" when running in docker

10. Installation risk: Developers should check this installation risk before relying on the project: Detected: Trojan:Win32/Posilod.EB!cl

Severity: medium
Finding: Developers should check this installation risk before relying on the project: Detected: Trojan:Win32/Posilod.EB!cl
User impact: Developers may fail before the first successful local run: Detected: Trojan:Win32/Posilod.EB!cl
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Detected: Trojan:Win32/Posilod.EB!cl. Context: Observed when using windows
Evidence: failure_mode_cluster:github_issue | fmev_11d6daa01783b3f8d6cc4984b34591d9 | https://github.com/vercel-labs/agent-browser/issues/1281 | Detected: Trojan:Win32/Posilod.EB!cl

11. Installation risk: Developers should check this installation risk before relying on the project: Feature: `network throttle` for emulating slow connections / per-URL delay

Severity: medium
Finding: Developers should check this installation risk before relying on the project: Feature: network throttle for emulating slow connections / per-URL delay
User impact: Developers may fail before the first successful local run: Feature: network throttle for emulating slow connections / per-URL delay
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature: network throttle for emulating slow connections / per-URL delay. Context: Observed during installation or first-run setup.
Evidence: failure_mode_cluster:github_issue | fmev_af068ec0790d0398008062aef7b5d1a5 | https://github.com/vercel-labs/agent-browser/issues/1372 | Feature: network throttle for emulating slow connections / per-URL delay

12. Installation risk: Developers should check this installation risk before relying on the project: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills

Severity: medium
Finding: Developers should check this installation risk before relying on the project: High LLM turn count due to frequent snapshot calls when using agent-browser skills
User impact: Developers may fail before the first successful local run: High LLM turn count due to frequent snapshot calls when using agent-browser skills
Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: High LLM turn count due to frequent snapshot calls when using agent-browser skills. Context: Observed when using node, playwright, windows
Evidence: failure_mode_cluster:github_issue | fmev_1ea0ed85aeff64de383d8fa15586474d | https://github.com/vercel-labs/agent-browser/issues/1351 | High LLM turn count due to frequent snapshot calls when using agent-browser skills

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

This manual page exposes 12 project-level external discussion links. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using agent-browser with real data or production workflows.

--cdp eval/open silently target a secondary execution context when Chr - github / github_issue
Feature: network throttle for emulating slow connections / per-URL del - github / github_issue
Orphaned headless Chrome Helpers spin at high CPU under agent-browser-ch - github / github_issue
snapshot -s <selector> produces duplicate elements when AX tree contains - github / github_issue
Support XDG Base Directory paths for agent-browser state, config, and in - github / github_issue
After failed close, subsequent open reports success but returns stale co - github / github_issue
Chrome 147.0 crashes with "trap int3" when running in docker - github / github_issue
--auto-connect fails too quickly when Chrome asks for remote debugging - github / github_issue
High LLM turn count due to frequent snapshot calls when using `agent-b - github / github_issue
Support enabling WebAuthn for passkey authentication with a virtual auth - github / github_issue
Feature Request: Chrome Extension-based Connection for Seamless Login St - github / github_issue
Detected: Trojan:Win32/Posilod.EB!cl - GitHub / issue

Source: Project Pack community evidence and pitfall evidence
