Doramagic Project Pack · Human Manual
agent-browser
Introduction to Agent Browser
Related topics: Installation Guide, Architecture Overview
Agent Browser is a high-performance, native Rust CLI tool designed for browser automation and AI agent integration. Unlike traditional browser automation frameworks that rely on Node.js wrappers or third-party libraries, Agent Browser communicates directly with Chrome/Chromium via the Chrome DevTools Protocol (CDP), providing a lightweight and reliable solution for web interaction tasks.
Overview
Agent Browser serves as a bridge between AI agents and web browsers, enabling autonomous web navigation, interaction, and data extraction. It is compatible with a wide range of AI agent platforms including Cursor, Claude Code, Codex, Continue, and Windsurf.
| Aspect | Description |
|---|---|
| Language | Rust (native CLI) |
| Protocol | Chrome DevTools Protocol (CDP) |
| Dependencies | No Playwright or Puppeteer dependency |
| Platform | Chrome/Chromium |
| License | See repository LICENSE |
Sources: skills/agent-browser/SKILL.md
Architecture
Agent Browser follows a modular architecture with distinct layers for CLI handling, native browser control, and extensible skills.
graph TD
A[User / AI Agent] --> B[CLI Layer<br/>Rust Commands]
B --> C[Native Actions Layer<br/>CDP Dispatcher]
C --> D[Chrome/Chromium<br/>via CDP]
E[Skills System] --> B
E --> F[Core Skills]
E --> G[Specialized Skills]
G --> G1[Electron Apps]
G --> G2[Slack Workspace]
G --> G3[Exploratory Testing]
G --> G4[Cloud Providers]
H[Session Management] --> C
H --> H1[Auth Vault]
H --> H2[State Persistence]
H --> H3[Video Recording]
Sources: skill-data/core/SKILL.md, skills/agent-browser/SKILL.md
Core Concepts
Accessibility-Tree Snapshots
Agent Browser generates accessibility-tree snapshots that provide structured, human-readable representations of web pages. Each interactive element receives a unique reference ID (e.g., @e1, @e2) that can be used for subsequent interactions.
Example snapshot output:
Page: Example - Log in
URL: https://example.com/login
@e1 [heading] "Log in"
@e2 [form]
@e3 [input type="email"] placeholder="Email"
@e4 [input type="password"] placeholder="Password"
@e5 [button type="submit"] "Continue"
@e6 [link] "Forgot password?"
Sources: skill-data/core/references/snapshot-refs.md, skill-data/core/SKILL.md
Element Reference Notation
Element references follow a consistent notation pattern:
@e1 [tag attribute="value"] "text content" placeholder="hint"
| Component | Description |
|---|---|
| `@e1` | Unique reference ID |
| `tag` | HTML tag name |
| `attribute="value"` | Key attributes |
| `"text content"` | Visible text |
| `placeholder="hint"` | Additional attributes |
Sources: skill-data/core/references/snapshot-refs.md
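To illustrate how a client might consume this notation, the following Rust sketch splits one snapshot line into its ref ID, bracketed descriptor, and quoted text. The `SnapshotLine` type and `parse_snapshot_line` function are hypothetical names invented for this example and are not part of agent-browser; the sketch assumes lines follow the format shown above and that attribute values do not contain `]`.

```rust
/// One parsed line of snapshot output (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct SnapshotLine {
    /// Reference ID including the leading '@', e.g. "@e3".
    pub ref_id: String,
    /// Contents of the square brackets, e.g. `input type="email"`.
    pub descriptor: String,
    /// Quoted visible text that directly follows the brackets, if any.
    pub text: Option<String>,
}

/// Parse a line such as `@e5 [button type="submit"] "Continue"`.
/// Returns None when the line does not start with an `@eN` ref
/// or has no bracketed descriptor.
pub fn parse_snapshot_line(line: &str) -> Option<SnapshotLine> {
    let line = line.trim();
    // The ref is the first whitespace-delimited token: "@e" followed by digits.
    let (ref_id, rest) = line.split_once(char::is_whitespace)?;
    let digits = ref_id.strip_prefix("@e")?;
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    // The descriptor sits between the first '[' and the next ']'.
    let open = rest.find('[')?;
    let close = open + rest[open..].find(']')?;
    let descriptor = rest[open + 1..close].to_string();
    // Visible text, when present, is a quoted string right after the bracket.
    // Trailing attributes such as placeholder="..." are not treated as text.
    let after = rest[close + 1..].trim_start();
    let text = after
        .strip_prefix('"')
        .and_then(|tail| tail.find('"').map(|end| tail[..end].to_string()));
    Some(SnapshotLine {
        ref_id: ref_id.to_string(),
        descriptor,
        text,
    })
}
```

Header lines such as `Page:` and `URL:` yield `None`, so a caller can run every line of a snapshot through the function and keep only the parsed elements.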
Command Reference
Navigation Commands
| Command | Description |
|---|---|
| `agent-browser open [url]` | Launch browser with optional navigation |
| `agent-browser back` | Navigate backward |
| `agent-browser forward` | Navigate forward |
| `agent-browser reload` | Reload current page |
| `agent-browser close` | Close browser |
| `agent-browser connect <port>` | Connect to existing browser via CDP |
Sources: skill-data/core/references/commands.md
Interaction Commands
| Command | Description |
|---|---|
| `agent-browser click <ref>` | Click an element |
| `agent-browser fill <ref> <text>` | Type text into input |
| `agent-browser select <ref> <value>` | Select dropdown option |
| `agent-browser check <ref>` | Check a checkbox |
| `agent-browser scroll <direction> <pixels>` | Scroll page |
Sources: cli/src/native/actions.rs
Data Retrieval Commands
| Command | Description |
|---|---|
| `agent-browser snapshot [-i]` | Get page snapshot (interactive only with -i) |
| `agent-browser screenshot [path]` | Capture screenshot |
| `agent-browser get text <ref>` | Get visible text |
| `agent-browser get attr <ref> <name>` | Get attribute value |
| `agent-browser get url` | Get current URL |
| `agent-browser get title` | Get page title |
Sources: cli/src/output.rs, cli/src/native/actions.rs
Network Control Commands
| Command | Description |
|---|---|
| `agent-browser network route <url>` | Intercept network request |
| `agent-browser network unroute <url>` | Remove interception |
| `agent-browser network requests [--clear]` | View/clear network requests |
| `agent-browser network har <start\|stop> [path]` | Capture HAR file |
Sources: skill-data/core/references/commands.md, cli/src/output.rs
Cookie and Storage Management
agent-browser cookies get # View all cookies
agent-browser cookies set --url <url> --name <name> --value <val>
agent-browser cookies clear # Clear all cookies
agent-browser storage local # Manage localStorage
agent-browser storage session # Manage sessionStorage
Sources: cli/src/output.rs
Browser Settings Commands
| Command | Description |
|---|---|
| `agent-browser set viewport <w> <h>` | Set viewport size |
| `agent-browser set device <name>` | Emulate device |
| `agent-browser set geo <lat> <lng>` | Set geolocation |
| `agent-browser set offline on\|off` | Toggle offline mode |
| `agent-browser set headers <json>` | Set custom headers |
| `agent-browser set media dark\|light` | Set color scheme |
Sources: cli/src/output.rs
Sessions and State Management
Agent Browser supports multiple concurrent browser sessions with state persistence.
graph LR
A[Session A] --> B[State File A]
C[Session B] --> D[State File B]
E[Auth Vault] --> A
E[Auth Vault] --> C
Key Features:
- Named Sessions: `--session <name>` flag for multiple sessions
- State Persistence: Save and restore browser state
- Auth Vault: Secure credential storage
- Video Recording: Capture browser activity
Sources: skill-data/core/SKILL.md, skills/agent-browser/SKILL.md
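As a rough illustration of driving named sessions from a program, the Rust sketch below builds the argument list for a session-scoped invocation and optionally runs it. The helper names `session_args` and `run_in_session` are hypothetical; the only CLI surface assumed is the `--session <name>` flag and the `agent-browser` binary name described in this manual.

```rust
use std::io;
use std::process::{Command, Output};

/// Build the arguments for one invocation inside a named session.
/// The session is selected with the `--session <name>` flag.
pub fn session_args(session: &str, command: &[&str]) -> Vec<String> {
    let mut args = vec!["--session".to_string(), session.to_string()];
    args.extend(command.iter().map(|s| s.to_string()));
    args
}

/// Run the command and capture its output.
/// Assumes an `agent-browser` binary is available on PATH.
pub fn run_in_session(session: &str, command: &[&str]) -> io::Result<Output> {
    Command::new("agent-browser")
        .args(session_args(session, command))
        .output()
}
```

Two workflows can then stay isolated by using different names, for example `run_in_session("checkout", &["snapshot", "-i"])` alongside `run_in_session("admin", &["open", "https://example.com"])`.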
Skills System
Agent Browser uses an extensible skills system that provides specialized workflows for different environments.
Core Skills
agent-browser skills get core # Core workflows and common patterns
agent-browser skills get core --full # Include full command reference
Specialized Skills
| Skill | Description | Command |
|---|---|---|
| Electron | Desktop app automation | agent-browser skills get electron |
| Slack | Workspace automation | agent-browser skills get slack |
| Dogfood | Exploratory testing/QA | agent-browser skills get dogfood |
| Vercel Sandbox | Cloud browser in microVMs | agent-browser skills get vercel-sandbox |
| AgentCore | AWS Bedrock cloud browsers | agent-browser skills get agentcore |
Sources: skills/agent-browser/SKILL.md
React Developer Tools Integration
Agent Browser includes built-in React DevTools support for debugging React applications:
| Command | Description |
|---|---|
| `agent-browser react_tree` | View React component tree |
| `agent-browser react_inspect` | Inspect component props/state |
| `agent-browser react_renders_start` | Track render counts |
| `agent-browser react_renders_stop` | Stop render tracking |
Sources: cli/src/native/actions.rs, cli/src/react/suspense.rs
Suspense Boundary Analysis
Agent Browser can analyze React Suspense boundaries with actionability scoring:
| Blocker Kind | Weight | Actionability |
|---|---|---|
| ClientHook | 7 | 90% |
| RequestApi | 6 | 88% |
| ServerFetch | 5 | 82% |
| Cache | 4 | 74% |
| Stream | 3 | 60% |
| Unknown | 2 | 35% |
| Framework | 1 | 18% |
Sources: cli/src/react/suspense.rs
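The table can be mirrored in code as a small enum with lookup methods. The Rust sketch below is a hypothetical restatement of the table, not the contents of suspense.rs; in particular, ranking by actionability first and weight second is an assumption made for illustration, since this manual does not describe how the two numbers are combined.

```rust
/// Blocker kinds listed in the table above (hypothetical mirror of that table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockerKind {
    ClientHook,
    RequestApi,
    ServerFetch,
    Cache,
    Stream,
    Unknown,
    Framework,
}

impl BlockerKind {
    /// Weight column from the table.
    pub fn weight(self) -> u8 {
        match self {
            BlockerKind::ClientHook => 7,
            BlockerKind::RequestApi => 6,
            BlockerKind::ServerFetch => 5,
            BlockerKind::Cache => 4,
            BlockerKind::Stream => 3,
            BlockerKind::Unknown => 2,
            BlockerKind::Framework => 1,
        }
    }

    /// Actionability column from the table.
    pub fn actionability(self) -> u8 {
        match self {
            BlockerKind::ClientHook => 90,
            BlockerKind::RequestApi => 88,
            BlockerKind::ServerFetch => 82,
            BlockerKind::Cache => 74,
            BlockerKind::Stream => 60,
            BlockerKind::Unknown => 35,
            BlockerKind::Framework => 18,
        }
    }
}

/// Order blockers so the most actionable come first (assumed ranking rule),
/// breaking ties by the higher weight.
pub fn rank_blockers(mut blockers: Vec<BlockerKind>) -> Vec<BlockerKind> {
    blockers.sort_by(|a, b| {
        b.actionability()
            .cmp(&a.actionability())
            .then(b.weight().cmp(&a.weight()))
    });
    blockers
}
```

With the values in the table, both columns produce the same order, so the tie-breaker only matters if the numbers change.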
Dashboard Interface
Agent Browser includes a web-based dashboard for visual browser management:
graph TD
A[Dashboard] --> B[Controls Panel]
A --> C[Result Panel]
A --> D[Network Panel]
A --> E[Extensions Panel]
B --> B1[URL Input]
B --> B2[Mode Selector]
B --> B3[Action Controls]
C --> C1[Screenshot View]
C --> C2[Snapshot View]
C --> C3[Step History]
D --> D1[Request List]
D --> D2[HAR Export]
E --> E1[Extension List]
E --> E2[Extension Details]
The dashboard is built with React and supports:
- Resizable panels for flexible layouts
- Theme switching (light/dark)
- Mobile-responsive design
- Real-time step history
Sources: examples/environments/app/page.tsx, packages/dashboard/src/components/network-panel.tsx, packages/dashboard/src/components/extensions-panel.tsx
Best Practices
1. Always Snapshot Before Interacting
# CORRECT - Snapshot first to get refs
agent-browser open https://example.com
agent-browser snapshot -i # Get refs first
agent-browser click @e1 # Use ref
# WRONG - Ref doesn't exist yet
agent-browser open https://example.com
agent-browser click @e1 # Will fail!
2. Re-snapshot After Navigation
Element references change when the page navigates. Always take a new snapshot after clicking links or navigating to new pages.
3. Use Sessions for Complex Workflows
agent-browser --session my-session open https://example.com
agent-browser --session my-session snapshot -i
# ... perform actions ...
agent-browser --session my-session close
Sources: skill-data/core/references/snapshot-refs.md
Installation and Setup
Prerequisites
- Chrome or Chromium (if none is installed, agent-browser install downloads a Chrome for Testing build)
- Operating system: macOS, Linux, or Windows
Installation
Refer to the repository's installation instructions for your platform. Agent Browser is distributed as a native binary with no runtime dependencies.
Configuration Files
| File | Purpose |
|---|---|
| `~/.agent-browser/` | Default config directory |
| Sessions | Stored in config directory |
| Auth Vault | Encrypted credential storage |
Sources: AGENTS.md
Summary
Agent Browser provides a powerful, efficient, and AI-agent-friendly approach to browser automation. Its key differentiators include:
- Native Rust implementation for high performance
- Direct CDP communication without third-party dependencies
- Accessibility-tree snapshots for reliable element targeting
- Session management for complex multi-step workflows
- Extensible skills system for specialized environments
- Built-in React DevTools integration for debugging
These features make Agent Browser an ideal choice for AI agents, automated testing pipelines, and developer workflows requiring precise browser control.
Source: https://github.com/vercel-labs/agent-browser / Human Manual
Installation Guide
Related topics: Introduction to Agent Browser
Overview
The agent-browser project is a native Rust CLI tool designed for browser automation, providing AI agents with reliable web interaction capabilities. Unlike traditional browser automation tools that rely on Node.js wrappers, agent-browser delivers a fast, lightweight solution built directly in Rust with Chrome/Chromium support via Chrome DevTools Protocol (CDP). The installation process handles downloading the necessary Chrome browser binaries, setting up platform-specific binaries, and configuring dependencies for the dashboard UI.
Sources: AGENTS.md
Prerequisites
System Requirements
Before installing agent-browser, ensure your system meets the following requirements:
| Requirement | Details |
|---|---|
| Operating System | macOS, Linux, or Windows (7 platform binaries built) |
| Chrome/Chromium | Required for browser automation functionality |
| Rust Toolchain | Required for building from source |
| Node.js/pnpm | Required for dashboard development |
The project builds all 7 platform binaries during CI/CD, including native binaries for different architectures. Chrome is downloaded directly from Chrome for Testing during the installation process, eliminating the need for system-installed Chrome browsers.
Sources: AGENTS.md
Required Dependencies
| Dependency | Purpose | Installation Method |
|---|---|---|
| Chrome/Chromium | Browser automation target | Auto-downloaded via install command |
| Cargo/Rust | Building CLI from source | rustup.rs |
| pnpm | Dashboard package management | npm install -g pnpm |
Installation Methods
Method 1: npm Package Installation (Recommended)
The recommended installation method uses the npm registry for cross-platform compatibility:
npm install -g agent-browser
After installation, you must run the setup command to download Chrome binaries:
agent-browser install
Sources: skills/agent-browser/SKILL.md
Method 2: Building from Source
For development or customization, build the CLI from source:
# Clone the repository
git clone https://github.com/vercel-labs/agent-browser.git
cd agent-browser
# Install dependencies and build
cd cli && cargo build --release
The Rust codebase architecture follows a modular structure:
graph TD
A[cli/src/native/] --> B[daemon/]
A --> C[actions/]
A --> D[browser/]
A --> E[CDP client/]
A --> F[snapshot/]
A --> G[state/]
The --engine flag allows selecting between Chrome and Lightpanda browser engines, providing flexibility in automation scenarios.
Sources: AGENTS.md
Method 3: Docker Installation
For containerized environments, Docker builds are supported:
# Build from the project's Dockerfile
docker build -t agent-browser -f docker/Dockerfile.build .
Docker installation is particularly useful for CI/CD pipelines and reproducible automation environments where system dependencies need to be isolated.
Post-Installation Setup
Chrome Binary Download
After installing the CLI package, you must download the Chrome binary:
agent-browser install
This command retrieves Chrome directly from Chrome for Testing, ensuring a compatible and up-to-date browser binary is available for all automation tasks. The --download-path flag can specify a custom location:
agent-browser --download-path /custom/path install
Sources: cli/src/flags.rs:45-49
Verifying Installation
Verify the installation by checking the version and available commands:
agent-browser --version
agent-browser --help
The CLI provides comprehensive command documentation through the help system:
| Command | Description |
|---|---|
| `agent-browser open <url>` | Open a URL in the browser |
| `agent-browser snapshot` | Capture accessibility tree with element refs |
| `agent-browser click @<ref>` | Click element by reference |
| `agent-browser skills get <name>` | Retrieve skill documentation |
| `agent-browser install` | Download Chrome binaries |
Sources: cli/src/output.rs
Skill Documentation Loading
Agent-browser uses a skill-based documentation system that loads content dynamically based on the installed version:
# Load core workflows and common patterns
agent-browser skills get core
# Include full command reference and templates
agent-browser skills get core --full
# List all available skills
agent-browser skills list
Available specialized skills:
| Skill | Purpose |
|---|---|
| `electron` | Electron desktop apps (VS Code, Slack, Discord, Figma) |
| `slack` | Slack workspace automation |
| `dogfood` | Exploratory testing and QA |
| `vercel-sandbox` | agent-browser inside Vercel Sandbox microVMs |
| `agentcore` | AWS Bedrock AgentCore cloud browsers |
Sources: skills/agent-browser/SKILL.md
Platform-Specific Considerations
macOS
On macOS, if you encounter security prompts about unsigned applications, you may need to allow the application in System Settings > Privacy & Security (System Preferences > Security & Privacy on older macOS versions), or run:
xattr -d com.apple.quarantine /path/to/agent-browser
Linux
On Linux, Chrome needs several shared libraries, such as GTK 3 and NSS. Install them via your package manager:
# Debian/Ubuntu
sudo apt-get install libgtk-3-0 libnss3
# Fedora
sudo dnf install gtk3 nss
Windows
Windows installations automatically configure the required runtime dependencies. Ensure Windows Subsystem for Linux (WSL) compatibility if running in hybrid environments.
Running Tests
After installation, verify the setup by running the test suite:
# Unit tests (fast, no Chrome required)
cd cli && cargo test
# End-to-end tests (requires Chrome installed)
cd cli && cargo test e2e -- --ignored --test-threads=1
The project contains approximately 320 unit tests and 18 e2e tests. E2E tests launch real headless Chrome instances and must run serially to avoid instance contention.
Sources: AGENTS.md
Troubleshooting
Chrome Download Failures
If the install command fails to download Chrome:
- Check network connectivity to Chrome for Testing
- Verify write permissions to the download directory
- Use `--download-path` to specify an alternative location with proper permissions
Permission Denied Errors
Ensure the agent-browser binary has execute permissions:
chmod +x /path/to/agent-browser
Engine Selection
If Chrome automation fails, try specifying the engine explicitly:
agent-browser --engine chrome open https://example.com
The --engine flag supports Chrome (default) and Lightpanda engines for different automation scenarios.
Next Steps
After successful installation:
- Load core skill documentation: `agent-browser skills get core --full`
- Open a test URL: `agent-browser open https://example.com`
- Capture a snapshot: `agent-browser snapshot -i`
- Explore specialized skills for your use case
Sources: skills/agent-browser/SKILL.md
Sources: AGENTS.md
Element References System
Related topics: State Inspection Commands, Interaction Commands
The Element References System is a core mechanism in agent-browser that provides short, human-readable identifiers for page elements during browser automation tasks. Instead of relying on fragile CSS selectors or XPath expressions, each snapshot assigns unique reference IDs (such as @e1, @e2) that subsequent automation commands can use. Refs are valid for the page state captured by that snapshot; after navigation or dynamic changes, take a new snapshot to get fresh refs.
Overview
Element references serve as the primary interface between automation scripts and the browser's accessibility tree. When a snapshot is taken, each interactive element receives a reference ID that can be used in commands like click, fill, type, and get without requiring re-selection.
graph TD
A[Browser Page] --> B[snapshot Command]
B --> C[Accessibility Tree Traversal]
C --> D[Element Identification]
D --> E[Reference Assignment]
E --> F[@e1 @e2 @e3 ...]
F --> G[Automation Commands]
G --> H[click @e1]
G --> I[fill @e2]
G --> J[get text @e3]
Reference Notation Format
Element references follow a standardized notation format that encodes element metadata:
@e1 [tag type="value"] "text content" placeholder="hint"
│ │ │ │ │
│ │ │ │ └─ Additional attributes
│ │ │ └─ Visible text
│ │ └─ Key attributes shown
│ └─ HTML tag name
└─ Unique ref ID
Sources: skill-data/core/references/snapshot-refs.md
Reference Components
| Component | Description | Example |
|---|---|---|
| `@eN` | Unique reference identifier | @e1, @e42 |
| Tag | HTML element type | button, input, link |
| Type attribute | Element type classification | type="email", type="password" |
| Text content | Visible text on element | "Submit", "Log in" |
| Placeholder | Input placeholder text | placeholder="Email" |
Common Reference Patterns
The snapshot system recognizes common element patterns and standardizes their reference notation:
@e1 [button] "Submit" # Button with text
@e2 [input type="email"] # Email input
@e3 [input type="password"] # Password input
@e4 [a href="/page"] "Link Text" # Anchor link
@e5 [select] # Dropdown
@e6 [textarea] placeholder="Message" # Text area
@e7 [div class="modal"] # Container element
@e8 [img alt="Logo"] # Image with alt text
@e9 [checkbox] checked # Checked checkbox
@e10 [radio] selected # Selected radio button
Sources: skill-data/core/references/snapshot-refs.md
Snapshot Command Options
The snapshot command generates element references with various filtering and formatting options:
agent-browser snapshot # Full tree (verbose)
agent-browser snapshot -i # Interactive elements only (preferred)
agent-browser snapshot -i -u # Include href URLs on links
agent-browser snapshot -i -c # Compact mode (no empty structural nodes)
agent-browser snapshot -i -d 3 # Cap depth at 3 levels
agent-browser snapshot -s "#main" # Scope to a CSS selector
agent-browser snapshot -i --json # Machine-readable output
Sources: skill-data/core/SKILL.md
Option Reference
| Option | Purpose | Use Case |
|---|---|---|
| `-i` | Interactive elements only | Preferred for automation |
| `-u` | Include href URLs | When link destinations matter |
| `-c` | Compact output | Complex pages with many empty nodes |
| `-d N` | Depth limit | Focus on specific page sections |
| `-s SELECTOR` | CSS scope | Target specific page regions |
| `--json` | JSON format | Programmatic processing |
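When snapshot calls are assembled programmatically, the options above map naturally onto a typed struct. The Rust sketch below is one possible shape; `SnapshotOptions` and `to_args` are hypothetical names for this example, and only the flags listed in the table are emitted.

```rust
/// Typed view of the snapshot flags listed above (hypothetical helper).
#[derive(Debug, Default, Clone)]
pub struct SnapshotOptions {
    /// -i: interactive elements only
    pub interactive: bool,
    /// -u: include href URLs on links
    pub urls: bool,
    /// -c: compact mode
    pub compact: bool,
    /// -d N: cap the tree depth
    pub depth: Option<u32>,
    /// -s SELECTOR: scope to a CSS selector
    pub scope: Option<String>,
    /// --json: machine-readable output
    pub json: bool,
}

impl SnapshotOptions {
    /// Render the options as CLI arguments, starting with the subcommand.
    pub fn to_args(&self) -> Vec<String> {
        let mut args = vec!["snapshot".to_string()];
        if self.interactive {
            args.push("-i".to_string());
        }
        if self.urls {
            args.push("-u".to_string());
        }
        if self.compact {
            args.push("-c".to_string());
        }
        if let Some(depth) = self.depth {
            args.push("-d".to_string());
            args.push(depth.to_string());
        }
        if let Some(scope) = &self.scope {
            args.push("-s".to_string());
            args.push(scope.clone());
        }
        if self.json {
            args.push("--json".to_string());
        }
        args
    }
}
```

Passing the selector as its own argument avoids shell quoting issues when the result is handed to a process-spawning API rather than a shell string.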
Element Reference Commands
Element references are used with various commands to interact with page elements:
Direct Element Commands
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Fill input field
agent-browser type @e2 "text" # Type character by character
agent-browser press Enter # Press key on focused element
State Inspection Commands
agent-browser get text @e1 # Get visible text
agent-browser get html @e1 # Get innerHTML
agent-browser get attr @e1 href # Get specific attribute
agent-browser get value @e1 # Get input value
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count ".item" # Count matching elements
State Checking Commands
The is command verifies element states:
agent-browser is visible @e1
agent-browser is enabled @e1
agent-browser is checked @e1
Sources: cli/src/output.rs
Find Command and Locators
The find command provides an alternative to snapshot-based reference acquisition by locating elements using various criteria:
agent-browser find <locator> <value> <action> [text]
Supported Locators
| Locator | Description | Example |
|---|---|---|
| `role` | ARIA role selector | `find role button click` |
| `text` | Text content match | `find text "Submit" click` |
| `label` | Label text association | `find label "Email" fill` |
| `placeholder` | Placeholder attribute | `find placeholder "Search"` |
| `alt` | Alt text (images) | `find alt "Logo" click` |
| `title` | Title attribute | `find title "Help" click` |
| `testid` | Test identifier | `find testid "submit-btn" click` |
| `first` | First matching selector | `find first button click` |
| `last` | Last matching selector | `find last link click` |
| `nth` | Nth matching element | `find nth 5 button click` |
Sources: cli/src/commands.rs
Find Command Options
| Option | Purpose |
|---|---|
| `--exact` | Perform exact string matching |
| `--name <name>` | Filter by accessible name (role locator) |
Action Dispatch System
Element reference commands are dispatched to handlers through the action routing system:
graph LR
A[Command Input] --> B["dispatch(\"click\", state)"]
B --> C{Match Action}
C -->|click| D[handle_click]
C -->|fill| E[handle_fill]
C -->|get| F[handle_get]
C -->|is| G[handle_is]
C -->|find| H[handle_find]
The action router maps command strings to their respective handlers in the native daemon:
"click" => handle_dispatch(cmd, state).await,
"fill" => handle_dispatch(cmd, state).await,
"get" => handle_dispatch(cmd, state).await,
"is" => handle_dispatch(cmd, state).await,
"find" => handle_dispatch(cmd, state).await,
Sources: cli/src/native/actions.rs
Available Element Actions
| Action | Handler | Purpose |
|---|---|---|
| `click` | `handle_dispatch` | Mouse click |
| `fill` | `handle_dispatch` | Fill input with text |
| `type` | `handle_dispatch` | Character-by-character typing |
| `press` | `handle_dispatch` | Keyboard press |
| `hover` | `handle_dispatch` | Mouse hover |
| `select` | `handle_dispatch` | Select dropdown option |
| `check` | `handle_dispatch` | Check checkbox/radio |
| `uncheck` | `handle_dispatch` | Uncheck checkbox |
| `focus` | `handle_dispatch` | Focus element |
| `blur` | `handle_dispatch` | Blur element |
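The routing idea can be shown in isolation with a small, synchronous sketch. Everything here is hypothetical and simplified for illustration: the real router is async and operates on daemon state, whereas `ActionCommand`, `dispatch`, and the two handlers below only format a description of what they would do.

```rust
/// Simplified command: an action name plus its string arguments
/// (hypothetical type; the daemon's real command type is not shown here).
pub struct ActionCommand<'a> {
    pub action: &'a str,
    pub args: Vec<&'a str>,
}

/// Route a command to a handler by matching on its action string.
/// Unknown actions produce an error instead of panicking.
pub fn dispatch(cmd: &ActionCommand<'_>) -> Result<String, String> {
    match cmd.action {
        "click" => handle_click(cmd),
        "fill" => handle_fill(cmd),
        other => Err(format!("unknown action: {}", other)),
    }
}

/// Stand-in click handler: requires exactly the target ref as first argument.
fn handle_click(cmd: &ActionCommand<'_>) -> Result<String, String> {
    let target = cmd.args.first().ok_or("click requires a ref")?;
    Ok(format!("clicked {}", target))
}

/// Stand-in fill handler: requires a target ref and the text to enter.
fn handle_fill(cmd: &ActionCommand<'_>) -> Result<String, String> {
    match cmd.args.as_slice() {
        [target, text] => Ok(format!("filled {} with {:?}", target, text)),
        _ => Err("fill requires a ref and a text value".to_string()),
    }
}
```

Matching on the action string keeps the set of supported actions in one place, and the catch-all arm makes unsupported commands fail with a clear message.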
Iframe Support
Element references automatically handle iframe content. When a snapshot is taken, iframe elements are resolved and their child accessibility trees are included inline:
agent-browser snapshot -i
# Output:
# @e1 [heading] "Checkout"
# @e2 [Iframe] "payment-frame"
# @e3 [input] "Card number"
# @e4 [input] "Expiry"
# @e5 [button] "Pay"
# @e6 [button] "Cancel"
References to elements inside iframes carry frame context, allowing direct interactions without manual frame switching:
agent-browser click @e3 # Works inside iframe
agent-browser fill @e4 "12/25"
Sources: skill-data/core/references/snapshot-refs.md
Best Practices
Always Snapshot Before Interacting
# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i # Get refs first
agent-browser click @e1 # Use ref
# WRONG
agent-browser open https://example.com
agent-browser click @e1 # Ref doesn't exist yet!
Re-Snapshot After Navigation
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # Get new refs
agent-browser click @e1 # Use new refs
Re-Snapshot After Dynamic Changes
agent-browser click @e1 # Opens dropdown
agent-browser snapshot -i # See dropdown items
agent-browser click @e7 # Select item
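The snapshot-then-act discipline described in these practices can also be enforced in client code. The Rust sketch below is a hypothetical guard, not something agent-browser provides: `RefTracker` remembers which refs came from the latest snapshot and refuses any ref once the page may have changed.

```rust
use std::collections::HashSet;

/// Tracks which refs came from the most recent snapshot (hypothetical helper).
#[derive(Default)]
pub struct RefTracker {
    refs: HashSet<String>,
    stale: bool,
}

impl RefTracker {
    /// Record the refs returned by a fresh snapshot and clear the stale flag.
    pub fn record_snapshot<I: IntoIterator<Item = String>>(&mut self, refs: I) {
        self.refs = refs.into_iter().collect();
        self.stale = false;
    }

    /// Call after any action that may navigate or change the page,
    /// such as clicking a link or opening a dropdown.
    pub fn mark_stale(&mut self) {
        self.stale = true;
    }

    /// Check a ref before using it in a command.
    pub fn check(&self, r: &str) -> Result<(), String> {
        if self.stale {
            return Err("page may have changed; take a new snapshot".to_string());
        }
        if !self.refs.contains(r) {
            return Err(format!("{} was not in the last snapshot", r));
        }
        Ok(())
    }
}
```

A caller would invoke `record_snapshot` after each `snapshot -i`, `check` before each `click` or `fill`, and `mark_stale` after any action that can alter the page.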
Snapshot Specific Regions
For complex pages, snapshot specific areas to reduce noise:
# Snapshot just a form
agent-browser snapshot @e9
Session-Dependent References
Element references are session-dependent and may vary between browser sessions. The same element on the same page might receive different reference IDs in different sessions:
| Element | Typical Ref Range | How to Find |
|---|---|---|
| Home tab | e10-e20 | `snapshot -i \| grep "Home"` |
| DMs tab | e10-e20 | `snapshot -i \| grep "DMs"` |
| Activity tab | e10-e20 | `snapshot -i \| grep "Activity"` |
| Search | e5-e10 | `snapshot -i \| grep "Search"` |
| More unreads | e20-e30 | `snapshot -i \| grep "More unreads"` |
| Channel refs | e30+ | `snapshot -i \| grep "channel-name"` |
Sources: skill-data/slack/references/slack-tasks.md
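Because ref numbers vary between sessions, a script should look refs up by label rather than hard-coding them. The Rust sketch below does in code what the `grep` commands in the table do on the command line; `refs_matching` is a hypothetical helper that assumes each snapshot line carries its `@eN` ref as a whitespace-separated token.

```rust
/// Return the ref IDs (e.g. "@e12") of snapshot lines that contain `needle`.
/// Matching is case-sensitive, like plain `grep`.
pub fn refs_matching(snapshot: &str, needle: &str) -> Vec<String> {
    snapshot
        .lines()
        // Keep only lines mentioning the label we are looking for.
        .filter(|line| line.contains(needle))
        // From each such line, take the first token that looks like a ref.
        .filter_map(|line| {
            line.split_whitespace()
                .find(|tok| tok.starts_with("@e"))
                .map(|tok| tok.to_string())
        })
        .collect()
}
```

Returning every match instead of only the first lets the caller detect ambiguous labels and fall back to a more specific search string.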
Architecture Summary
graph TD
subgraph "CLI Layer"
A[User Command] --> B[commands.rs Parser]
B --> C[Command Dispatch]
end
subgraph "Native Daemon"
C --> D[actions.rs Router]
D --> E[State Manager]
E --> F[CDP Client]
end
subgraph "Browser Layer"
F --> G[Chrome DevTools Protocol]
G --> H[Accessibility Tree]
end
subgraph "Reference Generation"
H --> I[Element ID Assignment]
I --> J[@eN Reference Labels]
end
J --> K[Snapshot Output]
K --> L[Automation Commands]
The Element References System provides the foundation for reliable browser automation by abstracting DOM complexity behind human-readable identifiers. Refs stay valid only for the snapshot that produced them, so take a new snapshot after navigation or dynamic page changes.
Architecture Overview
Related topics: Daemon and CDP Protocol, Introduction to Agent Browser
agent-browser is a Rust-based browser automation framework that provides high-performance browser control through native CDP (Chrome DevTools Protocol) communication. The system is designed for AI agent integration, enabling reliable and observable browser automation.
System Architecture
The architecture follows a layered approach with clear separation between the CLI interface, daemon process, and browser engine.
graph TB
subgraph "Client Layer"
CLI[CLI Interface]
Dashboard[Web Dashboard]
end
subgraph "Daemon Layer"
WS[WebSocket Server]
Dispatcher[Action Dispatcher]
State[State Manager]
end
subgraph "CDP Layer"
CDP[CDP Client]
Protocol[Protocol Handler]
end
subgraph "Browser Engine"
Chrome[Chrome/Chromium]
Lightpanda[Lightpanda]
end
CLI --> WS
Dashboard --> WS
WS --> Dispatcher
Dispatcher --> CDP
CDP --> Chrome
CDP --> Lightpanda
Dispatcher --> State
Core Components
Daemon Architecture
The browser automation daemon is the central coordinator that manages browser sessions and handles command dispatching. It runs as a persistent process that maintains browser state across multiple operations.
Key Responsibilities:
| Component | Responsibility |
|---|---|
| WebSocket Server | Accepts client connections with origin validation |
| Action Dispatcher | Routes commands to appropriate handlers |
| State Manager | Maintains session state and snapshots |
| CDP Client | Manages protocol-level communication |
Sources: cli/src/native/mod.rs
Action Dispatch System
The action system provides a comprehensive set of browser automation commands. Actions are dispatched based on command type and handle specific browser operations.
Action Categories:
| Category | Commands |
|---|---|
| Navigation | goto, back, forward, reload, waitforurl, waitforloadstate |
| Interaction | click, fill, press, select, check, uncheck, multiselect |
| Content | snapshot, innertext, innerhtml, gettext, getattribute |
| State | cookies_get, cookies_set, storage_get, storage_set |
| Network | route, unroute, requests, har |
| React Debug | react_tree, react_inspect, react_renders_start |
Sources: cli/src/native/actions.rs:1-50
CDP Client Layer
The CDP (Chrome DevTools Protocol) client handles low-level communication with the browser engine. This abstraction allows the system to work with different browser engines through a unified interface.
Supported Engines:
| Engine | Selection Flag |
|---|---|
| Chrome/Chromium | --engine chrome (default) |
| Lightpanda | --engine lightpanda |
Sources: cli/src/native/mod.rs
Communication Protocol
WebSocket Server
The daemon exposes a WebSocket server for client communication. Security is enforced through origin validation.
graph LR
Client[Client App] -->|WebSocket| OriginCheck[Origin Check]
OriginCheck -->|Allowed| Accept[Accept Connection]
OriginCheck -->|Blocked| Reject[403 Forbidden]
Origin Validation:
The server validates the Origin header on incoming WebSocket requests. Connections from disallowed origins receive a 403 Forbidden response before any data exchange occurs.
if !is_allowed_origin(origin.as_deref()) {
return Err(reject); // Status: FORBIDDEN
}
Sources: cli/src/native/stream/websocket.rs:15-30
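The excerpt above calls `is_allowed_origin` without showing its body. The Rust sketch below is one plausible shape for such a check and should be read as an assumption, not the project's actual rule: it accepts requests with no Origin header (typical of non-browser clients) and HTTP(S) origins on `localhost` or `127.0.0.1`, and rejects everything else. `handshake_status` is a hypothetical helper that maps the decision to an HTTP status code.

```rust
/// Decide whether a WebSocket handshake's Origin header is acceptable.
/// ASSUMPTION: the real allow-list is not shown in this manual. This sketch
/// accepts a missing Origin and loopback HTTP(S) origins only.
pub fn is_allowed_origin(origin: Option<&str>) -> bool {
    let origin = match origin {
        None => return true, // assumed: non-browser clients send no Origin
        Some(o) => o,
    };
    // Strip the scheme; anything other than http(s) is rejected.
    let host_port = match origin
        .strip_prefix("http://")
        .or_else(|| origin.strip_prefix("https://"))
    {
        Some(rest) => rest,
        None => return false,
    };
    // Drop an optional ":port" suffix, then compare the host exactly,
    // so "localhost.evil.example" does not pass as "localhost".
    let host = host_port.rsplit_once(':').map_or(host_port, |(h, _)| h);
    matches!(host, "localhost" | "127.0.0.1")
}

/// Map the decision to a status: 101 Switching Protocols or 403 Forbidden.
pub fn handshake_status(origin: Option<&str>) -> u16 {
    if is_allowed_origin(origin) {
        101
    } else {
        403
    }
}
```

Comparing the host exactly, rather than with a prefix or substring test, is what prevents look-alike domains from passing. IPv6 loopback origins are not handled in this sketch.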
Request/Response Flow
All commands follow a request-response pattern:
- Client sends JSON command via WebSocket
- Server validates origin
- Dispatcher routes to appropriate handler
- Handler executes CDP operation
- Result returned as JSON response
State Management
Session State
The daemon maintains persistent state for each browser session:
| State Component | Description |
|---|---|
| Tabs | Active tab list and current tab reference |
| Frame | Current frame hierarchy |
| Viewport | Window dimensions |
| Recording | Video recording status |
Sources: cli/src/native/stream/websocket.rs:5-15
Snapshot System
The snapshot system provides an accessibility-tree-based page representation with short element references (@e1, @e2, etc.) for reliable element selection within the captured page state.
Best Practice: Always snapshot before interacting with elements, as refs change after navigation or dynamic content changes.
Sources: skill-data/core/references/snapshot-refs.md
React Inspection System
For React-based applications, the daemon provides specialized inspection capabilities:
Blocker Detection
The system identifies React Suspense boundaries and classifies them by impact:
| Blocker Kind | Weight | Actionability |
|---|---|---|
| ClientHook | 7 | 90 |
| RequestApi | 6 | 88 |
| ServerFetch | 5 | 82 |
| Cache | 4 | 74 |
| Stream | 3 | 60 |
| Unknown | 2 | 35 |
| Framework | 1 | 18 |
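The table can be read as a lookup from blocker kind to a pair of scores. The sketch below encodes those values and sorts a list of blockers by weight; the rank helper and the idea of sorting by weight alone are assumptions, since the source does not show how the two scores are combined.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockerKind {
    ClientHook,
    RequestApi,
    ServerFetch,
    Cache,
    Stream,
    Unknown,
    Framework,
}

impl BlockerKind {
    /// (weight, actionability) pairs copied from the table above.
    fn scores(self) -> (u8, u8) {
        match self {
            BlockerKind::ClientHook => (7, 90),
            BlockerKind::RequestApi => (6, 88),
            BlockerKind::ServerFetch => (5, 82),
            BlockerKind::Cache => (4, 74),
            BlockerKind::Stream => (3, 60),
            BlockerKind::Unknown => (2, 35),
            BlockerKind::Framework => (1, 18),
        }
    }
}

/// Hypothetical ordering: highest weight first.
fn rank(mut blockers: Vec<BlockerKind>) -> Vec<BlockerKind> {
    blockers.sort_by(|a, b| b.scores().0.cmp(&a.scores().0));
    blockers
}
```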
Boundary Classification
| Boundary Kind | Description |
|---|---|
| RouteSegment | Next.js/App Router segment boundary |
| ExplicitSuspense | User-declared <Suspense> component |
| Component | Implicit boundary from component structure |
Sources: cli/src/native/react/suspense.rs:30-60
CLI Architecture
The CLI provides both interactive and scripted access to browser automation:
Command Structure
agent-browser <command> [args]
Primary Command Groups:
| Group | Purpose |
|---|---|
| agent-browser open | Navigate to URL |
| agent-browser <action> | Execute automation action |
| agent-browser set | Configure browser settings |
| agent-browser network | Manage network interception |
| agent-browser state | Save/load/restore sessions |
| agent-browser tab | Manage browser tabs |
| agent-browser screenshot | Capture page images |
| agent-browser install | Download Chrome |
Sources: cli/src/output.rs
Dashboard Architecture
The web-based dashboard provides visual monitoring and control:
graph TD
Dashboard[Dashboard App] -->|API| Daemon
Dashboard -->|Display| Results[screenshots/snapshots]
Dashboard -->|Controls| Form[Control Form]
Dashboard Features:
- Resizable split view (controls + results)
- Responsive layout for mobile/desktop
- Real-time screenshot display with base64 encoding
- Snapshot viewer with step history
- Step-by-step playback of automation sequences
Sources: packages/dashboard/src/components/extensions-panel.tsx
Installation and Dependencies
Chrome Installation
The install command downloads Chrome directly from Chrome for Testing:
agent-browser install
This ensures the Chrome binary is available for CDP communication without requiring system-wide Chrome installation.
Testing Architecture
Unit Tests
Fast tests (~320) that verify individual components without Chrome dependency:
cd cli && cargo test
End-to-End Tests
Integration tests that launch real headless Chrome:
cd cli && cargo test e2e -- --ignored --test-threads=1
Requirements:
- Chrome must be installed
- Tests run serially to avoid browser instance contention
Security Considerations
| Aspect | Implementation |
|---|---|
| Origin Validation | WebSocket connections validated before acceptance |
| Session Isolation | Each session maintains separate state |
| Credential Storage | Authentication vault for secure credential handling |
Summary
agent-browser implements a clean three-tier architecture:
- Client Layer - CLI and dashboard provide user interfaces
- Daemon Layer - Rust-based server handles command dispatch and state
- CDP Layer - Browser-agnostic protocol client enables Chrome/Lightpanda support
The design prioritizes reliability (stable element refs), observability (snapshots, screenshots, video recording), and extensibility (skill-based system for specialized automation tasks).
Sources: cli/src/native/mod.rs
Daemon and CDP Protocol
Related topics: Architecture Overview, Browser Engine Integration
Overview
The agent-browser project implements a native Rust-based browser automation daemon that communicates with Chrome/Chromium browsers via the Chrome DevTools Protocol (CDP). The architecture separates the automation logic from browser control through WebSocket-based CDP connections, enabling AI agents to interact with web pages through a CLI interface.
Architecture Layer Diagram:
graph TD
A[CLI Interface] --> B[Action Dispatcher]
B --> C[CDP Client]
C --> D[WebSocket Stream]
D --> E[CDP Loop Handler]
E --> F[Chrome Browser Instance]
G[CDP Protocol Files] --> F
H[Generated CDP Types] --> C
Daemon Architecture
Native Daemon Components
The daemon lives in cli/src/native/ and handles all browser automation tasks. The main components include:
| Component | Location | Purpose |
|---|---|---|
| Daemon | cli/src/native/daemon/ | Process management and state coordination |
| Actions | cli/src/native/actions.rs | Command handlers for browser operations |
| Browser | cli/src/native/browser/ | Browser instance lifecycle |
| CDP Client | cli/src/native/cdp/client.rs | Protocol communication |
| CDP Loop | cli/src/native/stream/cdp_loop.rs | Message processing loop |
Sources: cli/src/native/actions.rs
Action Dispatch
The action handler maps command names to their implementation functions. Supported actions include:
let result = match action {
    "launch" => handle_launch(cmd, state).await,
    "navigate" => handle_navigate(cmd, state).await,
    "url" => handle_url(state).await,
    "cdp_url" => handle_cdp_url(state),
    "inspect" => handle_inspect(state).await,
    "title" => handle_title(state).await,
    "content" => handle_content(state).await,
    "evaluate" => handle_evaluate(cmd, state).await,
    "close" => handle_close(state).await,
    "snapshot" => handle_snapshot(cmd, state).await,
    "screenshot" => handle_screenshot(cmd, state).await,
    "click" => handle_click(cmd, state).await,
    "dblclick" => handle_dblclick(cmd, state).await,
    "fill" => handle_fill(cmd, state).await,
    "type" => handle_type(cmd, state).await,
    "press" => handle_press(cmd, state).await,
    "hover" => handle_hover(cmd, state).await,
    "scroll" => handle_scroll(cmd, state).await,
    // ... additional actions
};
Sources: cli/src/native/actions.rs:50-75
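The excerpt depends on the daemon's own async handlers and state types. To show the same dispatch pattern in isolation, the following is a simplified, synchronous sketch; the Command and State structs, their fields, and the error strings are stand-ins invented for illustration.

```rust
/// Simplified stand-ins for the daemon's command and state types (assumptions).
struct Command {
    action: String,
    url: Option<String>,
}

#[derive(Default)]
struct State {
    current_url: Option<String>,
}

fn handle_navigate(cmd: &Command, state: &mut State) -> Result<String, String> {
    let url = cmd.url.clone().ok_or("navigate requires a url")?;
    state.current_url = Some(url.clone());
    Ok(url)
}

fn handle_url(state: &State) -> Result<String, String> {
    state
        .current_url
        .clone()
        .ok_or_else(|| "no page open".to_string())
}

/// Route an action name to its handler; unknown names become errors.
fn dispatch(cmd: &Command, state: &mut State) -> Result<String, String> {
    match cmd.action.as_str() {
        "navigate" => handle_navigate(cmd, state),
        "url" => handle_url(state),
        other => Err(format!("Unknown action '{other}'")),
    }
}
```

Keeping the match as the single routing point means a new action needs only one extra arm plus its handler function.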
Browser Engine Selection
The --engine flag selects between Chrome and Lightpanda browsers. Chrome is downloaded from Chrome for Testing via the install command.
CDP Protocol Implementation
Protocol Files
The CDP protocol definitions are stored in JSON format:
| File | Description |
|---|---|
| browser_protocol.json | Core browser domains (Page, Network, Runtime, etc.) |
| js_protocol.json | JavaScript debugging domains |
Sources: cli/cdp-protocol/browser_protocol.json
Auto-Generated Types
CDP types are auto-generated from protocol JSON files:
/// Auto-generated CDP types from protocol JSON files in `cdp-protocol/`.
///
/// To populate: download `browser_protocol.json` and `js_protocol.json` from
/// <https://github.com/nicolo-ribaudo/nicolo-ribaudo.github.io/> (or any
/// Chromium source) into `cli/cdp-protocol/` and rebuild.
#[allow(clippy::upper_case_acronyms)]
pub mod generated {
    include!(concat!(env!("OUT_DIR"), "/cdp_generated.rs"));
}
Sources: cli/src/native/cdp/types.rs
CDP Client Structure
The CDP client manages communication with the browser:
graph LR
A[Command] --> B[CDP Client]
B --> C[WebSocket Writer]
C --> D[Browser CDP Endpoint]
E[Browser Events] --> F[WebSocket Reader]
F --> G[Event Handler]
G --> H[State Updates]
WebSocket Communication
Stream Module Architecture
The WebSocket communication is handled by the stream module located in cli/src/native/stream/:
| Module | File | Purpose |
|---|---|---|
| Stream Core | cli/src/native/stream/mod.rs | Stream trait definitions and utilities |
| WebSocket | cli/src/native/stream/websocket.rs | WebSocket connection handling |
| CDP Loop | cli/src/native/stream/cdp_loop.rs | CDP message processing loop |
WebSocket Connection
The WebSocket module establishes and maintains connections to the Chrome DevTools endpoint:
sequenceDiagram
participant CLI as CLI Command
participant Client as CDP Client
participant WS as WebSocket
participant Chrome as Chrome Browser
CLI->>Client: connect(url)
Client->>WS: establish_connection()
WS->>Chrome: WebSocket Handshake
Chrome-->>WS: 101 Switching Protocols
WS-->>Client: Connected
loop Message Exchange
CLI->>Client: send_command()
Client->>WS: write_message()
WS->>Chrome: CDP JSON Message
Chrome-->>WS: CDP Response/Event
WS-->>Client: read_message()
Client-->>CLI: Result
end
CDP Loop Handler
The CDP loop processes incoming messages and manages the event queue:
- Handles CDP events from the browser
- Routes responses to pending command callbacks
- Manages connection state and reconnection logic
Sources: cli/src/native/stream/cdp_loop.rs
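In CDP, a response carries the id of the command it answers, while an event carries a method name instead. The sketch below shows one way the routing described above could work, using standard-library channels; the Router type and its fields are hypothetical, and reconnection logic is left out.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// An incoming CDP message, reduced to the parts needed for routing.
enum Incoming {
    Response { id: u64, result: String },
    Event { method: String },
}

/// Hypothetical router: each pending command waits on a channel keyed by id.
#[derive(Default)]
struct Router {
    next_id: u64,
    pending: HashMap<u64, Sender<String>>,
    events: Vec<String>,
}

impl Router {
    /// Register a command; returns its id and a receiver for the reply.
    fn register(&mut self) -> (u64, Receiver<String>) {
        self.next_id += 1;
        let (tx, rx) = channel();
        self.pending.insert(self.next_id, tx);
        (self.next_id, rx)
    }

    /// One loop iteration: deliver a response or queue an event.
    fn on_message(&mut self, msg: Incoming) {
        match msg {
            Incoming::Response { id, result } => {
                if let Some(tx) = self.pending.remove(&id) {
                    let _ = tx.send(result); // the caller may have given up
                }
            }
            Incoming::Event { method } => self.events.push(method),
        }
    }
}
```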
Browser Connection
Connection Methods
The daemon supports multiple connection methods:
| Method | Command | Use Case |
|---|---|---|
| Launch new browser | agent-browser open | Fresh browser instance |
| Connect to existing | agent-browser connect 9222 | Attach to running browser |
# Launch with navigation
agent-browser open <url>
# Connect to running browser on specific port
agent-browser connect 9222
# Launch without navigation (clean slate)
agent-browser open
CDP WebSocket URL
The CDP WebSocket URL can be retrieved programmatically:
agent-browser get cdp-url
This returns the WebSocket debugger URL for programmatic browser attachment.
Browser Version Info
The connection retrieves browser metadata:
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct BrowserVersionInfo {
    #[serde(rename = "webSocketDebuggerUrl")]
    pub web_socket_debugger_url: Option<String>,
    #[serde(rename = "Browser")]
    pub browser: Option<String>,
}
Sources: cli/src/native/cdp/types.rs
CDP Protocol Domains
Supported Domains
The agent-browser supports CDP domains for:
| Domain | Purpose | Key Commands |
|---|---|---|
| Page | Page navigation and loading | navigate, reload, getNavigationHistory, navigateToHistoryEntry |
| Runtime | JavaScript execution | evaluate, callFunctionOn |
| DOM | DOM manipulation | getDocument, describeNode |
| Input | User input simulation | dispatchMouseEvent, dispatchKeyEvent, insertText |
| Network | Network request interception | setRequestInterception, getResponseBody |
| Target | Browser target management | createTarget, attachToTarget |
Browser Automation Actions
The following high-level actions are available via CDP:
# Navigation
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
# DOM Interaction
agent-browser click @e1
agent-browser fill @e2 "text"
agent-browser type @e3 "input"
agent-browser hover @e4
agent-browser scroll down 500
# State Queries
agent-browser snapshot
agent-browser screenshot
agent-browser get text @e1
agent-browser get attr @e1 href
# JavaScript
agent-browser evaluate "document.title"
Error Handling
WebDriver Fallback
The daemon gracefully handles unsupported actions when using WebDriver backend:
Err(anyhow::anyhow!(
    "Action '{}' is not supported on the WebDriver backend",
    action
))
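A guard of this kind might sit in front of the dispatcher. The sketch below is an assumption about how that could look: the Backend enum and the CDP_ONLY list are hypothetical, and only the error message is taken from the excerpt above.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Backend {
    Cdp,
    WebDriver,
}

/// Actions assumed to need CDP. Illustrative only; the real set is
/// defined in the daemon source and is not shown here.
const CDP_ONLY: &[&str] = &["cdp_url", "inspect"];

/// Reject CDP-only actions early when running on the WebDriver backend.
fn ensure_supported(backend: Backend, action: &str) -> Result<(), String> {
    let cdp_only = CDP_ONLY.iter().any(|a| *a == action);
    if backend == Backend::WebDriver && cdp_only {
        return Err(format!(
            "Action '{}' is not supported on the WebDriver backend",
            action
        ));
    }
    Ok(())
}
```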
CDP Error Propagation
CDP errors are propagated through the action chain, enabling detailed error messages for debugging failed browser operations.
Performance Considerations
Session Management
- Each browser session maintains a persistent CDP connection
- Sessions can be named and persisted for multi-session workflows
- State persistence allows resuming automation tasks
Network Idle Detection
The daemon supports waiting for network idle states:
agent-browser wait --load networkidle
This is essential for SPAs and applications with dynamic content loading.
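The source does not describe how idleness is detected. A common approach, sketched here under that caveat, is to count in-flight requests and report idle once the count has stayed at zero for a quiet period. The IdleTracker type is hypothetical, the quiet period is left to the caller, and time is passed in explicitly so the logic stays deterministic.

```rust
use std::time::Duration;

/// Hypothetical idle tracker: idle means no request has been in flight
/// for at least `quiet`. Times are offsets from the start of tracking.
struct IdleTracker {
    in_flight: u32,
    idle_since: Option<Duration>,
    quiet: Duration,
}

impl IdleTracker {
    fn new(quiet: Duration) -> Self {
        IdleTracker { in_flight: 0, idle_since: Some(Duration::ZERO), quiet }
    }

    fn request_started(&mut self) {
        self.in_flight += 1;
        self.idle_since = None;
    }

    fn request_finished(&mut self, now: Duration) {
        self.in_flight = self.in_flight.saturating_sub(1);
        if self.in_flight == 0 {
            self.idle_since = Some(now); // quiet period starts here
        }
    }

    fn is_idle(&self, now: Duration) -> bool {
        match self.idle_since {
            Some(t) => now.saturating_sub(t) >= self.quiet,
            None => false,
        }
    }
}
```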
Security Model
Credential Management
The daemon provides a secure credential vault for browser authentication:
agent-browser set credentials <user> <pass>
Cookie Management
Cookies can be set from various formats:
agent-browser cookies set --curl <file> [--domain <host>]
Auto-detects JSON, cURL, and Cookie-header file formats.
Extension Points
Custom CDP Scripts
Execute arbitrary JavaScript in the browser context:
agent-browser addscript <script>
agent-browser addinitscript <script>
Custom Styles
Inject CSS for visual testing:
agent-browser addstyle <css>
Summary
The Daemon and CDP Protocol architecture enables agent-browser to provide a performant, Rust-native browser automation solution. By implementing direct CDP communication over WebSockets, the project avoids dependencies on Node.js wrappers like Playwright or Puppeteer while maintaining full compatibility with Chrome's DevTools Protocol capabilities.
The separation of concerns between the action dispatcher, CDP client, and WebSocket stream layers ensures maintainability and enables future extensions for additional browser engines and protocol features.
Sources: cli/src/native/actions.rs
Interaction Commands
Related topics: Navigation Commands, State Inspection Commands, Element References System
Interaction Commands are the core primitives that enable AI agents to programmatically control and manipulate web pages in the agent-browser system. These commands provide atomic operations for clicking elements, entering text, scrolling, and capturing page state through an accessibility-tree based reference system.
Architecture Overview
The interaction system follows a command dispatch pattern where incoming commands are routed to appropriate handlers based on their operation type. The architecture separates concerns between command parsing, execution, and output formatting.
graph TD
A[User/Agent Input] --> B[Command Parser]
B --> C[actions.rs Dispatcher]
C --> D[interaction.rs Handlers]
D --> E[CDP Protocol Layer]
E --> F[Browser Engine]
F --> G[Page Response]
G --> H[output.rs Formatter]
H --> I[Terminal/Agent]
C -.->|click, fill, type, scroll| D
C -.->|mouse, keyboard| D
C -.->|snapshot, screenshot| D
Component Responsibilities
| Component | File | Purpose |
|---|---|---|
| Command Dispatcher | actions.rs | Routes commands to handlers |
| Interaction Handlers | interaction.rs | Executes atomic browser operations |
| Output Formatter | output.rs | Formats and presents results |
| CDP Layer | Native | Chrome DevTools Protocol communication |
Element Reference System
Interaction commands use an element reference system (@e1, @e2, etc.) to identify targets on the page. These references are obtained through snapshot operations and represent unique identifiers in the accessibility tree.
graph LR
A[Page HTML] --> B[Accessibility Tree]
B --> C[Snapshot Command]
C --> D[@e1 button "Submit"]
C --> E[@e2 input "Email"]
D --> F[Click @e1]
E --> G[Fill @e2 "text"]
Reference Format:
@e1 [tag type="value"] "text content" placeholder="hint"
│    │   │             │              │
│    │   │             │              └─ Additional attributes
│    │   │             └─ Visible text
│    │   └─ Key attributes shown
│    └─ HTML tag name
└─ Unique ref ID
Sources: skill-data/core/references/snapshot-refs.md:1-50
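An agent consuming snapshot text needs to pull the ref, tag, and visible text out of each line. The parser below is a small sketch of that, written against the reference format shown above; the SnapshotRef struct and parse_snapshot_line function are hypothetical names, and attributes outside the brackets are ignored.

```rust
/// One parsed snapshot line; field names are illustrative.
#[derive(Debug, PartialEq)]
struct SnapshotRef {
    id: String,           // e.g. "@e5"
    tag: String,          // first token inside [...]
    text: Option<String>, // quoted string directly after the brackets
}

fn parse_snapshot_line(line: &str) -> Option<SnapshotRef> {
    let (id, rest) = line.trim().split_once(' ')?;
    // A ref is "@e" followed by one or more digits.
    if !id.starts_with("@e")
        || id[2..].is_empty()
        || !id[2..].chars().all(|c| c.is_ascii_digit())
    {
        return None;
    }
    let inner_start = rest.find('[')? + 1;
    let inner_end = inner_start + rest[inner_start..].find(']')?;
    let tag = rest[inner_start..inner_end]
        .split_whitespace()
        .next()?
        .to_string();
    // Visible text, if present, is the quoted string after the brackets.
    let after = rest[inner_end + 1..].trim_start();
    let text = after
        .strip_prefix('"')
        .and_then(|s| s.split_once('"'))
        .map(|(t, _)| t.to_string());
    Some(SnapshotRef { id: id.to_string(), tag, text })
}
```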
Core Interaction Commands
Element Selection Commands
| Command | Description | Parameters |
|---|---|---|
| find | Find elements by locator | <locator> <value> [action] [text] |
| count | Count matching elements | <selector> |
| is | Check element state | <what> <selector> |
Locators supported: role, text, label, placeholder, alt, title, testid, first, last, nth
Sources: cli/src/output.rs:1-20
Mouse Commands
graph TD
A[mouse] --> B[move <x> <y>]
A --> C[down <btn>]
A --> D[up <btn>]
A --> E[wheel <dy> <dx>]
B --> F[Dispatch mousemove event]
C --> G[Dispatch mousedown event]
D --> H[Dispatch mouseup event]
E --> I[Dispatch wheel event]
| Command | Description |
|---|---|
| mouse move <x> <y> | Move cursor to coordinates |
| mouse down [btn] | Press mouse button (default: left) |
| mouse up [btn] | Release mouse button |
| mouse wheel <dy> [dx] | Scroll wheel (delta Y/X) |
Sources: cli/src/native/actions.rs:1-30
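At the protocol level, mouse input goes through CDP's Input.dispatchMouseEvent, whose type field takes mouseMoved, mousePressed, mouseReleased, or mouseWheel. The sketch below maps the four subcommands onto parameter objects for that method. The MouseCmd enum, the hand-built JSON, and the way the current pointer position is passed in are assumptions for illustration, not the project's code.

```rust
/// Mouse subcommands from the table above.
enum MouseCmd {
    Move { x: f64, y: f64 },
    Down { button: String },
    Up { button: String },
    Wheel { dy: f64, dx: f64 },
}

/// Build the params object for `Input.dispatchMouseEvent`.
/// `at` is the last known pointer position (an assumed bookkeeping detail).
fn to_cdp_params(cmd: &MouseCmd, at: (f64, f64)) -> String {
    let (px, py) = at;
    match cmd {
        MouseCmd::Move { x, y } => {
            format!(r#"{{"type":"mouseMoved","x":{},"y":{}}}"#, x, y)
        }
        MouseCmd::Down { button } => format!(
            r#"{{"type":"mousePressed","x":{},"y":{},"button":"{}","clickCount":1}}"#,
            px, py, button
        ),
        MouseCmd::Up { button } => format!(
            r#"{{"type":"mouseReleased","x":{},"y":{},"button":"{}","clickCount":1}}"#,
            px, py, button
        ),
        MouseCmd::Wheel { dy, dx } => format!(
            r#"{{"type":"mouseWheel","x":{},"y":{},"deltaX":{},"deltaY":{}}}"#,
            px, py, dx, dy
        ),
    }
}
```

A real implementation would more likely serialize typed structs than format strings by hand; string formatting is used here only to keep the sketch dependency-free.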
Keyboard Commands
| Command | Description | Example |
|---|---|---|
| type | Type text (with key events) | type @e1 "hello" |
| press | Press special key | press Enter |
| setvalue | Set input value directly | setvalue @e1 "value" |
Special Keys: Enter, Tab, Escape, Backspace, ArrowUp, ArrowDown, ArrowLeft, ArrowRight, F1-F12, Control, Alt, Shift
Sources: cli/src/native/actions.rs:1-30
Scroll Commands
| Command | Description |
|---|---|
| scroll down <px> | Scroll down by pixels |
| scroll up <px> | Scroll up by pixels |
| scroll left <px> | Scroll left by pixels |
| scroll right <px> | Scroll right by pixels |
Sources: skill-data/core/SKILL.md:1-50
State Inspection Commands
graph TD
A[get command] --> B{Property Type}
B -->|attr| C[Get attribute value]
B -->|value| D[Get input value]
B -->|text| E[Get visible text]
B -->|html| F[Get innerHTML]
B -->|title| G[Get page title]
B -->|url| H[Get current URL]
B -->|box| I[Get bounding box]
B -->|styles| J[Get computed styles]
| Command | Description |
|---|---|
| get text <ref> | Get visible text of element |
| get value <ref> | Get input field value |
| get attr <ref> <name> | Get specific attribute |
| get html <ref> | Get innerHTML |
| get title | Get page title |
| get url | Get current URL |
| get box <ref> | Get bounding box coordinates |
| get styles <ref> | Get computed CSS styles |
| get cdp-url | Get CDP debugging URL |
Sources: cli/src/output.rs:1-20
Click Variations
The click command supports several modifiers for different interaction patterns:
| Command | Description |
|---|---|
| click <ref> | Standard left-click |
| click <ref> --new-tab | Click and open in new tab |
| click <ref> --double | Double-click |
| click <ref> --right | Right-click (context menu) |
| tap <ref> | Mobile-style tap (touch events) |
Sources: skill-data/core/SKILL.md:1-50
Form Input Commands
Text Input
graph LR
A[Input Commands] --> B[type]
A --> C[fill]
A --> D[setvalue]
B --> E[Triggers keydown/keyup]
C --> F[Direct value set]
D --> G[Direct value assignment]
| Command | Description | Behavior |
|---|---|---|
| fill <ref> <text> | Fill input field | Replaces existing value, triggers input events |
| type <ref> <text> | Type text character by character | Triggers full key event sequence |
| setvalue <ref> <value> | Set value directly | Bypasses sanitization |
Sources: cli/src/native/actions.rs:1-30
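The practical difference between type and fill is the event stream the page sees. The model below is a sketch of that difference only: the DomEvent enum is hypothetical, and the assumption that type appends to the current value while fill replaces it follows the table's wording rather than confirmed behavior.

```rust
#[derive(Debug, PartialEq)]
enum DomEvent {
    KeyDown(char),
    KeyUp(char),
    Input(String), // field value after the event
}

/// `type`: a keydown, input, and keyup event for every character.
fn type_text(value: &mut String, text: &str) -> Vec<DomEvent> {
    let mut events = Vec::new();
    for c in text.chars() {
        events.push(DomEvent::KeyDown(c));
        value.push(c);
        events.push(DomEvent::Input(value.clone()));
        events.push(DomEvent::KeyUp(c));
    }
    events
}

/// `fill`: replace the value in one step and emit a single input event.
fn fill(value: &mut String, text: &str) -> Vec<DomEvent> {
    *value = text.to_string();
    vec![DomEvent::Input(value.clone())]
}
```

Validation logic that listens for key events would fire under type_text but not under fill, which is why type suits forms that validate on keystrokes.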
Other Input Types
| Command | Target | Description |
|---|---|---|
| check <ref> | Checkbox | Check a checkbox |
| uncheck <ref> | Checkbox | Uncheck a checkbox |
| select <ref> <value> | Select | Select option by value |
| upload <ref> <path> | File input | Upload file |
Sources: cli/src/native/actions.rs:1-30
Wait and Timing
Wait commands control execution timing for dynamic content:
| Command | Description |
|---|---|
| wait <ms> | Wait for milliseconds |
| wait --load | Wait for page load event |
| wait networkidle | Wait for network to be idle |
| wait --load networkidle | Combined load + network idle |
Sources: skill-data/core/SKILL.md:1-50
Command Chaining with Batches
Multiple commands can be executed in a single batch operation for efficiency:
graph TD
A[Batch Command] --> B[Parse JSON Array]
B --> C[Execute Sequentially]
C --> D[Command 1]
D --> E[Command 2]
E --> F[Command N]
F --> G[Return Combined Results]
Example batch command:
agent-browser batch \
'["open"]' \
'["network","route","*","--abort","--resource-type","script"]' \
'["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
'["navigate","http://localhost:3000/target"]'
Sources: skill-data/core/references/commands.md:1-30
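The sequential execution in the diagram can be sketched as a loop that feeds each argument list to an executor and collects one result per command. The run_batch helper is hypothetical, and it assumes a failing command does not stop the batch; the source does not say whether the real implementation bails out early.

```rust
/// Run each command in order and collect one result per command.
/// Assumption: later commands still run after a failure.
fn run_batch<F>(commands: &[Vec<String>], mut exec: F) -> Vec<Result<String, String>>
where
    F: FnMut(&[String]) -> Result<String, String>,
{
    commands
        .iter()
        .map(|argv| exec(argv.as_slice()))
        .collect()
}
```

Passing the executor as a closure keeps the ordering logic separate from how a single command is carried out, so the same loop can be exercised with a stub in tests.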
State Management
Browser State Commands
| Command | Description |
|---|---|
| is <state> <ref> | Check if element is visible, enabled, checked |
| is open | Check if browser is open |
| is closed | Check if browser is closed |
Visibility and Enabled States
graph TD
A[Check State] --> B{Element Type}
B -->|Button/Input| C[Check: enabled]
B -->|Checkbox| D[Check: checked]
B -->|Any| E[Check: visible]
C --> F[Return boolean]
D --> F
E --> F
Sources: cli/src/output.rs:1-20
Advanced Interactions
React-Specific Commands
For React applications, specialized inspection commands are available:
| Command | Description |
|---|---|
| react_tree | Get component tree |
| react_inspect <ref> | Inspect React component |
| react_renders_start | Start render tracking |
| react_renders_stop | Stop render tracking |
Sources: cli/src/native/actions.rs:1-30
Dialog Handling
graph TD
A[Dialog Appears] --> B{dialog type}
B -->|alert| C[handle_alert]
B -->|confirm| D[handle_confirm]
B -->|prompt| E[handle_prompt]
C --> F[dialog accept --message "text"]
D --> F
E --> G[dialog accept "input"]
G --> F
| Command | Description |
|---|---|
| dialog accept [message] | Accept dialog with optional message |
| dialog dismiss | Cancel/dismiss dialog |
Sources: cli/src/native/actions.rs:1-30
Common Workflow Patterns
Basic Navigation and Interaction
# 1. Open page
agent-browser open https://example.com
# 2. Take snapshot to get refs
agent-browser snapshot -i
# 3. Interact with elements
agent-browser click @e1
agent-browser fill @e2 "user@example.com"
agent-browser press Enter
# 4. Wait for response
agent-browser wait 1000
Form Submission Flow
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e_email "user@example.com"
agent-browser fill @e_password "secretpassword"
agent-browser click @e_submit
agent-browser wait --load networkidle
agent-browser screenshot result.png
Error Handling Pattern
# Check if operation succeeded
agent-browser is visible @e_success_message
# If failed, inspect state
agent-browser snapshot -i
agent-browser get text @e_error_message
Command Reference Summary
Interaction Operations Matrix
| Category | Commands |
|---|---|
| Mouse | click, mouse move/down/up/wheel, dblclick |
| Keyboard | type, press, setvalue |
| Scroll | scroll up/down/left/right |
| Forms | fill, check, uncheck, select, upload |
| Inspect | get text/value/attr/html/title/url/box/styles |
| State | find, count, is |
| Timing | wait |
Sources: cli/src/native/actions.rs:1-30, cli/src/output.rs:1-20, skill-data/core/SKILL.md:1-50
Best Practices
- Always snapshot before interacting - Element refs are obtained from snapshots and must be fetched after page load or navigation
- Re-snapshot after navigation - New pages have new accessibility trees with different refs
- Use appropriate wait conditions - Wait for networkidle when content loads dynamically
- Prefer fill over type - fill is faster and more reliable for automated workflows
- Use type for form validation - When you need key events to trigger validation logic
State Inspection Commands
Related topics: Interaction Commands, Element References System
State Inspection Commands in agent-browser provide mechanisms to examine, retrieve, and manage browser state including cookies, web storage, session data, console errors, and DOM element properties. These commands enable debugging, state verification, and persistence of browser sessions across operations.
Architecture Overview
State inspection in agent-browser operates through a layered architecture where the CLI command layer parses user input, the actions layer dispatches to appropriate handlers, and the browser backend (CDP/WebDriver) executes the actual state retrieval.
graph TD
A[CLI Input] --> B[commands.rs Parser]
B --> C[actions.rs Dispatcher]
C --> D[State Handlers]
C --> E[Storage Handlers]
C --> F[Element Handlers]
D --> G[Browser Backend<br/>Chrome CDP / WebDriver]
E --> G
F --> G
G --> H[State Output]
D -. includes .-> D1[cookies_get/set/clear]
D -. includes .-> D2[state_save/load/list/clean]
E -. includes .-> E1[storage_get/set/clear]
F -. includes .-> F1[gettext/getattr/isvisible]
Sources: cli/src/native/actions.rs:1-150
Command Categories
State inspection commands are organized into five primary categories:
| Category | Purpose | Commands |
|---|---|---|
| Cookie Inspection | Manage HTTP cookies | cookies_get, cookies_set, cookies_clear |
| Web Storage | Inspect localStorage/sessionStorage | storage_get, storage_set, storage_clear |
| Session State | Save/load browser sessions | state_save, state_load, state_list, state_clean |
| Element Properties | Query DOM element attributes | gettext, getattribute, inputvalue, isvisible, isenabled, ischecked |
| Error Inspection | Retrieve console errors | errors |
Sources: cli/src/native/actions.rs:80-100
Cookie Inspection
Cookies can be inspected and managed through the cookies command family.
Get Cookies
Retrieves all cookies for the current domain:
agent-browser cookies get
Set Cookie
Sets a cookie with explicit parameters:
agent-browser cookies set --url <url> --name <name> --value <value> [--domain <domain>] [--path <path>] [--httpOnly] [--secure] [--sameSite <strict|lax|none>] [--expires <timestamp>]
Set Cookie from File
Auto-detects and imports cookies from JSON, cURL, or Cookie-header format:
agent-browser cookies set --curl <file> [--domain <host>]
Clear Cookies
Removes all cookies:
agent-browser cookies clear
Sources: cli/src/output.rs:1-50
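The detection rules behind the file import are not given in the source. As one plausible heuristic, the sketch below classifies file content by its leading characters; the CookieFormat enum and the specific rules are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
enum CookieFormat {
    Json,
    Curl,
    CookieHeader,
}

/// Guess the format of a cookie file from its content.
/// Heuristic and hypothetical: the real detection may use other rules.
fn detect_cookie_format(content: &str) -> Option<CookieFormat> {
    let trimmed = content.trim_start();
    if trimmed.starts_with('[') || trimmed.starts_with('{') {
        Some(CookieFormat::Json)
    } else if trimmed.starts_with("curl ") {
        Some(CookieFormat::Curl)
    } else if trimmed.contains('=') {
        // e.g. "Cookie: a=1; b=2" or a bare "a=1; b=2"
        Some(CookieFormat::CookieHeader)
    } else {
        None
    }
}
```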
Web Storage Inspection
Web storage commands manage the browser's localStorage and sessionStorage.
Storage Commands
| Command | Description |
|---|---|
| storage_get | Retrieve value from localStorage or sessionStorage |
| storage_set | Set a key-value pair in storage |
| storage_clear | Clear all items from selected storage |
# Get storage value
agent-browser storage_get <local|session> <key>
# Set storage value
agent-browser storage_set <local|session> <key> <value>
# Clear storage
agent-browser storage_clear <local|session>
Sources: cli/src/native/actions.rs:85-90
Session State Management
The agent-browser maintains persistent state in ~/.agent-browser (or <tempdir>/agent-browser when home directory cannot be resolved).
State Directory Structure
graph LR
A[~/.agent-browser] --> B[sessions/]
A --> C[auth/]
A --> D[encryption.key]
B --> E[<session-id>/]
E --> F[state.json]
E --> G[screenshots/]
Sources: cli/src/native/state.rs:80-95
State Commands
| Command | Description |
|---|---|
| state_save | Save current browser state to disk |
| state_load | Restore browser state from saved file |
| state_list | List all saved states |
| state_clean | Remove states older than specified days |
| state_rename | Rename an existing state |
# Save current state
agent-browser state_save <path> [--name <name>]
# Load saved state
agent-browser state_load <path>
# List all states
agent-browser state_list
# Clean old states (default: 30 days)
agent-browser state_clean [--days <n>]
# Rename a state
agent-browser state_rename --path <path> --name <new_name>
State Directory Resolution
pub fn get_state_dir() -> PathBuf {
    if let Some(home) = dirs::home_dir() {
        home.join(".agent-browser")
    } else {
        std::env::temp_dir().join("agent-browser")
    }
}
pub fn get_sessions_dir() -> PathBuf {
    get_state_dir().join("sessions")
}
Sources: cli/src/native/state.rs:80-90
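For state_clean, the core decision is whether a saved state is older than the cutoff. The helper below is a sketch of that age check using only the standard library; is_stale is a hypothetical name, and treating a future modification time as fresh is an assumption about how clock skew should be handled.

```rust
use std::time::{Duration, SystemTime};

/// True when a state last modified at `modified` is older than `days`.
/// Hypothetical helper; the real cleanup logic is not shown in the source.
fn is_stale(modified: SystemTime, now: SystemTime, days: u64) -> bool {
    let max_age = Duration::from_secs(days * 24 * 60 * 60);
    match now.duration_since(modified) {
        Ok(age) => age > max_age,
        Err(_) => false, // modified is in the future (clock skew): keep it
    }
}
```

Taking `now` as a parameter instead of calling SystemTime::now() inside the function keeps the check deterministic and easy to test.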
Element Property Inspection
Element inspection commands retrieve properties and states of DOM elements using element references obtained from snapshots.
Get Text Content
Retrieves the visible text of an element:
agent-browser gettext @e1
Get HTML Content
Retrieves element innerHTML or innerText:
agent-browser innerhtml @e1
agent-browser innertext @e1
Get Attributes
Retrieves any attribute value from an element:
agent-browser getattribute @e1 href
agent-browser getattribute @e1 src
Get Input Value
Retrieves the current value of input elements:
agent-browser inputvalue @e1
Check Element State
Verify element state properties:
agent-browser isvisible @e1
agent-browser isenabled @e1
agent-browser ischecked @e1
Count Matching Elements
Count elements matching a selector:
agent-browser count ".item-class"
Get Bounding Box
Retrieve element dimensions and position:
agent-browser boundingbox @e1
Get Styles
Retrieve computed CSS styles:
agent-browser styles @e1
Sources: cli/src/native/actions.rs:30-60
Find Elements
The find command locates DOM elements using various locator strategies.
Supported Locators
| Locator | Description | Example |
|---|---|---|
| role | Find by ARIA role | find role button --exact |
| text | Find by text content | find text "Submit" |
| label | Find form label | find label "Email" |
| placeholder | Find by placeholder | find placeholder "Search..." |
| alt | Find by alt attribute | find alt "profile" |
| title | Find by title attribute | find title "Close" |
| testid | Find by test ID | find testid submit-btn |
| first | First element matching selector | find first ".item" |
| last | Last element matching selector | find last ".item" |
Find Command Syntax
agent-browser find <locator> <value> [action] [--exact] [--name <name>]
Examples
# Find button by role and click
agent-browser find role button --exact click
# Find input by placeholder
agent-browser find placeholder "email" fill "user@example.com"
# Find link by text
agent-browser find text "Learn more"
Sources: cli/src/commands.rs:150-200
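The positional part of the find syntax can be modeled as a locator keyword, a value, and an optional action. The sketch below parses exactly that; the Locator enum covers the nine locators in the table above, parse_find is a hypothetical helper, and flags such as --exact and --name are deliberately left out.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum Locator {
    Role, Text, Label, Placeholder, Alt, Title, TestId, First, Last,
}

impl FromStr for Locator {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "role" => Locator::Role,
            "text" => Locator::Text,
            "label" => Locator::Label,
            "placeholder" => Locator::Placeholder,
            "alt" => Locator::Alt,
            "title" => Locator::Title,
            "testid" => Locator::TestId,
            "first" => Locator::First,
            "last" => Locator::Last,
            other => return Err(format!("unknown locator '{other}'")),
        })
    }
}

/// Split `<locator> <value> [action]` into its parts (flags not handled).
fn parse_find<'a>(
    args: &[&'a str],
) -> Result<(Locator, &'a str, Option<&'a str>), String> {
    let locator: Locator = args.first().ok_or("missing locator")?.parse()?;
    let value = *args.get(1).ok_or("missing value")?;
    Ok((locator, value, args.get(2).copied()))
}
```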
Console Error Inspection
Retrieve JavaScript errors logged to the browser console.
Get Errors
agent-browser errors
Returns a list of all console errors captured during the session.
Console Monitoring
Enable or disable console message capture:
agent-browser console enable
agent-browser console disable
Snapshot-Based Inspection
Snapshots provide a hierarchical view of the page DOM with element references.
Snapshot Modes
| Flag | Description |
|---|---|
| -i | Interactive elements only (preferred) |
| -u | Include href URLs on links |
| -c | Compact mode (no empty structural nodes) |
| -d <n> | Cap depth at n levels |
| -s <selector> | Scope to CSS selector |
| --json | Machine-readable JSON output |
Snapshot Output Format
Page: Example - Log in
URL: https://example.com/login
@e1 [heading] "Log in"
@e2 [form]
@e3 [input type="email"] placeholder="Email"
@e4 [input type="password"] placeholder="Password"
@e5 [button type="submit"] "Continue"
@e6 [link] "Forgot password?"
Snapshot Workflow
graph TD
A[Open Page] --> B[Snapshot -i]
B --> C[Parse Element Refs]
C --> D[Click @e3]
D --> E[Snapshot -i]
E --> F[Find Input Fields]
F --> G[Fill @e3 "email"]
G --> H[Fill @e4 "password"]
H --> I[Click @e5]
Sources: skill-data/core/SKILL.md:1-80
Complete Command Reference
State Inspection Summary
| Command | Category | Description |
|---|---|---|
| cookies get | Cookie | List all cookies |
| cookies set --name X --value Y | Cookie | Set a cookie |
| cookies clear | Cookie | Clear all cookies |
| storage_get <type> <key> | Storage | Get storage value |
| storage_set <type> <key> <val> | Storage | Set storage value |
| storage_clear <type> | Storage | Clear storage |
| state_save <path> | Session | Save browser state |
| state_load <path> | Session | Load browser state |
| state_list | Session | List saved states |
| state_clean [days] | Session | Clean old states |
| errors | Console | Get console errors |
| gettext @eN | Element | Get element text |
| getattribute @eN <attr> | Element | Get attribute |
| isvisible @eN | Element | Check visibility |
| count <selector> | Element | Count elements |
Sources: cli/src/native/actions.rs:70-100
Usage Patterns
Inspecting Page State
# Full page inspection workflow
agent-browser open https://example.com
agent-browser snapshot -i # Get element refs
agent-browser get title # Page title
agent-browser get url # Current URL
agent-browser errors # Check for console errors
Verifying Element State
agent-browser click @e1 # Click element
agent-browser wait 500 # Wait for response
agent-browser isvisible @e2 # Verify visibility
agent-browser gettext @e3 # Get text content
Persisting Session State
agent-browser open https://app.example.com
agent-browser cookies set --name session --value abc123
agent-browser storage_set local user "john"
agent-browser state_save ./my-session # Persist state
# Later...
agent-browser state_load ./my-session # Restore state
Summary
State Inspection Commands in agent-browser provide comprehensive capabilities for examining and managing browser state:
- Cookie Management: Full CRUD operations on HTTP cookies with file import support
- Web Storage: Access to localStorage and sessionStorage
- Session Persistence: Save, load, list, and clean browser sessions
- Element Inspection: Query text, attributes, states, and styles
- Element Location: Find elements by role, text, label, placeholder, and other attributes
- Console Monitoring: Capture and retrieve JavaScript errors
These commands work together with the snapshot system to enable precise browser automation workflows with full state observability.
Sources: cli/src/native/actions.rs:1-150
Browser Engine Integration
Related topics: Daemon and CDP Protocol, Installation Guide
Note: This page requires the source files for the browser engine implementations (lightpanda.rs, discovery.rs, webdriver/mod.rs, safari.rs, ios.rs), which are not available in the current context. The following is a partial analysis based on indirect evidence.
Architecture Overview
Based on the available context, agent-browser drives the browser over the Chrome DevTools Protocol (CDP):
┌─────────────────┐     CDP/WebSocket      ┌─────────────────┐
│  agent-browser  │ ──────────────────────▶│ Chrome/Chromium │
│       CLI       │                        │     Browser     │
└─────────────────┘                        └─────────────────┘
        │
        ├── Session Management
        ├── Element Reference System (@e1, @e2, ...)
        └── Command Dispatch
Supported Browser Contexts
| Context Type | Implementation | Protocol |
|---|---|---|
| Chrome/Chromium | CDP Native | WebSocket |
| Electron | CDP Native | WebSocket |
| Remote Debugging | --remote-debugging-port | CDP |
| Safari (iOS) | WebDriver | W3C WebDriver |
Session Management
Sessions are managed through port-based connections:
// From session-tree.tsx
interface Session {
port: number;
session: string;
provider?: string;
pending?: boolean;
}
Sessions can be connected via:
agent-browser connect 9222
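To make the port-based model concrete, here is a hedged Rust mirror of the `Session` interface shown above, with a helper that picks out sessions ready to be attached to. The struct, `connectable_ports`, and `connect_command` are hypothetical names. Treating an absent `pending` flag as `false` is also an assumption.

```rust
/// Hypothetical Rust counterpart of the TypeScript `Session` interface.
/// The optional `pending` flag is modeled as a plain bool (absent = false).
#[derive(Debug, Clone, PartialEq)]
struct Session {
    port: u16,
    session: String,
    provider: Option<String>,
    pending: bool,
}

/// Ports of sessions that are not still being established.
fn connectable_ports(sessions: &[Session]) -> Vec<u16> {
    sessions
        .iter()
        .filter(|s| !s.pending)
        .map(|s| s.port)
        .collect()
}

/// Build the CLI invocation for attaching to a session by port,
/// following the `agent-browser connect <port>` form shown above.
fn connect_command(port: u16) -> String {
    format!("agent-browser connect {}", port)
}
```

Using `u16` for the port rejects out-of-range values at the type level, so no separate range check is needed.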
Command Dispatch Architecture
The CLI uses a dispatch pattern for handling browser commands:
// From cli/src/native/actions.rs (partial)
match subcmd.as_str() {
"click" => handle_click(cmd, state).await,
"fill" => handle_fill(cmd, state).await,
"snapshot" => handle_snapshot(cmd, state).await,
"screenshot" => handle_screenshot(cmd, state).await,
"get" => handle_get(cmd, state).await,
// ... additional commands
}
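The excerpt above is partial and asynchronous. As a self-contained illustration of the same dispatch pattern, the sketch below uses synchronous handlers and simplified `Command` and `State` types. Those types, the handler bodies, and the error strings are all assumptions made for the example and do not reflect the real definitions in actions.rs.

```rust
use std::collections::HashMap;

/// Hypothetical command: a subcommand name plus positional arguments.
struct Command {
    subcmd: String,
    args: Vec<String>,
}

/// Hypothetical state that handlers mutate; it only records what was done.
#[derive(Default)]
struct State {
    fields: HashMap<String, String>,
    clicks: Vec<String>,
}

fn handle_click(cmd: &Command, state: &mut State) -> Result<String, String> {
    let target = cmd.args.first().ok_or("click requires a target")?;
    state.clicks.push(target.clone());
    Ok(format!("clicked {}", target))
}

fn handle_fill(cmd: &Command, state: &mut State) -> Result<String, String> {
    match cmd.args.as_slice() {
        [target, value] => {
            state.fields.insert(target.clone(), value.clone());
            Ok(format!("filled {}", target))
        }
        _ => Err("fill requires a target and a value".to_string()),
    }
}

/// Route a command to its handler by subcommand name.
/// Unknown names produce an error instead of panicking.
fn dispatch(cmd: &Command, state: &mut State) -> Result<String, String> {
    match cmd.subcmd.as_str() {
        "click" => handle_click(cmd, state),
        "fill" => handle_fill(cmd, state),
        other => Err(format!("unknown command: {}", other)),
    }
}
```

Keeping a catch-all arm that returns an error means an unrecognized command is reported to the caller rather than silently ignored.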
Browser Engine Providers
Based on the codebase structure, agent-browser supports multiple browser engine providers:
| Provider | File Reference | Purpose |
|---|---|---|
| Lightpanda | lightpanda.rs | Lightweight browser engine |
| Safari | safari.rs | macOS/iOS Safari via WebDriver |
| iOS | ios.rs | iOS WebKit via WebDriver |
| Chrome CDP | discovery.rs | Auto-discovery of Chrome instances |
CDP Discovery Mechanism
The discovery.rs module handles automatic detection of browser instances:
- Scans for Chrome/Chromium processes
- Identifies remote debugging ports
- Matches browser version compatibility
- Establishes WebSocket connections
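The last two steps can be illustrated with the HTTP endpoint that Chromium-based browsers expose when started with `--remote-debugging-port`: `/json/version` returns a JSON object whose `webSocketDebuggerUrl` field holds the CDP WebSocket address. Whether discovery.rs uses this endpoint is an assumption, since that file was not available. The helper names below are hypothetical, and the string scan is deliberately naive.

```rust
/// URL of the version metadata endpoint for a given debugging port.
fn version_endpoint(port: u16) -> String {
    format!("http://127.0.0.1:{}/json/version", port)
}

/// Extract the string value of `key` from a flat JSON object.
/// Naive scan for illustration only: it does not handle escaped quotes
/// or nested objects. A real client should use a proper JSON parser.
fn json_string_field(body: &str, key: &str) -> Option<String> {
    let needle = format!("\"{}\"", key);
    let after_key = &body[body.find(&needle)? + needle.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?;
    let value = after_colon.trim_start().strip_prefix('"')?;
    let end = value.find('"')?;
    Some(value[..end].to_string())
}

/// Pull the CDP WebSocket URL out of a `/json/version` response body,
/// rejecting values that are not ws:// or wss:// URLs.
fn websocket_url(body: &str) -> Option<String> {
    json_string_field(body, "webSocketDebuggerUrl")
        .filter(|url| url.starts_with("ws://") || url.starts_with("wss://"))
}
```

Fetching the endpoint is left out so the sketch stays dependency-free. In practice the body would come from an HTTP GET against `version_endpoint(port)`.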
WebDriver Integration
For non-Chrome browsers, WebDriver protocols are used:
# Safari WebDriver
agent-browser set driver safari
# iOS WebDriver
agent-browser set driver ios
Session State Management
| State | Description |
|---|---|
| Active | Currently connected and responsive |
| Pending | Connection in progress |
| Closed | Session terminated |
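The three states in the table can be read as a small state machine. The sketch below encodes one plausible set of transitions: a pending session becomes active once connected, and any session becomes closed when terminated. The `Event` type and the transition rules are assumptions for illustration, since the source does not spell them out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionState {
    Pending,
    Active,
    Closed,
}

/// Hypothetical lifecycle events.
#[derive(Debug, Clone, Copy)]
enum Event {
    Connected,
    Terminated,
}

/// Compute the next state. `Closed` is terminal: no event reopens a session.
fn next_state(state: SessionState, event: Event) -> SessionState {
    match (state, event) {
        (SessionState::Closed, _) => SessionState::Closed,
        (_, Event::Terminated) => SessionState::Closed,
        (SessionState::Pending, Event::Connected) => SessionState::Active,
        (SessionState::Active, Event::Connected) => SessionState::Active,
    }
}

/// Only an active session is connected and responsive.
fn accepts_commands(state: SessionState) -> bool {
    state == SessionState::Active
}
```

Matching on the `(state, event)` pair lets the compiler check that every combination is handled.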
Command Reference for Engine Interaction
# Connect to specific port
agent-browser connect <port>
# Session operations
agent-browser session new
agent-browser session list
agent-browser session close
# Engine-specific settings
agent-browser set viewport <width> <height>
agent-browser set device <device-name>
agent-browser set geo <lat> <lng>
agent-browser set offline [on|off]
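The `set` subcommands above each take a different argument shape. As a rough sketch of how such arguments could be validated before being sent to the browser, the following parses them into a typed value. The `Setting` enum and `parse_setting` are hypothetical. Treating a bare `offline` as `on` is an assumption, because the default for the optional argument is not stated.

```rust
/// Hypothetical typed form of the arguments that follow `set`.
#[derive(Debug, PartialEq)]
enum Setting {
    Viewport { width: u32, height: u32 },
    Device(String),
    Geo { lat: f64, lng: f64 },
    Offline(bool),
}

/// Parse the words after `set` into a `Setting`, rejecting malformed input.
fn parse_setting(args: &[&str]) -> Result<Setting, String> {
    match args {
        ["viewport", w, h] => {
            let width = w.parse::<u32>().map_err(|e| format!("bad width: {}", e))?;
            let height = h.parse::<u32>().map_err(|e| format!("bad height: {}", e))?;
            Ok(Setting::Viewport { width, height })
        }
        ["device", name] => Ok(Setting::Device(name.to_string())),
        ["geo", lat, lng] => {
            let lat = lat.parse::<f64>().map_err(|e| format!("bad latitude: {}", e))?;
            let lng = lng.parse::<f64>().map_err(|e| format!("bad longitude: {}", e))?;
            // Valid geographic ranges; NaN fails both checks and is rejected.
            if !(-90.0..=90.0).contains(&lat) || !(-180.0..=180.0).contains(&lng) {
                return Err("coordinates out of range".to_string());
            }
            Ok(Setting::Geo { lat, lng })
        }
        // Assumption: omitting the argument means "on".
        ["offline"] | ["offline", "on"] => Ok(Setting::Offline(true)),
        ["offline", "off"] => Ok(Setting::Offline(false)),
        _ => Err(format!("unrecognized setting: {}", args.join(" "))),
    }
}
```

Rejecting bad input up front gives the caller a clear error message before any request reaches the browser.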
Limitations
This page cannot provide complete documentation for browser engine integration without access to:
- cli/src/native/cdp/lightpanda.rs
- cli/src/native/cdp/discovery.rs
- cli/src/native/webdriver/mod.rs
- cli/src/native/webdriver/safari.rs
- cli/src/native/webdriver/ios.rs
These files are required for accurate implementation details about:
- CDP command serialization/deserialization
- WebDriver protocol mapping
- Browser-specific quirks handling
- Session lifecycle management
Source: https://github.com/vercel-labs/agent-browser / Human Manual
Authentication and Session Persistence
This page documents the authentication workflows and session persistence mechanisms in agent-browser, covering how to handle login flows, save/restore authenticated states, manage credentials securely, and persist browser sessions across runs.
Overview
agent-browser provides multiple layers of authentication and session persistence:
- Credential Management — Store and retrieve login credentials via an encrypted auth vault
- State Persistence — Save and restore full browser state (cookies, localStorage, sessionStorage)
- Session Management — Auto-save/restore named sessions without manual file handling
- Profile Persistence — Use Chrome user data directories for full browser profile persistence
These mechanisms layer on top of the core CDP (Chrome DevTools Protocol) browser automation, which agent-browser uses directly without a Playwright or Puppeteer dependency, to serialize and deserialize authentication artifacts.
Sources: cli/src/native/actions.rs:action_dispatch (dispatch table)
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.
1. Installation risk: Chrome 147.0 crashes with "trap int3" when running in docker
- Severity: high
- Finding: Installation risk is backed by a source signal: Chrome 147.0 crashes with "trap int3" when running in docker. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1339
2. Installation risk: Detected: Trojan:Win32/Posilod.EB!cl
- Severity: high
- Finding: Installation risk is backed by a source signal: Detected: Trojan:Win32/Posilod.EB!cl. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1281
3. Configuration risk: snapshot -s <selector> produces duplicate elements when AX tree contains virtual nodes without backendDOMNodeId
- Severity: high
- Finding: Configuration risk is backed by a source signal: snapshot -s <selector> produces duplicate elements when AX tree contains virtual nodes without backendDOMNodeId. Treat it as a review item until the current version is checked.
- User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1338
4. Project risk: Feature Request: Chrome Extension-based Connection for Seamless Login State Reuse
- Severity: high
- Finding: Project risk is backed by a source signal: Feature Request: Chrome Extension-based Connection for Seamless Login State Reuse. Treat it as a review item until the current version is checked.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1319
5. Security or permission risk: Developers should check this security_permissions risk before relying on the project: Dashboard privileged POST routes should reject cross-origin requests
- Severity: high
- Finding: Developers should check this security_permissions risk before relying on the project: Dashboard privileged POST routes should reject cross-origin requests
- User impact: Developers may expose sensitive permissions or credentials: Dashboard privileged POST routes should reject cross-origin requests
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Dashboard privileged POST routes should reject cross-origin requests. Context: Source discussion did not expose a precise runtime context.
- Evidence: failure_mode_cluster:github_issue | fmev_bc39fa851aecda51d6ae79863b570093 | https://github.com/vercel-labs/agent-browser/issues/1345 | Dashboard privileged POST routes should reject cross-origin requests
6. Security or permission risk: Developers should check this security_permissions risk before relying on the project: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission
- Severity: high
- Finding: Developers should check this security_permissions risk before relying on the project: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission
- User impact: Developers may expose sensitive permissions or credentials: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: `--auto-connect` fails too quickly when Chrome asks for remote debugging permission. Context: Source discussion did not expose a precise runtime context.
- Evidence: failure_mode_cluster:github_issue | fmev_50f6336937705c962c78ed48a466eb98 | https://github.com/vercel-labs/agent-browser/issues/1365 | `--auto-connect` fails too quickly when Chrome asks for remote debugging permission
7. Security or permission risk: Support XDG Base Directory paths for agent-browser state, config, and installs
- Severity: high
- Finding: Security or permission risk is backed by a source signal: Support XDG Base Directory paths for agent-browser state, config, and installs. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/vercel-labs/agent-browser/issues/1361
8. Installation risk: Developers should check this installation risk before relying on the project: After failed close, subsequent open reports success but returns stale content from prior URL
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: After failed close, subsequent open reports success but returns stale content from prior URL
- User impact: Developers may fail before the first successful local run: After failed close, subsequent open reports success but returns stale content from prior URL
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: After failed close, subsequent open reports success but returns stale content from prior URL. Context: Observed when using node, python, linux
- Evidence: failure_mode_cluster:github_issue | fmev_fce1ca55e45e13ba327a52473c958037 | https://github.com/vercel-labs/agent-browser/issues/1367 | After failed close, subsequent open reports success but returns stale content from prior URL
9. Installation risk: Developers should check this installation risk before relying on the project: Chrome 147.0 crashes with "trap int3" when running in docker
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Chrome 147.0 crashes with "trap int3" when running in docker
- User impact: Developers may fail before the first successful local run: Chrome 147.0 crashes with "trap int3" when running in docker
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Chrome 147.0 crashes with "trap int3" when running in docker. Context: Observed when using docker, windows, linux
- Evidence: failure_mode_cluster:github_issue | fmev_de7dc45e4f45905d10cb44680cd26da5 | https://github.com/vercel-labs/agent-browser/issues/1339 | Chrome 147.0 crashes with "trap int3" when running in docker
10. Installation risk: Developers should check this installation risk before relying on the project: Detected: Trojan:Win32/Posilod.EB!cl
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Detected: Trojan:Win32/Posilod.EB!cl
- User impact: Developers may fail before the first successful local run: Detected: Trojan:Win32/Posilod.EB!cl
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Detected: Trojan:Win32/Posilod.EB!cl. Context: Observed when using windows
- Evidence: failure_mode_cluster:github_issue | fmev_11d6daa01783b3f8d6cc4984b34591d9 | https://github.com/vercel-labs/agent-browser/issues/1281 | Detected: Trojan:Win32/Posilod.EB!cl
11. Installation risk: Developers should check this installation risk before relying on the project: Feature: `network throttle` for emulating slow connections / per-URL delay
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: Feature: `network throttle` for emulating slow connections / per-URL delay
- User impact: Developers may fail before the first successful local run: Feature: `network throttle` for emulating slow connections / per-URL delay
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature: `network throttle` for emulating slow connections / per-URL delay. Context: Observed during installation or first-run setup.
- Evidence: failure_mode_cluster:github_issue | fmev_af068ec0790d0398008062aef7b5d1a5 | https://github.com/vercel-labs/agent-browser/issues/1372 | Feature: `network throttle` for emulating slow connections / per-URL delay
12. Installation risk: Developers should check this installation risk before relying on the project: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills
- Severity: medium
- Finding: Developers should check this installation risk before relying on the project: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills
- User impact: Developers may fail before the first successful local run: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills. Context: Observed when using node, playwright, windows
- Evidence: failure_mode_cluster:github_issue | fmev_1ea0ed85aeff64de383d8fa15586474d | https://github.com/vercel-labs/agent-browser/issues/1351 | High LLM turn count due to frequent `snapshot` calls when using `agent-browser` skills
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using agent-browser with real data or production workflows.
- `--cdp` eval/open silently target a secondary execution context when Chr - github / github_issue
- Feature: `network throttle` for emulating slow connections / per-URL del - github / github_issue
- Orphaned headless Chrome Helpers spin at high CPU under agent-browser-ch - github / github_issue
- snapshot -s <selector> produces duplicate elements when AX tree contains - github / github_issue
- Support XDG Base Directory paths for agent-browser state, config, and in - github / github_issue
- After failed close, subsequent open reports success but returns stale co - github / github_issue
- Chrome 147.0 crashes with "trap int3" when running in docker - github / github_issue
- `--auto-connect` fails too quickly when Chrome asks for remote debugging - github / github_issue
- High LLM turn count due to frequent `snapshot` calls when using `agent-b - github / github_issue
- Support enabling WebAuthn for passkey authentication with a virtual auth - github / github_issue
- Feature Request: Chrome Extension-based Connection for Seamless Login St - github / github_issue
- Detected: Trojan:Win32/Posilod.EB!cl - GitHub / issue
Source: Project Pack community evidence and pitfall evidence