# https://github.com/browser-use/browser-use Project Documentation

Generated: 2026-06-01 10:05:16 UTC

## Table of Contents

- [Introduction to Browser Use](#introduction)
- [Quick Start](#quickstart)
- [System Architecture](#architecture)
- [Core Components in Detail](#core-components)
- [Agent Execution Mechanism](#agent-system)
- [System Prompt Templates](#agent-prompts)
- [Message Manager](#agent-message-manager)
- [CDP Browser Control](#browser-cdp)
- [Watchdog Monitoring](#browser-watchdogs)
- [Browser Configuration and Profiles](#browser-profile)

<a id='introduction'></a>

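## Introduction to Browser Use

### Related Pages

Related topics: [Quick Start](#quickstart), [System Architecture](#architecture)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page: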

- [README.md](https://github.com/browser-use/browser-use/blob/main/README.md)
- [browser_use/__init__.py](https://github.com/browser-use/browser-use/blob/main/browser_use/__init__.py)
- [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)
- [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)
- [browser_use/mcp/server.py](https://github.com/browser-use/browser-use/blob/main/browser_use/mcp/server.py)
- [browser_use/dom/markdown_extractor.py](https://github.com/browser-use/browser-use/blob/main/browser_use/dom/markdown_extractor.py)
- [browser_use/actor/page.py](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)
- [browser_use/filesystem/file_system.py](https://github.com/browser-use/browser-use/blob/main/browser_use/filesystem/file_system.py)
</details>

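## Project Overview

Browser Use is an AI-driven browser automation framework that lets large language models (LLMs) control a real browser through natural-language instructions. It combines AI agent capabilities with browser automation, covering everything from simple page navigation to multi-step workflow automation.

### Core Features

| Feature | Description |
|------|------|
| **AI agent driven** | An LLM-based agent that decides on and executes browser actions autonomously |
| **Multi-tab management** | Manages several browser tabs at once |
| **Content extraction** | Extracts structured data from web pages |
| **Visual understanding** | Uses screenshots for visual feedback and verification |
| **File system integration** | Reads and writes files and processes documents such as PDFs |
| **MCP support** | Provides Model Context Protocol server integration |

Source: [browser_use/mcp/manifest.json:1-40]()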

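## Architecture Design

### Overall Architecture

```mermaid
graph TD
    A[User request] --> B[Agent service layer]
    B --> C[Action planner]
    C --> D[Tool execution layer]
    D --> E[Browser session]
    E --> F[CDP communication]
    F --> G[Chromium browser]

    H[DOM service] --> I[Enhanced DOM tree]
    I --> J[Markdown extraction]
    J --> K[LLM context]

    L[File system] --> M[PDF processing]
    L --> N[File reads and writes]

    G --> H
    G --> I
```

Browser Use uses a layered architecture. Its core components are:

1. **Agent service layer** (`browser_use/agent/service.py`): handles user requests and LLM interaction
2. **Action planner**: converts LLM output into executable actions
3. **Tool execution layer**: provides tools for navigation, clicking, text input, extraction, and more
4. **Browser session management**: `BrowserSession` manages the browser instance and its CDP connection

Source: [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)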

### MCP Server Architecture

```mermaid
graph LR
    A[Claude Desktop] -->|MCP| B[MCP server]
    B --> C[browser_extract_content]
    B --> D[browser_get_html]
    B --> E[browser_screenshot]
    B --> F[browser_scroll]
    B --> G[Tab management tools]
```

The MCP server provides the following core tools:

| Tool | Description |
|---------|---------|
| `browser_navigate` | Navigate to a given URL |
| `browser_extract_content` | Extract structured page content for a query |
| `browser_get_html` | Get the raw HTML (supports CSS selectors) |
| `browser_screenshot` | Take a page screenshot (supports full page) |
| `browser_scroll` | Scroll the page |
| `browser_list_tabs` | List all open tabs |
| `browser_switch_tab` | Switch tabs |
| `browser_close_tab` | Close a tab |
| `browser_go_back` | Go back to the previous page |

Source: [browser_use/mcp/server.py:1-100]()

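## Core Components in Detail

### Agent Service

The `Agent` class is the core of the project. It is responsible for:
- Managing the browser session lifecycle
- Communicating with the LLM and handling its responses
- Executing the planned sequence of actions
- Maintaining memory and state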

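```python
# Initialization example
import asyncio

from browser_use import Agent
from browser_use.browser import Browser


async def main():
    agent = Agent(
        task="Search for today's weather",
        browser=Browser(),
    )
    # `await` is only valid inside a coroutine, so the agent runs from main()
    await agent.run()

asyncio.run(main())
```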

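### DOM Service and Content Extraction

```mermaid
flowchart LR
    A[Fetch DOM via CDP] --> B[HTML serialization]
    B --> C[Enhanced DOM tree construction]
    C --> D[Markdown extraction]
    D --> E[Structured content output]
```

The `extract_clean_markdown` function provides a single entry point for content extraction: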

```python
async def extract_clean_markdown(
    browser_session: 'BrowserSession | None' = None,
    dom_service: DomService | None = None,
    target_id: str | None = None,
    extract_links: bool = False,
    extract_images: bool = False,
) -> tuple[str, dict[str, Any]]
```

Source: [browser_use/dom/markdown_extractor.py:1-80]()

### Markdown Chunking

For long content, the system supports structure-aware chunking:

```python
def split_markdown_into_chunks(
    content: str,
    max_chunk_chars: int = 15000,
    overlap_lines: int = 5,
    start_from_char: int = 0,
) -> list[MarkdownChunk]
```

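The chunking algorithm runs in three phases:
1. **Atomic block parsing**: identify headers, code blocks, tables, list items, and paragraphs
2. **Greedy assembly**: accumulate blocks until the maximum character limit is exceeded
3. **Overlap prefix construction**: add overlapping lines so that context carries over between chunks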
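
The three phases can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: `Chunk`, `parse_atomic_blocks`, and `chunk_markdown` are hypothetical names, only fenced code blocks and blank-line-separated paragraphs are treated as atomic, and a chunk is closed just before the next block would push it over the limit.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    overlap_prefix: str  # trailing lines of the previous chunk, kept for context
    content: str


def parse_atomic_blocks(markdown: str) -> list[str]:
    """Phase 1 (simplified): split on blank lines, keeping code fences whole."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_markdown(markdown: str, max_chunk_chars: int = 15000, overlap_lines: int = 5) -> list[Chunk]:
    # Phase 2: greedily accumulate blocks; close the chunk before it would overflow.
    groups: list[list[str]] = []
    current: list[str] = []
    size = 0
    for block in parse_atomic_blocks(markdown):
        if current and size + len(block) > max_chunk_chars:
            groups.append(current)
            current, size = [], 0
        current.append(block)
        size += len(block)
    if current:
        groups.append(current)

    # Phase 3: prefix each chunk with the last lines of the previous chunk.
    chunks: list[Chunk] = []
    previous = ""
    for group in groups:
        content = "\n\n".join(group)
        prefix = ""
        if previous and overlap_lines > 0:
            prefix = "\n".join(previous.splitlines()[-overlap_lines:])
        chunks.append(Chunk(overlap_prefix=prefix, content=content))
        previous = content
    return chunks
```

Because blocks are never split, a single block larger than `max_chunk_chars` becomes its own oversized chunk in this sketch; a production splitter would need a fallback for that case.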

### Page Operations

The `Page` class wraps the underlying CDP operations:

| Category | Functions |
|---------|---------|
| Navigation | `navigate`, `go_back`, `reload` |
| Element interaction | `click`, `hover`, `input` |
| Content retrieval | `get_elements_by_css_selector`, `get_basic_info` |
| Screenshots | `screenshot`, `full_page_screenshot` |
| Viewport control | `set_viewport_size`, `scroll` |

Source: [browser_use/actor/page.py:1-50]()

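## System Prompts

Browser Use provides prompt templates tuned for different model types:

| Mode | Template file | Use case |
|-----|---------|---------|
| Standard thinking | `system_prompt.md` | General tasks |
| No thinking | `system_prompt_no_thinking.md` | Quick tasks |
| Browser Use specific | `system_prompt_browser_use.md` | Fine-tuned models |
| Anthropic Flash | `system_prompt_anthropic_flash.md` | Claude Flash |

Key parts of the prompts:
- **Action reference**: the available tools and their parameter definitions
- **Error recovery strategy**: guidance for handling exceptional situations
- **Task examples**: TODO list and evaluation examples

Source: [browser_use/agent/prompts.py:1-150]()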

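## File System Integration

The `FileSystem` service supports several file operations:

| Function | Description | Supported formats |
|------|------|---------|
| PDF reading | Extracts the text content of a PDF | `.pdf` |
| File writing | Creates or overwrites a file | All formats |
| File reading | Reads an existing file | All formats |
| String replacement | Replaces content within a file | Text files |

```python
# PDF reading example
result = await file_system.read_file(
    file_name="report.pdf",
    max_chars=50000
)
# The returned structure includes fields such as pages, num_pages, and content
```

Source: [browser_use/filesystem/file_system.py:1-100]()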

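## CLI Tool

Browser Use provides a command-line interface:

```bash
# Basic usage
uvx --from 'browser-use[cli]' browser-use "task description"

# Use a specific Chrome profile
uvx --from 'browser-use[cli]' browser-use --profile "ProfileName" "task"

# Initialize from a template
browser-use init
```

### CLI 2.0 Features (0.12.3+)

- Built directly on CDP (not Playwright)
- Persistent background daemon
- ~50 ms command latency
- Compatible with CLI agents such as Claude Code and Codex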

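## Version History and Notable Changes

### Security (0.12.5)

`litellm` was removed from the core dependencies in response to a supply-chain attack:
- `pip install browser-use` no longer installs litellm
- To use `ChatLiteLLM`, install it separately: `pip install litellm`

### CLI 2.0 (0.12.3)

The CLI architecture was rebuilt to use CDP directly instead of Playwright, improving performance and reducing latency.

### CDP Connection Fix (0.11.12)

Fixed CDP connection problems with remote browsers such as Browserless.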

## Known Issues and Limitations

| Issue | Description | Status |
|------|------|------|
| Windows profile locking | `--profile` fails with a file lock while Chrome is running | [Issue #4546](https://github.com/browser-use/browser-use/issues/4546) |
| Token count display | Logs show `??? (TODO)` instead of the actual count | [Issue #4150](https://github.com/browser-use/browser-use/issues/4150) |
| Remote CDP timeouts | CDP calls to a remote browser may hang indefinitely | [Issue #4579](https://github.com/browser-use/browser-use/issues/4579) |

## Community Feature Requests

### Popular Feature Requests

1. **Human-in-the-Loop** ([Issue #221](https://github.com/browser-use/browser-use/issues/221))
   - Pause agent execution to wait for human intervention
   - Support pausing inside a tool call

2. **Simulating human browsing behavior** ([Issue #947](https://github.com/browser-use/browser-use/issues/947))
   - Avoid detection by anti-scraping systems
   - Add mouse movement, scroll delays, and similar behavior

3. **Ollama local model support** ([Issue #2605](https://github.com/browser-use/browser-use/issues/2605))
   - Support local LLMs through Ollama

## Getting Started

### Installation

```bash
pip install browser-use
```

### Basic Usage

```python
from browser_use import Agent
from browser_use.browser import Browser

async def main():
    browser = Browser()
    agent = Agent(
        task="Find the latest Python tutorials for me",
        browser=browser
    )
    result = await agent.run()
    await browser.close()
    return result

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```

### MCP Integration

Add the following to the Claude Desktop configuration:

```json
{
  "mcpServers": {
    "browser-use": {
      "command": "uvx",
      "args": ["--from", "browser-use[mcp]", "browser-use-mcp"]
    }
  }
}
```

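## Summary

Browser Use is a full-featured browser automation framework that integrates AI agent capabilities with CDP to automate web tasks efficiently and flexibly. Its main strengths are:

- **Intelligent decision-making**: autonomous planning driven by an LLM
- **Performance**: CLI 2.0 achieves command latency of about 50 ms
- **Extensibility**: MCP support enables a range of integrations
- **Rich toolset**: covers browsing, extraction, tab management, and more

The current version is 0.12.9 and the project is under active development. Community feature requests and known issues indicate that work is focused on stability and usability.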

---

<a id='quickstart'></a>

## Quick Start

### Related Pages

Related topics: [Introduction to Browser Use](#introduction)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [examples/simple.py](https://github.com/browser-use/browser-use/blob/main/examples/simple.py)
- [examples/getting_started/01_basic_search.py](https://github.com/browser-use/browser-use/blob/main/examples/getting_started/01_basic_search.py)
- [pyproject.toml](https://github.com/browser-use/browser-use/blob/main/pyproject.toml)
- [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)
- [browser_use/actor/page.py](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)
- [examples/apps/news-use/README.md](https://github.com/browser-use/browser-use/blob/main/examples/apps/news-use/README.md)
</details>

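This page helps you get started with browser-use, an AI-driven browser automation library built on the Chrome DevTools Protocol (CDP). By the end of this guide you will have configured your environment and run your first automation task.

## Requirements

### System Requirements

| Requirement | Minimum | Recommended |
|--------|----------|----------|
| Python | 3.11+ | 3.11 / 3.12 |
| Chrome/Chromium | Any version that supports CDP | Latest stable |
| Operating system | Windows 10+, macOS 10.15+, Ubuntu 20.04+ | Latest stable |

### Prerequisites

browser-use depends on the following core packages, which are installed automatically:

- `playwright`: browser driver management
- `cdp_use`: CDP wrapper
- `pydantic`: data validation
- `langchain` ecosystem: LLM integration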

> [!IMPORTANT]
> Version 0.12.5 removed `litellm` from the core dependencies for security reasons. To use the `ChatLiteLLM` wrapper, install it separately: `pip install litellm`

Source: [pyproject.toml](https://github.com/browser-use/browser-use/blob/main/pyproject.toml)

## Installation

### Basic Installation

```bash
pip install browser-use
```

### With the CLI Tool

```bash
pip install "browser-use[cli]"
```

Or use `uvx` (recommended, faster):

```bash
uvx --from "browser-use[cli]" browser-use --help
```

### Installing the Browser Driver

```bash
# Install Chromium with playwright
playwright install chromium

# Or install all browsers
playwright install
```

## Basic Usage

### Minimal Example

The following is the simplest way to use browser-use:

```python
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    llm = ChatOpenAI(model="gpt-4o")
    agent = Agent(task="Open Google and search for 'browser-use'", llm=llm)
    result = await agent.run()
    print(result)

asyncio.run(main())
```

Source: [examples/simple.py:1-15](https://github.com/browser-use/browser-use/blob/main/examples/simple.py)

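### Initialization Parameters

Core initialization parameters of the `Agent` class:

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `task` | `str` | Required | The natural-language task to complete |
| `llm` | `BaseChatModel` | Required | Language model instance |
| `browser` | `Browser` | `None` | Browser instance; created automatically when `None` |
| `max_steps` | `int` | `100` | Maximum number of steps |
| `use_vision` | `bool` | `True` | Whether to use vision (screenshot) analysis |
| `save_conversation_path` | `str` | `None` | Path for saving the conversation history |

Source: [browser_use/agent/service.py:150-200](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)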

## Core Concepts

### Architecture Overview

```mermaid
graph TD
    A[User Task] --> B[Agent]
    B --> C[LLM]
    C --> D[Action Planning]
    D --> E[Browser Session]
    E --> F[CDP Protocol]
    F --> G[Chrome/Chromium]
    G --> H[Screenshot/DOM]
    H --> I[State Evaluation]
    I --> B
```

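browser-use is built around an agent loop. The Agent receives a natural-language task, asks the LLM for an action plan, executes the browser actions, and evaluates the result, repeating until the task is complete.

Source: [browser_use/agent/service.py:1-50](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)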
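
The loop in the diagram can be illustrated with a toy version that needs neither a browser nor a model. Everything here is an assumption made for illustration: `run_agent`, `scripted_llm`, and the dictionary-based actions are hypothetical stand-ins, not browser-use APIs.

```python
from typing import Callable

Action = dict  # e.g. {"name": "navigate", "url": "..."}


def run_agent(task: str, decide: Callable[[str, dict], Action], max_steps: int = 100) -> dict:
    """Observe -> decide -> act loop; stops on a `done` action or after max_steps."""
    state = {"url": "about:blank", "history": []}
    for step in range(1, max_steps + 1):
        action = decide(task, state)            # stand-in for the LLM call
        state["history"].append(action["name"])
        if action["name"] == "done":
            return {"success": True, "steps": step, "result": action.get("text", "")}
        if action["name"] == "navigate":        # stand-in for a browser operation
            state["url"] = action["url"]
    return {"success": False, "steps": max_steps, "result": ""}


def scripted_llm(task: str, state: dict) -> Action:
    """Deterministic stand-in for the model: navigate once, then finish."""
    if state["url"] == "about:blank":
        return {"name": "navigate", "url": "https://example.com"}
    return {"name": "done", "text": f"visited {state['url']}"}
```

Calling `run_agent("open example.com", scripted_llm)` finishes in two steps, while a decision function that never returns `done` is cut off by `max_steps`, which is the role the `max_steps` parameter plays in the table above.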

### Browser Instance Management

```python
from browser_use import Browser, BrowserConfig

# Create a browser with default settings
browser = Browser()

# Or use a custom configuration
browser = Browser(config=BrowserConfig(
    headless=False,
    chrome_instance_path="/path/to/chrome"
))
```

### Page Operations

The Agent can perform the following core actions:

- `navigate`: navigate to a given URL
- `click`: click an element (by index)
- `input`: type text
- `scroll`: scroll the page
- `extract`: extract structured information from the page
- `switch_tab`: switch browser tabs
- `go_back`: go back in history
- `done`: finish the task

Source: [browser_use/actor/page.py:1-80](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)

## Complete Examples

### Basic Search Task

```python
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Initialize the LLM
    llm = ChatOpenAI(model="gpt-4o")

    # Create the Agent instance
    agent = Agent(
        task="""
        1. Open https://www.google.com
        2. Type "browser-use" into the search box
        3. Click the search button
        4. Extract the titles and links of the first 5 search results
        """,
        llm=llm
    )

    # Run the Agent
    result = await agent.run()
    print(f"Task finished: {result}")

if __name__ == "__main__":
    asyncio.run(main())
```

Source: [examples/getting_started/01_basic_search.py:1-30](https://github.com/browser-use/browser-use/blob/main/examples/getting_started/01_basic_search.py)

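### News Monitoring App

browser-use also supports building more complex automation applications:

```bash
# News monitoring example
python news_monitor.py --once  # single extraction
python news_monitor.py --interval 60  # check every 60 seconds
python news_monitor.py --debug  # debug mode
```

Features include automatic visits to news sites, extraction of the latest articles, sentiment analysis, and persistent de-duplication across restarts.

Source: [examples/apps/news-use/README.md:1-60](https://github.com/browser-use/browser-use/blob/main/examples/apps/news-use/README.md)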

## Configuration Options

### Agent Configuration

```python
from browser_use.agent.service import Agent, AgentSettings

settings = AgentSettings(
    max_steps=50,
    use_vision=True,
    step_timeout=120,  # per-step timeout in seconds
    calculate_cost=True,  # compute token cost
)

agent = Agent(
    task="your task",
    llm=llm,
    settings=settings
)
```

### Browser Configuration

```python
from browser_use.browser.session import BrowserSession

session = BrowserSession(
    headless=False,
    viewport_width=1920,
    viewport_height=1080,
)
```

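## FAQ

### The browser shows a blank page after launch

**Problem**: After launch the browser shows a blank page and the task does not run.

**Possible causes**:
1. A Chrome extension is blocking page loads
2. Network problems prevent resources from loading
3. The browser path is misconfigured

**Solution**: Try launching with `--no-sandbox`, or check the browser path configuration.

### --profile fails on Windows

**Problem**: `WinError 32`, the file is in use.

**Cause**: The user profile cannot be copied while Chrome is running.

**Solution**: Make sure Chrome is fully closed before running, or use a temporary profile.

Source: [Community Issue #4546](https://github.com/browser-use/browser-use/issues/4546)

### Unstable CDP connection

**Problem**: Calls hang indefinitely when using a remote browser such as Browserless.

**Cause**: CDP calls have no timeout mechanism.

**Solution**: Use a local browser instance, or make sure the remote CDP endpoint is stable.

Source: [Community Issue #4579](https://github.com/browser-use/browser-use/issues/4579)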
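
When a call has no timeout of its own, the caller can impose one. The sketch below uses only the standard library's `asyncio.wait_for`; `call_with_timeout`, `hanging_call`, and `quick_call` are hypothetical names, and the coroutines stand in for whatever remote operation might stall. It is a general caller-side pattern, not a browser-use feature.

```python
import asyncio


async def call_with_timeout(coro, timeout_s: float):
    """Await `coro`, raising TimeoutError if it does not finish in time."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError as exc:
        raise TimeoutError(f"call exceeded {timeout_s}s") from exc


async def hanging_call():
    """Stand-in for a remote call that never returns."""
    await asyncio.sleep(3600)


async def quick_call():
    """Stand-in for a remote call that completes promptly."""
    await asyncio.sleep(0)
    return "ok"
```

`asyncio.wait_for` cancels the wrapped coroutine when the deadline passes, so the caller regains control instead of waiting forever; whether the remote side cleans up is a separate concern.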

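## Next Steps

- Browse more examples in the [examples/](examples/) directory
- Learn about [advanced configuration](./advanced-configuration.md)
- Read the [API reference](./api-reference.md)
- Join the [community discussions](https://github.com/browser-use/browser-use/discussions)

> [!NOTE]
> The latest version is **0.12.9**, which includes improvements such as passing the session ID to judge LLM calls. See the [full changelog](https://github.com/browser-use/browser-use/releases).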

---

<a id='architecture'></a>

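## System Architecture

### Related Pages

Related topics: [Introduction to Browser Use](#introduction), [Core Components in Detail](#core-components), [CDP Browser Control](#browser-cdp)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page: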

- [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)
- [browser_use/browser/session.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/session.py)
- [browser_use/browser/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/views.py)
- [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)
- [browser_use/tools/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/views.py)
- [browser_use/actor/page.py](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)
- [browser_use/dom/markdown_extractor.py](https://github.com/browser-use/browser-use/blob/main/browser_use/dom/markdown_extractor.py)
- [browser_use/mcp/server.py](https://github.com/browser-use/browser-use/blob/main/browser_use/mcp/server.py)
</details>

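browser-use is an AI browser automation framework built on the Chrome DevTools Protocol (CDP). This page describes its core system architecture to help developers understand each component's responsibilities and how the components interact.

## Architecture Overview

browser-use uses a layered architecture. Its core components are the **Agent**, the **Browser**, **Tools**, the **DOM** (Document Object Model) layer, and the **LLM** (language model).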

```mermaid
graph TD
    subgraph "User layer"
        A[User code/API]
    end

    subgraph "Agent layer"
        B[Agent Service]
        C[Message Manager]
        D[History Manager]
        E[Loop Detector]
    end

    subgraph "LLM layer"
        F[BaseChatModel]
        G[TokenCost Service]
        H[Judge LLM]
    end

    subgraph "Browser layer"
        I[Browser]
        J[BrowserSession]
        K[Page]
        L[Element]
        M[Mouse]
    end

    subgraph "Tools layer"
        N[Tools Service]
        O[Action Models]
    end

    subgraph "DOM layer"
        P[DOM Service]
        Q[Serializer]
        R[Markdown Extractor]
    end

    A --> B
    B --> C
    B --> D
    B --> E
    B --> F
    B --> G
    B --> H
    B --> N
    N --> P
    N --> Q
    N --> R
    I --> J
    J --> K
    K --> L
    K --> M
```

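## Agent System

The Agent is the framework's core decision engine. It interprets the user's task, plans the steps, and calls tools to carry out browser automation.

### AgentService

`AgentService` is the Agent's main entry point and manages the lifecycle of the whole execution loop.

**Core responsibilities:**

| Responsibility | Description |
|------|------|
| Task execution loop | Controls the Agent's main loop, including step execution, state updates, and timeout management |
| LLM integration | Initializes the main LLM, the page extraction LLM, and the judge LLM |
| Message management | Builds the messages sent to the LLM, including the state description, screenshots, and action history |
| Loop detection | Detects whether the Agent is stuck in a repetitive action pattern |
| Cost tracking | Uses the `TokenCost` service to track LLM call costs |

**Initialization parameters (partial):**

```python
# Source: browser_use/agent/service.py
def __init__(
    self,
    task: str,
    llm: BaseChatModel,
    browser: Browser | None = None,
    use_vision: bool = True,
    use_gemini: bool = False,
    max_steps: int = 100,
    agent_kwargs: dict | None = None,
    page_extraction_llm: BaseChatModel | None = None,
    judge_llm: BaseChatModel | None = None,
    calculate_cost: bool = True,
    # ... more parameters
)
```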

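### State Management

The Agent uses `AgentState` to manage execution state, which includes:

- `n_steps`: the current step count
- `memory`: the Agent's memory/context
- `agent_created_at`: creation timestamp
- `last_result`: the result of the previous step
- `evaluation_previous_goal`: the evaluation of the previous step

```python
# Source: browser_use/agent/service.py
self.state = injected_agent_state or AgentState()
```

### Loop Detection

The Agent has a built-in loop detection mechanism that uses `LoopDetector` to avoid repeated actions:

```python
# Source: browser_use/agent/service.py
self.state.loop_detector.window_size = self.settings.loop_detection_window
```

When the Agent is found to have repeated the same action within the last N steps, re-planning is triggered.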
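
A sliding-window check of this kind can be sketched as follows. `RepeatDetector`, `max_repeats`, and the string action signatures are hypothetical choices for illustration; the real `LoopDetector` may track different signals and use a different threshold.

```python
from collections import Counter, deque


class RepeatDetector:
    """Flags an action once it appears too often within the recent window."""

    def __init__(self, window_size: int = 10, max_repeats: int = 3) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full
        self.recent: deque[str] = deque(maxlen=window_size)
        self.max_repeats = max_repeats

    def record(self, action_signature: str) -> bool:
        """Store the action and return True if it now looks like a loop."""
        self.recent.append(action_signature)
        return Counter(self.recent)[action_signature] >= self.max_repeats
```

With `window_size=4` and `max_repeats=3`, the third `"click:5"` within four steps returns `True`, which is the point at which a caller would ask the model to re-plan.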

## Browser System

The Browser system communicates with Chrome directly over CDP, with command latency of about 50 ms.

```mermaid
graph LR
    A[Agent] -->|calls| B[Browser]
    B -->|CDP| C[Chrome Browser]
    C -->|DOM| D[DOMWatchdog]
    D -->|Enhanced DOM| B
```

### The Browser Class

The `Browser` class is the top-level browser abstraction and manages multiple `BrowserSession` instances.

```python
# Source: browser_use/browser/views.py
class Browser:
    def __init__(
        self,
        make_contextual_cdp_client: bool = False,
        cdp_port: int | None = None,
        # ...
    )
```

**Ways to instantiate a browser:**

| Method | Description | Source |
|------|------|----------|
| `Browser()` | Launches a new Chrome instance | browser_use/browser/views.py |
| `Browser.from_system_chrome()` | Attaches to a Chrome instance already running on the system | browser_use/browser/views.py |
### BrowserSession

Each `BrowserSession` represents an independent browser session and can contain multiple Pages (tabs).

```python
# Source: browser_use/browser/session.py
class BrowserSession:
    def __init__(
        self,
        cdp_client: ChromeDriver,
        browser: Browser,
        # ...
    )
```

**Core functions:**

- Page navigation and history management
- CDP command execution
- DOM Watchdog integration for real-time DOM monitoring
- Cookie and storage state management

### Page and Element

```mermaid
graph TD
    A[BrowserSession] --> B[Page]
    B --> C[Element]
    B --> D[Mouse]
    C --> E[Mouse]
```

The **Page class** handles tab-level operations:

```python
# Source: browser_use/actor/page.py
class Page:
    """Page operations (tab or iframe)."""

    def __init__(
        self, browser_session: 'BrowserSession', target_id: str, session_id: str | None = None, llm: 'BaseChatModel | None' = None
    )
```

The **Element class** represents a DOM element and provides the interaction interface:

- Clicking
- Text input
- Attribute reads
- Element visibility checks
## Tools System

The Tools system defines every action the Agent can perform. It is the bridge between the Agent's decisions and browser behavior.

### Tool Types

| Tool | Function | Source |
|------|------|----------|
| `navigate` | Navigate to a given URL | browser_use/tools/views.py |
| `click` | Click an element | browser_use/tools/views.py |
| `input` | Type text | browser_use/tools/views.py |
| `scroll` | Scroll the page | browser_use/tools/views.py |
| `extract` | Extract structured data from the page | browser_use/tools/views.py |
| `screenshot` | Take a screenshot | browser_use/tools/views.py |
| `switch_tab` | Switch tabs | browser_use/tools/views.py |
| `go_back` | Go back | browser_use/tools/views.py |
| `done` | Finish the task | browser_use/tools/views.py |

### ExtractAction in Detail

The `extract` tool supports the following content extraction options:

```python
# Source: browser_use/tools/views.py
class ExtractAction(BaseModel):
    query: str
    extract_links: bool = False
    extract_images: bool = False
    start_from_char: int = 0  # chunked reading of long text
    output_schema: dict | None = None  # structured output
    already_collected: list[str] = []  # de-duplication
```

**Usage example:**

```python
# Source: browser_use/tools/views.py
# Start extracting at character 5000 and skip items already collected
extract(
    query="Extract all article titles",
    start_from_char=5000,
    already_collected=["Article 1", "Article 2"],
    output_schema={
        "type": "object",
        "properties": {
            "titles": {"type": "array", "items": {"type": "string"}}
        }
    }
)
```

### Tools Service

`ToolsService` is the core of tool execution. It is responsible for:

1. Parsing the action instructions output by the Agent
2. Calling the corresponding browser operation
3. Handling the operation's result and any exceptions
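
These three responsibilities map naturally onto a registry-and-dispatch pattern. The sketch below is a generic illustration, not the `ToolsService` implementation: `REGISTRY`, `tool`, `execute`, and the single-key action format are assumptions, and the tool bodies return strings instead of driving a browser.

```python
from typing import Any, Callable

REGISTRY: dict[str, Callable[..., Any]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@tool("navigate")
def navigate(url: str) -> str:
    return f"navigated to {url}"


@tool("click")
def click(index: int) -> str:
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"clicked element {index}"


def execute(action: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Run one action shaped like {"tool_name": {params}} and capture errors."""
    name, params = next(iter(action.items()))  # 1. parse the instruction
    handler = REGISTRY.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": handler(**params)}  # 2. call the operation
    except Exception as exc:  # 3. report the failure instead of crashing the loop
        return {"ok": False, "error": str(exc)}
```

Returning errors as data rather than raising lets the agent loop feed the failure back to the model on the next step.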

## DOM System

The DOM system converts web page content into a text description the Agent can work with.

### Markdown Extraction Flow

```mermaid
graph TD
    A[HTML DOM] --> B[DOM Service]
    B --> C[HTML Serializer]
    C --> D[Markdown Converter]
    D --> E[Content Filter]
    E --> F[Clean Markdown]
```

### Core Components

| Component | Responsibility | Source |
|------|------|----------|
| `DomService` | DOM traversal and serialization | browser_use/dom/service.py |
| `DOMTreeSerializer` | HTML to Markdown | browser_use/dom/serializer/serializer.py |
| `markdown_extractor` | Content filtering and chunking | browser_use/dom/markdown_extractor.py |

### Enhanced DOM Tree

browser-use uses an enhanced DOM tree to capture the full state of the page:

```python
# Source: browser_use/dom/markdown_extractor.py
async def _get_enhanced_dom_tree_from_browser_session(browser_session: 'BrowserSession'):
    dom_watchdog = browser_session._dom_watchdog
    # Use the cached enhanced DOM tree
    if dom_watchdog.enhanced_dom_tree is not None:
        return dom_watchdog.enhanced_dom_tree
    # Build a new enhanced DOM tree
    await dom_watchdog._build_dom_tree_without_highlights()
    return dom_watchdog.enhanced_dom_tree
```

**Characteristics:**

- Includes dynamic content (Shadow DOM and similar)
- Preserves the element hierarchy
- Supports element highlight state

### Markdown Chunking

For long page content, the system supports chunked reading:

```python
# Source: browser_use/dom/markdown_extractor.py
def split_markdown_into_chunks(
    content: str,
    max_chunk_chars: int = 15000,
    overlap_lines: int = 5,
    start_from_char: int = 0,
) -> list[MarkdownChunk]
```

**Chunking algorithm:**

1. **Phase 1**: parse atomic blocks (headers, code fences, tables, list items, paragraphs)
2. **Phase 2**: greedy assembly, accumulating blocks until `max_chunk_chars` is exceeded
3. **Phase 3**: build overlap prefixes to preserve context between chunks

## LLM Integration

### Multi-LLM Architecture

The Agent can be configured with several LLMs:

| LLM type | Purpose | Source |
|----------|------|----------|
| Main LLM | Core decisions and action planning | browser_use/agent/service.py |
| Page extraction LLM | Efficient content extraction | browser_use/agent/service.py |
| Judge LLM | Verifies action results | browser_use/agent/service.py |

### Token Cost Tracking

```python
# Source: browser_use/agent/service.py
self.token_cost_service = TokenCost(include_cost=calculate_cost, pricing_url=pricing_url)
self.token_cost_service.register_llm(llm)
self.token_cost_service.register_llm(page_extraction_llm)
self.token_cost_service.register_llm(judge_llm)
```

### Prompt Management

Prompt templates are defined in the `browser_use/agent/system_prompts/` directory:

- `system_prompt.md`: main system prompt
- `system_prompt_no_thinking.md`: prompt for no-thinking mode
- `system_prompt_anthropic_flash.md`: prompt for Anthropic Flash models

```python
# Source: browser_use/agent/prompts.py
def get_extract_data_prompt() -> str:
    """Build the system prompt for content extraction."""
    return """
You are an expert at extracting data from webpages.
...
"""
```

## MCP Integration

browser-use provides a Model Context Protocol (MCP) server that supports standardized tool calls.

### MCP Tool List

```python
# Source: browser_use/mcp/server.py
types.Tool(
    name='browser_navigate',
    description='Navigate to a specific URL',
    inputSchema={...}
)
types.Tool(
    name='browser_extract_content',
    description='Extract structured content from the current page',
    inputSchema={...}
)
types.Tool(
    name='browser_get_html',
    description='Get the raw HTML of the current page',
    inputSchema={...}
)
types.Tool(
    name='browser_screenshot',
    description='Take a screenshot of the current page',
    inputSchema={...}
)
types.Tool(
    name='browser_scroll',
    description='Scroll the page',
    inputSchema={...}
)
```

## File System Integration

The Agent can read and write local files and supports many file types:

```python
# Source: browser_use/filesystem/file_system.py
SUPPORTED_EXTENSIONS = {
    # Documents
    'pdf', 'doc', 'docx', 'txt', 'rtf', 'odt',
    # Spreadsheets
    'xls', 'xlsx', 'csv',
    # Presentations
    'ppt', 'pptx', 'odp',
    # Code
    'py', 'js', 'css', 'java', 'cpp',
    # Archives
    'zip', 'rar', '7z', 'tar', 'gz',
    # Images
    'jpg', 'jpeg', 'png', 'gif', 'svg',
}
```

**PDF handling features:**

- Automatic page-by-page reading
- Smart content truncation
- Progress hints

## Execution Flow

```mermaid
sequenceDiagram
    participant User as User
    participant Agent as AgentService
    participant LLM as Language model
    participant Tools as ToolsService
    participant Browser as Browser
    participant DOM as DOM system

    User->>Agent: Initialize task
    loop Execution loop
        Agent->>DOM: Get page state
        DOM-->>Agent: Enhanced DOM tree
        Agent->>Agent: Build state description
        Agent->>LLM: Send decision request
        LLM-->>Agent: Return action instructions
        Agent->>Tools: Parse and execute action
        Tools->>Browser: CDP command
        Browser-->>Tools: Execution result
        Tools-->>Agent: Action result
        Agent->>Agent: Update state and evaluate
    end
    Agent-->>User: Task complete
```

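## Key Configuration Parameters

| Parameter | Default | Description | Source |
|------|--------|------|----------|
| `max_steps` | 100 | Maximum number of execution steps | browser_use/agent/service.py |
| `use_vision` | True | Whether to use vision capabilities | browser_use/agent/service.py |
| `step_timeout` | 60 | Per-step timeout in seconds | browser_use/agent/service.py |
| `loop_detection_window` | 10 | Loop detection window size | browser_use/agent/service.py |

## Architectural Considerations Related to Community Issues

### CDP Connection Stability

Community feedback (Issue #4579) reports that CDP calls may hang indefinitely with remote browsers. At the architecture level, the recommendations are:

- Use the MCP server's standardized tools, which have better timeout control
- Configure appropriate connection timeouts on browser sessions

### Token Count Issue

Community feedback (Issue #4150) reports that token counts are displayed as `??? (TODO)`. The `TokenCost` service is already integrated, but it depends on the LLM returning complete usage information.

### Windows Profile Issue

The WinError 32 problem reported in Issue #4546 is caused by Chrome process file locks. The current architecture copies the Chrome profile directory during initialization, which conflicts with a running Chrome instance.

## Summary

The architecture of browser-use follows these principles:

1. **CDP first**: uses the Chrome DevTools Protocol directly for low-latency operations
2. **Modularity**: each component has a clear responsibility, which simplifies testing and extension
3. **Multi-LLM collaboration**: the main LLM makes decisions while the extraction LLM handles content processing
4. **Loop detection**: keeps the Agent from getting stuck in unproductive repeated actions
5. **Standardized interfaces**: MCP enables cross-platform tool integration

This architecture lets browser-use handle complex browser automation tasks efficiently while keeping the code maintainable and extensible.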

---

<a id='core-components'></a>

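## Core Components in Detail

### Related Pages

Related topics: [System Architecture](#architecture), [Agent Execution Mechanism](#agent-system)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page: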

- [browser_use/actor/page.py](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)
- [browser_use/tools/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/views.py)
- [browser_use/dom/markdown_extractor.py](https://github.com/browser-use/browser-use/blob/main/browser_use/dom/markdown_extractor.py)
- [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)
- [browser_use/mcp/server.py](https://github.com/browser-use/browser-use/blob/main/browser_use/mcp/server.py)
- [browser_use/filesystem/file_system.py](https://github.com/browser-use/browser-use/blob/main/browser_use/filesystem/file_system.py)
</details>

browser-use is a browser automation framework built on the Chrome DevTools Protocol (CDP), in which an AI Agent drives the browser to complete complex tasks. Its core architecture revolves around five components: **Agent**, **Page**, **Element**, **Tools**, and **DOM Service**.

## Architecture Overview

browser-use uses a layered architecture in which each component has a clear responsibility:

```mermaid
graph TD
    A[Agent layer] --> B[Tools layer]
    A --> C[Page Actor layer]
    B --> D[DOM Service]
    C --> D
    C --> E[Browser Session]
    E --> F[CDP Client]
    F --> G[Chrome Browser]
```

| Layer | Component | Responsibility |
|------|------|------|
| Agent layer | Agent | Task planning, decisions, state management |
| Tools layer | Tools Service | Provides callable tools (click, input, extract, and so on) |
| Page layer | Page Actor | Page operation abstraction and element interaction |
| Service layer | DOM Service | DOM tree construction and content serialization |
| Protocol layer | CDP Client | Chrome DevTools Protocol communication |

Source: [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)

---

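## 1. Page Actor

The Page Actor is a high-level abstraction over a browser page that wraps page-level operations and state retrieval.

### 1.1 Core Responsibilities

- Page navigation and state retrieval (URL, title, screenshot)
- Element queries and actions
- Content extraction and structured output
- Scrolling and viewport management

Source: [browser_use/actor/page.py:1-50](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)

### 1.2 Main Methods

```python
class Page:
    async def get_url(self) -> str: ...                # current URL
    async def get_title(self) -> str: ...              # page title
    async def screenshot(self) -> bytes: ...           # page screenshot
    async def get_elements_by_css_selector(self): ...  # query by CSS selector
    async def extract_content(self, prompt, structured_output): ...  # structured extraction
```

### 1.3 Structured Content Extraction

Page provides LLM-based structured content extraction:

```python
async def extract_content(
    prompt: str,                                   # what to extract
    structured_output: type[T],                   # Pydantic model definition
    llm: BaseChatModel | None = None              # optional LLM override
) -> T
```

This method calls `_extract_clean_markdown()` to obtain clean content, then uses the LLM to extract structured data that conforms to the schema. Source: [browser_use/actor/page.py:extract_content](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)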

---

## 2. Tools Service

The Tools Service is the bridge between the Agent and the browser and provides every tool the Agent can call.

### 2.1 Tool Types

| Tool | Description |
|----------|----------|
| `go_to_url` | Navigate to a given URL |
| `click_element` | Click a page element |
| `input_text` | Fill text into an input field |
| `scroll` | Scroll the page |
| `extract_content` | Extract structured content |
| `search_on_page` | Search within the page |
| `switch_tab` | Switch browser tabs |
| `get_html` | Get the page HTML |

Source: [browser_use/tools/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/views.py)

### 2.2 ExtractAction Parameters

```python
class ExtractAction(BaseModel):
    query: str                                    # description of what to extract
    extract_links: bool = False                  # whether to include links
    extract_images: bool = False                 # whether to include image URLs
    start_from_char: int = 0                      # start offset for paging through long content
    output_schema: dict | None = None             # JSON Schema defining the output structure
    already_collected: list[str] = []             # items already collected, for de-duplication
```

The `start_from_char` parameter supports paged reading of long documents. When content is chunked, the start position of the next chunk is returned in `content_stats["next_start_char"]`. Source: [browser_use/tools/views.py:ExtractAction](https://github.com/browser-use/browser-use/blob/main/browser_use/tools/views.py)
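
The paging pattern this implies is to keep calling the extractor with the offset it reported until no further offset comes back. The sketch below runs against a stand-in extractor: `fake_extract`, `read_all`, `PAGE`, and the use of `None` to signal the end are assumptions for illustration, since the source does not say how the last chunk is marked.

```python
PAGE = "".join(str(i % 10) for i in range(25))  # stand-in for a long Markdown page


def fake_extract(start_from_char: int = 0, max_chars: int = 10) -> tuple[str, dict]:
    """Hypothetical extractor: returns one slice plus stats with `next_start_char`."""
    end = min(start_from_char + max_chars, len(PAGE))
    # Assumption: None means there is nothing left to read
    stats = {"next_start_char": end if end < len(PAGE) else None}
    return PAGE[start_from_char:end], stats


def read_all() -> list[str]:
    """Request slices until the extractor stops reporting a continuation offset."""
    parts: list[str] = []
    start = 0
    while start is not None:
        text, stats = fake_extract(start_from_char=start)
        parts.append(text)
        start = stats["next_start_char"]
    return parts
```

Joining the returned parts reproduces the original content, so no characters are skipped or duplicated between calls in this simplified model, which has no overlap prefix.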

### 2.3 Tool Execution Flow

```mermaid
sequenceDiagram
    participant Agent
    participant ToolsService
    participant Page
    participant DOMService
    participant LLM

    Agent->>ToolsService: Call extract_content
    ToolsService->>Page: _extract_clean_markdown()
    Page->>DOMService: Get DOM tree
    DOMService-->>Page: Enhanced DOM Tree
    Page-->>ToolsService: Markdown content
    ToolsService->>LLM: Structured extraction
    LLM-->>ToolsService: Data conforming to the schema
    ToolsService-->>Agent: Extraction result
```

---

## 3. DOM Service

The DOM Service builds and manages the enhanced DOM tree and is the core low-level service behind content extraction.

### 3.1 Core Functions

- Fetching the page DOM structure through CDP
- Building the enhanced DOM tree (including dynamic content and Shadow DOM)
- Serializing HTML to Markdown
- Content chunking and statistics

Source: [browser_use/dom/markdown_extractor.py](https://github.com/browser-use/browser-use/blob/main/browser_use/dom/markdown_extractor.py)

### 3.2 Markdown Extraction Function

```python
async def extract_clean_markdown(
    browser_session: BrowserSession | None = None,
    dom_service: DomService | None = None,
    target_id: str | None = None,
    extract_links: bool = False,
    extract_images: bool = False,
) -> tuple[str, dict[str, Any]]
```

The return value is the tuple `(markdown_content, content_statistics)`, where `content_statistics` contains:

| Field | Description |
|----------|------|
| `method` | The extraction method used |
| `original_html_chars` | Character count of the original HTML |
| `initial_markdown_chars` | Character count after the initial conversion |
| `filtered_chars_removed` | Number of noise characters filtered out |
| `final_filtered_chars` | Character count of the final content |
| `url` | Page URL |

Source: [browser_use/dom/markdown_extractor.py:extract_clean_markdown](https://github.com/browser-use/browser-use/blob/main/browser_use/dom/markdown_extractor.py)

### 3.3 Content Chunking Algorithm

The `split_markdown_into_chunks` function implements three-phase chunking:

1. **Phase 1**: parse atomic blocks (headers, code fences, tables, list items, paragraphs)
2. **Phase 2**: greedy assembly, accumulating blocks up to the `max_chunk_chars` limit
3. **Phase 3**: build overlap prefixes for context continuity

```python
def split_markdown_into_chunks(
    content: str,
    max_chunk_chars: int = 15000,
    overlap_lines: int = 5,
    start_from_char: int = 0,
) -> list[MarkdownChunk]
```

---

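## 4. Agent

The Agent is the core of task orchestration. It maintains task state, makes decisions, and calls tools.

### 4.1 Message Management

The Agent manages conversation history and state through `AgentMessageManager`:

- `history`: the full conversation history
- `steps`: the record of executed steps
- `evaluation_previous_goal`: the evaluation of the previous step
- `memory`: memory persisted across steps
- `todo`: the current task list

Source: [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)

### 4.2 System Prompt

The system prompt contains the following key parts:

| Part | Content |
|------|------|
| `task_definition` | Task definition |
| `agent_identity` | Description of the agent's identity |
| `capabilities` | List of available capabilities |
| `action_reference` | Action reference |
| `error_recovery` | Error recovery strategy |
| `critical_reminders` | Critical reminders |

### 4.3 Building the Agent State Description

The `AgentStateDescription` class generates the state description sent to the LLM, which includes:

- **History**: completed steps and their evaluations
- **Current state**: page URL, title, and available actions
- **Memory**: important information carried across steps
- **To-do items**: current task progress
- **Screenshot information**: visual feedback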

---

## 5. MCP Server (Model Context Protocol)

The MCP Server exposes a STDIO-based tool interface that lets other AI tools (such as Claude Code) call browser-use functionality over the MCP protocol.

### 5.1 Available MCP Tools

| Tool | Parameters | Function |
|------|------------|----------|
| `browser_navigate` | `url` | Navigate to the given URL |
| `browser_click` | `index` | Click the element at the given index |
| `browser_input` | `index, text` | Type text into an input field |
| `browser_scroll` | `direction, amount` | Scroll the page |
| `browser_extract_content` | `query, extract_links` | Extract content |
| `browser_get_html` | `selector` | Get HTML |
| `browser_screenshot` | `full_page` | Take a screenshot |

Source: [browser_use/mcp/server.py](https://github.com/browser-use/browser-use/blob/main/browser_use/mcp/server.py)

---

## 6. File System Tools

browser-use provides file read/write tools that the Agent uses to generate reports or persist data.

### 6.1 Supported File Types

```python
SUPPORTED_EXTENSIONS = {
    # Document formats
    'pdf', 'doc', 'docx', 'txt', 'rtf', 'odt',
    # Data formats
    'md', 'csv', 'json', 'xml', 'yaml', 'yml',
    # Image formats
    'jpg', 'jpeg', 'png', 'gif', 'bmp', 'svg', 'webp',
    # Code formats
    'py', 'js', 'css', 'java', 'cpp',
    # Archive formats
    'zip', 'rar', '7z', 'tar', 'gz'
}
```

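### 6.2 PDF Reading Strategy

When reading a PDF file, the system prioritizes pages as follows:

1. Score pages by content density
2. Prefer pages that contain the keywords
3. Always include the first page
4. Respect the `MAX_CHARS` character limit

Source: [browser_use/filesystem/file_system.py](https://github.com/browser-use/browser-use/blob/main/browser_use/filesystem/file_system.py)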
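
A minimal sketch of such a prioritization is shown below, under stated assumptions: `select_pdf_pages` and its parameters are hypothetical, content density is approximated by word count, and keyword hits rank ahead of density. The actual scoring in `file_system.py` may differ.

```python
def select_pdf_pages(pages: list[str], keywords: list[str], max_chars: int) -> list[int]:
    """Return 0-based indices of the pages to include, in reading order."""
    if not pages:
        return []

    def score(text: str) -> tuple[int, int]:
        lowered = text.lower()
        hits = sum(lowered.count(k.lower()) for k in keywords)  # keyword matches
        density = len(text.split())                             # word count as density proxy
        return (hits, density)

    selected = [0]          # the first page is always included
    used = len(pages[0])    # a caller may still need to truncate an oversized first page
    ranked = sorted(range(1, len(pages)), key=lambda i: score(pages[i]), reverse=True)
    for i in ranked:
        if used + len(pages[i]) <= max_chars:  # stay within the character budget
            selected.append(i)
            used += len(pages[i])
    return sorted(selected)
```

Pages that do not fit are skipped rather than truncated, so a short low-ranked page can still be included after a long high-ranked one is rejected.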

---

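## 7. Common Issues and Community Feedback

### 7.1 CDP Connection Stability

With remote browsers (such as Browserless), a single CDP call may hang indefinitely. Configure an appropriate timeout when the network is unreliable. Source: [Issue #4579](https://github.com/browser-use/browser-use/issues/4579)

### 7.2 Token Count Display

In the current version the token count is displayed as the placeholder `??? (TODO)`; the feature is not yet fully implemented. Source: [Issue #4150](https://github.com/browser-use/browser-use/issues/4150)

### 7.3 Windows Profile Locking

On Windows, using the `--profile` option while Chrome is running fails with WinError 32 because of a file lock. Source: [Issue #4546](https://github.com/browser-use/browser-use/issues/4546)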

---

## 8. Quick Reference

### 8.1 Core Class Relationships

```mermaid
classDiagram
    class BrowserSession {
        +start()
        +stop()
        +new_page()
        +get_pages()
    }
    class Page {
        +get_url()
        +get_title()
        +screenshot()
        +extract_content()
    }
    class Element {
        +click()
        +fill()
        +get_attribute()
    }
    class ToolsService {
        +execute_action()
        +extract_content()
    }
    
    BrowserSession "1" --> "many" Page
    Page "1" --> "many" Element
    ToolsService --> Page
```

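### 8.2 Version Compatibility

| Version | Feature |
|---------|---------|
| 0.12.9+ | Skips screenshots of new tabs |
| 0.12.5+ | Security update; litellm moved to an optional dependency |
| 0.12.3+ | CLI 2.0, based on CDP rather than Playwright |

The latest version is **0.12.9**; see the [Release Notes](https://github.com/browser-use/browser-use/releases/tag/0.12.9) for details.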

---

<a id='agent-system'></a>

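## Agent Execution Mechanism

### Related Pages

Related topics: [System Prompt Templates](#agent-prompts), [Message Manager](#agent-message-manager)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page: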

- [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)
- [browser_use/agent/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py)
- [browser_use/agent/judge.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/judge.py)
- [browser_use/agent/variable_detector.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/variable_detector.py)
- [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)
- [browser_use/agent/message_manager/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/message_manager/service.py)
- [browser_use/agent/system_prompts/system_prompt.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt.md)
- [browser_use/agent/history.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/history.py)
</details>

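# Agent Execution Mechanism

## Overview

The Agent is the core execution unit of the browser-use framework and drives the browser to complete automation tasks. It interacts with a large language model (LLM) to turn a high-level task goal into concrete browser operations, repeating an observe-decide-act loop until the task completes or the maximum step count is reached.

The Agent's core responsibilities are:

- **State management**: maintain context, memory, and history during task execution
- **LLM interaction**: build prompts, call the language model, and parse its output
- **Action execution**: turn model decisions into concrete browser operations (click, input, scroll, and so on)
- **Error recovery**: handle failed operations and apply fallback strategies
- **Resource control**: manage token consumption, timeouts, and loop detection

Source: [browser_use/agent/service.py:1-100]()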

## Core Architecture

### Component Diagram

```mermaid
graph TD
    A[Agent] --> B[AgentState<br/>State management]
    A --> C[AgentHistoryList<br/>History]
    A --> D[AgentSettings<br/>Configuration]
    A --> E[TokenCost<br/>Cost tracking]
    
    A --> F[BrowserSession<br/>Browser session]
    A --> G[DomService<br/>DOM service]
    
    A --> H[BaseChatModel<br/>Main LLM]
    A --> I[BaseChatModel<br/>Extraction LLM]
    A --> J[BaseChatModel<br/>Judge LLM]
    
    F --> G
    G --> K[CDP Client<br/>Protocol client]
    
    H --> L[ActionModel<br/>Action parsing]
    L --> M[navigate<br/>click<br/>input<br/>scroll<br/>extract<br/>...]
    
    M --> F
```

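### Execution Flow

```mermaid
graph TD
    A[Start<br/>agent.run] --> B[Initialize state]
    B --> C[Build system prompt]
    C --> D[Get page state<br/>DOM + screenshot]
    
    D --> E{Loop detection}
    E -->|Repetition detected| F[Trigger replanning]
    E -->|Normal| G[Call main LLM]
    
    G --> H[Parse AgentOutput]
    H --> I{Parsed successfully?}
    
    I -->|No| J[Handle JSON parse error]
    J --> K[Retry or fall back]
    K --> G
    
    I -->|Yes| L{Run action list}
    
    L -->|Actions pending| M[Execute one action]
    M --> N{Action succeeded?}
    
    N -->|No| O[Error recovery]
    O --> L
    
    N -->|Yes| L
    
    L -->|Action list finished| P[Update history]
    P --> Q[Update memory state]
    Q --> D
    
    L -->|done action| R[Task complete]
    
    E -->|Loop threshold exceeded| S[Terminate early]
```

Source: [browser_use/agent/service.py:200-350]()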

## State Management

### AgentState Data Structure

AgentState holds all state information during task execution:

```python
from pydantic import BaseModel, Field


class AgentState(BaseModel):
    memory: str = ""                    # agent memory / context
    todo_list: list[str] = Field(default_factory=list)  # todo items
    task: str = ""                      # current task description
    steps: int = 0                      # number of steps executed
    action_names: list[str] = Field(default_factory=list)  # action history
    action_results: list[str] = Field(default_factory=list)  # action results
    success_steps: int = 0              # number of successful steps
    loop_detector: LoopDetector = Field(default_factory=LoopDetector)  # loop detector
```

Key state fields:

| Field | Type | Description |
|-------|------|-------------|
| `memory` | str | Context persisted across steps |
| `todo_list` | list[str] | Todo list produced by task decomposition |
| `steps` | int | Counter of steps executed so far |
| `action_names` | list[str] | Sequence of recently executed action names |
| `action_results` | list[str] | Results of recent actions |
| `success_steps` | int | Number of consecutive successful steps |
| `loop_detector` | LoopDetector | Detects repeated execution patterns |

Source: [browser_use/agent/views.py:1-100]()

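### Loop Detector

LoopDetector checks whether the Agent is stuck in a repeated execution pattern:

```python
class LoopDetector:
    window_size: int = 5    # size of the detection window
    detected_loops: int = 0 # number of loops detected so far
    
    def check_loop(self, action_sequence: list[str]) -> bool:
        """Check whether the agent is stuck in a loop."""
        # Drop the last action and check for a repeated pattern
        ...
```

When a consecutive sequence of actions repeats, the Agent triggers its replanning strategy.

Source: [browser_use/agent/variable_detector.py:1-80]()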
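
One way to detect a repeated action pattern inside a sliding window is sketched below. It is an illustration only: `has_repeating_pattern` is a hypothetical helper, and it does not reproduce the body of `check_loop`, which is elided above.

```python
def has_repeating_pattern(actions: list[str], window_size: int = 5) -> bool:
    """Return True if the recent window ends with a sub-sequence repeated back to back."""
    recent = actions[-window_size:]
    # Try every pattern length that fits twice into the window.
    for length in range(1, len(recent) // 2 + 1):
        if recent[-length:] == recent[-2 * length:-length]:
            return True
    return False


print(has_repeating_pattern(["navigate", "click", "scroll", "click", "scroll"]))
print(has_repeating_pattern(["navigate", "click", "input", "scroll", "done"]))
```

With a pattern length of 1, two identical consecutive actions (for example two scrolls) already count as a repetition, so a practical detector would likely require more repetitions or compare action arguments as well.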

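## History Management

### AgentHistoryList

AgentHistoryList keeps the complete execution history, which is used for context accumulation and cost tracking:

```python
class AgentHistoryList:
    history: list[AgentHistory]  # list of history entries
    usage: UsageMetrics | None   # API usage statistics
    
class AgentHistory(BaseModel):
    step_number: int                      # step number
    action: str                           # action executed
    args: dict                            # action arguments
    result: str                           # execution result
    success: bool                         # whether the step succeeded
    screenshot: str | None                # screenshot for this step
    state_before: str | None              # state description before execution
    tokens_used: int | None               # tokens consumed by this step
    error: str | None                     # error message, if any
```

The history supports message compaction, which reduces context length in long conversations.

Source: [browser_use/agent/history.py:1-150]()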

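## Prompt System

### Building the System Prompt

The Agent uses a structured system prompt to guide LLM behavior:

```python
def get_system_prompt(
    task: str,
    include_examples: bool = True,
    include_action_reference: bool = True,
) -> str:
    ...  # body omitted in this excerpt
```

The prompt contains the following key components:

| Component | Description |
|-----------|-------------|
| `intro` | Agent role and capabilities |
| `language_settings` | Language settings |
| `error_recovery` | Error recovery strategy |
| `critical_reminders` | Critical reminders (verification, popup handling, filter criteria) |
| `action_reference` | Reference of available actions |
| `todo_examples` | Todo examples |
| `evaluation_examples` | Action evaluation examples |
| `memory_examples` | Memory/context examples |

Source: [browser_use/agent/prompts.py:1-100]()

### Prompt Variants

The framework supports several prompt variants for different scenarios:

- **system_prompt.md**: the standard prompt, suitable for most tasks
- **system_prompt_no_thinking.md**: disables chain-of-thought prompting to reduce token consumption
- **system_prompt_anthropic_flash.md**: a prompt optimized for Anthropic Flash models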

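## LLM Interaction Layer

### Multi-Model Architecture

The Agent can be configured with several independent LLM instances:

| LLM | Purpose | Notes |
|-----|---------|-------|
| `llm` | Main decision model | `include_reasoning: bool` |
| `page_extraction_llm` | Page content extraction | Used for structured extraction by the extract action |
| `judge_llm` | Result judging | Verifies action results |
| `compaction_llm` | Message compaction | Context optimization for long conversations |

```python
class Agent:
    def __init__(
        self,
        llm: BaseChatModel,
        page_extraction_llm: BaseChatModel | None = None,
        judge_llm: BaseChatModel | None = None,
        # ... remaining parameters omitted in this excerpt
    ):
        ...
```

Source: [browser_use/agent/service.py:150-250]()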

### Parsing Model Output

The Agent parses LLM output with a Pydantic model:

```python
class AgentOutput(BaseModel):
    action_names: list[str]          # actions to execute
    action_args: dict[str, Any]      # action arguments
    evaluation_previous_goal: str    # evaluation of the previous goal
    memory: str                      # updated memory
    done: bool                       # whether the task is complete
    result: str                      # final result (when done is true)
```

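## Action Execution Model

### Available Actions

The Agent supports the following core browser actions:

| Action | Parameters | Description |
|--------|------------|-------------|
| `navigate` | `url: str` | Navigate to the given URL |
| `click` | `index: int` | Click the element at the given index |
| `input` | `index: int, text: str, submit: bool` | Type text into an input field |
| `scroll` | `direction: up/down, amount: int` | Scroll the page |
| `wait` | `seconds: int` | Wait for the given number of seconds |
| `extract` | `query: str, extract_links: bool` | Extract structured information from the page |
| `screenshot` | `full_page: bool` | Take a screenshot of the page |
| `switch_tab` | `page_id: str` | Switch browser tabs |
| `go_back` / `go_forward` | - | Navigate browser history |
| `done` | `result: str` | Mark the task as complete |

Source: [browser_use/agent/system_prompts/system_prompt.md:1-100]()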

### Action Executor

Actions are executed through ToolsService:

```mermaid
sequenceDiagram
    participant Agent
    participant ToolsService
    participant BrowserSession
    participant CDP
    
    Agent->>ToolsService: Execute action(action_name, args)
    ToolsService->>BrowserSession: Call the matching method
    BrowserSession->>CDP: Send protocol command
    CDP-->>BrowserSession: Return result
    BrowserSession-->>ToolsService: Return execution result
    ToolsService-->>Agent: ActionResult
```

## Judging Mechanism

### Judge LLM

The Judge is a separate LLM instance used to verify action results:

```python
class JudgeService:
    def __init__(self, judge_llm: BaseChatModel):
        self.llm = judge_llm
    
    async def judge_step(
        self,
        step_result: str,
        goal: str,
        state_before: str,
        state_after: str,
    ) -> JudgeResult:
        """Judge whether the current step succeeded."""
```

A judge result contains:
- `success: bool`: whether the step succeeded
- `reasoning: str`: the rationale for the verdict
- `should_stop: bool`: whether the task should be terminated

Source: [browser_use/agent/judge.py:1-120]()

## Timeouts and Resource Control

### Step Timeout

Each action step has its own timeout:

```python
class AgentSettings:
    step_timeout: int = 120  # per-step timeout in seconds
    
    @property
    def llm_timeout(self) -> int:
        """LLM call timeout = step_timeout * 0.8"""
        return int(self.step_timeout * 0.8)
```
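
To show how a per-step timeout and a shorter LLM timeout can nest, here is a minimal sketch using `asyncio.wait_for`. The helper `run_step_with_timeouts` and its callback parameters are hypothetical; only the 0.8 ratio is taken from the settings above.

```python
import asyncio


async def run_step_with_timeouts(llm_call, act, step_timeout: float = 120.0):
    """Run one step: the LLM call gets 80% of the step budget, the whole step gets 100%."""
    llm_timeout = step_timeout * 0.8

    async def step():
        # The inner timeout bounds only the model call.
        decision = await asyncio.wait_for(llm_call(), timeout=llm_timeout)
        return await act(decision)

    # The outer timeout bounds the model call plus the action.
    return await asyncio.wait_for(step(), timeout=step_timeout)
```

Keeping the LLM budget below the step budget leaves time for the action itself to run before the outer timeout fires.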

### Token Cost Tracking

The TokenCost service tracks token consumption for all LLM calls:

```python
class TokenCost:
    def __init__(self, include_cost: bool, pricing_url: str | None):
        self.include_cost = include_cost
        self.pricing_url = pricing_url
        
    def register_llm(self, llm: BaseChatModel):
        """Register an LLM instance for cost tracking."""
        
    def calculate_cost(self, model: str, tokens: dict) -> float:
        """Calculate the token cost."""
```

Source: [browser_use/agent/service.py:250-350]()
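
The arithmetic behind a cost calculation can be sketched as follows. `estimate_cost`, the usage keys, and the prices in the example are all assumptions made for illustration; they are not the `TokenCost` API or real pricing data.

```python
def estimate_cost(usage: dict[str, int], price_per_million: dict[str, float]) -> float:
    """Return the cost in currency units for one call, given per-million-token prices."""
    total = 0.0
    for kind in ("prompt_tokens", "completion_tokens"):
        # Missing entries count as zero tokens or a zero price.
        total += usage.get(kind, 0) * price_per_million.get(kind, 0.0)
    return total / 1_000_000


# Hypothetical prices: 3.0 per million prompt tokens, 15.0 per million completion tokens.
cost = estimate_cost(
    {"prompt_tokens": 2000, "completion_tokens": 500},
    {"prompt_tokens": 3.0, "completion_tokens": 15.0},
)
print(f"{cost:.4f}")
```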

## Configuration Parameters

### Full AgentSettings Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `max_steps` | int | 100 | Maximum number of steps |
| `step_timeout` | int | 120 | Per-step timeout (seconds) |
| `max_clickable_elements_length` | int | 10 | Limit on clickable elements |
| `include_reasoning` | bool | True | Whether to include the reasoning process |
| `include_visual_feedback` | bool | True | Whether to include visual feedback |
| `use_vision` | bool | True | Whether to use a vision model |
| `enable_planning` | bool | False | Whether to enable planning mode |
| `loop_detection_enabled` | bool | True | Whether to enable loop detection |
| `loop_detection_window` | int | 5 | Loop detection window size |
| `message_compaction` | MessageCompaction | None | Message compaction configuration |

Source: [browser_use/agent/service.py:80-150]()

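## Execution Entry Point

### The agent.run() Method

```python
async def run(
    self,
    task: str,
    max_steps: int | None = None,
    on_step: Callable[[AgentHistory], None] | None = None,
) -> AgentHistoryList:
    """Run the task asynchronously.
    
    Args:
        task: task description
        max_steps: override for the maximum number of steps
        on_step: callback invoked after each step
        
    Returns:
        AgentHistoryList: the complete execution history
    """
```

Execution overview:

1. **Initialization**: set the task, update state, register signal handlers
2. **Main loop**: keep executing steps until the task completes or a limit is reached
3. **Wrap-up**: produce the final result and release resources

```python
# Pseudocode for the core loop
while self.state.steps < max_steps:
    # 1. Get the current page state
    state = await self.get_state()
    
    # 2. Ask the LLM for a decision
    output = await self.get_model_output(state)
    
    # 3. Evaluate the previous step
    if use_judge:
        judgment = await self.judge_step(...)
    
    # 4. Execute the action list
    for action_name in output.action_names:
        result = await self.execute_action(action_name, output.action_args)
    
    # 5. Update history and state
    self.history.add(step)
    self.state.update(output)
    
    # 6. Check for completion
    if output.done:
        break
```

Source: [browser_use/agent/service.py:300-500]()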

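## Common Issues and Limitations

### Known Issues Reported by the Community

| Issue | Description | Impact |
|-------|-------------|--------|
| #4150 | Token count shows `??? (TODO)` | Inaccurate debugging information |
| #4579 | CDP connection timeouts with remote browsers | Tasks may hang indefinitely |
| #4798 | No agent.pause() feature | Human intervention is not possible |

### Usage Recommendations

1. **Timeouts**: set a longer `step_timeout` for remote browsers
2. **Loop detection**: adjust `loop_detection_window` for complex tasks
3. **Vision mode**: set `use_vision=False` in low-bandwidth environments to reduce data transfer
4. **Cost control**: enable `include_cost=True` to track API consumption

## Related Resources

- Source repository: [browser-use/browser-use](https://github.com/browser-use/browser-use)
- Latest version: 0.12.9
- CLI documentation: [Browser Use CLI 2.0](https://github.com/browser-use/browser-use/releases/tag/0.12.3)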

---

<a id='agent-prompts'></a>

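## System Prompt Templates

### Related Pages

Related topics: [Agent Execution Mechanism](#agent-system)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page: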

- [browser_use/agent/system_prompts/system_prompt.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt.md)
- [browser_use/agent/system_prompts/system_prompt_no_thinking.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt_no_thinking.md)
- [browser_use/agent/system_prompts/system_prompt_anthropic_flash.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt_anthropic_flash.md)
- [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)
- [browser_use/agent/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py)
- [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)
</details>

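# System Prompt Templates

## Overview

System prompt templates are the core component that guides AI Agent behavior in the browser-use framework. These Markdown template files define the Agent's role, capability boundaries, action conventions, and error recovery strategy for browser automation tasks. Their modular design allows provider-specific optimizations for different model providers (such as OpenAI, Anthropic, and Google).

## Template Architecture

### Template Files

browser-use uses several templates and picks the system prompt that fits the scenario:

| Template file | Purpose | When it applies |
|---------------|---------|-----------------|
| `system_prompt.md` | Standard main template | Default; general browser automation tasks |
| `system_prompt_no_thinking.md` | No-thinking template | Models with chain-of-thought disabled; reduces token consumption |
| `system_prompt_anthropic_flash.md` | Anthropic Flash template | Optimized for Claude Flash series models |

### Template Loading Flow

```mermaid
graph TD
    A[Agent initialization] --> B{Model type check}
    B -->|Anthropic Flash| C[Load system_prompt_anthropic_flash.md]
    B -->|Thinking disabled| D[Load system_prompt_no_thinking.md]
    B -->|Standard mode| E[Load system_prompt.md]
    C --> F[Inject template content]
    D --> F
    E --> F
    F --> G[Used by the Agent at runtime]
```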
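
The branching in the diagram amounts to a small selection function. The sketch below assumes hypothetical inputs (`provider`, `flash_mode`, `use_thinking`); how the real Agent derives these conditions is not shown in this document.

```python
def select_template(provider: str, flash_mode: bool, use_thinking: bool) -> str:
    """Return the template file name for a given model configuration."""
    # The provider-specific flash variant takes precedence.
    if provider == "anthropic" and flash_mode:
        return "system_prompt_anthropic_flash.md"
    # Otherwise fall back to the no-thinking variant when thinking is disabled.
    if not use_thinking:
        return "system_prompt_no_thinking.md"
    return "system_prompt.md"


print(select_template("anthropic", flash_mode=True, use_thinking=True))
print(select_template("openai", flash_mode=False, use_thinking=False))
print(select_template("openai", flash_mode=False, use_thinking=True))
```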

## Core Template Components

### 1. Action Reference

Every template defines the standard action set so the Agent knows which browser operations are available:

```markdown
Common actions you can use:
- navigate: Go to a specific URL
- click: Click on an element by index
- input: Type text into an input field
- scroll: Scroll the page up or down
- wait: Wait for the page to load
- extract: Extract structured information from the page
- screenshot: Take a screenshot for visual verification
- switch_tab: Switch between browser tabs
- go_back: Navigate back in browser history
- done: Complete the task and report results
```

Source: [browser_use/agent/system_prompts/system_prompt_anthropic_flash.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt_anthropic_flash.md)

### 2. Error Recovery

The templates define a standardized error handling procedure:

```markdown
<error_recovery>
When encountering errors or unexpected states:
1. First, verify the current state using screenshot as ground truth
2. Check if a popup, modal, or overlay is blocking interaction
3. If an element is not found, scroll to reveal more content
4. If an action fails repeatedly (2-3 times), try an alternative approach
5. If blocked by login/403, consider alternative sites or search engines
6. If the page structure is different than expected, re-analyze and adapt
7. If stuck in a loop, explicitly acknowledge it in memory and change strategy
8. If max_steps is approaching, prioritize completing the most important parts
</error_recovery>
```

Source: [browser_use/agent/system_prompts/system_prompt.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt.md)

### 3. Todo Examples

The templates include structured task examples that show the Agent how to organize complex tasks:

```json
"write_file": {
  "file_name": "todo.md",
  "content": "# ArXiv CS.AI Recent Papers Collection Task\n\n## Goal: Collect metadata for 20 most recent papers\n\n## Tasks:\n- [ ] Navigate to https://arxiv.org/list/cs.AI/recent\n- [ ] Initialize papers.md file for storing paper data\n- [ ] Collect paper 1/20: The Automated LLM Speedrunning Benchmark\n- [x] Collect paper 2/20: AI Model Passport\n- [ ] Continue collecting remaining papers from current page\n..."
}
```

Source: [browser_use/agent/system_prompts/system_prompt.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt.md)

### 4. Evaluation Examples

Positive and negative evaluation examples show the Agent what counts as a completed step:

| Type | Example |
|------|---------|
| Positive | `"evaluation_previous_goal": "Successfully navigated to the product page and found the target information. Verdict: Success"` |
| Positive | `"evaluation_previous_goal": "Clicked the login button and user authentication form appeared. Verdict: Success"` |
| Negative | `"evaluation_previous_goal": "Failed to input text into the search bar as I cannot see it in the image. Verdict: Failure"` |
| Negative | `"evaluation_previous_goal": "Clicked the submit button with index 15 but the form was not submitted successfully. Verdict: Failure"` |

Source: [browser_use/agent/system_prompts/system_prompt.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt.md)

### 5. Memory

The templates define the format of the Agent's context memory. Each line below is a separate example value:

```json
"memory": "Visited 2 of 5 target websites. Collected pricing data from Amazon ($39.99) and eBay ($42.00). Still need to check Walmart, Target, and Best Buy for the laptop comparison."
"memory": "Search returned results but no filter applied yet. User wants items under $50 with 4+ stars. Will apply price filter first, then rating filter."
"memory": "Popup appeared blocking the page. Need to close it first before continuing with search."
```

Source: [browser_use/agent/system_prompts/system_prompt_no_thinking.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt_no_thinking.md)

## Prompt Builder Functions

### The prompts.py Module

`browser_use/agent/prompts.py` contains functions that build prompts dynamically at runtime:

#### 1. Content Extraction Prompt

```python
def get_extraction_prompt() -> str:
    """Build the prompt used for extracting content from a webpage."""
    return """
You are an expert at extracting data from webpages.

<input>
You will be given:
1. A query describing what to extract
2. The markdown of the webpage (filtered to remove noise)
3. Optionally, a screenshot of the current page state
</input>

<instructions>
- Extract information from the webpage that is relevant to the query
- ONLY use the information available in the webpage - do not make up information
- If information is not available, mention that clearly
- If the query asks for all items, list all of them
</instructions>
"""
```

Source: [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)

#### 2. AI Step User Prompt

```python
def get_ai_step_user_prompt(query: str, stats_summary: str, content: str) -> str:
    """
    Build user prompt for AI step action.

    Args:
        query: What to extract or analyze
        stats_summary: Content statistics summary
        content: Page markdown content

    Returns:
        Formatted prompt string
    """
    return f'<query>\n{query}\n</query>\n\n<content_stats>\n{stats_summary}\n</content_stats>\n\n<webpage_content>\n{content}\n</webpage_content>'
```

Source: [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)

## Template Selection

### Mapping Model Configuration to Templates

The Agent automatically selects the system prompt template that matches the model configuration:

```mermaid
graph LR
    A[AgentSettings] --> B[llm model configuration]
    B --> C{Provider detection}
    C -->|Anthropic + Flash| D[system_prompt_anthropic_flash.md]
    C -->|Other + no_think=True| E[system_prompt_no_thinking.md]
    C -->|Standard configuration| F[system_prompt.md]
```

### Configuration Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `use_vision` | Whether to use vision (screenshot analysis) | `True` |
| `model_name` | Model name | Depends on the actual configuration |
| `system_prompt` | Custom system prompt override | `None` |

Source: [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)

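## State Description Generation

### Read-State Components

The `<read_state>` section of the system prompt template defines how the Agent perceives page state:

| Component | Description |
|-----------|-------------|
| `objective` | Current task objective |
| `past_steps` | Previously executed steps |
| `memory` | Context remembered by the Agent |
| `evaluation_previous_goal` | Evaluation of the previous step's result |
| `content_extract` | Summary of page content |
| `screenshot` | Visual screenshot |
| `active_tab_index` | Index of the currently active tab |

### Page-Specific Actions

Templates support dynamic injection of page-specific actions: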

```markdown
<page_specific_actions>
click(42)
input(36, "search query")
scroll(0.5)
</page_specific_actions>
```

Source: [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)

## Key Prompt Components in Detail

### Critical Reminders

```markdown
<critical_reminders>
1. ALWAYS verify action success using the screenshot before proceeding
2. ALWAYS handle popups/modals/cookie banners before other actions
3. ALWAYS apply filters when user specifies criteria (price, rating, etc.)
</critical_reminders>
```

Source: [browser_use/agent/system_prompts/system_prompt_anthropic_flash.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt_anthropic_flash.md)

### Good Output Patterns

The templates stress that the example patterns are a reference, not something to copy verbatim:

```markdown
good output patterns. Use them as reference but never copy them directly.
```

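This ensures the Agent produces original output while keeping its format consistent.

## Template Customization Guide

### Custom System Prompt

Users can override the default template through AgentSettings:

```python
from browser_use import Agent, AgentSettings

custom_prompt = """
You are a specialized researcher for academic papers.
Focus on extracting citations, authors, and abstracts.
"""

# Note: the Agent constructor documented in this guide also takes an `llm`
# argument, omitted here for brevity. Verify the parameter names against the
# browser-use version you have installed.
agent = Agent(
    task="Find recent papers on machine learning",
    settings=AgentSettings(
        system_prompt=custom_prompt
    )
)
```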

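### Template Variable Injection

The following parts of a template accept dynamic variables:

| Variable | Injected | Content |
|----------|----------|---------|
| `{{todo_examples}}` | At initialization | Todo task examples |
| `{{evaluation_examples}}` | At initialization | Evaluation examples |
| `{{memory_examples}}` | At initialization | Memory examples |
| `{objective}` | At each step | Current objective |
| `{past_steps}` | At each step | Previous steps |
| `{screenshot}` | At each step | Screenshot data |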
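
The two injection times can be pictured as two rendering passes: double-brace variables are filled once, single-brace variables on every step. The helpers below (`render_static`, `render_step`) are hypothetical and use plain string replacement; the real template engine is not described in this document.

```python
def render_static(template: str, examples: dict[str, str]) -> str:
    """Initialization pass: fill {{name}} placeholders once."""
    for name, value in examples.items():
        template = template.replace("{{" + name + "}}", value)
    return template


def render_step(template: str, **runtime: str) -> str:
    """Per-step pass: fill {name} placeholders with the current values."""
    for name, value in runtime.items():
        template = template.replace("{" + name + "}", value)
    return template


template = "Examples:\n{{todo_examples}}\nGoal: {objective}"
prepared = render_static(template, {"todo_examples": "- [ ] Open the site"})
print(render_step(prepared, objective="Find recent papers"))
```

Running the static pass first matters in this sketch: `{{todo_examples}}` also contains the text `{todo_examples}`, so a per-step variable with the same name would otherwise be substituted inside it.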

## Related Community Issues

### Token Count Display

Community feedback reports that the token count display still shows a TODO placeholder:

> **Bug**: Token count display permanently broken — shows `??? (TODO)` in all verbose logs

Source: [browser-use/issues/4150](https://github.com/browser-use/browser-use/issues/4150)

### Ollama Model Support

The community has asked for better Ollama model compatibility in the system prompts:

> **Issue #2605**: Gpt-OSS model support (Ollama) - JSON parsing errors

Source: [browser-use/issues/2605](https://github.com/browser-use/browser-use/issues/2605)

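## Best Practices

1. **Template selection**: choose the template that matches the model; the dedicated template gives better results with Anthropic Flash models
2. **Avoid hard-coding**: keep system prompt templates general and avoid relying on the structure of specific websites
3. **Error handling**: work through the template's error recovery steps in order
4. **Visual verification**: always confirm action results with a screenshot
5. **Memory upkeep**: update the memory field promptly to keep context in long tasks

## File Locations

All system prompt template files are located in:

```
browser_use/agent/system_prompts/
├── system_prompt.md                    # standard main template
├── system_prompt_no_thinking.md         # no-thinking mode
├── system_prompt_anthropic_flash.md     # Anthropic Flash optimized
└── (other provider-specific templates)
```
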
---

<a id='agent-message-manager'></a>

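## Message Manager

### Related Pages

Related topics: [Agent Execution Mechanism](#agent-system)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page: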

- [browser_use/agent/message_manager/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/message_manager/service.py)
- [browser_use/agent/message_manager/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/message_manager/views.py)
- [browser_use/agent/message_manager/utils.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/message_manager/utils.py)
- [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)
- [browser_use/agent/prompts.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/prompts.py)
- [browser_use/agent/system_prompts/system_prompt.md](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/system_prompts/system_prompt.md)
</details>

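# Message Manager

The Message Manager is the core component of browser-use that manages and processes the messages exchanged between the Agent and the LLM. The module lives in `browser_use/agent/message_manager/` and is the bridge that connects the user's task, the browser state, and AI reasoning within the Agent architecture.

## Architecture Overview

The Message Manager uses a layered design that separates message definitions, conversion logic, and business services into different submodules. This makes the message processing flow more modular and easier to maintain and extend.

Its core responsibilities are building the message format sent to the LLM, managing conversation history state, handling content statistics, and supplying tool-call examples and context.

## Core Modules

### service.py - Message Service Layer

service.py is the core service layer of the Message Manager and implements message construction and conversion. It interacts directly with the main Agent service and provides formatted input for LLM calls.

The service layer handles content from several data sources, including browser state, screenshot information, and serialized DOM trees. This raw data is cleaned, formatted, and assembled into a message structure that meets the LLM's input format.

According to community feedback, this module has a known issue: the token count display does not currently work. In log output the token count of a message appears as the placeholder `??? (TODO)` because the relevant code was commented out and replaced with a TODO marker. Source: [Community issue #4150](https://github.com/browser-use/browser-use/issues/4150)

### views.py - Message View Definitions

views.py defines the data structures and view models used by the Message Manager. These view classes wrap message metadata, content, and state, and give the upper service layer a uniform data interface.

A message view typically contains these key fields: a content type identifier, the content body, a source identifier, a timestamp, and metadata associated with the message. These fields let a message carry enough context through processing.

### utils.py - Utility Functions

utils.py provides helper functions for converting, validating, and formatting messages. They are reusable building blocks of the message processing flow.

## Message Flow

The flow of messages through the Message Manager during Agent execution is as follows: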

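```mermaid
graph TD
    A[User task input] --> B[Agent main service]
    B --> C[Message Manager service]
    C --> D[Build system prompt]
    C --> E[Serialize browser state]
    C --> F[Attach screenshot information]
    D --> G[Format message list]
    E --> G
    F --> G
    G --> H[LLM inference call]
    H --> I[Receive LLM response]
    I --> J[Handle tool calls]
    J --> K[Execute browser operations]
    K --> L[Update browser state]
    L --> C
```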

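## System Prompt Integration

The Message Manager is tightly integrated with the system prompt templates and loads predefined prompt files to guide LLM behavior. These templates define the Agent's role, capability boundaries, and error handling strategy.

### Prompt Template Types

Based on code analysis, the system prompt templates come in several versions to suit different model capabilities:

| Template file | Purpose | When it applies |
|---------------|---------|-----------------|
| system_prompt.md | Standard system prompt | General use |
| system_prompt_anthropic_flash.md | Anthropic Flash optimized version | Claude Flash models |
| system_prompt_no_thinking.md | Version without a thinking process | Fast-response needs |

Source: [browser_use/agent/system_prompts/](https://github.com/browser-use/browser-use/tree/main/browser_use/agent/system_prompts)

### Dynamic Content Injection

At runtime the Message Manager injects the following content into the system prompt:

The state description contains details of the current browser window, such as the URL, page title, and list of interactive elements. This helps the LLM understand the context of the current execution environment.

The memory module records intermediate state and completed steps during task execution. It lets the Agent track overall progress on complex tasks and avoid repeating operations or skipping key steps.

The evaluation module provides a judgment of the previous step's result. The Agent analyzes the output of the previous operation to decide whether to adjust its strategy or continue with the plan.

## Content Extraction and Statistics

The Message Manager manages statistics about content extracted from web pages. These statistics tell the LLM how much context is available so it can make sound decisions.

### Content Statistics Metrics

The key metrics tracked by the Message Manager are:

| Metric | Description | Data source |
|--------|-------------|-------------|
| Original HTML characters | Initial size after DOM serialization | HTMLSerializer |
| Initial Markdown characters | Text length after the first conversion | markdownify library |
| Characters removed by filtering | Reduction after noise content is cleaned | Custom filter rules |
| Final characters | Length actually sent to the LLM | Final processing result |
| Truncation indicator | Whether the content was truncated | Length threshold check |

Source: [browser_use/dom/markdown_extractor.py](https://github.com/browser-use/browser-use/blob/main/browser_use/dom/markdown_extractor.py)

### Markdown Chunking

For long pages the Message Manager supports chunked processing. When a page's Markdown content exceeds the preset character limit, the system automatically splits it into multiple chunks and gives each chunk an overlapping prefix to keep context coherent.

The chunking algorithm has three stages: atomic block parsing, greedy assembly, and overlap prefix construction. This design ensures that splits respect semantic boundaries (such as headings, code blocks, and tables).

Source: [browser_use/dom/markdown_extractor.py:100-150](https://github.com/browser-use/browser-use/blob/main/browser_use/dom/markdown_extractor.py)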

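## Tool Call Support

The Message Manager formats tool-call examples and schema information to help the LLM understand the available browser operations.

### Action Type Mapping

The system prompt defines the core set of actions available to the Agent:

| Action | Function | Typical parameters |
|--------|----------|--------------------|
| navigate | Navigate to the given URL | url |
| click | Click a page element | element_index |
| input | Write text into an input field | text, element_index |
| scroll | Scroll the page | direction, amount |
| wait | Wait for the page to load | duration |
| extract | Extract structured information | query, output_schema |
| screenshot | Capture a page screenshot | full_page |
| switch_tab | Switch browser tabs | tab_index |
| go_back | Go back in browsing history | none |
| done | Complete the task | none |

## Configuration and Extension

### Message Compaction

The Message Manager supports message compaction: when the conversation history grows too long, a compaction LLM can be configured to reduce context length. This matters most for long tasks or limited context windows.

### Token Cost Tracking

The Message Manager integrates the TokenCost service to track token consumption for every LLM call, so users can monitor the resource usage of a task.

## Known Issues

### Token Count Display

In the current version the Message Manager's token count display is broken. In verbose log output the token count of a message appears as `??? (TODO)` instead of the actual value, because the relevant code was commented out and replaced with a placeholder pending a fix.

Source: [Community issue #4150](https://github.com/browser-use/browser-use/issues/4150)

## Module Dependencies

The Message Manager depends on the following core modules:

The browser session module (BrowserSession) provides the DOM Watchdog service for obtaining the page's enhanced DOM tree. The Message Manager uses this service to get the serialized page content.

The LLM message module defines how the different message types are constructed, including basic types such as system messages and user messages. These types are the basis on which the Message Manager builds its input.

The Agent service layer is the direct caller of the Message Manager and coordinates its interaction with other components (such as the planner and the loop detector).

## Usage Recommendations

Keep the following in mind when using the Message Manager:

Choose the prompt template according to task complexity. For scenarios that need fast responses, choose the slimmed-down prompt; for complex reasoning tasks, use the full prompt.

Monitor message length to avoid exceeding the LLM's context limit. For long tasks, enable message compaction or chunk content manually.

Check log output regularly and follow progress on the token count fix so that you can update your monitoring approach in time.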

---

<a id='browser-cdp'></a>

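## CDP Browser Control

### Related Pages

Related topics: [System Architecture](#architecture), [Watchdog Monitoring](#browser-watchdogs)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page: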

- [browser_use/browser/session.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/session.py)
- [browser_use/actor/page.py](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/page.py)
- [browser_use/actor/element.py](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/element.py)
- [browser_use/browser/session_manager.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/session_manager.py)
- [browser_use/browser/watchdogs/default_action_watchdog.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/watchdogs/default_action_watchdog.py)
- [browser_use/actor/__init__.py](https://github.com/browser-use/browser-use/blob/main/browser_use/actor/__init__.py)
- [examples/browser/using_cdp.py](https://github.com/browser-use/browser-use/blob/main/examples/browser/using_cdp.py)
</details>

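# CDP Browser Control

## Overview

CDP (Chrome DevTools Protocol) is the low-level protocol at the core of browser-use's browser automation. Unlike traditional Playwright-based approaches, browser-use talks to Chrome directly over CDP, achieving a command latency of roughly 50 ms, and uses a persistent background daemon to keep browser control efficient. Source: [browser_use/actor/__init__.py:1-5]()

The CDP implementation in browser-use has two main layers:

1. **Low-level CDP communication layer**: wraps the WebSocket connection using the `cdp-use` library and handles sending and receiving protocol messages
2. **High-level Actor API layer**: provides a Playwright-like high-level interface that wraps common browser operations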

```mermaid
graph TD
    subgraph "High-level Actor API"
        A["Page class<br/>Page-level operations"]
        B["Element class<br/>Element interaction"]
        C["Mouse class<br/>Mouse operations"]
        D["Utils class<br/>Utility functions"]
    end
    
    subgraph "CDP communication layer"
        E["BrowserSession<br/>Session management"]
        F["SessionManager<br/>Event-driven management"]
        G["TimeoutWrappedCDPClient<br/>Timeout wrapper"]
    end
    
    subgraph "Chrome DevTools Protocol"
        H["CDP WebSocket<br/>Real-time communication"]
    end
    
    A --> E
    B --> E
    C --> E
    D --> E
    E --> F
    E --> G
    G --> H
    F --> H
```

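## Core Architecture Components

### BrowserSession: Session Management

`BrowserSession` is the central hub of CDP communication. It manages the browser connection, page targets, and CDP client instances. Source: [browser_use/browser/session.py:1-100]()

| Component | Function |
|-----------|----------|
| `_cdp_client_root` | Root-level CDP client, used to create new pages and manage browser-level operations |
| `session_manager` | Event-driven session manager that handles dynamic target attach/detach |
| `cdp_client` | CDP session of the currently active page |
| `targets` | Set of page targets, holding information about all open tabs |

**Key initialization flow**:

```python
# Initialize the root-level CDP client (wrapped with timeouts)
self._cdp_client_root = TimeoutWrappedCDPClient(
    self.cdp_url,
    additional_headers=headers or None,
    max_ws_frame_size=200 * 1024 * 1024,  # 200 MB limit to handle large DOMs
)

# Start the event-driven session manager
self.session_manager = SessionManager(self)
await self.session_manager.start_monitoring()

# Enable auto-attach so Chrome reports new targets automatically
await self._cdp_client_root.send.Target.setAutoAttach(
    params={'autoAttach': True, 'waitForDebuggerOnStart': False, 'flatten': True}
)
```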

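### SessionManager Event-Driven Management

`SessionManager` implements event-based dynamic target management and is the core component behind multi-tab support. Source: [browser_use/browser/session_manager.py:1-50]()

**Core responsibilities**:

1. Register attach/detach event handlers
2. Discover and attach to all existing targets
3. Initialize sessions and enable lifecycle monitoring
4. Enable autoAttach for future targets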

```mermaid
sequenceDiagram
    participant Chrome as Chrome Browser
    participant SM as SessionManager
    participant BS as BrowserSession
    participant CDP as CDP Client

    Chrome->>SM: Discover all targets at startup
    SM->>BS: Register event handlers
    Chrome->>SM: Target.attachedToTarget (new tab)
    SM->>BS: Notify target change
    BS->>CDP: Create CDP session
    CDP-->>BS: Session ready
```
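
The attach/detach bookkeeping in the sequence above can be sketched without a browser. This is a minimal illustration, not the library's `SessionManager`: the `TargetRegistry` class is hypothetical, and the event dictionaries only mimic the `sessionId` and `targetInfo` fields that CDP sends with `Target.attachedToTarget` and `Target.detachedFromTarget`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TargetRegistry:
    """Hypothetical registry of attached page targets, keyed by CDP session id."""

    sessions: dict[str, dict] = field(default_factory=dict)

    def on_attached(self, event: dict) -> None:
        # Only page targets (tabs) are tracked; workers and other types are ignored.
        info = event["targetInfo"]
        if info.get("type") == "page":
            self.sessions[event["sessionId"]] = info

    def on_detached(self, event: dict) -> None:
        # Detach events for unknown sessions are tolerated silently.
        self.sessions.pop(event["sessionId"], None)

    def page_urls(self) -> list[str]:
        return [info.get("url", "") for info in self.sessions.values()]
```

Keying by session id rather than target id keeps the detach handler trivial, because the detach event always carries the session id.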

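### TimeoutWrappedCDPClient Timeout Wrapper

A wrapper around the CDP client that applies a timeout to every CDP call, so that an unstable connection to a remote browser cannot hang the caller indefinitely. Source: [browser_use/browser/_cdp_timeout.py:1-30]()

**Known issue**: community issue #4579 reports that with remote browsers (such as Browserless) individual CDP calls had no timeout control and could hang forever. The current version has been improved to address this.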
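
The idea of bounding each call can be sketched with the standard library alone. This is an assumption-laden illustration, not the code in `_cdp_timeout.py`: `call_with_timeout` and the 30-second default are hypothetical, and `send` stands in for any awaitable CDP command.

```python
import asyncio
from typing import Any, Awaitable, Callable

DEFAULT_CDP_TIMEOUT_S = 30.0  # assumed default for this sketch, not taken from the library


async def call_with_timeout(
    send: Callable[..., Awaitable[Any]],
    *args: Any,
    timeout: float = DEFAULT_CDP_TIMEOUT_S,
    **kwargs: Any,
) -> Any:
    """Await one CDP-style call and raise TimeoutError if it takes too long."""
    try:
        return await asyncio.wait_for(send(*args, **kwargs), timeout=timeout)
    except asyncio.TimeoutError as exc:
        name = getattr(send, "__name__", repr(send))
        raise TimeoutError(f"CDP call {name} exceeded {timeout}s") from exc
```

`asyncio.wait_for` cancels the pending call when the deadline passes, so a stalled WebSocket read does not keep the task alive.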

## Page Class - Page-Level Operations

The `Page` class wraps tab-level browser operations and provides navigation, screenshots, content extraction, and other core functions. Source: [browser_use/actor/page.py:1-100]()

### Page Navigation

| Method | Purpose | Parameters |
|------|------|------|
| `goto(url)` | Navigate to the given URL | `url: str` |
| `go_back()` | Go back one page | None |
| `go_forward()` | Go forward one page | None |
| `reload()` | Reload the current page | None |

### Page Content Operations

| Method | Purpose | Return type |
|------|------|----------|
| `screenshot()` | Capture the current page | `str` (base64) |
| `get_url()` | Get the current URL | `str` |
| `get_title()` | Get the page title | `str` |
| `set_viewport_size(width, height)` | Set the viewport size | `None` |
| `evaluate(script, *args)` | Execute JavaScript | `str` |

### Element Lookup

| Method | Purpose | Return type |
|------|------|----------|
| `get_elements_by_css_selector(selector)` | Look up by CSS selector | `list[Element]` |
| `get_element(backend_node_id)` | Get by backend node ID | `Element` |
| `get_element_by_prompt(prompt, llm)` | AI-driven element lookup | `Element \| None` |
| `must_get_element_by_prompt(prompt, llm)` | AI-driven lookup that raises if nothing is found | `Element` |

### Structured Content Extraction

The `extract_content` method uses an LLM to extract structured data from the page:

```python
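async def extract_content(
    self,
    prompt: str,
    structured_output: type[T],  # a Pydantic BaseModel subclass
    llm: BaseChatModel | None = None,
) -> T:
    """Extract structured content from the current page."""
    # 1. Extract clean markdown from the page
    content, content_stats = await self._extract_clean_markdown()

    # 2. Build the LLM prompt
    system_prompt = """Extract structured data from the webpage markdown"""

    # 3. Return an instance of the Pydantic model (remaining steps are elided in this excerpt)
    ...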
```

资料来源：[browser_use/actor/page.py:100-150]()

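## Element Class - Element Interaction

The `Element` class wraps every interaction with a DOM element, including clicking, typing, and hovering. Source: [browser_use/actor/element.py:1-100]()

### Core Interaction Methods

| Method | Description |
|------|----------|
| `click()` | Click the element |
| `hover()` | Hover the mouse over the element |
| `fill(text)` | Fill in text (clears the field first) |
| `type(text, delay)` | Simulate character-by-character typing (with a delay) |
| `clear()` | Clear the input field |
| `scroll_into_view()` | Scroll the element into the visible area |

### Sending Keyboard Events over CDP

`Element` sends keyboard events through the underlying CDP protocol, which gives precise control over text input: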

```python
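# Step 1: send keyDown (no text parameter)
await cdp_client.send.Input.dispatchKeyEvent(
    params={
        'type': 'keyDown',
        'key': base_key,
        'code': key_code,
        'modifiers': modifiers,
        'windowsVirtualKeyCode': vk_code,
    },
    session_id=session_id,
)

# Step 2: send char (with text); this is the step that actually inserts the text
await cdp_client.send.Input.dispatchKeyEvent(
    params={
        'type': 'char',
        'text': char,
        'key': char,
    },
    session_id=session_id,
)

# Step 3: send keyUp to release the key (mirrors the keyDown parameters)
await cdp_client.send.Input.dispatchKeyEvent(
    params={
        'type': 'keyUp',
        'key': base_key,
        'code': key_code,
        'modifiers': modifiers,
        'windowsVirtualKeyCode': vk_code,
    },
    session_id=session_id,
)
```

**Note**: the `text` parameter of the char event is what places text into the focused input. The blank-page problem reported in community issue #1020 may be related to keyboard events being dispatched incorrectly.

Source: [browser_use/actor/element.py:100-150]()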

## Keyboard Event Handling

`default_action_watchdog.py` dispatches keyboard events for special keys and for text input. Source: [browser_use/browser/watchdogs/default_action_watchdog.py:1-50]()

### Special Key Support

```python
special_keys = {
    'Enter', 'Escape', 'Tab',
    'ArrowUp', 'ArrowDown', 'ArrowLeft', 'ArrowRight',
    'PageUp', 'PageDown', 'Home', 'End',
    'Control', 'Alt', 'Meta', 'Shift',
    *(f'F{n}' for n in range(1, 13)),  # F1 through F12
}
```

### Text Input Handling

For non-special keys (ordinary text characters), the system dispatches keyboard events one character at a time and adds a 0.001-second delay between characters to pace the input:

```python
for char in normalized_keys:
    if char in ('\n', '\r'):
        # Newlines are handled as the Enter key
        await cdp_session.cdp_client.send.Input.dispatchKeyEvent(...)
    else:
        # Regular character input
        modifiers, vk_code, base_key = self._get_char_modifiers_and_vk(char)
```
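
How a character could map to the `(modifiers, vk_code, base_key)` triple can be sketched as follows. This is a simplified stand-in for `_get_char_modifiers_and_vk`, not its actual implementation: the function name is hypothetical and only ASCII letters and digits are handled. The Shift bit value of 8 follows the CDP `Input.dispatchKeyEvent` modifier bit field, and the virtual key codes follow the Windows convention where letters use the code of their uppercase form.

```python
from typing import Tuple

SHIFT = 8  # CDP modifier bit field: Alt=1, Ctrl=2, Meta=4, Shift=8


def char_modifiers_and_vk(char: str) -> Tuple[int, int, str]:
    """Return (modifiers, windows_virtual_key_code, base_key) for one character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    if char.isascii() and char.isalpha():
        # Letters share one virtual key code; uppercase adds the Shift modifier.
        modifiers = SHIFT if char.isupper() else 0
        return modifiers, ord(char.upper()), char.lower()
    if char.isascii() and char.isdigit():
        # Digit keys use their ASCII code as the virtual key code.
        return 0, ord(char), char
    # Anything else: no key code, rely on the char event's text parameter.
    return 0, 0, char
```

Characters without a known key code still work in this scheme because the text is carried by the char event rather than by keyDown.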

## Target Management

### Creating a New Page

New tabs are created with the `Target.createTarget` CDP command:

```python
async def _cdp_create_new_page(
    self, 
    url: str = 'about:blank', 
    background: bool = False, 
    new_window: bool = False
) -> str:
    """Create a new page/tab using CDP Target.createTarget."""
    params = CreateTargetParameters(url=url, background=background)
    if new_window:
        params['newWindow'] = True
    
    result = await self._cdp_client_root.send.Target.createTarget(params=params)
    return result['targetId']
```

Source: [browser_use/browser/session.py:200-220]()

### Closing a Page

```python
async def _cdp_close_page(self, target_id: TargetID) -> None:
    """Close a page/tab using CDP Target.closeTarget."""
    await self.cdp_client.send.Target.closeTarget(params={'targetId': target_id})
```

### Page Redirect Handling

The system automatically detects `chrome://newtab` pages and redirects them:

```python
# Look for chrome://newtab pages and redirect them
page_targets_from_manager = self.session_manager.get_all_page_targets()
for target in page_targets_from_manager:
    if 'newtab' in target.get('url', ''):
        # Redirect to about:blank (redirect logic is elided in this excerpt)
        ...
```
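
A substring test such as `'newtab' in url` also matches ordinary sites whose path happens to contain "newtab". A stricter variant can be sketched as below. The helper names and the prefix list are assumptions made for illustration and are not taken from the library.

```python
from typing import Iterable, List

# Assumed set of Chrome-internal new-tab URLs for this sketch.
NEW_TAB_PREFIXES = ("chrome://newtab", "chrome://new-tab-page")


def is_new_tab_url(url: str) -> bool:
    """True only for Chrome's internal new-tab pages, not for arbitrary sites."""
    return url.lower().startswith(NEW_TAB_PREFIXES)


def targets_to_redirect(targets: Iterable[dict]) -> List[str]:
    """Collect the target ids whose URL is a new-tab page."""
    return [t["targetId"] for t in targets if is_new_tab_url(t.get("url", ""))]
```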

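## CDP Connection Configuration

### WebSocket Frame Size Limit

```python
max_ws_frame_size=200 * 1024 * 1024  # 200 MB limit (209,715,200 bytes)
```

This setting keeps pages with very large DOMs from failing because a single frame exceeds the size limit.

### Connection Parameters

| Parameter | Description | Default |
|------|------|--------|
| `cdp_url` | Chrome DevTools endpoint URL | `localhost:9222` (listed as `None` in the configuration table of the Profile page) |
| `additional_headers` | Extra HTTP headers (for example proxy authentication) | `None` |
| `max_ws_frame_size` | Maximum WebSocket frame size | 200 MB |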
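
When only the HTTP debugging address is known, Chrome's `/json/version` endpoint reports the WebSocket URL in its `webSocketDebuggerUrl` field. The sketch below only parses such a payload; fetching it is left out, and the function name is hypothetical rather than part of browser-use.

```python
import json


def websocket_url_from_version(payload: str) -> str:
    """Extract the WebSocket debugger URL from a /json/version response body."""
    data = json.loads(payload)
    url = data.get("webSocketDebuggerUrl")
    if not isinstance(url, str) or not url.startswith(("ws://", "wss://")):
        raise ValueError("payload has no usable webSocketDebuggerUrl")
    return url
```

The resulting `ws://` or `wss://` address is the kind of value a `cdp_url` parameter would typically receive.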

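## Known Limitations and Issues

### Windows Profile Locking

**Issue #4546**: on Windows, using the `--profile` argument while Chrome is running fails with `WinError 32` because of a file lock conflict. The profile copy operation cannot read a Chrome profile folder that is currently open.

### Unstable CDP Connections

**Issue #4579**: with remote browsers (such as Browserless), individual CDP calls lacked timeout control and could hang indefinitely. `TimeoutWrappedCDPClient` is the mitigation described above; in production, configure suitable timeouts for remote connections.

### Token Count Display

**Issue #4150**: in `browser_use/agent/message_manager/service.py`, the per-message token count is shown as the placeholder `??? (TODO)`, and the feature is not fully implemented yet.

## Usage Examples

### Basic CDP Operations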

```python
import asyncio

from browser_use import Browser


async def basic_cdp_demo():
    # Start the browser (method names follow the actor API; verify against your installed version)
    browser = Browser()
    await browser.start()

    # Open a new page
    page = await browser.new_page()

    # Navigate
    await page.goto('https://example.com')

    # Find and click an element
    elements = await page.get_elements_by_css_selector('a[href]')
    if elements:
        await elements[0].click()

    # Take a screenshot (base64 string)
    screenshot = await page.screenshot()

    # Execute JavaScript
    result = await page.evaluate("() => document.title")

    # Shut down
    await browser.stop()


asyncio.run(basic_cdp_demo())
```

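### AI-Driven Element Lookup

```python
from browser_use import ChatOpenAI

llm = ChatOpenAI(model='gpt-4o')

# Find an element from a natural-language description
# (`page` is a Page obtained as in the previous example; run inside an async function)
element = await page.must_get_element_by_prompt(
    prompt="login button",
    llm=llm
)
await element.click()
```

## Summary

The CDP browser control module in browser-use covers the full path from low-level protocol communication to a high-level API. Its main strengths are:

- **Low latency**: direct CDP communication, with a cited command latency of about 50 ms
- **Event-driven**: dynamic target management built on SessionManager
- **Rich API**: high-level wrappers such as Page, Element, and Mouse
- **Remote support**: connects to remote Chrome instances over WebSocket

Used together, these components let developers build efficient browser automation tasks.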

---

<a id='browser-watchdogs'></a>

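## Watchdog Monitoring

### Related Pages

Related topics: [CDP Browser Control](#browser-cdp), [Browser Configuration and Profiles](#browser-profile)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page: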

- [browser_use/browser/watchdogs/__init__.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/watchdogs/__init__.py)
- [browser_use/browser/watchdogs/dom_watchdog.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/watchdogs/dom_watchdog.py)
- [browser_use/browser/watchdogs/crash_watchdog.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/watchdogs/crash_watchdog.py)
- [browser_use/browser/watchdogs/captcha_watchdog.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/watchdogs/captcha_watchdog.py)
- [browser_use/browser/watchdogs/screenshot_watchdog.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/watchdogs/screenshot_watchdog.py)
</details>

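# Watchdog Monitoring

## Overview

The Watchdog mechanism is the set of core components in browser-use that coordinates event handling for a browser session. It is event-driven: specialized monitoring services (Watchdogs) are bound to the browser session lifecycle through the `bubus` event bus, so the framework can observe browser state in real time and respond automatically.

The Watchdog system in browser-use has the following core responsibilities:

| Category | Description |
|---------|------|
| State monitoring | Tracks DOM changes, page loads, element highlighting, and similar state |
| Exception handling | Detects browser crashes and handles CAPTCHA challenges |
| Resource management | Manages PDF downloads and file handling |
| Security control | Enforces domain restrictions and security policies |
| Dialog management | Handles JavaScript dialogs and popups |

Source: [CLAUDE.md:1-30]()

## Architecture

### Event-Driven Model

BrowserSession uses the `bubus` event bus to coordinate the Watchdog services. Each Watchdog subscribes to specific event types and runs its handling logic when those events fire.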

```mermaid
graph TD
    BrowserSession["BrowserSession<br/>browser session"]
    EventBus["bubus event bus"]
    DOMWatchdog["DOMWatchdog<br/>DOM monitoring"]
    CrashWatchdog["CrashWatchdog<br/>crash monitoring"]
    CaptchaWatchdog["CaptchaWatchdog<br/>CAPTCHA monitoring"]
    ScreenshotWatchdog["ScreenshotWatchdog<br/>screenshot monitoring"]
    DownloadsWatchdog["DownloadsWatchdog<br/>download monitoring"]
    PopupsWatchdog["PopupsWatchdog<br/>popup monitoring"]
    SecurityWatchdog["SecurityWatchdog<br/>security monitoring"]
    AboutBlankWatchdog["AboutBlankWatchdog<br/>blank-page monitoring"]
    
    BrowserSession --> EventBus
    EventBus --> DOMWatchdog
    EventBus --> CrashWatchdog
    EventBus --> CaptchaWatchdog
    EventBus --> ScreenshotWatchdog
    EventBus --> DownloadsWatchdog
    EventBus --> PopupsWatchdog
    EventBus --> SecurityWatchdog
    EventBus --> AboutBlankWatchdog
    
    style EventBus fill:#e1f5fe
    style BrowserSession fill:#fff3e0
```

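Source: [CLAUDE.md:5-18]()

### Watchdog Component Hierarchy

The Watchdog system is layered, with each component focused on one monitoring domain:

| Component | Scope | Priority | State sync |
|---------|---------|-------|---------|
| DOMWatchdog | DOM snapshots, element tree, screenshots | High | Yes |
| CrashWatchdog | Browser crashes, process failures | Highest | Yes |
| CaptchaWatchdog | CAPTCHA challenge popups | High | Yes |
| ScreenshotWatchdog | Visual state of the page | Medium | Yes |
| DownloadsWatchdog | Automatic PDF downloads, file management | Medium | Yes |
| PopupsWatchdog | JS dialogs, modal popups | High | Yes |
| SecurityWatchdog | Domain restrictions, security policies | Highest | Yes |
| AboutBlankWatchdog | Blank-page redirects | Medium | Yes |

Source: [CLAUDE.md:10-18]()

## Core Components in Detail

### DOMWatchdog

DOMWatchdog is the most central monitoring component. It handles DOM snapshots, page element analysis, and visual state management.

**Main functions:**

- Processes DOM snapshots and builds the element tree
- Generates the enhanced DOM tree (enhanced_dom_tree)
- Supports element highlighting and information extraction
- Manages element indexes and interactivity analysis

**Core methods:**

| Method | Purpose | Return type |
|-------|------|---------|
| `_build_dom_tree_without_highlights` | Build the DOM tree without highlights | None/Tree |
| `get_enhanced_dom_tree` | Get the enhanced DOM tree (cached) | EnhancedDOMTree |
| `process_element_highlight` | Handle element highlighting | None |

Source: [browser_use/browser/watchdogs/dom_watchdog.py]()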

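### CrashWatchdog

CrashWatchdog monitors browser crash events so that automation flows stay robust.

**Trigger scenarios:**

- The browser process terminates abnormally
- The CDP connection drops unexpectedly
- A page load times out
- An unhandled JavaScript error occurs

**Response strategy:**

```mermaid
graph TD
    A["Crash signal detected"] --> B{Main page crashed?}
    B -->|Yes| C["Start session recovery"]
    B -->|No| D["Log the error"]
    C --> E["Try to reconnect"]
    E --> F{Connected?}
    F -->|Yes| G["Restore page state"]
    F -->|No| H["Terminate the session"]
    D --> I["Continue with other tasks"]
```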
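
The decision flow in the diagram can be expressed as a small coroutine. This is a sketch under stated assumptions, not CrashWatchdog's code: `recover_session`, its return labels, and the retry count are hypothetical, and `reconnect` is any coroutine function that raises `ConnectionError` on failure.

```python
import asyncio
from typing import Awaitable, Callable


async def recover_session(
    is_main_page: bool,
    reconnect: Callable[[], Awaitable[None]],
    max_attempts: int = 3,
    delay_s: float = 0.0,
) -> str:
    """Follow the crash-response flow and report which branch was taken."""
    if not is_main_page:
        # Non-main-page crashes are only logged; other work continues.
        return "logged"
    for attempt in range(max_attempts):
        try:
            await reconnect()
            return "restored"
        except ConnectionError:
            # Back off a little longer after each failed attempt.
            await asyncio.sleep(delay_s * (attempt + 1))
    return "terminated"
```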

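### CaptchaWatchdog

CaptchaWatchdog handles CAPTCHA challenges so that they do not interrupt the automation flow.

**Supported challenge types:**

- reCAPTCHA
- hCaptcha
- Custom CAPTCHAs
- Image-selection challenges

**Handling flow:**

1. Detect CAPTCHA elements on the page
2. Assess the difficulty and whether an automated solution is feasible
3. Solve the challenge or pause and wait for a human
4. Record the result in the log

Source: [browser_use/browser/watchdogs/captcha_watchdog.py]()

### ScreenshotWatchdog

ScreenshotWatchdog manages page screenshots, which give the AI agent visual feedback.

**Screenshot types:**

| Type | Description | Use |
|-----|------|------|
| Viewport screenshot | Currently visible area | State confirmation |
| Full-page screenshot | Entire scrollable page | Context analysis |
| Element screenshot | A specific DOM element | Detail zoom |

**Performance optimizations:**

- On-demand capture (avoids unnecessary resource use)
- Compression and format optimization
- Caching to reduce repeated generation

Source: [browser_use/browser/watchdogs/screenshot_watchdog.py]()

### DownloadsWatchdog

DownloadsWatchdog handles automatic PDF downloads and file management.

**Responsibilities:**

- Listen for download events
- Manage download file paths
- Resolve file name conflicts
- Verify download integrity

Source: [CLAUDE.md:11]()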

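### PopupsWatchdog

PopupsWatchdog manages JavaScript dialogs and popups so that the page stays interactive.

**Popup types handled:**

- `alert()` dialogs
- `confirm()` dialogs
- `prompt()` input dialogs
- Modal dialogs
- Cookie notice bars

**Behavior policy:**

```mermaid
graph LR
    A["Popup detected"] --> B{Popup type}
    B -->|alert| C["Accept automatically"]
    B -->|confirm| D["Dismiss automatically"]
    B -->|prompt| E["Dismiss automatically"]
    B -->|modal| F["Close or interact"]
    C --> G["Write to log"]
    D --> G
    E --> G
    F --> G
```

Source: [CLAUDE.md:12]()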
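
The policy in the diagram maps directly onto the `accept` parameter of the CDP `Page.handleJavaScriptDialog` command. The sketch below only builds that parameter dictionary; the `DIALOG_POLICY` table and the function name are hypothetical, and unknown dialog types are dismissed as a conservative assumption.

```python
# Policy from the diagram: accept alerts, dismiss confirm and prompt dialogs.
DIALOG_POLICY = {"alert": True, "confirm": False, "prompt": False}


def dialog_response_params(dialog_type: str) -> dict:
    """Build the params for Page.handleJavaScriptDialog for a given dialog type."""
    accept = DIALOG_POLICY.get(dialog_type, False)  # unknown types: dismiss
    return {"accept": accept}
```

The dialog type would come from the `type` field of the `Page.javascriptDialogOpening` event.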

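### SecurityWatchdog

SecurityWatchdog enforces domain restrictions and security policies, which define the security boundary of an automation flow.

**Security policies:**

- Domain allowlists and blocklists
- Cross-origin request restrictions
- Access control for sensitive data
- Forced HTTPS redirects

Source: [CLAUDE.md:13]()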
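
A domain allowlist check can be sketched with the standard library. This is an illustration of the idea, not SecurityWatchdog's logic: the function name and the glob-style pattern syntax are assumptions. The sketch rejects every scheme other than `http` and `https`, a deliberate choice given that the pitfall log at the end of this document lists a report about `data:` and `blob:` URLs bypassing `allowed_domains`.

```python
from fnmatch import fnmatchcase
from typing import Iterable
from urllib.parse import urlsplit


def is_url_allowed(url: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if the URL's host matches one of the allowed domain patterns."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        # data:, blob:, file:, javascript: and similar schemes carry no host to check.
        return False
    host = (parts.hostname or "").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in allowed_domains)
```

Matching on the parsed hostname rather than on the raw URL string avoids being fooled by an allowed domain that appears only in the path or query.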

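### AboutBlankWatchdog

AboutBlankWatchdog handles blank-page redirects so that navigation is not left stranded.

**Scenarios handled:**

- Detecting `about:blank` pages
- Recognizing redirects to a blank page
- Triggering a new navigation

Source: [CLAUDE.md:15]()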

## Event Subscription

### Watchdog Registration

Each Watchdog subscribes to specific events through the event bus:

```python
# Pseudocode: the string event names and subscribe() are illustrative only
class DOMWatchdog:
    def __init__(self, browser_session: BrowserSession):
        self.browser_session = browser_session
        self.event_bus = browser_session.event_bus
        
    def register(self):
        self.event_bus.subscribe('dom_changed', self.on_dom_changed)
        self.event_bus.subscribe('navigation', self.on_navigation)
        self.event_bus.subscribe('element_highlight', self.on_highlight)
```

### Event Type Mapping

| Event name | Fired when | Main subscribers |
|---------|---------|----------|
| `dom_changed` | The DOM tree is updated | DOMWatchdog |
| `navigation` | The page navigates | DOMWatchdog, AboutBlankWatchdog |
| `download_started` | A download starts | DownloadsWatchdog |
| `popup_opened` | A popup appears | PopupsWatchdog |
| `crash_detected` | A crash is detected | CrashWatchdog |
| `screenshot_requested` | A screenshot is requested | ScreenshotWatchdog |
| `security_violation` | A security violation occurs | SecurityWatchdog |
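
The subscribe-and-dispatch pattern in the table can be tried out with a tiny in-process bus. This is a stand-in written for illustration and does not reproduce the `bubus` API: `MiniEventBus`, `NavigationLogger`, and the string event names are all hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[[dict], Awaitable[Any]]


class MiniEventBus:
    """Minimal async event bus: handlers are awaited in subscription order."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    async def dispatch(self, event_name: str, payload: dict) -> List[Any]:
        # Events without subscribers simply produce an empty result list.
        return [await handler(payload) for handler in self._handlers.get(event_name, [])]


class NavigationLogger:
    """Example watchdog that records every URL it is notified about."""

    def __init__(self, bus: MiniEventBus) -> None:
        self.seen: List[str] = []
        bus.subscribe("navigation", self.on_navigation)

    async def on_navigation(self, payload: dict) -> str:
        self.seen.append(payload["url"])
        return payload["url"]
```

Because each watchdog registers its own handlers, adding a new one does not require changes to the bus or to existing watchdogs.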

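## Integration with CDP

browser-use uses the `cdp-use` library (https://github.com/browser-use/cdp-use) for typed access to the CDP protocol.

### CDP Client Management

All CDP client management is centralized in `browser_use/browser/session.py`, and Watchdogs communicate with the browser through that module:

```mermaid
graph TD
    Watchdog --> CDPClient["CDP Client<br/>cdp-use"]
    CDPClient --> Browser["Chrome DevTools Protocol"]
    Browser --> Target["Browser target page"]
    
    style CDPClient fill:#e8f5e9
```

**Note on CDP connection stability:**

Community reports indicate that with remote browsers (such as Browserless) some CDP calls had no timeout and could wait forever. Set suitable timeouts when configuring a remote browser.

Source: [browser-use/browser-use/issues/4579]()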

## Configuration and Extension

### Enabling and Disabling Watchdogs

Individual Watchdogs can be switched on or off through the BrowserSession configuration:

| Option | Type | Default | Description |
|-------|------|-------|------|
| `enable_dom_watchdog` | bool | `True` | Enable DOM monitoring |
| `enable_crash_watchdog` | bool | `True` | Enable crash monitoring |
| `enable_captcha_watchdog` | bool | `True` | Enable CAPTCHA monitoring |
| `enable_downloads_watchdog` | bool | `True` | Enable download monitoring |

### Custom Watchdogs

Extending the Watchdog system requires three steps:

1. Inherit from the base Watchdog class
2. Implement the event subscription methods
3. Register the Watchdog when BrowserSession is initialized

```python
class CustomWatchdog:
    def __init__(self, browser_session: BrowserSession):
        self.browser_session = browser_session
        self.event_bus = browser_session.event_bus
    
    def register(self):
        # Subscribe to the events of interest
        self.event_bus.subscribe('custom_event', self.handle_custom_event)
    
    async def handle_custom_event(self, event_data):
        # Event handling logic goes here
        pass
```

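## Known Limitations and Notes

### Windows Platform Issue

On Windows, using the `--profile` argument while Chrome is running can hit a file locking error (WinError 32), because the profile copy operation conflicts with the file locks Chrome holds.

Source: [browser-use/browser-use/issues/4546]()

### Token Count Display

In the current implementation, the per-message token count is commented out and replaced with the placeholder `??? (TODO)`. Detailed token statistics still need a follow-up fix.

Source: [browser-use/browser-use/issues/4150]()

## Summary

The Watchdog mechanism is key infrastructure for browser automation in browser-use. Its event-driven design provides:

- **Modularity**: each monitoring function is independent and has a clear responsibility
- **Extensibility**: new monitoring types are easy to add
- **Reliability**: multiple layers of failure detection and recovery
- **Performance**: on-demand processing and resource caching

Together these let an AI agent perceive and respond to browser state changes and complete multi-step web automation tasks.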

---

<a id='browser-profile'></a>

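## Browser Configuration and Profiles

### Related Pages

Related topics: [CDP Browser Control](#browser-cdp)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page: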

- [browser_use/browser/profile.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/profile.py)
- [browser_use/browser/views.py](https://github.com/browser-use/browser-use/blob/main/browser_use/browser/views.py)
- [examples/browser/real_browser.py](https://github.com/browser-use/browser-use/blob/main/examples/browser/real_browser.py)
- [examples/browser/save_cookies.py](https://github.com/browser-use/browser-use/blob/main/examples/browser/save_cookies.py)
- [browser_use/agent/service.py](https://github.com/browser-use/browser-use/blob/main/browser_use/agent/service.py)
</details>

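# Browser Configuration and Profiles

## Overview

The browser configuration and Profile system in browser-use provides flexible browser instance management and session persistence. It supports creating separate browser profiles, using the system Chrome browser, managing cookies and session state, and configuring proxies.

A Profile is a core Chrome concept that lets one browser installation hold several independent user environments. Each Profile has its own bookmarks, history, extensions, cookies, and other browser data. browser-use relies on this mechanism for session isolation and session reuse.

## Profile Management Architecture

### Core Component Relationships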

```mermaid
graph TD
    A[BrowserConfig] --> B[ProfileConfig]
    A --> C[ProxyConfig]
    A --> D[BrowserSession]
    B --> E[UserDataDir]
    B --> F[ProfileName]
    D --> G[CDPConnection]
    D --> H[DOMWatchdog]
    G --> I[Chrome DevTools Protocol]
```

### Main Classes

| Class | File | Description |
|------|----------|----------|
| `BrowserConfig` | `browser_use/browser/views.py` | Global browser configuration, including the CDP URL, port, and window size |
| `BrowserSession` | `browser_use/browser/session.py` | Browser session management; handles the CDP connection lifecycle |
| `ProfileConfig` | `browser_use/browser/profile.py` | Profile configuration; handles creating and copying Chrome Profiles |
| `DomService` | `browser_use/dom/service.py` | DOM service; handles page content extraction |

Source: [browser_use/browser/views.py:1-100]()

## Profile Configuration in Detail

### The ProfileConfig Class

`ProfileConfig` is the core of Profile management. It handles creating and copying Chrome Profiles and resolving their paths:

```python
class ProfileConfig(BaseModel):
    name: str = "default"
    path: str | None = None
    use_subdir: bool = True
    copy_from_path: str | None = None
```

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `name` | `str` | `"default"` | Profile name, used for identification and directory creation |
| `path` | `str \| None` | `None` | Custom Profile path; generated automatically when `None` |
| `use_subdir` | `bool` | `True` | Whether to create a subdirectory under user_data_dir |
| `copy_from_path` | `str \| None` | `None` | Copy Profile contents from the given path |

Source: [browser_use/browser/profile.py:1-50]()

### Profile Path Resolution

browser-use resolves the Profile path in the following order of priority:

1. If `path` is set, use that path directly
2. If `copy_from_path` is set, copy the Profile contents from that path
3. Otherwise, create a Profile named after `name` under the default Chrome directory

```mermaid
flowchart LR
    A[Start] --> B{path set?}
    B -->|Yes| C[Use the given path]
    B -->|No| D{copy_from_path set?}
    D -->|Yes| E[Copy the source Profile]
    D -->|No| F{use_subdir?}
    F -->|Yes| G[Create a subdirectory Profile]
    F -->|No| H[Use the default Profile directory]
```

Source: [browser_use/browser/profile.py:50-150]()
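
The priority order can be written as a pure function, which makes it easy to check each branch. This sketch is an illustration rather than the code in `profile.py`: the function name, the returned action labels, and the choice of `base_dir / name` as the copy destination are assumptions.

```python
from __future__ import annotations

from pathlib import Path


def resolve_profile_dir(
    name: str,
    base_dir: Path,
    path: str | None = None,
    copy_from_path: str | None = None,
    use_subdir: bool = True,
) -> tuple[Path, str]:
    """Return (profile directory, action) following the priority order above."""
    if path is not None:
        return Path(path), "use_explicit_path"
    if copy_from_path is not None:
        # Assumed destination for the copy: a directory named after the profile.
        return Path(base_dir) / name, "copy_from_source"
    if use_subdir:
        return Path(base_dir) / name, "create_subdir"
    return Path(base_dir), "use_default_dir"
```

Returning the action alongside the path keeps the function free of side effects; the caller decides whether to copy or create directories.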

## Browser Configuration

### Main BrowserConfig Parameters

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `headless` | `bool` | `True` | Whether to run in headless mode |
| `headless_screenshots` | `bool \| None` | `None` | Whether screenshots are enabled in headless mode |
| `cdp_url` | `str \| None` | `None` | Chrome DevTools Protocol connection URL |
| `cdp_port` | `int \| None` | `None` | CDP connection port |
| `window_width` | `int` | `1920` | Browser window width |
| `window_height` | `int` | `1080` | Browser window height |
| `chrome_path` | `str \| None` | `None` | Custom path to the Chrome executable |
| `user_data_dir` | `str \| None` | `None` | Chrome user data directory |

Source: [browser_use/browser/views.py:100-200]()

### Creating a Session from System Chrome

browser-use can connect directly to the Chrome browser already installed on the system:

```python
browser = Browser.from_system_chrome()
```

This uses the default Chrome Profile of the currently logged-in user and keeps all saved cookies and session data.

Source: [examples/browser/real_browser.py:1-50]()

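## Session and Cookie Management

### Saving Cookies

browser-use makes it possible to persist cookies so that login state can be saved and restored between sessions. In the snippet below, `save_cookies` and `load_cookies` are helper functions that are not defined in this excerpt:

```python
# Save cookies to a file
browser_session = BrowserSession()
await browser_session.setup()
# ... log in and perform other actions ...
await save_cookies(browser_session, "cookies.json")

# Load cookies
browser_session = BrowserSession()
await browser_session.setup()
await load_cookies(browser_session, "cookies.json")
```

Source: [examples/browser/save_cookies.py:1-100]()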
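
The file side of cookie persistence can be sketched independently of the browser. The helpers below are hypothetical and are not the ones used in `save_cookies.py`: they only serialize a list of cookie dictionaries to JSON and read it back, leaving the step of pulling cookies from or pushing them into a session to the caller.

```python
from __future__ import annotations

import json
from pathlib import Path


def save_cookie_list(cookies: list[dict], file_path: str | Path) -> None:
    """Write cookies (a list of dicts) to a JSON file."""
    Path(file_path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookie_list(file_path: str | Path) -> list[dict]:
    """Read cookies back; a missing file is treated as 'no cookies yet'."""
    path = Path(file_path)
    if not path.exists():
        return []
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("cookie file must contain a JSON list")
    return data
```

Cookie files contain session credentials, so they are best kept out of version control and given restrictive file permissions.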

### Restoring Storage State

Besides saving and loading cookies by hand, browser-use can restore full browser state through a Playwright-style `storage_state` mechanism, covering:

- Cookies
- LocalStorage
- SessionStorage
- Other browser storage

Source: [browser_use/agent/service.py:1-50]()

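## Proxy Configuration

### Proxy Parameters

| Parameter | Type | Description |
|------|------|------|
| `server` | `str` | Proxy server address (format: `protocol://host:port`) |
| `username` | `str \| None` | Username for authentication (optional) |
| `password` | `str \| None` | Password for authentication (optional) |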

```python
proxy = ProxyConfig(
    server="http://proxy.example.com:8080",
    username="user",
    password="pass"
)
```

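Source: [browser_use/browser/views.py:200-300]()

## CDP Connection Configuration

### Connection Modes

browser-use supports two ways of connecting over CDP:

1. **Embedded CDP**: browser-use launches and manages its own Chrome process
2. **Remote CDP**: connect to an external Chrome instance or a remote browser service such as Browserless

```python
# Embedded connection
browser = Browser()

# Remote CDP connection
browser = Browser(
    cdp_url="wss://chrome.browserless.io?token=YOUR_TOKEN"
)
```

Source: [browser_use/browser/session.py:1-100]()

### Known Issue: CDP Timeouts with Remote Browsers

According to community feedback (Issue #4579), individual CDP calls to a remote browser may lack a timeout and wait forever. Configure reasonable timeouts in production.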

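## Using Profiles from the CLI

### Command-Line Arguments

On the command line, the `--profile` argument selects the Chrome Profile to use:

```bash
browser-use --profile my-profile --task "search the browser-use documentation"
```

### Known Issue on Windows

**Issue #4546**: on Windows, using the `--profile` argument while Chrome is running can fail with `WinError 32`.

**Cause**: at startup browser-use tries to copy the Profile directory to a temporary location, but while Chrome is running the operating system keeps the relevant files locked.

**Workarounds**:

- Close any running Chrome instance before using the `--profile` argument
- Copy an inactive Profile instead, for example through the `copy_from_path` setting described above
- Use `Browser.from_system_chrome()` to connect to the open Chrome without copying the Profile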

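## Best Practices

### Profile Isolation Recommendations

```mermaid
graph TD
    A[Task requirements] --> B{Persistent login needed?}
    B -->|Yes| C[Use a fixed Profile]
    B -->|No| D[Use a temporary Profile]
    C --> E[Clean up expired cookies regularly]
    D --> F[Delete after use]
```

### Security Notes

1. **Protect sensitive data**: avoid hard-coding Profile paths or cookie file paths in code
2. **Clean up temporary files**: regularly remove temporary Profile directories that are no longer used
3. **Proxy authentication**: store the proxy password in an environment variable rather than in plain text in code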
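
Reading proxy credentials from the environment can be sketched as follows. The `ProxySettings` dataclass and the `PROXY_SERVER`, `PROXY_USERNAME`, and `PROXY_PASSWORD` variable names are assumptions chosen for this example. The dataclass mirrors the `server`, `username`, and `password` fields listed earlier but is not the library's `ProxyConfig`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProxySettings:
    """Plain container mirroring the server/username/password proxy fields."""

    server: str
    username: Optional[str] = None
    password: Optional[str] = None


def proxy_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[ProxySettings]:
    """Build proxy settings from environment variables; None if no proxy is set."""
    env = os.environ if env is None else env
    server = env.get("PROXY_SERVER")
    if not server:
        return None
    return ProxySettings(
        server=server,
        username=env.get("PROXY_USERNAME"),
        password=env.get("PROXY_PASSWORD"),
    )
```

Accepting an explicit mapping makes the function easy to test without touching the real process environment.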

## Configuration Examples

### Basic Configuration

```python
from browser_use import Browser, BrowserConfig

config = BrowserConfig(
    headless=False,
    window_width=1280,
    window_height=800,
    cdp_port=9222,
)

browser = Browser(config=config)
```

### Configuration with a Profile

```python
from browser_use import Browser, BrowserConfig, ProfileConfig

profile_config = ProfileConfig(
    name="my-project",
    use_subdir=True,
)

browser_config = BrowserConfig(
    headless=True,
    profile=profile_config,
)

browser = Browser(config=browser_config)
```

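### Using System Chrome

```python
from browser_use import Browser

# Use the default Profile of the system Chrome
browser = Browser.from_system_chrome()
```

Source: [examples/browser/real_browser.py:50-150]()

## Summary

The browser configuration and Profile system in browser-use supports a range of automation setups. With a suitable Profile configuration you can achieve:

- Isolated execution of multiple tasks
- Reuse of login state
- Separate environments for different projects
- Integration with an existing Chrome browser

When using Profiles, watch for the file locking issue on Windows, and prefer fixed Profile names so that Profiles are easier to manage and maintain.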

---


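## Doramagic Pitfall Log

Project: browser-use/browser-use

Summary: 38 potential pitfalls found, 15 of them high/blocking; top priority: installation pitfall - source evidence: Feature Request: Session replay / task audit trail for agent runs.

## 1. Installation pitfall · Source evidence: Feature Request: Session replay / task audit trail for agent runs

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: Feature Request: Session replay / task audit trail for agent runs
- User impact: may raise the cost of trial use and production adoption for new users.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_1e24a779206e4982aae9ffdaa602dc79 | https://github.com/browser-use/browser-use/issues/4860 | Unverified usage condition surfaced by source type github_issue.

## 2. Configuration pitfall · Source evidence: BrowserSession fails on headless Linux: watchdog timeout and CDP WebSocket loop

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue in this project: BrowserSession fails on headless Linux: watchdog timeout and CDP WebSocket loop
- User impact: may raise the cost of trial use and production adoption for new users.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_f3bf8b97a64a41dfb2dc4ff38a3226a2 | https://github.com/browser-use/browser-use/issues/4471 | The source discussion mentions python-related conditions; re-check before installing or trialing.

## 3. Configuration pitfall · Source evidence: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy)

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue in this project: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy)
- User impact: may raise the cost of trial use and production adoption for new users.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_d63b4afd46eb43a886bebf8b4c4aa7dc | https://github.com/browser-use/browser-use/issues/4846 | The source discussion mentions python-related conditions; re-check before installing or trialing.

## 4. Configuration pitfall · Source evidence: Interaction Issue: ...`browser-use` CLI works on this Windows machine, but the MCP server path fails before navigation…

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue in this project: Interaction Issue: ...`browser-use` CLI works on this Windows machine, but the MCP server path fails before navigation completes.
- User impact: may raise the cost of trial use and production adoption for new users.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_7ab2188b56c74ec2bc6d9fdcc9e09bc0 | https://github.com/browser-use/browser-use/issues/4580 | The source discussion mentions python-related conditions; re-check before installing or trialing.

## 5. Maintenance pitfall · Source evidence: Feature Request: ...ADD Firefox and Safari(Webkit) Support Urgent

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified maintenance/version-related issue in this project: Feature Request: ...ADD Firefox and Safari(Webkit) Support Urgent
- User impact: may raise the cost of trial use and production adoption for new users.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_4b5fe20246924117b9748362adc22271 | https://github.com/browser-use/browser-use/issues/4772 | Unverified usage condition surfaced by source type github_issue.

## 6. Maintenance pitfall · Source evidence: Non-structured done action does not attach browser-downloaded files

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified maintenance/version-related issue in this project: Non-structured done action does not attach browser-downloaded files
- User impact: may raise the cost of trial use and production adoption for new users.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_682435d0c72a4f869bbb2c2803eb33f8 | https://github.com/browser-use/browser-use/issues/4482 | The source discussion mentions python-related conditions; re-check before installing or trialing.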

## 7. Security/permissions pitfall · Failure mode: security_permissions: Extension over browser-use to automate task in the current opened chrome instance?

- Severity: high
- Evidence strength: source_linked
- Finding: Developers should check this security_permissions risk before relying on the project: Extension over browser-use to automate task in the current opened chrome instance?
- User impact: Developers may expose sensitive permissions or credentials: Extension over browser-use to automate task in the current opened chrome instance?
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Extension over browser-use to automate task in the current opened chrome instance?. Context: Observed during installation or first-run setup.
- Guardrail: Do not recommend enabling privileged or credential-bearing paths until the source-backed risk is reviewed: https://github.com/browser-use/browser-use/issues/4709
- Evidence: failure_mode_cluster:github_issue | fmev_d2f845e25226df207bc6749584b86987 | https://github.com/browser-use/browser-use/issues/4709 | Extension over browser-use to automate task in the current opened chrome instance?

## 8. Security/permissions pitfall · Failure mode: security_permissions: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in...

- Severity: high
- Evidence strength: source_linked
- Finding: Developers should check this security_permissions risk before relying on the project: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects
- User impact: Developers may expose sensitive permissions or credentials: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects. Context: Observed when using python, playwright
- Guardrail: Do not recommend enabling privileged or credential-bearing paths until the source-backed risk is reviewed: https://github.com/browser-use/browser-use/issues/4824
- Evidence: failure_mode_cluster:github_issue | fmev_2dddf0e4d74712b09045a5ad3346039e | https://github.com/browser-use/browser-use/issues/4824 | Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects, failure_mode_cluster:github_issue | fmev_b624ccb9a033a66c4e6406e68c90667e | https://github.com/browser-use/browser-use/issues/4824 | Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects

## 9. Security/permissions pitfall · Failure mode: security_permissions: Security: `data:` and `blob:` URLs bypass `allowed_domains` restriction

- Severity: high
- Evidence strength: source_linked
- Finding: Developers should check this security_permissions risk before relying on the project: Security: `data:` and `blob:` URLs bypass `allowed_domains` restriction
- User impact: Developers may expose sensitive permissions or credentials: Security: `data:` and `blob:` URLs bypass `allowed_domains` restriction
- Suggested check: Before packaging this project, run the relevant install/config/quickstart check for: Security: `data:` and `blob:` URLs bypass `allowed_domains` restriction. Context: Observed when using python
- Guardrail: Do not recommend enabling privileged or credential-bearing paths until the source-backed risk is reviewed: https://github.com/browser-use/browser-use/issues/4763
- Evidence: failure_mode_cluster:github_issue | fmev_4f59b6981477c7fff92589c5b6df55a5 | https://github.com/browser-use/browser-use/issues/4763 | Security: `data:` and `blob:` URLs bypass `allowed_domains` restriction

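## 10. Security/permissions pitfall · Source evidence: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy)

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy)
- User impact: may affect authorization, key configuration, or security boundaries.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_e62225fb80864dc7b651e0c2db2beee5 | https://github.com/browser-use/browser-use/issues/4846 | The source discussion mentions python-related conditions; re-check before installing or trialing.

## 11. Security/permissions pitfall · Source evidence: Feature request: A2A payment protocol for browser agents hiring specialized sub-agents

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Feature request: A2A payment protocol for browser agents hiring specialized sub-agents
- User impact: may affect authorization, key configuration, or security boundaries.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_bd4eccf0673c4c7aacd4fbaf5fdb892b | https://github.com/browser-use/browser-use/issues/4540 | The source discussion mentions python-related conditions; re-check before installing or trialing.

## 12. Security/permissions pitfall · Source evidence: Feature: Add MCP server trust verification before browser tool execution

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Feature: Add MCP server trust verification before browser tool execution
- User impact: may raise the cost of trial use and production adoption for new users.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_75701909d91d4f11829713d0384e31b5 | https://github.com/browser-use/browser-use/issues/4903 | The source discussion mentions python-related conditions; re-check before installing or trialing.

## 13. Security/permissions pitfall · Source evidence: Governance/audit checks for browser automation agents in CI

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Governance/audit checks for browser automation agents in CI
- User impact: may block installation or first run.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_006f63d2e95c43b7ab4ac3cd5cb2f1c7 | https://github.com/browser-use/browser-use/issues/4621 | Unverified usage condition surfaced by source type github_issue.

## 14. Security/permissions pitfall · Source evidence: MCP Server: downloads_path TCC failure + SingletonLock contention with multiple sessions

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: MCP Server: downloads_path TCC failure + SingletonLock contention with multiple sessions
- User impact: may affect authorization, key configuration, or security boundaries.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_5e85e5006ead4555af43beadbc3dadb1 | https://github.com/browser-use/browser-use/issues/4548 | The source discussion mentions python-related conditions; re-check before installing or trialing.

## 15. Security/permissions pitfall · Source evidence: Security: `data:` and `blob:` URLs bypass `allowed_domains` restriction

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: Security: `data:` and `blob:` URLs bypass `allowed_domains` restriction
- User impact: may block installation or first run.
- Suggested check: the source issue is still open; Pack Agent needs to re-check whether it still affects the current version.
- Guardrail: do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_ecee25746fa74a3e8ac62d620c39e189 | https://github.com/browser-use/browser-use/issues/4763 | The source discussion mentions python-related conditions; re-check before installing or trialing.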

## 16. Installation pitfall · Failure mode: installation: Bug: ...Ollama integration issue: Correct model gets loaded but incorrect port is used by Agent

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Bug: ...Ollama integration issue: Correct model gets loaded but incorrect port is used by Agent
- User impact: Developers may fail before the first successful local run: Bug: ...Ollama integration issue: Correct model gets loaded but incorrect port is used by Agent
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Bug: ...Ollama integration issue: Correct model gets loaded but incorrect port is used by Agent. Context: Observed when using Python, Windows
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_1964c064d6780195cc781c49f3dd74ab | https://github.com/browser-use/browser-use/issues/3093 | Bug: ...Ollama integration issue: Correct model gets loaded but incorrect port is used by Agent
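
A quick local check can help separate a wrong-port configuration from a server that is simply not running. The sketch below resolves the host and port a base URL actually points at, falling back to Ollama's default port 11434, and then probes it with a plain TCP connection. Both functions are hypothetical helpers for this manual; they do not inspect how the Agent builds its own client, which is what the source issue is about.

```python
import socket
from urllib.parse import urlparse

DEFAULT_OLLAMA_PORT = 11434  # Ollama's default listen port


def resolve_host_port(base_url: str) -> tuple[str, int]:
    """Return the (host, port) a base URL points at, defaulting to Ollama's port."""
    # Accept bare "host:port" values by adding a scheme before parsing.
    parsed = urlparse(base_url if "://" in base_url else f"http://{base_url}")
    return parsed.hostname or "127.0.0.1", parsed.port or DEFAULT_OLLAMA_PORT


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Check whether something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Comparing `resolve_host_port(...)` for the configured URL against the port the Ollama server reports at startup shows whether the two agree before any agent run is attempted.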

## 17. Installation pitfall · Failure mode: installation: Bug: ...browser-use can't use `tab list` command

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Bug: ...browser-use can't use `tab list` command
- User impact: Developers may fail before the first successful local run: Bug: ...browser-use can't use `tab list` command
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Bug: ...browser-use can't use `tab list` command. Context: Observed when using Python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_b9b953f1c3366bfc086beedd7d4e92d8 | https://github.com/browser-use/browser-use/issues/4790 | Bug: ...browser-use can't use `tab list` command

## 18. Installation pitfall · Failure mode: installation: Bug: ...pip dependency conflicts due to strict exact-pins (==) in pyproject.toml

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Bug: ...pip dependency conflicts due to strict exact-pins (==) in pyproject.toml
- User impact: Developers may fail before the first successful local run: Bug: ...pip dependency conflicts due to strict exact-pins (==) in pyproject.toml
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Bug: ...pip dependency conflicts due to strict exact-pins (==) in pyproject.toml. Context: Observed when using Python, macOS, Linux
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_55dc895aea6aa4b4d5279fbc8208c3c2 | https://github.com/browser-use/browser-use/issues/4877 | Bug: ...pip dependency conflicts due to strict exact-pins (==) in pyproject.toml
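
To see which dependencies are pinned exactly before adding the package to a larger project, the requirement strings can be scanned for `==` specifiers. The sketch below is a rough, regex-based scan rather than a full PEP 508 parser, and `find_exact_pins` is a hypothetical helper; the package names in the usage note are made-up inputs, not browser-use's actual dependency list.

```python
import re

# Matches "name==1.2.3" (optionally with extras and an environment marker),
# but not ">=", "~=", or the "===" arbitrary-equality operator.
_EXACT_PIN = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*==\s*([^\s;,=]+)\s*(?:;.*)?$"
)


def find_exact_pins(requirements: list[str]) -> dict[str, str]:
    """Map package name to version for every requirement pinned with '=='."""
    pins: dict[str, str] = {}
    for req in requirements:
        match = _EXACT_PIN.match(req)
        # "==1.26.*" is a prefix match, not an exact pin, so it is skipped.
        if match and not match.group(2).endswith(".*"):
            pins[match.group(1).lower()] = match.group(2)
    return pins
```

With an installed distribution, the input could come from `importlib.metadata.requires("browser-use")`; for instance, `find_exact_pins(["httpx==0.28.1", "pydantic>=2.0"])` reports only the `httpx` entry.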

## 19. Installation pitfall · Failure mode: installation: Feature Request: Session replay / task audit trail for agent runs

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Feature Request: Session replay / task audit trail for agent runs
- User impact: Developers may fail before the first successful local run: Feature Request: Session replay / task audit trail for agent runs
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature Request: Session replay / task audit trail for agent runs. Context: Observed when using Python, Playwright
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_57167bb93d4642e8c14e13a9fa68b6b3 | https://github.com/browser-use/browser-use/issues/4860 | Feature Request: Session replay / task audit trail for agent runs; failure_mode_cluster:github_issue | fmev_ef81eaab3e00ae9961bf3928394b2aaa | https://github.com/browser-use/browser-use/issues/4860 | Feature Request: Session replay / task audit trail for agent runs

## 20. Installation pitfall · Failure mode: installation: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in...

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects
- User impact: Developers may fail before the first successful local run: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects. Context: Observed when using Python, Playwright
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_2eef112d4d8359b968d8861fb0e546e7 | https://github.com/browser-use/browser-use/issues/4824 | Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects; failure_mode_cluster:github_issue | fmev_c81e2299e5991b600198149adb22a421 | https://github.com/browser-use/browser-use/issues/4824 | Feature Request: Too strict dependency version pinning (`==`) causes dependency conflicts in complex projects

## 21. Installation pitfall · Failure mode: installation: Feature request: A2A payment protocol for browser agents hiring specialized sub-agents

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Feature request: A2A payment protocol for browser agents hiring specialized sub-agents
- User impact: Developers may fail before the first successful local run: Feature request: A2A payment protocol for browser agents hiring specialized sub-agents
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature request: A2A payment protocol for browser agents hiring specialized sub-agents. Context: Observed when using Python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_5c32343ab00d1eded92d225f3cf53bc7 | https://github.com/browser-use/browser-use/issues/4540 | Feature request: A2A payment protocol for browser agents hiring specialized sub-agents

## 22. Installation pitfall · Failure mode: installation: Protect browser-use from AI slop PRs

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this installation risk before relying on the project: Protect browser-use from AI slop PRs
- User impact: Developers may fail before the first successful local run: Protect browser-use from AI slop PRs
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Protect browser-use from AI slop PRs. Context: Observed during installation or first-run setup.
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_c0e8c26254230a3dbf56b186884d9757 | https://github.com/browser-use/browser-use/issues/4825 | Protect browser-use from AI slop PRs

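## 23. Installation pitfall · Source evidence: Starlog published a deep-dive on browser-use/browser-use

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: Starlog published a deep-dive on browser-use/browser-use
- User impact: May raise the cost of trial and production adoption for new users.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_81c0b9a04cfd437681db8a0fa2f393f5 | https://github.com/browser-use/browser-use/issues/4747 | Unverified usage condition surfaced by a source of type github_issue.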

## 24. Configuration pitfall · Failure mode: configuration: Feature Request: ...

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Feature Request: ...
- User impact: Developers may misconfigure credentials, environment, or host setup: Feature Request: ...
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature Request: .... Context: Source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_ac1522c411a0409ef24478af59f3e5b7 | https://github.com/browser-use/browser-use/issues/4895 | Feature Request: ...

## 25. Configuration pitfall · Failure mode: configuration: Where do third-party tool integrations belong in docs?

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this configuration risk before relying on the project: Where do third-party tool integrations belong in docs?
- User impact: Developers may misconfigure credentials, environment, or host setup: Where do third-party tool integrations belong in docs?
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Where do third-party tool integrations belong in docs? Context: Observed when using Python
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_f72a80256a24aebbb63863de06f2eda7 | https://github.com/browser-use/browser-use/issues/4744 | Where do third-party tool integrations belong in docs?

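## 26. Capability pitfall · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Recommended check: Turn the assumption into a downstream verification checklist.
- Guardrail: Assumptions must become verification items; do not state them as fact until verified.
- Evidence: capability.assumptions | github_repo:881458615 | https://github.com/browser-use/browser-use | README/documentation is current enough for a first validation pass.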

## 27. Runtime pitfall · Failure mode: runtime: BrowserSession fails on headless Linux: watchdog timeout and CDP WebSocket loop

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: BrowserSession fails on headless Linux: watchdog timeout and CDP WebSocket loop
- User impact: Developers may hit a documented source-backed failure mode: BrowserSession fails on headless Linux: watchdog timeout and CDP WebSocket loop
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: BrowserSession fails on headless Linux: watchdog timeout and CDP WebSocket loop. Context: Observed when using Python, Playwright, Linux
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_4de65e1916f595ff2059c4fbdd12a003 | https://github.com/browser-use/browser-use/issues/4471 | BrowserSession fails on headless Linux: watchdog timeout and CDP WebSocket loop

## 28. Runtime pitfall · Failure mode: runtime: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP he...

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy)
- User impact: Developers may hit a documented source-backed failure mode: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy)
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy). Context: Observed when using Python, Windows, Linux
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_ab59eef511363e0eace1d72b61a3d32c | https://github.com/browser-use/browser-use/issues/4846 | Bug: MCP server connects but list_tabs/get_state/screenshot fail while navigate works (CDP healthy)

## 29. Runtime pitfall · Failure mode: runtime: Feature Request: Session replay / task audit trail for agent runs

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: Feature Request: Session replay / task audit trail for agent runs
- User impact: Developers may hit a documented source-backed failure mode: Feature Request: Session replay / task audit trail for agent runs
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Feature Request: Session replay / task audit trail for agent runs. Context: Observed when using Playwright
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_556a0da1891e6e106ebf7c7880596276 | https://github.com/browser-use/browser-use/issues/4860 | Feature Request: Session replay / task audit trail for agent runs

## 30. Runtime pitfall · Failure mode: runtime: Make agent prompts cache-friendly for Gemini implicit caching

- Severity: medium
- Evidence strength: source_linked
- Finding: Developers should check this runtime risk before relying on the project: Make agent prompts cache-friendly for Gemini implicit caching
- User impact: Developers may hit a documented source-backed failure mode: Make agent prompts cache-friendly for Gemini implicit caching
- Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Make agent prompts cache-friendly for Gemini implicit caching. Context: Source discussion did not expose a precise runtime context.
- Guardrail: State this as source-backed community evidence, not as Doramagic reproduction.
- Evidence: failure_mode_cluster:github_issue | fmev_7db3907e714e4339a65b289aa8f8dbb2 | https://github.com/browser-use/browser-use/issues/4887 | Make agent prompts cache-friendly for Gemini implicit caching

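## 31. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- User impact: New, abandoned, and active projects become indistinguishable, which lowers confidence in the recommendation.
- Recommended check: Add GitHub signals for recent commits, releases, and issue/PR responsiveness.
- Guardrail: While maintenance activity is unknown, the recommendation must not be rated high-trust.
- Evidence: evidence.maintainer_signals | github_repo:881458615 | https://github.com/browser-use/browser-use | last_activity_observed missing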

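## 32. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream has already requested a review; the page must not downplay it.
- Recommended check: Add to the security/permissions governance review queue.
- Guardrail: While downstream risks exist, the review/recommendation must stay downgraded.
- Evidence: downstream_validation.risk_items | github_repo:881458615 | https://github.com/browser-use/browser-use | no_demo; severity=medium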

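## 33. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Recommended check: Record the risk on the boundary card and confirm whether manual review is needed.
- Guardrail: Scoring risks must appear on the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:881458615 | https://github.com/browser-use/browser-use | no_demo; severity=medium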

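## 34. Security/permissions pitfall · Source evidence: Bug: ...Ollama integration issue: Correct model gets loaded but incorrect port is used by Agent

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions issue in this project: Bug: ...Ollama integration issue: Correct model gets loaded but incorrect port is used by Agent
- User impact: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_c7ca4462139746a3914b90718f0a9006 | https://github.com/browser-use/browser-use/issues/3093 | The source discussion mentions Python-related conditions; re-verify before installing or trying the project.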

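## 35. Security/permissions pitfall · Source evidence: Protect browser-use from AI slop PRs

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions issue in this project: Protect browser-use from AI slop PRs
- User impact: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_06f6cbda58d94425bc552709e44bbc3a | https://github.com/browser-use/browser-use/issues/4825 | Unverified usage condition surfaced by a source of type github_issue.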

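## 36. Security/permissions pitfall · Source evidence: Where do third-party tool integrations belong in docs?

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions issue in this project: Where do third-party tool integrations belong in docs?
- User impact: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Guardrail: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_c5a72d1aa1264091b5d9d6e7c781945b | https://github.com/browser-use/browser-use/issues/4744 | The source discussion mentions Python-related conditions; re-verify before installing or trying the project.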

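## 37. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will respond when they hit a problem.
- Recommended check: Sample recent issues/PRs to see whether they go unhandled for long periods.
- Guardrail: While issue/PR responsiveness is unknown, the maintenance risk must be flagged.
- Evidence: evidence.maintainer_signals | github_repo:881458615 | https://github.com/browser-use/browser-use | issue_or_pr_quality=unknown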

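## 38. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and docs may lag behind the code, making it more likely that users run into problems.
- Recommended check: Confirm that the latest release/tag matches the install command in the README.
- Guardrail: When release cadence is unknown or stale, the install instructions must note possible drift.
- Evidence: evidence.maintainer_signals | github_repo:881458615 | https://github.com/browser-use/browser-use | release_recency=unknown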

<!-- canonical_name: browser-use/browser-use; human_manual_source: deepwiki_human_wiki -->