# https://github.com/hangwin/mcp-chrome Project Manual

Generated: 2026-05-16 01:25:43 UTC

## Table of Contents

- [Introduction to Chrome MCP Server](#page-introduction)
- [Quick Start Guide](#page-quickstart)
- [System Architecture](#page-architecture)
- [Communication Protocols](#page-communication)
- [Chrome Extension Structure](#page-extension-structure)
- [Browser Tools and APIs](#page-browser-tools)
- [Record and Replay Engine](#page-record-replay)
- [MCP Server Implementation](#page-mcp-server)
- [AI Agent Engines](#page-agent-engines)
- [Storage and Data Management](#page-storage)

<a id='page-introduction'></a>

## Introduction to Chrome MCP Server

### Related Pages

Related topics: [System Architecture](#page-architecture)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/hangwin/mcp-chrome/blob/main/README.md)
- [app/native-server/src/scripts/utils.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/utils.ts)
- [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)
- [app/native-server/src/scripts/browser-config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/browser-config.ts)
- [app/chrome-extension/entrypoints/background/tools/browser/common.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
- [app/chrome-extension/utils/content-indexer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)
- [packages/wasm-simd/package.json](https://github.com/hangwin/mcp-chrome/blob/main/packages/wasm-simd/package.json)
</details>

Chrome MCP Server is a Model Context Protocol (MCP) implementation that bridges AI assistants with Chrome/Chromium browsers, enabling AI-driven browser automation, content analysis, and control through MCP tools. The project consists of two main components: a Chrome extension that runs in the browser and a native server that communicates with AI clients over MCP.

## Overview and Purpose

Chrome MCP Server provides AI assistants with the ability to interact with web pages, manage browser tabs, capture screenshots, monitor network traffic, and perform automated actions. By exposing browser functionality as MCP tools, developers can create AI agents that can browse the web, fill forms, click elements, search history, and analyze page content.

Source: [README.md:1-20](https://github.com/hangwin/mcp-chrome/blob/main/README.md)

## System Architecture

The Chrome MCP Server architecture consists of multiple interconnected components that work together to provide browser automation capabilities.

### Component Overview

```mermaid
graph TD
    A["AI Client<br/>(Claude, etc.)"] --> B["MCP Chrome Bridge<br/>(mcp-chrome-bridge)"]
    B --> C["Native Server<br/>(Node.js)"]
    C <--> D["Chrome Extension<br/>(WXT-based)"]
    D --> E["Chrome Browser"]
    F["Web Content Scripts"] --> D
    G["Injected Scripts"] --> E
```

### Native Messaging Host Configuration

The native server registers itself as a Native Messaging Host with the operating system, enabling secure communication between native applications and Chrome extensions.

| Platform | User-Level Path | System-Level Path |
|----------|-----------------|-------------------|
| Windows | `%APPDATA%\Google\Chrome\NativeMessagingHosts\` | `%ProgramFiles%\Google\Chrome\NativeMessagingHosts\` |
| macOS | `~/Library/Application Support/Google/Chrome/NativeMessagingHosts/` | `/Library/Google/Chrome/NativeMessagingHosts/` |
| Linux | `~/.config/google-chrome/NativeMessagingHosts/` | `/etc/opt/chrome/native-messaging-hosts/` |

Source: [app/native-server/src/scripts/utils.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/utils.ts)

The `HOST_NAME` constant identifies the native messaging host configuration file, which Chrome reads to establish the connection with the native server.
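
As a rough illustration of how the table above could map to code, the sketch below resolves a manifest file location from a platform and an install scope. The helper name `manifestPathFor`, the `MANIFEST_DIRS` map, and the host name `com.example.host` are assumptions made for this example and are not taken from `utils.ts`. Placeholders such as `%APPDATA%` and `~` are deliberately left unexpanded, and the `<host name>.json` file name follows Chrome's native messaging convention.

```typescript
type Platform = 'win32' | 'darwin' | 'linux';
type Scope = 'user' | 'system';

// Directory templates copied from the table above; %APPDATA%, %ProgramFiles%
// and ~ are left unexpanded on purpose.
const MANIFEST_DIRS: Record<Platform, Record<Scope, string>> = {
  win32: {
    user: '%APPDATA%\\Google\\Chrome\\NativeMessagingHosts\\',
    system: '%ProgramFiles%\\Google\\Chrome\\NativeMessagingHosts\\',
  },
  darwin: {
    user: '~/Library/Application Support/Google/Chrome/NativeMessagingHosts/',
    system: '/Library/Google/Chrome/NativeMessagingHosts/',
  },
  linux: {
    user: '~/.config/google-chrome/NativeMessagingHosts/',
    system: '/etc/opt/chrome/native-messaging-hosts/',
  },
};

// Hypothetical helper: joins the directory template with "<hostName>.json".
function manifestPathFor(platform: Platform, scope: Scope, hostName: string): string {
  return `${MANIFEST_DIRS[platform][scope]}${hostName}.json`;
}

// 'com.example.host' is a placeholder, not the project's real HOST_NAME value.
console.log(manifestPathFor('linux', 'user', 'com.example.host'));
// prints "~/.config/google-chrome/NativeMessagingHosts/com.example.host.json"
```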

### Chrome Extension Structure

The Chrome extension is built using WXT, a modern build tool for Chrome extensions. The extension configuration includes security policies, content scripts, and web-accessible resources.

```typescript
// Key configuration from wxt.config.ts
web_accessible_resources: [
  {
    resources: [
      '/models/*',      // ML models for semantic search
      '/workers/*',     // Web Workers for background tasks
      '/inject-scripts/*', // Scripts injected into web pages
    ],
    matches: ['<all_urls>'],
  },
]
```

Source: [app/chrome-extension/wxt.config.ts:1-30](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

## Tool Categories

Chrome MCP Server organizes its functionality into several tool categories, each targeting specific browser interaction needs.

### Navigation and Tab Management

| Tool | Purpose |
|------|---------|
| `chrome_navigate` | Navigate to URLs with viewport control |
| `chrome_switch_tab` | Switch the current active tab |
| `chrome_close_tabs` | Close specific tabs or windows |
| `chrome_go_back_or_forward` | Browser navigation control |

These tools enable AI assistants to control the browser's navigation state and manage multiple tabs programmatically. The `chrome_close_tabs` function, for example, accepts URL patterns to match and close multiple tabs at once.

Source: [app/chrome-extension/entrypoints/background/tools/browser/common.ts:1-80](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)

### Content Interaction

| Tool | Purpose |
|------|---------|
| `chrome_click_element` | Click elements using CSS selectors |
| `chrome_fill_or_select` | Fill forms and select options |
| `chrome_keyboard` | Simulate keyboard input and shortcuts |
| `chrome_inject_script` | Inject content scripts into web pages |
| `chrome_send_command_to_inject_script` | Send commands to injected content scripts |

The interaction tools support complex user interaction scenarios by translating high-level commands into browser automation actions.

### Network Monitoring

| Tool | Purpose |
|------|---------|
| `chrome_network_capture_start/stop` | Capture network requests via webRequest API |
| `chrome_network_debugger_start/stop` | Debugger API with response bodies |
| `chrome_network_request` | Send custom HTTP requests |

Source: [README.md:40-55](https://github.com/hangwin/mcp-chrome/blob/main/README.md)

### Content Analysis

| Tool | Purpose |
|------|---------|
| `search_tabs_content` | AI-powered semantic search across browser tabs |
| `chrome_get_web_content` | Extract HTML/text content from pages |
| `chrome_get_interactive_elements` | Find clickable elements |
| `chrome_console` | Capture and retrieve console output |

The content analysis tools leverage semantic search capabilities powered by WebAssembly-optimized math functions for cosine similarity and vector operations.

Source: [packages/wasm-simd/package.json:1-30](https://github.com/hangwin/mcp-chrome/blob/main/packages/wasm-simd/package.json)
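
For reference, cosine similarity between two embedding vectors can be written in plain TypeScript as shown below. This is a minimal scalar sketch of the math only, not the project's WebAssembly SIMD implementation; the function name and the zero-vector handling are assumptions made for this example.

```typescript
// Scalar reference implementation: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Assumption: a zero vector is treated as having no similarity
  // instead of dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}

// Orthogonal vectors have similarity 0.
console.log(cosineSimilarity(new Float32Array([1, 0]), new Float32Array([0, 1]))); // prints 0
```

A SIMD version computes the same three sums but processes several vector lanes per instruction, which is where the speedup for large embeddings would come from.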

### Screenshots

| Tool | Purpose |
|------|---------|
| `chrome_screenshot` | Advanced screenshot capture with element targeting, full-page support, and custom dimensions |

### Data Management

| Tool | Purpose |
|------|---------|
| `chrome_history` | Search browser history with time filters |
| `chrome_bookmark_search` | Find bookmarks by keywords |
| `chrome_bookmark_add` | Add new bookmarks with folder support |
| `chrome_bookmark_delete` | Delete bookmarks |

## Content Indexing System

The Chrome extension includes a content indexing system that enables semantic search across open browser tabs.

```mermaid
graph LR
    A["Tab Load Complete"] --> B["ContentIndexer"]
    B --> C{"URL Excluded?"}
    C -->|Yes| D["Skip"]
    C -->|No| E["Execute web-fetcher-helper.js"]
    E --> F["Extract Metadata"]
    F --> G["Index Content"]
    G --> H["Searchable Index"]
```

The indexer automatically processes tabs when they complete loading, extracting text content, titles, and metadata. It excludes internal Chrome URLs, extensions, and local files.

Source: [app/chrome-extension/utils/content-indexer.ts:1-60](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)

### URL Exclusion Patterns

The content indexer skips indexing for the following URL patterns:

| Pattern | Description |
|---------|-------------|
| `chrome://*` | Internal Chrome pages |
| `chrome-extension://*` | Extension pages |
| `edge://*` | Microsoft Edge pages |
| `about:*` | About pages |
| `moz-extension://*` | Firefox extension pages |
| `file://*` | Local files |

## Web Fetcher Tool

The `chrome_get_web_content` tool provides flexible content extraction from web pages.

```typescript
interface WebFetcherToolParams {
  htmlContent?: boolean;  // Get visible HTML content
  textContent?: boolean;  // Get visible text content
  url?: string;           // Optional URL to fetch
  selector?: string;      // CSS selector for specific elements
  tabId?: number;         // Target existing tab
  background?: boolean;   // Don't activate/focus tab
  windowId?: number;      // Target window
}
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/web-fetcher.ts:1-30](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/web-fetcher.ts)
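
The two argument objects below illustrate how these parameters might be combined. The tab ID, selector, and URL are made-up values, and the behavior described in the comments is inferred from the parameter comments above rather than confirmed against the implementation.

```typescript
// Interface repeated from above so the example is self-contained.
interface WebFetcherToolParams {
  htmlContent?: boolean;
  textContent?: boolean;
  url?: string;
  selector?: string;
  tabId?: number;
  background?: boolean;
  windowId?: number;
}

// Read the visible text of one element in an existing tab (tab ID is made up).
const readArticleText: WebFetcherToolParams = {
  textContent: true,
  tabId: 123,
  selector: 'article',
};

// Request the HTML of a URL without activating or focusing the tab.
const fetchHtmlInBackground: WebFetcherToolParams = {
  htmlContent: true,
  url: 'https://example.com/',
  background: true,
};

console.log(JSON.stringify([readArticleText, fetchHtmlInBackground], null, 2));
```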

### Metadata Extraction

The injected web fetcher script extracts comprehensive metadata from web pages:

| Field | Source Priority |
|-------|-----------------|
| title | JSON-LD → Open Graph → Twitter Cards |
| byline | JSON-LD → Meta tags |
| excerpt | JSON-LD → Open Graph → Twitter |
| siteName | Open Graph |
| publishedTime | JSON-LD → Article tags |

All values are automatically unescaped from HTML entities to ensure proper formatting.

Source: [app/chrome-extension/inject-scripts/web-fetcher-helper.js:1-100](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/inject-scripts/web-fetcher-helper.js)
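
The priority fallback and entity unescaping described above can be sketched as follows. All names here (`MetadataSources`, `pickByPriority`, `unescapeHtml`) are hypothetical, and the decoder covers only a handful of common entities; the injected script may use a different mechanism.

```typescript
// Candidate values for one field, keyed by where they were found.
type MetadataSources = Partial<Record<'jsonLd' | 'openGraph' | 'twitter', string>>;

// Return the first non-empty candidate in the given priority order.
function pickByPriority(
  sources: MetadataSources,
  order: Array<keyof MetadataSources>,
): string | undefined {
  for (const key of order) {
    const value = sources[key]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Minimal single-pass decoder for a few common HTML entities.
const ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#39;': "'",
};

function unescapeHtml(text: string): string {
  return text.replace(/&(amp|lt|gt|quot|#39);/g, (entity) => ENTITIES[entity] ?? entity);
}

// No JSON-LD value is present, so the Open Graph value wins over Twitter.
const pageTitle = pickByPriority(
  { openGraph: 'Tom &amp; Jerry', twitter: 'Fallback title' },
  ['jsonLd', 'openGraph', 'twitter'],
);
console.log(pageTitle ? unescapeHtml(pageTitle) : '(no title)'); // prints "Tom & Jerry"
```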

## Record and Replay System

The extension includes a record-replay engine for automating browser interactions. The wait policies ensure that automated actions wait for appropriate page states before proceeding.

```typescript
// Wait policies monitor various browser events
chrome.webNavigation.onCommitted  // Page navigation committed
chrome.webNavigation.onCompleted  // Page fully loaded
chrome.tabs.onUpdated             // Tab state changes
```

Source: [app/chrome-extension/entrypoints/background/record-replay/engine/policies/wait.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/engine/policies/wait.ts)
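
One way to build a wait policy on top of such events is to wrap a listener in a promise with a timeout. The sketch below assumes only the `addListener`/`removeListener` pair that `chrome.*` event objects expose; `EventLike` and `waitForEvent` are illustrative names, not the engine's actual API.

```typescript
// Minimal shape shared by chrome.* event objects.
interface EventLike<T> {
  addListener(callback: (details: T) => void): void;
  removeListener(callback: (details: T) => void): void;
}

// Resolve with the first event that satisfies `matches`, or reject on timeout.
// The listener is always removed so repeated waits do not leak handlers.
function waitForEvent<T>(
  event: EventLike<T>,
  matches: (details: T) => boolean,
  timeoutMs: number,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const listener = (details: T): void => {
      if (!matches(details)) return; // keep waiting for a matching event
      clearTimeout(timer);
      event.removeListener(listener);
      resolve(details);
    };
    const timer = setTimeout(() => {
      event.removeListener(listener);
      reject(new Error(`Timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    event.addListener(listener);
  });
}
```

In the extension's background context this could be called as `waitForEvent(chrome.webNavigation.onCompleted, (d) => d.tabId === tabId && d.frameId === 0, 10000)`, assuming `tabId` identifies the tab being replayed.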

## Security Configuration

The extension implements a multi-layered security model:

| Environment | COEP | COOP | CSP |
|-------------|------|------|-----|
| Development | Disabled (WXT default) | Disabled (WXT default) | Relaxed for HMR |
| Production | `require-corp` | `same-origin` | Strict |

The Content Security Policy for production enforces:
- Scripts: `'self' 'wasm-unsafe-eval'`
- Objects: `'self'`
- Styles: `'self' 'unsafe-inline'`
- Images: `'self' data: blob:`

Source: [app/chrome-extension/wxt.config.ts:25-40](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)
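
To show how the directive list above corresponds to a single policy string, the sketch below serializes a directive map. The `productionCsp` object and `buildCsp` helper are assumptions for illustration; the project may simply hard-code the string in `wxt.config.ts`.

```typescript
// Directive values as listed above for production builds.
const productionCsp: Record<string, string[]> = {
  'script-src': ["'self'", "'wasm-unsafe-eval'"],
  'object-src': ["'self'"],
  'style-src': ["'self'", "'unsafe-inline'"],
  'img-src': ["'self'", 'data:', 'blob:'],
};

// Serialize directives as "name value value;" entries separated by spaces.
function buildCsp(directives: Record<string, string[]>): string {
  return Object.keys(directives)
    .map((name) => `${name} ${directives[name].join(' ')};`)
    .join(' ');
}

console.log(buildCsp(productionCsp));
```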

## Browser Configuration Support

The native server supports multiple browsers and platforms:

| Browser | Windows | macOS | Linux |
|---------|---------|-------|-------|
| Chrome | ✅ | ✅ | ✅ |
| Chromium | ✅ | ✅ | ✅ |

The configuration system automatically detects the platform and selects appropriate paths for native messaging host manifests.

Source: [app/native-server/src/scripts/browser-config.ts:1-100](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/browser-config.ts)

## Installation Flow

```mermaid
graph TD
    A["Install mcp-chrome-bridge"] --> B{"Package manager?"}
    B -->|"pnpm (enable-pre-post-scripts set)"| C["Auto-register native host"]
    B -->|npm| C
    C --> D["Load Chrome Extension"]
    D --> E["Click Extension Icon"]
    E --> F["Connect to Bridge"]
    F --> G["Get MCP Configuration"]
```

### Prerequisites

- Node.js >= 20.0.0
- pnpm/npm
- Chrome/Chromium browser

### Registration Methods

**Automatic (pnpm):**
```bash
pnpm config set enable-pre-post-scripts true
pnpm install -g mcp-chrome-bridge
```

**Manual:**
```bash
npm install -g mcp-chrome-bridge
mcp-chrome-bridge register
```

Source: [README.md:60-100](https://github.com/hangwin/mcp-chrome/blob/main/README.md)

## Future Roadmap

The project has planned several enhancements:

| Feature | Status |
|---------|--------|
| Authentication | Planned |
| Recording and Playback | Planned |
| Workflow Automation | Planned |
| Firefox Extension | Planned |

## Project Structure

```
mcp-chrome/
├── app/
│   ├── chrome-extension/     # WXT-based Chrome extension
│   │   ├── entrypoints/      # Extension entry points (background, sidepanel, etc.)
│   │   ├── inject-scripts/   # Scripts injected into web pages
│   │   └── utils/            # Utility modules
│   └── native-server/        # Node.js MCP server
│       └── src/
│           ├── agent/        # AI agent integration
│           └── scripts/      # Native messaging setup
└── packages/
    └── wasm-simd/           # WebAssembly SIMD math functions
```

This structure enables clear separation between the browser extension (UI and web interaction), the native server (MCP protocol and AI integration), and shared packages (common utilities and optimized algorithms).

---

<a id='page-quickstart'></a>

## Quick Start Guide

### Related Pages

Related topics: [Communication Protocols](#page-communication)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/hangwin/mcp-chrome/blob/main/README.md)
- [app/chrome-extension/README.md](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/README.md)
- [app/native-server/README.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/README.md)
- [app/native-server/install.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/install.md)
- [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)
- [app/chrome-extension/entrypoints/sidepanel/composables/useAgentThreads.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/sidepanel/composables/useAgentThreads.ts)
</details>

## Overview

The **mcp-chrome** project provides a Model Context Protocol (MCP) server that enables AI assistants to control and interact with Chrome browser instances. This integration allows AI-powered automation of web browsing tasks through a comprehensive set of tools for navigation, interaction, content extraction, and network monitoring.

The project consists of two primary components:

| Component | Location | Purpose |
|-----------|----------|---------|
| Chrome Extension | `app/chrome-extension/` | Handles browser-level operations via Chrome Extension APIs |
| Native Server | `app/native-server/` | Bridges MCP clients (Claude Desktop, Cursor, etc.) with the extension |

## Architecture Overview

```mermaid
graph TD
    A[MCP Client<br/>Claude Desktop / Cursor] -->|MCP Protocol| B[Native Server<br/>mcp-chrome-bridge]
    B -->|Chrome Native Messaging| C[Chrome Extension<br/>Background Script]
    C -->|Chrome APIs| D[Browser Tabs & Content]
    E[Content Scripts] -->|Injection| D
    C -->|scripting.executeScript| E
```

## Prerequisites

Before starting, ensure the following requirements are met:

| Requirement | Version/Details |
|-------------|-----------------|
| Chrome/Chromium Browser | Version 88+ |
| Node.js | v20.0.0+ |
| npm | v8+ |
| MCP Client | Claude Desktop, Cursor, or compatible |

## Installation Steps

### 1. Chrome Extension Installation

The Chrome extension must be loaded in developer mode:

1. Open Chrome and navigate to `chrome://extensions/`
2. Enable **Developer mode** (toggle in top-right corner)
3. Click **Load unpacked**
4. Select the `app/chrome-extension/` directory from the repository

Source: [app/chrome-extension/README.md](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/README.md)

The extension provides the following entry points:

| Entry Point | File | Purpose |
|-------------|------|---------|
| Side Panel | `entrypoints/sidepanel/` | Main AI chat interface |
| Popup | `entrypoints/popup/` | Quick access toolbar popup |
| Options | `entrypoints/options/` | Extension settings |
| Welcome | `entrypoints/welcome/` | Onboarding page |
| Builder | `entrypoints/builder/` | Workflow editor |

### 2. Native Server Setup

The native server acts as the bridge between MCP clients and the Chrome extension.

#### Installation via npm

```bash
npm install -g mcp-chrome-bridge
```

#### Manual Installation

To register the native messaging host manually, use the `register` command:

```bash
# User-level installation (default)
mcp-chrome-bridge register

# System-level installation (requires admin)
mcp-chrome-bridge register --system
```

Source: [app/native-server/install.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/install.md)

### 3. Registry Configuration

On **Windows**, the native messaging host must be registered in the Windows Registry:

```
HKCU\Software\Google\Chrome\NativeMessagingHosts\com.mcp.chrome
```

Or for system-wide installation:

```
HKLM\Software\Google\Chrome\NativeMessagingHosts\com.mcp.chrome
```

Source: [app/native-server/install.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/install.md)

## MCP Client Configuration

### Claude Desktop

Add the following to your Claude Desktop configuration file:

| OS | Config Location |
|----|-----------------|
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |

```json
{
  "mcpServers": {
    "chrome": {
      "command": "npx",
      "args": ["-y", "@mcp-chrome/native-server"]
    }
  }
}
```

Source: [app/native-server/README.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/README.md)

## Available MCP Tools

### Browser Navigation (4 tools)

| Tool | Description |
|------|-------------|
| `chrome_navigate` | Navigate to URL or perform search |
| `chrome_go_back_or_forward` | Browser history navigation |
| `chrome_reload` | Reload current page |
| `chrome_control_viewport` | Zoom and scroll control |

### Tab Management (5 tools)

| Tool | Description |
|------|-------------|
| `chrome_new_tab` | Open new tab with optional URL |
| `chrome_switch_tab` | Switch to specific tab |
| `chrome_close_tabs` | Close tabs by URL pattern or tab IDs |
| `chrome_get_tabs` | List all open tabs |
| `chrome_duplicate_tab` | Duplicate current tab |

Source: [README.md](https://github.com/hangwin/mcp-chrome/blob/main/README.md)

### Interaction (3 tools)

| Tool | Parameters | Description |
|------|------------|-------------|
| `chrome_click_element` | `selector` (CSS) | Click elements using CSS selectors |
| `chrome_fill_or_select` | `selector`, `value` | Fill forms and select options |
| `chrome_keyboard` | `text`, `shortcut` | Simulate keyboard input |

### Content Analysis (4 tools)

| Tool | Description |
|------|-------------|
| `chrome_get_web_content` | Extract HTML/text content from pages |
| `chrome_get_interactive_elements` | Find clickable elements |
| `chrome_console` | Capture browser console output |
| `search_tabs_content` | AI-powered semantic search across tabs |

### Screenshot & Visual (1 tool)

| Tool | Features |
|------|----------|
| `chrome_screenshot` | Element targeting, full-page capture, custom dimensions |

### Network Monitoring (5 tools)

| Tool | Description |
|------|-------------|
| `chrome_network_capture_start/stop` | webRequest API network capture |
| `chrome_network_debugger_start/stop` | Debugger API with response bodies |
| `chrome_network_request` | Send custom HTTP requests |

### Data Management (4 tools)

| Tool | Description |
|------|-------------|
| `chrome_history` | Search browser history with time filters |
| `chrome_bookmark_search` | Find bookmarks by keywords |
| `chrome_bookmark_add` | Add bookmarks with folder support |
| `chrome_bookmark_delete` | Delete bookmarks |

Source: [README.md](https://github.com/hangwin/mcp-chrome/blob/main/README.md)

## Content Script Injection

The extension uses content scripts for web page interaction. Scripts are injected via the Chrome scripting API:

```typescript
await chrome.scripting.executeScript({
  target: { tabId },
  files: ['inject-scripts/web-fetcher-helper.js'],
});
```

Source: [app/chrome-extension/utils/content-indexer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)

Web-accessible resources are configured in `wxt.config.ts`:

| Resource Path | Purpose |
|---------------|---------|
| `/models/*` | Local AI models |
| `/workers/*` | Web workers |
| `/inject-scripts/*` | Helper scripts for content scripts |

```typescript
web_accessible_resources: [
  {
    resources: [
      '/models/*',
      '/workers/*',
      '/inject-scripts/*',
    ],
    matches: ['<all_urls>'],
  },
]
```

Source: [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

## Security Configuration

The extension implements Content Security Policy (CSP) for production builds:

```typescript
content_security_policy: {
  extension_pages:
    "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:;",
}
```

Note: Security policies are disabled in development mode to allow Vite dev server resource loading.

Source: [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

## URL Exclusion Patterns

The content indexer automatically excludes internal browser URLs:

```typescript
const excludePatterns = [
  /^chrome:\/\//,
  /^chrome-extension:\/\//,
  /^edge:\/\//,
  /^about:/,
  /^moz-extension:\/\//,
  /^file:\/\//,
];
```

Source: [app/chrome-extension/utils/content-indexer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)

## Agent Thread Visualization

The sidepanel provides visual representation of agent activities:

```typescript
// Tool kinds tracked in useAgentThreads
const toolKinds = ['run', 'grep', 'edit', 'read', 'search'] as const;
```

Each tool execution is categorized and displayed with:

| Property | Description |
|----------|-------------|
| `kind` | Tool category (edit, run, grep, etc.) |
| `title` | File name or command |
| `details` | Output content |
| `severity` | Error, success, or info status |
| `phase` | Execution phase (input, output, result) |

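Source: [app/chrome-extension/entrypoints/sidepanel/composables/useAgentThreads.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/sidepanel/composables/useAgentThreads.ts)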
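
The property table above could be modeled with a small record type. The sketch below is hypothetical: `ToolPresentation`, `Severity`, `Phase`, and `describeTool` are names invented for this example, and the literal values are taken from the table descriptions rather than from `useAgentThreads.ts`.

```typescript
type ToolKind = 'run' | 'grep' | 'edit' | 'read' | 'search';
type Severity = 'error' | 'success' | 'info';
type Phase = 'input' | 'output' | 'result';

// Hypothetical record shape assembled from the property table above.
interface ToolPresentation {
  kind: ToolKind;
  title: string;    // file name or command
  details?: string; // output content, if any
  severity: Severity;
  phase: Phase;
}

// Example: build a one-line label for a timeline entry.
function describeTool(entry: ToolPresentation): string {
  return `[${entry.severity}] ${entry.kind}: ${entry.title} (${entry.phase})`;
}

console.log(describeTool({ kind: 'grep', title: 'TODO', severity: 'info', phase: 'result' }));
// prints "[info] grep: TODO (result)"
```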

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| Extension not responding | Reload the extension at `chrome://extensions/` |
| Native messaging fails | Verify registry entries on Windows |
| Permission denied | Check that extension has required permissions |
| Tab access fails | Ensure tab ID is valid and accessible |

Source: [app/native-server/install.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/install.md)

### Verbose Logging

Enable verbose logging for debugging:

```bash
mcp-chrome-bridge --verbose
```

## Quick Usage Example

Once installed, you can interact with Chrome using natural language:

```
Navigate to github.com and search for mcp-chrome repositories
```

This triggers the following workflow:

```mermaid
sequenceDiagram
    participant User
    participant MCP
    participant NativeServer
    participant Extension
    participant Chrome
    
    User->>MCP: "Navigate to github.com..."
    MCP->>NativeServer: chrome_navigate
    NativeServer->>Extension: Native message (chrome_navigate)
    Extension->>Chrome: chrome.tabs.update()
    Chrome-->>Extension: Page loaded
    Extension-->>NativeServer: Success response
    NativeServer-->>MCP: Result
    MCP-->>User: Confirmation
```
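
For illustration, the `chrome_navigate` step in this diagram corresponds to an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below builds such a request; the `buildToolCall` helper and the `url` argument name are assumptions made for this example, not the tool's confirmed input schema.

```typescript
// JSON-RPC 2.0 request shape that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that fills in the fixed protocol fields.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name: toolName, arguments: args } };
}

// The `url` argument name is an assumption for illustration.
const navigateRequest = buildToolCall(1, 'chrome_navigate', { url: 'https://github.com' });
console.log(JSON.stringify(navigateRequest, null, 2));
```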

## Next Steps

- Explore the Workflow Editor (`entrypoints/builder/`) for advanced automation
- Configure AI model settings on the Options page (`entrypoints/options/`)
- Review the content analysis tools in [Browser Tools and APIs](#page-browser-tools)
- Set up bookmark management workflows with the `chrome_bookmark_*` tools

---

<a id='page-architecture'></a>

## System Architecture

### Related Pages

Related topics: [Chrome Extension Structure](#page-extension-structure), [MCP Server Implementation](#page-mcp-server)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)
- [app/chrome-extension/utils/content-indexer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)
- [app/chrome-extension/entrypoints/background/tools/browser/common.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
- [app/native-server/src/scripts/browser-config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/browser-config.ts)
- [app/native-server/src/scripts/utils.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/utils.ts)
- [app/native-server/src/agent/session-service.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/session-service.ts)
- [app/native-server/src/agent/engines/claude.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/engines/claude.ts)
</details>

## Overview

mcp-chrome is a Model Context Protocol (MCP) server implementation that extends AI assistant capabilities to browser automation through a Chrome extension and native messaging host architecture. The system enables AI models to interact with web pages, control browser tabs, capture network traffic, and perform semantic search across browser content.

## High-Level Architecture

The architecture consists of three primary layers:

| Layer | Component | Responsibility |
|-------|-----------|----------------|
| Chrome Extension | Background Service Worker | Hosts MCP tools, manages tab state, handles browser API calls |
| Native Server | Node.js Application | Manages browser configuration, handles native messaging, coordinates with AI engines |
| AI Integration | Claude Engine / Codex Engine | Executes AI logic, processes tool requests, manages sessions |

Source: [app/chrome-extension/wxt.config.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

## Extension Entrypoint Architecture

The Chrome extension uses multiple HTML entrypoints, each serving a specific purpose:

```mermaid
graph TD
    A[Chrome Extension] --> B[Background Service Worker]
    A --> C[Popup]
    A --> D[Side Panel]
    A --> E[Options Page]
    A --> F[Welcome Page]
    A --> G[Builder]
    A --> H[Offscreen Document]
    
    B --> I[MCP Tools]
    B --> J[Tab Management]
    B --> K[Content Indexer]
    
    C --> L[Quick Actions]
    D --> M[Workflow Management]
    E --> N[Userscripts Manager]
```

### Entrypoint Components

| Entrypoint | File Path | Purpose |
|------------|-----------|---------|
| Background | `entrypoints/background/` | Service worker hosting all MCP tools |
| Popup | `entrypoints/popup/` | Quick access to AI chat interface |
| Side Panel | `entrypoints/sidepanel/` | Workflow management and agent threads |
| Builder | `entrypoints/builder/` | Visual workflow editor |
| Options | `entrypoints/options/` | Userscripts configuration |
| Welcome | `entrypoints/welcome/` | Onboarding page |
| Offscreen | `entrypoints/offscreen/` | Background document for long-running tasks |

Source: [app/chrome-extension/entrypoints/popup/index.html:1-12](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/popup/index.html), [app/chrome-extension/entrypoints/sidepanel/index.html:1-12](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/sidepanel/index.html)

## Native Messaging Architecture

### Cross-Platform Manifest Configuration

The native server uses platform-specific manifest paths for Chrome Native Messaging, enabling secure communication between the native application and Chrome extension:

```mermaid
graph TD
    A[Native Server] --> B{Platform Detection}
    B -->|win32| C[Windows Registry + JSON]
    B -->|darwin| D[macOS JSON]
    B -->|linux| E[Linux JSON]
    
    C --> F[User Manifest]
    C --> G[System Manifest]
    D --> H[User Manifest]
    D --> I[System Manifest]
    E --> J[User Manifest]
    E --> K[System Manifest]
```

**Manifest Path Locations by Platform:**

| Platform | User-Level Path | System-Level Path |
|----------|-----------------|-------------------|
| Windows | `%APPDATA%\Google\Chrome\NativeMessagingHosts\` | `%ProgramFiles%\Google\Chrome\NativeMessagingHosts\` |
| macOS | `~/Library/Application Support/Google/Chrome/NativeMessagingHosts/` | `/Library/Google/Chrome/NativeMessagingHosts/` |
| Linux | `~/.config/google-chrome/NativeMessagingHosts/` | `/etc/opt/chrome/native-messaging-hosts/` |

Source: [app/native-server/src/scripts/browser-config.ts:1-100](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/browser-config.ts), [app/native-server/src/scripts/utils.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/utils.ts)
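
The file placed at these locations is a JSON manifest in Chrome's native messaging host format. The sketch below builds one; the field names follow that format, while the `buildHostManifest` helper, the host name, the executable path, and the extension ID are placeholders invented for this example rather than values from the project.

```typescript
// Fields defined by Chrome's native messaging host manifest format.
interface NativeHostManifest {
  name: string;              // host name the extension connects to
  description: string;
  path: string;              // path to the native host executable
  type: 'stdio';             // the only supported transport
  allowed_origins: string[]; // extensions allowed to connect
}

// Hypothetical helper: all argument values below are placeholders.
function buildHostManifest(
  hostName: string,
  executablePath: string,
  extensionIds: string[],
): NativeHostManifest {
  return {
    name: hostName,
    description: 'Native messaging host (example)',
    path: executablePath,
    type: 'stdio',
    allowed_origins: extensionIds.map((id) => `chrome-extension://${id}/`),
  };
}

const exampleManifest = buildHostManifest(
  'com.example.host',
  '/usr/local/bin/example-host',
  ['abcdefghijklmnopabcdefghijklmnop'],
);
console.log(JSON.stringify(exampleManifest, null, 2));
```

Chrome only lets the extensions listed in `allowed_origins` connect to the host, which is what restricts access to the native server.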

### Browser Support Matrix

The native server supports multiple Chromium-based browsers. On Windows, each browser uses its own registry location:

| Registry Scope | Chrome | Chromium | Edge |
|----------------|--------|----------|------|
| User (HKCU) | `HKCU\Software\Google\Chrome\` | `HKCU\Software\Chromium\` | Not configured |
| System (HKLM) | `HKLM\Software\Google\Chrome\` | `HKLM\Software\Chromium\` | Not configured |

## MCP Tool Architecture

### Tool Categories

The extension exposes MCP tools organized into functional categories:

```mermaid
graph LR
    A[MCP Tools] --> B[Viewport & Navigation]
    A --> C[Interaction]
    A --> D[Data Management]
    A --> E[Screenshots & Visual]
    A --> F[Network Monitoring]
    A --> G[Content Analysis]
    
    B --> B1[chrome_navigate]
    B --> B2[chrome_switch_tab]
    B --> B3[chrome_close_tabs]
    B --> B4[chrome_go_back_or_forward]
    
    C --> C1[chrome_click_element]
    C --> C2[chrome_fill_or_select]
    C --> C3[chrome_keyboard]
    
    D --> D1[chrome_history]
    D --> D2[chrome_bookmark_*]
    
    E --> E1[chrome_screenshot]
    
    F --> F1[chrome_network_*]
    
    G --> G1[search_tabs_content]
    G --> G2[chrome_get_web_content]
```

### Tab Management Tools

Tabs can be queried and closed using URL pattern matching:

```typescript
const tabs = await chrome.tabs.query({ url: urlPattern });
await chrome.tabs.remove(tabIdsToClose);
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/common.ts:1-80](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
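
A slightly fuller sketch of the same query-then-remove idea is shown below. The `TabsApi` interface is a hypothetical subset of `chrome.tabs`, passed in as a parameter so the function can be exercised outside the browser; `collectTabIds` and `closeTabsByPattern` are illustrative names, not the tool's actual implementation.

```typescript
// Minimal subset of a tab object and of the chrome.tabs API used here.
interface TabLike {
  id?: number;
  url?: string;
}

interface TabsApi {
  query(queryInfo: { url: string | string[] }): Promise<TabLike[]>;
  remove(tabIds: number[]): Promise<void>;
}

// Tabs without an ID cannot be closed, so they are filtered out.
function collectTabIds(tabs: TabLike[]): number[] {
  return tabs
    .map((tab) => tab.id)
    .filter((id): id is number => typeof id === 'number');
}

// Close every tab whose URL matches one of the given match patterns
// and return the IDs that were closed.
async function closeTabsByPattern(tabs: TabsApi, urlPatterns: string[]): Promise<number[]> {
  const matched = await tabs.query({ url: urlPatterns });
  const ids = collectTabIds(matched);
  if (ids.length > 0) {
    await tabs.remove(ids);
  }
  return ids;
}
```

Inside the extension, `chrome.tabs` itself could be passed as the first argument, for example `closeTabsByPattern(chrome.tabs, ['https://*.example.com/*'])`.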

### Dialog Handling

The extension can handle JavaScript dialogs using Chrome DevTools Protocol (CDP):

```typescript
await cdpSessionManager.sendCommand(tabId, 'Page.handleJavaScriptDialog', {
  accept: action === 'accept',
  promptText: action === 'accept' ? promptText : undefined,
});
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/dialog.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/dialog.ts)

## Content Indexer System

The content indexer provides semantic search across open browser tabs:

### Indexing Logic

| Step | Action | Trigger |
|------|--------|---------|
| 1 | Detect tab update | `chrome.tabs.onUpdated` with `status === 'complete'` |
| 2 | Check semantic engine readiness | 2-second delay to allow engine initialization |
| 3 | Extract content | Execute `web-fetcher-helper.js` injection script |
| 4 | Index content | Store in semantic engine for `search_tabs_content` |

### URL Exclusion Patterns

The indexer automatically excludes internal Chrome URLs:

```typescript
const excludePatterns = [
  /^chrome:\/\//,
  /^chrome-extension:\/\//,
  /^edge:\/\//,
  /^about:/,
  /^moz-extension:\/\//,
  /^file:\/\//,
];
```

Source: [app/chrome-extension/utils/content-indexer.ts:1-100](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)

## Agent Session Management

### Session Service Architecture

The native server manages AI agent sessions with structured metadata:

```mermaid
graph TD
    A[SessionService] --> B[ManagementInfo]
    A --> C[AgentSessionPreviewMeta]
    A --> D[Engine Configuration]
    
    B --> B1[models]
    B --> B2[commands]
    B --> B3[mcpServers]
    B --> B4[plugins]
    B --> B5[skills]
    
    D --> D1[ClaudeEngine]
    D --> D2[CodexEngine]
```

### Session Interface

```typescript
interface ManagementInfo {
  models?: Array<{ value: string; displayName: string; description: string }>;
  commands?: Array<{ name: string; description: string; argumentHint: string }>;
  account?: { email?: string; organization?: string; subscriptionType?: string };
  mcpServers?: Array<{ name: string; status: string }>;
  tools?: string[];
  agents?: string[];
  plugins?: Array<{ name: string; path?: string }>;
  skills?: string[];
  model?: string;
  permissionMode?: string;
  cwd?: string;
}
```

Source: [app/native-server/src/agent/session-service.ts:1-60](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/session-service.ts)

## Claude Engine Integration

### Tool Message Processing

The Claude engine handles streaming tool execution results:

```mermaid
sequenceDiagram
    participant AI as Claude API
    participant Engine as ClaudeEngine
    participant Dispatch as Tool Dispatch
    
    AI->>Engine: content_block_start (tool_use)
    Engine->>Dispatch: Register pending tool
    AI->>Engine: content_block_delta (input_json_delta)
    Engine->>Engine: Accumulate JSON parts
    AI->>Engine: content_block_stop
    Engine->>Engine: Parse accumulated JSON
    Engine->>Dispatch: Execute tool with full input
    Dispatch->>Engine: Result content
    Engine->>AI: tool_result
```
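
The flow in the diagram can be sketched as a small accumulator. This is an illustrative sketch, not the project's `ClaudeEngine` code: the class name `ToolInputAccumulator` and its `handle` method are hypothetical, and the event shapes are assumed to follow the Anthropic streaming format (`content_block_start`, `content_block_delta` carrying an `input_json_delta`, and `content_block_stop`).

```typescript
// Hypothetical sketch: accumulate streamed tool input and parse it on block stop.
type StreamEvent =
  | { type: 'content_block_start'; index: number; content_block: { type: string; id?: string; name?: string } }
  | { type: 'content_block_delta'; index: number; delta: { type: string; partial_json?: string } }
  | { type: 'content_block_stop'; index: number };

interface CompletedToolCall {
  id: string;
  name: string;
  input: unknown;
}

class ToolInputAccumulator {
  // Pending tool_use blocks, keyed by content block index.
  private pending = new Map<number, { id: string; name: string; parts: string[] }>();

  /** Feed one stream event; returns a completed tool call when a tool_use block closes. */
  handle(event: StreamEvent): CompletedToolCall | null {
    if (event.type === 'content_block_start') {
      const block = event.content_block;
      if (block.type === 'tool_use' && block.id && block.name) {
        this.pending.set(event.index, { id: block.id, name: block.name, parts: [] });
      }
      return null;
    }
    if (event.type === 'content_block_delta') {
      const entry = this.pending.get(event.index);
      if (entry && event.delta.type === 'input_json_delta') {
        entry.parts.push(event.delta.partial_json ?? '');
      }
      return null;
    }
    // content_block_stop: join the fragments and parse them once.
    const entry = this.pending.get(event.index);
    if (!entry) return null;
    this.pending.delete(event.index);
    const raw = entry.parts.join('');
    // An empty accumulation is treated as "no arguments".
    const input: unknown = raw.length > 0 ? JSON.parse(raw) : {};
    return { id: entry.id, name: entry.name, input };
  }
}
```

Parsing only at `content_block_stop` matters because an individual `partial_json` fragment is generally not valid JSON on its own.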

### Tool Result Processing

```typescript
if (contentBlock.type === 'tool_result') {
  const metadata = this.buildToolResultMetadata(contentBlock);
  const content = this.extractToolResultContent(contentBlock);
  const isError = contentBlock.is_error === true;
  
  dispatchToolMessage(
    isError ? `Error: ${content || 'Tool execution failed'}` : content || 'Tool completed',
    metadata,
    'tool_result',
    false
  );
}
```

Source: [app/native-server/src/agent/engines/claude.ts:1-100](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/engines/claude.ts)

## Security Architecture

### Content Security Policy

The extension implements strict CSP in production:

| Policy | Value | Purpose |
|--------|-------|---------|
| script-src | `'self' 'wasm-unsafe-eval'` | Restrict script execution |
| object-src | `'self'` | Limit object embedding |
| style-src | `'self' 'unsafe-inline'` | Allow Vite-compiled styles |
| img-src | `'self' data: blob:` | Permit data URIs for thumbnails |

### Cross-Origin Policies

| Policy | Value | Development | Production |
|--------|-------|-------------|------------|
| COOP | `same-origin` | Disabled | Enabled |
| COEP | `require-corp` | Disabled | Enabled |

Source: [app/chrome-extension/wxt.config.ts:20-35](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

### Web Accessible Resources

The extension exposes specific directories as web-accessible resources, which pages matching `<all_urls>` can load:

| Directory | Purpose |
|-----------|---------|
| `/models/*` | AI model files |
| `/workers/*` | Web Worker scripts |
| `/inject-scripts/*` | Content script helpers |

## Record-Replay System

The background service includes a record-replay engine for browser automation testing:

### Wait Policy

The `waitForNetworkIdle` function monitors browser events:

| Event Type | Listener | Purpose |
|------------|----------|---------|
| `onCommitted` | webNavigation | Detect navigation commits |
| `onCompleted` | webNavigation | Detect page load completion |
| `onHistoryStateUpdated` | webNavigation (optional) | Detect SPA route changes |
| `onUpdated` | tabs | Detect tab status changes |

```typescript
const onUpdated = (updatedId: number, change: chrome.tabs.TabChangeInfo) => {
  if (updatedId !== tabId) return;
  if (change.status === 'loading') mark();
  if (typeof change.url === 'string' && (!prevUrl || change.url !== prevUrl)) mark();
};
```

Source: [app/chrome-extension/entrypoints/background/record-replay/engine/policies/wait.ts:1-60](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/engine/policies/wait.ts)
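
The excerpt does not show what `mark()` records. One plausible reading, sketched here under that assumption, is that each event refreshes a last-activity timestamp and the wait completes once a quiet window has elapsed. `QuietTracker`, `quietMs`, and the explicit `now` arguments are hypothetical and exist only to keep the sketch deterministic.

```typescript
// Hypothetical sketch: track the most recent activity and report when a quiet window has passed.
class QuietTracker {
  private readonly quietMs: number;
  private lastActivity: number;

  constructor(quietMs: number, startedAt: number) {
    this.quietMs = quietMs;
    this.lastActivity = startedAt;
  }

  /** Record activity at time `now` (assumed to be what `mark()` does). */
  mark(now: number): void {
    this.lastActivity = now;
  }

  /** True once no activity has been recorded for at least `quietMs`. */
  isQuiet(now: number): boolean {
    return now - this.lastActivity >= this.quietMs;
  }
}
```

In an extension, `mark` would be called from listeners such as `onCommitted` or `onUpdated`, and `isQuiet` would be polled with `Date.now()`.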

## Data Flow Diagram

```mermaid
graph LR
    A[User] -->|Command| B[AI Assistant]
    B -->|MCP Request| C[Native Server]
    C -->|Native Messaging| D[Chrome Extension]
    D -->|Chrome APIs| E[Browser]
    E -->|Response| D
    D -->|Tool Result| C
    C -->|Stream| B
    B -->|Response| A
```

## Extension Development Configuration

### Keyboard Shortcuts

| Shortcut | Windows/Linux | macOS | Action |
|----------|---------------|-------|--------|
| Quick Panel | `Alt+Shift+U` | `Cmd+Shift+U` | Toggle AI Chat |

### Vite Plugin Stack

| Plugin | Purpose |
|--------|---------|
| TailwindCSS v4 | Styling without PostCSS |
| `@vitejs/plugin-vue` | Vue single-file component (SFC) compilation |
| WXT | Extension framework |

Source: [app/chrome-extension/wxt.config.ts:40-50](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

---

<a id='page-communication'></a>

## Communication Protocols

### Related Pages

Related topics: [System Architecture](#page-architecture), [MCP Server Implementation](#page-mcp-server)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [app/chrome-extension/entrypoints/background/native-host.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/native-host.ts)
- [app/native-server/src/scripts/utils.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/utils.ts)
- [app/native-server/src/scripts/browser-config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/browser-config.ts)
- [app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts)
- [app/chrome-extension/entrypoints/background/tools/browser/common.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
- [app/native-server/README.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/README.md)
</details>

# Communication Protocols

## Overview

The mcp-chrome project bridges a Chrome browser extension and a native Node.js server, which together enable AI-powered browser automation through the Model Context Protocol (MCP).

The system employs two primary communication mechanisms:

1. **Native Messaging Protocol** - Chrome extension to native host communication using Chrome's `runtime.connectNative` API
2. **Internal Message Routing** - Communication between background scripts and tool handlers within the extension

## Architecture Overview

```mermaid
graph TD
    A[Chrome Extension UI/Popup] -->|chrome.runtime.sendMessage| B[Background Script]
    B -->|call_tool messages| C[Tool Handlers]
    C -->|CDP Commands| D[Chrome DevTools Protocol]
    B -->|connectNative| E[Native Host]
    E -->|stdio| F[Node.js MCP Server]
    F -->|WebSocket/CDP| G[Browser Instance]
    
    H[Content Scripts] -->|injected scripts| G
    
    I[Offscreen Document] -->|file operations| E
```

## Native Messaging Protocol

### Purpose and Scope

The Native Messaging Protocol enables secure communication between the Chrome extension and a standalone native application. This is essential for operations that require direct system access, such as file system operations, native module execution, and port management for the MCP server.

Source: [app/native-server/README.md](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/README.md)
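
On the wire, Chrome's native messaging transport frames each message as UTF-8 JSON preceded by a 32-bit length in native byte order. The sketch below illustrates that framing; it is not taken from this project's host code. The function names are hypothetical, and little-endian byte order is assumed (as on x86-64 and ARM64 hosts).

```typescript
// Hypothetical sketch of native messaging framing: [4-byte length][UTF-8 JSON body].
function encodeNativeMessage(message: unknown): Uint8Array {
  const body = new TextEncoder().encode(JSON.stringify(message));
  const frame = new Uint8Array(4 + body.length);
  // Assumption: little-endian host, so the length prefix is written little-endian.
  new DataView(frame.buffer).setUint32(0, body.length, true);
  frame.set(body, 4);
  return frame;
}

function decodeNativeMessage(
  frame: Uint8Array,
): { message: unknown; bytesRead: number } | null {
  if (frame.length < 4) return null; // length prefix not complete yet
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const length = view.getUint32(0, true);
  if (frame.length < 4 + length) return null; // body not complete yet
  const body = frame.subarray(4, 4 + length);
  return { message: JSON.parse(new TextDecoder().decode(body)), bytesRead: 4 + length };
}
```

Returning `null` for incomplete input lets a reader keep buffering stdin chunks until a whole frame has arrived.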

### Connection Establishment

The extension opens a native connection with Chrome's `runtime.connectNative` API, passing the host name registered in the native messaging host manifest:

```typescript
nativePort = chrome.runtime.connectNative('com.yourcompany.fastify_native_host');
```

Source: [app/native-server/README.md:40](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/README.md)

### Auto-Connect Behavior

The extension attempts to connect on browser startup and on extension install or update. Connection failures are deliberately swallowed (`.catch(() => {})`) so that they do not surface as unhandled promise rejections:

```typescript
// Auto-connect on Chrome browser startup
chrome.runtime.onStartup.addListener(() => {
  void ensureNativeConnected('onStartup').catch(() => {});
});

// Auto-connect on extension install/update
chrome.runtime.onInstalled.addListener(() => {
  void ensureNativeConnected('onInstalled').catch(() => {});
});
```

Source: [app/chrome-extension/entrypoints/background/native-host.ts:1-20](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/native-host.ts)

### Message Types

| Message Type | Direction | Purpose |
|--------------|-----------|---------|
| `call_tool` | Extension → Native | Request tool execution |
| `ENSURE_NATIVE` | UI → Background | Trigger connection check |
| `CONNECT_NATIVE` | UI → Background | User-initiated connection |
| `forward_to_native` | Background → Native | Route messages to native host |
| `file_operation` | Extension → Native | File preparation for uploads |
| `started` | Native → Extension | Server startup confirmation |
| `stopped` | Native → Extension | Server shutdown notification |
| `error` | Native → Extension | Error reporting |

### Message Format

Messages sent to the background script carry a `type` and a nested `message`; the nested message in turn has `type`, `requestId`, and `payload` fields:

```typescript
chrome.runtime.sendMessage({
  type: 'forward_to_native',
  message: {
    type: 'file_operation',
    requestId: requestId,
    payload: {
      action: 'prepareFile',
      fileUrl,
      base64Data,
      fileName,
    },
  },
});
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts:58-75](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts)

## Internal Message Routing

### Tool Call Routing

The background script routes tool calls from various sources (UI, content scripts) to appropriate handlers:

```typescript
chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
  // Allow UI to call tools directly
  if (message && message.type === 'call_tool' && message.name) {
    handleCallTool({ name: message.name, args: message.args })
      .then((res) => sendResponse({ success: true, result: res }))
      .catch((err) =>
        sendResponse({ success: false, error: err instanceof Error ? err.message : String(err) }),
      );
    return true;
  }
  // ... additional routing logic
});
```

Source: [app/chrome-extension/entrypoints/background/native-host.ts:28-45](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/native-host.ts)

### Connection Management States

```mermaid
stateDiagram-v2
    [*] --> Disconnected
    Disconnected --> Connecting: ensureNativeConnected()
    Connecting --> Connected: Port established
    Connecting --> Disconnected: Connection failed
    Connected --> Disconnected: Port disconnect
    Connected --> AutoConnectDisabled: User explicit disconnect
    AutoConnectDisabled --> Connecting: CONNECT_NATIVE message
    AutoConnectDisabled --> Connecting: ensureNativeConnected()
```
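
The state diagram can be expressed as a transition table. The sketch below mirrors the diagram only; the state and event names are taken from its labels, while `transitions` and `nextState` are hypothetical and do not correspond to identifiers in the extension's source.

```typescript
// Hypothetical sketch: the connection state diagram as a lookup table.
type ConnectionState = 'Disconnected' | 'Connecting' | 'Connected' | 'AutoConnectDisabled';
type ConnectionEvent =
  | 'ensureNativeConnected'
  | 'portEstablished'
  | 'connectionFailed'
  | 'portDisconnect'
  | 'userDisconnect'
  | 'CONNECT_NATIVE';

const transitions: Record<ConnectionState, Partial<Record<ConnectionEvent, ConnectionState>>> = {
  Disconnected: { ensureNativeConnected: 'Connecting' },
  Connecting: { portEstablished: 'Connected', connectionFailed: 'Disconnected' },
  Connected: { portDisconnect: 'Disconnected', userDisconnect: 'AutoConnectDisabled' },
  AutoConnectDisabled: { CONNECT_NATIVE: 'Connecting', ensureNativeConnected: 'Connecting' },
};

/** Returns the next state, or the current state when the event does not apply. */
function nextState(state: ConnectionState, event: ConnectionEvent): ConnectionState {
  return transitions[state][event] ?? state;
}
```

Keeping the transitions in one table makes it easy to see that events not drawn in the diagram leave the state unchanged.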

## File Operation Protocol

### File Upload Flow

The file upload mechanism uses a request-response pattern with unique request IDs:

```typescript
const requestId = `${Date.now()}-${Math.random().toString(36).substring(2, 9)}`;

const handleMessage = (message: any) => {
  if (
    message.type === 'file_operation_response' &&
    message.responseToRequestId === requestId
  ) {
    clearTimeout(timeout);
    chrome.runtime.onMessage.removeListener(handleMessage);
    
    if (message.payload?.success && message.payload?.filePath) {
      resolve(message.payload.filePath);
    } else {
      resolve(null);
    }
  }
};
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts:25-48](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts)

### Timeout Handling

File operations implement a 30-second timeout to prevent hanging connections:

```typescript
const timeout = setTimeout(() => {
  console.error('File preparation request timed out');
  resolve(null);
}, 30000); // 30 second timeout
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts:14-18](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/file-upload.ts)
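
The request-ID correlation and the timeout fit together as sketched below. This is a simplified illustration rather than the code in `file-upload.ts`: the class `PendingFileRequests` and its methods are hypothetical, and message delivery is reduced to a direct `handle` call instead of a `chrome.runtime.onMessage` listener.

```typescript
// Hypothetical sketch: match responses to requests by ID, with a timeout fallback.
interface FileOperationResponse {
  type: string;
  responseToRequestId?: string;
  payload?: { success?: boolean; filePath?: string };
}

interface PendingEntry {
  resolve: (filePath: string | null) => void;
  timer: ReturnType<typeof setTimeout>;
}

class PendingFileRequests {
  private pending = new Map<string, PendingEntry>();

  /** Register a request; resolves with the file path, or null on failure or timeout. */
  wait(requestId: string, timeoutMs: number): Promise<string | null> {
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(requestId);
        resolve(null);
      }, timeoutMs);
      this.pending.set(requestId, { resolve, timer });
    });
  }

  /** Route an incoming message; returns true if it settled a pending request. */
  handle(message: FileOperationResponse): boolean {
    if (message.type !== 'file_operation_response' || !message.responseToRequestId) return false;
    const entry = this.pending.get(message.responseToRequestId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(message.responseToRequestId);
    const filePath = message.payload?.success ? message.payload.filePath : undefined;
    entry.resolve(filePath || null);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Clearing the timer when a response arrives prevents a late timeout callback from firing after the request has already been settled.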

## Native Host Manifest Configuration

### Platform-Specific Paths

The native messaging host manifest location varies by operating system:

| Platform | User-Level Path | System-Level Path |
|----------|-----------------|-------------------|
| Windows | `%APPDATA%\Google\Chrome\NativeMessagingHosts\` | `%ProgramFiles%\Google\Chrome\NativeMessagingHosts\` |
| macOS | `~/Library/Application Support/Google/Chrome/NativeMessagingHosts/` | `/Library/Google/Chrome/NativeMessagingHosts/` |
| Linux | `~/.config/google-chrome/NativeMessagingHosts/` | `/etc/opt/chrome/native-messaging-hosts/` |

Source: [app/native-server/src/scripts/utils.ts:1-35](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/utils.ts)

### Browser-Specific Configuration

Different Chromium-based browsers use distinct manifest locations. On macOS, for example:

```typescript
switch (browser) {
  case BrowserType.CHROME:
    return path.join(home, 'Library', 'Application Support', 'Google', 'Chrome', 'NativeMessagingHosts', `${HOST_NAME}.json`);
  case BrowserType.CHROMIUM:
    return path.join(home, 'Library', 'Application Support', 'Chromium', 'NativeMessagingHosts', `${HOST_NAME}.json`);
}
```

Source: [app/native-server/src/scripts/browser-config.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/browser-config.ts)

### Manifest Validation

The doctor command validates manifest files for correctness:

| Validation Check | Error Message |
|------------------|---------------|
| File existence | Manifest not found |
| JSON parsing | Failed to parse manifest |
| Name field | name != HOST_NAME |
| Type field | type != stdio |
| Path field | path is missing |
| Path existence | path target does not exist |

Source: [app/native-server/src/scripts/doctor.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/doctor.ts)
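
The checks in the table can be read as an ordered sequence in which the first failure is reported. The following sketch shows that reading; it is not the code in `doctor.ts`. The function `validateManifest`, its parameters, and the injected `pathExists` predicate are hypothetical, and the error strings are copied from the table.

```typescript
// Hypothetical sketch: run the manifest checks in table order, stopping at the first failure.
interface ManifestCheck {
  ok: boolean;
  error?: string;
}

/**
 * `manifestText` is null when the manifest file does not exist.
 * `pathExists` is injected so the sketch needs no file-system access.
 */
function validateManifest(
  manifestText: string | null,
  hostName: string,
  pathExists: (target: string) => boolean,
): ManifestCheck {
  if (manifestText === null) return { ok: false, error: 'Manifest not found' };

  let parsed: unknown;
  try {
    parsed = JSON.parse(manifestText);
  } catch {
    return { ok: false, error: 'Failed to parse manifest' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, error: 'Failed to parse manifest' };
  }

  const manifest = parsed as { name?: unknown; type?: unknown; path?: unknown };
  if (manifest.name !== hostName) return { ok: false, error: 'name != HOST_NAME' };
  if (manifest.type !== 'stdio') return { ok: false, error: 'type != stdio' };
  if (typeof manifest.path !== 'string' || manifest.path.length === 0) {
    return { ok: false, error: 'path is missing' };
  }
  if (!pathExists(manifest.path)) return { ok: false, error: 'path target does not exist' };
  return { ok: true };
}
```

A caller would typically pass the file contents read from one of the manifest locations and a predicate such as `fs.existsSync`.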

## Tab Operation Protocol

### URL Pattern Matching

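The close-tabs functionality matches tabs by URL pattern. A URL that contains no `*` wildcard is converted into a prefix pattern by appending `/*` (or `*` if it already ends with `/`):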

```typescript
let urlPattern = url;
if (!urlPattern.includes('*')) {
  try {
    new URL(urlPattern);
    urlPattern = urlPattern.endsWith('/') ? `${urlPattern}*` : `${urlPattern}/*`;
  } catch {
    urlPattern = urlPattern.endsWith('*')
      ? urlPattern
      : urlPattern.endsWith('/')
        ? `${urlPattern}*`
        : `${urlPattern}/*`;
  }
}
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/common.ts:1-30](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
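
Both branches of the excerpt append the same suffix, so its effect can be summarized by a single function. The sketch below is a condensed restatement for illustration; the name `toMatchPattern` is hypothetical and does not appear in the source.

```typescript
// Hypothetical sketch: condensed form of the URL-to-pattern normalization.
function toMatchPattern(url: string): string {
  // A caller-supplied wildcard is used as-is.
  if (url.includes('*')) return url;
  // Otherwise match everything under the given URL.
  return url.endsWith('/') ? `${url}*` : `${url}/*`;
}
```

For example, `toMatchPattern('https://example.com')` and `toMatchPattern('https://example.com/')` both yield `https://example.com/*`, which can then be passed to `chrome.tabs.query({ url })`.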

### Tab Query and Response Format

```typescript
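const tabs = await chrome.tabs.query({ url: urlPattern });
// Assumed step (elided in this excerpt): collect the IDs of the matched tabs
const tabIdsToClose = tabs.map((tab) => tab.id).filter((id): id is number => id !== undefined);
await chrome.tabs.remove(tabIdsToClose);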

return {
  content: [{
    type: 'text',
    text: JSON.stringify({
      success: true,
      message: `Closed ${tabIdsToClose.length} tabs with URL: ${url}`,
      closedCount: tabIdsToClose.length,
      closedTabIds: tabIdsToClose,
    }),
  }],
  isError: false,
};
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/common.ts:60-80](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)

## Port Configuration Protocol

### Port Check

The `doctor` command verifies that the port in the configured MCP server URL matches the expected port and, on mismatch, suggests an `update-port` fix:

```typescript
const url = new URL(configValue.url as string);
const port = Number(url.port);
const portOk = port === EXPECTED_PORT;

checks.push({
  id: 'port.config',
  title: 'Port config',
  status: portOk ? 'ok' : 'error',
  message: configValue.url as string,
  details: {
    expectedPort: EXPECTED_PORT,
    actualPort: port,
    fix: portOk ? undefined : [`${COMMAND_NAME} update-port ${EXPECTED_PORT}`],
  },
});
```

Source: [app/native-server/src/scripts/doctor.ts:80-100](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/doctor.ts)

## Node.js Path Management

### Version Consistency

The system ensures consistent Node.js versions between installation and runtime to avoid native module version mismatches:

```typescript
export function writeNodePathFile(distDir: string, nodeExecPath = process.execPath): void {
  try {
    const nodePathFile = path.join(distDir, 'node_path.txt');
    fs.mkdirSync(distDir, { recursive: true });
    // Write Node.js executable path for runtime consistency
  } catch (error) {
    // Error handling
  }
}
```

Source: [app/native-server/src/scripts/utils.ts:80-100](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/utils.ts)

## Security Considerations

### Cross-Origin Policies

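Production builds set the cross-origin embedder policy (COEP), the cross-origin opener policy (COOP), and a content security policy for extension pages in the manifest: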

```typescript
cross_origin_embedder_policy: { value: 'require-corp' as const },
cross_origin_opener_policy: { value: 'same-origin' as const },
content_security_policy: {
  extension_pages: "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:;",
}
```

Source: [app/chrome-extension/wxt.config.ts:1-30](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)
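
A long single-line CSP string is easy to mistype. As a purely illustrative alternative (not something the project does), the policy can be assembled from a directive map; `buildCsp` and `CspDirectives` are hypothetical names.

```typescript
// Hypothetical sketch: assemble a CSP string from a map of directives to sources.
type CspDirectives = Record<string, string[]>;

function buildCsp(directives: CspDirectives): string {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')};`)
    .join(' ');
}

// Reproduces the extension_pages policy shown in the excerpt.
const extensionPagesCsp = buildCsp({
  'script-src': ["'self'", "'wasm-unsafe-eval'"],
  'object-src': ["'self'"],
  'style-src': ["'self'", "'unsafe-inline'"],
  'img-src': ["'self'", 'data:', 'blob:'],
});
```

The directive map keeps each source on its own line in version control, which makes later changes to a single directive easier to review.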

### Web Accessible Resources

Only specific resource paths are accessible to web pages:

```typescript
web_accessible_resources: [
  {
    resources: [
      '/models/*',
      '/workers/*',
      '/inject-scripts/*',
    ],
    matches: ['<all_urls>'],
  },
],
```

Source: [app/chrome-extension/wxt.config.ts:20-30](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

## Summary

The mcp-chrome communication architecture consists of multiple protocol layers:

1. **Native Messaging Layer** - Chrome-to-native bridge using Chrome's native messaging API
2. **Internal Message Routing** - Efficient routing of tool calls within the extension
3. **File Operation Protocol** - Request-response pattern with timeout handling
4. **Tab Management Protocol** - URL pattern-based tab operations
5. **Port Configuration Protocol** - Verification that the configured MCP server URL uses the expected port

All protocols emphasize reliability through timeout handling, error reporting, and connection state management.

---

<a id='page-extension-structure'></a>

## Chrome Extension Structure

### Related Pages

Related topics: [Browser Tools and APIs](#page-browser-tools), [Storage and Data Management](#page-storage)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)
- [app/chrome-extension/entrypoints/background/index.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/index.ts)
- [app/chrome-extension/entrypoints/content.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/content.ts)
- [app/chrome-extension/entrypoints/background/native-host.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/native-host.ts)
- [app/chrome-extension/entrypoints/background/tools/browser/common.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
- [app/chrome-extension/entrypoints/background/record-replay/index.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/index.ts)
- [app/chrome-extension/utils/content-indexer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)
</details>

# Chrome Extension Structure

## Overview

The mcp-chrome project implements a Chrome Extension that serves as a bridge between a Chrome browser instance and an AI-powered automation system. The extension enables Large Language Models (LLMs) to control browser behavior, capture web content, manage tabs, inject scripts, and automate user interactions through the Model Context Protocol (MCP).

The extension is built with **WXT**, a framework for web extension development that handles manifest generation, build configuration, and hot module replacement. The architecture spans several execution contexts (a background service worker, content scripts, and UI entrypoints) that communicate through Chrome's message passing APIs and native messaging.

Source: [app/chrome-extension/wxt.config.ts:1-100]()

---

## Architecture Overview

The Chrome extension consists of several interconnected layers:

```mermaid
graph TD
    subgraph "UI Layer"
        POPUP[Popup UI]
        SIDEPANEL[Side Panel]
        WELCOME[Welcome Page]
        OPTIONS[Options Page]
    end
    
    subgraph "Background Layer"
        BG[Background Service Worker]
        NATIVE[Native Host Bridge]
        RECORDREPLAY[Record/Replay Engine]
    end
    
    subgraph "Content Layer"
        CONTENT[Content Scripts]
        INJECT[inject-scripts]
    end
    
    subgraph "Native Layer"
        NATIVE_SERVER[mcp-chrome-bridge]
    end
    
    POPUP <-->|chrome.runtime| BG
    SIDEPANEL <-->|chrome.runtime| BG
    OPTIONS <-->|chrome.runtime| BG
    
    BG <-->|nativeMessaging| NATIVE
    BG <-->|chrome.tabs.sendMessage| CONTENT
    CONTENT <-->|window messages| INJECT
    
    NATIVE <-->|stdin/stdout| NATIVE_SERVER
    
    RECORDREPLAY <-->|chrome.scripting| CONTENT
```

Source: [app/chrome-extension/wxt.config.ts:20-60]()

---

## Entry Points

The extension defines multiple entry points through WXT's manifest configuration. Each entry point represents a distinct execution context within Chrome.

### Entry Point Configuration

| Entry Point | File Path | Purpose | Permissions |
|-------------|-----------|---------|-------------|
| `action` (Popup) | `popup.html` | Quick access UI with MCP configuration | Standard |
| `side_panel` | `sidepanel.html` | Workflow management interface | `sidePanel` |
| `options_ui` | `options.html` | Extension settings | Storage |
| `welcome` | `welcome/index.html` | Onboarding page | None |
| Background | `background/index.ts` | Service worker | All browser APIs |

Source: [app/chrome-extension/wxt.config.ts:20-55]()

### Manifest Permissions

The extension declares extensive permissions to support its automation capabilities:

```typescript
permissions: [
  'nativeMessaging',  // Native host communication
  'tabs',             // Tab management
  'activeTab',        // Active tab access
  'scripting',        // Script injection
  'contextMenus',     // Context menu creation
  'downloads',        // Download management
  'webRequest',       // Network monitoring
  'webNavigation',    // Navigation events
  'debugger',         // Debugging capabilities
  'history',          // Browser history
  'bookmarks',        // Bookmark management
  'offscreen',        // Offscreen documents
  'storage',          // Extension storage (chrome.storage)
  'declarativeNetRequest', // Declarative rules for blocking or modifying requests
  'alarms',           // Scheduled tasks
  'sidePanel',        // Side panel access
],
host_permissions: ['<all_urls>'],
```

Source: [app/chrome-extension/wxt.config.ts:20-40]()

---

## Background Service Worker

The background service worker (`entrypoints/background/index.ts`) serves as the central coordinator for all extension functionality.

### Core Responsibilities

```mermaid
graph LR
    A[Incoming Messages] --> B[Type Router]
    B --> C{Message Type}
    C -->|tool_call| D[Tool Executor]
    C -->|native| E[Native Host]
    C -->|rr_trigger| F[Record/Replay]
    D --> G[chrome.* APIs]
    E --> H[Native Bridge]
    F --> I[Content Scripts]
```

### Native Host Communication

The background script maintains a persistent connection to the native `mcp-chrome-bridge` server through Chrome's native messaging API:

```typescript
// Connection lifecycle management
chrome.runtime.onStartup.addListener(() => {
  void ensureNativeConnected('onStartup').catch(() => {});
});

chrome.runtime.onInstalled.addListener(() => {
  void ensureNativeConnected('onInstalled').catch(() => {});
});
```

Source: [app/chrome-extension/entrypoints/background/native-host.ts:1-20]()

In addition to the lifecycle hooks above, the background script accepts the following messages from UI pages:

| Message Type | Direction | Purpose |
|--------------|-----------|---------|
| `ENSURE_NATIVE` | UI → Background | Trigger connection without changing autoConnect state |
| `CONNECT_NATIVE` | UI → Background | Explicit user-initiated connection, re-enables auto-connect |
| `call_tool` | UI → Background | Direct tool execution via message |

Source: [app/chrome-extension/entrypoints/background/native-host.ts:25-45]()

### Tool Execution System

Tools are executed through the `handleCallTool` function, which routes requests to appropriate handlers:

```typescript
chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
  if (message && message.type === 'call_tool' && message.name) {
    handleCallTool({ name: message.name, args: message.args })
      .then((res) => sendResponse({ success: true, result: res }))
      .catch((err) => sendResponse({ success: false, error: err instanceof Error ? err.message : String(err) }));
    return true;
  }
});
```

Source: [app/chrome-extension/entrypoints/background/native-host.ts:30-38]()

---

## Content Scripts Architecture

Content scripts (`entrypoints/content.ts`) run in the context of web pages and provide the bridge between the background service worker and page-level JavaScript.

### Injection Strategy

The extension uses a multi-file injection approach:

```typescript
// Files injected into all frames
files: ['inject-scripts/inject-bridge.js', 'inject-scripts/accessibility-tree-helper.js']
```

| Script | Purpose |
|--------|---------|
| `inject-bridge.js` | Message communication bridge |
| `accessibility-tree-helper.js` | DOM accessibility tree generation |
| `dom-observer.js` | DOM change monitoring for triggers |
| `web-fetcher-helper.js` | Content extraction utilities |

Source: [app/chrome-extension/entrypoints/background/record-replay/index.ts:15-25]()

### Script Injection Modes

Scripts can be injected into one of two execution worlds:

| World | Isolation | Use Case |
|-------|-----------|----------|
| `ISOLATED` | Separate JavaScript context; shares the DOM with the page | Default for `chrome.scripting.executeScript`; helpers that only need DOM access |
| `MAIN` | Shares the page's JavaScript context | Scripts that must read or call page-defined JavaScript objects |

Source: [app/chrome-extension/entrypoints/background/record-replay/index.ts:20-25]()

---

## Web-Accessible Resources

The extension exposes certain resources to web pages for content script functionality:

```typescript
web_accessible_resources: [
  {
    resources: [
      '/models/*',           // ML models for content analysis
      '/workers/*',          // Web Workers
      '/inject-scripts/*',   // Helper scripts
    ],
    matches: ['<all_urls>'],
  },
]
```

Source: [app/chrome-extension/wxt.config.ts:55-65]()

---

## Content Indexing System

The `ContentIndexer` class manages automatic content indexing for semantic search capabilities.

### Indexing Flow

```mermaid
graph TD
    A[Tab Load Complete] --> B{URL Valid?}
    B -->|No| C[Skip]
    B -->|Yes| D{Semantic Engine Ready?}
    D -->|No| E[Wait + Retry]
    D -->|Yes| F[Extract Content]
    F --> G[Update Index]
    
    H[Tab Closed] --> I[Remove Index]
    J[Navigation Committed] --> K[Remove Index]
```
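
The lifecycle in the diagram can be sketched with an in-memory map. This is a simplified illustration, not the `ContentIndexer` implementation: `TabContentIndex`, its methods, and the injected `isEngineReady` and `extract` callbacks are hypothetical, and the "Wait + Retry" step is reduced to returning `'deferred'` to the caller.

```typescript
// Hypothetical sketch: index on load, drop the entry on close or navigation.
const EXCLUDED = /^(chrome|chrome-extension|edge|moz-extension|file):\/\/|^about:/;

type IndexOutcome = 'skipped' | 'deferred' | 'indexed';

class TabContentIndex {
  private docs = new Map<number, { url: string; text: string }>();
  private readonly isEngineReady: () => boolean;

  constructor(isEngineReady: () => boolean) {
    this.isEngineReady = isEngineReady;
  }

  /** Called when a tab finishes loading. */
  onTabComplete(tabId: number, url: string, extract: () => string): IndexOutcome {
    if (EXCLUDED.test(url)) return 'skipped'; // URL is not indexable
    if (!this.isEngineReady()) return 'deferred'; // caller may retry later
    this.docs.set(tabId, { url, text: extract() });
    return 'indexed';
  }

  /** Called when a tab is closed. */
  onTabClosed(tabId: number): void {
    this.docs.delete(tabId);
  }

  /** Called when a new navigation commits; the old content is stale. */
  onNavigationCommitted(tabId: number): void {
    this.docs.delete(tabId);
  }

  has(tabId: number): boolean {
    return this.docs.has(tabId);
  }
}
```

Removing the entry on navigation commit, rather than waiting for the next load to finish, avoids serving search results for a page the tab no longer shows.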

### URL Filtering

The indexer excludes browser-internal pages, extension pages, and local files:

```typescript
private shouldIndexUrl(url: string): boolean {
  const excludePatterns = [
    /^chrome:\/\//,
    /^chrome-extension:\/\//,
    /^edge:\/\//,
    /^about:/,
    /^moz-extension:\/\//,
    /^file:\/\//,
  ];
  return !excludePatterns.some((pattern) => pattern.test(url));
}
```

Source: [app/chrome-extension/utils/content-indexer.ts:30-50]()

Auto-indexing is triggered with a 2-second delay after page load:

```typescript
setTimeout(() => {
  if (!this.isSemanticEngineReady() && !this.isSemanticEngineInitializing()) {
    console.log('ContentIndexer: Skipping auto-index - semantic engine not ready');
    return;
  }
  this.indexTabContent(tabId).catch((error) => {
    console.error('ContentIndexer: Auto-indexing failed:', error);
  });
}, 2000);
```

Source: [app/chrome-extension/utils/content-indexer.ts:10-20]()

---

## Record/Replay Engine

The record/replay system enables automation workflow recording and playback.

### Trigger Types

| Type | Description | Configuration |
|------|-------------|---------------|
| `dom` | DOM element triggers | selector, appear, once, debounceMs |
| `network` | Network request patterns | request/response matching |
| `navigation` | Page navigation events | URL patterns |

Source: [app/chrome-extension/entrypoints/background/record-replay/index.ts:5-15]()

### Trigger Initialization

```typescript
const domTriggers = triggers
  .filter((x) => x.type === 'dom' && x.enabled !== false)
  .map((x: any) => ({
    id: x.id,
    selector: x.selector,
    appear: x.appear !== false,
    once: x.once !== false,
    debounceMs: x.debounceMs ?? 800,
  }));
```

Source: [app/chrome-extension/entrypoints/background/record-replay/index.ts:10-20]()

### Node Runtimes

The engine executes steps through node runtimes:

| Node Type | File | Capabilities |
|-----------|------|--------------|
| `scroll` | `nodes/scroll.ts` | Page/container scrolling with offset or element targeting |
| `tabs` | `nodes/tabs.ts` | Tab creation, switching, closing |
| `wait` | `engine/policies/wait.ts` | Network idle and DOM state waiting |

Source: [app/chrome-extension/entrypoints/background/record-replay/nodes/scroll.ts:1-20]()

### Navigation State Tracking

```typescript
const onCommitted = (d: any) => {
  if (d.tabId === tabId && d.frameId === 0 && d.timeStamp >= startedAt) mark();
};
const onHistoryStateUpdated = (d: any) => {
  if (d.tabId === tabId && d.frameId === 0 && d.timeStamp >= startedAt) mark();
};
const onUpdated = (updatedId: number, change: chrome.tabs.TabChangeInfo) => {
  if (updatedId !== tabId) return;
  if (change.status === 'loading') mark();
  if (typeof change.url === 'string' && (!prevUrl || change.url !== prevUrl)) mark();
};
```

Source: [app/chrome-extension/entrypoints/background/record-replay/engine/policies/wait.ts:15-25]()

---

## Tool System Architecture

### Browser Tool Categories

| Category | Tools |
|----------|-------|
| Navigation | go_back_or_forward, switch_tab, close_tabs |
| Interaction | click_element, fill_or_select, keyboard |
| Data | history, bookmark_search, bookmark_add, bookmark_delete |
| Screenshot | screenshot (with element targeting, full-page support) |
| Network | network_capture_start/stop, network_debugger_start/stop, network_request |
| Content | get_web_content, get_interactive_elements, console, search_tabs_content |

Source: [README.md]()

### Tool Execution Flow

```mermaid
sequenceDiagram
    participant UI
    participant BG as Background Worker
    participant NATIVE as Native Bridge
    participant CHROME as Chrome APIs
    
    UI->>BG: call_tool { name, args }
    BG->>BG: Route to Handler
    alt Native Tool
        BG->>NATIVE: nativeMessage
        NATIVE->>CHROME: Browser API
        CHROME-->>NATIVE: Response
        NATIVE-->>BG: { success, result }
    else Browser Tool
        BG->>CHROME: chrome.* API
        CHROME-->>BG: Result
    end
    BG-->>UI: { success, result }
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/common.ts:1-50]()

---

## Security Configuration

The extension implements security policies for production builds:

```typescript
...(IS_DEV
  ? {}
  : {
      cross_origin_embedder_policy: { value: 'require-corp' as const },
      cross_origin_opener_policy: { value: 'same-origin' as const },
      content_security_policy: {
        extension_pages:
          "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:;",
      },
    }),
```

Source: [app/chrome-extension/wxt.config.ts:65-80]()

| Policy | Development | Production |
|--------|-------------|------------|
| COEP | Not set | `require-corp` |
| COOP | Not set | `same-origin` |
| CSP | WXT Default | Restricted to self + wasm |

---

## Keyboard Shortcuts

The extension defines keyboard commands:

```typescript
commands: {
  toggle_quick_panel: {
    suggested_key: { default: 'Ctrl+Shift+Space', mac: 'Command+Shift+Space' },
    description: 'Toggle Quick Panel AI Chat',
  },
}
```

Source: [app/chrome-extension/wxt.config.ts:45-55]()

---

## Side Panel Interface

The side panel (`sidepanel.html`) provides a workflow management interface with agent thread tracking:

```typescript
// Message parsing rules
if (
  normalize(toolName) === 'bash' ||
  normalize(toolName).includes('shell') ||
  typeof command === 'string'
) {
  return { kind: 'run', title: commandDescription || extractedCommand };
}

if (normalize(toolName) === 'grep' || normalize(toolName).includes('search')) {
  return { kind: 'grep', title: displayPattern };
}
```

Source: [app/chrome-extension/entrypoints/sidepanel/composables/useAgentThreads.ts:1-30]()
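
The excerpt above relies on names defined elsewhere in the composable (`normalize`, `command`, `commandDescription`, `displayPattern`). A self-contained sketch of the same classification idea is shown below. It assumes that `normalize` lowercases and trims the tool name, and it invents the `input` shape and the `other` fallback for illustration, so the real implementation may differ.

```typescript
type ToolKind = { kind: 'run' | 'grep' | 'other'; title: string };

// Assumption: normalization means trim + lowercase.
function normalizeToolName(toolName: string): string {
  return toolName.trim().toLowerCase();
}

// Classify a tool call for display, mirroring the two rules in the excerpt.
function classifyToolCall(
  toolName: string,
  input: { command?: unknown; description?: string; pattern?: string },
): ToolKind {
  const n = normalizeToolName(toolName);
  if (n === 'bash' || n.includes('shell') || typeof input.command === 'string') {
    const command = typeof input.command === 'string' ? input.command : '';
    return { kind: 'run', title: input.description || command };
  }
  if (n === 'grep' || n.includes('search')) {
    return { kind: 'grep', title: input.pattern ?? '' };
  }
  // Hypothetical fallback for tools that match neither rule.
  return { kind: 'other', title: toolName };
}
```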

---

## Native Messaging Bridge

The native messaging layer communicates with `mcp-chrome-bridge` installed on the host system.

### Communication Flow

```mermaid
graph LR
    subgraph "Chrome Extension"
        BG[Background]
        PORT[Native Port]
    end
    
    subgraph "Host System"
        BRIDGE[mcp-chrome-bridge]
        NATIVE_HOST[Native Host JSON]
    end
    
    BG <-->|Native Messaging| PORT
    PORT <-->|stdin/stdout| BRIDGE
    BRIDGE <-->|Browser APIs| CHROME[(Chrome)]
    
    NATIVE_HOST -.->|Manifest Path| BRIDGE
```

### Manifest Configuration

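Native messaging manifests are installed at platform-specific paths. On Windows, Chrome looks up the manifest location through a `NativeMessagingHosts` registry key, so the Windows entries below describe where the file is written rather than a directory that Chrome scans: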

| Platform | User-Level Path |
|----------|-----------------|
| Windows | `%APPDATA%\Google\Chrome\NativeMessagingHosts\` |
| macOS | `~/Library/Application Support/Google/Chrome/NativeMessagingHosts/` |
| Linux | `~/.config/google-chrome/NativeMessagingHosts/` |

| Platform | System-Level Path |
|----------|-------------------|
| Windows | `%ProgramFiles%\Google\Chrome\NativeMessagingHosts\` |
| macOS | `/Library/Google/Chrome/NativeMessagingHosts/` |
| Linux | `/etc/opt/chrome/native-messaging-hosts/` |

Source: [app/native-server/src/scripts/browser-config.ts:1-80]()
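
As an illustration of the user-level table, the lookup can be written as a small pure function. This is a sketch, not the logic in `browser-config.ts`: the function name and the choice to pass `home` and `appData` as parameters (instead of reading environment variables) are assumptions made so the example runs anywhere.

```typescript
type Platform = 'win32' | 'darwin' | 'linux';

// Returns the user-level NativeMessagingHosts directory from the table above.
// `home` and `appData` are supplied by the caller to keep the function pure.
function userManifestDir(platform: Platform, home: string, appData: string): string {
  switch (platform) {
    case 'win32':
      return `${appData}\\Google\\Chrome\\NativeMessagingHosts`;
    case 'darwin':
      return `${home}/Library/Application Support/Google/Chrome/NativeMessagingHosts`;
    case 'linux':
      return `${home}/.config/google-chrome/NativeMessagingHosts`;
  }
}
```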

---

## State Management

The extension uses Chrome's storage API for state persistence:

| Key | Type | Purpose |
|-----|------|---------|
| `RR_TRIGGERS` | Array | Record/replay trigger configurations |
| `USERSCRIPTS_DISABLED` | Boolean | Userscript execution toggle |
| `autoConnectEnabled` | Boolean | Native connection auto-connect state |

Source: [app/chrome-extension/entrypoints/background/record-replay/index.ts:5-10]()

---

## Build Configuration

The extension uses Vite with TailwindCSS v4:

```typescript
vite: (env) => ({
  plugins: [
    tailwindcss(),
    Components({
      dts: true,
    }),
  ],
}),
```

Source: [app/chrome-extension/wxt.config.ts:80-90]()

### Build Requirements

| Dependency | Version | Purpose |
|------------|---------|---------|
| Node.js | >= 20.0.0 | Runtime |
| pnpm/npm | Latest | Package manager |
| WXT | Latest | Build framework |
| TailwindCSS v4 | Latest | Styling |

---

## Summary

The Chrome Extension architecture of mcp-chrome follows a layered design:

1. **UI Layer**: Popup, side panel, welcome page, and options provide user-facing interfaces
2. **Background Layer**: Service worker coordinates native messaging, tool execution, and automation
3. **Content Layer**: Injected scripts enable page-level interaction and accessibility tree generation
4. **Native Layer**: External `mcp-chrome-bridge` process exposes the MCP server to AI clients and relays tool calls to the extension over native messaging

The extension leverages Chrome's extensive APIs for tab management, network monitoring, script injection, and state persistence while maintaining security through manifest-defined permissions and CSP policies.

---

<a id='page-browser-tools'></a>

## Browser Tools and APIs

### Related Pages

Related topics: [Chrome Extension Structure](#page-extension-structure), [Record and Replay Engine](#page-record-replay)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [docs/TOOLS.md](https://github.com/hangwin/mcp-chrome/blob/main/docs/TOOLS.md)
- [app/chrome-extension/entrypoints/background/tools/browser/index.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/index.ts)
- [app/chrome-extension/entrypoints/background/tools/browser/computer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/computer.ts)
- [app/chrome-extension/shared/tools.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/shared/tools.ts)
- [app/chrome-extension/entrypoints/background/tools/browser/common.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
- [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)
</details>

# Browser Tools and APIs

## Overview

The Browser Tools and APIs module gives AI agents programmatic control over the browser: navigation, content extraction, tab management, and simulated user interaction.

The tools run in the extension's background script and are exposed through a structured MCP (Model Context Protocol) interface, so an LLM-powered agent can invoke browser actions as ordinary tool calls.

Source: [docs/TOOLS.md](https://github.com/hangwin/mcp-chrome/blob/main/docs/TOOLS.md)

## Architecture

### High-Level Architecture

```mermaid
graph TD
    A[MCP Client / AI Agent] -->|MCP Protocol| B[Native Server]
    B -->|Native Messaging| C[Chrome Extension Background]
    C -->|Chrome APIs| D[Browser Environment]
    
    E[Content Scripts] -->|Injection| D
    F[Injected Scripts] -->|Web Fetcher| D
    
    G[Sidepanel UI] -->|User Interaction| C
    H[Popup UI] -->|Quick Actions| C
```

### Tool Registration Flow

```mermaid
sequenceDiagram
    participant T as Tool Registry
    participant B as Background Script
    participant C as Chrome API
    participant M as MCP Bridge
    
    T->>B: Register tool handlers
    B->>C: Initialize chrome.tabs listeners
    C->>M: Expose via native messaging
    M->>T: Forward tool calls
```

The extension registers tools through the shared `tools.ts` module, which defines the complete tool schema including names, descriptions, input parameters, and response formats. The background script then connects these tool definitions to actual Chrome API implementations.

Source: [app/chrome-extension/shared/tools.ts:1-50](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/shared/tools.ts)
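
To make the idea of a tool schema concrete, the sketch below defines a simplified schema shape and checks call arguments against it before a handler would run. The `ToolSchema` interface, the `NAVIGATE_SCHEMA` constant, and `validateArgs` are hypothetical and much simpler than a full JSON Schema; they are not copied from `tools.ts`.

```typescript
// Hypothetical, simplified schema shape (a small subset of JSON Schema).
interface ToolSchema {
  name: string;
  description: string;
  inputSchema: {
    required: string[];
    properties: Record<string, { type: 'string' | 'number' | 'boolean' }>;
  };
}

const NAVIGATE_SCHEMA: ToolSchema = {
  name: 'chrome_navigate',
  description: 'Navigate to a URL',
  inputSchema: { required: ['url'], properties: { url: { type: 'string' } } },
};

// Returns human-readable problems; an empty list means the args are acceptable.
function validateArgs(schema: ToolSchema, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of schema.inputSchema.required) {
    if (!(key in args)) problems.push(`missing required parameter: ${key}`);
  }
  for (const key of Object.keys(args)) {
    const prop = schema.inputSchema.properties[key];
    if (prop && typeof args[key] !== prop.type) {
      problems.push(`parameter ${key} must be a ${prop.type}`);
    }
  }
  return problems;
}
```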

## Tool Categories

The browser tools are organized into the following functional categories:

| Category | Tools | Purpose |
|----------|-------|---------|
| **Navigation** | `chrome_navigate`, `chrome_go_back_or_forward` | Page navigation and history control |
| **Tab Management** | `chrome_switch_tab`, `chrome_close_tabs`, `chrome_close_other_tabs` | Tab lifecycle management |
| **View Control** | `chrome_set_viewport`, `chrome_screenshot` | Visual viewport manipulation and capture |
| **Content Extraction** | `chrome_get_web_content`, `chrome_get_interactive_elements` | Page content retrieval |
| **User Interaction** | `chrome_click_element`, `chrome_fill_or_select`, `chrome_keyboard` | Simulated user actions |
| **Data Management** | `chrome_bookmark_*`, `chrome_history` | Bookmark and history access |
| **Network Monitoring** | `chrome_network_*` | HTTP traffic inspection |
| **Script Injection** | `chrome_inject_script`, `chrome_send_command_to_inject_script` | Custom script execution |

Source: [docs/TOOLS.md](https://github.com/hangwin/mcp-chrome/blob/main/docs/TOOLS.md)

## Navigation Tools

### chrome_navigate

Navigates the current tab to a specified URL or performs search queries.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | `string` | Yes | Target URL or search query |
| `options` | `object` | No | Navigation options |

**Options:**

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `waitUntil` | `string` | `"networkidle2"` | When to consider navigation complete |
| `timeout` | `number` | `30000` | Navigation timeout in milliseconds |

**Implementation Details:**

The navigation tool first validates the URL and converts search queries into search engine URLs. It then looks for an existing tab that matches the URL and activates it if one is found; otherwise it creates a new tab. Reusing tabs avoids opening duplicates.

```typescript
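// Simplified navigation flow; validateAndNormalizeUrl and findExistingTab are helpers not shown here
async function chrome_navigate(url: string, options?: NavigationOptions) {
  // Validate the input and turn search queries into search engine URLs
  const normalizedUrl = validateAndNormalizeUrl(url);
  const existingTab = await findExistingTab(normalizedUrl);

  // Reuse a matching tab instead of opening a duplicate
  if (existingTab?.id !== undefined) {
    await chrome.tabs.update(existingTab.id, { active: true });
    return existingTab;
  }

  // No match: open the URL in a new, focused tab
  return await chrome.tabs.create({ url: normalizedUrl, active: true });
}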
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/index.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/index.ts)
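
The "URL or search query" decision described above could look roughly like the sketch below. It is only one plausible heuristic: the function name, the regular expressions, and the search endpoint in `SEARCH_URL_PREFIX` are assumptions for illustration and are not taken from the project.

```typescript
// Assumption: an illustrative search endpoint, not one confirmed by the source.
const SEARCH_URL_PREFIX = 'https://www.google.com/search?q=';

function normalizeUrlOrQuery(input: string): string {
  const text = input.trim();
  // Already has a scheme such as https:// or http://
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(text)) return text;
  // Looks like a bare host ("example.com/path"): no spaces, ends in a dotted suffix
  if (/^[^\s/]+\.[a-z]{2,}(\/\S*)?$/i.test(text)) return `https://${text}`;
  // Anything else is treated as a search query
  return SEARCH_URL_PREFIX + encodeURIComponent(text);
}
```

A heuristic like this misclassifies single tokens such as `node.js` as hosts, so a production implementation would need a stricter host check.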

### chrome_go_back_or_forward

Navigates browser history by a specified number of steps.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `historySteps` | `number` | Yes | Number of steps (negative for back, positive for forward) |

## Tab Management Tools

### Tab Query System

```mermaid
graph LR
    A[URL Pattern] -->|Parse| B[Tab Filter]
    B -->|Query| C[chrome.tabs.query]
    C -->|Results| D[Tab IDs Array]
    
    E[Tab ID] -->|Direct| D
    F[Window ID] -->|Window Filter| D
```

The tab management system supports multiple query mechanisms:

- **URL Pattern Matching**: Glob-style patterns with wildcard support
- **Direct Tab ID**: Integer-based tab identification
- **Window Scoping**: Restricting operations to specific windows

**Pattern Matching Implementation:**

```typescript
private async queryTabsByUrl(urlPattern: string): Promise<number[]> {
  // Escape every regex metacharacter first, then translate the glob wildcards
  const pattern = urlPattern
    .replace(/[.+^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*')
    .replace(/\?/g, '.');
  const regex = new RegExp(pattern);

  // Fetch all tabs the extension can see and filter them locally
  const tabs = await chrome.tabs.query({ url: '<all_urls>' });
  return tabs
    .filter((tab) => tab.url !== undefined && regex.test(tab.url))
    .map((tab) => tab.id)
    .filter((id): id is number => id !== undefined);
}
```

Source: [app/chrome-extension/entrypoints/background/tools/browser/common.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/tools/browser/common.ts)
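
Glob-style matching is easiest to verify when it is separated from the `chrome.tabs` call. The following sketch does that with a plain array of tab-like objects. `globToRegExp`, `matchTabIds`, and `TabLike` are illustrative names, and anchoring the expression to the whole URL is a design choice of this sketch, not something the source confirms.

```typescript
// Convert a glob (* = any run of characters, ? = one character) to a RegExp.
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters other than * and ?
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  const body = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp(`^${body}$`); // anchored: the whole URL must match
}

// Minimal stand-in for chrome.tabs.Tab with only the fields used here.
interface TabLike {
  id?: number;
  url?: string;
}

// Collect the ids of tabs whose URL matches the glob.
function matchTabIds(tabs: TabLike[], glob: string): number[] {
  const re = globToRegExp(glob);
  const ids: number[] = [];
  for (const tab of tabs) {
    if (tab.id !== undefined && tab.url !== undefined && re.test(tab.url)) ids.push(tab.id);
  }
  return ids;
}
```

Escaping before translating the wildcards matters: without it, a literal `.` in a host name would match any character.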

### chrome_close_tabs

Closes tabs matching a URL pattern or specified tab IDs.

**Response Schema:**

```typescript
interface CloseTabsResponse {
  success: boolean;
  message: string;
  closedCount: number;
  closedTabIds: number[];
}
```

**Example Response:**

```json
{
  "success": true,
  "message": "Closed 3 tabs with URL: https://github.com/*",
  "closedCount": 3,
  "closedTabIds": [123, 456, 789]
}
```

### chrome_switch_tab

Activates a specific tab by ID or switches to adjacent tabs relative to the current tab.

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `tabId` | `number` | Target tab ID |
| `offset` | `number` | Relative offset from current tab |
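
For the relative `offset` mode, the target tab can be computed from the current tab's index and the number of tabs in the window. The sketch below wraps around at either end of the tab strip; that wrap-around behavior and the function name are assumptions, since the source does not state what happens at the boundaries.

```typescript
// Resolve the index of the tab to activate for a relative offset.
// Wrapping past either end is an assumption made for this sketch.
function resolveTabIndex(currentIndex: number, offset: number, tabCount: number): number {
  if (tabCount <= 0) throw new Error('no tabs in window');
  // ((x % n) + n) % n keeps the result in [0, n) even for negative offsets
  return (((currentIndex + offset) % tabCount) + tabCount) % tabCount;
}
```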

## View Control Tools

### chrome_screenshot

Captures screenshots with multiple targeting modes.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `captureArea` | `string` | No | `"viewport"`, `"fullpage"`, or `"element"` |
| `selector` | `string` | No | CSS selector for element targeting |
| `options` | `object` | No | Screenshot options (quality, format) |

**Capture Modes:**

| Mode | Description | Use Case |
|------|-------------|----------|
| `viewport` | Current visible area | Quick preview |
| `fullpage` | Entire scrollable page | Complete documentation |
| `element` | Specific element by selector | Targeted capture |

Source: [docs/TOOLS.md](https://github.com/hangwin/mcp-chrome/blob/main/docs/TOOLS.md)

### chrome_set_viewport

Controls the browser viewport dimensions for responsive testing and content adaptation.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `width` | `number` | Yes | Viewport width in pixels |
| `height` | `number` | Yes | Viewport height in pixels |
| `deviceScaleFactor` | `number` | No | Device pixel ratio (default: 1) |

## Content Extraction Tools

### chrome_get_web_content

Extracts structured content from web pages including text, metadata, and semantic elements.

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `extractOptions` | `object` | Configuration for content extraction |
| `metadata` | `boolean` | Include page metadata (default: true) |

**Extracted Metadata Fields:**

| Field | Source | Description |
|-------|--------|-------------|
| `title` | `<title>`, `og:title`, JSON-LD | Page title |
| `byline` | `author`, `article:author` | Content author |
| `excerpt` | `description`, `og:description` | Page summary |
| `siteName` | `og:site_name` | Website name |
| `publishedTime` | `article:published_time` | Publication date |
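
Each field in the table has one or more candidate sources, which suggests a "first non-empty value wins" lookup. The sketch below shows that pattern over a plain map of tag names to values. The precedence order (the order listed in the table), the omission of JSON-LD, and the names `firstNonEmpty` and `extractMetadata` are assumptions for illustration.

```typescript
type MetaTags = Record<string, string | undefined>;

// Return the first non-empty value among the given keys, in order.
function firstNonEmpty(meta: MetaTags, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = meta[key]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Assumed precedence: the order in which sources appear in the table.
function extractMetadata(meta: MetaTags) {
  return {
    title: firstNonEmpty(meta, ['title', 'og:title']),
    byline: firstNonEmpty(meta, ['author', 'article:author']),
    excerpt: firstNonEmpty(meta, ['description', 'og:description']),
    siteName: firstNonEmpty(meta, ['og:site_name']),
    publishedTime: firstNonEmpty(meta, ['article:published_time']),
  };
}
```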

**Content Script Injection:**

```typescript
// Web fetcher helper extracts structured data from pages
const response = await chrome.scripting.executeScript({
  target: { tabId },
  files: ['inject-scripts/web-fetcher-helper.js'],
});
```

Source: [app/chrome-extension/inject-scripts/web-fetcher-helper.js](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/inject-scripts/web-fetcher-helper.js)

### chrome_get_interactive_elements

Retrieves clickable and interactive elements from a page for automation targeting.

**Response:**

```typescript
interface InteractiveElementsResponse {
  elements: Array<{
    selector: string;      // CSS selector for the element
    tagName: string;       // HTML tag name
    text: string;          // Visible text content
    attributes: Record<string, string>;
    isClickable: boolean;
    boundingRect: DOMRect;
  }>;
}
```

### search_tabs_content

AI-powered semantic search across all open browser tabs.

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `string` | Semantic search query |
| `maxResults` | `number` | Maximum results to return |

## User Interaction Tools

### chrome_click_element

Simulates mouse clicks on elements identified by CSS selectors.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `selector` | `string` | Yes | CSS selector for target element |
| `button` | `string` | No | Mouse button (`left`, `middle`, `right`) |
| `clickCount` | `number` | No | Number of clicks |

### chrome_fill_or_select

Fills form inputs and selects options.

**Supported Input Types:**

| Input Type | Action |
|------------|--------|
| `input[type=text]` | Text input |
| `input[type=email]` | Email input |
| `input[type=password]` | Password input |
| `textarea` | Multi-line text |
| `select` | Dropdown selection |
| `input[type=checkbox]` | Toggle state |
| `input[type=radio]` | Radio selection |

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `selector` | `string` | Yes | Target element selector |
| `value` | `string` | Yes | Value to input or select |
| `options` | `object` | No | Additional options |

### chrome_keyboard

Simulates keyboard input and shortcuts.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` | `string` | No | Text to type |
| `shortcuts` | `string[]` | No | Keyboard shortcuts to execute |
| `options` | `object` | No | Keyboard simulation options |

**Supported Shortcuts:**

```typescript
const shortcuts = [
  'Ctrl+C',      // Copy
  'Ctrl+V',      // Paste
  'Ctrl+A',      // Select all
  'Ctrl+K',      // Clear
  'Enter',       // Submit
  'Escape',      // Cancel/Dismiss
  'Tab',         // Next element
  'Shift+Tab',   // Previous element
];
```
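
Shortcut strings such as `Ctrl+C` or `Shift+Tab` need to be split into modifier flags and a final key before they can be turned into key events. The parser below is a sketch of that step. `ParsedShortcut`, `parseShortcut`, and the accepted modifier aliases are assumptions and do not describe the extension's actual parser.

```typescript
interface ParsedShortcut {
  ctrl: boolean;
  shift: boolean;
  alt: boolean;
  meta: boolean;
  key: string;
}

// Split "Ctrl+Shift+A" into modifier flags and the final key.
function parseShortcut(shortcut: string): ParsedShortcut {
  const parts = shortcut.split('+').map((p) => p.trim()).filter((p) => p.length > 0);
  const key = parts.pop(); // last segment is the key, the rest are modifiers
  if (key === undefined) throw new Error(`empty shortcut: "${shortcut}"`);
  const mods = parts.map((m) => m.toLowerCase());
  const has = (name: string): boolean => mods.indexOf(name) !== -1;
  return {
    ctrl: has('ctrl') || has('control'),
    shift: has('shift'),
    alt: has('alt'),
    meta: has('meta') || has('command') || has('cmd'),
    key,
  };
}
```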

## Network Monitoring Tools

### chrome_network_capture_start / stop

Starts and stops network traffic capture based on the `chrome.webRequest` API.

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `options` | `object` | Capture configuration |
| `filterUrls` | `string[]` | URL patterns to monitor |

**Captured Data:**

| Field | Description |
|-------|-------------|
| `url` | Request URL |
| `method` | HTTP method |
| `headers` | Request/response headers |
| `status` | HTTP status code |
| `timing` | Request timing data |
| `bodySize` | Response body size |

### chrome_network_debugger_start / stop

Attaches the Chrome debugger (Chrome DevTools Protocol) for detailed request/response inspection, including response bodies.

### chrome_network_request

Sends custom HTTP requests through the browser context, useful for bypassing CORS and maintaining session state.

## Script Injection System

### chrome_inject_script

Injects JavaScript files into web pages for extended functionality.

**Implementation:**

```typescript
await chrome.scripting.executeScript({
  target: { tabId },
  files: ['inject-scripts/web-fetcher-helper.js'],
});
```

**Web Accessible Resources Configuration:**

The extension exposes the following resource categories as web-accessible resources in `wxt.config.ts`:

```typescript
web_accessible_resources: [
  {
    resources: [
      '/models/*',           // ML models
      '/workers/*',          // Web workers
      '/inject-scripts/*',   // Helper scripts
    ],
    matches: ['<all_urls>'],
  },
]
```

Source: [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

### chrome_send_command_to_inject_script

Sends structured commands to injected content scripts via a messaging bridge.

**Command Flow:**

```mermaid
sequenceDiagram
    participant B as Background Script
    participant C as Content Script
    participant P as Web Page
    
    B->>C: chrome.tabs.sendMessage
    C->>P: postMessage to page context
    P-->>C: Response message
    C-->>B: sendResponse callback
```

## Bookmark and History Tools

### Bookmark Operations

| Tool | Function |
|------|----------|
| `chrome_bookmark_search` | Search bookmarks by keywords |
| `chrome_bookmark_add` | Create new bookmarks with folder support |
| `chrome_bookmark_delete` | Remove bookmarks |

### chrome_history

Searches browser history with temporal filtering.

**Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | `string` | Search query |
| `maxResults` | `number` | Maximum entries |
| `startTime` | `number` | Unix timestamp lower bound |
| `endTime` | `number` | Unix timestamp upper bound |

## Error Handling

All browser tools implement consistent error handling patterns:

```typescript
interface ToolResponse {
  content: Array<{
    type: 'text';
    text: string;  // JSON stringified response or error
  }>;
  isError: boolean;
}
```

**Error Response Format:**

```json
{
  "success": false,
  "error": "Error description",
  "code": "ERROR_CODE"
}
```
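
The two shapes above fit together as follows: the JSON payload (success or error) is stringified into the single `text` item, and `isError` signals which case applies. The helpers below sketch that wrapping. `okResponse`, `errorResponse`, and the `TAB_NOT_FOUND` code used in the usage line are hypothetical names chosen for illustration.

```typescript
interface ToolResponse {
  content: Array<{ type: 'text'; text: string }>;
  isError: boolean;
}

// Wrap a successful payload as a single JSON text item.
function okResponse(payload: unknown): ToolResponse {
  return { content: [{ type: 'text', text: JSON.stringify(payload) }], isError: false };
}

// Wrap an error in the documented { success, error, code } shape.
function errorResponse(error: string, code: string): ToolResponse {
  const body = { success: false, error, code };
  return { content: [{ type: 'text', text: JSON.stringify(body) }], isError: true };
}

// Usage: the error code here is made up for the example.
const notFound = errorResponse('Tab not found', 'TAB_NOT_FOUND');
```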

**URL Exclusion Patterns:**

Browser-internal, extension, and local file URLs are automatically excluded from operations:

```typescript
const excludePatterns = [
  /^chrome:\/\//,
  /^chrome-extension:\/\//,
  /^edge:\/\//,
  /^about:/,
  /^moz-extension:\/\//,
  /^file:\/\//,
];
```

Source: [app/chrome-extension/utils/content-indexer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)
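
Applying the pattern list is a one-line check. The sketch below repeats the patterns so that it is self-contained; the constant and function names are illustrative.

```typescript
// Same patterns as listed above, repeated so the example is self-contained.
const EXCLUDE_PATTERNS: RegExp[] = [
  /^chrome:\/\//,
  /^chrome-extension:\/\//,
  /^edge:\/\//,
  /^about:/,
  /^moz-extension:\/\//,
  /^file:\/\//,
];

// True when the URL matches any exclusion pattern.
function isExcludedUrl(url: string): boolean {
  return EXCLUDE_PATTERNS.some((re) => re.test(url));
}
```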

## Security Considerations

### Cross-Origin Policies

In production builds the manifest sets COOP and COEP so that extension pages are cross-origin isolated, which is required for `SharedArrayBuffer`:

```typescript
...(IS_DEV
  ? {}
  : {
      cross_origin_embedder_policy: { value: 'require-corp' as const },
      cross_origin_opener_policy: { value: 'same-origin' as const },
    })
```

### Content Security Policy

Production builds enforce strict CSP:

```
script-src 'self' 'wasm-unsafe-eval'; 
object-src 'self'; 
style-src 'self' 'unsafe-inline'; 
img-src 'self' data: blob:;
```

Source: [app/chrome-extension/wxt.config.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/wxt.config.ts)

## Usage with AI Agents

The browser tools are designed for seamless integration with AI agents:

```typescript
// Example: Agent task to research a topic
const task = {
  instruction: "Search for TypeScript best practices and summarize the top 5 tips",
  tools: [
    "chrome_navigate",      // Go to search engine
    "chrome_get_web_content", // Extract results
    "chrome_screenshot",    // Document findings
  ]
};
```

The tools provide structured, parseable responses that AI agents can use for decision-making and subsequent actions, enabling complex multi-step workflows in the browser environment.

---

<a id='page-record-replay'></a>

## Record and Replay Engine

### Related Pages

Related topics: [Browser Tools and APIs](#page-browser-tools), [Storage and Data Management](#page-storage)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [app/chrome-extension/entrypoints/background/record-replay/index.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/index.ts)
- [app/chrome-extension/entrypoints/background/record-replay/recording/browser-event-listener.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/recording/browser-event-listener.ts)
- [app/chrome-extension/entrypoints/background/record-replay/rr-utils.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/rr-utils.ts)
- [app/chrome-extension/entrypoints/background/record-replay/engine/policies/wait.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/engine/policies/wait.ts)
</details>

# Record and Replay Engine

## Overview

The Record and Replay Engine is a Chrome extension feature within mcp-chrome that enables automatic capture and reproduction of user browser interactions. It records navigation events, user actions, and DOM interactions into a structured flow, then replays these flows to automate repetitive browser tasks.

The engine runs in the extension's background service worker and uses Chrome extension APIs such as `webNavigation`, `tabs`, and `scripting` to monitor and control browser behavior across tabs and windows.

## Architecture

```mermaid
graph TD
    subgraph Recording
        BEL[Browser Event Listener] -->|captures| Session[Recording Session]
        Session -->|stores| Flow[Flow Data]
    end
    
    subgraph Replay
        FR[Flow Runner] -->|executes| Steps[Flow Steps]
        Steps -->|waits for| NI[Network Idle Detection]
        Steps -->|injects| IS[Injected Scripts]
    end
    
    subgraph Triggers
        TriggerStore[Trigger Store] -->|activates| FR
        DOMObs[DOM Observer] -->|detects| TriggerStore
    end
    
    Flow -->|loads| FR
    TriggerStore -->|manages| Flow
```

## Core Components

### Session Manager

The session manager tracks the current recording state and maintains a list of active tabs under observation.

**Key Responsibilities:**
- Track recording status (`recording`, `idle`, `paused`)
- Manage set of active tabs during recording
- Store and retrieve flow data from Chrome storage

**State Management:**
```typescript
session.getStatus() !== 'recording'  // True when recording is NOT active
session.addActiveTab(tabId)           // Track tab for targeted STOP
session.removeActiveTab(tabId)        // Cleanup when tab closes
session.getFlow()                     // Retrieve current flow data
```

Source: [browser-event-listener.ts:25-30]()

### Browser Event Listener

The event listener hooks into Chrome's navigation and tab APIs to capture user interactions during recording.

**Monitored Events:**

| Event Source | Event Type | Recording Behavior |
|--------------|------------|-------------------|
| `chrome.webNavigation.onCommitted` | Link navigation | Record if `transitionType === 'link'` |
| `chrome.webNavigation.onCommitted` | reload/typed/generated | Always record |
| `chrome.webNavigation.onCommitted` | auto_bookmark/keyword | Record |
| `chrome.webNavigation.onCommitted` | form_submit | Record for search navigations |
| `chrome.tabs.onRemoved` | Tab close | Remove from active set |
| `chrome.tabs.onUpdated` | Status/URL change | Mark navigation event |

**Transition Types Tracked:**
```typescript
const shouldRecord = 
  t === 'reload' ||
  t === 'typed' ||
  t === 'generated' ||
  t === 'auto_bookmark' ||
  t === 'keyword' ||
  t === 'form_submit';
```

Source: [browser-event-listener.ts:8-22]()

### Flow Runner

The flow runner executes recorded flows by iterating through steps and performing the recorded actions.

**Execution Pipeline:**
1. Load flow from storage by `flowId`
2. Resolve template variables from execution arguments
3. Execute steps sequentially
4. Handle variable assignment between steps
5. Wait for network idle between navigation steps

**Key Features:**
- Template variable expansion using `{variable}` syntax
- Variable assignment from step outputs via `assign` map
- Network idle detection for reliable page load waits
- Support for both DOM-based and programmatic triggers

Source: [index.ts:1-50]()

### DOM Trigger System

Triggers enable automatic replay initiation based on DOM element presence or changes.

**Trigger Configuration:**
```typescript
{
  id: string,
  type: 'dom',
  enabled: boolean,
  selector: string,       // CSS selector for target element
  appear: boolean,        // Trigger on element appearance
  once: boolean,          // Fire only once per session
  debounceMs: number      // Default: 800ms
}
```

**Trigger Lifecycle:**
1. Store triggers in `chrome.storage.local` under `STORAGE_KEYS.RR_TRIGGERS`
2. Inject `dom-observer.js` into target tab
3. Send trigger configuration via message to injected script
4. DOM observer monitors for selector matches
5. On match, initiate flow replay

Source: [index.ts:60-85]()
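
The `once` and `debounceMs` options both limit how often a selector match starts a replay. The sketch below shows one plausible reading, in which `debounceMs` is treated as a minimum gap between firings. `TriggerGate` and `shouldFire` are hypothetical names, and the real `dom-observer.js` may implement debouncing differently (for example by waiting for a quiet period).

```typescript
interface DomTriggerConfig {
  id: string;
  once: boolean;
  debounceMs: number;
}

// Decides whether a selector match at time `now` (ms) should start a replay.
class TriggerGate {
  private readonly config: DomTriggerConfig;
  private lastFiredAt: number | undefined = undefined;

  constructor(config: DomTriggerConfig) {
    this.config = config;
  }

  shouldFire(now: number): boolean {
    if (this.lastFiredAt !== undefined) {
      if (this.config.once) return false; // already fired in this session
      if (now - this.lastFiredAt < this.config.debounceMs) return false; // too soon
    }
    this.lastFiredAt = now;
    return true;
  }
}
```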

### Utility Functions

**applyAssign**: Maps values from a source object to a target using path notation.

```typescript
applyAssign(target, source, {
  'variableName': 'path.to.value',
  'nested.value': 'data[0].field'
});
```

**expandTemplatesDeep**: Recursively expands template variables in any value type.

```typescript
// Input: "Navigate to {baseUrl}/home"
// Scope: { baseUrl: 'https://example.com' }
// Output: "Navigate to https://example.com/home"
```

Source: [rr-utils.ts:1-50]()
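
To show how these two utilities could work, here is a minimal sketch of both behaviors. It is not the code in `rr-utils.ts`: the function names, the `{name}` placeholder syntax, the rule that unknown placeholders are left untouched, and the treatment of mapping keys as flat variable names are all assumptions.

```typescript
type Scope = Record<string, unknown>;

// Replace {name} placeholders in strings; recurse into arrays and plain objects.
function expandTemplates(value: unknown, scope: Scope): unknown {
  if (typeof value === 'string') {
    return value.replace(/\{(\w+)\}/g, (match: string, name: string) =>
      name in scope ? String(scope[name]) : match,
    );
  }
  if (Array.isArray(value)) return value.map((item) => expandTemplates(item, scope));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) {
      out[key] = expandTemplates((value as Record<string, unknown>)[key], scope);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through
}

// Read a dotted/bracketed path such as "data[0].field" from an object.
function getByPath(source: unknown, path: string): unknown {
  const keys = path.replace(/\[(\d+)\]/g, '.$1').split('.').filter((k) => k.length > 0);
  let current: unknown = source;
  for (const key of keys) {
    if (current === null || typeof current !== 'object') return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Copy values into `target`: mapping keys are variable names, values are source paths.
function assignFromPaths(target: Scope, source: unknown, mapping: Record<string, string>): void {
  for (const name of Object.keys(mapping)) {
    const path = mapping[name];
    if (path !== undefined) target[name] = getByPath(source, path);
  }
}
```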

## Recording Process

### 1. Initialization

When recording starts:
1. Session status transitions to `recording`
2. Active tab list is initialized
3. Event listeners are attached to Chrome APIs

### 2. Event Capture

```mermaid
sequenceDiagram
    participant U as User
    participant C as Chrome API
    participant L as Event Listener
    participant S as Session
    
    U->>C: Click/Type/Navigate
    C->>L: webNavigation.onCommitted
    L->>L: Check transition type
    L->>S: getFlow()
    S-->>L: Current flow object
    L->>S: addNavigationStep(url)
    L->>L: ensureRecorderInjected(tabId)
    L->>L: broadcastControlToTab(START)
    S->>S: addActiveTab(tabId)
```

### 3. Navigation Step Recording

Navigation events are captured with the current tab URL:

```typescript
const tab = await chrome.tabs.get(tabId);
const url = tab.url || details.url;
if (flow && url) addNavigationStep(flow, url);
```

### 4. Script Injection

After each navigation, the extension re-injects the recorder content scripts so that recording continues on the new page:

```typescript
await ensureCoreInjected(details.tabId);
await ensureRecorderInjected(tabId);
await broadcastControlToTab(tabId, REC_CMD.START);
```

## Replay Process

### 1. Flow Loading

Flows are retrieved by ID and validated before execution:

```typescript
const flow = await getFlow(t.flowId);
if (!flow) return;
await runFlow(flow, { args: t.args || {}, returnLogs: false });
```

### 2. Variable Expansion

Template variables in flow steps are resolved from the execution arguments:

```typescript
expandTemplatesDeep(value, { ...scope, ...args });
```

### 3. Step Execution

Each step in the flow is executed in topological order, with:
- Network idle waits between navigation steps
- DOM query waits for element visibility
- Error handling for selector mismatches
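
Topological ordering means that a step runs only after every step it depends on. The sketch below uses Kahn's algorithm on a list of node ids and edges. It is a generic illustration and not the `topoOrder` function from `chrome-mcp-shared`; the `FlowEdge` shape and the error messages are assumptions.

```typescript
interface FlowEdge {
  from: string;
  to: string;
}

// Kahn's algorithm: returns node ids so that every edge points from an
// earlier id to a later one. Throws if the graph contains a cycle.
function topologicalOrder(nodeIds: string[], edges: FlowEdge[]): string[] {
  const indegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const id of nodeIds) {
    indegree.set(id, 0);
    outgoing.set(id, []);
  }
  for (const e of edges) {
    const targets = outgoing.get(e.from);
    if (targets === undefined || !indegree.has(e.to)) {
      throw new Error(`edge references unknown node: ${e.from} -> ${e.to}`);
    }
    targets.push(e.to);
    indegree.set(e.to, (indegree.get(e.to) ?? 0) + 1);
  }
  // Start with nodes that have no prerequisites.
  const ready = nodeIds.filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of outgoing.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== nodeIds.length) throw new Error('flow graph contains a cycle');
  return order;
}
```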

### 4. Network Idle Detection

The engine waits for network activity to settle before proceeding:

```typescript
export async function waitForNetworkIdle(
  tabId: number,
  sniffMs: number = 2000
): Promise<void>
```

**Detection Strategy:**
- Monitor `onCommitted`, `onCompleted`, `onHistoryStateUpdated`
- Track tab loading status via `tabs.onUpdated`
- Set timeout for sniff period (default: 2000ms)

Source: [wait.ts:1-50]()
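
One way to picture the detection strategy is a tracker that remembers the time of the last navigation or loading signal and reports idle once nothing has happened for `sniffMs`. The sketch below implements only that quiet-period idea with explicit timestamps. `ActivityTracker` is a hypothetical class, and the actual `waitForNetworkIdle` may combine its signals differently.

```typescript
// Tracks the time of the most recent navigation or loading signal.
class ActivityTracker {
  private lastActivityAt: number;

  constructor(startedAt: number) {
    this.lastActivityAt = startedAt;
  }

  // Called by event listeners whenever activity is observed.
  mark(now: number): void {
    if (now > this.lastActivityAt) this.lastActivityAt = now;
  }

  // Idle once no activity has been seen for at least sniffMs.
  isIdle(now: number, sniffMs: number = 2000): boolean {
    return now - this.lastActivityAt >= sniffMs;
  }
}
```

Passing timestamps in explicitly, instead of reading the clock inside the class, keeps the logic deterministic and easy to test.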

## Data Models

### Flow Structure

```typescript
interface Flow {
  id: string;
  name?: string;
  steps: Step[];
  variables?: Record<string, any>;
  createdAt: number;
}

interface Step {
  id: string;
  type: 'navigation' | 'interaction' | 'wait';
  action?: string;
  selector?: string;
  url?: string;
  value?: any;
  assign?: Record<string, string>;  // Variable assignments
}
```

### Trigger Storage

```typescript
interface StoredTriggers {
  triggers: Array<{
    id: string;
    type: 'dom' | 'network';
    enabled: boolean;
    selector?: string;
    appear?: boolean;
    once?: boolean;
    debounceMs?: number;
  }>;
}
```

## Security Considerations

### Content Security Policy

The extension enforces strict CSP in production:

```typescript
content_security_policy: {
  extension_pages: 
    "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'; " +
    "style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:;"
}
```

### Script Injection Safety

- Scripts are injected only into controlled pages
- DOM observers use isolated worlds
- Script injection is preheated only for valid flow replays

## Configuration

### Flow Execution Options

```typescript
interface RunFlowOptions {
  args?: Record<string, string>;      // Template variable values
  returnLogs?: boolean;               // Include execution logs
  triggerId?: string;                 // Associated trigger ID
}
```

### Recording Commands

```typescript
enum REC_CMD {
  START = 'start',      // Begin recording in tab
  STOP = 'stop',        // End recording in tab
  PAUSE = 'pause',      // Pause current recording
  RESUME = 'resume'     // Resume paused recording
}
```

## Extension Points

### Custom Event Handlers

The engine supports extension through the shared utility layer:

```typescript
import { TOOL_NAMES, topoOrder, mapNodeToStep } from 'chrome-mcp-shared';
```

### Policy-Based Waiting

Additional wait policies can be registered for specialized scenarios:

```typescript
// In engine/policies/wait.ts
export { waitForNetworkIdle };
```

## Related Components

| Component | Location | Purpose |
|-----------|----------|---------|
| DOM Observer | `inject-scripts/dom-observer.js` | Monitors DOM for trigger elements |
| Web Fetcher | `inject-scripts/web-fetcher-helper.js` | Extracts page content |
| Element Marker | `inject-scripts/element-marker.js` | Visual element selection UI |

## Usage Example

**Recording a flow:**
1. Open Chrome DevTools → MCP tab
2. Click "Start Recording"
3. Perform browser actions (navigate, fill forms, click)
4. Click "Stop Recording"
5. Flow is saved with captured steps

**Replaying a flow:**
1. Load saved flow by ID
2. Optionally provide template variable values
3. Engine executes steps with network idle waits
4. Variables are assigned and can be used in subsequent steps
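
Replay steps 2 and 4 can be sketched as a loop that substitutes variables into each step and carries assigned values forward. This is an illustration under stated assumptions, not the engine's code: the `{name}` placeholder syntax, the `SketchStep` shape, the `resolveTemplate` and `replay` helpers, and the meaning of `assign` (variable name mapped to a key in the step result) are all hypothetical.

```typescript
type Vars = Record<string, string>;

// Replaces {name} placeholders with values from vars; unknown names are left as-is.
// The {name} syntax is an assumption made for this sketch.
function resolveTemplate(text: string, vars: Vars): string {
  return text.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    name in vars ? vars[name] : match,
  );
}

interface SketchStep {
  url?: string;
  value?: string;
  assign?: Record<string, string>; // variable name -> key in the step result
}

// Runs steps in order. `execute` stands in for the real step executor.
function replay(
  steps: SketchStep[],
  args: Vars,
  execute: (step: SketchStep) => Record<string, string>,
): Vars {
  const vars: Vars = { ...args };
  for (const step of steps) {
    const resolved: SketchStep = {
      ...step,
      url: step.url === undefined ? undefined : resolveTemplate(step.url, vars),
      value: step.value === undefined ? undefined : resolveTemplate(step.value, vars),
    };
    const result = execute(resolved);
    // Copy requested result fields into variables for later steps.
    for (const [name, key] of Object.entries(step.assign ?? {})) {
      if (key in result) vars[name] = result[key];
    }
  }
  return vars;
}
```

Because assignments are written back into the same variable map, a value captured by one step is available to every step after it.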

## References

- Chrome Extension webNavigation API: [developer.chrome.com](https://developer.chrome.com/docs/extensions/reference/api/webNavigation)
- Chrome Extension tabs API: [developer.chrome.com](https://developer.chrome.com/docs/extensions/reference/api/tabs)
- Chrome Extension scripting API: [developer.chrome.com](https://developer.chrome.com/docs/extensions/reference/api/scripting)

---

<a id='page-mcp-server'></a>

## MCP Server Implementation

### Related Pages

Related topics: [Communication Protocols](#page-communication), [Browser Tools and APIs](#page-browser-tools), [AI Agent Engines](#page-agent-engines)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [app/native-server/src/mcp/mcp-server.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/mcp/mcp-server.ts)
- [app/native-server/src/mcp/register-tools.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/mcp/register-tools.ts)
- [app/native-server/src/server/index.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/server/index.ts)
- [app/native-server/src/mcp/mcp-server-stdio.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/mcp/mcp-server-stdio.ts)
- [app/native-server/src/agent/session-service.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/session-service.ts)
- [app/native-server/src/agent/engines/claude.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/engines/claude.ts)
- [app/native-server/src/scripts/doctor.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/scripts/doctor.ts)
</details>

# MCP Server Implementation

## Overview

The MCP Server Implementation provides a bridge between AI coding assistants and Chrome browser automation through the Model Context Protocol (MCP). This component enables AI agents to interact with the browser by exposing a comprehensive set of tools for navigation, content extraction, interaction, and network monitoring.

Source: [app/native-server/src/mcp/register-tools.ts]()

## Architecture

### System Components

The MCP Server Implementation consists of four primary components that work together to enable AI-browser interaction:

| Component | File | Purpose |
|-----------|------|---------|
| MCP Server Core | `mcp-server.ts` | Core protocol implementation and message handling |
| Tool Registry | `register-tools.ts` | Tool registration and definition |
| Stdio Transport | `mcp-server-stdio.ts` | Standard I/O communication layer |
| Server Entry | `server/index.ts` | Server initialization and lifecycle management |

Source: [app/native-server/src/mcp/mcp-server.ts](), [app/native-server/src/mcp/register-tools.ts]()

### Communication Flow

```mermaid
graph TD
    A[AI Client] -->|MCP Protocol| B[Stdio Transport Layer]
    B --> C[MCP Server Core]
    C --> D[Tool Registry]
    D --> E[Chrome Extension]
    E --> F[Browser Tabs]
    F -->|Results| E
    E -->|Tool Results| D
    D -->|Structured Response| C
    C -->|JSON-RPC| B
    B -->|stdout| A
```

## MCP Server Core

The core server implementation handles the MCP protocol lifecycle, managing tool invocations, tool results, and resource operations.

Source: [app/native-server/src/mcp/mcp-server.ts]()

### Message Handling

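The server processes incoming JSON-RPC requests and dispatches them to the appropriate handlers. The snippet below comes from the Claude engine and shows how a finished tool call is reported as a `tool_result` message, with an `Error:` prefix when the call failed: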

```typescript
// Message dispatch pattern
dispatchToolMessage(
  isError
    ? `Error: ${content || 'Tool execution failed'}`
    : content || 'Tool completed',
  metadata,
  'tool_result',
  false,
);
```

Source: [app/native-server/src/agent/engines/claude.ts:50-56]()

### Tool Metadata Building

Tool metadata is constructed with full input details to provide context to AI clients:

```typescript
const metadata = buildToolMetadata({
  name: pending.toolName,
  id: pending.toolId,
  input,
});
```

Source: [app/native-server/src/agent/engines/claude.ts:85-89]()

## Tool Categories

### Navigation Tools (7 tools)

| Tool Name | Description | Key Parameters |
|-----------|-------------|----------------|
| `chrome_navigate` | Navigate to URLs or perform searches | `url`, `query` |
| `chrome_go_back_or_forward` | Browser navigation control | `direction` |
| `chrome_switch_tab` | Switch the current active tab | `tabId` |
| `chrome_close_tabs` | Close specific tabs or windows | `tabIds`, `windowIds` |
| `chrome_reload` | Reload current page or tabs | `tabId` |
| `chrome_inject_script` | Inject content scripts into web pages | `script`, `tabId` |
| `chrome_send_command_to_inject_script` | Send commands to injected scripts | `command`, `tabId` |

### Interaction Tools (3 tools)

| Tool Name | Description |
|-----------|-------------|
| `chrome_click_element` | Click elements using CSS selectors |
| `chrome_fill_or_select` | Fill forms and select options |
| `chrome_keyboard` | Simulate keyboard input and shortcuts |

### Data Management (5 tools)

| Tool Name | Description | Parameters |
|-----------|-------------|------------|
| `chrome_history` | Search browser history with time filters | `query`, `startTime`, `endTime` |
| `chrome_bookmark_search` | Find bookmarks by keywords | `query` |
| `chrome_bookmark_add` | Add new bookmarks with folder support | `url`, `title`, `folder` |
| `chrome_bookmark_delete` | Delete bookmarks | `bookmarkId` |

### Network Monitoring (4 tools)

| Tool Name | Description | API Used |
|-----------|-------------|----------|
| `chrome_network_capture_start/stop` | webRequest API network capture | `chrome.webRequest` |
| `chrome_network_debugger_start/stop` | Debugger API with response bodies | `chrome.debugger` |
| `chrome_network_request` | Send custom HTTP requests | `fetch` via background |

### Content Analysis (4 tools)

| Tool Name | Description |
|-----------|-------------|
| `search_tabs_content` | AI-powered semantic search across browser tabs |
| `chrome_get_web_content` | Extract HTML/text content from pages |
| `chrome_get_interactive_elements` | Find clickable elements |
| `chrome_console` | Capture and retrieve console output |

### Visual Tools (1 tool)

| Tool Name | Description | Parameters |
|-----------|-------------|------------|
| `chrome_screenshot` | Advanced screenshot capture | `element`, `fullPage`, `width`, `height` |

Source: [app/native-server/src/mcp/register-tools.ts]()
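
An MCP client invokes any of the tools above with a JSON-RPC `tools/call` request. The sketch below builds one for `chrome_navigate`. `buildToolCall` is a hypothetical helper and the URL is a placeholder; only the envelope fields (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) come from the MCP specification.

```typescript
// Shape of an MCP tools/call request envelope.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name: toolName, arguments: args } };
}

// Example: ask the browser to open a page.
const navigateRequest = buildToolCall(1, 'chrome_navigate', { url: 'https://example.com' });
const navigateWire = JSON.stringify(navigateRequest);
```

The `id` lets the client match the eventual response to this request when several calls are in flight.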

## Tool Parameter Extraction

The MCP server extracts structured metadata from tool invocations to provide AI clients with meaningful context:

```typescript
// Bash/shell - command extraction
if (normalizedName === 'bash' || normalizedName.includes('shell')) {
  if (typeof input.command === 'string') {
    metadata.command = input.command;
  }
  if (typeof input.description === 'string') {
    metadata.commandDescription = input.description;
  }
}
```

### Search Tool Metadata

```typescript
// Search tools (grep, glob)
if (normalizedName === 'grep' || normalizedName.includes('search')) {
  if (typeof input.pattern === 'string') metadata.pattern = input.pattern;
  if (typeof input.path === 'string') metadata.searchPath = input.path;
  if (typeof input.glob === 'string') metadata.glob = input.glob;
  if (typeof input.output_mode === 'string') metadata.outputMode = input.output_mode;
}
```

Source: [app/native-server/src/agent/engines/claude.ts:120-140]()

## Stdio Transport Layer

The stdio transport layer handles communication between the MCP server and AI clients through standard input/output streams.

```mermaid
sequenceDiagram
    participant AI as AI Client
    participant Stdio as Stdio Transport
    participant Server as MCP Server Core
    participant Chrome as Chrome Extension

    AI->>Stdio: JSON-RPC Request (stdin)
    Stdio->>Server: Parsed Request
    Server->>Chrome: Tool Invocation
    Chrome->>Chrome: Execute in Browser
    Chrome-->>Server: Tool Result
    Server-->>Stdio: JSON-RPC Response
    Stdio-->>AI: stdout Response
```

Source: [app/native-server/src/mcp/mcp-server-stdio.ts]()
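
On the wire, the MCP stdio transport sends one JSON-RPC message per line. The sketch below shows that framing in isolation. `LineReader` and `encodeMessage` are hypothetical names; a real server would normally rely on an MCP SDK transport rather than hand-written framing.

```typescript
interface JsonRpcMessage {
  jsonrpc: '2.0';
  id?: number | string;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

// Accumulates raw stdin chunks and returns every complete line as a parsed message.
class LineReader {
  private buffer = '';

  push(chunk: string): JsonRpcMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element is an incomplete line (or '') and stays in the buffer.
    this.buffer = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as JsonRpcMessage);
  }
}

// Serializes a message for stdout, one message per line.
function encodeMessage(message: JsonRpcMessage): string {
  return JSON.stringify(message) + '\n';
}
```

Buffering matters because a chunk read from stdin can end in the middle of a message; only complete lines are parsed.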

### Port Configuration

The stdio transport needs the correct port to reach the native server. The `doctor` script checks that the port in the stdio configuration URL matches the expected port:

```typescript
const url = new URL(configValue.url as string);
const port = Number(url.port);
const portOk = port === EXPECTED_PORT;
```

Source: [app/native-server/src/scripts/doctor.ts:120-128]()
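
One detail of this check is easy to miss: `URL.port` is an empty string when the URL omits the port, so `Number(url.port)` yields `0`. The following standalone sketch handles that case explicitly. `checkPort`, `PortCheck`, and the port value `12306` are assumptions made for illustration and are not taken from `doctor.ts`.

```typescript
// Hypothetical expected port, used only for this example.
const EXAMPLE_EXPECTED_PORT = 12306;

interface PortCheck {
  ok: boolean;
  actualPort: number | null; // null when the URL is invalid or has no explicit port
}

function checkPort(rawUrl: string, expectedPort: number): PortCheck {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return { ok: false, actualPort: null }; // not a valid URL
  }
  // URL.port is '' when the URL relies on the scheme's default port.
  if (parsed.port === '') return { ok: false, actualPort: null };
  const actualPort = Number(parsed.port);
  return { ok: actualPort === expectedPort, actualPort };
}
```

Returning the actual port alongside the verdict lets a diagnostic report show both the expected and the observed value.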

## Agent Session Integration

The MCP server integrates with the agent session service to support multi-session environments:

```typescript
export interface AgentSessionPreviewMeta {
  /** Compact display text */
  displayText?: string;
  /** Client metadata for special rendering */
  clientMeta?: {
    kind?: string;
    // ...
  };
}
```

### Session Management Configuration

```typescript
export interface SessionConfig {
  mcpServers?: Record<string, unknown>;
  outputFormat?: Record<string, unknown>;
  enableFileCheckpointing?: boolean;
  sandbox?: Record<string, unknown>;
  env?: Record<string, string>;
  codexConfig?: Partial<CodexEngineConfig>;
}
```

Source: [app/native-server/src/agent/session-service.ts:15-35]()

## Claude Engine Integration

The Claude engine turns streamed tool calls into structured tool results in two steps: it parses the accumulated tool input, then extracts preview metadata from it.

### Tool Input Parsing

```typescript
// Parse accumulated JSON input
const fullJsonStr = pending.inputJsonParts.join('');
let input: Record<string, unknown> = {};
try {
  if (fullJsonStr) {
    input = JSON.parse(fullJsonStr);
  }
} catch (e) {
  console.error(`[ClaudeEngine] Failed to parse tool input JSON: ${e}`);
}
```

### Content Preview Extraction

```typescript
// Write tool - content preview
if (normalizedName.includes('write') || normalizedName === 'create_file') {
  if (typeof input.content === 'string') {
    metadata.contentPreview = input.content.slice(0, 200);
    metadata.totalLines = input.content.split('\n').length;
  }
}

// Read tool - offset/limit
if (normalizedName.includes('read')) {
  if (typeof input.offset === 'number') metadata.offset = input.offset;
  if (typeof input.limit === 'number') metadata.limit = input.limit;
}
```

Source: [app/native-server/src/agent/engines/claude.ts:90-110]()

## Chrome Extension Communication

The MCP server communicates with the Chrome extension through a defined message protocol:

```mermaid
graph LR
    A[MCP Server] -->|Native Messaging| B[Background Script]
    B --> C[Content Scripts]
    C -->|DOM Access| D[Web Pages]
    B -->|Tabs API| E[Browser Tabs]
```

### Web Accessible Resources

The extension exposes necessary resources for tool execution:

```typescript
web_accessible_resources: [
  {
    resources: [
      '/models/*',        // AI models
      '/workers/*',       // Web workers
      '/inject-scripts/*' // Content script helpers
    ],
    matches: ['<all_urls>'],
  },
]
```

Source: [app/chrome-extension/wxt.config.ts]()

## Error Handling

The server flags failed tool results and assigns every result a severity.

### Error Detection Patterns

```typescript
const isError =
  meta.is_error === true ||
  meta.isError === true ||
  (typeof msg.content === 'string' && msg.content.trimStart().startsWith('Error:'));
```

### Tool Severity Classification

| Condition | Severity |
|-----------|----------|
| `isError` is true | `error` |
| Tool execution successful | `success` |
| Tool execution in progress | `info` |
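
The detection expression and the severity table can be combined into one small classifier. This is a sketch rather than the project's code: `ToolMessage`, `isErrorMessage`, and `classifySeverity` are hypothetical names, and the `finished` flag is an assumed way of telling completed calls from running ones.

```typescript
type Severity = 'error' | 'success' | 'info';

interface ToolMessage {
  content?: unknown;
  metadata?: { is_error?: boolean; isError?: boolean };
}

// Same three checks as the detection pattern above.
function isErrorMessage(msg: ToolMessage): boolean {
  const meta = msg.metadata ?? {};
  return (
    meta.is_error === true ||
    meta.isError === true ||
    (typeof msg.content === 'string' && msg.content.trimStart().startsWith('Error:'))
  );
}

// Errors win over everything else; otherwise completion decides.
function classifySeverity(msg: ToolMessage, finished: boolean): Severity {
  if (isErrorMessage(msg)) return 'error';
  return finished ? 'success' : 'info';
}
```

Checking for errors first means a failed call is reported as `error` even if it is also marked as finished.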

## Configuration Management

### Environment-Specific Security

```typescript
...(IS_DEV
  ? {}
  : {
      cross_origin_embedder_policy: { value: 'require-corp' as const },
      cross_origin_opener_policy: { value: 'same-origin' as const },
      content_security_policy: {
        extension_pages:
          "script-src 'self' 'wasm-unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:;",
      },
    })
```

Source: [app/chrome-extension/wxt.config.ts]()

## Diagnostic Tools

The `doctor` script runs connectivity checks, including:

| Check ID | Title | Purpose |
|----------|-------|---------|
| `port.config` | Port config | Verify stdio-config.json port |
| `port.constant` | Port constant | Verify native server port constant |

### Doctor Script Output

```typescript
checks.push({
  id: 'port.config',
  title: 'Port config',
  status: portOk ? 'ok' : 'error',
  message: configValue.url as string,
  details: {
    expectedPort: EXPECTED_PORT,
    actualPort: port,
    fix: portOk ? undefined : [`${COMMAND_NAME} update-port ${EXPECTED_PORT}`],
  },
});
```

Source: [app/native-server/src/scripts/doctor.ts:130-145]()

## Summary

The MCP Server Implementation provides a robust bridge between AI coding assistants and Chrome browser automation. Key features include:

- **Protocol Compliance**: Full MCP protocol implementation with JSON-RPC message handling
- **Comprehensive Tools**: 24+ tools covering navigation, interaction, data management, network monitoring, and content analysis
- **Structured Metadata**: Rich tool result metadata for AI context understanding
- **Flexible Transport**: Stdio-based communication for easy integration with various AI clients
- **Security**: Environment-aware security policies and proper isolation
- **Debugging**: Built-in diagnostic capabilities for troubleshooting connectivity issues

---

<a id='page-agent-engines'></a>

## AI Agent Engines

### Related Pages

Related topics: [MCP Server Implementation](#page-mcp-server), [Browser Tools and APIs](#page-browser-tools)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [app/native-server/src/agent/engines/claude.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/engines/claude.ts)
- [app/native-server/src/agent/engines/codex.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/engines/codex.ts)
- [app/native-server/src/agent/tool-bridge.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/tool-bridge.ts)
- [app/native-server/src/agent/chat-service.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/chat-service.ts)
- [packages/shared/src/agent-types.ts](https://github.com/hangwin/mcp-chrome/blob/main/packages/shared/src/agent-types.ts)
</details>

# AI Agent Engines

## Overview

The AI Agent Engines system is a core architectural component of the MCP Chrome project that provides an abstraction layer for interacting with different LLM (Large Language Model) backends. This module enables the browser extension to leverage various AI providers (Claude, Codex) for intelligent automation, workflow execution, and browser control.

The engine system follows a unified interface pattern, allowing seamless switching between different AI providers while maintaining consistent behavior for tool execution, message handling, and state management.

Source: [app/native-server/src/agent/engines/claude.ts:1-50]()
Source: [app/native-server/src/agent/engines/codex.ts:1-50]()

## Architecture

### System Components

```mermaid
graph TD
    A[Chat Service] --> B[Tool Bridge]
    B --> C[Claude Engine]
    B --> D[Codex Engine]
    C --> E[Claude API]
    D --> F[Codex API]
    E --> G[Stream Events]
    F --> G
    G --> H[Tool Dispatcher]
    H --> I[Browser/Tools]
```

### Supported Engines

| Engine | Provider | Status | Primary Use Case |
|--------|----------|--------|------------------|
| Claude | Anthropic | Active | General AI assistance, code generation |
| Codex | OpenAI | Active | Code-focused tasks, GitHub integration |

Source: [app/native-server/src/agent/engines/claude.ts:1-30]()
Source: [app/native-server/src/agent/engines/codex.ts:1-30]()

## Engine Interface

### Base Capabilities

All agent engines implement a common interface that handles:

- **Message Streaming**: Real-time event streaming from AI providers
- **Tool Execution**: Routing tool calls to appropriate handlers
- **Content Parsing**: Processing multi-modal content blocks
- **Error Handling**: Graceful error management and recovery

### Event Processing

The engines process events through a unified event loop:

```mermaid
sequenceDiagram
    participant API as AI API
    participant Engine as Agent Engine
    participant Bridge as Tool Bridge
    participant Browser as Browser/Tools
    
    API->>Engine: content_block_start
    Engine->>Engine: accumulateToolInput()
    API->>Engine: content_block_delta
    Engine->>Engine: parseToolInput()
    API->>Engine: content_block_stop
    Engine->>Bridge: dispatchToolMessage()
    Bridge->>Browser: executeTool()
    Browser->>Bridge: toolResult
    Bridge->>Engine: forwardResult
    Engine->>API: streamResponse
```

Source: [app/native-server/src/agent/engines/claude.ts:50-120]()
Source: [app/native-server/src/agent/engines/codex.ts:50-100]()

## Claude Engine

### Overview

The Claude Engine (`claude.ts`) implements the Anthropic Claude API integration, handling streaming responses and tool execution through the Claude Messages API.

### Core Features

**Content Block Handling**

The engine processes different content block types:

- `content_block_start`: Initializes tool input accumulation
- `content_block_delta`: Accumulates tool input JSON chunks
- `content_block_stop`: Finalizes and parses accumulated tool input

**Tool Result Processing**

```typescript
// Extract tool result content from content blocks
const extractToolResultContent = (contentBlock: {
  type: string;
  content?: { text?: string };
}): string | null => {
  if (contentBlock.type === 'tool_result') {
    return contentBlock.content?.text || null;
  }
  return null;
};
```

Source: [app/native-server/src/agent/engines/claude.ts:80-100]()

**Error Handling**

The engine detects errors through multiple indicators:

| Error Indicator | Source | Priority |
|-----------------|--------|----------|
| `is_error: true` | API response | High |
| `isError: true` | Metadata | High |
| Content starts with "Error:" | Message content | Medium |

Source: [app/native-server/src/agent/engines/claude.ts:40-60]()

### Metadata Building

The Claude engine constructs comprehensive tool metadata:

```typescript
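// Signature sketch; called as buildToolMetadata({ name: pending.toolName, id: pending.toolId, input })
const buildToolMetadata = (args: {
  name: string;
  id: string;
  input: Record<string, unknown>;
}): Record<string, unknown> => {
  // Returns structured metadata for the tool invocation (field names here are illustrative)
  return { toolName: args.name, toolId: args.id, input: args.input };
};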
```

Source: [app/native-server/src/agent/engines/claude.ts:100-130]()

## Codex Engine

### Overview

The Codex Engine (`codex.ts`) integrates with OpenAI's Codex API, providing specialized handling for code-related tasks and GitHub integration.

### Core Features

**Tool Message Dispatch**

The engine dispatches tool messages through a single function that takes the content, its metadata, the message type, and an update flag:

```typescript
const dispatchToolMessage = (
  content: string,
  metadata: Record<string, unknown>,
  type: string,
  isUpdate: boolean
) => {
  // Dispatches tool execution results with metadata
};
```

Source: [app/native-server/src/agent/engines/codex.ts:30-60]()

**Todo List Management**

The Codex engine includes specialized support for task tracking:

```mermaid
graph LR
    A[Agent Response] --> B[emitTodoListUpdate]
    B --> C{Phase}
    C -->|started| D[Tool Use Event]
    C -->|update| D
    C -->|completed| E[Tool Result Event]
    
    F[Raw Items] --> G[normalizeTodoListItems]
    G --> H[buildTodoListContent]
```

**Item Event Types**

| Event Type | Handler | Output |
|------------|---------|--------|
| `command_execution` | `emitCommandStart` | Command start message |
| `todo_list` | `emitTodoListUpdate` | Todo list state |
| `agent_message` | Text extraction | Text content |

Source: [app/native-server/src/agent/engines/codex.ts:60-100]()

**Command Execution Tracking**

```typescript
const emitCommandStart = (record: Record<string, unknown>) => {
  const command = this.pickFirstString(record.command);
  const description = this.pickFirstString(record.description);
  // Emits command start event for UI tracking
};
```

Source: [app/native-server/src/agent/engines/codex.ts:40-50]()

## Tool Bridge Integration

### Purpose

The Tool Bridge (`tool-bridge.ts`) acts as the intermediary between agent engines and actual tool implementations, providing:

- **Tool Routing**: Directs tool calls to appropriate handlers
- **Parameter Transformation**: Converts engine-specific formats to tool formats
- **Result Formatting**: Standardizes tool responses for engines

### Tool Execution Flow

```mermaid
graph LR
    A[Engine Tool Call] --> B[Tool Bridge]
    B --> C{toolId match?}
    C -->|Yes| D[Execute Tool]
    C -->|No| E[Error Handler]
    D --> F[Format Result]
    E --> G[Error Response]
    F --> H[Return to Engine]
```
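
The routing decision in the diagram can be sketched as a lookup table with an error fallback. `ToolRouter`, `ToolHandler`, and `ToolResult` are hypothetical names, and this sketch matches on the tool name, which is an assumption; the actual bridge in `tool-bridge.ts` may key on something else.

```typescript
interface ToolResult {
  isError: boolean;
  content: string;
}

type ToolHandler = (input: Record<string, unknown>) => ToolResult;

class ToolRouter {
  private handlers = new Map<string, ToolHandler>();

  register(toolName: string, handler: ToolHandler): void {
    this.handlers.set(toolName, handler);
  }

  // Routes a call to its handler; unknown tools and thrown errors become error results.
  call(toolName: string, input: Record<string, unknown>): ToolResult {
    const handler = this.handlers.get(toolName);
    if (!handler) {
      return { isError: true, content: `Error: unknown tool "${toolName}"` };
    }
    try {
      return handler(input);
    } catch (e) {
      return { isError: true, content: `Error: ${e instanceof Error ? e.message : String(e)}` };
    }
  }
}
```

Converting exceptions into error results keeps the engine's event loop running when a single tool fails.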

## Chat Service Coordination

### Service Layer

The Chat Service (`chat-service.ts`) orchestrates the interaction between user interfaces and agent engines:

```mermaid
graph TD
    A[User Request] --> B[Chat Service]
    B --> C[Select Engine]
    C --> D[Claude or Codex]
    D --> E[Stream Events]
    E --> F[Tool Bridge]
    F --> G[Execute Tools]
    G --> H[Return Results]
    H --> E
    E --> I[User Response]
```

### Message Handling

All engines communicate through a standardized message format defined in `agent-types.ts`:

| Field | Type | Description |
|-------|------|-------------|
| `content` | string | Message text content |
| `metadata` | object | Engine-specific metadata |
| `type` | string | Message type (tool_use, tool_result, etc.) |
| `isUpdate` | boolean | Whether this is an incremental update |

Source: [packages/shared/src/agent-types.ts:1-50]()
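
Expressed as a TypeScript type, the table above looks roughly like this. The field names and types come from the table; the type name `EngineMessage` and the `isEngineMessage` guard are hypothetical and are not taken from `agent-types.ts`.

```typescript
// Field names and types follow the table above; the type name is hypothetical.
interface EngineMessage {
  content: string;
  metadata: Record<string, unknown>;
  type: string; // e.g. 'tool_use' or 'tool_result'
  isUpdate: boolean;
}

// Runtime guard, useful when a message arrives as untyped JSON.
function isEngineMessage(value: unknown): value is EngineMessage {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.content === 'string' &&
    typeof v.metadata === 'object' &&
    v.metadata !== null &&
    typeof v.type === 'string' &&
    typeof v.isUpdate === 'boolean'
  );
}
```

A guard like this is one way to validate messages that cross a process boundary before the rest of the code relies on their shape.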

## Common Patterns

### Tool Input Accumulation

Both engines implement a similar pattern for handling streaming tool inputs:

1. **Start**: Initialize input accumulator on `content_block_start`
2. **Delta**: Append JSON chunks on `content_block_delta`
3. **Stop**: Parse complete JSON on `content_block_stop`

```typescript
// Simplified sketch of the accumulation pattern
interface PendingInput {
  toolName: string;
  toolId: string;
  inputJsonParts: string[];
}

const pendingToolInputs = new Map<number, PendingInput>();

function handleContentBlockStart(index: number, toolName: string, toolId: string): void {
  pendingToolInputs.set(index, { toolName, toolId, inputJsonParts: [] });
}

function handleContentBlockDelta(index: number, chunk: string): void {
  // Chunks for unknown indexes are ignored
  pendingToolInputs.get(index)?.inputJsonParts.push(chunk);
}
```
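
The third step, parsing on `content_block_stop`, can be sketched as a function that joins the chunks, parses them once, and removes the accumulator entry. `finishToolInput`, `PendingToolInput`, and `FinishedToolCall` are hypothetical names, and falling back to an empty input on malformed JSON is an assumption of this sketch.

```typescript
// Hypothetical shape of one accumulator entry.
interface PendingToolInput {
  toolName: string;
  toolId: string;
  inputJsonParts: string[];
}

interface FinishedToolCall {
  toolName: string;
  toolId: string;
  input: Record<string, unknown>;
}

// Joins the chunks for `index`, parses them once, and removes the entry
// so the map does not grow across tool calls.
function finishToolInput(
  pendingInputs: Map<number, PendingToolInput>,
  index: number,
): FinishedToolCall | null {
  const entry = pendingInputs.get(index);
  if (!entry) return null;
  pendingInputs.delete(index);

  let input: Record<string, unknown> = {};
  const json = entry.inputJsonParts.join('');
  try {
    if (json) input = JSON.parse(json) as Record<string, unknown>;
  } catch {
    // Malformed JSON falls back to an empty input in this sketch.
  }
  return { toolName: entry.toolName, toolId: entry.toolId, input };
}
```

Deleting the entry before parsing guarantees cleanup even when the JSON is invalid.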

### Error Detection Strategy

Engines use a layered error detection approach:

| Layer | Check | Action |
|-------|-------|--------|
| 1 | `is_error` flag | Immediate error |
| 2 | `isError` flag | Error from metadata |
| 3 | Content prefix | Parse "Error:" prefix |

### Severity Classification

Tool execution results are classified by severity:

| Severity | Trigger | Visual Indicator |
|----------|---------|------------------|
| `error` | Error detected | Red highlight |
| `success` | Tool completed | Green checkmark |
| `info` | In progress | Neutral/informational |

## Configuration

### Engine Selection

Engines are selected based on:

- User preference/settings
- Task requirements (code vs. general)
- API availability

### Environment Variables

| Variable | Engine | Purpose |
|----------|--------|---------|
| `ANTHROPIC_API_KEY` | Claude | Authentication |
| `OPENAI_API_KEY` | Codex | Authentication |
| `API_BASE_URL` | Both | Endpoint configuration |
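
A selection rule based on user preference and API availability could look like the sketch below. It is illustrative only: `availableEngines` and `selectEngine` are hypothetical helpers, and treating a present API key as "available" is an assumption. The environment is passed in as a plain object so the functions stay easy to test.

```typescript
type EngineName = 'claude' | 'codex';
type Env = Record<string, string | undefined>;

// Returns the engines whose API key is present in the given environment.
function availableEngines(env: Env): EngineName[] {
  const engines: EngineName[] = [];
  if (env.ANTHROPIC_API_KEY) engines.push('claude');
  if (env.OPENAI_API_KEY) engines.push('codex');
  return engines;
}

// Prefers the user's choice when it is usable, otherwise the first available engine.
function selectEngine(env: Env, preferred?: EngineName): EngineName | null {
  const engines = availableEngines(env);
  if (preferred && engines.includes(preferred)) return preferred;
  return engines[0] ?? null;
}
```

In a Node process the first argument would typically be `process.env`.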

## Extension Points

### Adding New Engines

To add a new engine:

1. Create engine file in `engines/` directory
2. Implement common interface methods
3. Register in engine factory/registry
4. Add tool handlers in Tool Bridge
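
Steps 1 to 3 can be sketched with a small registry. The `AgentEngine` interface here is deliberately minimal and hypothetical; the real engines stream events and expose a richer API, and the project's actual registration mechanism may differ.

```typescript
// Hypothetical minimal interface; the real engines expose a richer API.
interface AgentEngine {
  readonly name: string;
  run(prompt: string): string;
}

class EngineRegistry {
  private engines = new Map<string, AgentEngine>();

  // Rejects duplicates so one engine cannot silently replace another.
  register(engine: AgentEngine): void {
    if (this.engines.has(engine.name)) {
      throw new Error(`Engine already registered: ${engine.name}`);
    }
    this.engines.set(engine.name, engine);
  }

  get(name: string): AgentEngine {
    const engine = this.engines.get(name);
    if (!engine) throw new Error(`Unknown engine: ${name}`);
    return engine;
  }
}

// Define an engine, implement the interface, and register it.
const engineRegistry = new EngineRegistry();
engineRegistry.register({ name: 'echo', run: (prompt) => `echo: ${prompt}` });
```

Callers then resolve an engine by name with `engineRegistry.get('echo')` and never depend on a concrete engine class.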

### Custom Tool Handlers

Tool handlers can be extended by:

1. Implementing handler in `tool-bridge.ts`
2. Registering handler in tool registry
3. Adding type definitions in `agent-types.ts`

## Best Practices

### Error Handling

- Always check multiple error indicators
- Provide meaningful error messages
- Log errors with context for debugging

### Streaming

- Handle partial content gracefully
- Accumulate tool inputs properly
- Update UI incrementally

### Resource Management

- Clean up pending inputs after processing
- Limit retry attempts for failed calls
- Monitor API rate limits

## Related Documentation

- [Chrome Extension Architecture](../architecture.md)
- [Tool System](../tools/overview.md)
- [API Reference](../api/reference.md)

---

<a id='page-storage'></a>

## Storage and Data Management

### Related Pages

Related topics: [Chrome Extension Structure](#page-extension-structure), [Record and Replay Engine](#page-record-replay)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [app/chrome-extension/entrypoints/background/record-replay/storage/indexeddb-manager.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/storage/indexeddb-manager.ts)
- [app/chrome-extension/entrypoints/background/record-replay/storage/flows.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/record-replay/storage/flows.ts)
- [app/native-server/src/agent/db/client.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/db/client.ts)
- [app/native-server/src/agent/db/schema.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/db/schema.ts)
- [app/chrome-extension/utils/content-indexer.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/utils/content-indexer.ts)
- [app/native-server/src/agent/session-service.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/native-server/src/agent/session-service.ts)
- [app/chrome-extension/entrypoints/background/semantic-similarity.ts](https://github.com/hangwin/mcp-chrome/blob/main/app/chrome-extension/entrypoints/background/semantic-similarity.ts)
</details>

# Storage and Data Management

## Overview

The mcp-chrome project implements a dual-layer storage architecture that separates concerns between the Chrome Extension (client-side) and the Native Server (server-side). This design enables the extension to operate independently for recording, replaying, and managing browser-related data while delegating persistent storage of agent sessions and workflows to the native backend.

Source: [app/native-server/src/agent/db/client.ts:1-50]()

The storage system handles several key data domains:

| Data Domain | Storage Layer | Primary Use |
|-------------|---------------|-------------|
| Recording Flows | IndexedDB | Browser session recording and replay |
| Semantic Index | chrome.storage.local | Content indexing and similarity search |
| Agent Sessions | SQLite (better-sqlite3) | Persistent agent workflow management |
| Model State | chrome.storage.local | ML model download and status tracking |
| Tab Content | IndexedDB | Web page content caching |

Source: [app/chrome-extension/entrypoints/background/record-replay/storage/flows.ts:1-30]()
Source: [app/native-server/src/agent/session-service.ts:1-80]()

## Architecture Overview

```mermaid
graph TD
    subgraph ChromeExtension["Chrome Extension"]
        A[IndexedDB] -->|Flow Recording| B[indexeddb-manager.ts]
        C[chrome.storage.local] -->|Settings & State| D[Background Scripts]
        E[Semantic Engine] -->|Indexing| F[content-indexer.ts]
    end
    
    subgraph NativeServer["Native Server"]
        G[SQLite Database] -->|Sessions| H[db/client.ts]
        I[Schema Definitions] -->|Tables| G
        J[Session Service] -->|CRUD| H
    end
    
    K[MCP Protocol] -->|Communication| L[Native Messaging]
    
    B -->|Storage Ops| A
    F -->|Cache| C
    H -->|Queries| G
    J -->|Session Mgmt| H
```

Source: [app/native-server/src/agent/db/schema.ts:1-100]()

## Chrome Extension Storage

### IndexedDB Manager

The IndexedDB manager (`indexeddb-manager.ts`) provides structured access to browser-native storage for recording flows and session data.

#### Core Operations

| Method | Purpose |
|--------|---------|
| `initialize()` | Open/create IndexedDB database |
| `getFlow(id)` | Retrieve a recording flow by ID |
| `getAllFlows()` | List all stored flows |
| `saveFlow(flow)` | Persist a new or updated flow |
| `deleteFlow(id)` | Remove a flow from storage |
| `clearAll()` | Wipe all stored data |

Source: [app/chrome-extension/entrypoints/background/record-replay/storage/indexeddb-manager.ts:1-150]()
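
The methods in the table can be read as a small store contract. The sketch below models that contract with an in-memory `Map` standing in for IndexedDB so that it runs anywhere. `InMemoryFlowStore` and `StoredFlow` are hypothetical, and the methods are synchronous for brevity, whereas real IndexedDB access is asynchronous.

```typescript
interface StoredFlow {
  id: string;
  name: string;
  updatedAt: number;
}

// In-memory stand-in for the IndexedDB-backed manager (synchronous for brevity).
class InMemoryFlowStore {
  private flows = new Map<string, StoredFlow>();

  getFlow(id: string): StoredFlow | undefined {
    return this.flows.get(id);
  }

  getAllFlows(): StoredFlow[] {
    return Array.from(this.flows.values());
  }

  // Inserts a new flow or overwrites the one with the same id.
  saveFlow(flow: StoredFlow): void {
    this.flows.set(flow.id, { ...flow });
  }

  deleteFlow(id: string): void {
    this.flows.delete(id);
  }

  clearAll(): void {
    this.flows.clear();
  }
}
```

A stand-in like this is also handy in unit tests, where a real IndexedDB instance is usually unavailable.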

#### Database Schema

```typescript
interface FlowRecord {
  id: string;
  name: string;
  createdAt: number;
  updatedAt: number;
  steps: FlowStep[];
  triggers?: TriggerConfig[];
  args?: Record<string, unknown>;
}
```

The IndexedDB schema supports:
- **Flows**: Complete recording sessions with metadata
- **Steps**: Individual navigation and interaction steps
- **Triggers**: DOM-based or URL-based automation triggers
- **Args**: Configuration parameters for flow execution

Source: [app/chrome-extension/entrypoints/background/record-replay/storage/flows.ts:1-50]()

### Flow Storage Module

The `flows.ts` module wraps the IndexedDB manager with higher-level operations for flow management.

```mermaid
graph LR
    A[Recording Start] --> B[Create Flow Record]
    B --> C[Capture Events]
    C --> D[Append Steps]
    D --> E[Trigger Detection]
    E --> F[Save to IndexedDB]
    F --> G[Recording Complete]
```

#### Flow Lifecycle

1. **Creation**: Initialize a new flow with timestamp and metadata
2. **Capture**: Record DOM events, network requests, and navigation
3. **Trigger Binding**: Associate DOM selectors or URL patterns
4. **Persistence**: Batch write to IndexedDB on completion or intervals
5. **Retrieval**: Load flows for replay with full state reconstruction

Source: [app/chrome-extension/entrypoints/background/record-replay/storage/flows.ts:50-150]()

### chrome.storage.local Usage

The extension uses Chrome's `storage.local` API for lightweight state management. The API is asynchronous: reads and writes return promises (or accept callbacks) instead of returning values directly.

```typescript
// Model state tracking (stored under the `modelState` key)
interface ModelState {
  status: string;
  downloadProgress: number;
  isDownloading: boolean;
  lastUpdated: number;
  errorMessage: string;
  errorType: 'network' | 'file' | 'unknown';
}

// Semantic engine state (stored under the `semanticEngineState` key)
interface SemanticEngineState {
  isReady: boolean;
  isInitializing: boolean;
  modelPath?: string;
}
```

Source: [app/chrome-extension/entrypoints/background/semantic-similarity.ts:1-80]()

#### Storage Keys

| Key | Type | Purpose |
|-----|------|---------|
| `modelState` | Object | ML model download status |
| `semanticEngineState` | Object | Content indexing engine status |
| `RR_TRIGGERS` | Array | Record-replay DOM triggers |
| `contentIndex` | Object | Cached page content metadata |

Source: [app/chrome-extension/utils/content-indexer.ts:1-60]()

## Native Server Database

### Database Client

The native server uses `better-sqlite3` for synchronous, high-performance SQLite access.

```typescript
import Database from 'better-sqlite3';

declare class AgentDatabase {
  private db: Database.Database;
  
  constructor(dbPath: string);
  initialize(): void;
  close(): void;
  // ... query methods
}
```

Source: [app/native-server/src/agent/db/client.ts:1-80]()

#### Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| `dbPath` | `~/.claude/mcp-chrome.db` | SQLite database file path |
| WAL Mode | Enabled | Write-Ahead Logging for concurrency |
| Foreign Keys | Enabled | Referential integrity enforcement |

Source: [app/native-server/src/agent/db/client.ts:80-120]()

### Database Schema

```mermaid
erDiagram
    SESSIONS {
        string id PK
        string name
        string description
        string engine
        timestamp created_at
        timestamp updated_at
        json metadata
    }
    
    MESSAGES {
        string id PK
        string session_id FK
        string role
        text content
        timestamp created_at
        json attachments
        json raw
    }
    
    MESSAGES }o--|| SESSIONS : belongs_to
```

#### Sessions Table

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | TEXT | PRIMARY KEY | Unique session identifier |
| `name` | TEXT | NOT NULL | Display name for session |
| `description` | TEXT | | Session description |
| `engine` | TEXT | | AI engine (claude, codex, etc.) |
| `created_at` | INTEGER | NOT NULL | Unix timestamp |
| `updated_at` | INTEGER | NOT NULL | Last modification time |
| `metadata` | TEXT | | JSON-encoded metadata |

Source: [app/native-server/src/agent/db/schema.ts:1-100]()

#### Messages Table

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | TEXT | PRIMARY KEY | Message UUID |
| `session_id` | TEXT | FOREIGN KEY | Parent session reference |
| `role` | TEXT | NOT NULL | Message role (user/assistant/system) |
| `content` | TEXT | | Message text content |
| `created_at` | INTEGER | NOT NULL | Timestamp |
| `attachments` | TEXT | | JSON array of attachments |
| `raw` | TEXT | | Raw metadata JSON |

Source: [app/native-server/src/agent/db/schema.ts:100-200]()
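
Read together, the two tables translate into DDL roughly like the sketch below. The lowercase table names, the `IF NOT EXISTS` clauses, and the `encodeJsonColumn` helper are assumptions for illustration; the project's actual statements (including any `ON DELETE` behavior or indexes) may differ.

```typescript
// Sketch of the sessions table, column for column from the table above.
const CREATE_SESSIONS_SQL = `
CREATE TABLE IF NOT EXISTS sessions (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT,
  engine TEXT,
  created_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL,
  metadata TEXT
);`;

// Sketch of the messages table; session_id references the parent session.
const CREATE_MESSAGES_SQL = `
CREATE TABLE IF NOT EXISTS messages (
  id TEXT PRIMARY KEY,
  session_id TEXT REFERENCES sessions(id),
  role TEXT NOT NULL,
  content TEXT,
  created_at INTEGER NOT NULL,
  attachments TEXT,
  raw TEXT
);`;

// Hypothetical helper: JSON-encode a value for a TEXT column, or store NULL.
function encodeJsonColumn(value: unknown): string | null {
  return value === undefined || value === null ? null : JSON.stringify(value);
}
```

JSON fields such as `metadata`, `attachments`, and `raw` are stored as TEXT, so they have to be serialized on write and parsed on read.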

### Session Service

The session service provides the high-level interface for session and message management:

```typescript
export interface ManagementInfo {
  models?: Array<{ value: string; displayName: string; description: string }>;
  commands?: Array<{ name: string; description: string; argumentHint: string }>;
  account?: { email?: string; organization?: string; subscriptionType?: string };
  mcpServers?: Array<{ name: string; status: string }>;
  tools?: string[];
  agents?: string[];
  plugins?: Array<{ name: string; path?: string }>;
  skills?: string[];
  slashCommands?: string[];
  model?: string;
  permissionMode?: string;
  cwd?: string;
  outputStyle?: string;
  betas?: string[];
  claudeCodeVersion?: string;
  apiKeySource?: string;
  lastUpdated?: string;
}
```

Source: [app/native-server/src/agent/session-service.ts:30-70]()

#### Session Preview Metadata

```typescript
export interface AgentSessionPreviewMeta {
  displayText?: string;  // Compact display text
  clientMeta?: {
    kind?: 'web_editor_apply_batch' | 'web_editor_apply_single';
    elementCount?: number;
  };
}
```

This metadata enables special UI rendering for web editor apply operations.

Source: [app/native-server/src/agent/session-service.ts:75-90]()
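
As an illustration of how a UI might consume this metadata, the sketch below picks a label from `clientMeta.kind` and falls back to `displayText`. The `previewLabel` function and its label strings are hypothetical; only the interface shape comes from the text above.

```typescript
interface AgentSessionPreviewMeta {
  displayText?: string;
  clientMeta?: {
    kind?: 'web_editor_apply_batch' | 'web_editor_apply_single';
    elementCount?: number;
  };
}

// Hypothetical renderer: choose a compact label for a session preview.
function previewLabel(meta: AgentSessionPreviewMeta, fallback: string): string {
  const kind = meta.clientMeta?.kind;
  if (kind === 'web_editor_apply_batch') {
    const count = meta.clientMeta?.elementCount ?? 0;
    return `Apply edits to ${count} element(s)`;
  }
  if (kind === 'web_editor_apply_single') {
    return 'Apply edit to 1 element';
  }
  // No special kind: use the compact display text if present.
  return meta.displayText ?? fallback;
}
```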

## Content Indexing System

### ContentIndexer Class

The content indexer manages semantic search capabilities for browser tab content:

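```typescript
// Outline of the class shape only; method bodies are omitted.
declare class ContentIndexer {
  private options: IndexerOptions;
  private semanticEngine: SemanticEngine;

  // Public API: index or drop the content of a single tab.
  indexTabContent(tabId: number): Promise<void>;
  removeTabIndex(tabId: number): Promise<void>;

  // Internal helpers: URL filtering and content extraction.
  private shouldIndexUrl(url: string): boolean;
  private extractTabContent(tabId: number): Promise<ContentResult>;
}
```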

Source: [app/chrome-extension/utils/content-indexer.ts:20-80]()

### URL Filtering

Certain URLs are automatically excluded from indexing:

| Pattern | Reason |
|---------|--------|
| `chrome://*` | Chrome internal pages |
| `chrome-extension://*` | Extension pages |
| `edge://*` | Edge internal pages |
| `about:*` | Browser about pages |
| `moz-extension://*` | Firefox extension pages |
| `file://*` | Local file system |

Source: [app/chrome-extension/utils/content-indexer.ts:50-70]()
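
A prefix check is enough to express the exclusion table. The following is a minimal sketch, not the project's implementation: the `EXCLUDED_URL_PREFIXES` constant, the case-insensitive comparison, and the rejection of empty URLs are assumptions.

```typescript
// Prefixes taken from the exclusion table above.
const EXCLUDED_URL_PREFIXES: readonly string[] = [
  'chrome://',
  'chrome-extension://',
  'edge://',
  'about:',
  'moz-extension://',
  'file://',
];

// Returns true when the URL is eligible for indexing.
function shouldIndexUrl(url: string): boolean {
  const normalized = url.trim().toLowerCase();
  if (normalized === '') return false; // assumption: nothing to index without a URL
  return !EXCLUDED_URL_PREFIXES.some((prefix) => normalized.startsWith(prefix));
}
```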

### Auto-Indexing Behavior

```mermaid
graph TD
    A[Tab Load Complete] --> B{URL Valid?}
    B -->|No| C[Skip Indexing]
    B -->|Yes| D{Engine Ready?}
    D -->|No| E[Wait 2 seconds]
    E --> D
    D -->|Yes| F[Execute Fetcher Script]
    F --> G[Extract Page Content]
    G --> H[Update Semantic Index]
    H --> I[Store in IndexedDB]
```

The auto-indexing is triggered with a 2-second delay to allow dynamic content to load.

Source: [app/chrome-extension/utils/content-indexer.ts:10-40]()
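
The decision points in the diagram can be sketched as a small pure function plus a polling loop. Everything named here (`nextIndexingAction`, `AutoIndexDeps`, `autoIndex`, and the `maxAttempts` cap that stops the loop from polling forever) is a hypothetical illustration; only the 2-second wait comes from the diagram.

```typescript
type IndexingAction = 'skip' | 'wait' | 'index';

// Wait between engine readiness checks, as shown in the diagram.
const ENGINE_RETRY_DELAY_MS = 2000;

// Pure decision step: invalid URLs are skipped, otherwise wait for the engine.
function nextIndexingAction(urlIsValid: boolean, engineReady: boolean): IndexingAction {
  if (!urlIsValid) return 'skip';
  return engineReady ? 'index' : 'wait';
}

// Dependencies are injected so the loop can run outside a browser.
interface AutoIndexDeps {
  isUrlValid(): boolean;
  isEngineReady(): boolean;
  indexTab(): Promise<void>;
  sleep(ms: number): Promise<void>;
}

// Polls until the engine is ready, then indexes; gives up after maxAttempts.
async function autoIndex(deps: AutoIndexDeps, maxAttempts = 5): Promise<IndexingAction> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const action = nextIndexingAction(deps.isUrlValid(), deps.isEngineReady());
    if (action === 'skip') return 'skip';
    if (action === 'index') {
      await deps.indexTab();
      return 'index';
    }
    await deps.sleep(ENGINE_RETRY_DELAY_MS);
  }
  return 'wait'; // the engine never became ready within maxAttempts
}
```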

## Data Flow Summary

### Recording Flow

```mermaid
sequenceDiagram
    participant U as User
    participant E as Extension
    participant IDB as IndexedDB
    participant NS as Native Server
    
    U->>E: Start Recording
    E->>IDB: Create Flow Record
    loop User Actions
        E->>E: Capture Event
        E->>IDB: Append Step
    end
    U->>E: Stop Recording
    E->>IDB: Finalize Flow
    E->>NS: Sync Flow Metadata
```

### Replay Flow

```mermaid
sequenceDiagram
    participant T as Trigger
    participant E as Extension
    participant IDB as IndexedDB
    participant B as Browser
    
    T->>E: Trigger Matched
    E->>IDB: Load Flow
    loop Flow Steps
        E->>E: Process Step
        E->>B: Execute Action
    end
```
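
The replay loop in the diagram can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: `FlowStep`, `ReplayResult`, and `replayFlow` are hypothetical names, the stop-on-first-failure policy is assumed rather than taken from the project, and real browser actions would be asynchronous.

```typescript
// Hypothetical step shape; the real flow format is richer.
interface FlowStep {
  type: string;
  target?: string;
}

interface ReplayResult {
  executed: number;  // number of steps that completed successfully
  failedAt?: number; // index of the first failing step, if any
}

// Runs steps in order and stops at the first one whose executor returns false.
function replayFlow(steps: FlowStep[], execute: (step: FlowStep) => boolean): ReplayResult {
  // findIndex stops iterating at the first failing step.
  const failedAt = steps.findIndex((step) => !execute(step));
  return failedAt === -1 ? { executed: steps.length } : { executed: failedAt, failedAt };
}
```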

## Error Handling

Each storage layer, along with the network sync path, has its own error-handling strategy:

| Layer | Error Strategy |
|-------|----------------|
| IndexedDB | Graceful degradation, console logging |
| chrome.storage | Fallback messaging to background script |
| SQLite | Transaction rollback, WAL recovery |
| Network sync | Retry with exponential backoff |

Source: [app/chrome-extension/entrypoints/offscreen/main.ts:40-80]()
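
For the network sync row, exponential backoff means doubling the wait after each failed attempt, usually up to a cap. The sketch below computes such a schedule; `backoffDelays` and its parameters are illustrative assumptions, and the project's actual base delay, cap, and retry count are not stated in this section.

```typescript
// Returns the wait (in ms) before each retry: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelays(attempts: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * Math.pow(2, i), maxMs));
  }
  return delays;
}

// Example schedule: five retries starting at 500 ms, capped at 4 s.
const schedule = backoffDelays(5, 500, 4000);
// schedule is [500, 1000, 2000, 4000, 4000]
```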

## Best Practices

1. **Batch Writes**: Group multiple IndexedDB operations to reduce I/O
2. **Index Maintenance**: Periodically clean stale tab indices
3. **Transaction Scope**: Keep SQLite transactions short to prevent locks
4. **Memory Management**: Release DOM references after content extraction
5. **Storage Quotas**: Monitor chrome.storage.local usage for extension limits

---


## Doramagic Pitfall Log

Project: hangwin/mcp-chrome

Summary: 16 potential pitfalls were found, 2 of them high/blocking. Top priority: installation pitfall, source evidence: Bug: Singleton McpServer causes 'Already connected to a transport' on HTTP endpoint.

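## 1. Installation pitfall · Source evidence: Bug: Singleton McpServer causes 'Already connected to a transport' on HTTP endpoint

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: Bug: Singleton McpServer causes 'Already connected to a transport' on HTTP endpoint
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_ba76885913a744f0832665f5c62d0f1d | https://github.com/hangwin/mcp-chrome/issues/321 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.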

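## 2. Installation pitfall · Source evidence: Opencode cannot use chrome-mcp via mcp-server-stdio on Windows (Failed to connect to MCP server)

- Severity: high
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: Opencode cannot use chrome-mcp via mcp-server-stdio on Windows (Failed to connect to MCP server)
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_4ac5f778b292489482bfdb7953190f96 | https://github.com/hangwin/mcp-chrome/issues/319 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.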

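## 3. Installation pitfall · Source evidence: Codex CLI setup guide still points to ~/.codex/config.json and a port-only flow Codex does not pick up

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: Codex CLI setup guide still points to ~/.codex/config.json and a port-only flow Codex does not pick up
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_559f34b86ade48e598dac510e9341b9e | https://github.com/hangwin/mcp-chrome/issues/339 | Unverified usage condition surfaced by source type github_issue.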

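## 4. Installation pitfall · Source evidence: [Feature] Profile-aware bridge — multi-Chrome-profile support without port collision

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: [Feature] Profile-aware bridge — multi-Chrome-profile support without port collision
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_93bfb3c85fbc4c0a8134e7159ddf4511 | https://github.com/hangwin/mcp-chrome/issues/347 | Unverified usage condition surfaced by source type github_issue.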

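## 5. Installation pitfall · Source evidence: mcp-chrome-bridge 无法使用 (mcp-chrome-bridge cannot be used)

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: mcp-chrome-bridge 无法使用 (mcp-chrome-bridge cannot be used)
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_5f19fb6526174af2abe5c1eab67e25ff | https://github.com/hangwin/mcp-chrome/issues/333 | The source discussion mentions npm-related conditions; re-verify before installing or trying it out.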

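## 6. Installation pitfall · Source evidence: 🐛 Status indicator stuck on yellow - Service shows as "not started" after connection

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: 🐛 Status indicator stuck on yellow - Service shows as "not started" after connection
- User impact: May affect upgrades, migration, or version choice.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_2b32536c95a7456788ba78e1fc705116 | https://github.com/hangwin/mcp-chrome/issues/342 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.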

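## 7. Configuration pitfall · May modify host AI configuration

- Severity: medium
- Evidence strength: source_linked
- Finding: The project targets hosts such as Claude/Cursor/Codex/Gemini/OpenCode, or its install commands touch user configuration directories.
- User impact: Installation may change the behavior of local AI tools; users need to know where files are written and how to roll back.
- Suggested check: List the configuration files and directories that will be written, along with uninstall/rollback steps.
- Safeguard: When host configuration directories are involved, a rollback path must be provided, not just the install command.
- Evidence: capability.host_targets | github_repo:998796026 | https://github.com/hangwin/mcp-chrome | host_targets=mcp_host, claude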

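## 8. Capability pitfall · Capability assessment depends on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream verification checklist.
- Safeguard: Assumptions must become verification items; they cannot be stated as fact before verification results exist.
- Evidence: capability.assumptions | github_repo:998796026 | https://github.com/hangwin/mcp-chrome | README/documentation is current enough for a first validation pass.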

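## 9. Runtime pitfall · Source evidence: v0.0.6

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified runtime-related issue in this project: v0.0.6
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source indicates that a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_035dffafcbe64f4ea112f20258430e71 | https://github.com/hangwin/mcp-chrome/releases/tag/v0.0.6 | Unverified usage condition surfaced by source type github_release.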

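## 10. Maintenance pitfall · Source evidence: [Feature/Bug] Multi-client MCP support — second Claude Code session kills first via shared MCP Server singleton

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified maintenance/version-related issue in this project: [Feature/Bug] Multi-client MCP support — second Claude Code session kills first via shared MCP Server singleton
- User impact: May raise the cost of trial and production adoption for new users.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_4d4eaf47bd284105af5632bbe220b628 | https://github.com/hangwin/mcp-chrome/issues/345 | Unverified usage condition surfaced by source type github_issue.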

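## 11. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- User impact: New, abandoned, and active projects get lumped together, which lowers confidence in recommendations.
- Suggested check: Add signals for recent GitHub commits, releases, and issue/PR responsiveness.
- Safeguard: While maintenance activity is unknown, recommendation strength must not be marked as high-trust.
- Evidence: evidence.maintainer_signals | github_repo:998796026 | https://github.com/hangwin/mcp-chrome | last_activity_observed missing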

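## 12. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream has already requested a review; this must not be downplayed on the page.
- Suggested check: Route to the security/permissions governance review queue.
- Safeguard: While downstream risk exists, the review/recommendation downgrade must remain in place.
- Evidence: downstream_validation.risk_items | github_repo:998796026 | https://github.com/hangwin/mcp-chrome | no_demo; severity=medium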

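## 13. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Suggested check: Record the risk on the boundary card and confirm whether manual review is needed.
- Safeguard: Scoring risks must appear on the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:998796026 | https://github.com/hangwin/mcp-chrome | no_demo; severity=medium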

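## 14. Security/permissions pitfall · Source evidence: issue

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: issue
- User impact: May affect authorization, key configuration, or security boundaries.
- Suggested check: The source issue is still open; the Pack Agent needs to re-verify whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_82ad93635a5e4ff5aeb2b35e36cbd02e | https://github.com/hangwin/mcp-chrome/issues/330 | The source discussion mentions node-related conditions; re-verify before installing or trying it out.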

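## 15. Maintenance pitfall · issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will respond when they run into a problem.
- Suggested check: Sample recent issues/PRs to judge whether they go unhandled for long periods.
- Safeguard: While issue/PR responsiveness is unknown, the maintenance risk must be flagged.
- Evidence: evidence.maintainer_signals | github_repo:998796026 | https://github.com/hangwin/mcp-chrome | issue_or_pr_quality=unknown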

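## 16. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and documentation may lag behind the code, raising the chance that users hit pitfalls.
- Suggested check: Confirm that the latest release/tag matches the install commands in the README.
- Safeguard: When release cadence is unknown or stale, install instructions must note possible drift.
- Evidence: evidence.maintainer_signals | github_repo:998796026 | https://github.com/hangwin/mcp-chrome | release_recency=unknown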

<!-- canonical_name: hangwin/mcp-chrome; human_manual_source: deepwiki_human_wiki -->