# https://github.com/AIPexStudio/AIPex Project Manual

Generated at: 2026-06-22 06:05:53 UTC

## Table of Contents

- [Repository Overview & Architecture](#page-1)
- [Browser Automation: DOM Snapshot, Locator & Tools](#page-2)
- [AI Chat, MCP Bridge & Skills](#page-3)
- [Extension UI, Build, Deployment & Operations](#page-4)

<a id='page-1'></a>

## Repository Overview & Architecture

### Related Pages

Related topics: [Browser Automation: DOM Snapshot, Locator & Tools](#page-2), [AI Chat, MCP Bridge & Skills](#page-3), [Extension UI, Build, Deployment & Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md)
- [package.json](https://github.com/AIPexStudio/AIPex/blob/main/package.json)
- [mcp-bridge/package.json](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/package.json)
- [packages/core/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/package.json)
- [packages/aipex-react/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/package.json)
- [packages/browser-runtime/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/package.json)
- [packages/browser-ext/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json)
- [packages/dom-snapshot/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/package.json)
- [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md)
- [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md)
- [packages/browser-runtime/src/skill/lib/services/skill-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/src/skill/lib/services/skill-manager.ts)
- [packages/browser-runtime/src/lib/vm/zenfs-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/src/lib/vm/zenfs-manager.ts)
- [packages/browser-ext/src/lib/message-adapter.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/message-adapter.ts)
- [packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx)
- [packages/browser-ext/src/pages/options/file-components/FilePreview.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/file-components/FilePreview.tsx)
- [packages/browser-ext/src/services/share-conversation.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/share-conversation.ts)
</details>

# Repository Overview & Architecture

## 1. Purpose and Scope

AIPex is an open-source browser automation project that lets users drive a Chromium-based browser with natural-language commands. Per the top-level [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md), the project's headline positioning is "Automate your browser with natural language commands — The open source browser-use solution." The repository combines:

- A published Chrome / Edge extension (the user-facing product).
- A pure-TypeScript AI agent framework reusable outside the extension.
- An MCP bridge that exposes AIPex tools to external agents such as Cursor, Claude Code, and GitHub Copilot.
- A reusable DOM snapshot library that avoids the Chrome DevTools Protocol (CDP) Accessibility Tree.

According to [packages/browser-ext/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json), the extension package is named `@aipexstudio/cool-aipex` and carries the display name "AIPex". The `README.md` confirms distribution on the Chrome Web Store and Microsoft Edge Add-ons. Firefox support is tracked as an open enhancement request in [issue #1](https://github.com/AIPexStudio/AIPex/issues/1).

## 2. Repository Layout

The root [package.json](https://github.com/AIPexStudio/AIPex/blob/main/package.json) declares `private: true` and configures `workspaces: ["packages/*"]`, making this a pnpm monorepo. The root scripts provide workspace-wide commands such as `build`, `dev`, `preflight` (format + lint + typecheck + test), and `lint:dependencies` (via `knip`). A `postinstall` hook runs `prek install` to install pre-commit hooks.

The monorepo contains six top-level deliverables:

| Path | Package name | Role | Source of record |
|---|---|---|---|
| `packages/core` | `@aipexstudio/aipex-core` | Pure-TS AI agent framework, depends on `@openai/agents` | [packages/core/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/package.json) |
| `packages/aipex-react` | `@aipexstudio/aipex-react` | React UI toolkit, hooks, chat & omni components, adapters | [packages/aipex-react/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/package.json) |
| `packages/browser-runtime` | `@aipexstudio/browser-runtime` | Browser automation runtime, CDP/DOM locator strategies, VM, ZenFS, skill manager | [packages/browser-runtime/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/package.json) |
| `packages/browser-ext` | `@aipexstudio/cool-aipex` | The AIPex browser extension (Side Panel, Options, Chatbot UI, MCP bridge panel) | [packages/browser-ext/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json) |
| `packages/dom-snapshot` | `@aipexstudio/dom-snapshot` | DOM-based accessibility snapshot library | [packages/dom-snapshot/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/package.json) |
| `mcp-bridge/` | (MCP bridge server) | Standalone local server that speaks MCP to external agents and links to the extension over WebSocket | [mcp-bridge/package.json](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/package.json) |

## 3. Architecture and Data Flow

AIPex follows a layered design. The extension is the user-facing shell; below it, the runtime exposes 30+ browser tools; the core package provides the agent loop; and the MCP bridge lets external agents reuse the same tools.

```mermaid
flowchart TB
  subgraph Client["Browser (Chrome / Edge)"]
    EXT["@aipexstudio/cool-aipex<br/>(Side Panel + Options UI)"]
    UI["aipex-react components<br/>(chatbot, omni)"]
    DOM["@aipexstudio/dom-snapshot<br/>(pure-DOM AXTree)"]
  end
  subgraph Runtime["Browser Runtime Layer"]
    BR["@aipexstudio/browser-runtime<br/>CDP & DOM locators, VM, ZenFS, skills"]
  end
  subgraph Core["Agent Layer"]
    CORE["@aipexstudio/aipex-core<br/>Agent loop (Zod, lru-cache)"]
    SDK["@openai/agents + extensions"]
  end
  subgraph External["External Agents"]
    MCP["mcp-bridge (WebSocket + MCP)"]
    EXT_AGENTS["Cursor / Claude Code / Copilot"]
  end

  EXT --> UI --> BR
  BR --> DOM
  BR --> CORE --> SDK
  MCP -. tools/list, tools/call .-> BR
  EXT_AGENTS --> MCP
```

Key data flows:

- **Snapshot capture** — [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md) documents `collectDomSnapshotInPage()` and `collectDomSnapshot(document, options)` producing a serialized tree with stable `data-aipex-nodeid` attributes, hidden-element filtering, and same-origin iframe traversal.
- **Tool execution** — [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md) describes two locator modes (`"cdp"` and `"dom"`), `SmartLocator` / `DomLocator` element handles, and runtime contracts (`RuntimeAddon`, `NoopBrowserAutomationHost`, `InMemoryOmniActionRegistry`, `NullInterventionHost`, `NoopContextProvider`).
- **Sandboxed execution** — [packages/browser-runtime/src/lib/vm/zenfs-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/src/lib/vm/zenfs-manager.ts) shows file-system and rename primitives backed by QuickJS + ZenFS, including a file-extension allowlist that decides which files get a text preview in [packages/browser-ext/src/pages/options/file-components/FilePreview.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/file-components/FilePreview.tsx).
- **Skill lifecycle** — [packages/browser-runtime/src/skill/lib/services/skill-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/src/skill/lib/services/skill-manager.ts) bootstraps built-in skills (e.g. `wcag22-a11y-audit`) into IndexedDB with metadata describing version, capabilities, and enabled state.
- **UI ↔ runtime messaging** — [packages/browser-ext/src/lib/message-adapter.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/message-adapter.ts) converts between runtime `UIMessage` shapes and tool-call parts, handling business failures returned as `{ success: false, error }`.
- **MCP integration** — [packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx) documents that the bridge exposes `tools/list` and `tools/call` over MCP and only accepts localhost connections (127.0.0.1, ::1).
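
The UI ↔ runtime messaging bullet above notes that business failures arrive as `{ success: false, error }` payloads rather than as thrown exceptions. The following is a minimal sketch of how such a payload might be mapped onto a UI tool-call state. The names `ToolResult`, `ToolCallPart`, and `toToolCallPart` are hypothetical and are not taken from `message-adapter.ts`.

```typescript
// Hypothetical shapes for illustration; the real types live in message-adapter.ts.
type ToolResult =
  | { success: true; data?: unknown }
  | { success: false; error: string };

interface ToolCallPart {
  toolName: string;
  state: "completed" | "error";
  output?: unknown;
  errorText?: string;
}

// A tool call can resolve without throwing and still report a business failure,
// so the adapter inspects the payload rather than relying on exceptions.
function toToolCallPart(toolName: string, result: ToolResult): ToolCallPart {
  if (result.success === false) {
    return { toolName, state: "error", errorText: result.error };
  }
  return { toolName, state: "completed", output: result.data };
}

const okPart = toToolCallPart("click", { success: true, data: { clicked: true } });
const failedPart = toToolCallPart("click", { success: false, error: "Element not found" });
console.log(okPart.state, failedPart.state, failedPart.errorText);
```

Modelling the result as a discriminated union lets the compiler require that `error` is only read on the failure branch.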

## 4. Cross-Cutting Concerns and Community-Known Limitations

Several recurring themes from community issues directly map to architectural decisions:

- **Browser coverage** — Per [issue #1](https://github.com/AIPexStudio/AIPex/issues/1), Firefox support is still open. The README only advertises Chrome and Edge store badges, and the codebase relies on Chromium-specific extension APIs, so Firefox would require a framework-level refactor.
- **Repository size** — [issue #53](https://github.com/AIPexStudio/AIPex/issues/53) tracks reducing `size-pack` from ~97 MiB. The sources cited here do not establish the cause. The larger dependencies listed in [packages/browser-runtime/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/package.json) (`@jitl/quickjs-ng-wasmfile-release-sync`, `@zenfs/core`) are WebAssembly and JavaScript packages rather than native modules, and as installed dependencies they would normally affect bundle size rather than Git history.
- **Iframe handling in snapshots** — [issue #100](https://github.com/AIPexStudio/AIPex/issues/100) reports that `search_elements` does not recurse into iframes. [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md) confirms the current scope is "same-origin iframe support" — cross-origin iframes remain out of scope.
- **MCP multi-session** — [issue #202](https://github.com/AIPexStudio/AIPex/issues/202) reports that a second Claude session fails to connect while the first keeps working. The cited sources do not explain this behavior: the localhost-only notice in [mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx) restricts where clients may connect from, not how many may connect, and the `websocket` and `bridge` keywords in [mcp-bridge/package.json](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/package.json) only describe the transport.
- **Conversation sharing** — [packages/browser-ext/src/services/share-conversation.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/share-conversation.ts) shows that shareable conversations are uploaded via `fetch` with an auth cookie; screenshots are intentionally omitted from the share payload, which is a deliberate privacy choice rather than a bug.

Taken together, AIPex is a layered monorepo: a core agent and a snapshot library, a feature-rich runtime, a React UI toolkit, and a thin extension shell. Because that shell relies on Chromium-specific extension APIs, re-targeting it at other browsers remains future work (see [issue #1](https://github.com/AIPexStudio/AIPex/issues/1)).

## See Also

- [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md) — DOM snapshot API and iframe scope.
- [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md) — CDP/DOM locator contracts and test setup.
- [mcp-bridge/README.md](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/README.md) — Advanced MCP bridge options.
- [skill/SKILL.md](https://github.com/AIPexStudio/AIPex/blob/main/skill/SKILL.md) — The `aipex-browser` skill definition for agent runtimes.
- [DEVELOPMENT.md](https://github.com/AIPexStudio/AIPex/blob/main/DEVELOPMENT.md) — Local development setup.

---

<a id='page-2'></a>

## Browser Automation: DOM Snapshot, Locator & Tools

### Related Pages

Related topics: [Repository Overview & Architecture](#page-1), [AI Chat, MCP Bridge & Skills](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md)
- [packages/dom-snapshot/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/package.json)
- [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md)
- [packages/browser-runtime/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/package.json)
- [packages/browser-ext/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json)
- [packages/browser-ext/src/services/tool-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/tool-manager.ts)
- [packages/browser-ext/src/pages/common/app-root.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/common/app-root.tsx)
- [packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx)
- [packages/aipex-react/src/lib/screenshot-utils.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/src/lib/screenshot-utils.ts)
- [packages/browser-runtime/src/skill/lib/services/skill-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/src/skill/lib/services/skill-manager.ts)
- [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md)
</details>

# Browser Automation: DOM Snapshot, Locator & Tools

## Overview

AIPex is an open-source browser-automation agent that lets users drive Chrome/Edge with natural-language commands. The automation stack is split across the monorepo into two cooperating layers: `@aipexstudio/dom-snapshot`, a pure-JS/TS page serializer, and `@aipexstudio/browser-runtime`, which packages that serializer with CDP-based automation, React-agnostic locators, and a curated `FunctionTool` bundle exposed to LLM agents. Source: [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md) and [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md).

The package manifest confirms the dependency direction: `browser-runtime` declares `@aipexstudio/dom-snapshot: workspace:*` as a dependency, and `browser-ext` (the actual Chrome MV3 extension) consumes `browser-runtime` and `aipex-core`. Source: [packages/browser-runtime/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/package.json).

```mermaid
flowchart LR
  LLM[LLM Agent] -->|FunctionTool call| Tools[allBrowserTools]
  Tools --> Locator{Strategy}
  Locator -->|cdp| Smart[SmartLocator / DebuggerManager]
  Locator -->|dom| Dom[DomLocator / DomElementHandle]
  Smart --> Page[CDP via chrome.debugger]
  Dom --> Snap[collectDomSnapshot]
  Snap --> Iframe[Same-origin iframe traversal]
  Snap --> Text[buildTextSnapshot / formatSnapshot]
  Text -->|searchSnapshotText| Tools
```

## DOM Snapshot: Collection, Formatting, Search

`@aipexstudio/dom-snapshot` was created specifically to avoid the Chrome DevTools Protocol (CDP) Accessibility Tree dependency, which the README cites as having "browser dependency," "performance overhead," and "complex setup" drawbacks. Source: [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md).

The public API exposes two top-level helpers:

- `collectDomSnapshotInPage()` — current document.
- `collectDomSnapshot(document, options?)` — explicit document with options (`maxTextLength`, `includeHidden`, `captureTextNodes`).

Each call returns a serialized tree (`root`, `idToNode`, `totalNodes`, `metadata.url`) and stamps every interactive element with a stable `data-aipex-nodeid` attribute. Source: [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md).

Text rendering is a two-step pipeline. `buildTextSnapshot` produces a `TextSnapshot`; `formatSnapshot` emits the human/LLM-readable form with three prefix markers: `*` for the focused element, `→` for ancestors of focus, and a single space for normal nodes. Source: [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md).
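
To make the three prefix markers concrete, here is a small stand-alone sketch of a formatter that applies them. It does not call the library: `SketchNode`, `containsFocus`, and `formatSketch` are hypothetical names, and the exact line layout (indentation, bracketed id, quoted name) is an assumption rather than the output of `formatSnapshot`.

```typescript
// Hypothetical node shape; the real TextSnapshot type is defined by @aipexstudio/dom-snapshot.
interface SketchNode {
  id: string;
  role: string;
  name: string;
  focused?: boolean;
  children?: SketchNode[];
}

// True when the node itself or any descendant holds focus.
function containsFocus(node: SketchNode): boolean {
  return node.focused === true || (node.children ?? []).some((c) => containsFocus(c));
}

// One line per node: "*" = focused, "→" = ancestor of the focused node, " " = any other node.
function formatSketch(node: SketchNode, depth = 0, out: string[] = []): string[] {
  const marker = node.focused ? "*" : containsFocus(node) ? "→" : " ";
  out.push(`${marker}${"  ".repeat(depth)}[${node.id}] ${node.role} "${node.name}"`);
  for (const child of node.children ?? []) formatSketch(child, depth + 1, out);
  return out;
}

const tree: SketchNode = {
  id: "n1",
  role: "form",
  name: "Login",
  children: [
    { id: "n2", role: "textbox", name: "Email", focused: true },
    { id: "n3", role: "button", name: "Submit" },
  ],
};

const lines = formatSketch(tree);
console.log(lines.join("\n"));
```

In this sketch the form is marked `→` because it contains the focused textbox, the textbox is marked `*`, and the button keeps the blank marker.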

`searchSnapshotText(formatted, query, options?)` supports both literal and `|`-separated multi-term queries, plus glob patterns when `useGlob` is enabled. Source: [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md).
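
The query semantics described above can be illustrated with a stand-alone sketch. This is not the library's implementation: `searchSketch` and `globToRegExp` are hypothetical helpers, and both case-insensitive matching and unanchored glob matching are assumptions made for the example.

```typescript
interface SketchMatch {
  line: number;
  text: string;
}

// Convert a simple glob ("*" = any run, "?" = one character) into a case-insensitive RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*").replace(/\?/g, "."), "i");
}

// A line matches when ANY of the "|"-separated terms matches it.
function searchSketch(
  formatted: string,
  query: string,
  options: { useGlob?: boolean } = {},
): SketchMatch[] {
  const terms = query.split("|").map((t) => t.trim()).filter((t) => t.length > 0);
  const tests = terms.map((term) => {
    if (options.useGlob) {
      const re = globToRegExp(term);
      return (line: string) => re.test(line);
    }
    const needle = term.toLowerCase();
    return (line: string) => line.toLowerCase().includes(needle);
  });
  return formatted
    .split("\n")
    .map((text, i) => ({ line: i + 1, text }))
    .filter(({ text }) => tests.some((test) => test(text)));
}

const sample = [
  ' [n1] form "Login"',
  ' [n2] textbox "Email"',
  ' [n3] button "Submit"',
  ' [n4] link "Forgot password"',
].join("\n");

const literalHits = searchSketch(sample, "email | submit");
const globHits = searchSketch(sample, "button*Sub*", { useGlob: true });
console.log(literalHits.map((m) => m.line), globHits.map((m) => m.line));
```

Splitting on `|` before matching keeps the literal and glob paths symmetric: each term becomes one predicate, and a line is reported if any predicate accepts it.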

The release history shows steady refinement of the serializer: cursor-pointer detection for interactive elements (v0.0.14), explicit accessibility-label support (v0.0.13), and StaticText extraction for non-interactive containers (v0.0.11). Source: [GitHub Releases v0.0.11–v0.0.14](https://github.com/AIPexStudio/AIPex/releases).

## Locators and Element Handles

`browser-runtime` ships two parallel locator strategies, both surfaced through `SnapshotManager`:

| Component | Strategy | Use when |
|---|---|---|
| `SmartLocator` / `SmartElementHandle` | `cdp` | Chrome/Edge with `chrome.debugger` available; needs full AXTree |
| `DomLocator` / `DomElementHandle` | `dom` | Any browser context; works without CDP using `@aipexstudio/dom-snapshot` |

Source: [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md).

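The DOM-based path automatically tracks coordinate offsets for nested same-origin iframes. This is necessary because an element's bounding rectangle is reported relative to the viewport of its own frame, not the top-level page, so each enclosing iframe's position has to be added before the coordinates can be used for clicks or highlights. The CDP path is the higher-fidelity option when an extension can attach the debugger to a tab.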
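
The offset bookkeeping can be sketched as a pure function. `Rect`, `FrameOffset`, and `toTopLevelRect` are hypothetical names, and the sketch assumes each iframe's offset is the top-left corner of its content area measured in the parent frame's viewport. It ignores CSS transforms and does not reproduce the runtime's actual code.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface FrameOffset {
  x: number;
  y: number;
}

// frameChain lists the enclosing iframes from outermost to innermost.
// Translating a frame-local rectangle to top-level coordinates means adding every offset on the way up.
function toTopLevelRect(local: Rect, frameChain: FrameOffset[]): Rect {
  const dx = frameChain.reduce((sum, frame) => sum + frame.x, 0);
  const dy = frameChain.reduce((sum, frame) => sum + frame.y, 0);
  return { x: local.x + dx, y: local.y + dy, width: local.width, height: local.height };
}

// A button at (10, 20) inside an inner iframe at (5, 40), which sits in an outer iframe at (100, 200).
const buttonRect = toTopLevelRect(
  { x: 10, y: 20, width: 80, height: 24 },
  [{ x: 100, y: 200 }, { x: 5, y: 40 }],
);
console.log(buttonRect);
```

Width and height are unchanged in this sketch; only the origin shifts, by 105 horizontally and 240 vertically.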

## Tools Bundle and Tool Manager

`allBrowserTools` is the curated agent surface — 32 `FunctionTool`s grouped into seven categories: tabs (7), UI ops (7), page content (4), screenshots (2), downloads (2), interventions (4), and skills (6). Source: [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md).
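
As a quick sanity check on the figures above, the per-category counts do add up to the stated total. The snippet only restates the numbers from this paragraph; it does not inspect `allBrowserTools` itself.

```typescript
// Category counts as listed in the paragraph above.
const toolCounts: Array<[string, number]> = [
  ["tabs", 7],
  ["UI ops", 7],
  ["page content", 4],
  ["screenshots", 2],
  ["downloads", 2],
  ["interventions", 4],
  ["skills", 6],
];

const totalTools = toolCounts.reduce((sum, [, count]) => sum + count, 0);
console.log(toolCounts.length, totalTools);
```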

The browser extension wraps that bundle in a `ToolManager` singleton that supports dynamic registration/unregistration, subscriber notifications via `ToolEventType`, and exposes category metadata. Source: [packages/browser-ext/src/services/tool-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/tool-manager.ts).
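
The registration and notification pattern described here can be sketched in a few lines. Everything below is illustrative: `SketchToolManager`, the event names, and the category labels are hypothetical and do not mirror the actual `ToolManager` or `ToolEventType` definitions in `tool-manager.ts`.

```typescript
// Hypothetical event names; the real ToolEventType values are defined in tool-manager.ts.
type SketchToolEvent = "registered" | "unregistered";

interface SketchTool {
  name: string;
  category: string;
}

type Listener = (event: SketchToolEvent, tool: SketchTool) => void;

class SketchToolManager {
  private tools = new Map<string, SketchTool>();
  private listeners = new Set<Listener>();

  // Returns an unsubscribe function so callers can detach cleanly.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  register(tool: SketchTool): void {
    this.tools.set(tool.name, tool);
    this.listeners.forEach((listener) => listener("registered", tool));
  }

  unregister(name: string): boolean {
    const tool = this.tools.get(name);
    if (!tool) return false;
    this.tools.delete(name);
    this.listeners.forEach((listener) => listener("unregistered", tool));
    return true;
  }

  // Category metadata: tool names grouped by category.
  byCategory(): Record<string, string[]> {
    const groups: Record<string, string[]> = {};
    this.tools.forEach((tool) => {
      const bucket = groups[tool.category] ?? [];
      bucket.push(tool.name);
      groups[tool.category] = bucket;
    });
    return groups;
  }
}

// A single shared instance, as a singleton would provide.
const manager = new SketchToolManager();
const seen: string[] = [];
const stop = manager.subscribe((event, tool) => {
  seen.push(`${event}:${tool.name}`);
});

manager.register({ name: "click", category: "UI ops" }); // category labels are illustrative
manager.register({ name: "capture_screenshot", category: "screenshots" });
manager.unregister("click");
stop();
manager.register({ name: "search_elements", category: "page content" });
console.log(seen, manager.byCategory());
```

Returning the unsubscribe function from `subscribe` is a design choice of this sketch; it avoids requiring callers to keep a reference to the listener.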

Key tool names used by the agent:

- `search_elements` — runs `searchSnapshotText` over the current snapshot.
- `click`, `hover_element_by_uid`, `fill_element_by_uid`, `fill_form`, `get_editor_value`.
- `get_page_metadata`, `scroll_to_element`, `highlight_element`, `highlight_text_inline`.
- `capture_screenshot`, `capture_tab_screenshot` — see `isCaptureScreenshotTool` in [packages/aipex-react/src/lib/screenshot-utils.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/src/lib/screenshot-utils.ts).
- Skill tools: `load_skill`, `execute_skill_script`, plus four others. Built-in skills such as `wcag22-a11y-audit` are auto-seeded into IndexedDB. Source: [packages/browser-runtime/src/skill/lib/services/skill-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/src/skill/lib/services/skill-manager.ts).

The extension wires these into the side-panel chat via the `useBrowserTools` hook inside `app-root.tsx`, which composes `BrowserContextProvider`s, `ChromeStorageAdapter`s, and the `InputModeProvider`. Source: [packages/browser-ext/src/pages/common/app-root.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/common/app-root.tsx).

## Known Limitations and Community Issues

Several open issues are worth flagging for any contributor touching this stack:

- **Elements inside iframes may not be found** — Issue [#100](https://github.com/AIPexStudio/AIPex/issues/100) reports that `search_elements` cannot locate elements inside pages with iframes and points to chrome-devtools-mcp / Playwright / Stagehand as reference implementations. The README currently lists only "same-origin iframe support", so cross-origin frames are outside the snapshot's scope. Source: [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md).
- **Firefox support is not implemented** — Issue [#1](https://github.com/AIPexStudio/AIPex/issues/1) tracks this. Because the current runtime depends on Chromium extension APIs (`chrome.*`, including `chrome.debugger` for the CDP locator path), porting would require the planned framework refactor mentioned in the issue. Source: [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md).
- **"Requested device not found"** — Issue [#80](https://github.com/AIPexStudio/AIPex/issues/80) discusses a runtime error in the screenshot/automation path. The sources cited here do not identify the failing call, so reproduce the error from the steps in the issue before assuming a cause.
- **Second MCP client fails to connect** — Issue [#202](https://github.com/AIPexStudio/AIPex/issues/202) notes that opening a second Claude window fails to connect while the first keeps working. The bridge README describes support for multiple simultaneous clients, so treat this as an open bug report rather than a documented limit. Source: [packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx).
- **Repository size** — Issue [#53](https://github.com/AIPexStudio/AIPex/issues/53) tracks a 97 MiB pack size; contributors should be mindful before adding large binary assets to history.

## See Also

- [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md) — top-level project introduction and installation
- `packages/core` (`@aipexstudio/aipex-core`) — platform-agnostic agent loop and event bus
- `packages/aipex-react` — React UI toolkit consumed by the side panel
- `packages/browser-ext` — the Manifest V3 extension that ships the user-facing product

---

<a id='page-3'></a>

## AI Chat, MCP Bridge & Skills

### Related Pages

Related topics: [Repository Overview & Architecture](#page-1), [Browser Automation: DOM Snapshot, Locator & Tools](#page-2), [Extension UI, Build, Deployment & Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md)
- [mcp-bridge/README.md](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/README.md)
- [mcp-bridge/package.json](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/package.json)
- [packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx)
- [packages/browser-ext/src/lib/ai-provider.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/ai-provider.ts)
- [packages/core/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/README.md)
- [packages/core/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/package.json)
- [packages/aipex-react/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/README.md)
- [packages/aipex-react/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/package.json)
- [packages/aipex-react/src/lib/models.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/src/lib/models.ts)
- [packages/browser-ext/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json)
- [packages/dom-snapshot/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md)
- [package.json](https://github.com/AIPexStudio/AIPex/blob/main/package.json)
</details>

# AI Chat, MCP Bridge & Skills

## Overview

AIPex exposes a layered stack that lets an end user (or an external AI agent) drive a real browser through natural-language commands. The three pieces that sit on top of the browser runtime are:

1. **AI Chat** — the in-extension assistant rendered by `@aipexstudio/aipex-react` and powered by `@aipexstudio/aipex-core` (which wraps `@openai/agents`). Source: [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md), [packages/core/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/README.md).
2. **MCP Bridge** — a local Node service (`mcp-bridge/`) and an in-extension panel that expose AIPex browser tools to external coding agents via the [Model Context Protocol](https://modelcontextprotocol.io). Source: [mcp-bridge/README.md](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/README.md).
3. **Skill** — a packaged `aipex-browser` skill definition in `skill/SKILL.md` that ships tool-usage strategy, schemas, and automation patterns to skill-protocol agents such as Claude Code. Source: [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md).

The repository is a pnpm workspace monorepo; the relevant packages are declared under `workspaces` in the root and the extension depends on the core and react packages. Source: [package.json](https://github.com/AIPexStudio/AIPex/blob/main/package.json), [packages/browser-ext/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json).

## AI Chat Layer

### Core Agent (`@aipexstudio/aipex-core`)

The core package is platform-agnostic — it has no Chrome API or Node-only assumptions — and exposes a streaming-first agent API. `AIPex.chat()` returns an `AsyncGenerator<AgentEvent>` that emits structured events such as `content_delta`, `tool_call_start`, `tool_call_complete`, `tool_call_args_streaming_start` and `tool_call_args_streaming_complete`. Source: [packages/core/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/README.md).
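
A consumer of this stream typically folds events into some per-turn state. The sketch below shows that folding step as a pure reducer over a subset of the event types named above. The event names come from the core README as cited here, but the payload fields (`delta`, `toolName`) and the `TurnState` shape are assumptions made for illustration. In a live session the events would arrive one at a time from iterating the generator returned by `AIPex.chat()` rather than from an array.

```typescript
// Payload fields are hypothetical; only the event type names are taken from the text above.
type SketchAgentEvent =
  | { type: "content_delta"; delta: string }
  | { type: "tool_call_start"; toolName: string }
  | { type: "tool_call_complete"; toolName: string };

interface TurnState {
  text: string;
  pendingTools: string[];
  finishedTools: string[];
}

// Pure reducer: each event produces a new state, which keeps UI updates easy to reason about.
function reduceEvent(state: TurnState, event: SketchAgentEvent): TurnState {
  switch (event.type) {
    case "content_delta":
      return { ...state, text: state.text + event.delta };
    case "tool_call_start":
      return { ...state, pendingTools: [...state.pendingTools, event.toolName] };
    case "tool_call_complete": {
      const done = event.toolName;
      return {
        ...state,
        pendingTools: state.pendingTools.filter((name) => name !== done),
        finishedTools: [...state.finishedTools, done],
      };
    }
  }
}

const recorded: SketchAgentEvent[] = [
  { type: "content_delta", delta: "Opening " },
  { type: "tool_call_start", toolName: "click" },
  { type: "tool_call_complete", toolName: "click" },
  { type: "content_delta", delta: "the page." },
];

const initialState: TurnState = { text: "", pendingTools: [], finishedTools: [] };
const finalState = recorded.reduce(reduceEvent, initialState);
console.log(finalState);
```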

Optional layers that productize the agent include:

- Conversation/session management (persistence, listing, forking).
- Conversation compression for long-running sessions.
- A `ContextManager` / `ContextProvider` system for attaching external data to a turn.
- A schema-first `ToolRegistry`.

The runtime depends on `@openai/agents`, `@openai/agents-extensions`, `lru-cache`, and `zod`, with optional peer integrations for OpenAI, Anthropic, Google, and OpenRouter. Source: [packages/core/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/package.json).

### React UI (`@aipexstudio/aipex-react`)

The React package depends only on core and provides a drop-in chat surface, headless hooks, and extension building blocks:

- `<Chatbot />` and related component slots/themes.
- `useChat` and `ChatAdapter` (converts `AgentEvent` streams into UI messages).
- `<SettingsPage />` and `useChatConfig` with `KeyValueStorage` (localStorage by default, swappable for `chrome.storage` or IndexedDB).
- `<ContentScript />`, `<Omni />`, intervention cards, and a fake mouse cursor.
- Optional i18n and theme providers via subpath exports.

Source: [packages/aipex-react/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/README.md) and [packages/aipex-react/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/package.json).

### AI Provider Factory

The extension resolves an AI provider in `ai-provider.ts` using one of two modes:

- **BYOK (Bring Your Own Key)** — the user supplies `apiKey` plus optional `baseURL`. Providers are constructed via `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/openai`, and `@ai-sdk/openai-compatible`.
- **Proxy mode** — falls back to `https://www.claudechrome.com/api/ai` with cookie-based auth. The default model in this mode is `deepseek/deepseek-chat-v3.1`.

Host URLs are validated to reject private/internal addresses (SSRF guard). Source: [packages/browser-ext/src/lib/ai-provider.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/ai-provider.ts).
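
A guard of this kind can be sketched as follows. This is not the code in `ai-provider.ts`: `isPrivateHost` and `validateBaseURL` are hypothetical names, and the list of blocked ranges is an assumption covering loopback, RFC 1918 private ranges, link-local addresses, and the IPv6 equivalents. The sketch checks literal hosts only and does not catch a public DNS name that resolves to a private address.

```typescript
// Returns true for hosts that point at the local machine or a private network.
function isPrivateHost(hostname: string): boolean {
  // URL.hostname keeps the brackets around IPv6 literals, so strip them first.
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const v4 = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    return (
      a === 0 ||
      a === 10 ||
      a === 127 ||
      (a === 169 && b === 254) ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168)
    );
  }

  if (host.includes(":")) {
    // IPv6 loopback, unique-local (fc00::/7), and link-local (fe80::/10).
    return host === "::1" || /^f[cd]/.test(host) || /^fe[89ab]/.test(host);
  }
  return false;
}

// Parse a user-supplied base URL and reject anything that is not a public HTTP(S) endpoint.
function validateBaseURL(baseURL: string): URL {
  const url = new URL(baseURL);
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  if (isPrivateHost(url.hostname)) {
    throw new Error(`Refusing private or internal host: ${url.hostname}`);
  }
  return url;
}

console.log(validateBaseURL("https://api.openai.com/v1").hostname);
console.log(isPrivateHost("192.168.1.20"), isPrivateHost("172.32.0.1"));
```

Parsing with `URL` before checking matters because the parser normalizes alternative spellings of an IPv4 address into dotted-decimal form.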

The model catalog is fetched from `https://www.claudechrome.com/api/models`, cached under the `cachedModelList` storage key, and price-bucketed into `cheap`, `normal`, or `expensive` tiers. Source: [packages/aipex-react/src/lib/models.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/src/lib/models.ts).
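
Price bucketing of this kind usually reduces to two thresholds. The sketch below shows the idea. The cut-off values, the price unit, the `SketchModel` shape, and the placeholder model ids are all assumptions and are not taken from `models.ts`.

```typescript
type PriceTier = "cheap" | "normal" | "expensive";

// Hypothetical cut-offs, in USD per million tokens.
const CHEAP_MAX = 1;
const NORMAL_MAX = 10;

function priceTier(usdPerMillionTokens: number): PriceTier {
  if (usdPerMillionTokens <= CHEAP_MAX) return "cheap";
  if (usdPerMillionTokens <= NORMAL_MAX) return "normal";
  return "expensive";
}

interface SketchModel {
  id: string;
  usdPerMillionTokens: number;
}

// Group model ids by tier so a picker can render one section per tier.
function groupByTier(models: SketchModel[]): Record<PriceTier, string[]> {
  const groups: Record<PriceTier, string[]> = { cheap: [], normal: [], expensive: [] };
  for (const model of models) {
    groups[priceTier(model.usdPerMillionTokens)].push(model.id);
  }
  return groups;
}

const tiers = groupByTier([
  { id: "model-a", usdPerMillionTokens: 0.3 },
  { id: "model-b", usdPerMillionTokens: 3 },
  { id: "model-c", usdPerMillionTokens: 15 },
]);
console.log(tiers);
```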

## MCP Bridge

### Architecture

The bridge is a local server plus an in-extension sidecar that lets external coding agents drive the user's browser. It supports **multiple simultaneous clients** (Cursor, Claude Code, VS Code Copilot) over StreamableHTTP. Source: [mcp-bridge/README.md](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/README.md).

```mermaid
flowchart LR
  A[Cursor] -->|HTTP POST /mcp| S
  B[Claude Code] -->|HTTP POST /mcp| S
  C[VS Code Copilot] -->|HTTP POST /mcp| S
  S[aipex-mcp-server<br/>localhost:9223] -->|WebSocket| E[AIPex Extension]
  E -->|chrome.* APIs| BR[Browser]
  classDef ext fill:#eef,stroke:#557;
  class E,BR ext;
```

### Endpoints

| Endpoint | Protocol | Purpose |
| --- | --- | --- |
| `/mcp` | StreamableHTTP | MCP client entry point for agents |
| `/extension` | WebSocket | Bidirectional link to the AIPex extension |
| `/health` | HTTP | Liveness check |

The default listen port is `9223` and only `127.0.0.1` / `::1` connections are accepted, as reflected in the options page UI. Source: [packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx), [mcp-bridge/README.md](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/README.md).
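
The localhost-only rule amounts to an allowlist check on the connecting socket's address. A minimal sketch, assuming a hypothetical helper named `isLoopback` (the bridge's actual check is not shown in the cited sources):

```typescript
// Addresses treated as local. The first two come from the text above.
// The IPv4-mapped form is an assumption added for this sketch: Node.js reports it
// for IPv4 clients that connect to a dual-stack IPv6 socket.
const LOOPBACK_ADDRESSES = ["127.0.0.1", "::1", "::ffff:127.0.0.1"];

// remoteAddress may be undefined when a socket has already been closed.
function isLoopback(remoteAddress: string | undefined): boolean {
  return remoteAddress !== undefined && LOOPBACK_ADDRESSES.includes(remoteAddress);
}

console.log(isLoopback("127.0.0.1"), isLoopback("::1"), isLoopback("192.168.1.10"));
```

Checking the socket address rather than the `Host` header is deliberate in this sketch, since a header can be set by any client.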

### Client Configuration

Run the server with `npx aipex-mcp-server` and point an agent at `http://localhost:9223/mcp`. Example for Cursor (`.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "aipex-browser": { "url": "http://localhost:9223/mcp" }
  }
}
```

Claude Code users can run `claude mcp add --transport http aipex-browser http://localhost:9223/mcp`. Source: [mcp-bridge/README.md](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/README.md).

### In-Extension Panel

The options page exposes a connection panel that shows status (`connected`, `connecting`, `disconnected`), reconnect-attempt counters, a `since <time>` label, and a URL input that defaults to `ws://localhost:9223`. The panel also notes that the bridge exposes `tools/list` and `tools/call` over MCP and accepts localhost connections only. Source: [packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/mcp-bridge-panel.tsx).
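
On the wire, `tools/list` and `tools/call` are JSON-RPC 2.0 methods defined by the Model Context Protocol. The sketch below only builds the request bodies and does not send them. The helper names are hypothetical, and the `query` argument passed to `search_elements` is an assumed parameter name, so consult the schema returned by `tools/list` for the real one. A real client also performs the MCP `initialize` handshake first, which MCP client libraries normally handle.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Ask the server which tools it exposes.
function listToolsRequest(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list" };
}

// Invoke one tool by name with a JSON object of arguments.
function callToolRequest(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const listReq = listToolsRequest();
// The `query` argument name is hypothetical.
const callReq = callToolRequest("search_elements", { query: "button | link" });
console.log(JSON.stringify(listReq));
console.log(JSON.stringify(callReq));
```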

### Known Limitation

Issue #202 reports that when multiple Claude sessions target the bridge, the second window fails to connect while the first keeps working. This conflicts with the multi-client behavior described under Architecture, and the cause is not established in the sources cited here. Source: [issue #202](https://github.com/AIPexStudio/AIPex/issues/202).

## Skills

AIPex ships an `aipex-browser` skill — a packaged skill definition for agents that speak the skill protocol (Claude Code, OpenClaw-compatible runtimes). The bundle includes tool-usage strategy, full parameter schemas for the 30+ browser tools, and common automation patterns so an agent can drive the browser without rediscovering the tool surface. The canonical definition lives at `skill/SKILL.md`. Source: [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md).

Skills are loaded by the agent runtime; the chat layer does not embed them. They complement the MCP bridge: an agent that uses the skill can be paired with the bridge to translate high-level intentions into extension tool calls.

## See Also

- [DOM Snapshot Library](https://github.com/AIPexStudio/AIPex/blob/main/packages/dom-snapshot/README.md) — the structured page representation the chat and MCP tools reason over.
- [MCP Bridge Server](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/README.md) — configuration and endpoint reference.
- [Core Agent README](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/README.md) — `AgentEvent` stream and tool registry.
- [React Toolkit README](https://github.com/AIPexStudio/AIPex/blob/main/packages/aipex-react/README.md) — `<Chatbot />`, hooks, and settings.
- Community tracking: Firefox support (#1), repository size (#53), iframe-aware snapshots (#100), "Requested device not found" (#80).

---

<a id='page-4'></a>

## Extension UI, Build, Deployment & Operations

### Related Pages

Related topics: [Repository Overview & Architecture](#page-1), [AI Chat, MCP Bridge & Skills](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [package.json](https://github.com/AIPexStudio/AIPex/blob/main/package.json)
- [README.md](https://github.com/AIPexStudio/AIPex/blob/main/README.md)
- [packages/browser-ext/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json)
- [packages/browser-runtime/package.json](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/package.json)
- [packages/browser-ext/src/services/tool-manager.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/tool-manager.ts)
- [packages/browser-ext/src/services/share-conversation.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/share-conversation.ts)
- [packages/browser-ext/src/lib/automation-mode-toolbar.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/automation-mode-toolbar.tsx)
- [packages/browser-ext/src/lib/message-adapter.ts](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/message-adapter.ts)
- [packages/browser-ext/src/pages/options/file-components/FilePreview.tsx](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/file-components/FilePreview.tsx)
- [packages/browser-runtime/README.md](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md)
</details>


## Overview

AIPex is a browser automation platform distributed as a browser extension that lets users control their browser using natural-language commands. The product is delivered as a pnpm monorepo containing the browser extension package (`@aipexstudio/cool-aipex`), a runtime package (`@aipexstudio/browser-runtime`), the core agent package (`@aipexstudio/aipex-core`), and a DOM snapshot package (`@aipexstudio/dom-snapshot`). The extension UI is built with Vite, React, TypeScript, and Tailwind CSS, and is wired into a chat-based agent that uses the Vercel AI SDK and OpenAI Agents SDK to call into 30+ browser tools.

This page covers the extension's user-facing UI surfaces, how the monorepo and extension are built, what the deployment artifact looks like, and the operational concerns (tool registry, message adaptation, conversation sharing) that govern day-to-day behavior.

## Extension UI Surfaces

### Workspace layout

The root `package.json` declares a `workspaces: ["packages/*"]` arrangement with pnpm. The browser extension is the umbrella workspace, declared in `packages/browser-ext/package.json` as `@aipexstudio/cool-aipex` with display name "AIPex" and version `0.1.0`. It pulls in `@aipexstudio/aipex-core`, `@aipexstudio/aipex-react`, and `@aipexstudio/browser-runtime` via `workspace:*` references, so the extension is always built against the in-repo versions of the runtime and core packages.

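Source: [package.json:1-15](https://github.com/AIPexStudio/AIPex/blob/main/package.json#L1-L15), [packages/browser-ext/package.json:1-25](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json#L1-L25).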

### Tool-driven UI

The extension uses a `ToolManager` that wraps `allBrowserTools` from the runtime into a unified interface for the extension context. Tools are organized into categories that the UI can render and filter on:

| Category | Purpose |
| --- | --- |
| `browser` | Tab management, navigation |
| `ui` | Locate, click, hover, fill elements |
| `page` | Metadata, scroll, highlight |
| `screenshot` | Capture to data URL |
| `download` | Save images from agent workflow |
| `intervention` | Human-in-the-loop requests |
| `skill` | Load and execute skill scripts |

The `ToolManager` exposes a subscriber model (`tool_registered` / `tool_unregistered` events) so UI components can react as tools become available, and it is implemented as a singleton via `ToolManager.getInstance()`. This pattern lets the extension's chat panel render the live tool inventory as skills and tools are added at runtime.

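Source: [packages/browser-ext/src/services/tool-manager.ts:15-80](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/tool-manager.ts#L15-L80).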
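The singleton-plus-subscriber pattern described above can be sketched as follows. This is a minimal illustration, not the repository's `ToolManager`: the class name `ToolManagerSketch`, the `subscribe`, `register`, `unregister`, and `listByCategory` methods, and the tool name `example_click` are all hypothetical. Only `getInstance()` and the two event names come from the text.

```typescript
type ToolEventTypeSketch = "tool_registered" | "tool_unregistered";

interface ToolEventSketch {
  type: ToolEventTypeSketch;
  toolName: string;
}

type ToolListenerSketch = (event: ToolEventSketch) => void;

class ToolManagerSketch {
  private static instance: ToolManagerSketch | undefined;
  private readonly tools = new Map<string, { category: string }>();
  private readonly listeners = new Set<ToolListenerSketch>();

  // Private constructor forces callers through getInstance().
  private constructor() {}

  static getInstance(): ToolManagerSketch {
    if (ToolManagerSketch.instance === undefined) {
      ToolManagerSketch.instance = new ToolManagerSketch();
    }
    return ToolManagerSketch.instance;
  }

  // Returns an unsubscribe function so UI components can clean up.
  subscribe(listener: ToolListenerSketch): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  register(toolName: string, category: string): void {
    this.tools.set(toolName, { category });
    this.emit({ type: "tool_registered", toolName });
  }

  unregister(toolName: string): void {
    // Only notify listeners if the tool was actually present.
    if (this.tools.delete(toolName)) {
      this.emit({ type: "tool_unregistered", toolName });
    }
  }

  listByCategory(category: string): string[] {
    return Array.from(this.tools.entries())
      .filter(([, meta]) => meta.category === category)
      .map(([toolName]) => toolName);
  }

  private emit(event: ToolEventSketch): void {
    this.listeners.forEach((listener) => listener(event));
  }
}

// Usage: a UI component subscribes, reacts to one registration, then cleans up.
const toolManager = ToolManagerSketch.getInstance();
const stopListening = toolManager.subscribe((event) => {
  console.log(`${event.type}: ${event.toolName}`);
});
toolManager.register("example_click", "ui");
stopListening();
```

Returning an unsubscribe function from `subscribe` is one common way to let React components detach in an effect cleanup; whether the real class does this is not confirmed by the source.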

### Automation mode toolbar

The chat input is augmented by an `AutomationModeInputToolbar` that lets the user toggle between focus mode (visual feedback, window focus) and background mode (silent operation). The component reads the current mode from storage using the `useStorage` hook from `@aipexstudio/browser-runtime/hooks` and validates writes through `validateAutomationMode` from `@aipexstudio/aipex-core`. The toolbar also shows a `TokenUsageIndicator` and stop/send affordances, demonstrating how extension UI components are composed from shared React primitives in `@aipexstudio/aipex-react`.

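Source: [packages/browser-ext/src/lib/automation-mode-toolbar.tsx:1-45](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/automation-mode-toolbar.tsx#L1-L45).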
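Validating a mode before it is written to storage might look like the sketch below. It is a stand-in under stated assumptions: the string values `"focus"` and `"background"`, the fallback to `"focus"`, and the function name `validateAutomationModeSketch` are hypothetical. The real signature of `validateAutomationMode` is not documented in this manual.

```typescript
// Assumed mode names, inferred from the "focus mode" / "background mode" wording.
const AUTOMATION_MODES_SKETCH = ["focus", "background"] as const;
type AutomationModeSketch = (typeof AUTOMATION_MODES_SKETCH)[number];

// Hypothetical stand-in for validateAutomationMode from @aipexstudio/aipex-core.
function validateAutomationModeSketch(
  value: unknown,
  fallback: AutomationModeSketch = "focus",
): AutomationModeSketch {
  const known = AUTOMATION_MODES_SKETCH as readonly string[];
  // Anything that is not one of the known strings falls back to the default.
  if (typeof value === "string" && known.indexOf(value) !== -1) {
    return value as AutomationModeSketch;
  }
  return fallback;
}
```

Funnelling every write through a validator like this keeps a corrupted or outdated storage value from reaching the toolbar.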

### File preview and conversation sharing

The options page includes a `FilePreview` dialog for browsing files uploaded to the skill sandbox. It renders text content with `SyntaxHighlighter` (word-wrap, line numbers, monospace font) and shows a binary file alert with a formatted byte count when preview is not possible. Conversation sharing is handled by `shareConversation`, which serializes `UIMessage`s to a `ShareableMessage[]` (omitting screenshots and system messages), and posts the payload to a share API using an auth cookie header.

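Source: [packages/browser-ext/src/pages/options/file-components/FilePreview.tsx:1-45](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/pages/options/file-components/FilePreview.tsx#L1-L45), [packages/browser-ext/src/services/share-conversation.ts:1-60](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/services/share-conversation.ts#L1-L60).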
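The serialization step of conversation sharing can be sketched as a filter over messages and parts. The message shapes and the function name `toShareableMessagesSketch` below are hypothetical, and the part type `"screenshot"` is an assumed label. The real `UIMessage` and `ShareableMessage` types live in the repository and may differ.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface SharePartSketch {
  type: string;
  text?: string;
}

interface ShareInputMessageSketch {
  id: string;
  role: "system" | "user" | "assistant";
  parts: SharePartSketch[];
}

interface ShareableMessageSketch {
  role: "user" | "assistant";
  text: string;
}

function toShareableMessagesSketch(
  messages: ShareInputMessageSketch[],
): ShareableMessageSketch[] {
  const shareable: ShareableMessageSketch[] = [];
  for (const message of messages) {
    // System messages are never shared.
    if (message.role === "system") continue;
    // Keep only text parts; screenshots and other binary parts are dropped.
    const text = message.parts
      .filter((part) => part.type === "text" && typeof part.text === "string")
      .map((part) => part.text as string)
      .join("\n");
    // Skip messages that have nothing left after filtering.
    if (text.length > 0) {
      shareable.push({ role: message.role, text });
    }
  }
  return shareable;
}
```

The resulting array is what a share request would carry as its payload; the request itself (endpoint, cookie header) is omitted because the manual does not specify it.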

```mermaid
flowchart LR
    UI[Extension UI<br/>Sidepanel / Options] --> TM[ToolManager]
    TM --> RT[allBrowserTools<br/>@aipexstudio/browser-runtime]
    RT --> Core[aipex-core<br/>Agent / Sessions]
    Core --> LLM[AI SDK Providers]
    Core --> DOM[dom-snapshot]
    TM -->|tool events| UI
    UI -->|share| API[Share API]
```

## Build System

### Root scripts

The root `package.json` exposes a small set of orchestration scripts: `build` runs `pnpm -r --if-present build` to build every workspace; `dev` runs the extension's Vite dev server; `preflight` chains format, lint, typecheck, and test; `lint:dependencies` uses `knip --strict` to catch unused exports. The `postinstall` hook installs `prek`, a faster pre-commit runner.

Source: [package.json:10-35](https://github.com/AIPexStudio/AIPex/blob/main/package.json#L10-L35).

### Extension build

The extension package's build scripts are intentionally thin: `dev` → `vite`, `build` → `vite build`, `preview` → `vite preview`, `build:css` → Tailwind CLI minification, `typecheck` → `tsc --project tsconfig.json`, and `test` → `vitest`. CSS is pre-built to `src/style.css` and shipped alongside the JavaScript bundle. The published `files` array for the extension is `["build/", "README.md", "package.json", "host-access-config.json"]`, so the deployment artifact is the Vite `build/` directory plus the host access policy file.

Source: [packages/browser-ext/package.json:14-40](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json#L14-L40).

### Runtime build

The runtime package (`@aipexstudio/browser-runtime`) compiles with `tsc -b` (project references), with typecheck via `tsc --project tsconfig.json` and tests via `vitest`. It exposes two entry points — the main module and a `./hooks` subpath for React hooks — and depends on `@aipexstudio/aipex-core` and `@aipexstudio/dom-snapshot` from the workspace. Heavy dependencies such as `@zenfs/core` and the QuickJS WebAssembly file are bundled at the runtime layer, keeping the extension's own bundle lean.

Source: [packages/browser-runtime/package.json:14-50](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/package.json#L14-L50).

### Linting and formatting

`preflight` runs Biome's `format` and `check --fix --unsafe` steps, followed by `tsc` typechecking and the tests, across the workspaces. This enforces a single style and import-order rule across packages.

## Deployment Artifact

The extension is published as the `@aipexstudio/cool-aipex` package on npm. The `description` field reads "Automate your browser with natural language commands - The open source browser-use solution", and the package lists several AI SDK peer dependencies as optional (`@ai-sdk/openai`, `@ai-sdk/anthropic`, `@ai-sdk/google`, `@openrouter/ai-sdk-provider`). Users can install only the providers they need; the core agent and OpenAI Agents SDK are required runtime dependencies.

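Source: [packages/browser-ext/package.json:5-60](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/package.json#L5-L60), [packages/core/package.json:25-50](https://github.com/AIPexStudio/AIPex/blob/main/packages/core/package.json#L25-L50).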

For the MCP bridge, `mcp-bridge/package.json` targets Node `>=18.0.0` and ships as a separate package with keywords such as `mcp`, `model-context-protocol`, `cursor`, `claude`, and `copilot`. It is the recommended way to wire AI clients (Cursor, Claude Code, GitHub Copilot) to the running extension.

Source: [mcp-bridge/package.json:1-30](https://github.com/AIPexStudio/AIPex/blob/main/mcp-bridge/package.json#L1-L30).

## Operations

### Message adaptation

Internal messages travel through a `message-adapter` that maps `UIMessage` parts to a runtime-friendly shape: `text` parts are passed through, `tool` parts are projected to `{ type: "tool_use", id, name, input }`, and unrecognized part types fall back to `{ type: "text", text: "[<type>]" }`. The adapter also implements `safeJsonParse` and `extractBusinessFailure`, which detect business-level failures (`{ success: false, error: "..." }`) so the UI can surface them as user-visible errors instead of silent JSON strings.

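Source: [packages/browser-ext/src/lib/message-adapter.ts:1-70](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-ext/src/lib/message-adapter.ts#L1-L70).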
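The mapping rules above can be sketched in a few lines. This is not the adapter's actual code: the input part shape, the `Sketch`-suffixed function names, and their signatures are assumptions. Only the three mapping rules and the `{ success: false, error }` failure shape come from the text.

```typescript
// Hypothetical input shape; real UIMessage parts are defined in the repo.
interface UIPartSketch {
  type: string;
  text?: string;
  id?: string;
  name?: string;
  input?: unknown;
}

type RuntimePartSketch =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

function adaptPartSketch(part: UIPartSketch): RuntimePartSketch {
  // Text parts pass through unchanged.
  if (part.type === "text" && typeof part.text === "string") {
    return { type: "text", text: part.text };
  }
  // Tool parts are projected to the tool_use shape.
  if (part.type === "tool" && typeof part.id === "string" && typeof part.name === "string") {
    return { type: "tool_use", id: part.id, name: part.name, input: part.input };
  }
  // Anything else becomes a bracketed text placeholder.
  return { type: "text", text: `[${part.type}]` };
}

// Returns undefined instead of throwing on malformed JSON.
function safeJsonParseSketch(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return undefined;
  }
}

// Returns the error message for a { success: false, error: "..." } payload.
function extractBusinessFailureSketch(raw: string): string | undefined {
  const parsed = safeJsonParseSketch(raw);
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const record = parsed as Record<string, unknown>;
  const errorText = record["error"];
  if (record["success"] === false && typeof errorText === "string") {
    return errorText;
  }
  return undefined;
}
```

With helpers like these, a tool result such as `{"success":false,"error":"tab not found"}` can be shown to the user as an error message rather than as a raw JSON string.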

### Tool bundle composition

`allBrowserTools` ships 32 tools by default. A small set of additional tools is intentionally disabled to avoid known issues: `switch_to_tab` (context switching), `duplicate_tab` (not enabled), and the `wait_*` helpers (disabled for stability). The 32 enabled tools cover seven tab-management actions, seven UI operations, four page-level helpers, two screenshot tools, two download tools, four intervention helpers, and six skill-management tools.

Source: [packages/browser-runtime/README.md:1-55](https://github.com/AIPexStudio/AIPex/blob/main/packages/browser-runtime/README.md?plain=1#L1-L55).
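As a quick consistency check, the per-category counts listed above add up to the stated total of 32. The sketch below keys the counts by the category names from the Tool-driven UI table. That pairing of counts to category keys is an assumption made for illustration.

```typescript
// Counts taken from the paragraph above; the category keys are an assumed mapping.
const enabledToolCountsSketch: Record<string, number> = {
  browser: 7, // tab-management actions
  ui: 7, // UI operations
  page: 4, // page-level helpers
  screenshot: 2,
  download: 2,
  intervention: 4,
  skill: 6, // skill-management tools
};

const totalEnabledToolsSketch = Object.keys(enabledToolCountsSketch).reduce(
  (sum, category) => sum + enabledToolCountsSketch[category],
  0,
);

console.log(totalEnabledToolsSketch); // 32
```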

### Known operational issues from the community

- Repository size is large (≈97 MiB packed); tracking is logged in issue #53.
- The MCP bridge only supports a single Claude session at a time (issue #202); a second Claude window fails to connect while the first keeps working.
- Snapshots do not yet recursively descend into iframes (issue #100); pages containing iframes cannot have their inner elements located.
- A "Requested device not found" error has been reported (issue #80), usually when the native messaging host is not registered.
- Firefox support is an open feature request (issue #1); the team has stated the codebase will be refactored to support multiple browsers, but Firefox is not yet a supported target.

Source: [README.md:1-30](https://github.com/AIPexStudio/AIPex/blob/main/README.md?plain=1#L1-L30).

## See Also

- [Core Agent & Sessions](./core-agent.md)
- [DOM Snapshot Library](./dom-snapshot.md)
- [Browser Runtime & Tool Registry](./browser-runtime.md)
- [MCP Bridge](./mcp-bridge.md)

---


## Pitfall Log

Project: AIPexStudio/AIPex

Summary: Found 8 structured pitfall items, including 1 high/blocking item. Top priority: Security or permission risk - Security or permission risk requires verification.

## 1. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: packet_text.keyword_scan | https://github.com/AIPexStudio/AIPex

## 2. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/AIPexStudio/AIPex

## 3. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/AIPexStudio/AIPex

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AIPexStudio/AIPex

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/AIPexStudio/AIPex

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/AIPexStudio/AIPex

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AIPexStudio/AIPex

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/AIPexStudio/AIPex

