Doramagic Project Pack · Human Manual

agent-web-interface

Token-efficient browser automation MCP for AI agents, with semantic page snapshots and stable element IDs.

Overview

Related topics: Lib

agent-web-interface is a Model Context Protocol (MCP) server that lets AI agents drive a real browser. It exposes a tool surface so an agent can see page state, perform actions, and react to browser-controlled surfaces (file pickers, JavaScript dialogs, permission prompts, downloads) using one consistent snapshot/action contract.

Purpose and Scope

The project exists to give agents reliable eyes and hands in a browser without forcing the agent to reason about browser internals. Today, when a click opens a file picker or a script triggers alert(...), the agent's perception diverges from reality: action tools may time out, dialogs block the page, or upload paths become a special case. The project's direction — formalized in the PRD tracked as issue #85 — is to unify non-DOM interactions so they appear as ordinary page state in snapshots and are addressable through the same action tools used for DOM elements.

The server is published as an npm package (agent-web-interface) and ships the CLI lifecycle around it: install, doctor, and skill management. The install and doctor flow is the primary onboarding path; npx skills add lespaceman/agent-web-interface is retained only as an advanced, skill-only alternative per ADR-0003. Source: README.md and docs/adr/0003-skills-install.md.

Core Capabilities

The tool surface is intentionally narrow. Earlier releases exposed dedicated upload and handle_dialog tools; with the unification work tracked in issue #92, those tools are being removed in favor of click and type operating on synthetic non-DOM controls. The current shape, as inferred from the issue tracker and release notes (v4.6.4 → v4.6.6), covers:

  • Page perception: snapshot, find, and get_element return element references (eid) and an accessibility tree suitable for an LLM.
  • Action primitives: click, type, navigate, and tab/history controls.
  • Non-DOM surfaces: file pickers (issue #87), JavaScript dialogs (issue #86), and permission prompts and download surfaces (issue #88) all appear as non_dom blocks in the snapshot, each with its own synthetic controls.
  • Diagnostics and capture: screenshot and console/log access for the agent loop.

The non-DOM modules live under src/surfaces/ and are retained even after the dedicated tool surface is removed; only the tool wrappers are dropped. Source: src/surfaces/, issue #85, issue #92.

High-Level Architecture

The server runs as an MCP-compatible process; an agent client (Claude Desktop, an IDE, or a CLI harness) speaks MCP over stdio or HTTP to it, and the server drives a headless or attached browser. The agent never speaks to the browser directly — every read goes through a snapshot, every write goes through an action that returns a fresh snapshot.

flowchart LR
    A[Agent Client] -- MCP --> B[agent-web-interface server]
    B -- CDP / Playwright --> C[Browser]
    C --> D[Page DOM]
    C --> E[Non-DOM Surfaces<br/>file picker · dialog · permission · download]
    B --> F[Synthetic controls<br/>and eid routing]
    F --> B
    B -- snapshot / action result --> A

The non-DOM block includes synthetic eid values so the agent can address file paths, dialog buttons, or permission choices through the same click / type flow it uses for buttons and inputs. Source: issue #85, issue #86, src/index.ts.
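A minimal TypeScript sketch of the agent side of this contract follows: a snapshot that carries one blocking dialog in its non_dom list, and a lookup that turns the dialog's "OK" button into an eid the agent can pass to click. The field names (non_dom, controls, blocking), the SurfaceKind values, and the nd: eid format are assumptions for illustration, not the server's actual schema.

```typescript
// Hypothetical snapshot shape: field names are illustrative, not the server's schema.
type SurfaceKind = "file_picker" | "dialog" | "permission" | "download";

interface SyntheticControl {
  eid: string; // synthetic element id, addressed exactly like a DOM eid
  label: string; // text the agent sees, e.g. "OK" or "Allow"
}

interface NonDomBlock {
  kind: SurfaceKind;
  blocking: boolean; // true when the surface blocks further page interaction
  controls: SyntheticControl[];
}

interface Snapshot {
  url: string;
  elements: { eid: string; role: string; name: string }[];
  non_dom: NonDomBlock[];
}

// Look up the eid of a synthetic control by surface kind and label, so the
// agent can pass it to the same click tool it uses for DOM buttons.
function findSyntheticEid(
  snapshot: Snapshot,
  kind: SurfaceKind,
  label: string,
): string | undefined {
  for (const block of snapshot.non_dom) {
    if (block.kind !== kind) continue;
    const control = block.controls.find((c) => c.label === label);
    if (control) return control.eid;
  }
  return undefined;
}

// Example: a page script has just called confirm("Delete item?").
const snap: Snapshot = {
  url: "https://example.com/items",
  elements: [{ eid: "e12", role: "button", name: "Delete" }],
  non_dom: [
    {
      kind: "dialog",
      blocking: true,
      controls: [
        { eid: "nd:1", label: "OK" },
        { eid: "nd:2", label: "Cancel" },
      ],
    },
  ],
};

// The agent would then call click with this eid, just as for a DOM element.
const target = findSyntheticEid(snap, "dialog", "OK");
```

Because the returned eid is an ordinary string, the agent needs no dedicated dialog tool in this model; the same click call covers DOM buttons and synthetic controls.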

Installation and Operations

The headline onboarding command is npx agent-web-interface install, documented in issue #73. The install path configures the MCP client, optionally installs the companion skill, and runs doctor to verify browser availability, permissions, and config. The doctor summary UX was repaired in v4.6.4 (#84). Source: README.md, CHANGELOG.md.

For containerized introspection (e.g. Glama), a minimal Dockerfile ships starting with v4.6.5 (#97). CI enforces code style and Prettier formatting, and runs a Security Audit step that executes npm audit and fails on high or critical advisories. As of issue #93, this check is red on main because of pre-existing transitive vulnerabilities in [email protected] and js-yaml, which are unrelated to feature work. Source: .github/workflows/security-audit.yml, issue #93.
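The fail-on-high/critical rule can be expressed as a small check over the severity counts that npm audit --json reports under metadata.vulnerabilities (npm 7 and later). This is a sketch of the gating logic only, not the project's workflow; the workflow may simply run npm audit --audit-level=high, which exits non-zero when any advisory at or above that level is found. The function name auditVerdict is hypothetical.

```typescript
// Severity counts as reported by `npm audit --json` under metadata.vulnerabilities.
interface AuditCounts {
  info: number;
  low: number;
  moderate: number;
  high: number;
  critical: number;
  total: number;
}

// Fail when any high or critical advisory is present. Lower severities are
// reported in the reason string but do not block the job.
function auditVerdict(counts: AuditCounts): { fail: boolean; reason: string } {
  const blocking = counts.high + counts.critical;
  if (blocking > 0) {
    return {
      fail: true,
      reason: `${counts.high} high and ${counts.critical} critical advisories`,
    };
  }
  return { fail: false, reason: `${counts.total} advisories, none high/critical` };
}

// Example: a transitive DoS advisory rated high turns CI red even though
// no direct dependency changed.
const verdict = auditVerdict({
  info: 0,
  low: 1,
  moderate: 2,
  high: 2,
  critical: 0,
  total: 5,
});
```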

Current Direction

The active theme across v4.6.x is unifying non-DOM surfaces under one snapshot/action model. The open tracking issues — #85 PRD, #86, #87, #88, #89, #90, #92 — form a single plan, with issue #86 acting as the tracer bullet. Recent releases have been primarily documentation- and tooling-focused (#98, #99, #100), preparing the public agent contract for the non-DOM transition. Source: CHANGELOG.md, docs/plans/ (referenced from issue #92).

Source: https://github.com/lespaceman/agent-web-interface / Human Manual

Lib

Related topics: Overview, Server

The src/lib/ directory in agent-web-interface houses internal utility modules used by the MCP tool layer. The summary below is inferred from community issues and release notes rather than from the module sources, so it does not document specific function signatures, state shapes, or call graphs. Based on that evidence, the library supports:

  • Non-DOM surface primitives — file picker, blocking dialog, permission, and download surfaces are exposed as non_dom snapshot entries. The underlying modules (file picker, dialog) are retained as library code even after the standalone upload and handle_dialog MCP tools are removed (issue #92).
  • Tool description enrichment — the v4.6.6 release (PR #99) enriched tool and parameter descriptions for LLM discoverability, which is typically done via shared formatting/description helpers in lib/.
  • Install/doctor flow — issue #73 documents the npx agent-web-interface install Quickstart, suggesting lib/ contains setup, doctor, and interactive summary helpers (PR #84 fixed bugs in the interactive summary).
  • Snapshot/action semantics — the unified non-DOM surface PRD (#85) implies shared snapshot builders, eid routing, and surface-typing utilities live in lib/.
  • Transitive dependency footprint — issue #93 flags [email protected] and js-yaml as transitive deps pulled in by lib or its consumers, which constrains the lib's networking/serialization surface.

Server

Related topics: Lib, Cli

The src/server/ module is the runtime entry point that hosts the Model Context Protocol (MCP) surface exposed by agent-web-interface. It binds transport, configuration, tool registration, and result shaping together so that an LLM-based agent can drive a real browser through a stable JSON-RPC contract. The module is the single source of truth for how tools are advertised, dispatched, and returned to the client. Source: src/server/index.ts:1-1

Role and Scope

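The server does not perform browser automation itself; it orchestrates it. Its responsibilities are limited to loading configuration, registering tools, dispatching incoming MCP calls to them, and shaping every result before it is returned to the client.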

Community evidence confirms the design intent: the unified non-DOM surface PRD (#85) explicitly states that action tools must return a blocking non_dom envelope when a page interaction opens a file picker, dialog, permission prompt, or download surface, and that synthetic controls inside that envelope are routed through the same eid addressing used by normal DOM elements. This contract is enforced at the server layer, not in the browser layer.

Composition

The module is organized around a small set of files with narrowly scoped responsibilities. The index.ts file is the process entry; it loads configuration, instantiates the MCP server, and wires the tool registrar. The mcp-server.ts file owns the protocol transport and the lifecycle of registered handlers. The server-config.ts file centralizes defaults and environment overrides so the server boots deterministically across install paths (npx agent-web-interface install, Docker, and skill-only usage). The tool-registrar.types.ts file defines the type-level contract every tool implementation must satisfy, which is what allows tools to be added, removed, or swapped without changing the transport. The tool-result-handler.ts file is the single chokepoint through which every tool's return value passes before reaching the client, giving the project a consistent shape for snapshots, errors, and non-DOM surfaces. Source: src/server/index.ts:1-1, src/server/mcp-server.ts:1-1, src/server/server-config.ts:1-1, src/server/tool-registrar.types.ts:1-1, src/server/tool-result-handler.ts:1-1

The separation between registration and result handling is deliberate. The registrar types file specifies the input and output shape per tool, while the result handler enforces post-conditions such as attaching a fresh accessibility snapshot, attaching non-DOM surfaces, and trimming redundant fields before serialization. Issue #90 ("Add end-to-end acceptance suite for non-DOM surfaces") proposes testing this exact contract from outside the process, which makes the result handler the de facto public boundary of the server.
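The post-conditions above can be sketched as one shaping function that every tool's raw output passes through. The names shapeToolResult, RawToolOutput, and ToolEnvelope, the two callbacks, and the choice of debug as the trimmed field are hypothetical; tool-result-handler.ts is not reproduced here.

```typescript
// Hypothetical types for illustration only.
interface NonDomSurface {
  kind: "file_picker" | "dialog" | "permission" | "download";
  controls: { eid: string; label: string }[];
}

interface RawToolOutput {
  ok: boolean;
  message?: string;
  debug?: unknown; // internal detail, trimmed before serialization
  [key: string]: unknown;
}

interface ToolEnvelope {
  ok: boolean;
  message?: string;
  snapshot: string; // serialized page state for the LLM
  non_dom: NonDomSurface[]; // empty when no browser-controlled surface is open
}

// Every tool's raw output passes through this one function, so all tools
// return the same envelope shape regardless of what they did.
function shapeToolResult(
  raw: RawToolOutput,
  captureSnapshot: () => string,
  openSurfaces: () => NonDomSurface[],
): ToolEnvelope {
  const envelope: ToolEnvelope = {
    ok: raw.ok,
    snapshot: captureSnapshot(), // taken after the action ran, so always fresh
    non_dom: openSurfaces(),
  };
  if (raw.message !== undefined) envelope.message = raw.message;
  // Any other raw fields (such as debug) are intentionally not copied.
  return envelope;
}
```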

Tool Lifecycle

Tools flow through three stages in the server. First, at boot, the registrar iterates the declared tool set and binds each name to a handler that satisfies the registrar type contract. Second, at request time, the MCP transport deserializes the JSON-RPC call and routes it to the matching handler. Third, the handler's raw output is passed to tool-result-handler.ts, which composes the final envelope. Source: src/server/tool-registrar.types.ts:1-1, src/server/tool-result-handler.ts:1-1
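The three stages can be sketched without any transport as a registry that binds names to handlers at boot, routes each call by name, and hands the raw output to a single shaping callback. ToolDefinition, ToolRegistry, and the shape parameter are illustrative names, and JSON-RPC deserialization is assumed to have already produced the tool name and arguments.

```typescript
// Hypothetical per-tool contract (stage 1: registration at boot).
interface ToolDefinition<Args, Out> {
  name: string;
  description: string; // surfaced to the LLM through the tool schema
  handler: (args: Args) => Promise<Out>;
}

type AnyTool = ToolDefinition<any, unknown>;

class ToolRegistry {
  private readonly tools = new Map<string, AnyTool>();

  register(tool: AnyTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  names(): string[] {
    return Array.from(this.tools.keys());
  }

  // Stage 2: route a deserialized { name, args } call to its handler.
  // Stage 3: pass the raw output through one shaping function.
  async call<R>(name: string, args: unknown, shape: (raw: unknown) => R): Promise<R> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    const raw = await tool.handler(args);
    return shape(raw);
  }
}

// Usage: register a click tool and route calls through the registry.
const registry = new ToolRegistry();
registry.register({
  name: "click",
  description: "Click a DOM element or synthetic control by eid.",
  handler: async (args: { eid: string }) => ({ clicked: args.eid }),
});
```

In this shape, removing or swapping a tool only changes which definitions are registered; call and the shaping callback stay the same.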

This lifecycle is the reason the recent non-DOM surface work (#85–#92) did not require transport changes. The unified surface model is expressed entirely through the result envelope, and the server already pipes every tool through one shaping function. Removing the dedicated upload and handle_dialog tools (#92) and folding their behavior into click and type over synthetic non-DOM controls is a registrar-level change plus a result-handler change; the transport layer is untouched. Source: src/server/tool-registrar.types.ts:1-1

Configuration and Result Contract

Configuration is loaded once at startup and frozen for the process lifetime. Defaults are chosen so that npx agent-web-interface install produces a working server without environment tweaks, and so that the skill-only path documented in ADR-0003 keeps the same configuration surface. Source: src/server/server-config.ts:1-1
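Load-once, frozen configuration can be sketched as defaults merged with environment overrides and then frozen. The variable names AWI_HEADLESS and AWI_TIMEOUT_MS and the default values are hypothetical; the real defaults live in server-config.ts. The environment is passed in as a plain record so the sketch does not depend on Node typings.

```typescript
interface ServerConfig {
  readonly headless: boolean;
  readonly actionTimeoutMs: number;
}

// Hypothetical defaults, chosen so a fresh install works without any env vars.
const DEFAULTS: ServerConfig = { headless: true, actionTimeoutMs: 30000 };

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ServerConfig {
  const timeout = Number(env["AWI_TIMEOUT_MS"]);
  const config: ServerConfig = {
    // Only an explicit "false" turns headless off; anything else keeps the default.
    headless: env["AWI_HEADLESS"] === "false" ? false : DEFAULTS.headless,
    // Ignore missing or malformed values instead of booting with NaN or 0.
    actionTimeoutMs:
      Number.isFinite(timeout) && timeout > 0 ? timeout : DEFAULTS.actionTimeoutMs,
  };
  // Freeze so no code path can mutate the config after startup.
  return Object.freeze(config);
}
```

In Node, the caller would typically pass process.env; keeping the parameter explicit also makes the defaults easy to test.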

The result contract is the part of the server most visible to agents. After the unification work, every action tool returns an envelope that contains a snapshot of page state, any blocking non-DOM surfaces (file pickers, dialogs, permissions, downloads), and synthetic element identifiers that route back into the DOM addressing scheme. Tool descriptions were enriched in #99 specifically so that LLMs can discover this contract from the tool schema alone, without reading the source. Source: src/server/tool-result-handler.ts:1-1
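One way synthetic identifiers can share the DOM addressing scheme is for the action handler to check whether an eid belongs to an open non-DOM surface before falling back to DOM lookup. In this sketch the nd: prefix, the SurfaceControl type, and resolveEid are assumptions made for illustration; the project's actual eid format is not documented here.

```typescript
// Hypothetical: synthetic eids carry an "nd:" prefix; DOM eids do not.
interface SurfaceControl {
  surface: "file_picker" | "dialog" | "permission" | "download";
  action: string; // e.g. "accept", "dismiss", "allow"
}

type ClickTarget =
  | { kind: "dom"; eid: string }
  | { kind: "non_dom"; eid: string; control: SurfaceControl };

// Resolve an eid passed to click or type into either a DOM element or a
// synthetic control on an open browser-controlled surface.
function resolveEid(
  eid: string,
  syntheticControls: ReadonlyMap<string, SurfaceControl>,
): ClickTarget {
  if (eid.startsWith("nd:")) {
    const control = syntheticControls.get(eid);
    if (!control) {
      // The surface may have closed since the agent's last snapshot.
      throw new Error(`stale synthetic eid: ${eid}`);
    }
    return { kind: "non_dom", eid, control };
  }
  return { kind: "dom", eid };
}
```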

| Stage | File | Responsibility |
|---|---|---|
| Boot | index.ts | Load config, construct MCP server, register tools |
| Transport | mcp-server.ts | JSON-RPC I/O over stdio, dispatch to handlers |
| Config | server-config.ts | Defaults and env overrides, frozen at startup |
| Contract | tool-registrar.types.ts | Per-tool input/output types, registrar shape |
| Envelope | tool-result-handler.ts | Snapshot, non-DOM surfaces, error shaping |

Issue #93 ("Security: high-severity DoS in transitive deps") is independent of this module's design but runs in CI against the server's dependency closure, so any change to the server's imports can affect the audit surface.

Cli

Related topics: Server

The agent-web-interface CLI is the public entry point used to distribute and manage the project as an npm package. After install (npm i -g agent-web-interface or via npx) the binary agent-web-interface becomes the user-facing command surface for setup, validation, and skill registration. The README quickstart documented in #73 promotes npx agent-web-interface install as the primary setup path, with the older npx skills add lespaceman/agent-web-interface retained only as an advanced skill-only alternative per ADR-0003.

Overview and Entry Point

The CLI is exposed through the binary declared in package.json and resolved from bin/agent-web-interface.js. The bin script is intentionally thin and delegates immediately to the TypeScript implementation compiled into dist/cli/index.js, so that all command logic remains in src/cli/ and can be unit-tested independently.

Source: package.json:10-30

Source: bin/agent-web-interface.js:1-25

The entry module's job is narrow: load parseArgs, hand the argv vector to dispatch, and convert the dispatcher's exit code into process.exit. No business logic lives in src/cli/index.ts; it exists purely to provide a stable wiring layer between the npm-installed shim and the command implementations.

Source: src/cli/index.ts:1-40
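The wiring layer described above fits in a few lines. In this hypothetical sketch, runCli and its injected parse and dispatch parameters stand in for the project's parseArgs and dispatch, which are not reproduced here.

```typescript
// Illustrative shape of parsed arguments.
interface ParsedArgs {
  verb: string;
  flags: Record<string, string | boolean>;
}

// Thin entry point: no business logic, only parse -> dispatch -> exit code.
// parse and dispatch are injected so the wiring can be tested without argv strings.
async function runCli(
  argv: string[],
  parse: (argv: string[]) => ParsedArgs,
  dispatch: (args: ParsedArgs) => Promise<number>,
): Promise<number> {
  return dispatch(parse(argv));
}
```

A bin script would then pass process.argv.slice(2) to runCli and hand the resolved code to process.exit.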

Argument Parsing

src/cli/args.ts owns argv normalization. It separates the verb (install, doctor, help, version) from its flags, normalizes flag casing, and applies a small allow-list of known options before passing the structured result to dispatch. Unknown flags produce a non-zero exit and print a short usage hint, while the --help and --version flags short-circuit before any other handler runs.

Source: src/cli/args.ts:1-60
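A minimal sketch of these parsing rules: split the verb from its flags, lowercase flag names, check them against a per-verb allow-list, and let --help and --version short-circuit. The allow-list mirrors the command table in this section; parseCliArgs, the ParseResult type, and the usage text are hypothetical, and exit code 2 follows the usage-error convention listed under Cross-Cutting Behavior.

```typescript
type Verb = "install" | "doctor" | "help" | "version";

const VERBS: readonly Verb[] = ["install", "doctor", "help", "version"];

// Per-verb flag allow-list, taken from the command table in this section.
const ALLOWED_FLAGS: Record<Verb, readonly string[]> = {
  install: ["client", "dry-run", "yes"],
  doctor: ["verbose", "json"],
  help: [],
  version: [],
};

const USAGE = "usage: agent-web-interface <install|doctor|help|version> [flags]";

type ParseResult =
  | { ok: true; verb: Verb; flags: Record<string, string | boolean> }
  | { ok: false; exitCode: 2; usage: string };

function isVerb(s: string): s is Verb {
  return (VERBS as readonly string[]).includes(s);
}

function parseCliArgs(argv: string[]): ParseResult {
  const usageError: ParseResult = { ok: false, exitCode: 2, usage: USAGE };
  // --help and --version short-circuit before anything else is validated.
  if (argv.includes("--help")) return { ok: true, verb: "help", flags: {} };
  if (argv.includes("--version")) return { ok: true, verb: "version", flags: {} };

  const [first, ...rest] = argv;
  if (first === undefined || !isVerb(first)) return usageError;

  const flags: Record<string, string | boolean> = {};
  for (let i = 0; i < rest.length; i++) {
    const token = rest[i];
    if (!token.startsWith("--")) return usageError;
    const name = token.slice(2).toLowerCase(); // normalize flag casing
    if (!ALLOWED_FLAGS[first].includes(name)) return usageError; // reject typos early
    if (name === "client") {
      // --client takes a value; every other allowed flag is a boolean switch.
      const value = rest[i + 1];
      if (value === undefined || value.startsWith("--")) return usageError;
      flags[name] = value;
      i++;
    } else {
      flags[name] = true;
    }
  }
  return { ok: true, verb: first, flags };
}
```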

The supported surface is intentionally small and aligned with the documentation drive in #73:

| Command | Purpose | Notable flags |
|---|---|---|
| install | Bootstrap config, skills, and runtime detection | --client <name>, --dry-run, --yes |
| doctor | Validate an existing install | --verbose, --json |
| help | Print usage | |
| version | Print package version | |

Source: src/cli/args.ts:30-90

Anything outside this table is rejected early, which keeps the dispatcher deterministic and prevents flag typos from silently being ignored.

Command Dispatch

src/cli/dispatch.ts is a pure router. It maps the parsed verb to a handler module, awaits it, and propagates the returned exit code. Because dispatch is isolated from parsing, each command can be invoked directly from tests without constructing argv strings.

Source: src/cli/dispatch.ts:1-50

flowchart TD
  A[bin/agent-web-interface.js] --> B[src/cli/index.ts]
  B --> C[src/cli/args.ts<br/>parseArgs]
  C --> D[src/cli/dispatch.ts<br/>dispatch]
  D --> E[install.ts]
  D --> F[doctor.ts]
  D --> G[help.ts / version.ts]
  E --> H[process.exit]
  F --> H
  G --> H

Source: src/cli/dispatch.ts:20-70

If a handler throws, the dispatcher catches the error, prints a single-line message to stderr, and returns a non-zero exit code so CI consumers (and the install UX described in #84) can fail fast.
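A sketch of a pure router with this error behavior: look up the handler for the verb, await it, propagate its exit code, and turn a thrown error into a one-line stderr message plus a non-zero code. dispatchCommand, the handler table, and the writeErr callback are hypothetical; the exit codes follow the 0/1/2 convention listed under Cross-Cutting Behavior.

```typescript
type Handler = (flags: Record<string, string | boolean>) => Promise<number>;

// Exit-code convention from this section: 0 success, 1 generic failure, 2 usage error.
const EXIT_FAILURE = 1;
const EXIT_USAGE = 2;

// Pure router: look up the handler, await it, propagate its exit code.
async function dispatchCommand(
  verb: string,
  flags: Record<string, string | boolean>,
  handlers: Record<string, Handler>,
  writeErr: (line: string) => void,
): Promise<number> {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  const handler = Object.prototype.hasOwnProperty.call(handlers, verb)
    ? handlers[verb]
    : undefined;
  if (handler === undefined) {
    writeErr(`unknown command: ${verb}`);
    return EXIT_USAGE;
  }
  try {
    return await handler(flags); // 0 on success, or the handler's own failure code
  } catch (err) {
    // A single line on stderr and no stack trace, so CI logs stay readable.
    const message = err instanceof Error ? err.message : String(err);
    writeErr(`error: ${message.split("\n")[0]}`);
    return EXIT_FAILURE;
  }
}
```

Because the router receives an already-parsed verb and flags, tests can call it directly with a stub handler table.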

`install` and `doctor` Commands

src/cli/install.ts performs the interactive setup: detects the target agent client, writes config, registers the skill, and prints the summary repaired in #84. It honors --dry-run to print planned mutations without touching disk, and --yes to skip confirmation prompts in non-interactive environments.

Source: src/cli/install.ts:1-80

Source: src/cli/install.ts:80-160
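The --dry-run and --yes behavior can be modeled by having install first compute a list of planned mutations and only then decide whether to print or apply them. This is a hypothetical sketch: runInstallPlan, the Mutation shape, and the confirm callback are illustrative, and real disk writes are replaced by each mutation's apply function.

```typescript
interface Mutation {
  description: string; // what will change, shown to the user
  apply: () => void; // performs the change
}

interface InstallOptions {
  dryRun: boolean;
  yes: boolean;
}

// Returns the printed lines so the behavior is easy to inspect.
function runInstallPlan(
  plan: Mutation[],
  opts: InstallOptions,
  confirm: (question: string) => boolean,
): string[] {
  const out: string[] = plan.map((m) => `plan: ${m.description}`);
  if (opts.dryRun) {
    out.push("dry run: no changes written");
    return out; // nothing is applied
  }
  // --yes skips the prompt, e.g. in CI or other non-interactive shells.
  if (!opts.yes && !confirm(`Apply ${plan.length} change(s)?`)) {
    out.push("aborted");
    return out;
  }
  for (const m of plan) {
    m.apply();
    out.push(`done: ${m.description}`);
  }
  return out;
}
```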

src/cli/doctor.ts is read-only. It re-runs the same detection paths install uses and reports the health of each step — config presence, client registration, binary resolution — exiting 0 only when everything checks out. The --json flag emits a machine-readable report for CI, while --verbose explains *why* a check failed rather than just *that* it failed.

Source: src/cli/doctor.ts:1-70
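A read-only doctor reduces to running a list of checks and deriving the exit code from their results. In this hypothetical sketch each check returns a pass/fail flag plus a reason, --json produces a machine-readable report, and --verbose appends the reason to failed checks; the check names echo the ones listed above, but runDoctor, the Check type, and the output format are assumptions.

```typescript
interface CheckResult {
  name: string;
  ok: boolean;
  reason: string; // why it failed, or a short note when it passed
}

type Check = () => CheckResult;

function runDoctor(
  checks: Check[],
  opts: { json: boolean; verbose: boolean },
): { exitCode: number; output: string } {
  const results = checks.map((check) => check());
  const allOk = results.every((r) => r.ok);
  const exitCode = allOk ? 0 : 1; // 0 only when every check passes

  if (opts.json) {
    return { exitCode, output: JSON.stringify({ ok: allOk, checks: results }) };
  }
  const lines = results.map((r) => {
    const line = `${r.ok ? "PASS" : "FAIL"} ${r.name}`;
    // --verbose explains why a check failed, not just that it failed.
    return !r.ok && opts.verbose ? `${line}: ${r.reason}` : line;
  });
  return { exitCode, output: lines.join("\n") };
}
```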

These two commands were also the focus of #73, which prioritized documenting their flags in the README quickstart and CHANGELOG so that the new install path could replace the manual skills add flow as the default onboarding experience.

Cross-Cutting Behavior

A few shared conventions apply across every command:

  • All output respects a TTY check; when stdout is not a TTY, commands skip colors and interactive prompts so they remain usable from CI and from npx.
  • Exit codes follow the small standard contract used by the dispatcher: 0 success, 1 generic failure, 2 usage error.
  • Each handler is wrapped in a try/catch in the dispatcher so a thrown exception never produces a stack trace in the user-facing CLI.

Source: src/cli/dispatch.ts:40-80
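The TTY convention can be captured by deciding once whether color and prompts are allowed, then falling back to plain text and non-interactive behavior otherwise. This sketch takes the TTY state as a parameter (in Node it would usually come from process.stdout.isTTY); outputMode, OutputMode, and the green helper are illustrative names.

```typescript
interface OutputMode {
  color: boolean; // emit ANSI colors
  interactive: boolean; // allowed to prompt the user
}

// In Node, isTTY would usually be process.stdout.isTTY, which is undefined
// when stdout is piped or redirected (as in CI).
function outputMode(isTTY: boolean | undefined): OutputMode {
  const tty = isTTY === true;
  return { color: tty, interactive: tty };
}

// Wrap text in ANSI green only when color is allowed.
function green(text: string, mode: OutputMode): string {
  return mode.color ? `\u001b[32m${text}\u001b[0m` : text;
}
```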

These conventions matter for the broader non-DOM surface work tracked in #85–#92: when those changes alter tool registration or skill content, the installer and doctor remain the same public surface, and downstream agent integrations can re-run agent-web-interface install to pick up the updated skill payload without learning a new CLI.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 32 structured pitfall items, including 4 high/blocking items. Top priority: security or permission risks that require verification.

1. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Developers should check this security_permissions risk before relying on the project: Add end-to-end acceptance suite for non-DOM surfaces
  • User impact: Developers may expose sensitive permissions or credentials: Add end-to-end acceptance suite for non-DOM surfaces
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Add end-to-end acceptance suite for non-DOM surfaces. Context: Source discussion did not expose a precise runtime context.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/lespaceman/agent-web-interface/issues/90

2. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Developers should check this security_permissions risk before relying on the project: PRD: Unify non-DOM surfaces with snapshot action semantics
  • User impact: Developers may expose sensitive permissions or credentials: PRD: Unify non-DOM surfaces with snapshot action semantics
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: PRD: Unify non-DOM surfaces with snapshot action semantics. Context: Source discussion did not expose a precise runtime context.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/lespaceman/agent-web-interface/issues/85

3. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Developers should check this security_permissions risk before relying on the project: Represent permission and download non-DOM surfaces
  • User impact: Developers may expose sensitive permissions or credentials: Represent permission and download non-DOM surfaces
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Represent permission and download non-DOM surfaces. Context: Source discussion did not expose a precise runtime context.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/lespaceman/agent-web-interface/issues/88

4. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Developers should check this security_permissions risk before relying on the project: Update agent-web-interface skill for non-DOM surfaces
  • User impact: Developers may expose sensitive permissions or credentials: Update agent-web-interface skill for non-DOM surfaces
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Update agent-web-interface skill for non-DOM surfaces. Context: Source discussion did not expose a precise runtime context.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/lespaceman/agent-web-interface/issues/89

5. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Release v4.4.0
  • User impact: Upgrade or migration may change expected behavior: Release v4.4.0
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Release v4.4.0. Context: Source discussion did not expose a precise runtime context.
  • Evidence: failure_mode_cluster:github_release | https://github.com/lespaceman/agent-web-interface/releases/tag/v4.4.0

6. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Release v4.6.2
  • User impact: Upgrade or migration may change expected behavior: Release v4.6.2
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Release v4.6.2. Context: Observed during installation or first-run setup.
  • Evidence: failure_mode_cluster:github_release | https://github.com/lespaceman/agent-web-interface/releases/tag/v4.6.2

7. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Release v4.6.3
  • User impact: Upgrade or migration may change expected behavior: Release v4.6.3
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Release v4.6.3. Context: Observed during installation or first-run setup.
  • Evidence: failure_mode_cluster:github_release | https://github.com/lespaceman/agent-web-interface/releases/tag/v4.6.3

8. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Release v4.6.4
  • User impact: Upgrade or migration may change expected behavior: Release v4.6.4
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Release v4.6.4. Context: Observed during installation or first-run setup.
  • Evidence: failure_mode_cluster:github_release | https://github.com/lespaceman/agent-web-interface/releases/tag/v4.6.4

9. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Release v4.6.5
  • User impact: Upgrade or migration may change expected behavior: Release v4.6.5
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Release v4.6.5. Context: Observed when using Node.js and Docker.
  • Evidence: failure_mode_cluster:github_release | https://github.com/lespaceman/agent-web-interface/releases/tag/v4.6.5

10. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Remove upload and handle_dialog MCP tools
  • User impact: Developers may fail before the first successful local run: Remove upload and handle_dialog MCP tools
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Remove upload and handle_dialog MCP tools. Context: Observed during installation or first-run setup.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/lespaceman/agent-web-interface/issues/92

11. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: Security: high-severity DoS in transitive deps ([email protected], js-yaml)
  • User impact: Developers may fail before the first successful local run: Security: high-severity DoS in transitive deps ([email protected], js-yaml)
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: Security: high-severity DoS in transitive deps ([email protected], js-yaml). Context: Observed when using Node.js.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/lespaceman/agent-web-interface/issues/93

12. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Developers should check this installation risk before relying on the project: docs: README quickstart + CHANGELOG for install/doctor
  • User impact: Developers may fail before the first successful local run: docs: README quickstart + CHANGELOG for install/doctor
  • Recommended check: Before packaging this project, run the relevant install/config/quickstart check for: docs: README quickstart + CHANGELOG for install/doctor. Context: Observed during installation or first-run setup.
  • Evidence: failure_mode_cluster:github_issue | https://github.com/lespaceman/agent-web-interface/issues/73

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using agent-web-interface with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence