# https://github.com/callstack/agent-device Project Manual

Generated at: 2026-06-18 04:09:09 UTC

## Table of Contents

- [System Architecture & Daemon](#page-1)
- [Platform Backends & Runtime](#page-2)
- [Replay System & E2E Workflows](#page-3)
- [AI Agent Integration & CLI Surface](#page-4)

<a id='page-1'></a>

## System Architecture & Daemon

### Related Pages

Related topics: [Platform Backends & Runtime](#page-2), [AI Agent Integration & CLI Surface](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/commands/observability/runtime/diagnostics.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts)
- [src/commands/observability/runtime/diagnostics-format.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-format.ts)
- [src/commands/observability/runtime/diagnostics-types.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-types.ts)
- [src/commands/observability/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/index.ts)
- [src/commands/management/runtime/admin.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin.ts)
- [src/commands/management/runtime/admin-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin-router.test.ts)
- [src/commands/management/runtime/apps.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/apps.ts)
- [src/commands/recording/runtime/recording.ts](https://github.com/callstack/agent-device/blob/main/src/commands/recording/runtime/recording.ts)
- [src/commands/capture/runtime/snapshot.ts](https://github.com/callstack/agent-device/blob/main/src/commands/capture/runtime/snapshot.ts)
- [package.json](https://github.com/callstack/agent-device/blob/main/package.json)
- [README.md](https://github.com/callstack/agent-device/blob/main/README.md)
- [ios-runner/README.md](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)
</details>

# System Architecture & Daemon

`agent-device` is a device-automation CLI that gives AI agents structured access to real iOS, Android, TV, and desktop applications. The system is organised around a small, layered runtime that decouples protocol/transport concerns from platform-specific automation backends and from the agent-facing command surface. This page describes the high-level architecture and the role of the long-running daemon that fronts the runtime.

## Purpose and Scope

The runtime exists so that one CLI can speak to many platforms (iOS Simulator, Android Emulator, physical devices, tvOS, macOS, Linux, and React Native overlays) through a single, typed command set. Agents receive token-efficient snapshots, semantic refs such as `@e3`, and on-demand evidence (logs, network, video, traces) without having to know which driver sits underneath.

Source: [README.md](https://github.com/callstack/agent-device/blob/main/README.md) — describes the CLI as "Mobile app verification for AI agents" and lists platforms, capabilities, and use cases.

The runtime contract defines three collaborators that the daemon wires together: a typed `AgentDeviceBackend` that performs platform work, a session store that tracks app/device context, and an `ArtifactAdapter` that handles file inputs and outputs. Commands are plain functions of `(runtime, options)` whose dependencies are injected through the runtime, which makes them straightforward to test against a fake backend and independent of the transport that invokes them.

Source: [src/commands/observability/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/index.ts) — exposes `bindObservabilityCommands(runtime)` which turns runtime commands into bound async functions for a specific runtime instance.

## Layered Architecture

```mermaid
flowchart TB
  Agent[AI Agent / CLI client] -->|HTTP / MCP| Daemon[agent-device daemon]
  Daemon --> Router[Request Router & Command Registry]
  Router --> Runtime[AgentDeviceRuntime]
  Runtime -->|commands| Obs[Observability]
  Runtime -->|commands| Mgmt[Management / Admin / Apps]
  Runtime -->|commands| Capture[Capture / Snapshot]
  Runtime -->|commands| Rec[Recording / Trace]
  Obs --> Backend[AgentDeviceBackend]
  Mgmt --> Backend
  Capture --> Backend
  Rec --> Backend
  Backend --> iOS[iOS / XCUITest Runner]
  Backend --> Android[Android / ADB / Emulator]
  Backend --> Desktop[macOS / Linux desktop drivers]
  Runtime --> Sessions[Session Store]
  Runtime --> Artifacts[Artifact Adapter]
```

The diagram shows how the daemon funnels agent requests through a router into the runtime, which dispatches to command families, which in turn call the backend primitives. Sessions and artifact I/O are cross-cutting concerns supplied at construction time.

### Runtime and Backend Boundary

Every command is a `RuntimeCommand<Options, Result>` whose body is a thin wrapper that checks for backend support, normalises options, and delegates to a backend primitive. For example, `logsCommand`, `networkCommand`, and `perfCommand` all begin with a guard that throws `AppError('UNSUPPORTED_OPERATION', ...)` if the backend does not implement the corresponding method.

Source: [src/commands/observability/runtime/diagnostics.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts) — implements the `UNSUPPORTED_OPERATION` pattern for `backend.readLogs`, `backend.dumpNetwork`, and `backend.measurePerf`.

Source: [src/commands/management/runtime/admin.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin.ts) — applies the same pattern to `listDevices`, `bootDevice`, `shutdownDevice`, `installApp`, and `reinstallApp`.

This pattern means a single command set can be exposed through multiple front ends (local CLI, MCP server, CI runner), and a backend can opt in or out of a capability simply by implementing or omitting the corresponding method.
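The guard described above can be sketched as follows. This is a minimal illustration rather than the project's code: the `AppError` class body, the `SketchBackend` type, and the `requireReadLogs` helper are assumptions made for the example. Only the `UNSUPPORTED_OPERATION` code and the `readLogs` method name come from the source.

```typescript
// Hypothetical error type: the source only confirms the
// AppError('UNSUPPORTED_OPERATION', ...) call shape.
class AppError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Hypothetical backend shape: each capability is an optional method.
type SketchBackend = {
  readLogs?: (limit: number) => Promise<string[]>;
};

// Guard: return the primitive, or fail with a capability error.
function requireReadLogs(
  backend: SketchBackend,
): NonNullable<SketchBackend["readLogs"]> {
  if (!backend.readLogs) {
    throw new AppError(
      "UNSUPPORTED_OPERATION",
      "backend does not implement readLogs",
    );
  }
  return backend.readLogs.bind(backend);
}

// Thin command: guard first, then delegate to the backend primitive.
async function logsCommandSketch(
  backend: SketchBackend,
  limit: number,
): Promise<string[]> {
  const readLogs = requireReadLogs(backend);
  return readLogs(limit);
}
```

A caller that receives an error with `code === "UNSUPPORTED_OPERATION"` can treat it as a missing capability on the current backend and fall back to another evidence source.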

### Command Families

Commands are grouped by domain and each family provides a `bind*Commands(runtime)` helper that closes over the runtime and returns promise-returning methods. The `observability` family currently binds `logs`, `network`, and `perf`.

Source: [src/commands/observability/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/index.ts) — declares `DiagnosticsCommands`, `BoundObservabilityCommands`, and `diagnosticsCommands`.
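The binding step can be illustrated with a generic helper. The sketch below is an assumption-laden approximation: `SketchRuntime`, `SketchCommand`, `Bound`, and `bindCommands` are hypothetical names, and the real `bind*Commands` helpers may be written per family rather than generically.

```typescript
// Hypothetical minimal runtime; the real AgentDeviceRuntime carries more.
type SketchRuntime = { name: string };

// A command takes the runtime first, then its options bag.
type SketchCommand<O, R> = (runtime: SketchRuntime, options: O) => Promise<R>;

// Bound form: same command, with the runtime argument already supplied.
type Bound<C> = {
  [K in keyof C]: C[K] extends SketchCommand<infer O, infer R>
    ? (options: O) => Promise<R>
    : never;
};

// Close over one runtime instance for every command in a family.
function bindCommands<C extends Record<string, SketchCommand<any, any>>>(
  runtime: SketchRuntime,
  commands: C,
): Bound<C> {
  const bound: Record<string, (options: unknown) => Promise<unknown>> = {};
  for (const [name, command] of Object.entries(commands)) {
    bound[name] = (options) => command(runtime, options);
  }
  return bound as unknown as Bound<C>;
}

// Usage with a stand-in command family.
const sketchFamily = {
  logs: async (runtime: SketchRuntime, options: { limit: number }) =>
    `${runtime.name}: last ${options.limit} log lines`,
};
const boundObservability = bindCommands({ name: "ios-sim" }, sketchFamily);
// boundObservability.logs({ limit: 10 }) resolves to "ios-sim: last 10 log lines"
```

Binding once per runtime keeps call sites free of the runtime argument while leaving the underlying commands easy to call directly in tests.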

The `management` family owns `admin.*` (devices, boot, shutdown, install) and `apps.*` (open, close, list, push). The `capture` family handles snapshots and diffs, and the `recording` family handles screen recording and traces. Tests confirm the wiring by constructing a fake `AgentDeviceBackend` and asserting that typed primitives are called with the expected context.

Source: [src/commands/management/runtime/admin-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin-router.test.ts) — exercises `device.admin.devices({ filter: { platform: 'ios' } })` and asserts on the `adminDevices` result kind.

Source: [src/commands/observability/runtime/diagnostics-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-router.test.ts) — exercises `device.observability.logs(...)` against a `restrictedCommandPolicy()` and a memory session store, asserting redaction of sensitive fields.

## Daemon Responsibilities

The daemon is the long-lived process that:

1. Boots an HTTP/MCP transport and routes incoming requests to the command registry.
2. Constructs a single `AgentDeviceRuntime` from a backend, a session store, an `ArtifactAdapter`, and a command policy.
3. Enforces the policy (`localCommandPolicy()` for full local control, `restrictedCommandPolicy()` for read-only / agent-facing surfaces).
4. Formats results uniformly via the shared `BackendResultEnvelope` and redaction helpers before returning them.

Diagnostic results are normalised through dedicated formatters that cap payload size, redact keys matching `/(?:authorization|cookie|token|secret|password|passwd|api[-_]?key)/i`, and surface a `redacted: boolean` flag so callers know whether scrubbing occurred.

Source: [src/commands/observability/runtime/diagnostics-format.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-format.ts) — defines `formatLogsResult`, `formatNetworkResult`, `formatPerfResult`, and the secret-key regex used to redact log metadata and network bodies.
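Key-based redaction with a `redacted` flag can be sketched as below. The regex is the one quoted above. The function name `redactRecord`, the `"[REDACTED]"` placeholder, and the flat (non-recursive) traversal are assumptions for illustration, so the real formatters may differ, for example by walking nested bodies or capping payload size.

```typescript
// Pattern quoted in the text above; it has no `g` flag, so `test` is stateless.
const SECRET_KEY_PATTERN =
  /(?:authorization|cookie|token|secret|password|passwd|api[-_]?key)/i;

type RedactionResult = {
  value: Record<string, unknown>;
  redacted: boolean;
};

// Replace values whose key looks sensitive and report whether anything changed.
function redactRecord(input: Record<string, unknown>): RedactionResult {
  let redacted = false;
  const value: Record<string, unknown> = {};
  for (const [key, raw] of Object.entries(input)) {
    if (SECRET_KEY_PATTERN.test(key)) {
      value[key] = "[REDACTED]"; // placeholder text is an assumption
      redacted = true;
    } else {
      value[key] = raw;
    }
  }
  return { value, redacted };
}

const redactionExample = redactRecord({
  Authorization: "Bearer abc",
  "x-api-key": "k-123",
  path: "/home",
});
// redactionExample.redacted is true; only `path` keeps its original value.
```

Returning the flag alongside the scrubbed value lets a caller tell "nothing sensitive was present" apart from "something was removed" without diffing payloads.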

The iOS side is backed by a small XCUITest target (`AgentDeviceRunnerUITests`) that speaks a TCP/HTTP protocol to the daemon; the runner is intentionally split into focused files (`RunnerTests+Interaction.swift`, `RunnerTests+Lifecycle.swift`, `RunnerTests+Transport.swift`, etc.) so contributors and LLM agents can load only the surface they need.

Source: [ios-runner/README.md](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md) — documents the split-file layout of the XCUITest runner and its protocol.

## Configuration Surface

| Surface | Mechanism | Source |
| --- | --- | --- |
| Command enablement | `localCommandPolicy()` vs `restrictedCommandPolicy()` | [admin-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin-router.test.ts), [diagnostics-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-router.test.ts) |
| Backend capability | Optional methods on `AgentDeviceBackend`; commands throw `UNSUPPORTED_OPERATION` when absent | [diagnostics.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts), [admin.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin.ts) |
| Sessions | `createMemorySessionStore([...])` keyed by session name → `appId` / `appBundleId` | [diagnostics-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-router.test.ts) |
| File I/O | `ArtifactAdapter` with `resolveInput`, `reserveOutput`, `createTempFile` | [admin-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin-router.test.ts) |
| Result envelopes | `BackendResultEnvelope` and `BackendResultVariant` carry backend metadata | [admin.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin.ts), [recording.ts](https://github.com/callstack/agent-device/blob/main/src/commands/recording/runtime/recording.ts) |

## Common Failure Modes

- **Backend missing a primitive** — Commands guard on the presence of the backend method and throw `AppError('UNSUPPORTED_OPERATION', ...)`. Callers should treat this as a capability gap, not a crash.
- **Limit violations** — Log/network/perf commands enforce numeric ranges (e.g. `LOG_LIMIT_MAX = 500`, `PERF_SAMPLE_MIN_MS = 100`). Requests outside the range fail validation rather than silently truncating.
- **Sensitive data leakage** — Diagnostic formatters redact authorization, cookie, token, secret, password, and API-key fields and return a `redacted: true` flag. Operators relying on raw payloads should not bypass these formatters.
- **Replay scripts without parameters** — Community issue #432 ("Parametrise `.ad` replay scripts") highlights that `.ad` replay files are still literal-only, so reusing a flow across app variants (e.g. `com.example.debug` vs `com.example.prod`) currently requires duplicated scripts. This is a known limitation of the recorder/replayer, not the runtime itself.

## See Also

- [README.md](https://github.com/callstack/agent-device/blob/main/README.md) — project overview, capabilities, and use cases.
- [package.json](https://github.com/callstack/agent-device/blob/main/package.json) — scripts, dependencies, and the `yaml` runtime dependency.
- [ios-runner/README.md](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md) — XCUITest runner layout and protocol pointers.

---

<a id='page-2'></a>

## Platform Backends & Runtime

### Related Pages

Related topics: [System Architecture & Daemon](#page-1), [Replay System & E2E Workflows](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/commands/observability/runtime/diagnostics.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts)
- [src/commands/observability/runtime/diagnostics-format.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-format.ts)
- [src/commands/observability/runtime/diagnostics-types.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-types.ts)
- [src/commands/observability/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/index.ts)
- [src/commands/observability/runtime/diagnostics-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-router.test.ts)
- [src/commands/management/runtime/admin.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin.ts)
- [src/commands/management/runtime/admin-router.test.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin-router.test.ts)
- [src/commands/management/runtime/apps.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/apps.ts)
- [src/commands/recording/runtime/recording.ts](https://github.com/callstack/agent-device/blob/main/src/commands/recording/runtime/recording.ts)
- [src/commands/capture/runtime/snapshot.ts](https://github.com/callstack/agent-device/blob/main/src/commands/capture/runtime/snapshot.ts)
- [src/commands/interaction/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/interaction/runtime/index.ts)
- [ios-runner/README.md](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)
- [README.md](https://github.com/callstack/agent-device/blob/main/README.md)
- [package.json](https://github.com/callstack/agent-device/blob/main/package.json)
</details>

# Platform Backends & Runtime

## Overview

`agent-device` is a CLI that gives AI agents token-efficient access to iOS, Android, TV, and desktop automation. The **Platform Backends & Runtime** layer is the abstraction that makes this possible: it isolates device-specific drivers behind a typed `AgentDeviceBackend` contract, then routes user-facing commands through a uniform runtime that handles session, artifact, and policy concerns.

The runtime is a thin, deterministic wrapper. It does not speak to devices directly. Every interaction (`openApp`, `snapshot`, `logs`, `install`) flows from a typed command module into `runtime.backend.<method>` on a platform-specific backend. This split lets the same command surface drive iOS (XCUITest), Android, tvOS, and macOS backends without changes upstream. Source: [README.md:1-25](https://github.com/callstack/agent-device/blob/main/README.md)

Community-reported limitations also surface in this layer. Issue #432 reports that `.ad` replay scripts must currently hardcode literals such as bundle identifiers. The gap sits in the replay layer rather than in the runtime contract, but it is a recurring friction point for anyone reusing a flow across app variants. Source: [README.md:1-25](https://github.com/callstack/agent-device/blob/main/README.md)

## Runtime Architecture

Commands are organized into domain modules under `src/commands/<domain>/runtime/`. Each module exports a `RuntimeCommand<Options, Result>` function that performs three jobs:

1. **Capability check**: if `runtime.backend.<method>` is missing, the command throws `UNSUPPORTED_OPERATION`. For example, `logsCommand` requires `runtime.backend.readLogs`. Source: [src/commands/observability/runtime/diagnostics.ts:1-100](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts)
2. **Context translation**: call-site options pass through `toBackendContext(runtime, options)` to produce a `BackendCommandContext` the backend understands. Source: [src/commands/observability/runtime/diagnostics.ts:1-100](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts)
3. **Result normalization**: backend output is mapped into a stable client shape by `format*Result` helpers, which also redact sensitive fields (e.g. `authorization`, `cookie`, `token`, `secret`, `password`, `api[-_]?key`). Source: [src/commands/observability/runtime/diagnostics-format.ts:1-50](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-format.ts)

```mermaid
flowchart LR
  A[Client / Agent] --> B[RuntimeCommand]
  B --> C{Capability Check}
  C -- missing --> Z[UNSUPPORTED_OPERATION]
  C -- present --> D[toBackendContext]
  D --> E[AgentDeviceBackend]
  E --> F[format*Result + Redaction]
  F --> G[Typed Result Envelope]
```

The `bindObservabilityCommands` helper at the module boundary converts `RuntimeCommand` shapes into bound, single-argument callables exposed on the public `device.observability.*` surface. Source: [src/commands/observability/runtime/index.ts:1-50](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/index.ts)

## Backend Contract

`AgentDeviceBackend` is the single seam between the runtime and platform code. It is a structural TypeScript type — no inheritance, no classes — that lists optional methods for every supported operation. Methods are split by domain:

| Domain    | Example methods                                           | Command module                              |
|-----------|-----------------------------------------------------------|---------------------------------------------|
| Admin     | `listDevices`, `bootDevice`, `shutdownDevice`             | `src/commands/management/runtime/admin.ts`  |
| Apps      | `openApp`, `closeApp`, `listApps`, `pushPayload`          | `src/commands/management/runtime/apps.ts`   |
| Observability | `readLogs`, `dumpNetwork`, `measurePerf`              | `src/commands/observability/runtime/diagnostics.ts` |
| Recording | `startRecording`, `stopRecording`, `startTrace`           | `src/commands/recording/runtime/recording.ts` |
| Capture   | `captureSnapshot`                                         | `src/commands/capture/runtime/snapshot.ts`  |

Because every method is optional, a backend can declare only the capabilities it supports. The runtime's `UNSUPPORTED_OPERATION` path turns partial support into a clean error rather than a crash. Source: [src/commands/management/runtime/admin.ts:1-100](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin.ts)

Backend methods receive a normalized `BackendCommandContext` and return backend-specific result envelopes. The runtime then re-shapes these into the public result types declared next to each command, e.g. `DiagnosticsLogsCommandResult` with `kind: 'diagnosticsLogs'`. Source: [src/commands/observability/runtime/diagnostics-types.ts:1-30](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-types.ts)
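The `kind` discriminator lets clients narrow a result without inspecting its fields. The sketch below assumes two result shapes: only the `kind` values `diagnosticsLogs` and `adminDevices` appear in the source, while the other field names and the `summarize` helper are hypothetical.

```typescript
// Field names other than `kind` are assumptions for illustration.
type LogsResultSketch = {
  kind: "diagnosticsLogs";
  entries: string[];
  redacted: boolean;
};
type DevicesResultSketch = {
  kind: "adminDevices";
  devices: string[];
};
type ResultSketch = LogsResultSketch | DevicesResultSketch;

// The switch narrows `result` to one variant per branch.
function summarize(result: ResultSketch): string {
  switch (result.kind) {
    case "diagnosticsLogs": {
      const suffix = result.redacted ? " (redacted)" : "";
      return `${result.entries.length} log entries${suffix}`;
    }
    case "adminDevices":
      return `${result.devices.length} devices`;
  }
}

const logsSummary = summarize({
  kind: "diagnosticsLogs",
  entries: ["boot", "ready"],
  redacted: true,
});
// logsSummary === "2 log entries (redacted)"
```

Because the union is closed, adding a new result variant makes the compiler flag every `switch` that does not yet handle it.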

### Validation and limits

Runtime commands apply typed validation before delegating to the backend. Examples from the diagnostics module:

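- `LOG_LIMIT_DEFAULT = 100`, `LOG_LIMIT_MAX = 500` for log paging. Source: [src/commands/observability/runtime/diagnostics.ts:1-100](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts)
- `PERF_SAMPLE_MIN_MS = 100`, `PERF_SAMPLE_MAX_MS = 60_000` for performance sampling windows. Source: [src/commands/observability/runtime/diagnostics.ts:1-100](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts)
- `MAX_APP_EVENT_PAYLOAD_BYTES = 8 * 1024` caps the size of app-event payloads. The accompanying pattern `^[A-Za-z0-9_.:-]{1,64}$` constrains a short identifier (most likely the event name) rather than the payload itself. Source: [src/commands/management/runtime/apps.ts:1-50](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/apps.ts)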

These limits keep the runtime predictable for agents that drive long, automated sessions.
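A range check of this kind might look like the sketch below. The four constants are the values listed above. The function names, the lower bound of 1 for log limits, and the use of `RangeError` are assumptions, since the project most likely raises its own `AppError` instead.

```typescript
// Constants as listed in the diagnostics module.
const LOG_LIMIT_DEFAULT = 100;
const LOG_LIMIT_MAX = 500;
const PERF_SAMPLE_MIN_MS = 100;
const PERF_SAMPLE_MAX_MS = 60_000;

// Apply the default, then reject out-of-range values instead of clamping.
function resolveLogLimit(limit?: number): number {
  if (limit === undefined) return LOG_LIMIT_DEFAULT;
  // Lower bound of 1 is an assumption; the source only names the maximum.
  if (!Number.isInteger(limit) || limit < 1 || limit > LOG_LIMIT_MAX) {
    throw new RangeError(
      `limit must be an integer between 1 and ${LOG_LIMIT_MAX}`,
    );
  }
  return limit;
}

// Sampling windows must fall inside the documented millisecond range.
function resolvePerfSampleMs(sampleMs: number): number {
  if (
    !Number.isFinite(sampleMs) ||
    sampleMs < PERF_SAMPLE_MIN_MS ||
    sampleMs > PERF_SAMPLE_MAX_MS
  ) {
    throw new RangeError(
      `sampleMs must be between ${PERF_SAMPLE_MIN_MS} and ${PERF_SAMPLE_MAX_MS}`,
    );
  }
  return sampleMs;
}
```

Rejecting instead of clamping means an agent that asks for 10,000 log lines gets an explicit error it can react to, not a silently shortened result.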

## Platform Implementations

The runtime is platform-agnostic; each backend plugs in separately. The most mature example is the **iOS Runner**, a small XCUITest target embedded under `ios-runner/`. It exposes UI automation over a lightweight HTTP/TCP server embedded in the test bundle, and is split across focused files to keep context sizes manageable for LLM agents:

- `RunnerTests.swift`: shared state, `setUp()`, `testCommand()` entry point. Source: [ios-runner/README.md:1-50](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)
- `RunnerTests+Models.swift`: wire protocol (`Command`, `Response`, snapshot payloads). Source: [ios-runner/README.md:1-50](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)
- `RunnerTests+Transport.swift`: TCP request handling and HTTP parsing/encoding. Source: [ios-runner/README.md:1-50](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)
- `RunnerTests+CommandExecution.swift`: command dispatch and the `execute*` switch. Source: [ios-runner/README.md:1-50](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)

The runner is consumed by the TypeScript client at `src/platforms/ios/runner-client.ts` and modeled by `src/platforms/ios/runner-contract.ts`. Recent release v0.17.6 added support for an external XCUITest runner artifact and improved flag classification, reflecting an ongoing move toward pluggable, externally-built runners. Source: [ios-runner/README.md:1-50](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)

Android, tvOS, and macOS backends follow the same `AgentDeviceBackend` shape; the runtime does not need to know which platform is wired in.

## Testing the Boundary

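The runtime layer is verified at its boundary with mock backends rather than real devices. Tests instantiate `createAgentDevice` with a fake `AgentDeviceBackend` and an in-memory session store, then assert that commands:

- Call the expected backend primitive with a `BackendCommandContext`. Source: [src/commands/observability/runtime/diagnostics-router.test.ts:1-50](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-router.test.ts)
- Surface the right `kind` discriminator on the result. Source: [src/commands/management/runtime/admin-router.test.ts:1-50](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin-router.test.ts)
- Mark `redacted: true` whenever formatting touched sensitive keys. Source: [src/commands/observability/runtime/diagnostics-router.test.ts:1-50](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-router.test.ts)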

This pattern keeps the runtime's contract honest: any new backend method must come with a runtime command that validates input, normalizes context, and re-shapes results through the redaction-aware formatting layer.
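The shape of such a boundary test can be sketched without the real test harness. Everything below is hypothetical scaffolding: `createFakeBackend`, `logsUnderTest`, `checkBoundary`, and the `{ session }` context are stand-ins, and the project's tests use `createAgentDevice` with its own fixtures.

```typescript
// Hypothetical, simplified stand-in for BackendCommandContext.
type ContextSketch = { session: string };
type RecordedCall = { method: string; context: ContextSketch };

// Fake backend that records every primitive call it receives.
function createFakeBackend() {
  const calls: RecordedCall[] = [];
  const backend = {
    async readLogs(context: ContextSketch): Promise<string[]> {
      calls.push({ method: "readLogs", context });
      return ["line 1", "line 2"];
    },
  };
  return { backend, calls };
}

// Stand-in for a runtime command: forward context, tag the result.
async function logsUnderTest(
  backend: { readLogs(context: ContextSketch): Promise<string[]> },
  session: string,
): Promise<{ kind: string; entries: string[] }> {
  const entries = await backend.readLogs({ session });
  return { kind: "diagnosticsLogs", entries };
}

// Assertions mirroring the first two bullets above.
async function checkBoundary(): Promise<void> {
  const { backend, calls } = createFakeBackend();
  const result = await logsUnderTest(backend, "default");
  if (calls.length !== 1 || calls[0]?.method !== "readLogs") {
    throw new Error("expected exactly one readLogs call");
  }
  if (calls[0]?.context.session !== "default") {
    throw new Error("context was not forwarded to the backend");
  }
  if (result.kind !== "diagnosticsLogs") {
    throw new Error("unexpected result kind");
  }
}
```

Recording calls on the fake keeps the assertions about the contract (which primitive, which context, which `kind`) rather than about any platform behaviour.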

## See Also

- [Architecture & Runtime Contract](architecture.md)
- [iOS Runner Protocol](ios-runner-protocol.md)
- [Command Reference](commands.md)

---

<a id='page-3'></a>

## Replay System & E2E Workflows

### Related Pages

Related topics: [System Architecture & Daemon](#page-1), [AI Agent Integration & CLI Surface](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [src/replay/script.ts](https://github.com/callstack/agent-device/blob/main/src/replay/script.ts)
- [src/replay/control-flow-runtime.ts](https://github.com/callstack/agent-device/blob/main/src/replay/control-flow-runtime.ts)
- [src/replay/vars.ts](https://github.com/callstack/agent-device/blob/main/src/replay/vars.ts)
- [src/replay/open-script.ts](https://github.com/callstack/agent-device/blob/main/src/replay/open-script.ts)
- [src/replay/script-formatting.ts](https://github.com/callstack/agent-device/blob/main/src/replay/script-formatting.ts)
- [src/commands/replay/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/replay/index.ts)
- [README.md](https://github.com/callstack/agent-device/blob/main/README.md)
- [package.json](https://github.com/callstack/agent-device/blob/main/package.json)
</details>

# Replay System & E2E Workflows

## Overview

The Replay System is the component of `agent-device` that converts one-time device interactions into persistent, reproducible end-to-end (E2E) test scripts. It records human- or agent-driven actions on real devices and emulators into `.ad` files, then re-executes them against the same — or a different — target with deterministic semantics. According to the [README.md](https://github.com/callstack/agent-device/blob/main/README.md), the system supports four primary use cases:

- **Local replay** for iterative development.
- **CI integration** for gate-keeping pull requests.
- **Repeatable E2E checks** that survive platform and device churn.
- **Strict Maestro YAML export** when a flow needs to be portable to a different runner.

The `.ad` script format is treated as a first-class artifact in the repository: it is version-controlled alongside source code, executed by the dedicated `replay` command surface, and validated by a dedicated test suite split per platform. Source: [src/commands/replay/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/replay/index.ts).

## Script Format & Recording

An `.ad` script is a textual, line-oriented sequence of replayable steps. Its lifecycle is split across three modules:

- **[src/replay/script.ts](https://github.com/callstack/agent-device/blob/main/src/replay/script.ts)** defines the canonical step schema, parser, and structural validators.
- **[src/replay/script-formatting.ts](https://github.com/callstack/agent-device/blob/main/src/replay/script-formatting.ts)** produces canonical output so that scripts remain diff-friendly under source control.
- **[src/replay/open-script.ts](https://github.com/callstack/agent-device/blob/main/src/replay/open-script.ts)** opens a script, resolves the target device/session context, and feeds the parsed steps into the execution runtime.

Steps recorded into a script represent the same observable actions an agent can perform interactively, such as `open`, `tap`, `type`, `scroll`, `wait`, `assert`, and `back`. Because the recording layer is wired through the same backend primitives as interactive commands, a script captures each action as the backend executed it rather than as a raw input event. Source: [src/replay/script.ts](https://github.com/callstack/agent-device/blob/main/src/replay/script.ts).

```mermaid
flowchart LR
    A[Interactive session] --> B[Recorder]
    B --> C[.ad script file]
    C --> D[Parser / script.ts]
    D --> E[Control-flow runtime]
    E --> F[Backend primitives]
    F --> G[Device / Simulator / Emulator]
    C --> H[Maestro YAML export]
    H --> G
```

## Execution Runtime

Replay execution is not a string-level replayer — it is a typed interpreter. Two modules own the runtime semantics:

- **[src/replay/control-flow-runtime.ts](https://github.com/callstack/agent-device/blob/main/src/replay/control-flow-runtime.ts)** dispatches each parsed step to the matching runtime command, evaluates control flow (`if`, `repeat`, retry policies), and propagates backend results back as structured command results.
- **[src/replay/vars.ts](https://github.com/callstack/agent-device/blob/main/src/replay/vars.ts)** is the in-script variable layer, used for step-to-step communication such as capturing element refs and reusing computed values.

Because dispatch goes through the same `AgentDeviceRuntime` used by the interactive CLI, replayed steps benefit from identical redaction, snapshot diffing, and observability pipelines as live sessions. For example, a replayed `log` step will go through the same redaction logic defined in [src/commands/observability/runtime/diagnostics-format.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-format.ts). Source: [src/replay/control-flow-runtime.ts](https://github.com/callstack/agent-device/blob/main/src/replay/control-flow-runtime.ts).
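The difference between a string-level replayer and a typed interpreter can be shown with a small sketch. It operates on already-parsed steps, so it says nothing about `.ad` file syntax. The `StepSketch` shapes, the `capture` step, and `runSteps` are hypothetical, and real dispatch would call asynchronous runtime commands instead of appending to a log.

```typescript
// Hypothetical parsed-step shapes; the real schema lives in src/replay/script.ts.
type OperandSketch = string | { var: string };
type StepSketch =
  | { type: "capture"; as: string; value: string }
  | { type: "tap"; target: OperandSketch }
  | { type: "repeat"; times: number; body: StepSketch[] };

// Resolve a literal or a reference into the in-script variable store.
function resolveOperand(
  operand: OperandSketch,
  vars: Map<string, string>,
): string {
  if (typeof operand === "string") return operand;
  const value = vars.get(operand.var);
  if (value === undefined) {
    throw new Error(`undefined variable: ${operand.var}`);
  }
  return value;
}

// Dispatch each step by type; `executed` stands in for runtime commands.
function runSteps(
  steps: StepSketch[],
  vars: Map<string, string>,
  executed: string[],
): void {
  for (const step of steps) {
    switch (step.type) {
      case "capture":
        vars.set(step.as, step.value);
        break;
      case "tap":
        executed.push(`tap ${resolveOperand(step.target, vars)}`);
        break;
      case "repeat":
        for (let i = 0; i < step.times; i++) {
          runSteps(step.body, vars, executed);
        }
        break;
    }
  }
}

const executedSteps: string[] = [];
runSteps(
  [
    { type: "capture", as: "button", value: "@e3" },
    { type: "repeat", times: 2, body: [{ type: "tap", target: { var: "button" } }] },
  ],
  new Map<string, string>(),
  executedSteps,
);
// executedSteps is now ["tap @e3", "tap @e3"]
```

Dispatching on a typed step lets control flow and variables be handled in one place, while each leaf step still goes through the same command path as an interactive call.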

## CI Integration & Test Surfaces

The repository ships platform-scoped replay test suites, invoked through `pnpm` scripts defined in [package.json](https://github.com/callstack/agent-device/blob/main/package.json):

| Script | Target |
| --- | --- |
| `test:replay:android` | Android emulator/device replays |
| `test:replay:macos` | macOS desktop replays |
| `test:replay:linux` | Linux desktop / AT-SPI replays |

These are conventional E2E entry points: a CI job invokes one of the scripts, the runner opens the script via [src/replay/open-script.ts](https://github.com/callstack/agent-device/blob/main/src/replay/open-script.ts), and exit codes mirror the structured command result. This is also how release v0.17.6 is validated — the recent changes around external XCTest runner artifacts ([PR #806](https://github.com/callstack/agent-device/pull/806)) and the no-op XCTest runner flag classification ([PR #810](https://github.com/callstack/agent-device/pull/810)) are exercised through these replay surfaces. Source: [package.json](https://github.com/callstack/agent-device/blob/main/package.json).

## Known Limitations & Community Discussion

The most prominent open issue on the replay system is **#432: "Parametrise `.ad` replay scripts"**. As reported by the community, `.ad` scripts today require every value to be a literal in the file. This blocks three common patterns:

1. Reusing a single script across app variants (e.g., `com.example.debug` vs `com.example.prod`).
2. Tuning timings (delays, retry budgets) per environment without duplicating the file.
3. Re-targeting a script between device types without a code change.

The proposed direction in the issue aligns with the responsibility split in the codebase: a parameter resolver layer that hands typed values to the existing [src/replay/control-flow-runtime.ts](https://github.com/callstack/agent-device/blob/main/src/replay/control-flow-runtime.ts) dispatcher, similar to how [src/replay/vars.ts](https://github.com/callstack/agent-device/blob/main/src/replay/vars.ts) already handles intra-script variables. Until that lands, scripts remain a single-context artifact and the workaround is to generate them at CI time or duplicate per variant. Source: community context for issue #432.
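The "generate at CI time" workaround can be sketched as a small templating step that runs before replay. This is not a feature of `agent-device`: the `{{NAME}}` placeholder syntax, the `renderScript` helper, the output file names, and the one-line template are all assumptions, and the template line is illustrative only, not verified `.ad` syntax.

```typescript
// Replace {{NAME}} placeholders; a missing parameter fails fast.
function renderScript(
  template: string,
  params: Record<string, string | undefined>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    const value = params[name];
    if (value === undefined) {
      throw new Error(`missing parameter: ${name}`);
    }
    return value;
  });
}

// Illustrative template line, not verified .ad syntax.
const scriptTemplate = "open {{APP_ID}}";

// One literal-only script per app variant, produced ahead of the replay run.
const variantAppIds = {
  debug: "com.example.debug",
  prod: "com.example.prod",
};
const generatedScripts = Object.entries(variantAppIds).map(
  ([variant, appId]) => ({
    file: `login.${variant}.ad`, // hypothetical naming scheme
    content: renderScript(scriptTemplate, { APP_ID: appId }),
  }),
);
// generatedScripts[0].content === "open com.example.debug"
```

Generating the files keeps the replayer unchanged and the committed template single-sourced, at the cost of an extra build step and generated artifacts that should not be edited by hand.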

## See Also

- [README.md](https://github.com/callstack/agent-device/blob/main/README.md) — top-level capability summary and platform matrix.
- [src/commands/observability/runtime/diagnostics.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts) — diagnostics commands that replayed steps can invoke.
- [src/commands/recording/runtime/recording.ts](https://github.com/callstack/agent-device/blob/main/src/commands/recording/runtime/recording.ts) — adjacent recording primitives (screen recording, traces) that complement replay evidence.

---

<a id='page-4'></a>

## AI Agent Integration & CLI Surface

### Related Pages

Related topics: [System Architecture & Daemon](#page-1), [Replay System & E2E Workflows](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/callstack/agent-device/blob/main/README.md)
- [package.json](https://github.com/callstack/agent-device/blob/main/package.json)
- [src/commands/observability/runtime/diagnostics.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics.ts)
- [src/commands/observability/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/index.ts)
- [src/commands/observability/runtime/diagnostics-format.ts](https://github.com/callstack/agent-device/blob/main/src/commands/observability/runtime/diagnostics-format.ts)
- [src/commands/management/runtime/apps.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/apps.ts)
- [src/commands/management/runtime/admin.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/admin.ts)
- [src/commands/management/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/management/runtime/index.ts)
- [src/commands/interaction/runtime/index.ts](https://github.com/callstack/agent-device/blob/main/src/commands/interaction/runtime/index.ts)
- [ios-runner/README.md](https://github.com/callstack/agent-device/blob/main/ios-runner/README.md)
- [website/package.json](https://github.com/callstack/agent-device/blob/main/website/package.json)
</details>

# AI Agent Integration & CLI Surface

## Purpose and Scope

`agent-device` is positioned as a device automation CLI built for AI coding agents. It exposes a single, session-aware command surface that lets agents open real apps, inspect structured UI state, perform interactions, and capture diagnostics evidence across iOS, Android, tvOS, Android TV, macOS, and Linux targets. The design goal expressed in the README is "token-efficient snapshots, semantic refs, and evidence captured only when needed" — i.e. output that is safe to feed straight into an LLM context.

Source: [README.md:1-30]()

The integration story has three layers, all anchored in the same runtime:

1. **Shell-out CLI** — agents invoke the binary as a subprocess and parse the structured JSON/text results.
2. **Node API** — typed clients are exposed via package subpaths: `agent-device`, `agent-device/io`, `agent-device/artifacts`, `agent-device/metro`, `agent-device/batch`, and `agent-device/remote-config`.
3. **MCP server** — the manifest declares `mcpName: io.github.callstackincubator/agent-device`, allowing MCP-aware hosts (Claude Code, Cursor, etc.) to discover and call the tool directly.

Source: [package.json:18-40]()
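For the shell-out layer, an agent host has to decide whether a captured stdout string is structured JSON or plain text before acting on it. The following is a minimal sketch of that step, not code from the repository: `parseCliOutput` and the `ParsedOutput` type are hypothetical names, and the stdout string is assumed to have been captured by whatever subprocess API the host already uses.

```typescript
// Hypothetical result shape: either parsed JSON or the raw text.
type ParsedOutput =
  | { format: "json"; value: unknown }
  | { format: "text"; text: string };

// Hypothetical helper: classify captured stdout from a CLI invocation.
function parseCliOutput(stdout: string): ParsedOutput {
  const trimmed = stdout.trim();
  // Only attempt JSON parsing when the output looks like an object or array.
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      return { format: "json", value: JSON.parse(trimmed) as unknown };
    } catch {
      // Malformed JSON falls through and is treated as plain text.
    }
  }
  return { format: "text", text: trimmed };
}
```

Returning a tagged union rather than `unknown` forces the caller to handle the text case explicitly instead of assuming every result parsed.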

## Command Organization

The command surface is split into three top-level domains, each registered in its own runtime module:

| Domain | Module | Representative commands |
| --- | --- | --- |
| Management | `src/commands/management/runtime/index.ts` | `apps.open`, `apps.close`, `apps.list`, `apps.state`, `apps.push`, `apps.triggerEvent`, `admin.devices`, `admin.boot`, `admin.shutdown`, `admin.install` |
| Interaction | `src/commands/interaction/runtime/index.ts` | `click`, `fill`, `focus`, `press`, `longPress`, `pinch`, `scroll`, `swipe`, `type`, plus selector reads `find`, `get`, `getText`, `is`, `wait`, `waitForText` |
| Observability | `src/commands/observability/runtime/index.ts` | `observability.logs`, `observability.network`, `observability.perf` |

Source: [src/commands/management/runtime/index.ts:50-80]()

Source: [src/commands/interaction/runtime/index.ts:40-60]()

Source: [src/commands/observability/runtime/index.ts:20-33]()

Every command follows the same shape — a `RuntimeCommand<Options, Result>` function that receives an `AgentDeviceRuntime` and an options bag, then returns a typed result. A `bind*Commands(runtime)` helper wraps the module so consumers receive ready-to-call async methods. For observability this is implemented as:

```ts
export function bindObservabilityCommands(runtime: AgentDeviceRuntime): BoundObservabilityCommands {
  return {
    logs:     (options) => diagnosticsCommands.logs(runtime, options),
    network:  (options) => diagnosticsCommands.network(runtime, options),
    perf:     (options) => diagnosticsCommands.perf(runtime, options),
  };
}
```

Source: [src/commands/observability/runtime/index.ts:25-33]()
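The binding pattern above can be generalized. The sketch below shows one way a generic binder could be typed so that each bound method keeps its own options and result types. It is an illustration under stated assumptions: `bindCommands`, `Bound`, the `sessionName` field, and the sample `logs` command are all hypothetical, and the stand-in `AgentDeviceRuntime` and `RuntimeCommand` types only mirror the shape described in the text.

```typescript
// Stand-in for the real runtime type; the field is hypothetical.
interface AgentDeviceRuntime {
  readonly sessionName: string;
}

// A command takes the runtime plus an options bag and resolves to a typed result.
type RuntimeCommand<Options, Result> = (
  runtime: AgentDeviceRuntime,
  options: Options,
) => Promise<Result>;

// Maps each command to a method with the runtime argument already applied.
type Bound<C> = {
  [K in keyof C]: C[K] extends RuntimeCommand<infer O, infer R>
    ? (options: O) => Promise<R>
    : never;
};

// `any` in the constraint lets commands with different option types share one record.
function bindCommands<C extends Record<string, RuntimeCommand<any, any>>>(
  runtime: AgentDeviceRuntime,
  commands: C,
): Bound<C> {
  const bound: Record<string, (options: unknown) => Promise<unknown>> = {};
  for (const [name, command] of Object.entries(commands)) {
    bound[name] = (options) => command(runtime, options);
  }
  return bound as unknown as Bound<C>;
}

// Hypothetical command module used only to exercise the binder.
const sampleDiagnostics = {
  logs: async (runtime: AgentDeviceRuntime, options: { limit: number }) =>
    `${runtime.sessionName}: last ${options.limit} log lines`,
};

const boundDiagnostics = bindCommands({ sessionName: "demo" }, sampleDiagnostics);
```

With this typing, `boundDiagnostics.logs({ limit: 5 })` is checked against the original options type, so a consumer never passes the runtime by hand.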

## Result Envelopes and Diagnostics Formatting

Each command returns a `BackendResultEnvelope` that augments the domain payload with backend metadata. Opening an app, for instance, yields a result whose `kind` discriminator (`'appOpened'`) is what an agent typically pattern-matches on:

```ts
{
  kind: 'appOpened',
  target: { app: 'com.example.app' },
  relaunch: true,
  backendResult: { opened: true },
  message: 'Opened: com.example.app',
}
```

Source: [src/commands/management/runtime/apps.ts:50-90]()
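Matching on the `kind` discriminator can be made exhaustive with a TypeScript discriminated union. The sketch below assumes a second, hypothetical `'appClosed'` variant purely to show the branching; only the `'appOpened'` shape comes from the example above, and `summarize` is an illustrative helper rather than part of the package.

```typescript
// Shape taken from the example envelope above.
interface AppOpened {
  kind: "appOpened";
  target: { app: string };
  relaunch: boolean;
  backendResult: { opened: boolean };
  message: string;
}

// Hypothetical second variant, included only to demonstrate narrowing.
interface AppClosed {
  kind: "appClosed";
  target: { app: string };
  message: string;
}

type AppResult = AppOpened | AppClosed;

function summarize(result: AppResult): string {
  switch (result.kind) {
    case "appOpened":
      // `relaunch` is only accessible once `kind` has narrowed the type.
      return `${result.target.app} opened (relaunch=${result.relaunch})`;
    case "appClosed":
      return `${result.target.app} closed`;
    default: {
      // Compile-time guard: adding a new variant without a case fails here.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```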

Diagnostics output goes through a dedicated formatter that redacts sensitive fields and truncates large payloads before the result ever leaves the runtime. The relevant constants and redaction pattern are defined once and reused:

```ts
const PAYLOAD_MAX_CHARS    = 2048;
const MESSAGE_MAX_CHARS    = 4096;
const SECRET_KEY_PATTERN   = /(?:authorization|cookie|token|secret|password|passwd|api[-_]?key)/i;
```

Source: [src/commands/observability/runtime/diagnostics-format.ts:10-20]()

When any redaction occurs, the resulting envelope includes a `redacted: true` flag, which tells the calling agent that the captured text was sanitized. Redacting and truncating inside the runtime is what allows diagnostics output to be placed directly in an LLM context.
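To illustrate how key-based redaction and truncation might combine, here is a minimal sketch that reuses the two constants quoted above. It is not the repository's formatter: `sanitizePayload`, the `Sanitized` type, the `[REDACTED]` placeholder, and the `...` truncation marker are all assumptions, as is the choice to set `redacted` only for secret keys and not for truncation.

```typescript
const PAYLOAD_MAX_CHARS = 2048;
const SECRET_KEY_PATTERN = /(?:authorization|cookie|token|secret|password|passwd|api[-_]?key)/i;

// Hypothetical output shape for this sketch.
interface Sanitized {
  payload: Record<string, string>;
  redacted: boolean;
}

// Hypothetical helper: redact values under secret-looking keys, truncate long values.
function sanitizePayload(input: Record<string, string>): Sanitized {
  const payload: Record<string, string> = {};
  let redacted = false;
  for (const [key, value] of Object.entries(input)) {
    if (SECRET_KEY_PATTERN.test(key)) {
      // The key name matches the secret pattern, so the value is dropped entirely.
      payload[key] = "[REDACTED]";
      redacted = true;
    } else if (value.length > PAYLOAD_MAX_CHARS) {
      // Oversized values are cut to the limit and marked with an ellipsis.
      payload[key] = value.slice(0, PAYLOAD_MAX_CHARS) + "...";
    } else {
      payload[key] = value;
    }
  }
  return { payload, redacted };
}
```

Because the pattern has no `g` flag, calling `test` repeatedly carries no state between keys.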

## Session Awareness and Backend Routing

Every command carries a `CommandContext` (session name, working directory, output preferences) and is routed to a typed `AgentDeviceBackend`. Each platform implements the same contract, which keeps the surface uniform:

- **iOS / tvOS / macOS** — XCTest runner that exposes UI automation over a small HTTP server. The Swift files in `AgentDeviceRunnerUITests/RunnerTests` are intentionally split (transport, command execution, lifecycle, interaction) to keep context size manageable for both contributors and LLM agents.
- **Android** — ADB plus the Android snapshot helper, surfaced through the `agent-device/android-adb` subpath for logcat, clipboard, keyboard, app helpers, and port reverse management.
- **Linux** — AT-SPI for desktop automation.

Source: [README.md:90-110]()

Source: [ios-runner/README.md:1-30]()

When a backend lacks a primitive, the runtime fails fast with an `AppError('UNSUPPORTED_OPERATION', ...)` rather than silently no-opping. This is important for agent reliability — a missing capability should not look like a successful empty result:

```ts
if (!runtime.backend.listDevices) {
  throw new AppError('UNSUPPORTED_OPERATION', 'admin.devices is not supported by this backend');
}
```

Source: [src/commands/management/runtime/admin.ts:120-130]()
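On the calling side, a fail-fast error code lets an agent distinguish "not supported here" from "supported but empty". The sketch below shows one way to branch on it. The `AppError` class here is a stand-in that only mirrors the `(code, message)` constructor seen in the excerpt, and `Backend`, `listDevices`, and `tryListDevices` are hypothetical names for illustration.

```typescript
// Stand-in mirroring the (code, message) constructor used in the excerpt above.
class AppError extends Error {
  constructor(public readonly code: string, message: string) {
    super(message);
    this.name = "AppError";
  }
}

// Hypothetical backend with an optional capability.
interface Backend {
  listDevices?: () => string[];
}

function listDevices(backend: Backend): string[] {
  if (!backend.listDevices) {
    throw new AppError("UNSUPPORTED_OPERATION", "admin.devices is not supported by this backend");
  }
  return backend.listDevices();
}

type DeviceListing =
  | { ok: true; devices: string[] }
  | { ok: false; reason: string };

// Converts the unsupported case into data; every other error still propagates.
function tryListDevices(backend: Backend): DeviceListing {
  try {
    return { ok: true, devices: listDevices(backend) };
  } catch (error) {
    if (error instanceof AppError && error.code === "UNSUPPORTED_OPERATION") {
      return { ok: false, reason: error.message };
    }
    throw error;
  }
}
```

An empty device list and an unsupported backend now produce visibly different results, which is the distinction the runtime's fail-fast behavior is meant to preserve.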

Diagnostics commands apply the same strict pattern: options such as `limit` are validated via `requireIntInRange` before any backend call, and time-window inputs are normalized at the edge so agents can pass ISO strings or relative shorthands interchangeably.

Source: [src/commands/observability/runtime/diagnostics.ts:1-40]()
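Edge validation of this kind might look like the sketch below. The real signature of `requireIntInRange` is not shown in this page, so the parameter order here is an assumption, and `normalizeSince` with its `30s` / `15m` / `2h` shorthand grammar is entirely hypothetical; the source only states that ISO strings and relative shorthands are both accepted.

```typescript
// Assumed signature: returns the value when valid, throws otherwise.
function requireIntInRange(name: string, value: unknown, min: number, max: number): number {
  if (typeof value !== "number" || !Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`${name} must be an integer between ${min} and ${max}`);
  }
  return value;
}

// Hypothetical normalizer: accepts an ISO 8601 timestamp or "<n>s", "<n>m", "<n>h".
function normalizeSince(input: string, now: Date): Date {
  const relative = /^(\d+)([smh])$/.exec(input);
  if (relative) {
    const [, amount = "0", unit = "s"] = relative;
    const unitMs = unit === "s" ? 1000 : unit === "m" ? 60000 : 3600000;
    // Relative shorthands are resolved against the supplied clock value.
    return new Date(now.getTime() - Number(amount) * unitMs);
  }
  const parsed = new Date(input);
  if (Number.isNaN(parsed.getTime())) {
    throw new RangeError(`Unrecognized time value: ${input}`);
  }
  return parsed;
}
```

Passing `now` as a parameter instead of reading the clock inside the function keeps the normalizer deterministic and easy to test.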

## Community Considerations

A recurring community request (issue #432) is that `.ad` replay scripts cannot yet be parametrised: every value in a script is currently a literal. This forces users to duplicate the file when switching app variants (`com.example.debug` vs `com.example.prod`) or when tuning timings between environments. Until variable substitution lands in the replay format, agents that need reusable flows should generate the `.ad` content programmatically from a template before invoking the replay command, rather than relying on static files. The release notes for v0.17.6 also mention ongoing iOS XCTest runner improvements (external xctest runner artifact support and flag classification) that affect the surface available on Apple-family targets.

Source: Community context — issue #432 and release notes for v0.17.6.
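One way to generate script content from a template is plain placeholder substitution, sketched below. The `{{name}}` placeholder syntax and the `renderTemplate` helper are assumptions made for this example and are not part of the `.ad` format. The sample template line is likewise a made-up stand-in, not real `.ad` grammar.

```typescript
// Hypothetical helper: replace {{name}} placeholders with supplied values.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    const value = values[name];
    if (value === undefined) {
      // Fail loudly so a missing variable never produces a half-rendered script.
      throw new Error(`Missing template value: ${name}`);
    }
    return value;
  });
}

// Made-up template text used only to exercise the helper; not real .ad syntax.
const sampleTemplate = "open {{appId}}";
const debugScript = renderTemplate(sampleTemplate, { appId: "com.example.debug" });
const prodScript = renderTemplate(sampleTemplate, { appId: "com.example.prod" });
```

The rendered string can then be written to a temporary file and handed to the replay command, so a single template serves both app variants.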

## See Also

- [Commands and Replay](commands-and-replay.md) — `.ad` script grammar, replay lifecycle, and the command catalog.
- [Platform Backends](platform-backends.md) — XCTest, ADB, and AT-SPI integration details.
- [Configuration and Sessions](configuration-and-sessions.md) — session stores, command policies, and artifact adapters.

---


## Pitfall Log

Project: callstack/agent-device

Summary: Found 12 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/callstack/agent-device/issues/538

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/callstack/agent-device/issues/694

## 3. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/callstack/agent-device/issues/126

## 4. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/callstack/agent-device/issues/808

## 5. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://www.npmjs.com/package/agent-device

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://www.npmjs.com/package/agent-device

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://www.npmjs.com/package/agent-device

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://www.npmjs.com/package/agent-device

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/callstack/agent-device/issues/696

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/callstack/agent-device/issues/699

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://www.npmjs.com/package/agent-device

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://www.npmjs.com/package/agent-device

