# https://github.com/browserable/browserable Project Manual

Generated at: 2026-06-22 06:40:06 UTC

## Table of Contents

- [Overview, Architecture & Getting Started](#page-overview)
- [AI Agents, Prompts & LLM Integration](#page-agents-llm)
- [REST API, JavaScript SDK & Custom Functions](#page-api-sdk)
- [Deployment, Configuration & Troubleshooting](#page-deployment-ops)

<a id='page-overview'></a>

## Overview, Architecture & Getting Started

### Related Pages

Related topics: [AI Agents, Prompts & LLM Integration](#page-agents-llm), [REST API, JavaScript SDK & Custom Functions](#page-api-sdk), [Deployment, Configuration & Troubleshooting](#page-deployment-ops)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)
- [tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js)
- [tasks/agents/jarvis.js](https://github.com/browserable/browserable/blob/main/tasks/agents/jarvis.js)
- [tasks/prompts/agents/browserable/actionPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/actionPrompts.js)
- [tasks/prompts/agents/browserable/extractPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/extractPrompts.js)
- [tasks/prompts/agents/jarvis/richOutputPrompt.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/richOutputPrompt.js)
- [ui/package.json](https://github.com/browserable/browserable/blob/main/ui/package.json)
- [sdk/browserable-js/package.json](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/package.json)
- [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)
- [ui/src/containers/FlowContainer.jsx](https://github.com/browserable/browserable/blob/main/ui/src/containers/FlowContainer.jsx)
- [ui/src/routes/NotFound.jsx](https://github.com/browserable/browserable/blob/main/ui/src/routes/NotFound.jsx)
</details>

# Overview, Architecture & Getting Started

## What is Browserable

Browserable is an open-source platform that lets AI agents drive a real web browser to complete user tasks. The repository bundles the agents, an orchestration layer, a browser automation service, a JavaScript SDK, and a desktop-style Admin UI into a single monorepo that can be launched with the `npx browserable` CLI ([README context referenced in community issue #8](https://github.com/browserable/browserable/issues/8)).

The defining capability is the `BROWSER_AGENT`, declared in [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js), which exposes four primitives — `open_new_tab`, `read_tab`, `act_on_tab`, and `extract_from_tab` — and is documented to "figure out how to procure a remote browser session + perform tasks like clicking, typing, etc on it." Higher-level agents (such as a Google Sheets agent) are explicitly preferred over the browser agent when a targeted tool is available.

## System Architecture

Browserable is a polyglot monorepo. The Admin UI ships as an Electron-forge desktop application (see `make` scripts in [ui/package.json](https://github.com/browserable/browserable/blob/main/ui/package.json)), the SDK is published as `browserable-js` on npm ([sdk/browserable-js/package.json](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/package.json)), and the runtime is split across multiple long-running services that communicate over HTTP and Postgres-backed message logs.

```mermaid
flowchart LR
    User["User / Client App"] --> SDK["browserable-js SDK"]
    User --> CLI["npx browserable CLI"]
    User --> UI["Admin UI (Electron)"]
    CLI --> Tasks["tasks service<br/>(Jarvis orchestrator + agents)"]
    UI --> Tasks
    SDK --> Tasks
    Tasks --> DB[("Postgres<br/>message_logs")]
    Tasks --> Browser["browser service<br/>(Playwright session)"]
    Tasks --> LLM["OpenAI-compatible LLM<br/>(gemini, gpt-4o, claude, deepseek, qwen)"]
    Browser -->|screenshots / DOM| Tasks
    LLM -->|tool calls| Tasks
```

### Core Components

| Component | Source | Responsibility |
|---|---|---|
| `Jarvis` orchestrator | [tasks/agents/jarvis.js](https://github.com/browserable/browserable/blob/main/tasks/agents/jarvis.js) | Splits a user request into a flow of sub-tasks, schedules node loopers, and aggregates results into a structured `outputGenerated` object via [richOutputPrompt.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/richOutputPrompt.js) |
| `BaseAgent` | [tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js) | Provides shared lifecycle helpers (`_action_end`, error reporting) for every concrete agent |
| `BROWSER_AGENT` | [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js) | The browser-driving agent; emits `agent`, `user`, and `debug` log segments, and persists screenshots after each chunk |
| Action / extract prompts | [actionPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/actionPrompts.js), [extractPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/extractPrompts.js) | JSON-only prompts that ask the LLM to choose `doAction`, `skipSection`, or `actionCompleted` |
| Admin UI shell | [ui/src/containers/FlowContainer.jsx](https://github.com/browserable/browserable/blob/main/ui/src/containers/FlowContainer.jsx), [ui/src/routes/NotFound.jsx](https://github.com/browserable/browserable/blob/main/ui/src/routes/NotFound.jsx) | Renders the live run timeline, message logs, screenshots, and code/markdown payloads |
| JavaScript SDK | [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts) | Typed client (`BrowserableConfig`, `Task`, `TaskRunStatus`, `TaskRunGifResult`) that wraps the REST API |

### Agent Execution Loop

The browser agent calls a shared `callOpenAICompatibleLLMWithRetry` helper with a fallback chain of `gemini-2.0-flash`, `deepseek-chat`, `gpt-4o-mini`, `claude-3-5-haiku`, and `qwen-plus` (see [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)). For each chunk of a long page it: scrolls, waits for a settled DOM, takes a screenshot, asks the LLM to extract structured content, and recursively calls `textExtractHelper` until the schema is satisfied or the run is no longer active. A separate refine step (`buildRefineExtractedContentPrompt`) consolidates the chunked extractions before returning.
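The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the repository's `callOpenAICompatibleLLMWithRetry`: the `callWithFallback` name, the `LLMCall` type, and the reading of the retry budget as attempts per model are all assumptions made for this example. Only the order of the model list comes from the text.

```typescript
// Hypothetical sketch of a model-fallback loop; not the project's actual helper.
type LLMCall = (model: string, prompt: string) => Promise<string>;

// Model order as listed in the text above.
const FALLBACK_MODELS: string[] = [
  "gemini-2.0-flash",
  "deepseek-chat",
  "gpt-4o-mini",
  "claude-3-5-haiku",
  "qwen-plus",
];

async function callWithFallback(
  call: LLMCall,
  prompt: string,
  models: string[] = FALLBACK_MODELS,
  maxAttempts = 3, // assumption: the retry budget applies per model
): Promise<{ model: string; output: string }> {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        const output = await call(model, prompt);
        return { model, output }; // first success wins
      } catch (err) {
        lastError = err; // remember the failure and keep trying
      }
    }
  }
  // Every model exhausted its attempts: surface the most recent error.
  throw lastError;
}
```

Passing the transport in as `call` keeps the loop independent of any particular HTTP client, which also makes it easy to exercise with a stub.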

## Getting Started

Browserable is distributed primarily through the `npx browserable` CLI. Per [community issue #8](https://github.com/browserable/browserable/issues/8), the CLI supports a `--help` flag and a `down` subcommand to tear the stack down; the canonical `npx browserable` flow boots the Docker Compose stack defined under `deployment/`.

A typical first run:

1. Ensure Docker and a working `node`/`npm` are on `PATH`. (See the troubleshooting note below for the common WSL failure mode.)
2. Run `npx browserable` from any directory; the CLI pulls and starts the Admin UI, the `tasks` service, the browser service, and Postgres.
3. Open the Admin UI; the dashboard route renders the live run timeline implemented in [ui/src/containers/FlowContainer.jsx](https://github.com/browserable/browserable/blob/main/ui/src/containers/FlowContainer.jsx).
4. Author a task in natural language — Jarvis breaks it into nodes, dispatches the `BROWSER_AGENT` for browser work, and writes structured output back to the message log ([tasks/agents/jarvis.js](https://github.com/browserable/browserable/blob/main/tasks/agents/jarvis.js)).
5. From a separate Node project, install the SDK:

```bash
npm install browserable-js
```

The SDK exposes the typed surface from [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts) (`CreateTaskOptions`, `TaskRunStatus`, `WaitForRunOptions`) and is the recommended integration path for headless automation.

## Common Setup Issues

The community has surfaced three recurring first-run failure modes that are worth documenting up front:

- **"Initial setup is in progress" hang** ([issue #6](https://github.com/browserable/browserable/issues/6)). The maintainer confirmed this almost always means the `tasks` service did not come up healthy. Run `docker ps` to find the unhealthy container, then use `docker exec -it browserable` to inspect its logs and identify the cause.
- **`/usr/bin/env: 'node --no-warnings': No such file or directory`** ([issue #20](https://github.com/browserable/browserable/issues/20)). Reported on Ubuntu WSL when `node` resolves through `fnm`/`nvm` shims that the shebang cannot locate. The workaround is to invoke the CLI with an absolute path to `node`, e.g. `node $(which npx) browserable`, or to ensure `node` is on a stable `PATH`.
- **LLM provider coverage.** Groq and OpenRouter are tracked as `roadmap: planned` ([issue #5](https://github.com/browserable/browserable/issues/5)); Ollama/local LLMs are `roadmap: requests` ([issue #9](https://github.com/browserable/browserable/issues/9)). Until first-class support lands, [issue #3](https://github.com/browserable/browserable/issues/3) documents the manual workaround: replace `https://api.openai.com/v1/chat/completions` with the provider's OpenAI-compatible endpoint in the source.
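The URL substitution mentioned in the last item amounts to swapping the base of the chat-completions endpoint. A minimal sketch, assuming a hypothetical `chatCompletionsUrl` helper (the repository hard-codes the URL and has no such function); the Groq and OpenRouter base URLs shown in the usage note are those providers' published OpenAI-compatible bases:

```typescript
// Hypothetical helper: builds the chat-completions URL from a provider base URL.
const DEFAULT_BASE_URL = "https://api.openai.com/v1";

function chatCompletionsUrl(baseUrl: string = DEFAULT_BASE_URL): string {
  // Strip trailing slashes so the join never produces "//".
  return `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
}
```

For example, `chatCompletionsUrl("https://api.groq.com/openai/v1")` or `chatCompletionsUrl("https://openrouter.ai/api/v1")` yields the endpoint that would replace the OpenAI URL in the source.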

## See Also

- Custom Tools / Functions guide (closed in [issue #10](https://github.com/browserable/browserable/issues/10))
- Task GIF generation via REST API and JS SDK ([issue #13](https://github.com/browserable/browserable/issues/13))
- Local browser support ([issue #4](https://github.com/browserable/browserable/issues/4))
- Troubleshooting documentation ([issue #15](https://github.com/browserable/browserable/issues/15))

---

<a id='page-agents-llm'></a>

## AI Agents, Prompts & LLM Integration

### Related Pages

Related topics: [Overview, Architecture & Getting Started](#page-overview), [REST API, JavaScript SDK & Custom Functions](#page-api-sdk)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js)
- [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)
- [tasks/agents/generative.js](https://github.com/browserable/browserable/blob/main/tasks/agents/generative.js)
- [tasks/prompts/agents/browserable/extractPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/extractPrompts.js)
- [tasks/prompts/agents/browserable/actionPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/actionPrompts.js)
- [tasks/prompts/agents/jarvis/index.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/index.js)
- [tasks/prompts/agents/jarvis/datatablePrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/datatablePrompts.js)
- [tasks/prompts/agents/jarvis/richOutputPrompt.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/richOutputPrompt.js)
- [tasks/prompts/agents/deepresearch/processSerpsPrompt.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/deepresearch/processSerpsPrompt.js)
- [README.md](https://github.com/browserable/browserable/blob/main/README.md)
- [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)
</details>

# AI Agents, Prompts & LLM Integration

## Overview

Browserable is an open-source browser automation library for AI agents that, per its README, currently reaches 90.4% on the Web Voyager benchmarks ([README.md](https://github.com/browserable/browserable/blob/main/README.md)). The system is built around a multi-agent architecture in which each agent is responsible for a distinct capability (browser interaction, LLM passthrough, orchestration, research), and every LLM-backed step is driven by structured prompts that produce JSON-shaped tool calls.

The "AI Agents, Prompts & LLM Integration" subsystem therefore covers three concerns:

1. The **agent classes** that define what each agent can do and how it terminates.
2. The **prompt library** that instructs the underlying LLM at each step (extraction, action selection, routing, output formatting).
3. The **LLM integration layer** that talks to OpenAI-compatible endpoints with a model-fallback list per use case.

## Agent Architecture

All agents extend a shared `BaseAgent` defined in `tasks/agents/base.js`. The base class provides two reusable action handlers, `error` and `end`, plus a `getBaseActions` method:

| Member | Purpose |
| --- | --- |
| `error` | Logs an irrecoverable error to user/debug logs and calls `errorAtNode` ([tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js)). |
| `end` | Writes final `output` and `reasoning` markdown to the user log and closes the node as `completed` ([tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js)). |
| `getBaseActions` | Inherited by all subclasses to register custom actions. |

Four concrete agent implementations ship in the repository:

- **BrowserableAgent** (`CODE = "BROWSER_AGENT"`): the headline browser automation agent. It exposes four high-level actions (`open_new_tab`, `read_tab`, `act_on_tab`, and `extract_from_tab`) and is described in its system prompt as the agent to use "if the user explicitly asks you to do something on the browser" ([tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)). It always confirms results with a `read_tab` after each `act_on_tab`.
- **GenerativeAgent** (`CODE = "GENERATIVE_AGENT"`): a thin wrapper that passes a `task` string to an LLM and returns an `output` string. The system prompt explicitly notes it is a "simple vanilla dumb agent" intended for trivial text-to-text calls ([tasks/agents/generative.js](https://github.com/browserable/browserable/blob/main/tasks/agents/generative.js)).
- **JarvisAgent**: the orchestrator/router that decides which sub-agent handles a row of a data table; its prompts live in `tasks/prompts/agents/jarvis/` and are exported through `index.js` ([tasks/prompts/agents/jarvis/index.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/index.js)).
- **DeepResearchAgent**: drives multi-step web research, including SERP processing prompts such as `buildProcessSerpsPrompt` ([tasks/prompts/agents/deepresearch/processSerpsPrompt.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/deepresearch/processSerpsPrompt.js)).

```mermaid
flowchart TB
    User[User / SDK Caller] --> Tasks[tasks service]
    Tasks --> Jarvis[JarvisAgent<br/>orchestrator]
    Jarvis -->|route sub-task| Browserable[BrowserableAgent<br/>BROWSER_AGENT]
    Jarvis -->|route sub-task| Gen[GenerativeAgent<br/>GENERATIVE_AGENT]
    Jarvis -->|route sub-task| Research[DeepResearchAgent]
    Browserable --> LLM[(OpenAI-compatible<br/>LLM endpoint)]
    Gen --> LLM
    Jarvis --> LLM
    Research --> LLM
    LLM -->|tool/function call JSON| Agent[Selected Agent]
    Agent -->|updateNodeUserLog<br/>endNode| Tasks
```

## LLM Integration

The LLM layer is OpenAI-compatible and accepts a list of candidate models that are tried in order, enabling graceful fallback. Two example call sites illustrate the pattern:

- Refining extracted page content uses the cascade `["gemini-2.0-flash", "deepseek-chat", "gpt-4o-mini", "claude-3-5-haiku", "qwen-plus"]` with `max_attempts: 3` ([tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)).
- Deciding the next Playwright action uses `["gemini-2.0-flash", "deepseek-chat", "claude-3-5-sonnet", "gpt-4o", "qwen-plus"]` with the same retry budget ([tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)).

Every call carries a `metadata` object with `runId`, `nodeId`, `agentCode`, `usecase`, `flowId`, `accountId`, and `threadId`, which the orchestrator uses to attribute logs back to the right node. Callers can short-circuit a long-running run by checking `jarvis.isRunActive({ runId, flowId })` between LLM calls; if the run was cancelled, the agent returns early with `completed: false` and a descriptive message ([tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)).
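The early-return pattern can be sketched like this. It is an illustration under assumptions, not the agent's code: `runStepsWhileActive`, `RunRef`, `StepResult`, and the message wording are hypothetical, and the liveness check is injected as a function rather than calling the real `jarvis.isRunActive`.

```typescript
// Hypothetical sketch: re-checking run liveness before each LLM-backed step.
interface RunRef {
  runId: string;
  flowId: string;
}

interface StepResult {
  completed: boolean;
  message: string;
  outputs: string[];
}

type IsRunActive = (ref: RunRef) => Promise<boolean>;
type Step = () => Promise<string>;

async function runStepsWhileActive(
  ref: RunRef,
  steps: Step[],
  isRunActive: IsRunActive,
): Promise<StepResult> {
  const outputs: string[] = [];
  for (const step of steps) {
    // Bail out before spending another LLM call on a cancelled run.
    if (!(await isRunActive(ref))) {
      return {
        completed: false,
        message: `Run ${ref.runId} is no longer active`,
        outputs,
      };
    }
    outputs.push(await step());
  }
  return { completed: true, message: "All steps finished", outputs };
}
```

Checking before each step, rather than once up front, bounds the wasted work after a cancellation to at most one in-flight call.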

### Community notes on LLM providers

- The default deployment targets OpenAI directly; the maintainers have noted that Groq and OpenRouter can be enabled by replacing `https://api.openai.com/v1/chat/completions` with the provider's OpenAI-compatible URL (issues [#3](https://github.com/browserable/browserable/issues/3), [#5](https://github.com/browserable/browserable/issues/5)).
- Local LLM support (Ollama) is on the roadmap as a request (issue [#9](https://github.com/browserable/browserable/issues/9)).

## Prompt System

Prompts are colocated with their agents under `tasks/prompts/agents/<agent-name>/` and exported as builder functions that return OpenAI-style `messages` arrays. Major prompt modules include:

- `extractPrompts.js`: `buildExtractLLMPrompt` instructs the LLM to print exact text from a rendered webpage or DOM slice and to emit JSON with a `justification` field. It is sensitive to whether the input is a text rendering or a raw DOM list ([tasks/prompts/agents/browserable/extractPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/extractPrompts.js)).

- `actionPrompts.js`: defines the function-calling schema for `doAction`, `skipSection`, and `actionCompleted`. The LLM must emit exactly one of these three JSON shapes, each carrying a `reason` plus optional Playwright `method`/`args`/`element` ([tasks/prompts/agents/browserable/actionPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/actionPrompts.js)).

- `richOutputPrompt.js`: assembles the final structured answer for a user. It enforces a hard ceiling of 4000 words for the entire `outputGenerated` object and reminds the model to honor per-field word limits ([tasks/prompts/agents/jarvis/richOutputPrompt.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/richOutputPrompt.js)).

- `datatablePrompts.js`: describes how Jarvis decomposes a user request into rows, delegates each row to a sub-agent, and merges results back. It codifies the `work_on_subtask_before_deciding` action code used when a row's prerequisites are missing ([tasks/prompts/agents/jarvis/datatablePrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/jarvis/datatablePrompts.js)).

- `processSerpsPrompt.js`: directs the deep-research model to extract at most eight unique, dense learnings and three follow-up questions from SERP content ([tasks/prompts/agents/deepresearch/processSerpsPrompt.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/deepresearch/processSerpsPrompt.js)).
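The three-shape contract described for `actionPrompts.js` maps naturally onto a discriminated union. The sketch below is an assumption-laden illustration: the `name` discriminator, the field optionality, and the `parseAgentAction` helper are hypothetical and are not taken from the repository. Only the shape names and the `reason`, `method`, `args`, and `element` fields come from the text.

```typescript
// Hypothetical types for the three JSON shapes; the `name` discriminator is assumed.
interface DoAction {
  name: "doAction";
  reason: string;
  method: string; // Playwright method name, e.g. "click"
  args: unknown[];
  element?: string;
}

type AgentAction =
  | DoAction
  | { name: "skipSection"; reason: string }
  | { name: "actionCompleted"; reason: string };

// Validates raw LLM output and rejects anything outside the three shapes.
function parseAgentAction(raw: string): AgentAction {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const { name, reason, method, args, element } = parsed as Record<string, unknown>;
  if (typeof reason !== "string") {
    throw new Error("every shape must carry a reason");
  }
  if (name === "skipSection") return { name: "skipSection", reason };
  if (name === "actionCompleted") return { name: "actionCompleted", reason };
  if (name === "doAction") {
    if (typeof method !== "string") {
      throw new Error("doAction requires a method");
    }
    const action: DoAction = {
      name: "doAction",
      reason,
      method,
      args: Array.isArray(args) ? args : [],
    };
    if (typeof element === "string") action.element = element;
    return action;
  }
  throw new Error(`unknown action: ${String(name)}`);
}
```

Validating at the boundary means downstream code can switch on `name` without re-checking fields.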

## Common Failure Modes and Workarounds

- **`/usr/bin/env: 'node --no-warnings': No such file or directory`** when running `npx browserable` inside WSL/Ubuntu with a `fnm` multishell. On Linux, everything after the interpreter path in a shebang line is passed as a single argument, so `env` looks for a program literally named `node --no-warnings` instead of splitting on the space. In the reported environment `which node` returned `/run/user/0/fnm_multishells/.../node`; the workaround is to launch `npx` from a regular login shell where `node` resolves to a single stable path (issue [#20](https://github.com/browserable/browserable/issues/20)).

- **Stuck "Initial setup is in progress"** on the admin UI. The maintainers recommend `docker ps` to check for an unhealthy `tasks` service, then `docker exec -it browserable ...` to inspect logs; the frontend is waiting for the backend health check to succeed (issue [#6](https://github.com/browserable/browserable/issues/6)).

- **No Groq / OpenRouter entry in the Admin UI**. There is no first-class toggle yet, so users must point the OpenAI base URL at a compatible endpoint by editing the configuration (issues [#3](https://github.com/browserable/browserable/issues/3), [#5](https://github.com/browserable/browserable/issues/5)).

## See Also

- [Task Runs & Status Polling](task-runs-and-status.md): covers `TaskRunStatus` and the `WaitForRunOptions` poll loop ([sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)).

- [Custom Tools & Functions](custom-tools.md): the public guide for registering user-defined tool calls alongside the built-in agent actions (issue [#10](https://github.com/browserable/browserable/issues/10)).

- [Local Browser & Deployment](local-browser-and-deployment.md): running the `browserable` Docker stack and CLI (`npx browserable --help`, `npx browserable down`) for local browser support (issues [#4](https://github.com/browserable/browserable/issues/4), [#8](https://github.com/browserable/browserable/issues/8)).

- [Troubleshooting](troubleshooting.md): covers the `Initial setup is in progress` symptom and other deployment pitfalls (issue [#15](https://github.com/browserable/browserable/issues/15)).

---

<a id='page-api-sdk'></a>

## REST API, JavaScript SDK & Custom Functions

### Related Pages

Related topics: [Overview, Architecture & Getting Started](#page-overview), [AI Agents, Prompts & LLM Integration](#page-agents-llm)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/browserable/browserable/blob/main/README.md)
- [sdk/browserable-js/README.md](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/README.md)
- [sdk/browserable-js/package.json](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/package.json)
- [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)
- [sdk/examples/js-sdk-test/README.md](https://github.com/browserable/browserable/blob/main/sdk/examples/js-sdk-test/README.md)
- [sdk/examples/js-sdk-test/package.json](https://github.com/browserable/browserable/blob/main/sdk/examples/js-sdk-test/package.json)
- [tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js)
- [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)
- [tasks/prompts/agents/browserable/extractPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/extractPrompts.js)
- [tasks/prompts/agents/browserable/actionPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/actionPrompts.js)
- [ui/src/containers/SettingsContainer.jsx](https://github.com/browserable/browserable/blob/main/ui/src/containers/SettingsContainer.jsx)
</details>

# REST API, JavaScript SDK & Custom Functions

## Overview

Browserable exposes its browser-automation capabilities through a REST API, a typed JavaScript/TypeScript SDK, and an extension point for custom tools/functions. Together these form the developer-facing surface for programmatically creating tasks, polling run status, retrieving run results, generating task GIFs, and extending the built-in `BROWSER_AGENT` with user-defined functions.

The project positions itself as open-source and self-hostable. The README directs users to a hosted REST endpoint and a JS SDK guide ([README.md](https://github.com/browserable/browserable/blob/main/README.md)). The SDK is published as `browserable-js` ([sdk/browserable-js/package.json](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/package.json)) and depends on `axios ^1.6.7` for HTTP transport.

A bundled example project at `sdk/examples/js-sdk-test` ([sdk/examples/js-sdk-test/README.md](https://github.com/browserable/browserable/blob/main/sdk/examples/js-sdk-test/README.md)) demonstrates the typical user flow against a local API at `http://localhost:2003/api/v1`.

## REST API Surface

The JavaScript SDK is a thin wrapper over the REST API, so the SDK method list effectively documents the supported endpoints. The SDK methods are typed in [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts) and demonstrated in [sdk/browserable-js/README.md](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/README.md).

| SDK Method | Purpose | Returns |
|---|---|---|
| `createTask({ task, agent, triggers })` | Submit a new task for the default or specified agent. | `{ taskId }` |
| `listTasks({ page, limit })` | List tasks for the authenticated account. | Paginated `Task[]` |
| `getTaskRunStatus(taskId, runId?)` | Poll the lifecycle state of a run. | `TaskRunStatus` |
| `getTaskRunResult(taskId, runId?)` | Fetch final output of a run. | `TaskRunResult` |
| `getTaskRunGif(taskId, runId)` | Retrieve a rendered GIF of the run. | `TaskRunGifResult` |
| `stopRun(taskId, runId?)` | Cancel a running task. | API envelope |
| `waitForRun(taskId, options?)` | Block-poll until status is terminal. | Final status |
| `getUserProfile()` | Return the authenticated user. | Profile object |
| `listBrowsers()` | Enumerate registered browser providers. | Browser list |

All responses share a common envelope `ApiResponse<T>` defined in [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts), with `success: boolean`, optional `data`, optional `error`, and pagination fields (`total`, `page`, `limit`).
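A caller typically wants either the payload or a thrown error. The sketch below mirrors the envelope fields named above; the `error` field's type, the exact optionality, and the `unwrap` helper are assumptions for illustration, so check `types.ts` for the authoritative definition.

```typescript
// Envelope shape as described in the text; types.ts is the authoritative source.
interface ApiResponse<T> {
  success: boolean;
  data?: T;
  error?: string; // assumption: errors are reported as strings
  total?: number;
  page?: number;
  limit?: number;
}

// Hypothetical helper: returns the payload or throws with the server's error.
function unwrap<T>(response: ApiResponse<T>): T {
  if (!response.success || response.data === undefined) {
    throw new Error(response.error ?? "Request failed without an error message");
  }
  return response.data;
}
```

With this helper, `unwrap(await browserable.createTask({ task }))` would yield the `{ taskId }` payload directly, assuming the SDK method resolves to the envelope.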

```mermaid
sequenceDiagram
    participant App as Caller (SDK / curl)
    participant API as REST API
    participant Tasks as tasks service
    participant Agent as BROWSER_AGENT
    App->>API: POST /tasks (createTask)
    API->>Tasks: enqueue
    Tasks->>Agent: schedule node
    Agent-->>Tasks: status updates
    App->>API: GET /tasks/:id/runs/:runId (getTaskRunStatus)
    App->>API: GET /tasks/:id/runs/:runId/result
    App->>API: GET /tasks/:id/runs/:runId/gif
```

Task run lifecycle values shown in the type definitions are `scheduled | running | completed | error` ([sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)). The GIF endpoint mirrors this with `pending | completed | error` and a `url` field for the rendered asset.

The example test harness iterates the lifecycle by listing tasks and creating a browser session ([sdk/examples/js-sdk-test/README.md](https://github.com/browserable/browserable/blob/main/sdk/examples/js-sdk-test/README.md)), confirming the practical order of calls a developer is expected to make.

## JavaScript SDK

The SDK is implemented in TypeScript and built with `tsc` ([sdk/browserable-js/package.json](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/package.json)). It exports a `Browserable` class initialized with an API key and optional `baseURL` ([sdk/browserable-js/README.md](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/README.md)). The default base URL points to the hosted service; the example project overrides it to `http://localhost:2003/api/v1` ([sdk/examples/js-sdk-test/README.md](https://github.com/browserable/browserable/blob/main/sdk/examples/js-sdk-test/README.md)).

```typescript
import { Browserable } from 'browserable-js';

const browserable = new Browserable({
  apiKey: 'your-api-key',
});

const { data } = await browserable.createTask({
  task: 'Visit example.com and extract all links',
  agent: 'BROWSER_AGENT',
});
```

`createTask` accepts an optional `agent` selector, allowing callers to route a task to a specific agent implementation. The built-in `BROWSER_AGENT` is defined in [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js) with the constant `this.CODE = "BROWSER_AGENT"`.

The SDK ships convenience helpers for long-running runs. `waitForRun` accepts `pollInterval` (default 1000 ms), `timeout` (default 300000 ms), and an optional `onStatusChange` callback ([sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)). Combined with `stopRun`, this lets a caller implement cancellation, progress streaming, and dead-letter handling entirely from JavaScript.
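To make the polling semantics concrete, here is a minimal sketch of a loop with the same three options. It is a hypothetical re-implementation, not the SDK's `waitForRun`: `pollUntilTerminal`, `PollOptions`, and the choice to treat `completed` and `error` as the terminal states are assumptions, and the status source is injected so the loop runs without a server.

```typescript
// Hypothetical poll loop; the real SDK's waitForRun may behave differently.
type RunStatus = "scheduled" | "running" | "completed" | "error";

interface PollOptions {
  pollInterval?: number; // milliseconds between polls
  timeout?: number; // overall budget in milliseconds
  onStatusChange?: (status: RunStatus) => void;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollUntilTerminal(
  getStatus: () => Promise<RunStatus>,
  { pollInterval = 1000, timeout = 300000, onStatusChange }: PollOptions = {},
): Promise<RunStatus> {
  const deadline = Date.now() + timeout;
  let previous: RunStatus | undefined;
  while (true) {
    const status = await getStatus();
    if (status !== previous) {
      onStatusChange?.(status); // fire only on transitions
      previous = status;
    }
    // Assumption: these two states end the run.
    if (status === "completed" || status === "error") return status;
    if (Date.now() + pollInterval > deadline) {
      throw new Error(`Timed out after ${timeout} ms`);
    }
    await sleep(pollInterval);
  }
}
```

Against the real SDK, `getStatus` could wrap `browserable.getTaskRunStatus(taskId)` and map its response to a status string; the exact response shape is not shown here and should be taken from `types.ts`.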

## Custom Functions

Custom tools/functions are the extension point of `BROWSER_AGENT`. They are reached through the `customFunctions` and `end` actions on the base agent ([tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js)). The base agent's `_action_end` writes a user-visible "Agent completed." message along with `output` and `reasoning` markdown, and signals node completion via `jarvis.endNode(...)`.

For browser tasks, the `BROWSER_AGENT` is built on top of a stable set of LLM-driven actions defined in [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js):

- `open_new_tab` — opens a URL in a fresh browser session and returns a list of tabs.
- `read_tab` — converts a tab's HTML to markdown for the LLM context.
- `act_on_tab` — performs a click, type, or other interaction, verified by a vision-capable model.
- `extract_from_tab` — runs schema-guided extraction over DOM or text, refining the result through `buildRefineExtractedContentPrompt`.

The refinement step is the natural place to plug in custom functions: the prompt builder in [tasks/prompts/agents/browserable/extractPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/extractPrompts.js) accepts `instructions`, `schema`, `previouslyExtractedContent`, and `domElements`, which can be populated by user-defined helpers. Action selection itself is driven by [tasks/prompts/agents/browserable/actionPrompts.js](https://github.com/browserable/browserable/blob/main/tasks/prompts/agents/browserable/actionPrompts.js), which constrains the LLM to emit `doAction`, `skipSection`, or `actionCompleted` JSON — a contract that custom function authors can rely on when registering new callable tools.

> Community note: Custom tools/functions V1 is documented as live, and `https://docs.browserable.ai/guides/custom-functions` is the canonical guide referenced from the issue tracker. Task GIFs are also exposed via both the REST API and JS SDK (see `getTaskRunGif`) ([sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)).

## LLM Provider Configuration

Both the REST API and the SDK ultimately call the same LLM abstraction. The Admin UI exposes the underlying provider list in [ui/src/containers/SettingsContainer.jsx](https://github.com/browserable/browserable/blob/main/ui/src/containers/SettingsContainer.jsx), which collects `openai`, `claude`, and `gemini` API keys and stores them under `userApiKeys` on the account. Browser-side providers (`hyperBrowser`, `steel`) are stored under `userBrowserApiKeys`.

The agent layer also falls back between models on errors. The refinement call in [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js) iterates over `gemini-2.0-flash`, `deepseek-chat`, `gpt-4o-mini`, `claude-3-5-haiku`, and `qwen-plus` through `callOpenAICompatibleLLMWithRetry`, which means any provider offering an OpenAI-compatible endpoint (e.g. Groq) can be wired in by changing the upstream URL. The maintainers point to this workaround in the issue tracker for Groq/OpenRouter support.

> Community note: Local LLMs (Ollama) and full Groq/OpenRouter support remain open roadmap items. Until they land, the supported configuration path is via the Admin UI providers list, with OpenAI-compatible endpoints supported through code-level URL substitution.

## See Also

- [README.md](https://github.com/browserable/browserable/blob/main/README.md) — project overview, links to hosted REST docs and JS SDK guide.
- [sdk/browserable-js/README.md](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/README.md) — full SDK reference and examples.
- [tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js) — base agent lifecycle, `end` action, error reporting.
- [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js) — `BROWSER_AGENT` actions and LLM fallback chain.
- [ui/src/containers/SettingsContainer.jsx](https://github.com/browserable/browserable/blob/main/ui/src/containers/SettingsContainer.jsx) — provider and browser API key configuration.

---

<a id='page-deployment-ops'></a>

## Deployment, Configuration & Troubleshooting

### Related Pages

Related topics: [Overview, Architecture & Getting Started](#page-overview), [REST API, JavaScript SDK & Custom Functions](#page-api-sdk)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/browserable/browserable/blob/main/README.md)
- [deployment/docker-compose.dev.yml](https://github.com/browserable/browserable/blob/main/deployment/docker-compose.dev.yml)
- [deployment/.env](https://github.com/browserable/browserable/blob/main/deployment/.env)
- [deployment/supabase-docker/docker-compose.yml](https://github.com/browserable/browserable/blob/main/deployment/supabase-docker/docker-compose.yml)
- [deployment/supabase-docker/docker-compose.s3.yml](https://github.com/browserable/browserable/blob/main/deployment/supabase-docker/docker-compose.s3.yml)
- [docs/development/environment-variables.md](https://github.com/browserable/browserable/blob/main/docs/development/environment-variables.md)
- [docs/development/troubleshooting.mdx](https://github.com/browserable/browserable/blob/main/docs/development/troubleshooting.mdx)
- [sdk/browserable-js/package.json](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/package.json)
- [sdk/browserable-js/src/types.ts](https://github.com/browserable/browserable/blob/main/sdk/browserable-js/src/types.ts)
- [sdk/examples/js-sdk-test/README.md](https://github.com/browserable/browserable/blob/main/sdk/examples/js-sdk-test/README.md)
- [ui/package.json](https://github.com/browserable/browserable/blob/main/ui/package.json)
- [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js)
- [tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js)
- [tasks/agents/jarvis.js](https://github.com/browserable/browserable/blob/main/tasks/agents/jarvis.js)
</details>

Browserable is shipped as a self-hostable, Docker-based platform for running AI browser agents. The repository contains three runtime layers that must come up together: the **admin UI** (React/electron-forge dashboard), the **tasks service** (the Node.js agent runtime that drives Playwright sessions and calls LLMs), and the **data layer** (Postgres/Supabase + S3-compatible storage). This page covers the supported deployment paths, the configuration surface you need to know about, and the most common failure modes reported by users on GitHub.

## Deployment Paths

### Quick Start: `npx browserable`

The README documents `npx browserable` as the fastest onboarding path. It bootstraps a local Docker Compose stack and opens the admin UI on `http://localhost:2001`, where you enter your LLM and remote-browser API keys to begin running tasks [Source: [README.md:18-32]()]. This is the recommended path for new users. Issue #8 requests `down` and `--help` commands for the same CLI.

### Manual Docker Compose Deployment

For contributors and self-hosters, the canonical path is:

1. Install Docker and Docker Compose.
2. Clone the repository and `cd deployment`.
3. Start the dev stack:
   ```bash
   docker-compose -f docker-compose.dev.yml up
   ```
4. Open the admin dashboard at `http://localhost:2001` to set your LLM and remote-browser keys [Source: [README.md:34-50]()].

Under the hood, the deployment folder orchestrates several compose files. The base stack (`deployment/docker-compose.dev.yml`) brings up the UI, the tasks service, and the agent worker. Optional companion files (`deployment/supabase-docker/docker-compose.yml` and `docker-compose.s3.yml`) provide the local Postgres/Supabase backend and an S3-compatible object store used for screenshots and run GIFs.

### High-Level Architecture

```mermaid
flowchart LR
    User([Operator]) --> Admin[Admin UI<br/>localhost:2001]
    Admin -->|API keys & config| Tasks[Tasks Service<br/>Node.js agents]
    Tasks -->|Playwright CDP| Browser[(Remote Browser<br/>e.g. BrowserBase)]
    Tasks -->|Chat completions| LLM[(LLM Provider<br/>OpenAI-compatible)]
    Tasks -->|Run state & logs| DB[(Postgres / Supabase)]
    Tasks -->|Screenshots & GIFs| S3[(S3-compatible store)]
```

## Configuration

### API Keys

All sensitive credentials are entered through the admin UI after the stack is up; they are persisted in the database rather than only in `.env` files. The two credential classes you must provide are:

- **LLM API key**: used by the `callOpenAICompatibleLLMWithRetry` helper, which iterates over a default model roster such as `gemini-2.0-flash`, `deepseek-chat`, `gpt-4o-mini`, `claude-3-5-haiku`, and `qwen-plus` [Source: [tasks/agents/browserable.js:33-49]()].
- **Remote browser key** — used by `browserService.getPlaywrightBrowser()` to acquire a Playwright `connectUrl` and `sessionId` for each run [Source: [tasks/agents/browserable.js:91-101]()].

### Environment Variables

The most commonly referenced environment variables live in `deployment/.env` and are documented in `docs/development/environment-variables.md`. The key categories are summarized below.

| Category | Purpose | Example |
|---|---|---|
| Database | Postgres / Supabase connection for the tasks service | `DATABASE_URL`, `SUPABASE_URL` |
| Object storage | S3-compatible endpoint for screenshots and task-run GIFs | `S3_ENDPOINT`, `S3_BUCKET`, `S3_ACCESS_KEY`, `S3_SECRET_KEY` |
| LLM | OpenAI-compatible base URL and key (overrideable in code) | `OPENAI_API_KEY`, `OPENAI_BASE_URL` |
| Browser | Remote browser provider credentials | `BROWSERBASE_API_KEY`, `BROWSERBASE_CONNECT_URL` |
| Ports | UI and API ports | `2001` (UI), `2003` (REST API) |

The REST API base URL is also reflected in the JS SDK examples, which target `http://localhost:2003/api/v1` by default [Source: [sdk/examples/js-sdk-test/README.md:7-9]()].
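A pre-flight check for the variables in the table above might look like the following sketch. The variable names are copied from the table's Example column and may not match the full list in `docs/development/environment-variables.md`; `REQUIRED_BY_CATEGORY` and `findMissing` are hypothetical names introduced here.

```typescript
type Env = Record<string, string | undefined>;

// Names copied from the Example column above; treat this list as illustrative, not exhaustive.
const REQUIRED_BY_CATEGORY: Record<string, string[]> = {
  database: ["DATABASE_URL"],
  objectStorage: ["S3_ENDPOINT", "S3_BUCKET", "S3_ACCESS_KEY", "S3_SECRET_KEY"],
};

// Return, per category, the variables that are unset or blank.
function findMissing(
  env: Env,
  required: Record<string, string[]> = REQUIRED_BY_CATEGORY,
): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const category of Object.keys(required)) {
    const absent = required[category].filter((name) => {
      const value = env[name];
      return value === undefined || value.trim() === "";
    });
    if (absent.length > 0) {
      missing[category] = absent;
    }
  }
  return missing;
}
```

In a Node.js process this could be called as `findMissing(process.env)` before starting the tasks service, failing fast when the result is non-empty.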

### Using Non-OpenAI Providers (Groq, OpenRouter, Ollama)

The admin UI / Docker compose flow is wired to OpenAI today. For Groq, OpenRouter, or any other OpenAI-compatible endpoint, the maintainers' guidance in issue #3 is to replace the OpenAI chat-completions URL in code with the provider's OpenAI-compatible URL while reusing the same key plumbing [Source: GitHub issue #3]. A native admin-UI selector for Groq and OpenRouter is tracked as planned work in issue #5. Local LLMs such as Ollama follow the same pattern and are tracked in issue #9.
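To make the URL-substitution idea concrete, here is a minimal sketch of how a chat-completions request differs between OpenAI-compatible providers: only the base URL and key change. `buildChatRequest` and `BASE_URLS` are hypothetical names, not code from the repository, and the base URLs are the providers' publicly documented OpenAI-compatible endpoints as best known here; verify them against each provider's documentation.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assumed OpenAI-compatible base URLs; check each provider's docs before relying on them.
const BASE_URLS = {
  openai: "https://api.openai.com/v1",
  groq: "https://api.groq.com/openai/v1",
  openrouter: "https://openrouter.ai/api/v1",
  ollama: "http://localhost:11434/v1",
};

// Build (but do not send) a chat-completions request for any OpenAI-compatible base URL.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}
```

The returned object can be passed to any HTTP client, for example `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`.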

### JavaScript SDK

A published SDK ships as `browserable-js` (v1.0.1) and exposes a typed `BrowserableConfig` (`apiKey`, `baseURL`), `CreateTaskOptions`, `TaskRunStatus`, and a `WaitForRunOptions` helper with `pollInterval`, `timeout`, and `onStatusChange` callback [Source: [sdk/browserable-js/package.json:1-25](), [sdk/browserable-js/src/types.ts:1-50]()].
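The polling behaviour implied by `WaitForRunOptions` can be sketched as follows. This is not the SDK's implementation: `waitForTerminalStatus`, the injected `getStatus` function, the default values, and the `RunStatus` values are assumptions made for illustration; the real `TaskRunStatus` type is defined in `sdk/browserable-js/src/types.ts`.

```typescript
// Hypothetical status values; consult the SDK's TaskRunStatus type for the real ones.
type RunStatus = "scheduled" | "running" | "completed" | "error";

interface WaitOptions {
  pollInterval?: number; // milliseconds between polls
  timeout?: number; // overall budget in milliseconds
  onStatusChange?: (status: RunStatus) => void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll getStatus until the run reaches a terminal status or the timeout budget is spent.
async function waitForTerminalStatus(
  getStatus: () => Promise<RunStatus>,
  options: WaitOptions = {},
): Promise<RunStatus> {
  const pollInterval = options.pollInterval === undefined ? 1000 : options.pollInterval;
  const timeout = options.timeout === undefined ? 60000 : options.timeout;
  const deadline = Date.now() + timeout;
  let last: RunStatus | undefined;
  while (true) {
    const status = await getStatus();
    if (status !== last) {
      last = status;
      // Notify only on transitions, not on every poll.
      if (options.onStatusChange) options.onStatusChange(status);
    }
    if (status === "completed" || status === "error") return status;
    if (Date.now() + pollInterval > deadline) {
      throw new Error(`Run did not finish within ${timeout} ms (last status: ${status})`);
    }
    await sleep(pollInterval);
  }
}
```

Injecting `getStatus` keeps the helper independent of any particular client, so the same loop works against the REST API or a test double.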

## Troubleshooting Common Issues

### "Initial setup is in progress" never resolves

This typically means the **tasks** service has not finished booting. Diagnose it from the host:

```bash
docker ps                  # look for unhealthy / restarting tasks container
docker exec -it browserable-<container> ...
```

If the tasks container is unhealthy, check its logs and the `DATABASE_URL` / S3 reachability before retrying [Source: GitHub issue #6].

### `npx browserable` fails with `node --no-warnings` not found

The CLI shebang uses `env node --no-warnings`. On Linux, the kernel passes everything after the interpreter path in a shebang line as a single argument, so `/usr/bin/env` looks for a binary literally named `node --no-warnings` and fails with `/usr/bin/env: 'node --no-warnings': No such file or directory`. The report in issue #20 comes from WSL/Ubuntu with `fnm_multishells` shell shims [Source: GitHub issue #20]. Workarounds reported in the thread include invoking `npx browserable` from a shell where `node` resolves through a wrapper that supports `-S`-style option splitting (GNU `env -S` splits such a string into separate arguments), or running the manual Docker Compose path instead.

### Custom LLM endpoint not being picked up

If the admin UI rejects a non-OpenAI key or silently falls back, remember that the URL and key are read from the OpenAI-compatible client directly. Confirm the base URL was updated in code, then restart the tasks container so the change is loaded [Source: GitHub issue #3].

### Agent runs that hang or fail mid-flow

Both the text-extraction and DOM-extraction helpers check `jarvis.isRunActive()` and short-circuit gracefully once a run has been cancelled [Source: [tasks/agents/browserable.js:7-25](), [tasks/agents/browserable.js:309-340]()]. If a run stays in `running` indefinitely, inspect the agent and debug logs surfaced through the admin UI; these correspond to `updateNodeAgentLog` and `updateNodeDebugLog` calls in `base.js` and `jarvis.js`.

### "Task is not active" or "Tab with ID ... not found"

These errors are raised when a run has been stopped between scheduling and execution, or when a requested `tabId` no longer matches a live Playwright page. The `extractHelper` and `textExtractHelper` both wrap their work in `isRunActive` guards and return a structured failure rather than throwing [Source: [tasks/agents/browserable.js:255-275]()]. Re-run the task from the admin UI; if the error persists, the browser session likely lost its CDP connection and a new session must be provisioned.
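The guard pattern described above, returning a structured failure instead of throwing, can be sketched like this. `withRunActiveGuard` and the `GuardedResult` shape are hypothetical and only illustrate the idea; the actual helpers in `tasks/agents/browserable.js` may return a different structure.

```typescript
// Hypothetical result shape: either a success with data or a failure with a message.
type GuardedResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Run `work` only if the run is still active; convert thrown errors into structured failures.
async function withRunActiveGuard<T>(
  isRunActive: () => Promise<boolean>,
  work: () => Promise<T>,
): Promise<GuardedResult<T>> {
  if (!(await isRunActive())) {
    // The run was stopped between scheduling and execution.
    return { success: false, error: "Task is not active" };
  }
  try {
    return { success: true, data: await work() };
  } catch (err) {
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Callers can then branch on `success` rather than wrapping every helper call in its own `try`/`catch`.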

A consolidated troubleshooting reference lives at `docs/development/troubleshooting.mdx` (tracked in issue #15 and now live). New reports go to the GitHub issue tracker, which the maintainers also use as the public roadmap (issue #7).

## See Also

- [README.md](https://github.com/browserable/browserable/blob/main/README.md) — quick start and architecture overview
- [docs/development/environment-variables.md](https://github.com/browserable/browserable/blob/main/docs/development/environment-variables.md) — full env-var reference
- [docs/development/troubleshooting.mdx](https://github.com/browserable/browserable/blob/main/docs/development/troubleshooting.mdx) — detailed troubleshooting guide
- [sdk/browserable-js](https://github.com/browserable/browserable/tree/main/sdk/browserable-js) — JavaScript SDK source
- [tasks/agents/browserable.js](https://github.com/browserable/browserable/blob/main/tasks/agents/browserable.js) — browser agent implementation
- [tasks/agents/base.js](https://github.com/browserable/browserable/blob/main/tasks/agents/base.js) — base agent lifecycle
- [tasks/agents/jarvis.js](https://github.com/browserable/browserable/blob/main/tasks/agents/jarvis.js) — orchestrator / run scheduling

---


## Pitfall Log

Project: browserable/browserable

Summary: Found 8 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/browserable/browserable/issues/20

## 2. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/browserable/browserable

## 3. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/browserable/browserable

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/browserable/browserable

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/browserable/browserable

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/browserable/browserable

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/browserable/browserable

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/browserable/browserable

