# https://github.com/lavague-ai/LaVague Project Manual

Generated at: 2026-06-23 14:03:25 UTC

## Table of Contents

- [Overview & System Architecture](#page-1)
- [Core Engines: World Model, Navigation, Action & Python Engines](#page-2)
- [Drivers, Contexts & Integrations Ecosystem](#page-3)
- [Developer Tooling: LaVague QA, Test Runner, Server & Gradio](#page-4)

<a id='page-1'></a>

## Overview & System Architecture

### Related Pages

Related topics: [Core Engines: World Model, Navigation, Action & Python Engines](#page-2), [Drivers, Contexts & Integrations Ecosystem](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)
- [lavague-core/lavague/core/__init__.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/__init__.py)
- [lavague-core/lavague/core/agents.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/agents.py)
- [lavague-core/lavague/core/world_model.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/world_model.py)
- [lavague-core/lavague/core/action_engine.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/action_engine.py)
- [lavague-core/lavague/core/utilities/telemetry.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/utilities/telemetry.py)
- [lavague-tests/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)
- [lavague-qa/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md)
- [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md)
- [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)
- [lavague-server/lavague/server/driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py)
- [extension_chrome/package.json](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/package.json)
- [extension_chrome/src/actionSchemas.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/actionSchemas.ts)
- [extension_chrome/src/tools.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts)
- [extension_chrome/src/parseactions.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/parseactions.ts)
- [extension_chrome/src/app/component/Logs.tsx](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/app/component/Logs.tsx)
</details>

# Overview & System Architecture

## Purpose and Scope

LaVague is an open-source framework for building **Large Action Models** that drive a web browser from a natural-language objective. Rather than hard-coding selectors, the framework lets an LLM decide what to do next on a page; LaVague then translates that decision into real browser automation. As stated in the project README, the system is built around two cooperating parts: *“A World Model that takes an objective and the current state (aka the current web page) and outputs an appropriate set of instructions”* and *“An Action Engine which 'compiles' these instructions into action code, e.g., Selenium or Playwright & executes them.”* Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)

The framework targets three audiences:

- **Application developers** who want a programmatic `WebAgent` they can call from Python (`agent.run("Go on the quicktour of PEFT")`). Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)
- **QA engineers**, served by the dedicated `lavague-qa` CLI that converts Gherkin specs into pytest tests. Source: [lavague-qa/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md)
- **End users and demo audiences** who interact through a Gradio UI or a Chrome extension. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)

## Core Architecture

At runtime, every agent iteration follows the same loop: observe the page, decide what to do, then execute the action and repeat. The high-level data flow is shown below.

```mermaid
flowchart LR
    User([User objective]) --> Agent[WebAgent]
    Agent --> WM[WorldModel<br/>LLM + multi-modal LLM]
    Driver[Driver<br/>Selenium / Playwright / Chrome Ext.] -->|HTML, screenshot| WM
    WM -->|Instructions| AE[ActionEngine]
    AE -->|Compiled code or XPath ops| Driver
    Driver -->|New page state| Agent
    Agent --> Context[Context<br/>prompts, models, token counter]
    Context -.config.-> WM
    Context -.config.-> AE
    Telemetry[Telemetry<br/>anonymous usage] -.opt-out.-> Agent
```

The `WebAgent` is the orchestrator exposed from `lavague.core`; it wires a `WorldModel`, an `ActionEngine`, and a driver together. Source: [lavague-core/lavague/core/__init__.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/__init__.py). The `WorldModel` consumes the current page (HTML/screenshot) plus the objective and emits high-level instructions. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md). The `ActionEngine` then turns those instructions into executable operations. The Chrome-extension code path shows what those operations look like in practice: structured `click`, `type`, etc., each carrying an XPath and a value, validated against a Zod `toolSchemaUnion`. Source: [extension_chrome/src/actionSchemas.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/actionSchemas.ts) and [extension_chrome/src/parseactions.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/parseactions.ts).
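
The observe, decide, execute loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not LaVague's implementation: `FakeDriver`, `fake_world_model`, `fake_action_engine`, and `run_loop` are hypothetical stand-ins that only track a URL instead of driving a real browser or calling an LLM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeDriver:
    """Hypothetical stand-in for a browser driver; it only tracks a URL."""
    url: str = "https://example.com/"
    history: List[dict] = field(default_factory=list)

    def observe(self) -> dict:
        # A real driver would return HTML and a screenshot here.
        return {"url": self.url}

    def execute(self, action: dict) -> None:
        self.history.append(action)
        if action["name"] == "navigate":
            self.url = action["args"]["value"]


def fake_world_model(objective: str, state: dict) -> Optional[str]:
    """Hypothetical planner: returns one instruction, or None when done."""
    target = "https://example.com/docs"
    if state["url"] == target:
        return None
    return f"Navigate to {target}"


def fake_action_engine(instruction: str) -> dict:
    """Hypothetical compiler from an instruction to a structured action."""
    url = instruction[len("Navigate to "):].strip()
    return {"name": "navigate", "args": {"value": url}}


def run_loop(objective: str, driver: FakeDriver, max_steps: int = 5) -> int:
    """Observe, decide, execute; returns the number of actions executed."""
    for step in range(max_steps):
        state = driver.observe()
        instruction = fake_world_model(objective, state)
        if instruction is None:
            return step
        driver.execute(fake_action_engine(instruction))
    return max_steps


driver = FakeDriver()
steps = run_loop("Open the docs", driver)
print(steps, driver.url)
```

The `max_steps` cap keeps the sketch from looping forever when the planner never reports completion.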

Drivers are the only layer that touches the browser. The three currently supported are **Selenium**, **Playwright**, and a **Chrome extension**, and not every feature is implemented on every driver. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md). The server-side driver example illustrates that the Chrome-extension path uses XPath-based interactions (e.g. `click` on a dropdown tab) rather than arbitrary code execution. Source: [lavague-server/lavague/server/driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py). This is consistent with community [issue #352](https://github.com/lavague-ai/LaVague/issues/352), which argues that the Navigation Engine should output *“XPath, type of action and other arguments”* rather than arbitrary code.

## Key Components

The repository is organized as a small monorepo. The most important modules are:

| Module | Purpose |
|---|---|
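| `lavague-core` | Defines `WebAgent`, `WorldModel`, `ActionEngine`, retriever pipelines, extractors, and telemetry. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md) |
| `lavague-drivers` (Selenium / Playwright / Chrome) | Pluggable browser back-ends. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md) |
| `lavague-integrations/contexts` | Built-in and custom "contexts" (prompt + model bundles). Includes the cache context that wraps LLMs / multi-modal LLMs / embeddings to make runs deterministic and cheaper. Source: [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md) |
| `lavague-server` | Long-lived agent sessions over a channel; `AgentSession` exposes `run`, `run_step`, `get`, `prepare_run`, etc., and emits `start`/`stop` events. Source: [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py) |
| `lavague-tests` | `lavague-test` CLI that runs YAML-defined tasks against real sites and reports success/failure. Source: [lavague-tests/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md) |
| `lavague-qa` | `lavague-qa` CLI that turns Gherkin features into pytest files. Source: [lavague-qa/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md) |
| `extension_chrome` | A React/Chakra UI that runs the same agent loop inside the browser, with a Zod-validated action schema, YAML/JSON response parsing, and a `Logs` panel. Source: [extension_chrome/package.json](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/package.json), [extension_chrome/src/parseactions.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/parseactions.ts), [extension_chrome/src/app/component/Logs.tsx](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/app/component/Logs.tsx) |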

A `Context` is the configuration bundle for an agent: it supplies the LLM, the multi-modal LLM, the embedding model, and the prompt templates used by the World Model and Action Engine. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md). The cache context wraps those models so that *"cached scenario can be replayed offline"*, which is valuable for repeatable testing. Source: [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md).

## Telemetry, Logging, and Community Concerns

LaVague ships with **anonymous telemetry** that records things like the model in use, the objective, the chain of thoughts, the bounding box of the interaction zone, the URL, token usage, and whether the action succeeded. The `LAVAGUE_TELEMETRY` environment variable controls it; setting it to `"NONE"` disables all telemetry. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md).
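
A minimal sketch of turning telemetry off from Python rather than from the shell. It assumes the variable has to be set before any LaVague module is imported, which this manual does not confirm; setting it first is the conservative choice either way.

```python
import os

# Assumption: LaVague reads this variable when it is imported, so set it
# before any `import lavague...` statement in the same process.
os.environ["LAVAGUE_TELEMETRY"] = "NONE"

telemetry_setting = os.environ.get("LAVAGUE_TELEMETRY")
print(f"LAVAGUE_TELEMETRY={telemetry_setting}")
```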

Because the system is LLM-driven, observability is a recurring community theme. [Issue #241](https://github.com/lavague-ai/LaVague/issues/241) requests structured *“logging of the agent flow / experiments”* so that users can review reasoning and improve the world model. LaVague already exposes multiple logging touchpoints:

- The Chrome-extension `Logs` component, which classifies events into `network`, `cmd`, `userprompt`, and `agent_log` and renders a live stream with deduplicated, counted entries. Source: [extension_chrome/src/app/component/Logs.tsx](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/app/component/Logs.tsx)
- The server `AgentSession`, which emits `start`/`stop` events around each `agent.run` call. Source: [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)
- The test runner, which prints a pass/fail report per task with the observed URL and status. Source: [lavague-tests/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)

The community also flags architectural concerns that the codebase is actively addressing. [Issue #440](https://github.com/lavague-ai/LaVague/issues/440) argues that *“the engines under the ActionEngine generate and execute the code”* and that execution should be separated from generation for observability and remote execution. [Issue #1](https://github.com/lavague-ai/LaVague/issues/1) tracks **Playwright** parity, which is shown as “⏳ coming soon” for several features in the driver matrix. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md). [Issue #272](https://github.com/lavague-ai/LaVague/issues/272) explores swapping the default OpenAI back-end for a fully local multi-modal model such as **Phi-3**.

## See Also

- [Customization & Contexts](https://docs.lavague.ai/en/latest/docs/get-started/customization/)
- [Test Runner (`lavague-test`)](https://docs.lavague.ai/en/latest/docs/get-started/testing/)
- [Token Usage & Cost Estimation](https://docs.lavague.ai/en/latest/docs/get-started/token-usage/)
- [Gradio Interactive Demo](https://docs.lavague.ai/en/latest/docs/get-started/gradio/)
- [LaVague QA (Gherkin → pytest)](https://docs.lavague.ai/en/latest/docs/lavague-qa/quick-tour/)
- [Troubleshooting Guide](https://docs.lavague.ai/en/latest/docs/get-started/troubleshoot/)

---

<a id='page-2'></a>

## Core Engines: World Model, Navigation, Action & Python Engines

### Related Pages

Related topics: [Overview & System Architecture](#page-1), [Drivers, Contexts & Integrations Ecosystem](#page-3), [Developer Tooling: LaVague QA, Test Runner, Server & Gradio](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)
- [lavague-core/lavague/core/world_model.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/world_model.py)
- [lavague-core/lavague/core/navigation.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/navigation.py)
- [lavague-core/lavague/core/action_engine.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/action_engine.py)
- [lavague-core/lavague/core/python_engine.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/python_engine.py)
- [lavague-core/lavague/core/base_engine.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/base_engine.py)
- [lavague-core/lavague/core/base_driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/base_driver.py)
- [lavague-core/lavague/core/action_template.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/action_template.py)
- [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)
- [lavague-server/lavague/server/driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py)
- [extension_chrome/src/actionSchemas.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/actionSchemas.ts)
- [extension_chrome/src/tools.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts)
- [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md)
</details>

# Core Engines: World Model, Navigation, Action & Python Engines

## Overview

LaVague's web-automation stack is built around four cooperating engines. Together they translate a high-level natural-language objective into low-level browser actions. The architecture is summarized as:

> A **World Model** that takes an objective and the current state (aka the current web page) and outputs an appropriate set of instructions. An **Action Engine** which "compiles" these instructions into action code (e.g., Selenium or Playwright) and executes them. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)

The full chain is mediated by the `WebAgent`, which orchestrates a `WorldModel`, an `ActionEngine` (hosting a `NavigationEngine` and a `PythonEngine`), and a `BaseDriver` that abstracts away the concrete browser backend (Selenium, Playwright, or Chrome extension).

```mermaid
flowchart LR
    Obj[User Objective] --> WM[WorldModel]
    State[Current Page<br/>HTML / Screenshot] --> WM
    WM -->|natural-language<br/>instructions| Nav[NavigationEngine]
    Nav -->|structured action<br/>name + args| Act[ActionEngine]
    Py[PythonEngine<br/>tool calls] --> Nav
    Act -->|driver commands| Drv[BaseDriver]
    Drv -->|Selenium / Playwright /<br/>Chrome Extension| Browser[(Browser)]
    Browser --> State
```

## World Model

The `WorldModel` is the reasoning layer. It receives two inputs:

- The user's free-form objective (e.g., "Go on the quicktour of PEFT").
- A description of the current page state, typically produced from the HTML retrieved by the driver.

It emits a plan — a set of natural-language instructions. The format of these instructions is parsed on the client side by a helper called `extractWorldModelInstruction`, which recognizes hyphenated, numbered, and code-fenced blocks, plus single-line variants. Source: [extension_chrome/src/tools.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts)
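
To make the parsing step concrete, here is a small Python sketch of the same idea. It is not a port of the TypeScript helper: `extract_instruction` and `PATTERNS` are hypothetical names, and the `Instruction:` label that introduces the payload is an assumed output layout, not something this manual specifies.

```python
import re
from typing import Optional

FENCE = "`" * 3  # triple backtick, built this way to avoid nesting fences

# Assumed layout: an "Instruction:" label followed by the payload.
PATTERNS = [
    # Code-fenced block, with an optional language tag after the fence.
    re.compile(r"Instruction:\s*" + FENCE + r"\w*\n(.*?)" + FENCE, re.DOTALL),
    # Hyphenated or numbered list: consecutive lines starting with "-" or "1.".
    re.compile(r"Instruction:\s*\n((?:[ \t]*(?:-|\d+\.)[ \t]+.*(?:\n|$))+)"),
    # Single-line variant: everything after the label on the same line.
    re.compile(r"Instruction:[ \t]*(.+)"),
]


def extract_instruction(text: str) -> Optional[str]:
    """Return the first instruction payload found, or None."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return None


single_line = extract_instruction("Instruction: Click on 'Quicktour'.")
bulleted = extract_instruction("Instruction:\n- Click A\n- Click B\n")
print(single_line)
print(bulleted)
```

Ordering the patterns from most to least specific matters: the single-line pattern would otherwise capture only the opening fence of a fenced block.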

By default, the World Model relies on OpenAI's `gpt-4o`, but the model is fully customizable, and the community has explored local OSS replacements such as Phi-3-Vision for fully local agents (see issue #272). A separate `MultiModalLLM` is typically used to interpret screenshots and other visual inputs.

## Navigation Engine

The `NavigationEngine` consumes the World Model's instructions and decides *what* should happen on the page: clicking an element, filling a field, navigating to a URL, etc. Today it emits executable Python that calls into the active driver.

The community has flagged this as a design weakness (issue #352): arbitrary code execution is harder to observe, audit, and replay. The proposed direction is for the `NavigationEngine` to output a **structured action** — an XPath, an action type (e.g., `click`), and the relevant arguments — rather than free-form code. Concretely, the on-the-wire format already used elsewhere in the project looks like:

```yaml
- action:
    name: "click"
    args:
      xpath: "/html/body/.../a"
      value: ""
```

Source: [lavague-server/lavague/server/driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py)

This shape is also the natural fit for the Chrome extension driver, which serializes available tools via Zod schemas exposing `name`, `description`, and typed `args`. Source: [extension_chrome/src/actionSchemas.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/actionSchemas.ts)
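
A structured action is easy to validate before anything touches the browser. The sketch below checks one decoded action mapping of the shape shown above; `Action`, `parse_action`, and the `ALLOWED_ACTIONS` set are hypothetical, and the set only lists action names mentioned in this manual rather than the project's real tool list.

```python
from dataclasses import dataclass

# Assumption: only the action names mentioned in this manual are allowed.
ALLOWED_ACTIONS = {"click", "type"}


@dataclass(frozen=True)
class Action:
    name: str
    xpath: str
    value: str = ""


def parse_action(raw: dict) -> Action:
    """Validate one {"action": {"name": ..., "args": {...}}} mapping."""
    body = raw.get("action")
    if not isinstance(body, dict):
        raise ValueError("missing 'action' mapping")
    name = body.get("name")
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action name: {name!r}")
    args = body.get("args") or {}
    xpath = args.get("xpath")
    if not isinstance(xpath, str) or not xpath.startswith("/"):
        raise ValueError("args.xpath must be an absolute XPath string")
    return Action(name=name, xpath=xpath, value=str(args.get("value", "")))


action = parse_action(
    {"action": {"name": "click", "args": {"xpath": "/html/body/a", "value": ""}}}
)

# An action name outside the allow-list is rejected instead of executed.
try:
    parse_action({"action": {"name": "eval", "args": {"xpath": "/a"}}})
    rejected = False
except ValueError:
    rejected = True
```

Because the executor only ever sees names from a closed set, every step can be logged and replayed, which is the observability argument made in issue #352.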

## Action Engine & Python Engine

The `ActionEngine` is the execution boundary. It takes the output of the `NavigationEngine` (generated code today, structured actions under the proposal above) and dispatches it to the appropriate `BaseDriver` implementation. The engine reuses the same primitives (`InteractionType`, `PossibleInteractionsByXpath`) defined on `BaseDriver`. Source: [lavague-core/lavague/core/base_driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-core/lavague/core/base_driver.py)

Issue #440 highlights a related refactor goal: **separate action generation from action execution**, so that execution happens in one observable place instead of inside each engine. This is useful for remote execution scenarios, such as the `DriverServer` in the `lavague-server` package, which bridges a remote browser to the agent through `AgentSession` over an asynchronous message channel. Source: [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)
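
One way to picture the split requested in issue #440 is a generator that only produces a serialized action list and an executor that only replays it. The sketch below is an assumption-laden illustration: `generate`, `execute`, and the JSON payload format are hypothetical and do not describe the project's current or planned API.

```python
import json
from typing import Callable, Dict, List


def generate(instruction: str) -> str:
    """Hypothetical generator: returns a JSON action list and runs nothing.

    The instruction is ignored here; a real generator would call an LLM.
    """
    actions = [{"name": "click", "args": {"xpath": "/html/body/a", "value": ""}}]
    return json.dumps(actions)


def execute(payload: str, handlers: Dict[str, Callable[[dict], None]]) -> List[str]:
    """Hypothetical executor: replays serialized actions and logs each one."""
    log = []
    for action in json.loads(payload):
        handlers[action["name"]](action["args"])
        log.append(f"{action['name']} {action['args']['xpath']}")
    return log


clicked: List[str] = []
payload = generate("Click on the first link")
audit_log = execute(payload, {"click": lambda args: clicked.append(args["xpath"])})
print(audit_log)
```

Since the payload is plain text, it can cross a process or network boundary before being executed, and the log gives a single place to audit what ran.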

The `PythonEngine` is a sibling engine inside the action layer. It is specialized in **calling tools** (Python-side functions, external APIs, etc.) and was carved out from the original monolithic `NavigationEngine` so that tool invocation and page navigation no longer share a code path. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)

## Orchestration, Logging & Drivers

The `WebAgent.run` / `WebAgent.run_step` methods are the public entry points. The remote session protocol maps cleanly to them:

| Channel command | Agent method            |
|-----------------|-------------------------|
| `run`           | `agent.run(args)`       |
| `run_step`      | `agent.run_step(args)`  |
| `get`           | `agent.get(args)`       |
| `prepare_run`   | `agent.prepare_run(args)` |

Source: [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)

Both `run` and `run_step` are wrapped in `exe_start_stop`, which emits `start` / `stop` messages around the call so consumers can render UI state. Logging is a recurring community ask (issue #241): users want per-step traces (objective, observations, instructions, generated code, driver URL, success/failure) persisted for offline review. Today, telemetry covers most of these fields automatically — including chain-of-thought, interaction bounding boxes, viewport, source HTML chunks, and token usage — and can be fully disabled via the `LAVAGUE_TELEMETRY=NONE` env var. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)
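
The command mapping and the start/stop wrapping can be sketched as follows. Everything here is hypothetical scaffolding (`FakeAgent`, `dispatch`, `exe_start_stop_sketch`, and the in-memory `events` list standing in for channel messages); it also assumes the `stop` message is sent even when the wrapped call raises.

```python
from typing import Any, Callable, Dict, List

events: List[str] = []  # stands in for messages sent back to the client


def exe_start_stop_sketch(handler: Callable[[Any], Any], args: Any) -> Any:
    """Emit "start", call the handler, and always emit "stop" afterwards."""
    events.append("start")
    try:
        return handler(args)
    finally:
        events.append("stop")


class FakeAgent:
    """Hypothetical agent exposing the four methods from the table above."""

    def run(self, args: Any) -> str:
        return f"ran: {args}"

    def run_step(self, args: Any) -> str:
        return f"stepped: {args}"

    def get(self, args: Any) -> str:
        return f"opened: {args}"

    def prepare_run(self, args: Any) -> str:
        return f"prepared: {args}"


def dispatch(agent: FakeAgent, command: str, args: Any) -> Any:
    handlers: Dict[str, Callable[[Any], Any]] = {
        "run": agent.run,
        "run_step": agent.run_step,
        "get": agent.get,
        "prepare_run": agent.prepare_run,
    }
    handler = handlers[command]
    # Only run and run_step are wrapped, as described in the text.
    if command in ("run", "run_step"):
        return exe_start_stop_sketch(handler, args)
    return handler(args)


agent = FakeAgent()
opened = dispatch(agent, "get", "https://huggingface.co/docs")
result = dispatch(agent, "run", "Go on the quicktour of PEFT")
print(result, events)
```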

Engine prompts and responses can also be cached through the optional `lavague-contexts-cache` package, which wraps an `LLM`, `MultiModalLLM`, or `Embedding` with a YAML-backed cache for deterministic replay and lower token spend. Source: [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md)

Driver support differs across backends: Selenium supports headless mode, iframes, multi-tab, and element highlighting; Playwright supports iframes and highlighting (headless mode and multi-tab are marked as coming soon); the Chrome extension driver supports multi-tab and highlighting but not iframes. Source: [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)

## See Also

- [WebAgent](web-agent.md) — top-level agent orchestrator
- [Drivers: Selenium, Playwright & Chrome Extension](drivers.md)
- [LaVague QA: Gherkin-driven test generation](lavague-qa.md)
- [Test Runner (`lavague-test`)](test-runner.md)

---

<a id='page-3'></a>

## Drivers, Contexts & Integrations Ecosystem

### Related Pages

Related topics: [Overview & System Architecture](#page-1), [Core Engines: World Model, Navigation, Action & Python Engines](#page-2), [Developer Tooling: LaVague QA, Test Runner, Server & Gradio](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)
- [lavague-tests/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)
- [extension_chrome/src/actionSchemas.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/actionSchemas.ts)
- [lavague-server/lavague/server/driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py)
- [extension_chrome/package.json](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/package.json)
- [lavague-qa/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md)
- [extension_chrome/src/tools.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts)
- [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md)
- [extension_chrome/src/app/component/Logs.tsx](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/app/component/Logs.tsx)
- [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)
</details>

# Drivers, Contexts & Integrations Ecosystem

## Overview

LaVague is structured around a modular ecosystem that decouples the **reasoning layer** (World Model + Action Engine) from the **execution layer** (Drivers) and the **configuration layer** (Contexts). The core philosophy is that swapping a browser backend, a model provider, or a deployment target should not require modifying agent logic.

As described in the [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md), LaVague ships with three Driver options — **Selenium**, **Playwright**, and a **Chrome Extension** — and exposes a Context system that lets users customize LLM choices, prompts, and token counters without touching the core agent loop. Community discussion around Playwright support ([issue #1](https://github.com/lavague-ai/LaVague/issues/1)) and fully-local models ([issue #272](https://github.com/lavague-ai/LaVague/issues/272)) both reflect the importance of this plug-in architecture.

## Drivers: The Execution Layer

Drivers translate abstract navigation instructions (XPath + action name) into actual browser commands. LaVague currently supports three backends, and the feature matrix in the [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md) summarizes their capabilities:

| Feature | Selenium | Playwright | Chrome Extension |
|---|---|---|---|
| Headless agents | ✅ | ⏳ (coming soon) | N/A |
| Handle iframes | ✅ | ✅ | ❌ |
| Open several tabs | ✅ | ⏳ | ✅ |
| Highlight elements | ✅ | ✅ | ✅ |

The Chrome Extension driver uses a TypeScript-based action schema system. In [extension_chrome/src/actionSchemas.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/actionSchemas.ts), Zod schemas define tools with `name`, `description`, and typed `args`, and the schema-to-description converter emits a textual catalog consumed by the World Model. A separate extraction helper in [extension_chrome/src/tools.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts) parses multi-line, numbered, and code-fenced instruction patterns from the LLM's response, which keeps the extension's instruction parser resilient to minor prompt variations.

For server-side deployments, [lavague-server/lavague/server/driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py) implements a `DriverServer` that wraps a `BaseDriver` and forwards commands such as `get_html`, `get_url`, and `get` to a remote session through a synchronous `send_command_and_get_response_sync` channel. This is the bridge that allows the LaVague server to drive a browser hosted in another process — relevant to issue #440's request to separate action generation from execution.

```mermaid
flowchart LR
    A[World Model] --> B[Action Engine]
    B --> C{Driver Backend}
    C -->|Selenium| D[Selenium WebDriver]
    C -->|Playwright| E[Playwright Runtime]
    C -->|Chrome Ext| F[Browser Extension]
    C -->|Server| G[DriverServer via Channel]
    G --> H[Remote Session]
```

## Contexts: The Configuration Layer

Contexts bundle together the model choices, prompts, and utilities that the World Model and Action Engine consume. They make the agent's models **swappable**: replacing GPT-4o with a local Phi-3 model, for example, is a context change rather than a code change.

The caching integration documented in [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md) introduces `LLMCache`, `MultiModalLLMCache`, and `EmbeddingCache` wrappers. These wrappers key cached responses on the prompt payload so that repeated runs over the same objective produce deterministic results, reduce API costs, and — critically for issue #272's fully-local use case — let developers replay a scenario offline once results have been captured.

```python
from lavague.contexts.cache import LLMCache
from llama_index.llms.openai import OpenAI

llm = LLMCache(yml_prompts_file="llm.yml", fallback=OpenAI(model="gpt-4o"))
```
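
The idea behind these wrappers, answering from a store when the exact prompt has been seen and calling the fallback model otherwise, can be illustrated without any LaVague dependency. `PromptCache` and `fake_llm` below are hypothetical and do not reproduce the `lavague-contexts-cache` API; the sketch keeps entries in memory, whereas the real package is described as YAML-backed.

```python
from typing import Callable, Dict, List


class PromptCache:
    """Hypothetical prompt-keyed cache with a fallback model."""

    def __init__(self, fallback: Callable[[str], str]) -> None:
        self.fallback = fallback
        self.store: Dict[str, str] = {}  # in memory only, for illustration

    def complete(self, prompt: str) -> str:
        # Call the fallback only for prompts that have not been seen before.
        if prompt not in self.store:
            self.store[prompt] = self.fallback(prompt)
        return self.store[prompt]


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a paid model call; numbers its answers to expose reuse."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


cache = PromptCache(fallback=fake_llm)
first = cache.complete("Click on PEFT")
second = cache.complete("Click on PEFT")
print(first, second, len(calls))
```

The second call returns the stored answer without reaching the fallback, which is what makes a captured run both deterministic and free to replay.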

Community interest in issue #241 (logging agent flow) ties into contexts: in principle, a logging context could be composed alongside a cache context so that the same run can be replayed and inspected without rerunning the LLM.

## Integrations: QA, Tests, Server, and Extension

LaVague's integrations live in sibling packages and share the core abstractions:

- **LaVague QA** ([lavague-qa/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md)): a CLI (`lavague-qa`) that turns Gherkin `.feature` files into pytest tests. It accepts `--url`, `--feature`, `--context`, and `--log-to-db` flags, allowing a QA engineer to point the same agent at any website using a pre-built context file.
- **LaVague Test Runner** ([lavague-tests/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)): a benchmarking CLI that walks a directory of `config.yml` task files, runs each task, and emits a pass/fail report with exit codes (`0` for success, `-1` for any failure). Each task supports its own `max_steps` and `n_attempts` overrides.
- **LaVague Server**: built around the `AgentSession` abstraction in [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py), it wraps a `WebAgent` and dispatches incoming socket messages to handler methods (`run`, `run_step`, `get`, etc.). The `exe_start_stop` helper emits a `start` event before invoking the agent and a `stop` event after, even on exception, so remote clients can always observe terminal state.
- **Chrome Extension**: the React/TypeScript UI in [extension_chrome/package.json](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/package.json) consumes Chakra UI and bundles a logging panel ([extension_chrome/src/app/component/Logs.tsx](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/app/component/Logs.tsx)) that labels commands like `get_url`, `get_html`, `execute_script`, and `is_visible`. The panel deduplicates consecutive identical logs by incrementing a `count` field, which goes some way toward issue #241's request for traceable agent flow.
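
The consecutive-duplicate counting used by the logging panel is simple to express. The following Python sketch mirrors the behavior described above; `append_log` and the `message`/`count` dictionary layout are hypothetical and are not taken from `Logs.tsx`.

```python
from typing import Dict, List


def append_log(logs: List[Dict], message: str) -> None:
    """Append a log entry, or bump `count` if it repeats the last entry."""
    if logs and logs[-1]["message"] == message:
        logs[-1]["count"] += 1
    else:
        logs.append({"message": message, "count": 1})


logs: List[Dict] = []
for msg in ["get_url", "get_url", "get_html", "get_url"]:
    append_log(logs, msg)

print(logs)
```

Only adjacent repeats are merged, so the final `get_url` starts a new entry and the order of events stays readable.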

## Common Failure Modes and Configuration Tips

1. **OpenAI-only defaults** — Out of the box, examples assume `OPENAI_API_KEY` is set ([README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)). For local models (issue #272), supply a custom LLM through a Context and pair it with the `MultiModalLLMCache` wrapper for deterministic replays.
2. **Action generation/execution coupling** — The `DriverServer` already isolates execution in a separate process, so deployments that need stronger isolation should route the WebAgent through the server channel rather than the in-process driver.
3. **Playwright feature gaps** — Headless mode and multi-tab handling are flagged as "coming soon" in the README; teams needing those today should use Selenium.
4. **Chrome Extension instruction parsing** — The parser in [extension_chrome/src/tools.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts) tries eight regex variants; prompts that deviate significantly from the World Model's expected format may fall through to a short match, so keep instruction formatting consistent.
5. **Telemetry** — Set `LAVAGUE_TELEMETRY=NONE` to disable anonymous usage reporting before running any objective that might include sensitive data.

## See Also

- [Quick Tour & Installation](https://docs.lavague.ai/en/latest/docs/get-started/quick-tour/)
- [Customization & Contexts](https://docs.lavague.ai/en/latest/docs/get-started/customization/)
- [Testing & Benchmarking](https://docs.lavague.ai/en/latest/docs/get-started/testing/)
- [LaVague QA Documentation](https://docs.lavague.ai/en/latest/docs/lavague-qa/quick-tour/)
- [Troubleshooting Guide](https://docs.lavague.ai/en/latest/docs/get-started/troubleshoot/)

---

<a id='page-4'></a>

## Developer Tooling: LaVague QA, Test Runner, Server & Gradio

### Related Pages

Related topics: [Overview & System Architecture](#page-1), [Core Engines: World Model, Navigation, Action & Python Engines](#page-2), [Drivers, Contexts & Integrations Ecosystem](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/lavague-ai/LaVague/blob/main/README.md)
- [lavague-tests/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)
- [lavague-qa/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md)
- [lavague-server/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/README.md)
- [lavague-server/lavague/server/driver.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py)
- [lavague-server/lavague/server/channel.py](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)
- [lavague-integrations/contexts/lavague-contexts-cache/README.md](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md)
- [extension_chrome/package.json](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/package.json)
- [extension_chrome/src/tools.ts](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts)
</details>

# Developer Tooling: LaVague QA, Test Runner, Server & Gradio

## Overview

Beyond the core `WebAgent` execution path, the LaVague repository ships a family of developer-facing tools that target distinct stages of the agent lifecycle: writing tests, running them at scale, exposing the agent over the network, and giving humans an interactive UI. The README positions these as "Key Features" alongside the core framework: built-in contexts, a test runner, a token counter, logging tools, a Gradio interface, debugging tools, and a Chrome extension ([README.md:60-69](https://github.com/lavague-ai/LaVague/blob/main/README.md)). This page covers the four tools most relevant to engineering workflows: `lavague-qa` (Gherkin-to-pytest generation), `lavague-test` (test runner), the Agent Server (WebSocket exposure), and the interactive surfaces (Gradio and the Chrome extension). A separate caching integration is also documented because it underpins deterministic development of all of the above.

## LaVague QA: Gherkin-to-pytest Generation

`lavague-qa` is a CLI for QA engineers that turns human-readable Gherkin `.feature` files into executable pytest files. The `README.md` in `lavague-qa/` enumerates the command-line options ([lavague-qa/README.md:7-19](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md)):

- `--url` and `--feature` point at the target site and the Gherkin spec.
- `--full-llm` enables LLM-driven pytest generation.
- `--context` injects a custom initialized context and token counter (defaults to OpenAI GPT-4o).
- `--headless` toggles headless browser mode.
- `--log-to-db` enables SQLite logging.

The intent, captured in the top-level README, is to "automate test writing by turning Gherkin specs into easy-to-integrate tests" and to make web testing "10x more efficient" ([README.md:23-27](https://github.com/lavague-ai/LaVague/blob/main/README.md)). A runnable example is provided:

```bash
lavague-qa --url https://amazon.fr/ --feature features/demo_amazon.feature
```

Running `lavague-qa` without arguments executes a default Wikipedia login example, which gives new users a working baseline before they write their own features ([lavague-qa/README.md:21-25](https://github.com/lavague-ai/LaVague/blob/main/lavague-qa/README.md)). Community interest in local/offline LLMs (e.g. Phi-3) expressed in [issue #272](https://github.com/lavague-ai/LaVague/issues/272) is particularly relevant here, because the default `--context` ships as OpenAI GPT-4o and swapping it for an OSS model is one of the supported extension points.
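When `lavague-qa` is invoked from a script or CI job, it can help to assemble the command line programmatically. The sketch below only builds the argument list from the options named above and does not run the CLI. The helper name `build_lavague_qa_command` is hypothetical, and it assumes that `--full-llm`, `--headless`, and `--log-to-db` are value-less switches and that `--context` takes a file path.

```python
import shlex
from typing import List, Optional


def build_lavague_qa_command(
    url: str,
    feature: str,
    full_llm: bool = False,
    context: Optional[str] = None,
    headless: bool = False,
    log_to_db: bool = False,
) -> List[str]:
    """Assemble an argv list for the `lavague-qa` CLI (nothing is executed)."""
    cmd = ["lavague-qa", "--url", url, "--feature", feature]
    if full_llm:
        cmd.append("--full-llm")
    if context is not None:
        # Assumption: --context points at a Python file defining the context.
        cmd += ["--context", context]
    if headless:
        cmd.append("--headless")
    if log_to_db:
        cmd.append("--log-to-db")
    return cmd


cmd = build_lavague_qa_command(
    "https://amazon.fr/", "features/demo_amazon.feature", headless=True
)
# Print a shell-safe rendering of the command for logging or copy-paste.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell-quoting issues with URLs and feature paths.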

## Test Runner: `lavague-test`

The `lavague-tests` package provides a benchmark-style test runner launched via the `lavague-test` command. It scans a directory (default `./lavague-tests/sites`) for per-site folders, each containing a `config.yml` that defines tasks, expected outcomes, and step limits ([lavague-tests/README.md:13-21](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)). The CLI accepts `--directory/-d`, `--site/-s` (repeatable), and a `--display` flag to keep the browser visible during a run. Each task in `config.yml` supports a `name`, `max_steps`, `n_attempts`, an optional `user_data` map, plus a list of `expect` assertions against `URL`, `Status`, and `HTML` substrings ([lavague-tests/README.md:25-65](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)). A typical task block looks like:

```yaml
tasks:
  - name: HuggingFace navigation
    url: https://huggingface.co/docs
    prompt: Go on the quicktour of PEFT
    expect:
      - URL is https://huggingface.co/docs/peft/quicktour
      - Status is success
      - HTML contains PEFT offers parameter-efficient methods for finetuning large pretrained models
```

The runner prints a `[o]`/`[x]` report per assertion and returns exit code `0` only if every assertion passes ([lavague-tests/README.md:69-87](https://github.com/lavague-ai/LaVague/blob/main/lavague-tests/README.md)). This makes it suitable for CI gating of agent regressions, complementing the QA authoring flow.
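To illustrate how `expect` lines map onto a pass/fail report, here is a minimal sketch of an assertion checker. It is not the `lavague-tests` implementation: `RunResult`, `check_expectation`, and `report` are hypothetical names, and only the `is` and `contains` operators that appear in the example above are handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RunResult:
    """Hypothetical snapshot of a finished agent run (not a lavague-tests class)."""
    url: str
    status: str
    html: str


def check_expectation(expectation: str, result: RunResult) -> bool:
    """Evaluate one line of the form '<field> <operator> <value>'."""
    field, operator, value = expectation.split(" ", 2)
    fields = {"url": result.url, "status": result.status, "html": result.html}
    actual = fields[field.lower()]
    if operator == "is":
        return actual == value
    if operator == "contains":
        return value in actual
    raise ValueError(f"Unsupported operator: {operator!r}")


def report(expectations: List[str], result: RunResult) -> Tuple[List[str], int]:
    """Return the report lines and an exit code (0 only if all checks pass)."""
    lines = []
    all_ok = True
    for expectation in expectations:
        ok = check_expectation(expectation, result)
        all_ok = all_ok and ok
        lines.append(f"[{'o' if ok else 'x'}] {expectation}")
    return lines, 0 if all_ok else 1


result = RunResult(
    url="https://huggingface.co/docs/peft/quicktour",
    status="success",
    html="<p>PEFT offers parameter-efficient methods for finetuning.</p>",
)
lines, exit_code = report(
    [
        "URL is https://huggingface.co/docs/peft/quicktour",
        "Status is success",
        "HTML contains PEFT offers parameter-efficient methods",
    ],
    result,
)
print("\n".join(lines))
```

Returning the exit code alongside the report keeps the CI contract explicit: one failed assertion is enough to make the whole run non-zero.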

## Agent Server

The `lavague-server` package exposes a `WebAgent` over WebSockets, allowing a browser extension or remote UI to drive the agent without re-implementing the framework. The minimal setup wires a `DriverServer` (which implements the `BaseDriver` interface and forwards each command to the remote browser session) into an `ActionEngine` and `WebAgent`, then starts the server ([lavague-server/README.md:8-22](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/README.md)). Internally, `AgentSession.exe_start_stop` runs the user-supplied callable while emitting `start`/`stop` JSON messages over the channel, and `handle_prompt_agent_action` dispatches `run`, `run_step`, `get`, and `prepare_run` events to the agent ([lavague-server/lavague/server/channel.py:24-46](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/channel.py)). The driver side delegates `get_html`, `get_url`, and other primitives to the remote session via `send_command_and_get_response_sync`, returning the current page state for the World Model to consume ([lavague-server/lavague/server/driver.py:60-110](https://github.com/lavague-ai/LaVague/blob/main/lavague-server/lavague/server/driver.py)). The architecture is summarised below.

```mermaid
flowchart LR
    UI[Browser / Gradio / Chrome Extension] -- WebSocket --> Server[AgentServer]
    Server --> Session[AgentSession]
    Session --> Agent[WebAgent]
    Agent --> WM[WorldModel]
    Agent --> AE[ActionEngine]
    AE --> Driver[DriverServer]
    Driver -- WebSocket --> BrowserDriver[Remote Browser]
```

This separation is in line with what community [issue #440](https://github.com/lavague-ai/LaVague/issues/440) ("Separate action execution from action generation") and [issue #352](https://github.com/lavague-ai/LaVague/issues/352) (Navigation Engine emitting XPath instead of arbitrary code) ask for: because the server sits between generation and execution, it is a natural place to add observability and security checks.
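The start/stop wrapping and event dispatch described for `AgentSession` can be sketched in isolation. The classes below (`FakeChannel`, `SketchSession`, `RecordingAgent`) are hypothetical stand-ins rather than `lavague-server` code. The `{"type": ...}` message shape is an assumption, and every agent method is simplified to take a single string argument.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


class FakeChannel:
    """Stand-in for a WebSocket: records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class RecordingAgent:
    """Hypothetical agent stub that records which method was called."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def _record(self, name: str, arg: str) -> str:
        self.calls.append((name, arg))
        return name

    def run(self, arg: str) -> str:
        return self._record("run", arg)

    def run_step(self, arg: str) -> str:
        return self._record("run_step", arg)

    def get(self, arg: str) -> str:
        return self._record("get", arg)

    def prepare_run(self, arg: str) -> str:
        return self._record("prepare_run", arg)


class SketchSession:
    """Mirrors the start/stop pattern; not the real AgentSession."""

    def __init__(self, channel: FakeChannel, agent: RecordingAgent) -> None:
        self.channel = channel
        self.agent = agent

    def exe_start_stop(self, fn: Callable[[], Any]) -> Any:
        # Notify the client before and after the call, even if it raises.
        self.channel.send(json.dumps({"type": "start"}))
        try:
            return fn()
        finally:
            self.channel.send(json.dumps({"type": "stop"}))

    def handle_action(self, action: str, args: str) -> Any:
        # Map the four event names from the text onto agent methods.
        handlers: Dict[str, Callable[[str], Any]] = {
            "run": self.agent.run,
            "run_step": self.agent.run_step,
            "get": self.agent.get,
            "prepare_run": self.agent.prepare_run,
        }
        if action not in handlers:
            raise ValueError(f"Unknown action: {action!r}")
        return self.exe_start_stop(lambda: handlers[action](args))


channel = FakeChannel()
agent = RecordingAgent()
session = SketchSession(channel, agent)
session.handle_action("get", "https://huggingface.co/docs")
print(channel.sent)
```

Sending `stop` from a `finally` clause means the client is notified even when the agent call fails, so a UI built on this pattern would not be left waiting on a run that has already ended.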

## Interactive Surfaces: Gradio and the Chrome Extension

Two consumer-facing UIs are bundled. The Gradio demo is launched with a single call: `agent.demo("Go on the quicktour of PEFT")` ([README.md:113-121](https://github.com/lavague-ai/LaVague/blob/main/README.md)). The Chrome extension (`extension_chrome`) is a Webpack-bundled React/Chakra UI app whose dependencies and build scripts are declared in `package.json` ([extension_chrome/package.json:1-50](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/package.json)). On the data plane, `tools.ts` parses the World Model's raw output: `extractWorldModelInstruction` applies a sequence of regexes to pull the canonical `Instruction:` block out of formats such as hyphenated lists, numbered lists, fenced code blocks, or single-line strings ([extension_chrome/src/tools.ts:1-40](https://github.com/lavague-ai/LaVague/blob/main/extension_chrome/src/tools.ts)). The extension is also a driver target: the README's "Supported Drivers" table shows that the Chrome extension driver supports multi-tab navigation and element highlighting but does not support iframes or headless mode ([README.md:75-90](https://github.com/lavague-ai/LaVague/blob/main/README.md)). The Gradio path, by contrast, is headless-friendly when paired with Selenium.
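To make the instruction-extraction step concrete, here is a Python analogue of the "try several regexes in order" approach. It is a sketch only: the extension's own patterns are written in TypeScript and may differ, and `extract_instruction` and `_PATTERNS` are hypothetical names.

```python
import re
from typing import Optional

# Patterns are tried in order; each captures the text after "Instruction:".
_PATTERNS = [
    # A fenced code block directly after the label.
    re.compile(r"Instruction:\s*```[^\n]*\n(.*?)```", re.DOTALL),
    # Otherwise, everything up to the next blank line or the end of the text.
    # This covers hyphenated lists, numbered lists, and single-line strings.
    re.compile(r"Instruction:\s*(.+?)(?:\n\s*\n|\Z)", re.DOTALL),
]


def extract_instruction(text: str) -> Optional[str]:
    """Return the instruction block, or None if no label is present."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return None


single_line = "Thoughts:\n- The page is loaded\nInstruction: Click on 'PEFT'"
hyphen_list = "Instruction:\n- Click on the search bar\n- Type 'PEFT'\n\nNotes: none"
fenced = "Instruction:\n```\nClick on 'Quicktour'\n```"

for sample in (single_line, hyphen_list, fenced):
    print(repr(extract_instruction(sample)))
```

Ordering the patterns from most to least specific matters here: the fenced-block pattern has to run first, otherwise the generic pattern would return the fence markers as part of the instruction.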

## Caching Layer for Deterministic Tooling

The `lavague-contexts-cache` integration is a thin wrapper around `OpenAI`, `OpenAIMultiModal`, and embedding models that records prompts and responses to YAML, so that later runs can replay the recorded responses deterministically and spend fewer tokens during agent iteration ([lavague-integrations/contexts/lavague-contexts-cache/README.md:1-12](https://github.com/lavague-ai/LaVague/blob/main/lavague-integrations/contexts/lavague-contexts-cache/README.md)). It exposes `LLMCache`, `MultiModalLLMCache`, and `EmbeddingCache` wrappers that take a `yml_prompts_file` and a `fallback` model. This is particularly useful in combination with the test runner: a fixed cache file plus a fixed task list should produce reproducible results, which is a prerequisite for the kind of experiment logging described in [issue #241](https://github.com/lavague-ai/LaVague/issues/241).
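The record-and-replay idea behind these wrappers can be shown with a small standalone sketch. This is not the integration's code: `PromptCache` is a hypothetical class, the fallback is a plain callable rather than an LLM object, and JSON is swapped in for YAML so that the example needs only the standard library.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


class PromptCache:
    """Hypothetical record/replay cache; JSON stands in for YAML here."""

    def __init__(self, path: Path, fallback: Callable[[str], str]) -> None:
        self.path = path
        self.fallback = fallback
        # Load previously recorded prompt/response pairs, if any.
        self.entries: Dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def complete(self, prompt: str) -> str:
        if prompt in self.entries:
            return self.entries[prompt]  # replay: the model is not called
        response = self.fallback(prompt)  # cache miss: ask the fallback
        self.entries[prompt] = response
        self.path.write_text(json.dumps(self.entries, indent=2))
        return response


calls: List[str] = []


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; records each invocation."""
    calls.append(prompt)
    return f"response to {prompt}"


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "prompts.json"
    first = PromptCache(cache_file, fake_model).complete("Go on the quicktour of PEFT")
    # A fresh cache object reloads the file, so the model is not called again.
    second = PromptCache(cache_file, fake_model).complete("Go on the quicktour of PEFT")

print(first == second, len(calls))
```

Keying the cache on the exact prompt string means any change to the prompt template invalidates the recording, which is usually the desired behaviour when a cache is used to pin test results.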

## See Also

- [LaVague Core Architecture (World Model & Action Engine)](#page-2)
- [Customisation & Built-in Contexts](#page-3)
- Token Usage & Cost Estimation
- [Driver Support Matrix (Selenium, Playwright, Chrome Extension)](#page-3)

---


## Pitfall Log

Project: lavague-ai/LaVague

Summary: Found 14 structured pitfall items, including 4 of high/blocking severity. Top priority: item 1 (Configuration risk), which requires verification.

## 1. Configuration risk - Configuration risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/642

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/333

## 3. Runtime risk - Runtime risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/609

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/563

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/650

## 6. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/640

## 7. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/lavague-ai/LaVague

## 8. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/641

## 9. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/lavague-ai/LaVague

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: `no_demo`
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/lavague-ai/LaVague

## 11. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: `no_demo`
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/lavague-ai/LaVague

## 12. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/lavague-ai/LaVague/issues/648

## 13. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/lavague-ai/LaVague

## 14. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/lavague-ai/LaVague

