# https://github.com/microsoft/agent-lightning Project Manual

Generated on: 2026-05-21 09:13:59 UTC

## Table of Contents

- [Introduction to Agent Lightning](#introduction)
- [Installation Guide](#installation)
- [System Architecture](#architecture)
- [Core Abstractions and Data Models](#core_abstractions)
- [Tutorial: Train Your First Agent](#train-first-agent)
- [Tutorial: Writing Agents](#write-agents)
- [Trainer Component](#trainer)
- [Runner Component](#runner)
- [LightningStore](#store)
- [Algorithm Zoo](#algorithm-zoo)

<a id='introduction'></a>

## Introduction to Agent Lightning

### Related Pages

Related topics: [System Architecture](#architecture), [Installation Guide](#installation)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/microsoft/agent-lightning/blob/main/README.md)
- [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)
- [AGENTS.md](https://github.com/microsoft/agent-lightning/blob/main/AGENTS.md)
- [examples/minimal/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/README.md)
- [examples/claude_code/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/claude_code/README.md)
- [agentlightning/types/core.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)
- [agentlightning/runner/base.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py)
- [agentlightning/semconv.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/semconv.py)
- [agentlightning/cli/__init__.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/cli/__init__.py)
- [contrib/recipes/agentos/README.md](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/agentos/README.md)
- [dashboard/README.md](https://github.com/microsoft/agent-lightning/blob/main/dashboard/README.md)
- [mkdocs.yml](https://github.com/microsoft/agent-lightning/blob/main/mkdocs.yml)
</details>

Agent Lightning is a reinforcement learning framework designed to train any AI agent with RL algorithms. The project provides a unified execution stack, instrumentation capabilities, and training infrastructure that enables researchers and developers to improve agent behavior through reward-based learning. Source: [README.md:1](https://github.com/microsoft/agent-lightning/blob/main/README.md)

## What is Agent Lightning?

Agent Lightning bridges the gap between raw agent execution and RL-based training by providing:

- **Instrumentation Layer**: Transparent tracing and logging of agent interactions
- **Training Infrastructure**: Built-in support for RL algorithms like GRPO
- **Distributed Execution**: Multi-worker rollout management with state synchronization
- **Integration Points**: Adapters for popular agent frameworks and execution environments

The framework treats agent training as a continuous feedback loop where traces collected from agent execution are consumed by training algorithms to improve policy behavior over time. Source: [CLAUDE.md:3](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Architecture Overview

The Agent Lightning architecture follows a producer-consumer pattern centered around trace collection and consumption.

### Core Loop

```mermaid
graph TD
    A[Runner] -->|emits spans| B[Tracers]
    B -->|writes traces| C[LightningStore]
    C -->|serves traces| D[Algorithms]
    D -->|updates policy| A
    C -->|serves traces| E[Dashboard]
```

The continuous execution loop works as follows:

1. **Runners** execute agents and emit execution spans
2. **Tracers** capture and format these spans with semantic conventions
3. **LightningStore** maintains synchronized state across all components
4. **Algorithms** consume traces to compute rewards and update agent policies
5. **Dashboard** provides real-time visualization for debugging

Source: [CLAUDE.md:3](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)
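
The loop above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Agent Lightning code: `ToyStore`, `run_agent`, `update_policy`, and `training_loop` are hypothetical names, the "policy" is a single number, and the reward simply favors actions close to 1.0.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for a trace store: a buffer of recorded traces."""

    traces: list[dict] = field(default_factory=list)

    def write(self, trace: dict) -> None:
        self.traces.append(trace)

    def drain(self) -> list[dict]:
        """Return every trace collected so far and clear the buffer."""
        batch, self.traces = self.traces, []
        return batch


def run_agent(policy: float, rng: random.Random) -> dict:
    """Runner and tracer step: act with exploration noise and record the outcome."""
    action = policy + rng.uniform(-0.5, 0.5)
    reward = -abs(action - 1.0)  # toy reward: the best possible action is 1.0
    return {"action": action, "reward": reward}


def update_policy(policy: float, batch: list[dict]) -> float:
    """Algorithm step: move the policy halfway toward the best-rewarded action."""
    best = max(batch, key=lambda trace: trace["reward"])
    return policy + 0.5 * (best["action"] - policy)


def training_loop(iterations: int = 20, rollouts_per_iter: int = 8, seed: int = 0) -> float:
    rng = random.Random(seed)
    store = ToyStore()
    policy = 0.0
    for _ in range(iterations):
        for _ in range(rollouts_per_iter):
            store.write(run_agent(policy, rng))  # runners emit, the store collects
        policy = update_policy(policy, store.drain())  # the algorithm consumes traces
    return policy


if __name__ == "__main__":
    print(f"final policy: {training_loop():.3f}")
```

The point of the sketch is the shape of the feedback loop: execution only writes to the store, and the training step only reads from it, so the two sides can be scaled or replaced independently.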

### Component Hierarchy

| Layer | Components | Responsibility |
|-------|-----------|----------------|
| Execution | `Runner`, `LitAgent` | Execute agent logic and manage lifecycle |
| Instrumentation | `Tracer`, `OtelTracer`, `AgentOpsTracer` | Capture execution traces |
| Storage | `LightningStore`, `LightningStoreClient` | Synchronized state management |
| Training | Algorithms in `agentlightning/algorithm/` | Process traces, compute rewards |
| CLI | `agl` command | User-facing interface |

Source: [agentlightning/cli/__init__.py:13-16](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/cli/__init__.py)

## Core Data Models

The framework defines several fundamental data structures in `agentlightning/types/core.py`.

### Task and Rollout

```mermaid
classDiagram
    class Task {
        +str task_id
        +Any input
        +Optional~str~ instance_id
        +Optional~str~ dataset
    }
    class Rollout {
        +str rollout_id
        +str status
        +Optional~str~ worker_id
        +List~Attempt~ attempts
    }
    class Attempt {
        +str attempt_id
        +str status
        +List~Span~ spans
        +Optional~float~ reward
    }
    class Triplet {
        +Any prompt
        +Any response
        +Optional reward
    }
    
    Task "1" --> "*" Rollout
    Rollout "1" --> "*" Attempt
    Attempt --> Triplet
```

### Core Type Exports

| Type | Purpose |
|------|---------|
| `Task` | Represents a unit of work to be executed by an agent |
| `Rollout` | Collection of attempts for a single task execution |
| `Attempt` | Single execution attempt with spans and reward |
| `Triplet` | Prompt-response-reward tuple for RL training |
| `LightningStore` | Synchronized state store for distributed execution |

Source: [agentlightning/types/core.py:1-60](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)
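
To make the relationships in the diagram concrete, the following sketch mirrors them with plain dataclasses. It is an illustration only: these classes are not the definitions in `agentlightning/types/core.py`, the field sets are reduced to those shown in the diagram, and `latest_reward` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Triplet:
    """Prompt-response-reward record, as used for RL training data."""

    prompt: Any
    response: Any
    reward: Optional[float] = None


@dataclass
class Attempt:
    """One execution attempt; the reward stays None until one is recorded."""

    attempt_id: str
    status: str = "pending"
    triplets: list[Triplet] = field(default_factory=list)
    reward: Optional[float] = None


@dataclass
class Rollout:
    """All attempts made for one task execution."""

    rollout_id: str
    status: str = "pending"
    attempts: list[Attempt] = field(default_factory=list)

    def latest_reward(self) -> Optional[float]:
        """Reward of the most recent attempt that has one, if any."""
        for attempt in reversed(self.attempts):
            if attempt.reward is not None:
                return attempt.reward
        return None


# A rollout whose first attempt failed without a reward and whose retry scored 1.0.
rollout = Rollout(
    rollout_id="ro-1",
    attempts=[
        Attempt(attempt_id="at-1", status="failed"),
        Attempt(
            attempt_id="at-2",
            status="succeeded",
            triplets=[Triplet(prompt="2+2?", response="4", reward=1.0)],
            reward=1.0,
        ),
    ],
)
print(rollout.latest_reward())
```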

## Runner System

The runner system provides the execution context for agents with integrated lifecycle management.

### Runner Lifecycle

```mermaid
sequenceDiagram
    participant User
    participant Runner
    participant Store
    participant Agent
    
    User->>Runner: async with Runner(agent, store)
    Runner->>Runner: init(agent)
    Runner->>Runner: init_worker(store)
    Runner->>Store: Register worker
    loop Until event
        Runner->>Agent: Execute task
        Agent-->>Runner: Result
        Runner->>Store: Update state
    end
    Runner->>Runner: teardown_worker()
    Runner->>Runner: teardown()
```

### Runner Base Class

The `Runner` class provides context manager support for safe initialization and cleanup:

```python
async with runner:
    runner.init(agent=agent, hooks=hooks)
    runner.init_worker(worker_id=0, store=store)
    # Execute tasks...
```

Key runner responsibilities:
- **Initialization**: Set up agent and worker state
- **Execution**: Poll store for tasks and execute them
- **Cleanup**: Graceful teardown of worker and agent resources

Source: [agentlightning/runner/base.py:1-80](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py)
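
The ordering in the lifecycle diagram (set up agent, set up worker, run, then tear down in reverse) can be sketched with a standard-library context manager. `ToyRunner` and `lifecycle` are hypothetical names used only for illustration; they record the call order and are not the `Runner` class from `agentlightning/runner/base.py`.

```python
from contextlib import contextmanager
from typing import Iterator


class ToyRunner:
    """Hypothetical runner that only records the order of lifecycle calls."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def init(self, agent: object) -> None:
        self.events.append("init")

    def init_worker(self, worker_id: int, store: object) -> None:
        self.events.append(f"init_worker:{worker_id}")

    def step(self, task: str) -> None:
        self.events.append(f"step:{task}")

    def teardown_worker(self, worker_id: int) -> None:
        self.events.append(f"teardown_worker:{worker_id}")

    def teardown(self) -> None:
        self.events.append("teardown")


@contextmanager
def lifecycle(
    runner: ToyRunner, agent: object, store: object, worker_id: int = 0
) -> Iterator[ToyRunner]:
    """Set up in order and tear down in reverse order, even if a step raises."""
    runner.init(agent)
    try:
        runner.init_worker(worker_id, store)
        try:
            yield runner
        finally:
            runner.teardown_worker(worker_id)
    finally:
        runner.teardown()


runner = ToyRunner()
with lifecycle(runner, agent=object(), store=object()) as active:
    active.step("task-1")
print(runner.events)
```

Nesting the two `try`/`finally` blocks is the design choice worth noting: worker cleanup runs only if worker setup was attempted, and agent cleanup always runs last.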

## Tracing and Instrumentation

Agent Lightning provides multiple tracing backends for capturing agent execution.

### Supported Tracers

| Tracer | Use Case | Backend |
|--------|----------|---------|
| `OtelTracer` | OpenTelemetry-compatible tracing | OTLP endpoint |
| `AgentOpsTracer` | AgentOps platform integration | AgentOps service |
| Custom Tracer | Framework integration | Pluggable |

### Semantic Conventions

The framework defines semantic conventions in `agentlightning/semconv.py` for consistent span attributes:

| Attribute | Description |
|-----------|-------------|
| `LightningSpanAttributes.REWARD` | Reward values for RL spans |
| `LightningSpanAttributes.LINK` | Span linking relationships |
| `LightningSpanAttributes.TAG` | Custom span tagging |
| `LightningResourceAttributes.ROLLOUT_ID` | Rollout identification |
| `LightningResourceAttributes.ATTEMPT_ID` | Attempt identification |

Source: [agentlightning/semconv.py:1-40](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/semconv.py)

### Trace Writing Example

The minimal examples demonstrate trace writing with `LightningStore`:

```python
from agentlightning import (
    AgentOpsTracer,
    InMemoryLightningStore,
    LightningStoreClient,
    OtelTracer,
    Span,
)

# Write traces directly to an in-memory store
store = InMemoryLightningStore()
tracer = OtelTracer(store=store)

# Or connect to a server-side store (started separately, e.g. with `agl store`)
client = LightningStoreClient("http://localhost:45993")
```

Source: [examples/minimal/write_traces.py:1-50](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

## LightningStore

`LightningStore` is the central state management component that keeps all components synchronized.

### Store Capabilities

```mermaid
graph LR
    A[Runners] -->|enqueue/dequeue| B[Rollouts]
    A -->|register| C[Workers]
    D[Tracers] -->|write spans| B
    E[Algorithms] -->|query traces| B
    F[Dashboard] -->|inspect state| B
```

### Store Collections

| Collection | Data Type | Access Pattern |
|------------|-----------|----------------|
| `rollouts` | `Rollout` | Enqueue/dequeue by worker |
| `attempts` | `Attempt` | Link to rollout |
| `spans` | `Span` | Query by attempt |
| `workers` | `Worker` | Heartbeat management |
| `resources` | `ResourcesUpdate` | Model/prompt versioning |

Source: [dashboard/test-utils/python-server.py:1-100](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
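
The access patterns in the table can be illustrated with a small in-memory structure: a FIFO queue for rollouts, an assignment map for workers, and spans indexed by rollout and attempt. This is a hedged sketch; `ToyCollectionsStore` and its method names are hypothetical and do not reflect the actual `LightningStore` interface.

```python
from collections import defaultdict, deque
from typing import Optional


class ToyCollectionsStore:
    """Hypothetical in-memory store showing the access patterns only."""

    def __init__(self) -> None:
        self._queue: deque[str] = deque()  # pending rollout ids, FIFO
        self._assigned: dict[str, str] = {}  # rollout_id -> worker_id
        self._spans: dict[tuple[str, str], list[dict]] = defaultdict(list)

    def enqueue_rollout(self, rollout_id: str) -> None:
        self._queue.append(rollout_id)

    def dequeue_rollout(self, worker_id: str) -> Optional[str]:
        """Hand the oldest pending rollout to a worker, or None if idle."""
        if not self._queue:
            return None
        rollout_id = self._queue.popleft()
        self._assigned[rollout_id] = worker_id
        return rollout_id

    def worker_for(self, rollout_id: str) -> Optional[str]:
        return self._assigned.get(rollout_id)

    def add_span(self, rollout_id: str, attempt_id: str, span: dict) -> None:
        self._spans[(rollout_id, attempt_id)].append(span)

    def query_spans(self, rollout_id: str, attempt_id: str) -> list[dict]:
        """Return a copy so callers cannot mutate the stored spans list."""
        return list(self._spans.get((rollout_id, attempt_id), []))


store = ToyCollectionsStore()
store.enqueue_rollout("ro-1")
store.enqueue_rollout("ro-2")
claimed = store.dequeue_rollout(worker_id="worker-0")
store.add_span(claimed, "at-1", {"name": "llm_call"})
print(claimed, store.worker_for(claimed), store.query_spans(claimed, "at-1"))
```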

## Training Algorithms

Agent Lightning integrates with reinforcement learning algorithms to improve agent behavior.

### Algorithm Integration

The framework supports pluggable algorithms defined in `agentlightning/algorithm/`. Algorithms consume traces from the LightningStore and compute policy updates.

### Agent-OS Integration

For safety-critical production deployments, Agent Lightning integrates with Agent-OS:

```python
from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward

runner = AgentOSRunner(kernel, fail_on_violation=False, emit_violations=True)
reward_fn = PolicyReward(kernel)
```

This integration provides:
- **Policy enforcement**: Kernel-level safety during training
- **Violation penalties**: Unsafe actions convert to negative RL rewards
- **Audit trail**: Complete visibility from training to production

Source: [contrib/recipes/agentos/README.md:1-60](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/agentos/README.md)
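
The idea of turning violations into negative rewards can be sketched as a simple reward-shaping function. This is not the `PolicyReward` implementation: `Violation`, its `severity` scale, `shaped_reward`, and `penalty_scale` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Hypothetical record of one policy violation observed during a rollout."""

    rule: str
    severity: float  # assumed scale: 0.0 (minor) to 1.0 (critical)


def shaped_reward(
    task_reward: float, violations: list[Violation], penalty_scale: float = 1.0
) -> float:
    """Subtract a severity-weighted penalty so unsafe rollouts score lower."""
    penalty = penalty_scale * sum(v.severity for v in violations)
    return task_reward - penalty


safe = shaped_reward(1.0, [])
unsafe = shaped_reward(
    1.0,
    [Violation("no_network_access", 0.5), Violation("no_file_delete", 1.0)],
)
print(safe, unsafe)
```

With this shaping, a rollout that completes the task but breaks rules can end up with a lower reward than one that fails safely, which is the behavior the violation-penalty bullet describes.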

## Minimal Component Showcase

The `examples/minimal/` directory provides isolated demonstrations of individual building blocks.

### Available Examples

| Component | File | Demonstrates |
|-----------|------|--------------|
| LightningStore + OTLP | `write_traces.py` | `OtelTracer`, `AgentOpsTracer`, rollout/span emission |
| MultiMetrics backend | `write_metrics.py` | Console and Prometheus metrics simultaneously |
| LLM proxying | `llm_proxy.py` | Request routing through `/rollout/<id>/attempt/<id>` |
| vLLM lifecycle | `vllm_server.py` | Server startup, readiness monitoring, teardown |

Each example documents its own CLI arguments and environment variables in its module docstring.

Source: [examples/minimal/README.md:1-30](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/README.md)
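
The `/rollout/<id>/attempt/<id>` routing used by the LLM proxy example attributes each request to a rollout and attempt through the URL path. The sketch below shows one way such a prefix could be parsed. It is an assumption-laden illustration: `parse_route` is a hypothetical helper, and the handling of whatever follows the prefix is a guess rather than the proxy's actual behavior.

```python
import re
from typing import Optional

# Matches "/rollout/<rollout_id>/attempt/<attempt_id>" plus an optional remainder.
_ROUTE = re.compile(
    r"^/rollout/(?P<rollout_id>[^/]+)/attempt/(?P<attempt_id>[^/]+)(?P<rest>/.*)?$"
)


def parse_route(path: str) -> Optional[tuple[str, str, str]]:
    """Return (rollout_id, attempt_id, remaining_path), or None if no prefix."""
    match = _ROUTE.match(path)
    if match is None:
        return None
    return match["rollout_id"], match["attempt_id"], match["rest"] or "/"


print(parse_route("/rollout/ro-1/attempt/at-2/v1/chat/completions"))
print(parse_route("/v1/chat/completions"))
```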

## Command-Line Interface

The `agl` CLI provides entry points for all major framework operations.

### Available Subcommands

| Command | Module | Description |
|---------|--------|-------------|
| `agl vllm` | `agentlightning.cli.vllm` | vLLM server with instrumentation |
| `agl store` | `agentlightning.cli.store` | LightningStore server |
| `agl prometheus` | `agentlightning.cli.prometheus` | Prometheus metrics endpoint |
| `agl agentops` | `agentlightning.cli.agentops_server` | AgentOps server manager |

### Starting a LightningStore Server

```bash
agl store --port 45993 --log-level DEBUG
```

The store server enables distributed execution where multiple workers can connect and synchronize state.

Source: [agentlightning/cli/__init__.py:1-35](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/cli/__init__.py)

## Dashboard

The Agent Lightning Dashboard is a React-based web application for inspecting store state and debugging experiments.

### Features

- **Real-time state inspection**: View rollouts, attempts, and spans
- **Worker monitoring**: Track worker status and heartbeat statistics
- **Resource visualization**: Inspect model configurations and prompts
- **Experiment debugging**: Analyze trace sequences and reward flows

### Technology Stack

| Layer | Technology |
|-------|------------|
| Framework | React |
| UI Components | Mantine UI |
| Documentation | Storybook |
| Testing | Vitest |

Source: [dashboard/README.md:1-35](https://github.com/microsoft/agent-lightning/blob/main/dashboard/README.md)

## Project Structure

```
agent-lightning/
├── agentlightning/          # Core library
│   ├── algorithm/           # RL training algorithms
│   ├── cli/                # Command-line interface
│   ├── contrib/            # Third-party integrations
│   ├── runner/             # Execution runners
│   ├── store/              # LightningStore implementations
│   ├── tracer/             # Tracing backends
│   ├── types/              # Data models
│   └── semconv.py          # Semantic conventions
├── contrib/
│   └── recipes/            # Integration examples (webshop, agentos)
├── dashboard/              # React web application
├── docs/                   # Documentation (mkdocs)
├── examples/               # Runnable workflows
├── scripts/                # Automation scripts
└── tests/                  # Test suite
```

Source: [CLAUDE.md:5-15](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Development Workflow

### Setup

```bash
uv sync --group dev
```

### Testing

```bash
# Full test suite
uv run --no-sync pytest -v

# Specific tests
uv run --no-sync pytest -v tests/path/to/test.py
uv run --no-sync pytest -v -k "test_pattern"
```

### Type Checking

```bash
uv run --no-sync pyright
```

### Pre-commit Checks

```bash
uv run --no-sync pre-commit run --all-files --show-diff-on-failure
```

### Documentation

```bash
uv run --no-sync mkdocs build --strict
```

Source: [CLAUDE.md:18-30](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Contributing

Agent Lightning welcomes contributions through a structured process:

1. **Branch naming**: `feature/<slug>`, `fix/<slug>`, `docs/<slug>`, or `chore/<slug>`
2. **Commits**: Imperative, scoped commits with issue references (e.g., `Fixes #123`)
3. **Pre-submission**: Run pre-commit hooks and relevant pytest/doc builds
4. **CLA**: Contributor License Agreement required (automatically prompted by CLA bot)

Source: [README.md:50-70](https://github.com/microsoft/agent-lightning/blob/main/README.md)

## Citation

If you use Agent Lightning in research, please cite:

```bibtex
@misc{luo2025agentlightningtrainai,
      title={Agent Lightning: Train ANY AI Agents with Reinforcement Learning},
      author={Xufang Luo and Yuge Zhang and Zhiyuan He and Zilong Wang and Siyun Zhao and Dongsheng Li and Luna K. Qiu and Yuqing Yang},
      year={2025},
      eprint={2508.03680},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.03680},
}
```

Source: [README.md:15-25](https://github.com/microsoft/agent-lightning/blob/main/README.md)

## Further Reading

- [Minimal Examples Guide](../examples/minimal/) - Hands-on with individual components
- [Claude Code Integration](../examples/claude_code/) - Example: Training with SWE-bench
- [Agent-OS Integration](../contrib/recipes/agentos/) - Safety-critical training
- [API Reference](../api/) - Detailed type and function documentation

---

<a id='installation'></a>

## Installation Guide

### Related Pages

Related topics: [Introduction to Agent Lightning](#introduction), [Tutorial: Train Your First Agent](#train-first-agent)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [docs/tutorials/installation.md](https://github.com/microsoft/agent-lightning/blob/main/docs/tutorials/installation.md)
- [pyproject.toml](https://github.com/microsoft/agent-lightning/blob/main/pyproject.toml)
- [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)
- [AGENTS.md](https://github.com/microsoft/agent-lightning/blob/main/AGENTS.md)
- [contrib/recipes/webshop/agl/requirements.txt](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/webshop/agl/requirements.txt)
- [README.md](https://github.com/microsoft/agent-lightning/blob/main/README.md)
</details>

This guide covers all supported methods for installing and configuring Agent Lightning in your environment. Agent Lightning is a reinforcement learning framework for training AI agents, with support for GPU acceleration, distributed training, and various algorithm backends.

## Prerequisites

### System Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| Python | 3.10+ | 3.11 or 3.12 |
| OS | Linux (Ubuntu 20.04+), macOS | Linux with CUDA |
| RAM | 8 GB | 32 GB+ |
| GPU | Optional | NVIDIA GPU with CUDA 11.8+ |
| Disk Space | 5 GB | 20 GB+ |

Source: [contrib/recipes/webshop/agl/requirements.txt:1-3](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/webshop/agl/requirements.txt)

### Required Tools

- **uv**: Modern Python package manager (recommended)
- **Git**: For cloning the repository
- **CUDA Toolkit** (for GPU training): Version 11.8 or later
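
A quick way to confirm the prerequisites above is a small standard-library check. This is a convenience sketch, not part of the project: `check_prerequisites` is a hypothetical helper, and it only tests that the interpreter version is high enough and that the named tools are on `PATH`.

```python
import shutil
import sys


def check_prerequisites(
    min_python: tuple[int, int] = (3, 10),
    tools: tuple[str, ...] = ("uv", "git"),
) -> dict[str, bool]:
    """Report whether the Python version and each required tool are available."""
    results = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```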

## Installation Methods

### Method 1: Install from Source

This is the recommended approach for development and contributing.

```bash
# Clone the repository
git clone https://github.com/microsoft/agent-lightning.git
cd agent-lightning

# Install all dependencies including development tools
uv sync --group dev

# Install optional GPU dependencies
uv sync --group GPU
```

Source: [CLAUDE.md:20-22](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

### Method 2: Install with Specific Algorithm Backends

Agent Lightning supports multiple reinforcement learning algorithms through optional dependency groups:

```bash
# Install with VERL backend (recommended for GPU training)
uv sync --group VERL

# Install with APO backend
uv sync --group APO

# Install with GPU optimizations
uv sync --group GPU
```

Source: [CLAUDE.md:22](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

### Method 3: Using setup.sh for GPU Training

For GPU-accelerated training with the webshop recipe:

```bash
# From the contrib/recipes/webshop directory
./setup.sh
```

This script installs VERL extras for GPU training support. Source: [contrib/recipes/webshop/agl/requirements.txt:1-8](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/webshop/agl/requirements.txt)

## Dependency Groups

The `pyproject.toml` defines several optional dependency groups:

| Group | Purpose | Installation Command |
|-------|---------|---------------------|
| `dev` | Development tools (pytest, pyright, pre-commit) | `uv sync --group dev` |
| `GPU` | GPU acceleration packages | `uv sync --group GPU` |
| `VERL` | VERL algorithm backend | `uv sync --group VERL` |
| `APO` | APO algorithm backend | `uv sync --group APO` |

Source: [CLAUDE.md:20-23](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Environment Setup

### Creating a Virtual Environment

Using `uv` (recommended):

```bash
# Create and activate a new virtual environment
uv venv
source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate     # Windows
```

### Verifying Installation

Run the test suite to verify your installation:

```bash
# Run all tests
uv run --no-sync pytest -v

# Run specific test
uv run --no-sync pytest -v tests/path/to/test.py

# Run tests matching a pattern
uv run --no-sync pytest -v -k "test_pattern"
```

Source: [CLAUDE.md:21](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

### Type Checking

Verify type annotations are correct:

```bash
uv run --no-sync pyright
```

Source: [CLAUDE.md:22](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

### Pre-commit Hooks

Before committing code, run pre-commit checks:

```bash
uv run --no-sync pre-commit run --all-files --show-diff-on-failure
```

Source: [CLAUDE.md:23](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Dashboard Installation

The Agent Lightning Dashboard is a separate React application:

```bash
cd dashboard

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build
```

Source: [dashboard/README.md:npm scripts section](https://github.com/microsoft/agent-lightning/blob/main/dashboard/README.md)

### Dashboard npm Scripts

| Script | Purpose |
|--------|---------|
| `dev` | Start development server |
| `build` | Build production bundle |
| `preview` | Preview production build locally |
| `storybook` | Start Storybook dev server |
| `build-storybook` | Build Storybook bundle |
| `eslint` | Run ESLint |
| `stylelint` | Run Stylelint |
| `prettier` | Run Prettier |
| `typecheck` | Run TypeScript typecheck |
| `vitest` | Run vitest tests |

Source: [dashboard/README.md:npm scripts](https://github.com/microsoft/agent-lightning/blob/main/dashboard/README.md)

## Recipe-Specific Installation

### Webshop Recipe

The webshop recipe has specific dependencies:

```bash
cd contrib/recipes/webshop/agl

# Install requirements
pip install -r requirements.txt

# For GPU training
./setup.sh
```

Required dependencies include:
- `pandas>=2.0.0` - Data manipulation
- `pyarrow>=14.0.0` - Parquet file support
- `rich>=13.0.0` - Terminal formatting
- `tqdm>=4.64.0` - Progress bars

Source: [contrib/recipes/webshop/agl/requirements.txt:1-15](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/webshop/agl/requirements.txt)

## Development Workflow

### Branching Conventions

Create feature branches from a fresh `main`:

| Branch Type | Naming Convention |
|-------------|-------------------|
| Feature | `feature/<slug>` |
| Fix | `fix/<slug>` |
| Documentation | `docs/<slug>` |
| Maintenance | `chore/<slug>` |

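Source: [CLAUDE.md:8](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md), [AGENTS.md:8](https://github.com/microsoft/agent-lightning/blob/main/AGENTS.md)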

### Commit and PR Guidelines

1. Write imperative, scoped commit messages
2. Reference issues with `Fixes #123`
3. Rerun pre-commit and relevant pytest/doc builds before pushing
4. Include verification commands in PR descriptions
5. Update documentation via `mkdocs.yml` or `examples/README.md`

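Source: [CLAUDE.md:9-13](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md), [AGENTS.md:9-13](https://github.com/microsoft/agent-lightning/blob/main/AGENTS.md)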

## GPU Configuration

For optimal GPU training performance:

1. Install NVIDIA drivers (CUDA 11.8+)
2. Install the `GPU` dependency group
3. For VERL-based training, also install the `VERL` group with `uv sync --group VERL`

GPU metrics are tracked via heartbeat statistics in worker nodes:

```python
heartbeat_stats={"queue_depth": 2, "gpu_utilization": 0.82}
```

Source: [dashboard/test-utils/python-server.py:Worker class](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)

## Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| `uv` command not found | Install uv: `pip install uv` |
| CUDA not found | Ensure NVIDIA drivers and CUDA toolkit are installed |
| Import errors | Run `uv sync` to ensure all dependencies are installed |
| Type checking failures | Run `uv run --no-sync pyright` to identify issues |

Source: [CLAUDE.md:26-30](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

### Lock File Updates

When dependencies change, commit the refreshed `uv.lock`:

```bash
git add uv.lock
git commit -m "chore: update lock file"
```

Source: [CLAUDE.md:24](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Next Steps

After installation:

1. Explore [Minimal Component Showcase](examples/minimal/README.md) to understand individual components
2. Set up the [LightningStore](agentlightning/store/) for trace storage
3. Configure [tracers](agentlightning/tracer/) for your agent execution
4. Review the [Algorithm Documentation](docs/tutorials/) for training options

---

<a id='architecture'></a>

## System Architecture

### Related Pages

Related topics: [Trainer Component](#trainer), [Runner Component](#runner), [LightningStore](#store)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [dashboard/src/layouts/AppLayout.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/layouts/AppLayout.tsx)
- [dashboard/src/pages/Rollouts.page.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/pages/Rollouts.page.tsx)
- [dashboard/src/pages/Resources.page.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/pages/Resources.page.tsx)
- [dashboard/src/pages/Workers.page.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/pages/Workers.page.tsx)
- [dashboard/src/components/TracesTable.component.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/TracesTable.component.tsx)
- [dashboard/src/components/WorkersTable.component.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/WorkersTable.component.tsx)
- [dashboard/src/components/AppDrawer.component.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/AppDrawer.component.tsx)
- [dashboard/test-utils/python-server.py](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
</details>

## Overview

Agent Lightning is a reinforcement learning framework for training AI agents, with a distributed system architecture that supports multi-worker training orchestration, resource management, and distributed tracing. The system consists of three primary layers: a **Backend Training Engine**, a **State Store**, and a **Dashboard Frontend**.

The architecture enables parallel training across multiple workers, centralized resource configuration, and real-time monitoring of training workflows through traces and metrics.

Source: [dashboard/src/layouts/AppLayout.tsx:1-50](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/layouts/AppLayout.tsx)

## High-Level Architecture Components

The Agent Lightning system comprises the following core entities:

| Component | Description | Key Attributes |
|-----------|-------------|----------------|
| **Resources** | Configuration templates for prompts, models, and sampling parameters | `resources_id`, `version`, `resources` (dict with PromptTemplate/LLM) |
| **Workers** | Runner processes that execute training rollouts | `worker_id`, `status`, `heartbeat_stats`, `current_rollout_id` |
| **Rollouts** | Complete training episodes with multiple attempts | `rollout_id`, `status`, `mode`, `attempts` |
| **Attempts** | Individual training attempts within a rollout | `attempt_id`, `status`, `metrics` |
| **Spans** | Distributed tracing spans for observability | `trace_id`, `span_id`, `status`, `attributes`, `start_time`, `end_time` |

Source: [dashboard/test-utils/python-server.py:1-300](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)

## Frontend Dashboard Architecture

The dashboard is a React-based frontend built with **Mantine UI** components that communicates with the backend via REST APIs.

### Navigation Structure

The application uses a sidebar navigation layout with the following sections:

```mermaid
graph TD
    A[AppLayout] --> B[Navbar]
    A --> C[Main Content Area]
    B --> D[Rollouts]
    B --> E[Workers]
    B --> F[Resources]
    B --> G[Traces]
    B --> H[Settings]
    C --> I[Outlet Component]
```

Source: [dashboard/src/layouts/AppLayout.tsx:20-50](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/layouts/AppLayout.tsx)

### Page Components

| Page | File Path | Purpose |
|------|-----------|---------|
| Rollouts | `dashboard/src/pages/Rollouts.page.tsx` | Display and manage training rollouts with status filtering |
| Workers | `dashboard/src/pages/Workers.page.tsx` | Monitor worker health and current assignments |
| Resources | `dashboard/src/pages/Resources.page.tsx` | View and manage configuration resources |
| Traces | `dashboard/src/components/TracesTable.component.tsx` | Analyze distributed tracing spans |

Source: [dashboard/src/pages/Rollouts.page.tsx:1-80](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/pages/Rollouts.page.tsx)

## Data Flow Architecture

### Worker Heartbeat Flow

Workers periodically send heartbeat signals to indicate their operational state. The dashboard monitors these heartbeats to determine worker availability.

```mermaid
sequenceDiagram
    participant W as Worker
    participant S as Store
    participant D as Dashboard
    
    W->>S: Heartbeat (status, queue_depth, gpu_utilization)
    S->>S: Update last_heartbeat_time
    D->>S: Poll /workers endpoint
    S-->>D: Worker list with status
```

Source: [dashboard/test-utils/python-server.py:100-150](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
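
Deriving availability from heartbeats usually comes down to comparing the last heartbeat time against a timeout. The sketch below assumes a 30-second timeout and a `worker_status` helper, both hypothetical; the status names `idle`, `busy`, and `offline` are the ones listed for the Workers entity.

```python
import time
from typing import Optional


def worker_status(
    last_heartbeat_time: float,
    busy: bool,
    now: Optional[float] = None,
    timeout_s: float = 30.0,
) -> str:
    """Classify a worker from its last heartbeat (timestamps in epoch seconds)."""
    now = time.time() if now is None else now
    if now - last_heartbeat_time > timeout_s:
        return "offline"  # heartbeat is stale, whatever the worker last reported
    return "busy" if busy else "idle"


# A fixed "now" keeps the example deterministic.
print(worker_status(last_heartbeat_time=100.0, busy=True, now=110.0))
print(worker_status(last_heartbeat_time=100.0, busy=True, now=200.0))
```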

### Rollout Execution Flow

Training rollouts follow a multi-attempt execution model:

```mermaid
graph LR
    A[Rollout Created] --> B[Attempt 1]
    B --> C{Success?}
    C -->|Yes| D[Rollout Complete]
    C -->|No| E[Attempt 2]
    E --> F{Success?}
    F -->|Yes| D
    F -->|No| G[Attempt N]
    G --> H[Max Attempts Reached]
```

Source: [dashboard/src/components/TracesTable.component.tsx:50-150](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/TracesTable.component.tsx)
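
The multi-attempt model in the diagram amounts to a bounded retry loop. The following sketch is illustrative only: `run_rollout`, the `max_attempts` default, and the `succeeded`/`failed` status strings are assumptions, not the framework's retry logic.

```python
from typing import Callable


def run_rollout(attempt_fn: Callable[[int], bool], max_attempts: int = 3) -> dict:
    """Retry until an attempt succeeds or the attempt budget is exhausted."""
    attempts = []
    for number in range(1, max_attempts + 1):
        succeeded = attempt_fn(number)
        attempts.append(
            {"attempt": number, "status": "succeeded" if succeeded else "failed"}
        )
        if succeeded:
            return {"status": "succeeded", "attempts": attempts}
    # Every attempt failed: the rollout ends with the max-attempts outcome.
    return {"status": "failed", "attempts": attempts}


# The first attempt fails and the second succeeds, so no third attempt is made.
result = run_rollout(lambda number: number == 2)
print(result["status"], len(result["attempts"]))
```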

## Core Entity Schemas

### Resources Entity

Resources define reusable configuration templates used by workers during training.

| Field | Type | Description |
|-------|------|-------------|
| `resources_id` | string | Unique identifier for the resource |
| `version` | integer | Version number for tracking changes |
| `create_time` | timestamp | Creation timestamp |
| `update_time` | timestamp | Last modification timestamp |
| `resources` | dict | Configuration dictionary (PromptTemplate, LLM configs) |

Source: [dashboard/test-utils/python-server.py:50-100](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)

### Workers Entity

| Field | Type | Description |
|-------|------|-------------|
| `worker_id` | string | Unique worker identifier |
| `status` | enum | Current status: `idle`, `busy`, `offline` |
| `heartbeat_stats` | dict | Metrics including `queue_depth`, `gpu_utilization` |
| `last_heartbeat_time` | timestamp | Time of last heartbeat |
| `current_rollout_id` | string | Currently assigned rollout (if busy) |
| `current_attempt_id` | string | Currently executing attempt |

Source: [dashboard/src/components/AppDrawer.component.tsx:1-60](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/AppDrawer.component.tsx)

### Spans Entity (Distributed Tracing)

| Field | Type | Description |
|-------|------|-------------|
| `rollout_id` | string | Associated rollout |
| `attempt_id` | string | Associated attempt |
| `trace_id` | string | Distributed trace identifier |
| `span_id` | string | Unique span identifier |
| `parent_id` | string | Parent span ID for hierarchy |
| `name` | string | Operation name (e.g., `classification_pipeline`) |
| `status` | TraceStatus | Status with `status_code` (OK, ERROR) and description |
| `attributes` | dict | Key-value metadata (model, batch_size, accuracy) |
| `start_time` | timestamp | Span start time |
| `end_time` | timestamp | Span end time |

Source: [dashboard/src/components/TracesTable.component.tsx:50-120](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/TracesTable.component.tsx)
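
Because each span carries a `span_id` and a `parent_id`, a flat list of spans can be turned back into a hierarchy. The sketch below uses plain dictionaries with the field names from the table; `build_span_tree` and `render` are hypothetical helpers, and timestamps are assumed to be numeric seconds.

```python
from collections import defaultdict
from typing import Optional


def build_span_tree(spans: list[dict]) -> dict[Optional[str], list[dict]]:
    """Group spans by parent_id, with children ordered by start_time."""
    children: dict[Optional[str], list[dict]] = defaultdict(list)
    for span in sorted(spans, key=lambda s: s["start_time"]):
        children[span.get("parent_id")].append(span)
    return children


def render(
    children: dict[Optional[str], list[dict]],
    parent_id: Optional[str] = None,
    depth: int = 0,
) -> list[str]:
    """Render the tree as indented lines, starting from the root spans."""
    lines = []
    for span in children.get(parent_id, []):
        duration = span["end_time"] - span["start_time"]
        lines.append(f"{'  ' * depth}{span['name']} ({duration:.2f}s)")
        lines.extend(render(children, span["span_id"], depth + 1))
    return lines


spans = [
    {"span_id": "a", "parent_id": None, "name": "classification_pipeline",
     "start_time": 0.0, "end_time": 2.0},
    {"span_id": "c", "parent_id": "a", "name": "score",
     "start_time": 1.0, "end_time": 1.5},
    {"span_id": "b", "parent_id": "a", "name": "load",
     "start_time": 0.0, "end_time": 1.0},
]
print("\n".join(render(build_span_tree(spans))))
```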

## Component Architecture (Frontend)

### Table Components Pattern

The dashboard uses a consistent table component pattern across all pages:

```mermaid
graph TD
    A[Page Component] --> B[Table Component]
    B --> C[Column Definitions]
    B --> D[Filtering Logic]
    B --> E[Pagination Controls]
    A --> F[useQuery Hook]
    F --> G[API Endpoints]
```

| Component | Props | Purpose |
|-----------|-------|---------|
| `RolloutTable` | `rollouts`, `totalRecords`, `statusFilters`, `onViewTraces` | Training rollout display |
| `WorkersTable` | `workers`, `onShowDetails` | Worker monitoring |
| `ResourcesTable` | `resourcesList`, `renderRowExpansion` | Resource configuration |
| `TracesTable` | `spans`, `onShowSpanDetail` | Trace analysis |

Source: [dashboard/src/components/WorkersTable.component.tsx:1-80](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/WorkersTable.component.tsx)

### Drawer Container Pattern

The application uses an `AppDrawerContainer` for displaying detailed information:

```mermaid
graph TD
    A[AppDrawerContainer] --> B[Redux State]
    B --> C{Content Type}
    C -->|worker-detail| D[WorkerDrawerTitle]
    C -->|rollout-detail| E[RolloutDrawer]
    C -->|span-detail| F[SpanDetailDrawer]
    D --> G[ConnectionIndicator]
    G --> H[baseUrl, status, isRefreshing]
```

Source: [dashboard/src/components/AppDrawer.component.tsx:60-120](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/AppDrawer.component.tsx)

## State Management

The frontend uses Redux for state management with the following key selectors:

| Selector | Purpose |
|----------|---------|
| `selectConfig` | Application configuration (baseUrl, autoRefreshMs) |
| `selectDrawerIsOpen` | Drawer visibility state |
| `selectDrawerContent` | Current drawer content type and data |
| `selectConnectionState` | Backend connection status |

Source: [dashboard/src/layouts/AppLayout.tsx:50-80](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/layouts/AppLayout.tsx)

## Connection Management

The dashboard includes a `ConnectionIndicator` component that displays the connection status to the backend:

| Status | Description |
|--------|-------------|
| `connected` | Successfully connected to backend |
| `disconnected` | Cannot reach backend |
| `refreshing` | Actively reconnecting |

Source: [dashboard/src/layouts/AppLayout.tsx:40-45](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/layouts/AppLayout.tsx)

## Training Workflow Integration

### Status Lifecycle

Rollouts and attempts follow a defined status lifecycle:

| Status | Description |
|--------|-------------|
| `pending` | Initial state, not yet started |
| `running` | Currently executing |
| `succeeded` | Completed successfully |
| `failed` | Execution failed |
| `cancelled` | Manually cancelled |
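
The lifecycle above can be read as a small state machine. The sketch below is illustrative only: the status values come from the table, but the `Status` enum, the `ALLOWED` transition map, and the helper functions are hypothetical names, and the permitted transitions are an assumption rather than something the dashboard source confirms.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transitions: a run starts as pending, moves to running,
# and ends in exactly one terminal state.
ALLOWED = {
    Status.PENDING: {Status.RUNNING, Status.CANCELLED},
    Status.RUNNING: {Status.SUCCEEDED, Status.FAILED, Status.CANCELLED},
    Status.SUCCEEDED: set(),
    Status.FAILED: set(),
    Status.CANCELLED: set(),
}


def is_terminal(status: Status) -> bool:
    """A status is terminal when no further transition is allowed."""
    return not ALLOWED[status]


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


state = transition(Status.PENDING, Status.RUNNING)
state = transition(state, Status.SUCCEEDED)
print(state.value, is_terminal(state))
```

Keeping the transition map in one place makes it easy to adjust if the real lifecycle allows additional moves, such as re-queuing a failed run.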

### Mode Types

| Mode | Description |
|------|-------------|
| `train` | Training mode with gradient updates |
| `eval` | Evaluation mode without updates |
| `inference` | Production inference mode |

Source: [dashboard/src/pages/Rollouts.page.tsx:30-60](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/pages/Rollouts.page.tsx)

## Observability Architecture

### Trace Hierarchy

Traces are organized in a hierarchical structure:

```
Trace
└── Span (root)
    ├── Span (child - preprocess)
    ├── Span (child - classifier)
    └── Span (child - formatter)
```

Each span captures:
- Execution timing (`start_time`, `end_time`, `duration`)
- Status and error information
- Custom attributes (model, batch_size, accuracy)
- Resource metadata (service name)

Source: [dashboard/test-utils/python-server.py:200-300](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
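
To make the hierarchy concrete, here is a minimal sketch of how a flat list of spans could be turned back into a tree using `parent_id`. It uses plain dictionaries in place of the real span model, and `build_span_tree` and `render_tree` are hypothetical helpers written for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_span_tree(spans: List[dict]) -> Dict[Optional[str], List[dict]]:
    """Group spans by parent_id so children can be looked up per parent."""
    children: Dict[Optional[str], List[dict]] = defaultdict(list)
    for span in spans:
        children[span.get("parent_id")].append(span)
    # Order siblings by start time so the rendering follows execution order.
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start_time"])
    return children


def render_tree(children, parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Walk the tree depth-first and produce one indented line per span."""
    lines: List[str] = []
    for span in children.get(parent_id, []):
        duration = span["end_time"] - span["start_time"]
        lines.append(f"{'  ' * depth}{span['name']} ({duration:.2f}s)")
        lines.extend(render_tree(children, span["span_id"], depth + 1))
    return lines


# Sample data mirroring the hierarchy shown above (values are made up).
spans = [
    {"span_id": "s1", "parent_id": None, "name": "classification_pipeline",
     "start_time": 0.0, "end_time": 1.5},
    {"span_id": "s3", "parent_id": "s1", "name": "classifier",
     "start_time": 0.4, "end_time": 1.2},
    {"span_id": "s2", "parent_id": "s1", "name": "preprocess",
     "start_time": 0.0, "end_time": 0.4},
    {"span_id": "s4", "parent_id": "s1", "name": "formatter",
     "start_time": 1.2, "end_time": 1.5},
]
tree_lines = render_tree(build_span_tree(spans))
print("\n".join(tree_lines))
```

The root span is the one whose `parent_id` is `None`; `duration` is derived from the two timestamps rather than stored separately.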

### Attribute Keys

Common span attributes include:

| Attribute | Example Value | Description |
|-----------|---------------|-------------|
| `type` | `classification` | Operation type |
| `model` | `bert-classifier` | Model used |
| `batch_size` | `10` | Processing batch size |
| `accuracy` | `0.95` | Achieved accuracy |
| `timeout` | `true` | Whether operation timed out |
| `retry` | `true` | Whether this was a retry attempt |

Source: [dashboard/src/components/TracesTable.component.tsx:30-50](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/TracesTable.component.tsx)

## Resource Configuration Templates

Resources support multiple template engines:

| Engine | Syntax | Example |
|--------|--------|---------|
| `f-string` | `{variable}` | `"Classify: {ticket}"` |
| `jinja` | `{{ variable }}` or `{% for %}` | `"{% for r in results %}{{ r }}{% endfor %}"` |

Source: [dashboard/test-utils/python-server.py:50-90](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
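
As a rough illustration of the difference between the two engines, the sketch below dispatches on the engine name. `render_template` is a hypothetical helper, not part of Agent Lightning, and it assumes that the `f-string` engine behaves like Python's `str.format`. The `jinja` branch needs the third-party `jinja2` package and is only imported when that engine is requested.

```python
def render_template(template: str, engine: str, **variables) -> str:
    """Render a template string with the named engine (illustrative only)."""
    if engine == "f-string":
        # Assumption: {variable} placeholders follow str.format semantics.
        return template.format(**variables)
    if engine == "jinja":
        # Third-party dependency, imported lazily so the f-string path
        # works without it.
        from jinja2 import Template
        return Template(template).render(**variables)
    raise ValueError(f"Unsupported template engine: {engine!r}")


rendered = render_template(
    "Classify: {ticket}", "f-string", ticket="Printer is offline"
)
print(rendered)
```

The `f-string` form covers simple substitution; loops and conditionals such as `{% for %}` need the `jinja` engine.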

## Summary

The Agent Lightning system architecture provides:

1. **Distributed Training** - Multiple workers executing rollouts in parallel
2. **Centralized Configuration** - Versioned resource templates for prompts and models
3. **Real-time Monitoring** - Worker heartbeat tracking and status dashboards
4. **Full Observability** - Distributed tracing with hierarchical spans
5. **State Persistence** - Store-based architecture for maintaining system state

The architecture is designed for horizontal scalability, allowing additional workers to be added to increase training throughput while maintaining centralized configuration management and monitoring through the dashboard frontend.

---

<a id='core_abstractions'></a>

## Core Abstractions and Data Models

### Related Pages

Related topics: [System Architecture](#architecture), [Trainer Component](#trainer)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [agentlightning/types/core.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)
- [agentlightning/types/tracer.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py)
- [agentlightning/types/resources.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/resources.py)
- [agentlightning/emitter/annotation.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/emitter/annotation.py)
- [agentlightning/tracer/weave.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/weave.py)
- [dashboard/test-utils/python-server.py](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
</details>

# Core Abstractions and Data Models

Agent Lightning coordinates runners, tracers, the `LightningStore`, and training algorithms through a small set of shared data models. These core types are defined in `agentlightning/types/` and are the canonical structures for representing tasks, rollouts, attempts, traces, and resources.

## Architecture Overview

Agent Lightning operates through a continuous execution loop where multiple components interact. The core abstractions facilitate:

1. **Trace Emission** - Runners and tracers emit spans during execution
2. **State Synchronization** - `LightningStore` maintains synchronized state
3. **Algorithm Consumption** - Training algorithms in `agentlightning/algorithm/` consume traces to improve agent behavior

```mermaid
graph TD
    A[Runners] -->|emit spans| B[Tracers]
    B --> C[LightningStore]
    C --> D[Algorithms]
    D -->|improve behavior| A
    C --> E[Dashboard]
    F[Resources] -->|configure| A
```

Source: [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Task and Rollout Models

### Task Representation

The `Task` and related classes define the fundamental unit of work in Agent Lightning. Tasks represent the objectives that agents attempt to accomplish during training and evaluation.

| Class | Purpose |
|-------|---------|
| `Task` | Core task definition containing input and configuration |
| `TaskInput` | Input data passed to a task |
| `TaskIfAny` | Conditional task input supporting optional parameters |
| `Dataset` | Collection of tasks for batch processing |

Source: [agentlightning/types/core.py:1-50](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

### Rollout Lifecycle

A rollout represents one execution of a task and may span several attempts when retries occur. The rollout model captures the entire lifecycle from enqueue to completion.

```mermaid
stateDiagram-v2
    [*] --> Enqueued: EnqueueRolloutRequest
    Enqueued --> InProgress: Runner picks up
    InProgress --> Attempted: First attempt completes
    Attempted --> InProgress: Retry triggered
    InProgress --> [*]: Final attempt
    Attempted --> [*]: Success/Failure
```

| Class | Description |
|-------|-------------|
| `Rollout` | Represents a single task execution instance |
| `RolloutConfig` | Configuration for rollout execution |
| `RolloutMode` | Execution mode (training, evaluation, etc.) |
| `RolloutStatus` | Current state of the rollout |

Source: [agentlightning/types/core.py:50-100](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

### Attempt Model

Attempts represent individual tries within a rollout, enabling retry mechanisms and granular progress tracking.

| Property | Type | Description |
|----------|------|-------------|
| `attempt_id` | `str` | Unique identifier for the attempt |
| `rollout_id` | `str` | Parent rollout identifier |
| `status` | `AttemptStatus` | Current attempt status |
| `sequence_id` | `int` | Order within the rollout |

Source: [agentlightning/types/core.py:100-150](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)
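
The relationship between a rollout and its attempts can be sketched as a retry loop. Everything below is a stand-in written for illustration: `AttemptRecord` mirrors the properties in the table above, while `run_with_retries`, the `max_attempts` limit, and the ID format are assumptions, not the framework's own retry logic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttemptRecord:
    attempt_id: str
    rollout_id: str
    sequence_id: int
    status: str


def run_with_retries(
    rollout_id: str,
    execute: Callable[[int], bool],
    max_attempts: int = 3,
) -> List[AttemptRecord]:
    """Run `execute` until it succeeds or the attempt budget is used up."""
    attempts: List[AttemptRecord] = []
    for sequence_id in range(1, max_attempts + 1):
        try:
            succeeded = execute(sequence_id)
        except Exception:
            # Treat any exception as a failed attempt and try again.
            succeeded = False
        attempts.append(
            AttemptRecord(
                attempt_id=f"{rollout_id}-attempt-{sequence_id}",
                rollout_id=rollout_id,
                sequence_id=sequence_id,
                status="succeeded" if succeeded else "failed",
            )
        )
        if succeeded:
            break
    return attempts


# The toy task fails on its first try and succeeds on the second.
history = run_with_retries("rollout-001", lambda n: n >= 2)
for attempt in history:
    print(attempt.sequence_id, attempt.status)
```

Each attempt keeps its own status, so a rollout that eventually succeeds still records the earlier failures for later inspection.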

### AttemptedRollout

The `AttemptedRollout` class aggregates results from all attempts within a rollout:

```python
class AttemptedRollout(BaseModel):
    rollout: Rollout
    attempts: List[Attempt]
    # Aggregated metrics and results
```

Source: [agentlightning/types/core.py:150-180](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

## Tracing Abstractions

### OpenTelemetry Integration

Agent Lightning uses OpenTelemetry for distributed tracing. The tracer types provide serialization and interoperability with the broader observability ecosystem.

| Class | Purpose |
|-------|---------|
| `Span` | Single unit of work in a trace |
| `SpanCoreFields` | Core fields shared across span implementations |
| `OtelResource` | Serializable OpenTelemetry resource representation |
| `TraceStatus` | Span completion status with error information |

Source: [agentlightning/types/tracer.py:1-80](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py)

### Span Structure

Spans form the atomic tracing unit, capturing timing, status, attributes, and relationships:

```mermaid
graph LR
    subgraph Span
        A[name] --> B[status]
        B --> C[attributes]
        C --> D[start_time/end_time]
        D --> E[parent_id/span_id]
        E --> F[resource]
    end
```

| Attribute | Description |
|-----------|-------------|
| `name` | Human-readable span identifier |
| `status` | `TraceStatus` with status_code and optional description |
| `attributes` | Key-value metadata dictionary |
| `parent_id` | Reference to parent span (None for root) |
| `resource` | `OtelResource` containing service metadata |

Source: [agentlightning/types/tracer.py:80-120](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py)

### OtelResource Model

The `OtelResource` class provides a serializable representation of OpenTelemetry resources:

```python
class OtelResource(BaseModel):
    attributes: Attributes
    schema_url: str
```

This model avoids confusion with the application's `Resource` class and enables span serialization for store persistence.

Source: [agentlightning/types/tracer.py:120-150](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py)

### Span Creation Patterns

#### SpanCoreFields for Lightweight Creation

For span creators that don't require the full span model, `SpanCoreFields` provides a minimal interface:

```python
class SpanCoreFields(BaseModel):
    name: str
    status: TraceStatus
    attributes: Attributes
    start_time: Optional[float]
    end_time: Optional[float]
```

Source: [agentlightning/types/tracer.py:150-180](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py)

#### Weave Tracer Span Creation

The Weave tracer implementation demonstrates proper span construction with resource attributes:

```python
resource=OtelResource(
    attributes={
        LightningResourceAttributes.ROLLOUT_ID.value: rollout_id,
        LightningResourceAttributes.ATTEMPT_ID.value: attempt_id,
        LightningResourceAttributes.SPAN_SEQUENCE_ID.value: sequence_id,
        LightningResourceAttributes.TRACER_NAME.value: "weave",
    },
    schema_url="",
)
```

Source: [agentlightning/tracer/weave.py:1-50](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/tracer/weave.py)

## Resource Management

### ResourcesUpdate Model

Resources define configurable components that can be versioned and updated:

```python
class ResourcesUpdate(BaseModel):
    resources_id: str
    version: int
    create_time: float
    update_time: float
    resources: Dict[str, Any]
```

| Field | Type | Description |
|-------|------|-------------|
| `resources_id` | `str` | Unique identifier for the resource set |
| `version` | `int` | Version number for optimistic concurrency |
| `create_time` | `float` | Unix timestamp of creation |
| `update_time` | `float` | Unix timestamp of last update |
| `resources` | `Dict[str, Any]` | Arbitrary resource configuration |

Source: [dashboard/test-utils/python-server.py:1-80](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
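
The `version` field is described as supporting optimistic concurrency. A minimal sketch of that idea follows, using a plain dataclass instead of the real model. `ResourcesSnapshot`, `VersionConflict`, and `apply_update` are hypothetical names, and the check-then-bump rule is the generic technique, not a confirmed description of how the store behaves.

```python
import time
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ResourcesSnapshot:
    resources_id: str
    version: int
    create_time: float
    update_time: float
    resources: Dict[str, Any]


class VersionConflict(Exception):
    """Raised when an update was prepared against an outdated version."""


def apply_update(
    current: ResourcesSnapshot,
    expected_version: int,
    new_resources: Dict[str, Any],
) -> ResourcesSnapshot:
    """Accept the update only if the caller saw the latest version."""
    if current.version != expected_version:
        raise VersionConflict(
            f"expected version {expected_version}, store has {current.version}"
        )
    return ResourcesSnapshot(
        resources_id=current.resources_id,
        version=current.version + 1,
        create_time=current.create_time,
        update_time=time.time(),
        resources=dict(new_resources),
    )


now = time.time()
v1 = ResourcesSnapshot("res-001", 1, now, now, {"main_prompt": "Classify: {ticket}"})
v2 = apply_update(v1, expected_version=1, new_resources={"main_prompt": "Label: {ticket}"})
print(v2.version, v2.resources["main_prompt"])
```

A writer that read version 1 and tries to update after someone else has already produced version 2 gets a `VersionConflict` and must re-read before retrying.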

### Resource Types

Resources support flexible configuration through templates and model definitions:

| Resource Type | Description |
|---------------|-------------|
| `PromptTemplate` | Templated prompts with jinja2 or f-string engines |
| `LLM` | Language model configuration with endpoint and sampling parameters |
| Custom `Dict[str, Any]` | Arbitrary configuration dictionaries |

Source: [dashboard/test-utils/python-server.py:80-150](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)

## Worker Abstraction

Workers represent execution agents that process rollouts:

```mermaid
classDiagram
    class Worker {
        +worker_id: str
        +status: WorkerStatus
        +heartbeat_stats: Dict
        +last_heartbeat_time: float
        +current_rollout_id: Optional[str]
        +current_attempt_id: Optional[str]
    }
```

| Property | Type | Description |
|----------|------|-------------|
| `worker_id` | `str` | Unique worker identifier |
| `status` | `WorkerStatus` | Current status (busy, idle, etc.) |
| `heartbeat_stats` | `Dict` | Runtime metrics (queue_depth, gpu_utilization) |
| `last_heartbeat_time` | `float` | Last check-in timestamp |
| `current_rollout_id` | `Optional[str]` | Currently executing rollout |
| `current_attempt_id` | `Optional[str]` | Currently executing attempt |

Source: [agentlightning/types/core.py:180-220](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

### Worker Status States

```mermaid
stateDiagram-v2
    [*] --> Idle: Startup
    Idle --> Busy: Dequeue rollout
    Busy --> Idle: Complete
    Busy --> Busy: Heartbeat
    Idle --> [*]: Shutdown
    Busy --> [*]: Shutdown
```

Source: [dashboard/test-utils/python-server.py:150-200](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
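
The state diagram and the heartbeat fields can be combined into a small sketch. `WorkerRecord` below is a simplified stand-in for the `Worker` model, and `is_stale` with its 30-second timeout is an assumption added to show how `last_heartbeat_time` might be used for liveness checks.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WorkerRecord:
    worker_id: str
    status: str = "idle"
    heartbeat_stats: Dict[str, float] = field(default_factory=dict)
    last_heartbeat_time: float = 0.0
    current_rollout_id: Optional[str] = None
    current_attempt_id: Optional[str] = None

    def dequeue(self, rollout_id: str, attempt_id: str) -> None:
        """Idle -> Busy: the worker picks up a rollout."""
        if self.status != "idle":
            raise RuntimeError(f"{self.worker_id} is already busy")
        self.status = "busy"
        self.current_rollout_id = rollout_id
        self.current_attempt_id = attempt_id

    def heartbeat(self, now: float, **stats: float) -> None:
        """Busy -> Busy: refresh the timestamp and runtime metrics."""
        self.last_heartbeat_time = now
        self.heartbeat_stats.update(stats)

    def complete(self) -> None:
        """Busy -> Idle: release the current rollout and attempt."""
        self.status = "idle"
        self.current_rollout_id = None
        self.current_attempt_id = None


def is_stale(worker: WorkerRecord, now: float, timeout_s: float = 30.0) -> bool:
    """Assumed liveness rule: no heartbeat within timeout_s seconds."""
    return now - worker.last_heartbeat_time > timeout_s


worker = WorkerRecord("worker-1")
worker.dequeue("rollout-001", "attempt-1")
worker.heartbeat(now=100.0, queue_depth=2, gpu_utilization=0.7)
print(worker.status, is_stale(worker, now=120.0), is_stale(worker, now=131.0))
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the liveness rule easy to test.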

## Filtering and Pagination

### Query Models

The store supports filtered and paginated queries for efficient data access:

| Class | Purpose |
|-------|---------|
| `FilterOptions` | Criteria for filtering results |
| `FilterField` | Individual filter condition |
| `SortOptions` | Sorting configuration |
| `PaginatedResult` | Paginated response wrapper |

Source: [agentlightning/types/core.py:220-260](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)
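
To show what a filtered, sorted, and paginated query does conceptually, here is a sketch over an in-memory list. The `query` function and the shape of its return value are hypothetical and only loosely modelled on the class names above; the real store may expose different fields and operators.

```python
from typing import Any, Dict, List


def query(
    records: List[Dict[str, Any]],
    filters: Dict[str, Any],
    sort_by: str,
    descending: bool = False,
    offset: int = 0,
    limit: int = 50,
) -> Dict[str, Any]:
    """Apply equality filters, sort, then slice out one page."""
    matched = [
        record
        for record in records
        if all(record.get(key) == value for key, value in filters.items())
    ]
    matched.sort(key=lambda record: record[sort_by], reverse=descending)
    return {
        "items": matched[offset : offset + limit],
        "total": len(matched),  # count before pagination
        "offset": offset,
        "limit": limit,
    }


records = [
    {"rollout_id": "r1", "status": "succeeded", "create_time": 1.0},
    {"rollout_id": "r2", "status": "failed", "create_time": 2.0},
    {"rollout_id": "r3", "status": "succeeded", "create_time": 3.0},
]
page = query(records, {"status": "succeeded"}, sort_by="create_time",
             descending=True, offset=0, limit=1)
print(page["total"], [r["rollout_id"] for r in page["items"]])
```

Returning `total` alongside the page lets a client such as the dashboard render pagination controls without fetching every record.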

## Operation Context

The `@operation` decorator provides a simplified span creation interface for user code:

```python
from agentlightning.emitter.annotation import operation

@operation(name="my_operation")
async def my_function():
    # Automatically creates and manages a span
    pass
```

### OperationContext Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `propagate` | `bool` | Whether spans should use active span processor |
| `name` | `Optional[str]` | Alias populating `OPERATION_NAME` attribute |

The decorator supports two usage patterns:
1. As a bare decorator: `@operation`
2. As a context manager factory: `with operation(name="custom"):`

Source: [agentlightning/emitter/annotation.py:1-60](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/emitter/annotation.py)
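
One common way to support both usage patterns with a single name is to build on `contextlib.ContextDecorator`. The sketch below shows that general technique only. `operation_sketch`, `_OperationSketch`, and the `RECORDED` list are hypothetical, the sketch handles synchronous functions only, and it is not the implementation found in `annotation.py`.

```python
import time
from contextlib import ContextDecorator
from typing import Callable, List, Optional

RECORDED: List[dict] = []  # stand-in for a span exporter


class _OperationSketch(ContextDecorator):
    """Records one entry per `with` block or decorated call."""

    def __init__(self, name: Optional[str] = None):
        self.name = name or "operation"

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        RECORDED.append(
            {
                "name": self.name,
                "status": "ERROR" if exc_type else "OK",
                "duration": time.perf_counter() - self._start,
            }
        )
        return False  # never swallow exceptions


def operation_sketch(fn: Optional[Callable] = None, *, name: Optional[str] = None):
    if callable(fn):
        # Bare usage: @operation_sketch
        return _OperationSketch(name or fn.__name__)(fn)
    # Factory usage: @operation_sketch(name=...) or `with operation_sketch(...)`
    return _OperationSketch(name)


@operation_sketch
def add(a: int, b: int) -> int:
    return a + b


add(1, 2)
with operation_sketch(name="custom"):
    pass
print([entry["name"] for entry in RECORDED])
```

The first positional argument distinguishes the two cases: when it is a callable, the function was used as a bare decorator; otherwise the call returns an object that works both as a decorator and as a context manager.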

## Data Flow Summary

```mermaid
graph TD
    subgraph Input
        A[Dataset] --> B[Task]
        B --> C[EnqueueRolloutRequest]
    end
    
    subgraph Execution
        C --> D[Runner]
        D --> E[Worker]
        E --> F[Attempt]
        F --> G[Span]
    end
    
    subgraph Storage
        G --> H[LightningStore]
        H --> I[PaginatedResult]
    end
    
    subgraph Training
        H --> J[Algorithm]
        J --> K[Improved Policy]
    end
```

## Key Type Exports

The `agentlightning/types/core.py` module exports the following public API:

```python
__all__ = [
    "Triplet",
    "RolloutLegacy",
    "Task",
    "TaskInput",
    "TaskIfAny",
    "RolloutRawResultLegacy",
    "RolloutRawResult",
    "RolloutMode",
    "GenericResponse",
    "ParallelWorkerBase",
    "Dataset",
    "AttemptStatus",
    "RolloutStatus",
    "RolloutConfig",
    "Rollout",
    "Attempt",
    "AttemptedRollout",
    "EnqueueRolloutRequest",
    "Hook",
    "Worker",
    "WorkerStatus",
    "PaginatedResult",
    "FilterOptions",
    "SortOptions",
    "FilterField",
]
```

Source: [agentlightning/types/core.py:40-60](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

## Usage Patterns

### Creating a Rollout Request

```python
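from agentlightning.types.core import EnqueueRolloutRequest, RolloutConfig, RolloutMode

# Field names below are illustrative; check them against the
# EnqueueRolloutRequest definition in agentlightning/types/core.py.
request = EnqueueRolloutRequest(
    task_id="task-001",
    config=RolloutConfig(mode=RolloutMode.TRAINING),
    priority=1,
)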
```

### Querying with Filters

```python
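from agentlightning.types.core import FilterField, FilterOptions, SortOptions

filters = FilterOptions(
    # "succeeded" is the completed-successfully value in the status lifecycle table
    fields=[FilterField(name="status", operator="eq", value="succeeded")],
    sort=SortOptions(field="create_time", direction="desc"),
    offset=0,
    limit=50,
)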
```

Source: [agentlightning/types/core.py:260-300](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

---

<a id='train-first-agent'></a>

## Tutorial: Train Your First Agent

### Related Pages

Related topics: [Tutorial: Writing Agents](#write-agents), [Algorithm Zoo](#algorithm-zoo)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [examples/apo/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/README.md)
- [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)
- [examples/apo/room_selector.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/room_selector.py)
- [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)
- [agentlightning/types/tracer.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/tracer.py)
- [examples/minimal/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/README.md)

**Note:** The file `docs/how-to/train-first-agent.md` referenced in the specification was not present in the retrieved context. The tutorial content below is synthesized from available APO example documentation and source code.

</details>

# Tutorial: Train Your First Agent

## Overview

This tutorial guides you through training your first AI agent with Agent Lightning. You will learn how to set up a training pipeline, define prompts and resources, create a dataset, and run the APO (Automatic Prompt Optimization) algorithm to improve your agent's behavior through feedback-driven learning.

Agent Lightning provides a complete training loop where runners and tracers emit spans, `LightningStore` keeps them synchronized, and algorithms consume those traces to improve behavior. Source: [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Prerequisites

Before starting this tutorial, ensure you have:

- Python 3.10+ installed
- Agent Lightning installed following the [Installation Guide](#installation)
- An OpenAI-compatible API service available
- APO extra dependencies installed

## Architecture Overview

Agent Lightning trains agents through a continuous feedback loop:

```mermaid
graph TD
    A[Runner - Executes Agent] --> B[Tracer - Emits Spans]
    B --> C[LightningStore - Synchronizes Data]
    C --> D[Algorithm - Consumes Traces]
    D --> E[Improved Agent Behavior]
    E --> A
    
    F[Dataset - Training Data] --> D
    G[Resources - Prompts/Models] --> A
```

Source: [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Step 1: Create Your Agent

Begin by defining a simple room booking agent that uses function calling. The agent receives a user request and selects an appropriate room from available options.

```python
# examples/apo/room_selector.py

import json

from agentlightning import Runner, DataProto

class RoomSelector(Runner):
    """Room booking agent using function calling."""

    def run(self, task: str, context: dict | None = None) -> DataProto:
        # Define available rooms
        rooms = [
            {"id": "R001", "name": "Conference A", "capacity": 10},
            {"id": "R002", "name": "Meeting Room B", "capacity": 4},
            {"id": "R003", "name": "Board Room", "capacity": 20},
        ]
        
        # Mock LLM response selecting a room
        selected_room = rooms[1]  # Default to Meeting Room B
        
        return DataProto(
            data={
                "selected_room": selected_room["name"],
                "room_id": selected_room["id"],
            },
            raw_response=json.dumps(selected_room),
        )
```

Source: [examples/apo/room_selector.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/room_selector.py)

### Supported Agent Components

| Component | Description | Usage |
|-----------|-------------|-------|
| `Runner` | Base class for agent execution | Extend to define custom agent logic |
| `Trainer` | Training orchestration | Manages training loop and workers |
| `LightningStore` | Data synchronization | Stores traces and spans |
| `OtelTracer` | OpenTelemetry span emission | Records execution traces |

Source: [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)

## Step 2: Prepare Your Dataset

Create a training dataset with room booking scenarios. Each task should include the user request and expected room selection.

```python
# examples/apo/room_selector_apo.py

def create_room_dataset() -> list[dict[str, str]]:
    """Create dataset for room booking tasks."""
    
    # Example tasks for room booking
    tasks = [
        {
            "task": "I need to schedule a meeting for 3 people tomorrow at 2 PM",
            "expected_room": "Meeting Room B",
        },
        {
            "task": "We are hosting a team event for 15 team members",
            "expected_room": "Board Room",
        },
        {
            "task": "Quick 1-on-1 sync needed this afternoon",
            "expected_room": "Meeting Room B",
        },
    ]
    
    return tasks
```

Source: [examples/apo/room_selector_apo.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/room_selector_apo.py)

## Step 3: Define Training Resources

Resources define the prompts and model configurations used by your agent during training. The algorithm can tune any resource, most commonly a prompt template.

```python
from agentlightning.types.resources import PromptTemplate

# Define a tunable prompt template
main_prompt = PromptTemplate(
    template="""You are a helpful assistant that helps users book meeting rooms.
    
    Available rooms:
    - Conference A: capacity 10
    - Meeting Room B: capacity 4
    - Board Room: capacity 20
    
    User request: {user_request}
    
    Select the most appropriate room and explain your choice.""",
    engine="f-string",
)
```

Source: [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)

### Resource Types

| Type | Description | Tunable |
|------|-------------|---------|
| `PromptTemplate` | Text templates with variable substitution | Yes |
| `LLM` | Model configuration (endpoint, sampling params) | No |
| `SystemPrompt` | System-level instructions | Yes |
| `SamplingParameters` | Temperature, top_p, max_tokens | No |

Source: [examples/apo/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/README.md)

## Step 4: Configure the Trainer

The `Trainer` class orchestrates the training loop. It manages workers, coordinates with the LightningStore, and applies the optimization algorithm.

```python
from agentlightning import Trainer

# Initialize trainer with one worker
trainer = Trainer(
    n_workers=1,
    # Resources to tune - only these will be optimized
    initial_resources={
        "main_prompt": main_prompt,
    },
)

# Configure the APO algorithm
trainer.configure(
    algorithm="APO",
    lr=1e-3,
    epochs=10,
)
```

Source: [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)

### Trainer Configuration Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `n_workers` | int | 1 | Number of parallel training workers |
| `initial_resources` | dict | Required | Resources to optimize |
| `algorithm` | str | Required | Optimization algorithm name |
| `lr` | float | 1e-3 | Learning rate |
| `epochs` | int | 10 | Number of training epochs |

Source: [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)

## Step 5: Implement Reward Function

The reward function evaluates agent outputs and provides feedback signals for reinforcement learning.

```python
from typing import Any

def room_booking_reward(output: Any, expected: dict) -> float:
    """
    Calculate reward based on room selection accuracy.
    
    Args:
        output: Agent's room selection
        expected: Expected room from dataset
    
    Returns:
        float: Reward score between 0.0 and 1.0
    """
    if not output or not output.data:
        return 0.0
    
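    selected_room = output.data.get("selected_room", "")
    expected_room = expected.get("expected_room", "")

    # Without both names there is nothing to compare. This guard also stops
    # two empty strings from being scored as an exact match.
    if not selected_room or not expected_room:
        return 0.0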
    
    # Exact match gets full reward
    if selected_room == expected_room:
        return 1.0
    
    # Partial match gets partial reward
    if expected_room.lower() in selected_room.lower():
        return 0.5
    
    return 0.0
```

Source: [examples/apo/room_selector_apo.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/room_selector_apo.py)

## Step 6: Run the Training Loop

Execute the training with your runner, dataset, and reward function.

```python
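import asyncio

from agentlightning import Trainer, setup_logging

# RoomSelector, create_room_dataset, main_prompt, and room_booking_reward
# are the definitions from Steps 1, 2, 3, and 5.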

async def train_room_selector():
    setup_logging()
    
    # Initialize agent and trainer
    agent = RoomSelector()
    dataset = create_room_dataset()
    
    trainer = Trainer(
        n_workers=1,
        initial_resources={"main_prompt": main_prompt},
    )
    
    # Run training
    results = await trainer.train(
        runner=agent,
        dataset=dataset,
        reward_fn=room_booking_reward,
        max_iterations=100,
    )
    
    print(f"Training completed: {results}")

if __name__ == "__main__":
    asyncio.run(train_room_selector())
```

Source: [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)

## Understanding the Training Flow

```mermaid
sequenceDiagram
    participant User as User Code
    participant Trainer as Trainer
    participant Runner as RoomSelector
    participant Store as LightningStore
    participant Algo as APO Algorithm
    
    User->>Trainer: train(runner, dataset, reward_fn)
    Trainer->>Runner: execute_task(task)
    Runner->>Runner: select_room()
    Runner-->>Trainer: output
    Trainer->>Store: record_span(rollout_id, attempt_id)
    Trainer->>Trainer: calculate_reward(output, expected)
    Trainer->>Algo: optimize_step(rewards, traces)
    Algo-->>Trainer: updated_resources
    Trainer->>Runner: update_resources()
    Note over Trainer,Algo: Repeat for max_iterations
```
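
The loop in the diagram can be reduced to a toy example: run the agent under each candidate prompt, score the outputs with a reward function, and keep the best-scoring prompt. The sketch below is deliberately not the APO algorithm and does not use the `Trainer` API. `toy_agent`, `exact_match_reward`, `evaluate`, and `select_best_prompt` are all hypothetical stand-ins, and the agent is a rule that pretends to be an LLM.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

ROOMS = {"Meeting Room B": 4, "Conference A": 10, "Board Room": 20}


def toy_agent(prompt: str, task: str) -> str:
    """Stand-in for an LLM call: only 'sees' capacities if the prompt mentions them."""
    if "capacity" not in prompt:
        return "Meeting Room B"
    numbers = [int(token) for token in task.split() if token.isdigit()]
    needed = max(numbers, default=2)
    # Smallest room that fits; assumes at least one room is large enough.
    return min((name for name, cap in ROOMS.items() if cap >= needed), key=ROOMS.get)


def exact_match_reward(selected: str, expected: str) -> float:
    return 1.0 if selected == expected else 0.0


def evaluate(prompt: str, agent: Callable[[str, str], str],
             dataset: List[Dict[str, str]]) -> float:
    """Mean reward of one prompt over the whole dataset."""
    return mean(
        exact_match_reward(agent(prompt, item["task"]), item["expected_room"])
        for item in dataset
    )


def select_best_prompt(candidates: List[str], agent, dataset) -> Tuple[str, Dict[str, float]]:
    scores = {prompt: evaluate(prompt, agent, dataset) for prompt in candidates}
    return max(scores, key=scores.get), scores


dataset = [
    {"task": "meeting for 3 people", "expected_room": "Meeting Room B"},
    {"task": "team event for 15 team members", "expected_room": "Board Room"},
]
candidates = [
    "Pick a room.",
    "Pick the smallest room whose capacity fits the group.",
]
best_prompt, scores = select_best_prompt(candidates, toy_agent, dataset)
print(best_prompt, scores)
```

A real optimizer would also generate new candidate prompts from the traces instead of choosing from a fixed list, but the evaluate-then-update shape is the same.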

## Debugging Your Training

Agent Lightning provides multiple debugging approaches:

### Approach 1: Runner Mode

Direct execution without training to verify agent logic:

```bash
python apo_debug.py --mode runner
```

Source: [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)

### Approach 2: Hook Mode

Debug with tracing hooks enabled:

```bash
python apo_debug.py --mode hook
```

### Approach 3: Trainer Mode

Full training debug with detailed logging:

```bash
python apo_debug.py --mode trainer
```

## Viewing Training Traces

During and after training, spans are recorded to the LightningStore. View them in the dashboard:

```mermaid
graph LR
    A[Training Run] --> B[Spans Emitted]
    B --> C[LightningStore]
    C --> D[Dashboard]
    D --> E[Trace Visualization]
    D --> F[Span Details]
```

The dashboard displays:

| View | Description |
|------|-------------|
| Rollouts | Complete training iterations |
| Spans | Individual function calls and operations |
| Resources | Tunable prompt templates |
| Metrics | Reward scores and training statistics |

Source: [examples/minimal/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/README.md)

## Common Issues and Solutions

### Issue: Tracer Conflicts

Running multiple modes consecutively in one process may cause tracer conflicts.

**Solution:** Run each mode in a separate process or ensure proper tracer cleanup between runs.

Source: [examples/apo/apo_debug.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_debug.py)
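
One way to follow the separate-process advice is a small launcher script. The sketch below assumes that `apo_debug.py` is in the current directory and accepts the `--mode` values shown earlier. `build_command` and `run_modes` are hypothetical helpers, not part of the example code.

```python
import subprocess
import sys
from typing import Dict, Iterable, List


def build_command(mode: str, script: str = "apo_debug.py") -> List[str]:
    """Build the command line for one debug mode, using the current interpreter."""
    return [sys.executable, script, "--mode", mode]


def run_modes(modes: Iterable[str], script: str = "apo_debug.py") -> Dict[str, int]:
    """Run each mode in its own process so tracer state is never shared."""
    return {
        mode: subprocess.run(build_command(mode, script), check=False).returncode
        for mode in modes
    }


print(build_command("runner")[1:])
```

Calling `run_modes(["runner", "hook", "trainer"])` then returns the exit code of each run. Because every mode starts in a fresh interpreter, any global tracer set up by one run is gone before the next begins.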

### Issue: Missing Dependencies

APO requires additional dependencies not in the core installation.

**Solution:** Install with extras:
```bash
pip install agentlightning[apo]
```

Source: [examples/apo/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/README.md)

## Next Steps

After completing this tutorial:

1. **Advanced Algorithms**: Explore custom algorithms in `apo_custom_algorithm.py`
2. **Integration**: Learn Agent-OS integration for policy-aware training
3. **Dashboard**: Use the dashboard to visualize training progress
4. **Production**: Scale training with multiple workers and distributed execution

## Summary

This tutorial covered the essential steps to train your first agent with Agent Lightning:

- Define a `Runner` implementing your agent logic
- Prepare a dataset with tasks and expected outputs
- Configure `PromptTemplate` resources for tuning
- Implement a reward function for RL feedback
- Use `Trainer` to orchestrate the training loop
- Debug with multiple modes and visualize traces in the dashboard

The training loop continuously improves your agent by optimizing prompt resources based on reward signals, enabling agents to learn from feedback without manual prompt engineering.

---

<a id='write-agents'></a>

## Tutorial: Writing Agents

### Related Pages

Related topics: [Tutorial: Train Your First Agent](#train-first-agent), [Runner Component](#runner)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)
- [AGENTS.md](https://github.com/microsoft/agent-lightning/blob/main/AGENTS.md)
- [examples/minimal/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/README.md)
- [examples/minimal/write_traces.py](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)
- [dashboard/test-utils/python-server.py](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
- [dashboard/src/components/AppDrawer.component.tsx](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/AppDrawer.component.tsx)
- [agentlightning/store/](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/) (store module referenced via CLAUDE.md)
</details>

This tutorial walks through building AI agents with Agent Lightning: the core concepts, the architecture, and practical patterns for writing agents that can be trained with reinforcement learning.

## Overview

Agent Lightning is a framework designed to train AI agents using reinforcement learning. The framework provides a complete execution stack including tracing, storage, and algorithm components that work together in a continuous loop. Source: [CLAUDE.md:1-5](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

Agents are built around `LightningStore`, which keeps runners, tracers, and algorithms synchronized. Tracers emit spans that record the agent's execution, and algorithms consume those spans to improve the agent over time. Source: [AGENTS.md:1-5](https://github.com/microsoft/agent-lightning/blob/main/AGENTS.md)

## Architecture Overview

The Agent Lightning framework follows a continuous loop architecture where multiple components interact to enable training of AI agents.

```mermaid
graph TD
    A[Agent / Runner] -->|Emits Spans| B[Tracer]
    B -->|Traces| C[LightningStore]
    C -->|Synchronized Data| D[Algorithms]
    D -->|Training Signals| A
    E[Dashboard] -->|Inspect & Debug| C
```

### Core Components

| Component | Purpose | Location |
|-----------|---------|----------|
| `LightningStore` | Central data store for traces and rollouts | `agentlightning/store/` |
| `OtelTracer` | OpenTelemetry-based span emission | Via `OtelTracer` class |
| `AgentOpsTracer` | AgentOps integration for tracing | Via `AgentOpsTracer` class |
| `Span` | Individual trace unit | Data model |
| `emit_reward` | Reward signal emission | API function |

Source: [examples/minimal/write_traces.py:1-40](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

## Writing Your First Agent

### Basic Agent Structure

An agent in Agent Lightning is built around the tracing and store infrastructure. The minimal component showcase in `examples/minimal/` demonstrates how individual building blocks behave in isolation. Source: [examples/minimal/README.md:1-10](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/README.md)

### Setting Up the Tracer

The framework supports two primary tracing mechanisms:

1. **OtelTracer**: OpenTelemetry-based tracing that can forward spans to a remote store client
2. **AgentOpsTracer**: AgentOps integration for agent operations tracking

```python
from agentlightning import OtelTracer, LightningStoreClient, setup_logging

# Initialize logging
setup_logging()

# Create tracer with optional remote store client
tracer = OtelTracer(
    rollout_id="ro-001",
    attempt_id="at-001",
    store_client=None  # Or LightningStoreClient(endpoint="...")
)
```

Source: [examples/minimal/write_traces.py:40-60](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

### Opening Rollouts and Emitting Spans

A rollout represents one task execution by an agent. Each rollout contains one or more attempts, so a failed try can be retried and every try is tracked separately.

```python
# Open a new rollout
tracer.open_rollout(rollout_id="ro-001", user_id="user-123")

# Open an attempt within the rollout
tracer.open_attempt(attempt_id="at-001", sequence_id=1)

# Emit spans during agent execution
tracer.emit_span(
    name="tool_execution",
    attributes={
        "tool": "web_search",
        "query": "onboarding summary"
    }
)

# Close attempt and rollout
tracer.close_attempt()
tracer.close_rollout()
```

Source: [examples/minimal/write_traces.py:60-85](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

## Span Data Model

Spans are the fundamental unit of tracing in Agent Lightning. Each span captures a discrete unit of work within an agent's execution.

### Span Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `rollout_id` | string | Unique identifier for the rollout |
| `attempt_id` | string | Unique identifier for the attempt |
| `sequence_id` | integer | Order of the span within the attempt |
| `trace_id` | string | Trace grouping identifier |
| `span_id` | string | Unique span identifier |
| `parent_id` | string | Parent span ID for hierarchy |
| `name` | string | Human-readable span name |
| `status` | TraceStatus | Execution status (OK, ERROR) |
| `attributes` | dict | Key-value metadata |
| `start_time` | datetime | Span start timestamp |
| `end_time` | datetime | Span end timestamp |

Source: [dashboard/test-utils/python-server.py:1-100](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)

### Example Span Creation

```python
from datetime import datetime

from agentlightning import Span, TraceStatus
from agentlightning.types import OtelResource  # needed for `resource` below; import path assumed

span = Span(
    rollout_id="ro-story-001",
    attempt_id="at-story-010",
    sequence_id=3,
    trace_id="trace-001-main",
    span_id="span-003-tool",
    parent_id="span-001-root",
    name="tool_execution",
    status=TraceStatus(status_code="OK", description=None),
    attributes={"tool": "web_search", "query": "onboarding summary"},
    events=[],
    links=[],
    start_time=datetime.now(),
    end_time=datetime.now(),
    context=None,
    parent=None,
    resource=OtelResource(attributes={"service.name": "tool-service"}, schema_url="")
)
```

Source: [dashboard/test-utils/python-server.py:100-130](https://github.com/microsoft/agent-lightning/blob/main/dashboard/test-utils/python-server.py)
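The `parent_id` field is what turns a flat list of spans into a hierarchy. The following is a minimal sketch of that reconstruction, using only the standard library. `ToySpan`, `build_children_index`, and `render_tree` are hypothetical names introduced for illustration and are not part of Agent Lightning; the sketch keeps only the fields needed to build the tree.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToySpan:
    """Hypothetical stand-in that carries only the fields needed for the tree."""

    span_id: str
    parent_id: Optional[str]
    name: str
    sequence_id: int


def build_children_index(spans: List[ToySpan]) -> Dict[Optional[str], List[ToySpan]]:
    """Group spans by parent_id; root spans are stored under the key None."""
    children: Dict[Optional[str], List[ToySpan]] = defaultdict(list)
    for span in sorted(spans, key=lambda s: s.sequence_id):
        children[span.parent_id].append(span)
    return children


def render_tree(
    children: Dict[Optional[str], List[ToySpan]],
    parent_id: Optional[str] = None,
    depth: int = 0,
) -> List[str]:
    """Return one indented line per span, walking the tree depth-first."""
    lines: List[str] = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + span.name)
        lines.extend(render_tree(children, span.span_id, depth + 1))
    return lines


# Spans may arrive out of order; sequence_id restores the ordering.
spans = [
    ToySpan("span-003-tool", "span-001-root", "tool_execution", 3),
    ToySpan("span-001-root", None, "agent_run", 1),
    ToySpan("span-002-llm", "span-001-root", "llm_call", 2),
]
tree_lines = render_tree(build_children_index(spans))
print("\n".join(tree_lines))
```

Sorting by `sequence_id` before grouping keeps siblings in execution order even when spans arrive out of order.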

## Using Operations

The framework provides an `operation` decorator for recording synthetic operation spans with additional linking capabilities.

```python
from agentlightning import emit_reward
from agentlightning.operation import operation
from agentlightning.utils.otel import make_link_attributes, make_tag_attributes

# Record an operation span
@operation(name="classify_ticket")
def classify_ticket(ticket: str):
    with make_link_attributes(linked_rollout_id="ro-001", linked_attempt_id="at-001"):
        # Operation execution; `llm` is a placeholder for your own model client
        result = llm.classify(ticket)
    
    # Tag the reward
    make_tag_attributes(tags={"accuracy": 0.95})
    emit_reward(reward=0.95, name="classification_accuracy")
    
    return result
```

Source: [examples/minimal/write_traces.py:20-35](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

## LightningStore Integration

The `LightningStore` keeps tracers and runners synchronized, serving as the central data repository.

```python
from agentlightning.store import InMemoryLightningStore, LightningStoreClient

# Use in-memory store for local development
store = InMemoryLightningStore()

# Or connect to a remote store server (address passed positionally)
store = LightningStoreClient("http://localhost:45993")
```

Source: [examples/minimal/write_traces.py:25-35](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

### Store Server CLI

Start a LightningStore server with OTLP enabled:

```bash
agl store --port 45993 --log-level DEBUG
```

Source: [examples/minimal/write_traces.py:15-20](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

## Workflow Execution Model

Agents in Agent Lightning follow a structured execution model with rollouts, attempts, and spans.

```mermaid
graph LR
    subgraph Rollout[Rollout: ro-001]
        subgraph Attempt1[Attempt: at-001]
            S1[Span: root]
            S2[Span: preprocess]
            S3[Span: classify]
            S1 --> S2
            S2 --> S3
        end
        subgraph Attempt2[Attempt: at-002]
            S4[Span: root]
            S5[Span: preprocess]
            S6[Span: classify]
            S4 --> S5
            S5 --> S6
        end
    end
```

### State Transitions

| State | Description |
|-------|-------------|
| `pending` | Rollout/attempt created but not started |
| `running` | Currently executing |
| `completed` | Successfully finished |
| `failed` | Execution failed |
| `cancelled` | Manually cancelled |

Source: [dashboard/src/components/RolloutTable.component.tsx:1-50](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/RolloutTable.component.tsx)
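The table lists the states but not which moves between them are legal. The sketch below shows one plausible way to encode and enforce transitions. The transition graph in `ALLOWED` is an assumption made for illustration (for example, that `completed`, `failed`, and `cancelled` are terminal); it is not taken from the Agent Lightning source, and `Status`, `advance`, and `is_terminal` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Set


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition graph (illustrative only, not from the library source).
ALLOWED: Dict[Status, Set[Status]] = {
    Status.PENDING: {Status.RUNNING, Status.CANCELLED},
    Status.RUNNING: {Status.COMPLETED, Status.FAILED, Status.CANCELLED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
    Status.CANCELLED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not in the assumed graph."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_terminal(status: Status) -> bool:
    """A status is terminal when no further transitions are allowed from it."""
    return not ALLOWED[status]


status = advance(Status.PENDING, Status.RUNNING)
status = advance(status, Status.COMPLETED)
print(status.value, is_terminal(status))
```

Under this model a retry would be a new attempt starting from `pending`, not a transition out of `failed`.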

## Reward Emission

Agents emit reward signals that algorithms consume during training.

```python
from agentlightning import emit_reward

# Emit a reward with metadata
emit_reward(
    reward=0.85,
    name="task_success",
    attributes={
        "task_id": "classification",
        "accuracy": 0.85,
        "latency_ms": 150
    }
)
```

### Reward Span Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `reward.value` | float | Numeric reward value |
| `reward.name` | string | Reward signal identifier |
| `reward.attributes` | dict | Additional metadata |
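To make the attribute layout concrete, here is a small sketch of how a consumer might pull reward values out of span attribute dictionaries. It assumes the `reward.value` key from the table above and treats each span's attributes as a plain `dict`; `extract_rewards` and `final_reward` are hypothetical helpers, not library functions.

```python
from typing import Any, Dict, List, Optional


def extract_rewards(span_attributes: List[Dict[str, Any]]) -> List[float]:
    """Collect numeric `reward.value` entries in span order, skipping other spans."""
    rewards: List[float] = []
    for attrs in span_attributes:
        value = attrs.get("reward.value")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            rewards.append(float(value))
    return rewards


def final_reward(span_attributes: List[Dict[str, Any]]) -> Optional[float]:
    """Use the last emitted reward as the outcome, or None if none was emitted."""
    rewards = extract_rewards(span_attributes)
    return rewards[-1] if rewards else None


span_attributes = [
    {"tool": "web_search"},
    {"reward.value": 0.25, "reward.name": "partial"},
    {"reward.value": 0.85, "reward.name": "task_success"},
]
rewards = extract_rewards(span_attributes)
last = final_reward(span_attributes)
print(rewards, last)
```

Taking the last reward as the outcome is only one possible convention; an algorithm could equally sum or average intermediate rewards.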

## Dashboard Integration

The Agent Lightning Dashboard provides real-time inspection of store data and debugging capabilities for running experiments. Source: [dashboard/README.md:1-10](https://github.com/microsoft/agent-lightning/blob/main/dashboard/README.md)

### Drawer Components

The dashboard uses drawer components to display detailed information:

```typescript
// Worker detail drawer
if (content.type === 'worker-detail') {
    const { worker } = content;
    const title = <WorkerDrawerTitle worker={worker} />;
    const body = <JsonEditor value={worker} />;
    return { title, body };
}

// Trace detail drawer
if (content.type === 'trace-detail') {
    const { span } = content;
    const title = <TraceDrawerTitle span={span} />;
    const body = <JsonEditor value={span} />;
    return { title, body };
}
```

Source: [dashboard/src/components/AppDrawer.component.tsx:1-50](https://github.com/microsoft/agent-lightning/blob/main/dashboard/src/components/AppDrawer.component.tsx)

## Minimal Examples Reference

The `examples/minimal/` directory provides documented examples for each building block:

| Component | File | Purpose |
|-----------|------|---------|
| LightningStore + OTLP | `write_traces.py` | Shows `OtelTracer` and `AgentOpsTracer` for rollouts and spans |
| MultiMetrics | `write_metrics.py` | Console and Prometheus metrics backends |
| LLM Proxying | `llm_proxy.py` | Request routing through `/rollout/<id>/attempt/<id>` namespaces |
| vLLM Lifecycle | `vllm_server.py` | Context manager for vLLM server lifecycle |

Source: [examples/minimal/README.md:10-30](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/README.md)

## Best Practices

1. **Use descriptive span names**: Names like `tool_execution` and `classification_pipeline` make debugging easier in the dashboard.
2. **Set appropriate parent IDs**: Maintain span hierarchy for better trace visualization.
3. **Emit rewards consistently**: Use `emit_reward` after each task completion to enable algorithm training.
4. **Handle failures explicitly**: Set appropriate `TraceStatus` codes and descriptions for failed spans.
5. **Use operations for complex workflows**: The `@operation` decorator simplifies recording complex multi-step processes.

## Next Steps

- Explore the [API Reference](./api-reference.md) for detailed method signatures
- Learn about [Training Algorithms](../how-to/training-algorithms.md) that consume traces
- Set up the [Dashboard](../how-to/dashboard-setup.md) for real-time monitoring
- Review [Examples](../examples/overview.md) for complete agent implementations

---

<a id='trainer'></a>

## Trainer Component

### Related Pages

Related topics: [Runner Component](#runner), [LightningStore](#store), [Algorithm Zoo](#algorithm-zoo)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [agentlightning/trainer/trainer.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/trainer.py)
- [agentlightning/trainer/registry.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/registry.py)
- [agentlightning/trainer/init_utils.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/init_utils.py)
- [examples/apo/apo_custom_algorithm_trainer.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)
- [contrib/recipes/agentos/README.md](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/agentos/README.md)
- [examples/calc_x/legacy_calc_agent.py](https://github.com/microsoft/agent-lightning/blob/main/examples/calc_x/legacy_calc_agent.py)
</details>

The Trainer is the core orchestration component in Agent Lightning responsible for managing the reinforcement learning training loop. It coordinates runners, algorithms, and the LightningStore to execute agent training with scalable execution strategies.

## Overview

The Trainer serves as the central control plane that:

- Manages worker processes for parallel rollout execution
- Coordinates between the agent runner and learning algorithm
- Persists training traces to the LightningStore
- Provides pluggable execution strategies for different deployment scenarios

Source: [agentlightning/trainer/registry.py:1-6](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/registry.py)

## Architecture

### Component Interactions

```mermaid
graph TD
    T[Trainer] --> R[Runner<br/>Agent Execution]
    T --> A[Algorithm<br/>Policy Update]
    T --> S[LightningStore<br/>Trace Storage]
    T --> E[ExecutionStrategy]
    
    E --> SHM[SharedMemory<br/>Local Workers]
    E --> CS[ClientServer<br/>Remote Workers]
    
    R --> S
    A --> S
```

### Training Loop Flow

```mermaid
sequenceDiagram
    participant T as Trainer
    participant R as Runner
    participant S as LightningStore
    participant A as Algorithm
    
    T->>R: Initialize with config
    T->>A: Load algorithm
    T->>S: Connect store
    
    loop Training Steps
        T->>R: Execute rollouts
        R->>S: Emit spans
        T->>S: Retrieve traces
        T->>A: Process traces
        A->>T: Policy update
    end
```

## Core Configuration

### Constructor Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `n_workers` | `int` | `1` | Number of parallel worker processes (appears as `n_runners` in the `Trainer` excerpt under [Runner Component](#runner)) |
| `algorithm` | `Algorithm \| str \| None` | `None` | Learning algorithm (name or instance) |
| `runner` | `Runner \| None` | `None` | Agent runner for execution |
| `reward_fn` | `RewardFn \| None` | `None` | Reward function for training |
| `execution_strategy` | `str` | `"shm"` | Strategy key: `"shm"` or `"cs"` (appears as `strategy` in the `Trainer` excerpt under [Runner Component](#runner)) |

Source: [examples/apo/apo_custom_algorithm_trainer.py:35-37](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)

### Execution Strategy Registry

The Trainer supports multiple execution strategies through a registry pattern:

```python
ExecutionStrategyRegistry = {
    "shm": "agentlightning.execution.shared_memory.SharedMemoryExecutionStrategy",
    "cs": "agentlightning.execution.client_server.ClientServerExecutionStrategy",
}
```

Source: [agentlightning/trainer/registry.py:1-6](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/registry.py)

| Strategy | Description | Use Case |
|----------|-------------|----------|
| `shm` | Shared Memory - Local multi-process execution | Single-node GPU training |
| `cs` | Client-Server - Remote worker communication | Distributed deployments |

## Usage Patterns

### Basic Training with GRPO Algorithm

```python
from agentlightning import Trainer

trainer = Trainer(
    runner=runner,
    reward_fn=reward_fn,
    algorithm="GRPO"
)

trainer.train()
```

Source: [contrib/recipes/agentos/README.md:40-47](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/agentos/README.md)

### Custom Algorithm Integration

The Trainer accepts custom algorithms decorated with the `@algo` decorator:

```python
from agentlightning import Trainer
from agentlightning.algorithm import algo
from agentlightning.store import LightningStore

@algo
async def custom_algorithm(*, store: LightningStore):
    # Read traces from the store and publish updated resources here
    ...

# `rollout_fn` is a placeholder for your agent's rollout function
trainer = Trainer(n_workers=1, algorithm=custom_algorithm)
trainer.fit(rollout_fn)
```

Source: [examples/apo/apo_custom_algorithm_trainer.py:28-37](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)

### Parallel Training with Multiple Workers

```python
from agentlightning import Trainer

trainer = Trainer(
    n_workers=4,           # 4 parallel workers
    execution_strategy="shm",  # Shared memory for local execution
    algorithm="PPO",
    runner=runner
)

trainer.train()
```

## Integration with Agent-OS

The Trainer integrates with Agent-OS for policy-governed training:

```python
from agentlightning import Trainer
from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward
from agent_os import KernelSpace
from agent_os.policies import SQLPolicy

# Create governed kernel
kernel = KernelSpace(policy=SQLPolicy(deny=["DROP", "DELETE"]))

# Wrap in Agent-OS runner
runner = AgentOSRunner(kernel)

# Train with policy-aware rewards
trainer = Trainer(
    runner=runner,
    reward_fn=PolicyReward(kernel),
    algorithm="GRPO"
)

trainer.train()
```

Source: [contrib/recipes/agentos/README.md:25-45](https://github.com/microsoft/agent-lightning/blob/main/contrib/recipes/agentos/README.md)

## Workflow Phases

| Phase | Description |
|-------|-------------|
| **Initialization** | Load algorithm, connect store, spawn workers |
| **Rollout** | Execute agent episodes in parallel workers |
| **Trace Collection** | Retrieve spans from LightningStore |
| **Algorithm Update** | Process traces and update policy |
| **Iteration** | Repeat rollout-collect-update cycle |
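The phases above can be sketched as a toy loop using only the standard library. This is an illustration of the rollout-collect-update cycle, not the Trainer's implementation: `run_rollout`, `update_policy`, and `train` are hypothetical, the "policy" is a single float, and threads stand in for worker processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_rollout(task: int, prompt_bonus: float) -> Dict[str, float]:
    """Toy rollout: the reward is a deterministic function of task and policy."""
    return {"task": float(task), "reward": min(1.0, 0.1 * task + prompt_bonus)}


def update_policy(prompt_bonus: float, traces: List[Dict[str, float]]) -> float:
    """Toy update rule: nudge the parameter up while the mean reward is below 1.0."""
    mean_reward = sum(t["reward"] for t in traces) / len(traces)
    return prompt_bonus + 0.1 * (1.0 - mean_reward)


def train(tasks: List[int], n_workers: int, n_iterations: int) -> List[float]:
    prompt_bonus = 0.0  # Initialization: starting policy
    history: List[float] = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:  # Initialization: spawn workers
        for _ in range(n_iterations):  # Iteration
            # Rollout: run every task in parallel with the current policy.
            traces = list(pool.map(lambda t: run_rollout(t, prompt_bonus), tasks))
            # Trace collection: here the traces are simply the returned dicts.
            history.append(sum(t["reward"] for t in traces) / len(traces))
            # Algorithm update: compute the next policy from the collected traces.
            prompt_bonus = update_policy(prompt_bonus, traces)
    return history


history = train(tasks=[1, 2, 3], n_workers=2, n_iterations=3)
print(history)
```

In the real system the traces go through the LightningStore and do not come back as return values, which is what lets runners and the algorithm live in separate processes.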

## LightningStore Integration

The Trainer maintains bidirectional synchronization with LightningStore:

- **Span Emission**: Workers emit execution traces during rollout
- **Trace Retrieval**: Algorithm reads completed traces for learning
- **Persistence**: Training state survives worker restarts

Source: [CLAUDE.md:4-6](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Command-Line Interface

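A training script can run its components as separate processes, with the store started through the `agl` CLI. Run each command in its own terminal:

```bash
# Terminal 1: start the store
agl store
# Terminal 2: run the algorithm side
python my_training_script.py algo
# Terminal 3: run the runner side
python my_training_script.py runner
```

Or run everything in a single process: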

```bash
python my_training_script.py
```

Source: [examples/apo/apo_custom_algorithm_trainer.py:12-20](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)

## Extending the Trainer

### Custom Execution Strategy

Add new strategies to the registry:

```python
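from agentlightning.trainer.registry import ExecutionStrategyRegistry

# "mymodule.CustomExecutionStrategy" is a placeholder for your own strategy class
ExecutionStrategyRegistry["custom"] = "mymodule.CustomExecutionStrategy"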
```

### Custom Algorithm

Decorate async functions with `@algo`:

```python
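from agentlightning.algorithm import algo
from agentlightning.store import LightningStore

@algo
async def my_algorithm(*, store: LightningStore):
    # Query method names are assumed here; check agentlightning/store/base.py
    for rollout in await store.query_rollouts():
        spans = await store.query_spans(rollout.rollout_id)
        # Process spans and publish updated resources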
```

## Dependencies

| Dependency | Purpose |
|------------|---------|
| `LightningStore` | Trace persistence and retrieval |
| `Algorithm` | Policy learning logic |
| `Runner` | Agent execution environment |
| `ExecutionStrategy` | Worker orchestration |
| `RewardFn` | Training signal computation |

## See Also

- [System Architecture](#architecture) - System-wide architecture overview
- [Algorithm Zoo](#algorithm-zoo) - Learning algorithm details
- [LightningStore](#store) - Trace storage system
- [Execution Strategies](execution.md) - Available execution modes

---

<a id='runner'></a>

## Runner Component

### Related Pages

Related topics: [Trainer Component](#trainer), [Tutorial: Writing Agents](#write-agents)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [agentlightning/runner/base.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py)
- [agentlightning/runner/agent.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/agent.py)
- [agentlightning/runner/legacy.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/legacy.py)
- [agentlightning/runner/__init__.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/__init__.py)
- [agentlightning/types/core.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)
- [agentlightning/trainer/trainer.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/trainer.py)
</details>

The Runner component is the core execution engine in Agent Lightning responsible for managing agent lifecycle, task processing, and telemetry collection. Runners serve as the bridge between the high-level `Trainer` orchestration and the underlying `LitAgent` implementation, handling initialization, worker management, and graceful shutdown.

## Overview

Runners execute agents in a continuous loop where they poll the `LightningStore` for tasks, execute agent logic, and emit tracing spans for algorithm consumption. The Runner architecture supports both standard execution through `LitAgentRunner` and legacy compatibility through `LegacyAgentRunner`.

Source: [agentlightning/runner/__init__.py:1-11](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/__init__.py)

```python
from .agent import LitAgentRunner
from .base import Runner
from .legacy import LegacyAgentRunner

__all__ = [
    "Runner",
    "LegacyAgentRunner",
    "LitAgentRunner",
]
```

## Architecture

```mermaid
graph TD
    A[Trainer] --> B[Runner Fleet]
    B --> C[LitAgentRunner]
    B --> D[LegacyAgentRunner]
    C --> E[LitAgent]
    D --> F[AgentLightningClient]
    E --> G[LightningStore]
    F --> G
    E --> H[Tracer]
    H --> G
```

### Runner Hierarchy

| Class | Purpose | Source |
|-------|---------|--------|
| `Runner` | Abstract base class defining the runner interface | [base.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py) |
| `LitAgentRunner` | Primary runner implementation for standard agent execution | [agent.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/agent.py) |
| `LegacyAgentRunner` | Runner for backward compatibility with AgentOps integration | [legacy.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/legacy.py) |

## Runner Base Class

The `Runner` class defines the core interface that all runner implementations must follow. It establishes the lifecycle methods and execution patterns.

Source: [agentlightning/runner/base.py:1-20](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py)

### Lifecycle Methods

The runner lifecycle wraps the execution loop (`iter`/`step`) in two paired setup and teardown calls, for four lifecycle methods in total:

```mermaid
graph LR
    A[init] --> B[init_worker]
    B --> C[iter/step]
    C --> D[teardown_worker]
    D --> E[teardown]
```

| Method | Purpose | Must Implement |
|--------|---------|----------------|
| `init(agent, hooks)` | Initialize runner with agent and hooks | Yes |
| `init_worker(worker_id, store)` | Per-worker initialization with store | Yes |
| `teardown_worker(worker_id)` | Release per-worker resources | Yes |
| `teardown()` | Release resources from `init()` | Yes |

### Context Manager Pattern

Runners support a context manager pattern for automatic resource management:

```python
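async def run_once(runner, agent, store, hooks):
    # `await` needs an enclosing coroutine; start it with asyncio.run(...)
    with runner.run_context(agent=agent, store=store, hooks=hooks) as active_runner:
        # Runner is initialized and ready
        await active_runner.iter()
    # Automatic teardown on exit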
```

Source: [agentlightning/runner/base.py:52-86](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py)

The `run_context` helper ensures proper cleanup even when exceptions occur:

```python
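# Flags start as False so the finally block only tears down what was set up
_initialized = False
_worker_initialized = False
try: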
    self.init(agent=agent, hooks=hooks)
    _initialized = True
    self.init_worker(worker_id=0, store=store)
    _worker_initialized = True
    yield self
finally:
    try:
        if _worker_initialized:
            self.teardown_worker(worker_id=worker_id if worker_id is not None else 0)
    except Exception:
        logger.error("Error during runner worker teardown", exc_info=True)

    try:
        if _initialized:
            self.teardown()
    except Exception:
        logger.error("Error during runner teardown", exc_info=True)
```
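The same cleanup pattern can be exercised in isolation. The sketch below is a self-contained toy, not the library's `Runner`: `ToyRunner` is hypothetical and records each lifecycle call so the ordering is visible. It shows that a failing `teardown_worker` is logged, `teardown` still runs, and the original exception from the body is the one that reaches the caller.

```python
import logging
from contextlib import contextmanager
from typing import Iterator, List

logger = logging.getLogger("toy_runner")


class ToyRunner:
    """Hypothetical runner that records lifecycle calls for inspection."""

    def __init__(self, fail_worker_teardown: bool = False) -> None:
        self.calls: List[str] = []
        self.fail_worker_teardown = fail_worker_teardown

    def init(self) -> None:
        self.calls.append("init")

    def init_worker(self, worker_id: int) -> None:
        self.calls.append(f"init_worker:{worker_id}")

    def teardown_worker(self, worker_id: int) -> None:
        self.calls.append(f"teardown_worker:{worker_id}")
        if self.fail_worker_teardown:
            raise RuntimeError("worker teardown failed")

    def teardown(self) -> None:
        self.calls.append("teardown")

    @contextmanager
    def run_context(self, worker_id: int = 0) -> Iterator["ToyRunner"]:
        initialized = False
        worker_initialized = False
        try:
            self.init()
            initialized = True
            self.init_worker(worker_id)
            worker_initialized = True
            yield self
        finally:
            # Tear down in reverse order; log failures instead of raising them.
            if worker_initialized:
                try:
                    self.teardown_worker(worker_id)
                except Exception:
                    logger.error("Error during worker teardown", exc_info=True)
            if initialized:
                try:
                    self.teardown()
                except Exception:
                    logger.error("Error during teardown", exc_info=True)


runner = ToyRunner(fail_worker_teardown=True)
surfaced = None
try:
    with runner.run_context() as active:
        raise ValueError("rollout failed")
except ValueError as exc:
    surfaced = str(exc)  # the original error, not the teardown error

print(runner.calls, surfaced)
```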

### Execution Methods

| Method | Description | Behavior |
|--------|-------------|----------|
| `iter(event)` | Run continuously until event or no tasks | Abstract - subclasses implement |
| `step()` | Execute single unit of work | Abstract - subclasses implement |
| `run()` | Legacy run method | Raises `RuntimeError` - use `iter()` or `step()` |

Source: [agentlightning/runner/base.py:88-102](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py)

> **Warning**: The `run()` method raises `RuntimeError` because its behavior is undefined. Always use `iter()` or `step()` instead.
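One way to read the relationship between the two methods is that `iter()` repeats `step()` until a stop event is set or no work remains. The sketch below illustrates that reading with a toy in-memory queue. `ToyLoopRunner` is hypothetical, and the boolean return value of `step()` is an assumption made for this example, not the library's signature.

```python
import asyncio
from collections import deque
from typing import Deque, List, Optional


class ToyLoopRunner:
    """Hypothetical runner whose tasks are strings held in an in-memory queue."""

    def __init__(self, tasks: List[str]) -> None:
        self._queue: Deque[str] = deque(tasks)
        self.done: List[str] = []

    async def step(self) -> bool:
        """Process one task; return False when the queue is empty."""
        if not self._queue:
            return False
        task = self._queue.popleft()
        await asyncio.sleep(0)  # yield control, as real I/O would
        self.done.append(task.upper())
        return True

    async def iter(self, event: Optional[asyncio.Event] = None) -> None:
        """Call step() until the stop event is set or no tasks remain."""
        while not (event is not None and event.is_set()):
            if not await self.step():
                break


async def demo(stop_immediately: bool) -> List[str]:
    stop = asyncio.Event()
    if stop_immediately:
        stop.set()
    runner = ToyLoopRunner(["a", "b", "c"])
    await runner.iter(event=stop)
    return runner.done


processed = asyncio.run(demo(stop_immediately=False))
skipped = asyncio.run(demo(stop_immediately=True))
print(processed, skipped)
```

Checking the event between steps means a stop request takes effect at the next task boundary and does not interrupt a task mid-flight.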

## LitAgentRunner

`LitAgentRunner` is the primary runner implementation that manages the agent-runner relationship, hook registration, and tracer integration.

Source: [agentlightning/runner/agent.py:1-30](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/agent.py)

### Initialization Flow

```mermaid
sequenceDiagram
    participant Trainer
    participant LitAgentRunner
    participant LitAgent
    participant Tracer
    participant LightningStore

    Trainer->>LitAgentRunner: init(agent, hooks)
    LitAgentRunner->>LitAgent: set_runner(self)
    LitAgentRunner->>Tracer: init()
    Trainer->>LitAgentRunner: init_worker(worker_id, store)
    LitAgentRunner->>Tracer: init_worker(worker_id, store)
```

### Key Properties

| Property | Type | Description |
|----------|------|-------------|
| `agent` | `LitAgent[T_task]` | The agent instance (via `get_agent()`) |
| `store` | `LightningStore` | The backing store (via `get_store()`) |
| `worker_id` | `Optional[int]` | Unique worker identifier |
| `tracer` | `Tracer` | Tracer for span emission |

Source: [agentlightning/runner/agent.py:90-110](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/agent.py)

### Accessor Methods

```python
def get_agent(self) -> LitAgent[T_task]:
    """Get the agent instance."""
    if self._agent is None:
        raise ValueError("Agent not initialized. Call init() first.")
    return self._agent

def get_store(self) -> LightningStore:
    """Get the store instance."""
    if self._store is None:
        raise ValueError("Store not initialized. Call init_worker() first.")
    return self._store

def get_worker_id(self) -> str:
    """Get the formatted worker ID string."""
    return f"Worker-{self.worker_id}" if self.worker_id is not None else "Worker-Unknown"
```

### Logging Prefix

The `_log_prefix()` method generates consistent log prefixes for traceability:

```python
def _log_prefix(self, rollout_id: Optional[str] = None) -> str:
    """Generate a standardized log prefix for the current worker."""
    # Returns format: "[Worker-{id}] [{rollout_id}]"
```
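As a rough illustration of the format described in the comment above, a standalone version might look like the following. `log_prefix` is a hypothetical free function, and omitting the second bracket when no rollout ID is given is an assumption; the method in `agent.py` may format things differently.

```python
from typing import Optional


def log_prefix(worker_id: Optional[int], rollout_id: Optional[str] = None) -> str:
    """Build "[Worker-{id}] [{rollout_id}]", dropping the rollout part if absent."""
    worker = f"Worker-{worker_id}" if worker_id is not None else "Worker-Unknown"
    prefix = f"[{worker}]"
    if rollout_id is not None:
        prefix += f" [{rollout_id}]"
    return prefix


with_rollout = log_prefix(3, "ro-001")
without_worker = log_prefix(None)
print(with_rollout, "|", without_worker)
```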

## LegacyAgentRunner

`LegacyAgentRunner` provides backward compatibility for workflows using the AgentOps integration and `AgentLightningClient` communication pattern.

Source: [agentlightning/runner/legacy.py:1-35](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/legacy.py)

### Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| `agent` | `LitAgent[Any]` | The agent instance |
| `client` | `AgentLightningClient` | Server communication client |
| `tracer` | `Tracer` | Tracer instance for span emission |
| `worker_id` | `Optional[str]` | Worker identifier |
| `max_tasks` | `Optional[int]` | Maximum tasks before stopping |

### Architecture

```mermaid
graph TD
    A[LegacyAgentRunner] --> B[LitAgent]
    A --> C[AgentLightningClient]
    A --> D[Tracer]
    C --> E[Server]
    D --> F[LightningStore]
    B --> F
```

## Hook System Integration

Runners integrate with the hook system to provide extensibility at key lifecycle points:

Source: [agentlightning/types/core.py:1-30](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

| Hook | Timing | Purpose |
|------|--------|---------|
| `on_trace_start` | Before tracer enters trace context | Logging, metric collection, resource setup |
| `on_trace_end` | After rollout completes, before tracer exits | Logging, cleanup |
| `on_rollout_start` | Before rollout attempt begins | Per-attempt initialization |
| `on_rollout_end` | After rollout attempt completes | Result processing, cleanup |

Hooks are registered during initialization and called by the runner at appropriate points during execution.
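The dispatch pattern can be sketched without the library. In the toy below, `ToyHook`, `RecordingHook`, and `run_rollout_with_hooks` are hypothetical, the callbacks take only a rollout ID, and the nesting order (rollout callbacks outermost, trace callbacks inside) is an assumption for illustration, not a statement about the runner's actual call order.

```python
import asyncio
from typing import List, Sequence


class ToyHook:
    """No-op base class; subclasses override only the callbacks they need."""

    async def on_rollout_start(self, rollout_id: str) -> None:
        pass

    async def on_trace_start(self, rollout_id: str) -> None:
        pass

    async def on_trace_end(self, rollout_id: str) -> None:
        pass

    async def on_rollout_end(self, rollout_id: str) -> None:
        pass


class RecordingHook(ToyHook):
    """Records every callback so the firing order can be inspected."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def on_rollout_start(self, rollout_id: str) -> None:
        self.events.append(f"rollout_start:{rollout_id}")

    async def on_trace_start(self, rollout_id: str) -> None:
        self.events.append(f"trace_start:{rollout_id}")

    async def on_trace_end(self, rollout_id: str) -> None:
        self.events.append(f"trace_end:{rollout_id}")

    async def on_rollout_end(self, rollout_id: str) -> None:
        self.events.append(f"rollout_end:{rollout_id}")


async def dispatch(hooks: Sequence[ToyHook], callback: str, rollout_id: str) -> None:
    """Call the named callback on every hook, in registration order."""
    for hook in hooks:
        await getattr(hook, callback)(rollout_id)


async def run_rollout_with_hooks(rollout_id: str, hooks: Sequence[ToyHook]) -> str:
    await dispatch(hooks, "on_rollout_start", rollout_id)
    await dispatch(hooks, "on_trace_start", rollout_id)
    result = f"answer-for-{rollout_id}"  # stand-in for the agent's real work
    await dispatch(hooks, "on_trace_end", rollout_id)
    await dispatch(hooks, "on_rollout_end", rollout_id)
    return result


recorder = RecordingHook()
result = asyncio.run(run_rollout_with_hooks("ro-001", [recorder, ToyHook()]))
print(recorder.events, result)
```

A no-op base class lets a hook override a single callback without the dispatcher needing to check which methods exist.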

## Trainer Integration

Runners are instantiated and managed by the `Trainer` class, which orchestrates the entire training loop:

Source: [agentlightning/trainer/trainer.py:40-60](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/trainer/trainer.py)

```python
class Trainer(TrainerLegacy):
    """High-level orchestration layer that wires Algorithm <-> Runner <-> Store."""
    
    # Runner fleet configuration
    n_runners: int  # Number of agent runners to run in parallel
    max_rollouts: Optional[int]  # Maximum rollouts per runner
    strategy: ExecutionStrategy  # Process management strategy
    tracer: Tracer  # Tracer instance for telemetry
    hooks: Sequence[Hook]  # Lifecycle callbacks
```

### Training Configuration

| Parameter | Type | Description |
|-----------|------|-------------|
| `n_runners` | `int` | Number of parallel agent runners |
| `max_rollouts` | `Optional[int]` | Stop after N rollouts (None = unlimited) |
| `strategy` | `ExecutionStrategy` | Spawning strategy (shared memory, client/server) |
| `tracer` | `Tracer` | Tracer class or config for span collection |
| `hooks` | `Sequence[Hook]` | Lifecycle callback instances |

## Execution Flow

```mermaid
graph TD
    A[Trainer.fit/dev] --> B[Spawn Runner Fleet]
    B --> C[For each Runner]
    C --> D[runner.run_context]
    D --> E[init + init_worker]
    E --> F[iter/event loop]
    F --> G{Tasks available?}
    G -->|Yes| H[Execute step]
    H --> I[Emit spans to Store]
    I --> F
    G -->|No| J[Exit loop]
    J --> K[teardown_worker]
    K --> L[teardown]
```

## Context Manager Usage

For debugging or standalone usage outside the Trainer stack:

```python
import asyncio

from agentlightning import AgentOpsTracer, InMemoryLightningStore, LitAgentRunner


async def main() -> None:
    # Create store and agent; MyLitAgent is a placeholder for your own LitAgent subclass
    store = InMemoryLightningStore()
    agent = MyLitAgent()

    # Use context manager
    runner = LitAgentRunner(tracer=AgentOpsTracer())
    with runner.run_context(agent=agent, store=store) as active_runner:
        # Runner initialized and ready
        worker_id = active_runner.get_worker_id()
        print(f"Running on {worker_id}")

        # Run until complete
        await active_runner.iter()
    # Automatic cleanup


asyncio.run(main())
```

Source: [agentlightning/runner/base.py:88-113](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/runner/base.py)

## Error Handling

Runners implement robust error handling during teardown:

| Phase | Error Behavior | Recovery |
|-------|----------------|----------|
| `teardown_worker` | Logged but doesn't propagate | Continue to `teardown` |
| `teardown` | Logged but doesn't propagate | Context manager completes |

Because cleanup errors are logged rather than raised, they cannot mask the exception that ended the run, and a failure in `teardown_worker` does not prevent `teardown` from running.
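
A minimal sketch of this behavior, using only the standard library. The names `guarded_context` and `_safe_call` are hypothetical and exist only for this illustration; the real runner's implementation may differ.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("toy_runner")


def _safe_call(name, fn, failures):
    """Run one cleanup phase; log any error instead of raising it."""
    try:
        fn()
    except Exception as exc:
        logger.warning("%s failed: %s", name, exc)
        failures.append(name)


@contextmanager
def guarded_context(teardown_worker, teardown, failures):
    try:
        yield
    finally:
        # Both phases are always attempted; neither replaces an in-flight exception.
        _safe_call("teardown_worker", teardown_worker, failures)
        _safe_call("teardown", teardown, failures)


def broken_cleanup():
    raise RuntimeError("cleanup failed")


failures = []
caught = None
try:
    with guarded_context(broken_cleanup, broken_cleanup, failures):
        raise ValueError("original failure")
except ValueError as exc:
    caught = exc
```

Both cleanup phases fail here, yet the caller still sees the original `ValueError`, and `failures` shows that the second phase ran after the first one failed.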

## Summary

The Runner component provides:

1. **Lifecycle Management** - Consistent init/teardown patterns via context managers
2. **Worker Isolation** - Per-worker initialization with dedicated store connections
3. **Hook Integration** - Extensibility through lifecycle callbacks
4. **Telemetry** - Built-in tracer integration for span emission
5. **Trainer Integration** - Seamless orchestration within the training loop

Runners are the execution backbone of Agent Lightning, translating high-level training commands into agent task processing while maintaining observability through distributed tracing.

---

<a id='store'></a>

## LightningStore

### Related Pages

Related topics: [System Architecture](#architecture), [Trainer Component](#trainer)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [agentlightning/store/base.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/base.py)
- [agentlightning/store/memory.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/memory.py)
- [agentlightning/store/client_server.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/client_server.py)
- [agentlightning/store/threading.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/threading.py)
- [agentlightning/store/collection_based.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/collection_based.py)
</details>

# LightningStore

LightningStore is the central data persistence and synchronization layer in Agent Lightning. It manages the lifecycle of AI agent training workflows, including rollouts, attempts, spans, resources, and worker state. The store serves as the backbone for the training loop, enabling distributed execution, tracing, and experiment tracking.

## Overview

LightningStore provides a unified interface for:

- **Rollout Management**: Tracking agent task executions from enqueue to completion
- **Span Recording**: Capturing fine-grained traces of agent operations via OpenTelemetry
- **Resource Management**: Storing and versioning agent configurations, prompts, and model definitions
- **Worker Coordination**: Managing distributed worker states and heartbeats
- **Metrics Collection**: Aggregating training metrics through Prometheus integration

Source: [agentlightning/store/base.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/base.py)

## Architecture

LightningStore follows a pluggable backend architecture with a unified async interface.

```mermaid
graph TD
    subgraph "Client Layer"
        Runner[Runner] --> Tracer[Tracers<br/>OtelTracer<br/>AgentOpsTracer]
        Tracer --> Client[LightningStoreClient]
    end
    
    subgraph "Server Layer"
        Client --> |HTTP| Server[LightningStoreServer]
        Server --> Collections[LightningCollections]
    end
    
    subgraph "Storage Backends"
        Collections --> InMemory[InMemoryLightningStore]
        Collections --> SQLite[SQLiteLightningStore]
        Collections --> Mongo[MongoLightningStore]
    end
    
    subgraph "Thread Safety"
        Store[Any Store] --> Threaded[LightningStoreThreaded]
    end
```

### Core Components

| Component | Purpose |
|-----------|---------|
| `LightningStore` | Abstract base class defining the store interface |
| `LightningStoreClient` | HTTP client for remote store communication |
| `LightningStoreServer` | FastAPI-based server handling store operations |
| `LightningCollections` | Organized data collections (rollouts, spans, resources, workers) |
| `LightningStoreThreaded` | Thread-safe wrapper for concurrent access |

Source: [agentlightning/store/client_server.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/client_server.py)

## Data Models

### Core Types

The store operates on these fundamental data structures:

| Model | Description |
|-------|-------------|
| `Rollout` | A complete task execution with status, timestamps, and metadata |
| `Attempt` | A single attempt within a rollout (supports retries) |
| `Span` | Fine-grained trace data for agent operations |
| `TaskInput` | Input data for a task (prompt, parameters) |
| `Worker` | Worker node state and heartbeat information |
| `ResourcesUpdate` | Versioned resource configuration storage |
| `RolloutConfig` | Configuration for rollout execution |

Source: [agentlightning/types/core.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/types/core.py)

### Rollout Status Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Pending: enqueue_rollout
    Pending --> Running: start_rollout
    Running --> Completed: finish_rollout
    Running --> Failed: fail_rollout
    Completed --> [*]
    Failed --> [*]
    
    Running --> Attempted: start_attempt
    Attempted --> Running: finish_attempt
```

The status values are:
- `pending` - Queued for execution
- `running` - Currently executing
- `completed` - Successfully finished
- `failed` - Execution failed

Source: [agentlightning/store/base.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/base.py)
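
The four status values listed above can be modeled as a small transition table. This is a sketch transcribed from this page's diagram, not from the library's type definitions; `ALLOWED_TRANSITIONS` and `advance` are hypothetical names.

```python
# Edges between the four status values listed above.
ALLOWED_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),  # terminal
    "failed": set(),  # terminal
}


def advance(current: str, target: str) -> str:
    """Return the new status, or raise if the table has no such edge."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


status = "pending"
for step in ("running", "completed"):
    status = advance(status, step)
```

Encoding the edges as data makes the terminal states explicit: a rollout that is `completed` or `failed` has no outgoing transitions.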

## API Endpoints

The server exposes REST endpoints under `/v1/agl`:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/rollouts` | POST | Enqueue new rollouts |
| `/rollouts/{id}` | GET | Retrieve rollout by ID |
| `/rollouts/{id}/start` | POST | Mark rollout as started |
| `/rollouts/{id}/finish` | POST | Complete a rollout |
| `/rollouts/{id}/attempt` | POST | Start a new attempt |
| `/rollouts/{id}/attempt/{aid}/finish` | POST | Finish an attempt |
| `/spans` | POST | Record span data |
| `/spans/search` | POST | Query spans with filters |
| `/resources` | POST | Add new resources |
| `/resources/{id}` | GET/PUT | Get or update resources |
| `/workers` | POST | Register worker |
| `/workers/{id}/heartbeat` | POST | Worker heartbeat |
| `/statistics` | GET | Store statistics |

Source: [agentlightning/store/client_server.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/client_server.py)
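
As a rough illustration of how the paths in the table compose with the `/v1/agl` prefix, the helper below builds full URLs. `endpoint_url` is a hypothetical helper written for this page; the host and port reuse the values from the server examples in this manual.

```python
from urllib.parse import quote

API_PREFIX = "/v1/agl"


def endpoint_url(base_url: str, *segments: str) -> str:
    """Join path segments under the API prefix, percent-encoding each one."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base_url.rstrip('/')}{API_PREFIX}/{path}"


start_url = endpoint_url("http://localhost:45993/", "rollouts", "ro-123", "start")
heartbeat_url = endpoint_url("http://localhost:45993", "workers", "worker 1", "heartbeat")
```

Percent-encoding each segment keeps identifiers that contain spaces or slashes from changing the route.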

## Implementation Backends

### In-Memory Store

The `InMemoryLightningStore` provides a lightweight, zero-dependency backend suitable for single-node execution and testing.

**Key characteristics:**
- All data stored in process memory
- Supports collections with atomic transactions
- Built-in size estimation for memory monitoring
- Fast for development and small-scale experiments

```python
from agentlightning import InMemoryLightningStore

store = InMemoryLightningStore()
```

Source: [agentlightning/store/memory.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/memory.py)

### SQLite Store

SQLite backend provides persistent storage with ACID guarantees, suitable for single-node deployments requiring durability.

### MongoDB Store

MongoDB backend supports distributed deployments with horizontal scaling, providing high throughput for large-scale training runs.

## Thread Safety

The `LightningStoreThreaded` class wraps any store implementation to provide thread-safe access:

```python
from agentlightning.store.threading import LightningStoreThreaded

# Wrap any store with thread safety
threaded_store = LightningStoreThreaded(store)
```

**Thread safety features:**
- Uses `threading.Lock` for synchronization
- Guarantees atomic operations across concurrent requests
- Maintains all original store capabilities
- Exposes `thread_safe: True` and `async_safe: True` in capabilities

Source: [agentlightning/store/threading.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/threading.py)
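
The lock-based wrapping idea can be sketched with the standard library alone. `ToyStore` and `LockedStore` are hypothetical classes for illustration; the real `LightningStoreThreaded` wraps the full async store interface rather than a single synchronous method.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStore:
    """Hypothetical unsynchronized store that hands out sequence numbers."""

    def __init__(self) -> None:
        self.spans = []

    def add_span(self, span) -> int:
        self.spans.append(span)
        return len(self.spans)


class LockedStore:
    """Delegates to the inner store while holding a single lock."""

    def __init__(self, inner: ToyStore) -> None:
        self._inner = inner
        self._lock = threading.Lock()

    def add_span(self, span) -> int:
        # Append and length read happen as one unit under the lock.
        with self._lock:
            return self._inner.add_span(span)


inner = ToyStore()
store = LockedStore(inner)
with ThreadPoolExecutor(max_workers=8) as pool:
    sequence_ids = list(pool.map(store.add_span, range(200)))
```

With the lock held across the append and the length read, every caller receives a distinct sequence number regardless of thread interleaving.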

## Collection Operations

LightningStore uses a collection-based data organization pattern:

```python
# Atomic write operation
async with store.collections.atomic(mode="w", snapshot=..., labels=["resources"]) as collections:
    await collections.resources.insert([update])
```

### Supported Collections

| Collection | Purpose |
|------------|---------|
| `rollouts` | Task execution records |
| `attempts` | Individual attempt tracking |
| `spans` | OpenTelemetry trace spans |
| `resources` | Versioned configurations |
| `workers` | Worker state management |

Source: [agentlightning/store/collection_based.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/collection_based.py)
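
To make the atomic-write idea concrete, here is a toy version built on `asyncio.Lock` and a deep-copied snapshot. `ToyCollections` and its `atomic` signature are assumptions made for this sketch and do not mirror the real `collections.atomic` parameters.

```python
import asyncio
import copy
from contextlib import asynccontextmanager


class ToyCollections:
    """Hypothetical in-memory collections with all-or-nothing writes."""

    def __init__(self) -> None:
        self.data = {"rollouts": [], "resources": []}
        self._lock = asyncio.Lock()

    @asynccontextmanager
    async def atomic(self, labels):
        async with self._lock:
            # Snapshot only the labelled collections before the write.
            snapshot = {label: copy.deepcopy(self.data[label]) for label in labels}
            try:
                yield self.data
            except Exception:
                # Roll back the labelled collections, then re-raise.
                for label, saved in snapshot.items():
                    self.data[label] = saved
                raise


async def demo():
    collections = ToyCollections()
    async with collections.atomic(["resources"]) as data:
        data["resources"].append({"resources_id": "v1"})
    try:
        async with collections.atomic(["resources"]) as data:
            data["resources"].append({"resources_id": "v2"})
            raise RuntimeError("abort the transaction")
    except RuntimeError:
        pass
    return collections.data["resources"]


resources = asyncio.run(demo())
```

The second write is discarded because the block raised, so only the first insert survives.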

## Decorators and Instrumentation

The store layer uses several decorators for observability and reliability:

| Decorator | Purpose |
|-----------|---------|
| `@tracked` | Records operation metrics and timing |
| `@healthcheck_before` | Validates store health before operations |
| `@_with_collections_execute` | Manages collection lifecycle and error handling |
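
As an illustration of what a timing decorator in the spirit of `@tracked` might do, the sketch below records the duration of each call to an async function. `tracked_sketch` and `CALL_DURATIONS` are hypothetical; the real decorator's signature and metrics backend are not shown on this page.

```python
import asyncio
import functools
import time
from collections import defaultdict

# Hypothetical in-process metrics sink: operation name -> list of durations.
CALL_DURATIONS = defaultdict(list)


def tracked_sketch(name: str):
    """Decorator factory that times every call to an async function."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # Record the duration whether the call succeeded or raised.
                CALL_DURATIONS[name].append(time.perf_counter() - start)

        return wrapper

    return decorator


@tracked_sketch("add_span")
async def add_span(span: dict) -> dict:
    await asyncio.sleep(0)  # placeholder for real storage work
    return span


result = asyncio.run(add_span({"name": "llm.call"}))
```

Recording in a `finally` block means failed operations are timed too, which is usually what an operations dashboard needs.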

## Integration with Tracers

LightningStore integrates with OpenTelemetry through tracers:

```python
from agentlightning import OtelTracer, AgentOpsTracer

tracer = OtelTracer(store=store)
```

**Tracing workflow:**

```mermaid
sequenceDiagram
    participant Agent
    participant Tracer
    participant Store
    participant OTLP
    
    Agent->>Tracer: Create span
    Tracer->>Store: Record span data
    Store->>Store: Persist to backend
    Tracer->>OTLP: Export spans (optional)
```

Source: [examples/minimal/write_traces.py](https://github.com/microsoft/agent-lightning/blob/main/examples/minimal/write_traces.py)

## Usage Examples

### Basic Store Operations

```python
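import asyncio

from agentlightning import InMemoryLightningStore


async def main() -> None:
    # Create store
    store = InMemoryLightningStore()

    # Enqueue a task
    rollout = await store.enqueue_rollout(
        input={"prompt": "Solve this problem"},
        mode="train",
    )

    # Dequeue for processing
    task = await store.dequeue_rollout(worker_id="worker-1")

    # Complete the rollout (method names as given in this manual;
    # check agentlightning/store/base.py for the installed version)
    await store.finish_rollout(
        rollout_id=task.rollout.rollout_id,
        attempt_id=task.attempt.attempt_id,
        response={"answer": "42"},
    )


# Store methods are coroutines, so they must run inside an event loop
asyncio.run(main())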
```

### Server Setup

```bash
# Start a LightningStore server
agl store --port 45993 --log-level DEBUG
```

### Client Connection

```python
import asyncio

from agentlightning import LightningStoreClient


async def main() -> None:
    client = LightningStoreClient(base_url="http://localhost:45993")
    # All operations work through the client
    rollouts = await client.list_rollouts()


asyncio.run(main())
```

## Capabilities

The store reports its capabilities through the `capabilities` property:

| Capability | Description |
|------------|-------------|
| `async_safe` | Supports async operations |
| `thread_safe` | Supports concurrent thread access |
| `distributed` | Supports multi-node deployment |
| `persistence` | Data survives restarts |

Source: [agentlightning/store/threading.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/store/threading.py)
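
One way such flags could be consumed is a pre-flight check before choosing an execution strategy. The helper and the sample report below are hypothetical; the flag names are taken from the table above, and the values are made up for illustration.

```python
from typing import Dict, List


def missing_capabilities(reported: Dict[str, bool], required: List[str]) -> List[str]:
    """Return the required flags that the store does not report as True."""
    return [name for name in required if not reported.get(name, False)]


# Hypothetical capability report; real stores expose their own values.
reported = {
    "async_safe": True,
    "thread_safe": False,
    "distributed": False,
    "persistence": False,
}

missing = missing_capabilities(reported, ["async_safe", "thread_safe"])
```

Treating an absent flag as `False` keeps the check conservative: a store must explicitly advertise a capability before code relies on it.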

## CLI Commands

The `agl` CLI provides store management:

```bash
# Start store server
agl store --port 45993 --log-level DEBUG

# Prometheus metrics endpoint
agl prometheus
```

Source: [agentlightning/cli/__init__.py](https://github.com/microsoft/agent-lightning/blob/main/agentlightning/cli/__init__.py)

---

<a id='algorithm-zoo'></a>

## Algorithm Zoo

### Related Pages

Related topics: [Tutorial: Train Your First Agent](#train-first-agent)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [examples/apo/apo_custom_algorithm.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm.py)
- [examples/apo/apo_custom_algorithm_trainer.py](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)
- [examples/apo/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/README.md)
- [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)
- [examples/rag/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/rag/README.md)
</details>

# Algorithm Zoo

## Overview

The **Algorithm Zoo** is a modular collection of training algorithms that consume execution traces from the Agent Lightning runtime to improve agent behavior through reinforcement learning and prompt optimization. Source: [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

Agent Lightning runs through a continuous loop where **runners** and **tracers** emit spans, `LightningStore` keeps them synchronized, and algorithms in `agentlightning/algorithm/` consume those traces to improve behavior. Source: [CLAUDE.md](https://github.com/microsoft/agent-lightning/blob/main/CLAUDE.md)

## Architecture

The Algorithm Zoo follows a producer-consumer pattern where the store acts as the central synchronization hub:

```mermaid
graph TD
    A[Runners] -->|emit spans| B[LightningStore]
    C[Tracers] -->|emit spans| B
    B -->|traces| D[Algorithm Zoo]
    D -->|policy updates| E[Improved Agent Behavior]
    B -->|traces| F[Dashboard]
```

## Available Algorithms

### APO (Automatic Prompt Optimization)

APO is a prompt optimization algorithm that iteratively refines prompt templates based on reward signals collected from agent rollouts.

#### How APO Works

The APO algorithm maintains a collection of prompt candidates and evaluates each one against task objectives. Based on the reward signals, it selects and refines the most effective prompts. Source: [examples/apo/apo_custom_algorithm.py:34-37](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm.py)

```python
import agentlightning as agl


async def apo_algorithm(*, store: agl.LightningStore):
    """
    An example of how a prompt optimization works.
    """
    prompt_candidates = [
        "You are a helpful assistant. {any_question}",
        "You are a knowledgeable AI. {any_question}",
        "You are a friendly chatbot. {any_question}",
    ]

    prompt_and_rewards: list[tuple[str, float]] = []
```

#### Custom APO Algorithm

To create a custom algorithm, wrap your async function with the `@algo` decorator. Source: [examples/apo/apo_custom_algorithm_trainer.py:28-39](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)

```python
from agentlightning import LightningStore
from agentlightning.algorithm import algo

@algo
async def apo_algorithm_usable_in_trainer(*, store: LightningStore):
    """
    You need to wrap the apo_algorithm in an algo decorator to make it usable in trainer.
    """
    return await apo_algorithm(store=store)
```

### VERL (Volcano Engine Reinforcement Learning)

The VERL algorithm delegates training to the `verl` library for GPU-accelerated reinforcement learning. The example below registers it as a `verl` subcommand with options for the store port, the model variant, and an optional search tool. Source: [examples/tinker/q20_train.py:43-52](https://github.com/microsoft/agent-lightning/blob/main/examples/tinker/q20_train.py)

```python
algo_verl_parser = subparsers.add_parser("verl", help="Launch the full training algorithm with VERL.")
algo_verl_parser.add_argument("--port", type=int, default=4747, help="Port for the AgentLightning store.")
algo_verl_parser.add_argument(
    "--model",
    choices=("qwen25", "qwen3"),
    default="qwen3",
    help="Model variant to train.",
)
algo_verl_parser.add_argument("--search", action="store_true", help="Enable search tool.")
```

### FAST (Fast Algorithm Suite Toolkit)

The FAST algorithm provides lightweight optimization capabilities for rapid experimentation.

## Running Algorithms

### Option A: Separate Components

Start the store, algorithm, and runner in three separate terminals: Source: [examples/apo/README.md:10-24](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/README.md)

```bash
# Terminal 1: Start the store
agl store

# Terminal 2: Run the algorithm
python apo_custom_algorithm.py algo

# Terminal 3: Run the rollout runner
python apo_custom_algorithm.py runner
```

### Option B: Integrated Trainer

Use the integrated trainer that handles all components: Source: [examples/apo/apo_custom_algorithm_trainer.py:47-49](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)

```python
from agentlightning import Trainer, setup_logging

trainer = Trainer(n_workers=1, algorithm=apo_algorithm_usable_in_trainer)
trainer.fit(apo_rollout)
```

### Algorithm Decorator

The `@algo` decorator transforms any async algorithm function into a component that can be used with the `Trainer`. It injects the `LightningStore` as a keyword argument. Source: [examples/apo/apo_custom_algorithm_trainer.py:28-39](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm_trainer.py)
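
The injection pattern can be sketched without the library. `algo_sketch` and `FunctionalAlgorithmSketch` are hypothetical names, and the `set_store`/`run` methods are assumptions made for this illustration rather than the documented interface of `@algo`.

```python
import asyncio
import inspect


class FunctionalAlgorithmSketch:
    """Hypothetical wrapper that passes a store to an async function by keyword."""

    def __init__(self, fn) -> None:
        if not inspect.iscoroutinefunction(fn):
            raise TypeError("algorithm must be an async function")
        self._fn = fn
        self._store = None

    def set_store(self, store) -> None:
        # In this sketch the trainer would call this before run().
        self._store = store

    async def run(self):
        return await self._fn(store=self._store)


def algo_sketch(fn) -> FunctionalAlgorithmSketch:
    return FunctionalAlgorithmSketch(fn)


@algo_sketch
async def count_rollouts(*, store):
    # A plain dict stands in for a real store here.
    return len(store["rollouts"])


count_rollouts.set_store({"rollouts": ["ro-1", "ro-2"]})
n_rollouts = asyncio.run(count_rollouts.run())
```

The decorated function never constructs a store itself, so the same algorithm body can run against an in-memory store in tests and a remote one in training.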

## Algorithm Configuration

### Common Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `store` | `LightningStore` | Central store for traces and resources |
| `n_workers` | `int` | Number of parallel workers |
| `port` | `int` | Port for store connection (default: 4747) |

### VERL-Specific Options

| Option | Choices | Default | Description |
|--------|---------|---------|-------------|
| `--model` | qwen25, qwen3 | qwen3 | Model variant to train |
| `--port` | int | 4747 | Store connection port |
| `--search` | flag | False | Enable search tool |

## Workflow

```mermaid
graph LR
    A[Define Prompt Candidates] --> B[Loop Through Candidates]
    B --> C[Update Resources in Store]
    C --> D[Run Rollout with Runner]
    D --> E[Collect Reward Signal]
    E --> F[Update Prompt Template]
    F --> B
```
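
The loop in the diagram can be sketched end to end with a stubbed rollout. The reward table and `run_rollout` are hypothetical placeholders; in a real run the reward would come from spans recorded in the store, and the final "update prompt template" step is reduced here to picking the best candidate.

```python
import asyncio

PROMPT_CANDIDATES = [
    "You are a helpful assistant. {any_question}",
    "You are a knowledgeable AI. {any_question}",
    "You are a friendly chatbot. {any_question}",
]

# Hypothetical rewards keyed by a word in the template.
TOY_REWARDS = {"helpful": 0.6, "knowledgeable": 0.9, "friendly": 0.4}


async def run_rollout(prompt_template: str) -> float:
    """Stub for: update resources, run the rollout, collect the reward."""
    await asyncio.sleep(0)  # placeholder for waiting on a runner
    for keyword, reward in TOY_REWARDS.items():
        if keyword in prompt_template:
            return reward
    return 0.0


async def optimize():
    prompt_and_rewards = []
    for prompt in PROMPT_CANDIDATES:
        reward = await run_rollout(prompt)
        prompt_and_rewards.append((prompt, reward))
    # Keep the candidate with the highest reward.
    return max(prompt_and_rewards, key=lambda pair: pair[1])


best_prompt, best_reward = asyncio.run(optimize())
```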

## Extending the Algorithm Zoo

### Creating Custom Algorithms

1. Define an async function that takes `store: LightningStore` as a keyword argument
2. Wrap it with the `@algo` decorator
3. Implement your optimization logic
4. Use the trainer or run separately

Example pattern: Source: [examples/apo/apo_custom_algorithm.py:54-72](https://github.com/microsoft/agent-lightning/blob/main/examples/apo/apo_custom_algorithm.py)

```python
async def apo_algorithm(*, store: agl.LightningStore):
    for prompt in prompt_candidates:
        # 1. The optimization algorithm updates the prompt template
        console.print(f"[Algo] Updating prompt template to: '{prompt}'")
        resources: agl.NamedResources = {
            # The "main_prompt" can be replaced with any name
        }
        # 2. Update resources in store
        # 3. Collect reward signals
        # 4. Refine prompt based on rewards
```

### Requirements for Custom Algorithms

- Must be async functions
- Must accept `store` as keyword argument
- Should be wrapped with `@algo` decorator for trainer integration
- Must interact with `LightningStore` for state synchronization
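
The first two requirements can be checked mechanically with `inspect`. `check_algorithm` is a hypothetical helper for this page, not part of the library.

```python
import inspect


def check_algorithm(fn) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not inspect.iscoroutinefunction(fn):
        problems.append("not an async function")
    store = inspect.signature(fn).parameters.get("store")
    if store is None:
        problems.append("no 'store' parameter")
    elif store.kind is inspect.Parameter.POSITIONAL_ONLY:
        problems.append("'store' cannot be passed by keyword")
    return problems


async def good_algorithm(*, store):
    return store


def bad_algorithm(data):
    return data


good_report = check_algorithm(good_algorithm)
bad_report = check_algorithm(bad_algorithm)
```

A check like this can run in a unit test so that a signature mistake surfaces before a training job is launched.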

## Integration with RAG

The Algorithm Zoo can be extended to work with retrieval-augmented generation systems. See the RAG example for integrating FAISS-based retrieval with prompt optimization. Source: [examples/rag/README.md](https://github.com/microsoft/agent-lightning/blob/main/examples/rag/README.md)

## See Also

- [APO Tutorial](../../docs/tutorials/apo.md)
- [Custom Algorithm Tutorial](../../docs/how-to/write-first-algorithm.md)
- [Dashboard Documentation](../../dashboard/README.md)

---


## Doramagic Pitfall Log

Project: microsoft/agent-lightning

Summary: Found 7 potential pitfall items; 0 are high/blocking. Highest priority: capability - Capability assessment relies on assumptions.

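## 1. capability · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Suggested check: Turn the assumption into a downstream verification checklist.
- Guardrail action: Assumptions must be converted into verification items; they must not be stated as fact until verified.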
- Evidence: capability.assumptions | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | README/documentation is current enough for a first validation pass.

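## 2. maintenance · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: `last_activity_observed` is not recorded.
- User impact: New, abandoned, and active projects become indistinguishable, which lowers confidence in the recommendation.
- Suggested check: Add signals from recent GitHub commits, releases, and issue/PR responses.
- Guardrail action: While maintenance activity is unknown, the recommendation must not be marked as high trust.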
- Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | last_activity_observed missing

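## 3. security_permissions · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream validation has already requested a review; this must not be downplayed on the page.
- Suggested check: Send to the security/permissions governance review queue.
- Guardrail action: While downstream risks exist, the review/recommendation must remain downgraded.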
- Evidence: downstream_validation.risk_items | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | no_demo; severity=medium

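## 4. security_permissions · Security considerations exist

- Severity: medium
- Evidence strength: source_linked
- Finding: No sandbox install has been executed yet; downstream must verify before user use.
- User impact: Users need to know the permission boundaries and sensitive operations before installing.
- Suggested check: Convert into an explicit permissions list and security review notice.
- Guardrail action: Security considerations must be shown to users up front.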
- Evidence: risks.safety_notes | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | No sandbox install has been executed yet; downstream must verify before user use.

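## 5. security_permissions · Scoring risks exist

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Suggested check: Record the risk on the boundary card and confirm whether manual review is needed.
- Guardrail action: Scoring risks must appear on the boundary card, not only as an internal score.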
- Evidence: risks.scoring_risks | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | no_demo; severity=medium

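## 6. maintenance · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will respond when they run into a problem.
- Suggested check: Sample recent issues/PRs to see whether they go unanswered for long periods.
- Guardrail action: While issue/PR responsiveness is unknown, a maintenance risk notice is required.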
- Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | issue_or_pr_quality=unknown

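## 7. maintenance · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Installation commands and documentation may lag behind the code, making it more likely that users hit problems.
- Suggested check: Confirm that the latest release/tag matches the README installation commands.
- Guardrail action: When release cadence is unknown or stale, installation instructions must be flagged as possibly out of date.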
- Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | release_recency=unknown

<!-- canonical_name: microsoft/agent-lightning; human_manual_source: deepwiki_human_wiki -->