Doramagic Project Pack · Human Manual

agent-lightning


Introduction to Agent Lightning

Related topics: System Architecture, Installation Guide


Agent Lightning is a reinforcement learning framework designed to train any AI agent with RL algorithms. The project provides a unified execution stack, instrumentation capabilities, and training infrastructure that enables researchers and developers to improve agent behavior through reward-based learning. Source: README.md:1

What is Agent Lightning?

Agent Lightning bridges the gap between raw agent execution and RL-based training by providing:

  • Instrumentation Layer: Transparent tracing and logging of agent interactions
  • Training Infrastructure: Built-in support for RL algorithms like GRPO
  • Distributed Execution: Multi-worker rollout management with state synchronization
  • Integration Points: Adapters for popular agent frameworks and execution environments

The framework treats agent training as a continuous feedback loop where traces collected from agent execution are consumed by training algorithms to improve policy behavior over time. Source: CLAUDE.md:3

Architecture Overview

The Agent Lightning architecture follows a producer-consumer pattern centered around trace collection and consumption.

Core Loop

graph TD
    A[Runner] -->|emits spans| B[Tracers]
    B -->|writes traces| C[LightningStore]
    C -->|serves traces| D[Algorithms]
    D -->|updates policy| A
    C -->|serves traces| E[Dashboard]

The continuous execution loop works as follows:

  1. Runners execute agents and emit execution spans
  2. Tracers capture and format these spans with semantic conventions
  3. LightningStore maintains synchronized state across all components
  4. Algorithms consume traces to compute rewards and update agent policies
  5. Dashboard provides real-time visualization for debugging

Source: CLAUDE.md:3
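
The loop above can be illustrated with a toy, standard-library-only sketch. Every name here (ToySpan, ToyStore, run_once, update_policy) is hypothetical and not part of the agentlightning API, and the update rule is a placeholder rather than GRPO. The sketch only shows the producer-consumer shape: a runner writes spans to a store, and an algorithm reads them back to adjust the policy.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class ToySpan:
    rollout_id: str
    name: str
    reward: Optional[float] = None


@dataclass
class ToyStore:
    """Hypothetical stand-in for LightningStore: a task queue plus recorded spans."""
    tasks: Deque[Tuple[str, float]] = field(default_factory=deque)
    spans: List[ToySpan] = field(default_factory=list)


def run_once(store: ToyStore, bias: float) -> None:
    """Runner step: take one task, 'execute' the agent, write spans to the store."""
    rollout_id, target = store.tasks.popleft()
    guess = target + bias  # the toy agent is off by the current policy bias
    store.spans.append(ToySpan(rollout_id, "agent.call"))
    store.spans.append(ToySpan(rollout_id, "reward", reward=-abs(guess - target)))


def update_policy(store: ToyStore, bias: float) -> float:
    """Algorithm step: read rewards back from the store and adjust the policy."""
    rewards = [s.reward for s in store.spans if s.reward is not None]
    mean_reward = sum(rewards) / len(rewards)
    # Toy update rule (an assumption, not GRPO): halve the bias on negative reward.
    return bias / 2 if mean_reward < 0 else bias


store = ToyStore(tasks=deque([("rollout-1", 10.0), ("rollout-2", 20.0)]))
bias = 4.0
while store.tasks:          # producer side: runners drain the task queue
    run_once(store, bias)
bias = update_policy(store, bias)  # consumer side: algorithm reads the traces
```

In the real framework the two sides run concurrently and communicate only through the store; the sequential version here is just the simplest way to show the data flow.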

Component Hierarchy

| Layer | Components | Responsibility |
| --- | --- | --- |
| Execution | Runner, LitAgent | Execute agent logic and manage lifecycle |
| Instrumentation | Tracer, OtelTracer, AgentOpsTracer | Capture execution traces |
| Storage | LightningStore, LightningStoreClient | Synchronized state management |
| Training | Algorithms in agentlightning/algorithm/ | Process traces, compute rewards |
| CLI | agl command | User-facing interface |

Source: agentlightning/cli/__init__.py:13-16

Core Data Models

The framework defines several fundamental data structures in agentlightning/types/core.py.

Task and Rollout

classDiagram
    class Task {
        +str task_id
        +Any input
        +Optional~str~ instance_id
        +Optional~str~ dataset
    }
    class Rollout {
        +str rollout_id
        +str status
        +Optional~str~ worker_id
        +List~Attempt~ attempts
    }
    class Attempt {
        +str attempt_id
        +str status
        +List~Span~ spans
        +Optional~float~ reward
    }
    class Triplet {
        +Any prompt
        +Any response
        +Optional reward
    }
    
    Task "1" --> "*" Rollout
    Rollout "1" --> "*" Attempt
    Attempt --> Triplet

Core Type Exports

| Type | Purpose |
| --- | --- |
| Task | Represents a unit of work to be executed by an agent |
| Rollout | Collection of attempts for a single task execution |
| Attempt | Single execution attempt with spans and reward |
| Triplet | Prompt-response-reward tuple for RL training |
| LightningStore | Synchronized state store for distributed execution |

Source: agentlightning/types/core.py:1-60
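
As a rough illustration of how these types nest, the class diagram can be mirrored with plain dataclasses. This is a sketch under assumptions: the field names follow the diagram, but the real definitions in agentlightning/types/core.py may differ, and the latest_reward helper is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Triplet:
    prompt: Any
    response: Any
    reward: Optional[float] = None


@dataclass
class Attempt:
    attempt_id: str
    status: str
    spans: List[Any] = field(default_factory=list)
    reward: Optional[float] = None


@dataclass
class Rollout:
    rollout_id: str
    status: str
    worker_id: Optional[str] = None
    attempts: List[Attempt] = field(default_factory=list)


@dataclass
class Task:
    task_id: str
    input: Any
    instance_id: Optional[str] = None
    dataset: Optional[str] = None


def latest_reward(rollout: Rollout) -> Optional[float]:
    """Hypothetical helper: reward of the most recent attempt that has one."""
    for attempt in reversed(rollout.attempts):
        if attempt.reward is not None:
            return attempt.reward
    return None


rollout = Rollout(
    rollout_id="rollout-1",
    status="succeeded",
    attempts=[
        Attempt(attempt_id="a1", status="failed"),
        Attempt(attempt_id="a2", status="succeeded", reward=0.8),
    ],
)
```

Here latest_reward(rollout) returns the reward of the second attempt, since the first attempt failed without recording one.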

Runner System

The runner system provides the execution context for agents with integrated lifecycle management.

Runner Lifecycle

sequenceDiagram
    participant User
    participant Runner
    participant Store
    participant Agent
    
    User->>Runner: async with Runner(agent, store)
    Runner->>Runner: init(agent)
    Runner->>Runner: init_worker(store)
    Runner->>Store: Register worker
    Loop Until event
        Runner->>Agent: Execute task
        Agent-->>Runner: Result
        Runner->>Store: Update state
    end
    Runner->>Runner: teardown_worker()
    Runner->>Runner: teardown()

Runner Base Class

The Runner class provides context manager support for safe initialization and cleanup:

async with runner:
    runner.init(agent=agent, hooks=hooks)
    runner.init_worker(worker_id=0, store=store)
    # Execute tasks...

Key runner responsibilities:

  • Initialization: Set up agent and worker state
  • Execution: Poll store for tasks and execute them
  • Cleanup: Graceful teardown of worker and agent resources

Source: agentlightning/runner/base.py:1-80
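
The ordering in the lifecycle diagram (init, init_worker, work, teardown_worker, teardown) can be sketched with a standard-library async context manager. SketchRunner and managed_runner are hypothetical names used only for illustration; the method names are taken from the diagram, and the real Runner class may wire these calls differently.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, List


class SketchRunner:
    """Hypothetical runner that records lifecycle calls in order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def init(self, agent: Any) -> None:
        self.calls.append("init")

    def init_worker(self, worker_id: int, store: Any) -> None:
        self.calls.append("init_worker")

    def teardown_worker(self) -> None:
        self.calls.append("teardown_worker")

    def teardown(self) -> None:
        self.calls.append("teardown")


@asynccontextmanager
async def managed_runner(runner: SketchRunner, agent: Any, store: Any, worker_id: int = 0):
    runner.init(agent=agent)
    try:
        runner.init_worker(worker_id=worker_id, store=store)
        try:
            yield runner
        finally:
            # Runs only if init_worker succeeded, even when the body raises.
            runner.teardown_worker()
    finally:
        # Always undo init(), mirroring the reverse order in the diagram.
        runner.teardown()


async def demo() -> List[str]:
    runner = SketchRunner()
    async with managed_runner(runner, agent=object(), store={}) as active:
        active.calls.append("execute")  # stands in for the task polling loop
    return runner.calls


calls = asyncio.run(demo())
```

Nesting the two try/finally blocks guarantees that teardown happens in reverse order of initialization, which is the property the diagram is describing.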

Tracing and Instrumentation

Agent Lightning provides multiple tracing backends for capturing agent execution.

Supported Tracers

| Tracer | Use Case | Backend |
| --- | --- | --- |
| OtelTracer | OpenTelemetry-compatible tracing | OTLP endpoint |
| AgentOpsTracer | AgentOps platform integration | AgentOps service |
| Custom Tracer | Framework integration | Pluggable |

Semantic Conventions

The framework defines semantic conventions in agentlightning/semconv.py for consistent span attributes:

| Attribute | Description |
| --- | --- |
| LightningSpanAttributes.REWARD | Reward values for RL spans |
| LightningSpanAttributes.LINK | Span linking relationships |
| LightningSpanAttributes.TAG | Custom span tagging |
| LightningResourceAttributes.ROLLOUT_ID | Rollout identification |
| LightningResourceAttributes.ATTEMPT_ID | Attempt identification |

Source: agentlightning/semconv.py:1-40

Trace Writing Example

The minimal examples demonstrate trace writing with LightningStore:

from agentlightning import AgentOpsTracer, InMemoryLightningStore, LightningStoreClient, OtelTracer, Span

# Write traces directly to in-memory store
store = InMemoryLightningStore()
tracer = OtelTracer(store=store)

# Or connect to a server-side store
client = LightningStoreClient(endpoint="http://localhost:45993")

Source: examples/minimal/write_traces.py:1-50

LightningStore

LightningStore is the central state management component that keeps all components synchronized.

Store Capabilities

graph LR
    A[Runners] -->|enqueue/dequeue| B[Rollouts]
    A -->|register| C[Workers]
    D[Tracers] -->|write spans| B
    E[Algorithms] -->|query traces| B
    F[Dashboard] -->|inspect state| B

Store Collections

| Collection | Data Type | Access Pattern |
| --- | --- | --- |
| rollouts | Rollout | Enqueue/dequeue by worker |
| attempts | Attempt | Link to rollout |
| spans | Span | Query by attempt |
| workers | Worker | Heartbeat management |
| resources | ResourcesUpdate | Model/prompt versioning |

Source: dashboard/test-utils/python-server.py:1-100
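
The access patterns listed for each collection can be sketched with an in-memory class. SketchStore and all of its methods are hypothetical and do not reproduce the LightningStore interface; the sketch only shows a FIFO rollout queue, worker registration, and span lookup by attempt.

```python
from collections import defaultdict, deque
from typing import Any, Dict, List, Optional


class SketchStore:
    """Hypothetical in-memory store mirroring the access patterns listed above."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self.rollouts: Dict[str, Dict[str, Any]] = {}
        self.workers: Dict[str, Dict[str, Any]] = {}
        self._spans: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def register_worker(self, worker_id: str) -> None:
        self.workers[worker_id] = {"status": "idle", "current_rollout_id": None}

    def enqueue_rollout(self, rollout_id: str, task_input: Any) -> None:
        self.rollouts[rollout_id] = {"input": task_input, "status": "pending"}
        self._queue.append(rollout_id)

    def dequeue_rollout(self, worker_id: str) -> Optional[str]:
        """Hand the oldest pending rollout to a worker, or None if the queue is empty."""
        if not self._queue:
            return None
        rollout_id = self._queue.popleft()
        self.rollouts[rollout_id]["status"] = "running"
        self.workers[worker_id] = {"status": "busy", "current_rollout_id": rollout_id}
        return rollout_id

    def add_span(self, attempt_id: str, span: Dict[str, Any]) -> None:
        self._spans[attempt_id].append(span)

    def query_spans(self, attempt_id: str) -> List[Dict[str, Any]]:
        return list(self._spans.get(attempt_id, []))


sketch_store = SketchStore()
sketch_store.register_worker("worker-0")
sketch_store.enqueue_rollout("rollout-1", {"question": "2 + 2"})
claimed = sketch_store.dequeue_rollout("worker-0")
sketch_store.add_span("attempt-1", {"name": "agent.call"})
```

A production store would also need locking or transactions so that two workers cannot claim the same rollout; the sketch omits that on purpose.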

Training Algorithms

Agent Lightning integrates with reinforcement learning algorithms to improve agent behavior.

Algorithm Integration

The framework supports pluggable algorithms defined in agentlightning/algorithm/. Algorithms consume traces from the LightningStore and compute policy updates.

Agent-OS Integration

For production safety-critical deployments, Agent Lightning integrates with Agent-OS:

from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward

runner = AgentOSRunner(kernel, fail_on_violation=False, emit_violations=True)
reward_fn = PolicyReward(kernel)

This integration provides:

  • Policy enforcement: Kernel-level safety during training
  • Violation penalties: Unsafe actions convert to negative RL rewards
  • Audit trail: Complete visibility from training to production

Source: contrib/recipes/agentos/README.md:1-60

Minimal Component Showcase

The examples/minimal/ directory provides isolated demonstrations of individual building blocks.

Available Examples

| Component | File | Demonstrates |
| --- | --- | --- |
| LightningStore + OTLP | write_traces.py | OtelTracer, AgentOpsTracer, rollout/span emission |
| MultiMetrics backend | write_metrics.py | Console and Prometheus metrics simultaneously |
| LLM proxying | llm_proxy.py | Request routing through /rollout/<id>/attempt/<id> |
| vLLM lifecycle | vllm_server.py | Server startup, readiness monitoring, teardown |

Each example is self-documenting with CLI arguments and environment variables embedded in module docstrings.

Source: examples/minimal/README.md:1-30

Command-Line Interface

The agl CLI provides entry points for all major framework operations.

Available Subcommands

| Command | Module | Description |
| --- | --- | --- |
| agl vllm | agentlightning.cli.vllm | vLLM server with instrumentation |
| agl store | agentlightning.cli.store | LightningStore server |
| agl prometheus | agentlightning.cli.prometheus | Prometheus metrics endpoint |
| agl agentops | agentlightning.cli.agentops_server | AgentOps server manager |

Starting a LightningStore Server

agl store --port 45993 --log-level DEBUG

The store server enables distributed execution where multiple workers can connect and synchronize state.

Source: agentlightning/cli/__init__.py:1-35

Dashboard

The Agent Lightning Dashboard is a React-based web application for inspecting store state and debugging experiments.

Features

  • Real-time state inspection: View rollouts, attempts, and spans
  • Worker monitoring: Track worker status and heartbeat statistics
  • Resource visualization: Inspect model configurations and prompts
  • Experiment debugging: Analyze trace sequences and reward flows

Technology Stack

| Layer | Technology |
| --- | --- |
| Framework | React |
| UI Components | Mantine UI |
| Documentation | Storybook |
| Testing | Vitest |

Source: dashboard/README.md:1-35

Project Structure

agent-lightning/
├── agentlightning/          # Core library
│   ├── algorithm/           # RL training algorithms
│   ├── cli/                # Command-line interface
│   ├── contrib/            # Third-party integrations
│   ├── runner/             # Execution runners
│   ├── store/              # LightningStore implementations
│   ├── tracer/             # Tracing backends
│   ├── types/              # Data models
│   └── semconv.py          # Semantic conventions
├── contrib/
│   └── recipes/            # Integration examples (webshop, agentos)
├── dashboard/              # React web application
├── docs/                   # Documentation (mkdocs)
├── examples/               # Runnable workflows
├── scripts/                # Automation scripts
└── tests/                  # Test suite

Source: CLAUDE.md:5-15

Development Workflow

Setup

uv sync --group dev

Testing

# Full test suite
uv run --no-sync pytest -v

# Specific tests
uv run --no-sync pytest -v tests/path/to/test.py
uv run --no-sync pytest -v -k "test_pattern"

Type Checking

uv run --no-sync pyright

Pre-commit Checks

uv run --no-sync pre-commit run --all-files --show-diff-on-failure

Documentation

uv run --no-sync mkdocs build --strict

Source: CLAUDE.md:18-30

Contributing

Agent Lightning welcomes contributions through a structured process:

  1. Branch naming: feature/<slug>, fix/<slug>, docs/<slug>, or chore/<slug>
  2. Commits: Imperative, scoped commits with issue references (e.g., Fixes #123)
  3. Pre-submission: Run pre-commit hooks and relevant pytest/doc builds
  4. CLA: Contributor License Agreement required (automatically prompted by CLA bot)

Source: README.md:50-70

Citation

If you use Agent Lightning in research, please cite:

@misc{luo2025agentlightningtrainai,
      title={Agent Lightning: Train ANY AI Agents with Reinforcement Learning},
      author={Xufang Luo and Yuge Zhang and Zhiyuan He and Zilong Wang and Siyun Zhao and Dongsheng Li and Luna K. Qiu and Yuqing Yang},
      year={2025},
      eprint={2508.03680},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.03680},
}

Source: README.md:15-25

Further Reading

Source: https://github.com/microsoft/agent-lightning / Human Manual

Installation Guide

Related topics: Introduction to Agent Lightning, Tutorial: Train Your First Agent


This guide covers all supported methods for installing and configuring Agent Lightning in your environment. Agent Lightning is a reinforcement learning framework for training AI agents, with support for GPU acceleration, distributed training, and various algorithm backends.

Prerequisites

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| Python | 3.10+ | 3.11 or 3.12 |
| OS | Linux (Ubuntu 20.04+), macOS | Linux with CUDA |
| RAM | 8 GB | 32 GB+ |
| GPU | Optional | NVIDIA GPU with CUDA 11.8+ |
| Disk Space | 5 GB | 20 GB+ |

Source: contrib/recipes/webshop/agl/requirements.txt:1-3

Required Tools

  • uv: Modern Python package manager (recommended)
  • Git: For cloning the repository
  • CUDA Toolkit (for GPU training): Version 11.8 or later

Installation Methods

Method 1: Install from Source

This is the recommended approach for development and contributing.

# Clone the repository
git clone https://github.com/microsoft/agent-lightning.git
cd agent-lightning

# Install all dependencies including development tools
uv sync --group dev

# Install optional GPU dependencies
uv sync --group GPU

Source: CLAUDE.md:20-22

Method 2: Install with Specific Algorithm Backends

Agent Lightning supports multiple reinforcement learning algorithms through optional dependency groups:

# Install with VERL backend (recommended for GPU training)
uv sync --group VERL

# Install with APO backend
uv sync --group APO

# Install with GPU optimizations
uv sync --group GPU

Source: CLAUDE.md:22

Method 3: Using setup.sh for GPU Training

For GPU-accelerated training with the webshop recipe:

# From the contrib/recipes/webshop directory
./setup.sh

This script installs VERL extras for GPU training support. Source: contrib/recipes/webshop/agl/requirements.txt:1-8

Dependency Groups

The pyproject.toml defines several optional dependency groups:

| Group | Purpose | Installation Command |
| --- | --- | --- |
| dev | Development tools (pytest, pyright, pre-commit) | uv sync --group dev |
| GPU | GPU acceleration packages | uv sync --group GPU |
| VERL | VERL algorithm backend | uv sync --group VERL |
| APO | APO algorithm backend | uv sync --group APO |

Source: CLAUDE.md:20-23

Environment Setup

Creating a Virtual Environment

Using uv (recommended):

# Create and activate a new virtual environment
uv venv
source .venv/bin/activate  # Linux/macOS
# or
.venv\Scripts\activate     # Windows

Verifying Installation

Run the test suite to verify your installation:

# Run all tests
uv run --no-sync pytest -v

# Run specific test
uv run --no-sync pytest -v tests/path/to/test.py

# Run tests matching a pattern
uv run --no-sync pytest -v -k "test_pattern"

Source: CLAUDE.md:21

Type Checking

Verify type annotations are correct:

uv run --no-sync pyright

Source: CLAUDE.md:22

Pre-commit Hooks

Before committing code, run pre-commit checks:

uv run --no-sync pre-commit run --all-files --show-diff-on-failure

Source: CLAUDE.md:23

Dashboard Installation

The Agent Lightning Dashboard is a separate React application:

cd dashboard

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

Source: dashboard/README.md:npm scripts section

Dashboard npm Scripts

| Script | Purpose |
| --- | --- |
| dev | Start development server |
| build | Build production bundle |
| preview | Preview production build locally |
| storybook | Start Storybook dev server |
| build-storybook | Build Storybook bundle |
| eslint | Run ESLint |
| stylelint | Run Stylelint |
| prettier | Run Prettier |
| typecheck | Run TypeScript typecheck |
| vitest | Run vitest tests |

Source: dashboard/README.md:npm scripts

Recipe-Specific Installation

Webshop Recipe

The webshop recipe has specific dependencies:

cd contrib/recipes/webshop/agl

# Install requirements
pip install -r requirements.txt

# For GPU training
./setup.sh

Required dependencies include:

  • pandas>=2.0.0 - Data manipulation
  • pyarrow>=14.0.0 - Parquet file support
  • rich>=13.0.0 - Terminal formatting
  • tqdm>=4.64.0 - Progress bars

Source: contrib/recipes/webshop/agl/requirements.txt:1-15

Development Workflow

Branching Conventions

Create feature branches from a fresh main:

| Branch Type | Naming Convention |
| --- | --- |
| Feature | feature/<slug> |
| Fix | fix/<slug> |
| Documentation | docs/<slug> |
| Maintenance | chore/<slug> |

Source: CLAUDE.md:8, AGENTS.md:8

Commit and PR Guidelines

  1. Write imperative, scoped commit messages
  2. Reference issues with Fixes #123
  3. Rerun pre-commit and relevant pytest/doc builds before pushing
  4. Include verification commands in PR descriptions
  5. Update documentation via mkdocs.yml or examples/README.md

Source: CLAUDE.md:9-13, AGENTS.md:9-13

GPU Configuration

For optimal GPU training performance:

  1. Install NVIDIA drivers (CUDA 11.8+)
  2. Install the GPU dependency group
  3. For VERL-based training, also install the VERL group: uv sync --group VERL

GPU metrics are tracked via heartbeat statistics in worker nodes:

heartbeat_stats={"queue_depth": 2, "gpu_utilization": 0.82}

Source: dashboard/test-utils/python-server.py:Worker class

Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| uv command not found | Install uv: pip install uv |
| CUDA not found | Ensure NVIDIA drivers and CUDA toolkit are installed |
| Import errors | Run uv sync to ensure all dependencies are installed |
| Type checking failures | Run uv run --no-sync pyright to identify issues |

Source: CLAUDE.md:26-30

Lock File Updates

When dependencies change, commit the refreshed uv.lock:

git add uv.lock
git commit -m "chore: update lock file"

Source: CLAUDE.md:24

Next Steps

After installation:

  1. Explore Minimal Component Showcase to understand individual components
  2. Set up the LightningStore for trace storage
  3. Configure tracers for your agent execution
  4. Review the Algorithm Documentation for training options


System Architecture

Related topics: Trainer Component, Runner Component, LightningStore


Overview

Agent Lightning is a reinforcement learning framework for training AI agents, with a distributed system architecture that supports multi-worker training orchestration, resource management, and distributed tracing. The system consists of three primary layers: a Backend Training Engine, a State Store, and a Dashboard Frontend.

The architecture enables parallel training across multiple workers, centralized resource configuration, and real-time monitoring of training workflows through traces and metrics.

Source: dashboard/src/layouts/AppLayout.tsx:1-50

High-Level Architecture Components

The Agent Lightning system comprises the following core entities:

| Component | Description | Key Attributes |
| --- | --- | --- |
| Resources | Configuration templates for prompts, models, and sampling parameters | resources_id, version, resources (dict with PromptTemplate/LLM) |
| Workers | Runner processes that execute training rollouts | worker_id, status, heartbeat_stats, current_rollout_id |
| Rollouts | Complete training episodes with multiple attempts | rollout_id, status, mode, attempts |
| Attempts | Individual training attempts within a rollout | attempt_id, status, metrics |
| Spans | Distributed tracing spans for observability | trace_id, span_id, status, attributes, start_time, end_time |

Source: dashboard/test-utils/python-server.py:1-300

Frontend Dashboard Architecture

The dashboard is a React-based frontend built with Mantine UI components that communicates with the backend via REST APIs.

Navigation Structure

The application uses a sidebar navigation layout with the following sections:

graph TD
    A[AppLayout] --> B[Navbar]
    A --> C[Main Content Area]
    B --> D[Rollouts]
    B --> E[Workers]
    B --> F[Resources]
    B --> G[Traces]
    B --> H[Settings]
    C --> I[Outlet Component]

Source: dashboard/src/layouts/AppLayout.tsx:20-50

Page Components

| Page | File Path | Purpose |
| --- | --- | --- |
| Rollouts | dashboard/src/pages/Rollouts.page.tsx | Display and manage training rollouts with status filtering |
| Workers | dashboard/src/pages/Workers.page.tsx | Monitor worker health and current assignments |
| Resources | dashboard/src/pages/Resources.page.tsx | View and manage configuration resources |
| Traces | dashboard/src/components/TracesTable.component.tsx | Analyze distributed tracing spans |

Source: dashboard/src/pages/Rollouts.page.tsx:1-80

Data Flow Architecture

Worker Heartbeat Flow

Workers periodically send heartbeat signals to indicate their operational state. The dashboard monitors these heartbeats to determine worker availability.

sequenceDiagram
    participant W as Worker
    participant S as Store
    participant D as Dashboard
    
    W->>S: Heartbeat (status, queue_depth, gpu_utilization)
    S->>S: Update last_heartbeat_time
    D->>S: Poll /workers endpoint
    S-->>D: Worker list with status

Source: dashboard/test-utils/python-server.py:100-150
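
One way to turn heartbeats into the idle, busy, and offline statuses is to compare the last heartbeat time against a timeout. The sketch below assumes this rule; WorkerRecord, worker_status, and the 30-second default are hypothetical and are not taken from the store implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class WorkerRecord:
    worker_id: str
    last_heartbeat_time: float  # Unix timestamp of the most recent heartbeat
    heartbeat_stats: Dict[str, Any] = field(default_factory=dict)
    current_rollout_id: Optional[str] = None


def worker_status(worker: WorkerRecord, now: float, timeout_s: float = 30.0) -> str:
    """Classify a worker; timeout_s is an assumed threshold, not a documented default."""
    if now - worker.last_heartbeat_time > timeout_s:
        return "offline"  # heartbeat is stale, regardless of assignment
    return "busy" if worker.current_rollout_id else "idle"


fresh = WorkerRecord(
    worker_id="worker-0",
    last_heartbeat_time=1000.0,
    heartbeat_stats={"queue_depth": 2, "gpu_utilization": 0.82},
    current_rollout_id="rollout-1",
)
```

Passing the current time in as an argument, instead of calling time.time() inside the function, keeps the classification easy to test.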

Rollout Execution Flow

Training rollouts follow a multi-attempt execution model:

graph LR
    A[Rollout Created] --> B[Attempt 1]
    B --> C{Success?}
    C -->|Yes| D[Rollout Complete]
    C -->|No| E[Attempt 2]
    E --> F{Success?}
    F -->|Yes| D
    F -->|No| G[Attempt N]
    G --> H[Max Attempts Reached]

Source: dashboard/src/components/TracesTable.component.tsx:50-150
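
The multi-attempt model in the flowchart amounts to a bounded retry loop. The following sketch uses a hypothetical run_rollout function and an assumed default of three attempts; the callback stands in for whatever actually executes one attempt.

```python
from typing import Callable, Dict, List, Union


def run_rollout(
    execute_attempt: Callable[[int], bool],
    max_attempts: int = 3,
) -> Dict[str, Union[str, List[str]]]:
    """Try up to max_attempts times; stop at the first successful attempt."""
    statuses: List[str] = []
    for sequence_id in range(1, max_attempts + 1):
        succeeded = execute_attempt(sequence_id)
        statuses.append("succeeded" if succeeded else "failed")
        if succeeded:
            return {"status": "succeeded", "attempts": statuses}
    # Every attempt failed: this is the "Max Attempts Reached" branch.
    return {"status": "failed", "attempts": statuses}


# Example: the first attempt fails and the second succeeds.
result = run_rollout(lambda sequence_id: sequence_id >= 2)
```

With the example callback, result records one failed attempt followed by one successful attempt, and the third attempt is never started.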

Core Entity Schemas

Resources Entity

Resources define reusable configuration templates used by workers during training.

| Field | Type | Description |
| --- | --- | --- |
| resources_id | string | Unique identifier for the resource |
| version | integer | Version number for tracking changes |
| create_time | timestamp | Creation timestamp |
| update_time | timestamp | Last modification timestamp |
| resources | dict | Configuration dictionary (PromptTemplate, LLM configs) |

Source: dashboard/test-utils/python-server.py:50-100

Workers Entity

| Field | Type | Description |
| --- | --- | --- |
| worker_id | string | Unique worker identifier |
| status | enum | Current status: idle, busy, offline |
| heartbeat_stats | dict | Metrics including queue_depth, gpu_utilization |
| last_heartbeat_time | timestamp | Time of last heartbeat |
| current_rollout_id | string | Currently assigned rollout (if busy) |
| current_attempt_id | string | Currently executing attempt |

Source: dashboard/src/components/AppDrawer.component.tsx:1-60

Spans Entity (Distributed Tracing)

| Field | Type | Description |
| --- | --- | --- |
| rollout_id | string | Associated rollout |
| attempt_id | string | Associated attempt |
| trace_id | string | Distributed trace identifier |
| span_id | string | Unique span identifier |
| parent_id | string | Parent span ID for hierarchy |
| name | string | Operation name (e.g., classification_pipeline) |
| status | TraceStatus | Status with status_code (OK, ERROR) and description |
| attributes | dict | Key-value metadata (model, batch_size, accuracy) |
| start_time | timestamp | Span start time |
| end_time | timestamp | Span end time |

Source: dashboard/src/components/TracesTable.component.tsx:50-120

Component Architecture (Frontend)

Table Components Pattern

The dashboard uses a consistent table component pattern across all pages:

graph TD
    A[Page Component] --> B[Table Component]
    B --> C[Column Definitions]
    B --> D[Filtering Logic]
    B --> E[Pagination Controls]
    A --> F[useQuery Hook]
    F --> G[API Endpoints]

| Component | Props | Purpose |
| --- | --- | --- |
| RolloutTable | rollouts, totalRecords, statusFilters, onViewTraces | Training rollout display |
| WorkersTable | workers, onShowDetails | Worker monitoring |
| ResourcesTable | resourcesList, renderRowExpansion | Resource configuration |
| TracesTable | spans, onShowSpanDetail | Trace analysis |

Source: dashboard/src/components/WorkersTable.component.tsx:1-80

Drawer Container Pattern

The application uses an AppDrawerContainer for displaying detailed information:

graph TD
    A[AppDrawerContainer] --> B[Redux State]
    B --> C{Content Type}
    C -->|worker-detail| D[WorkerDrawerTitle]
    C -->|rollout-detail| E[RolloutDrawer]
    C -->|span-detail| F[SpanDetailDrawer]
    D --> G[ConnectionIndicator]
    G --> H[baseUrl, status, isRefreshing]

Source: dashboard/src/components/AppDrawer.component.tsx:60-120

State Management

The frontend uses Redux for state management with the following key selectors:

| Selector | Purpose |
| --- | --- |
| selectConfig | Application configuration (baseUrl, autoRefreshMs) |
| selectDrawerIsOpen | Drawer visibility state |
| selectDrawerContent | Current drawer content type and data |
| selectConnectionState | Backend connection status |

Source: dashboard/src/layouts/AppLayout.tsx:50-80

Connection Management

The dashboard includes a ConnectionIndicator component that displays the connection status to the backend:

| Status | Description |
| --- | --- |
| connected | Successfully connected to backend |
| disconnected | Cannot reach backend |
| refreshing | Actively reconnecting |

Source: dashboard/src/layouts/AppLayout.tsx:40-45

Training Workflow Integration

Status Lifecycle

Rollouts and attempts follow a defined status lifecycle:

| Status | Description |
| --- | --- |
| pending | Initial state, not yet started |
| running | Currently executing |
| succeeded | Completed successfully |
| failed | Execution failed |
| cancelled | Manually cancelled |

Mode Types

| Mode | Description |
| --- | --- |
| train | Training mode with gradient updates |
| eval | Evaluation mode without updates |
| inference | Production inference mode |

Source: dashboard/src/pages/Rollouts.page.tsx:30-60

Observability Architecture

Trace Hierarchy

Traces are organized in a hierarchical structure:

Trace
└── Span (root)
    ├── Span (child - preprocess)
    ├── Span (child - classifier)
    └── Span (child - formatter)

Each span captures:

  • Execution timing (start_time, end_time, duration)
  • Status and error information
  • Custom attributes (model, batch_size, accuracy)
  • Resource metadata (service name)

Source: dashboard/test-utils/python-server.py:200-300
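
The hierarchy can be rebuilt from a flat list of spans by grouping them on parent_id. The sketch below is illustrative only: spans are plain dictionaries with the field names from the Spans entity table, and render_span_tree is a hypothetical helper, not a dashboard or library function.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def render_span_tree(spans: List[Dict[str, Any]]) -> List[str]:
    """Return indented 'name (duration)' lines, children ordered by start time."""
    children: Dict[Optional[str], List[Dict[str, Any]]] = defaultdict(list)
    for span in spans:
        children[span.get("parent_id")].append(span)

    def render(parent_id: Optional[str], depth: int) -> List[str]:
        lines: List[str] = []
        for span in sorted(children.get(parent_id, []), key=lambda s: s["start_time"]):
            duration = span["end_time"] - span["start_time"]
            lines.append(f"{'  ' * depth}{span['name']} ({duration:.1f}s)")
            lines.extend(render(span["span_id"], depth + 1))
        return lines

    return render(None, 0)  # root spans are the ones without a parent


example_spans = [
    {"span_id": "s3", "parent_id": "s0", "name": "formatter", "start_time": 2.5, "end_time": 3.0},
    {"span_id": "s0", "parent_id": None, "name": "classification_pipeline", "start_time": 0.0, "end_time": 3.0},
    {"span_id": "s1", "parent_id": "s0", "name": "preprocess", "start_time": 0.0, "end_time": 1.0},
    {"span_id": "s2", "parent_id": "s0", "name": "classifier", "start_time": 1.0, "end_time": 2.5},
]
tree_lines = render_span_tree(example_spans)
```

The input order does not matter because children are sorted by start_time, so tree_lines lists the root first and then preprocess, classifier, and formatter indented beneath it.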

Attribute Keys

Common span attributes include:

| Attribute | Example Value | Description |
| --- | --- | --- |
| type | classification | Operation type |
| model | bert-classifier | Model used |
| batch_size | 10 | Processing batch size |
| accuracy | 0.95 | Achieved accuracy |
| timeout | true | Whether operation timed out |
| retry | true | Whether this was a retry attempt |

Source: dashboard/src/components/TracesTable.component.tsx:30-50

Resource Configuration Templates

Resources support multiple template engines:

| Engine | Syntax | Example |
| --- | --- | --- |
| f-string | {variable} | "Classify: {ticket}" |
| jinja | {{ variable }} or {% for %} | "{% for r in results %}{{ r }}{% endfor %}" |

Source: dashboard/test-utils/python-server.py:50-90
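
A dispatcher over the two engines might look like the sketch below. render_template is a hypothetical function, not the framework's rendering code. It approximates the f-string engine with str.format, which handles simple {variable} placeholders but not arbitrary expressions, and the jinja branch assumes the third-party jinja2 package is installed.

```python
from typing import Any


def render_template(engine: str, template: str, **variables: Any) -> str:
    """Render a prompt template with the named engine (illustrative only)."""
    if engine == "f-string":
        # str.format covers plain {name} placeholders.
        return template.format(**variables)
    if engine == "jinja":
        # Imported lazily so the f-string path works without jinja2 installed.
        from jinja2 import Template

        return Template(template).render(**variables)
    raise ValueError(f"unknown template engine: {engine!r}")


prompt = render_template("f-string", "Classify: {ticket}", ticket="Refund not received")
```

The same call with engine set to "jinja" and the loop template from the table would join the items of a results list, provided jinja2 is available.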

Summary

The Agent Lightning system architecture provides:

  1. Distributed Training - Multiple workers executing rollouts in parallel
  2. Centralized Configuration - Versioned resource templates for prompts and models
  3. Real-time Monitoring - Worker heartbeat tracking and status dashboards
  4. Full Observability - Distributed tracing with hierarchical spans
  5. State Persistence - Store-based architecture for maintaining system state

The architecture is designed for horizontal scalability, allowing additional workers to be added to increase training throughput while maintaining centralized configuration management and monitoring through the dashboard frontend.


Core Abstractions and Data Models

Related topics: System Architecture, Trainer Component


The Agent Lightning framework relies on a set of foundational abstractions and data models that enable the coordination between runners, tracers, the LightningStore, and training algorithms. These core types are defined in agentlightning/types/ and serve as the canonical data structures used throughout the system for representing tasks, rollouts, attempts, traces, and resources.

Architecture Overview

Agent Lightning operates through a continuous execution loop where multiple components interact. The core abstractions facilitate:

  1. Trace Emission - Runners and tracers emit spans during execution
  2. State Synchronization - LightningStore maintains synchronized state
  3. Algorithm Consumption - Training algorithms in agentlightning/algorithm/ consume traces to improve agent behavior

graph TD
    A[Runners] -->|emit spans| B[Tracers]
    B --> C[LightningStore]
    C --> D[Algorithms]
    D -->|improve behavior| A
    C --> E[Dashboard]
    F[Resources] -->|configure| A

Source: CLAUDE.md

Task and Rollout Models

Task Representation

The Task and related classes define the fundamental unit of work in Agent Lightning. Tasks represent the objectives that agents attempt to accomplish during training and evaluation.

| Class | Purpose |
| --- | --- |
| Task | Core task definition containing input and configuration |
| TaskInput | Input data passed to a task |
| TaskIfAny | Conditional task input supporting optional parameters |
| Dataset | Collection of tasks for batch processing |

Source: agentlightning/types/core.py:1-50

Rollout Lifecycle

A rollout represents one complete execution of a task and may span several attempts. The rollout model captures the entire lifecycle from enqueue to completion.

stateDiagram-v2
    [*] --> Enqueued: EnqueueRolloutRequest
    Enqueued --> InProgress: Runner picks up
    InProgress --> Attempted: First attempt completes
    Attempted --> InProgress: Retry triggered
    InProgress --> [*]: Final attempt
    Attempted --> [*]: Success/Failure

| Class | Description |
| --- | --- |
| Rollout | Represents a single task execution instance |
| RolloutConfig | Configuration for rollout execution |
| RolloutMode | Execution mode (training, evaluation, etc.) |
| RolloutStatus | Current state of the rollout |

Source: agentlightning/types/core.py:50-100
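
The state diagram can be encoded as a small transition table. This is one reading of the diagram, not the library's RolloutStatus logic: the lowercase state names, the "finished" label for the terminal state, and the advance helper are all assumptions made for illustration.

```python
from typing import Dict, Set

# Allowed transitions, read directly off the state diagram.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "enqueued": {"in_progress"},              # runner picks up the rollout
    "in_progress": {"attempted", "finished"}, # attempt completes, or final attempt
    "attempted": {"in_progress", "finished"}, # retry triggered, or success/failure
    "finished": set(),                        # terminal state
}


def advance(current: str, target: str) -> str:
    """Return the target state, or raise if the diagram has no such edge."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


# One retry, then completion.
state = "enqueued"
for next_state in ["in_progress", "attempted", "in_progress", "attempted", "finished"]:
    state = advance(state, next_state)
```

Keeping the transitions in a single table makes illegal jumps, such as going from enqueued straight to finished, fail loudly instead of silently corrupting rollout state.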

Attempt Model

Attempts represent individual tries within a rollout, enabling retry mechanisms and granular progress tracking.

| Property | Type | Description |
| --- | --- | --- |
| attempt_id | str | Unique identifier for the attempt |
| rollout_id | str | Parent rollout identifier |
| status | AttemptStatus | Current attempt status |
| sequence_id | int | Order within the rollout |

Source: agentlightning/types/core.py:100-150

AttemptedRollout

The AttemptedRollout class aggregates results from all attempts within a rollout:

class AttemptedRollout(BaseModel):
    rollout: Rollout
    attempts: List[Attempt]
    # Aggregated metrics and results

Source: agentlightning/types/core.py:150-180

Tracing Abstractions

OpenTelemetry Integration

Agent Lightning uses OpenTelemetry for distributed tracing. The tracer types provide serialization and interoperability with the broader observability ecosystem.

| Class | Purpose |
| --- | --- |
| Span | Single unit of work in a trace |
| SpanCoreFields | Core fields shared across span implementations |
| OtelResource | Serializable OpenTelemetry resource representation |
| TraceStatus | Span completion status with error information |

Source: agentlightning/types/tracer.py:1-80

Span Structure

Spans form the atomic tracing unit, capturing timing, status, attributes, and relationships:

graph LR
    subgraph Span
        A[name] --> B[status]
        B --> C[attributes]
        C --> D[start_time/end_time]
        D --> E[parent_id/span_id]
        E --> F[resource]
    end

| Attribute | Description |
| --- | --- |
| name | Human-readable span identifier |
| status | TraceStatus with status_code and optional description |
| attributes | Key-value metadata dictionary |
| parent_id | Reference to parent span (None for root) |
| resource | OtelResource containing service metadata |

Source: agentlightning/types/tracer.py:80-120
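
The parent_id relationship is what turns a flat list of spans into a tree. The sketch below shows one way to rebuild that hierarchy from `(span_id, parent_id, name)` tuples. The tuple layout and both helper functions are assumptions for illustration, not the library's API.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

SpanRow = Tuple[str, Optional[str], str]  # (span_id, parent_id, name)

# parent_id is None for the root span.
spans: List[SpanRow] = [
    ("s1", None, "root"),
    ("s2", "s1", "preprocess"),
    ("s3", "s1", "classify"),
    ("s4", "s3", "tool_execution"),
]


def index_children(rows: List[SpanRow]) -> Dict[Optional[str], List[str]]:
    """Map each parent_id to the span_ids that point at it."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for span_id, parent_id, _name in rows:
        children[parent_id].append(span_id)
    return children


def depth_of(span_id: str, rows: List[SpanRow]) -> int:
    """Count how many parent links separate a span from the root."""
    parents = {sid: pid for sid, pid, _name in rows}
    depth = 0
    while parents[span_id] is not None:
        span_id = parents[span_id]
        depth += 1
    return depth


children = index_children(spans)
```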

OtelResource Model

The OtelResource class provides a serializable representation of OpenTelemetry resources:

class OtelResource(BaseModel):
    attributes: Attributes
    schema_url: str

This model avoids confusion with the application's Resource class and enables span serialization for store persistence.

Source: agentlightning/types/tracer.py:120-150

Span Creation Patterns

#### SpanCoreFields for Lightweight Creation

For span creators that don't require the full span model, SpanCoreFields provides a minimal interface:

class SpanCoreFields(BaseModel):
    name: str
    status: TraceStatus
    attributes: Attributes
    start_time: Optional[float]
    end_time: Optional[float]

Source: agentlightning/types/tracer.py:150-180

#### Weave Tracer Span Creation

The Weave tracer implementation demonstrates proper span construction with resource attributes:

resource=OtelResource(
    attributes={
        LightningResourceAttributes.ROLLOUT_ID.value: rollout_id,
        LightningResourceAttributes.ATTEMPT_ID.value: attempt_id,
        LightningResourceAttributes.SPAN_SEQUENCE_ID.value: sequence_id,
        LightningResourceAttributes.TRACER_NAME.value: "weave",
    },
    schema_url="",
)

Source: agentlightning/tracer/weave.py:1-50

Resource Management

ResourcesUpdate Model

Resources define configurable components that can be versioned and updated:

class ResourcesUpdate(BaseModel):
    resources_id: str
    version: int
    create_time: float
    update_time: float
    resources: Dict[str, Any]

| Field | Type | Description |
|---|---|---|
| resources_id | str | Unique identifier for the resource set |
| version | int | Version number for optimistic concurrency |
| create_time | float | Unix timestamp of creation |
| update_time | float | Unix timestamp of last update |
| resources | Dict[str, Any] | Arbitrary resource configuration |

Source: dashboard/test-utils/python-server.py:1-80
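
The version field is described as supporting optimistic concurrency. As a rough illustration of that idea, the sketch below accepts an update only when the caller has seen the latest version. `ResourcesUpdateSketch` and `apply_update` are hypothetical names, and the source does not show how the store itself enforces versions.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ResourcesUpdateSketch:
    # Dataclass stand-in for ResourcesUpdate; same field names as above.
    resources_id: str
    version: int
    create_time: float
    update_time: float
    resources: Dict[str, Any] = field(default_factory=dict)


def apply_update(
    current: ResourcesUpdateSketch,
    expected_version: int,
    new_resources: Dict[str, Any],
) -> ResourcesUpdateSketch:
    """Return the next version, or raise if the caller's view is stale."""
    if current.version != expected_version:
        raise ValueError(
            f"stale version {expected_version}; current is {current.version}"
        )
    return ResourcesUpdateSketch(
        resources_id=current.resources_id,
        version=current.version + 1,
        create_time=current.create_time,  # creation time never changes
        update_time=time.time(),
        resources=new_resources,
    )


v1 = ResourcesUpdateSketch("rs-001", 1, 0.0, 0.0, {"main_prompt": "draft"})
v2 = apply_update(v1, expected_version=1, new_resources={"main_prompt": "tuned"})
```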

Resource Types

Resources support flexible configuration through templates and model definitions:

| Resource Type | Description |
|---|---|
| PromptTemplate | Templated prompts with jinja2 or f-string engines |
| LLM | Language model configuration with endpoint and sampling parameters |
| Custom Dict[str, Any] | Arbitrary configuration dictionaries |

Source: dashboard/test-utils/python-server.py:80-150

Worker Abstraction

Workers represent execution agents that process rollouts:

classDiagram
    class Worker {
        +worker_id: str
        +status: WorkerStatus
        +heartbeat_stats: Dict
        +last_heartbeat_time: float
        +current_rollout_id: Optional[str]
        +current_attempt_id: Optional[str]
    }

| Property | Type | Description |
|---|---|---|
| worker_id | str | Unique worker identifier |
| status | WorkerStatus | Current status (busy, idle, etc.) |
| heartbeat_stats | Dict | Runtime metrics (queue_depth, gpu_utilization) |
| last_heartbeat_time | float | Last check-in timestamp |
| current_rollout_id | Optional[str] | Currently executing rollout |

Source: agentlightning/types/core.py:180-220

Worker Status States

stateDiagram-v2
    [*] --> Idle: Startup
    Idle --> Busy: Dequeue rollout
    Busy --> Idle: Complete
    Busy --> Busy: Heartbeat
    Idle --> [*]: Shutdown
    Busy --> [*]: Shutdown

Source: dashboard/test-utils/python-server.py:150-200
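
The state diagram above can be read as a transition table. The following sketch encodes each arrow as a dictionary entry and rejects any event the diagram does not allow. The enum, the event names, and the `STOPPED` member (standing in for the terminal node) are assumptions for illustration.

```python
from enum import Enum


class WorkerStatusSketch(Enum):
    IDLE = "idle"
    BUSY = "busy"
    STOPPED = "stopped"  # stands in for the terminal [*] node in the diagram


# (current status, event) -> next status, one entry per arrow in the diagram.
TRANSITIONS = {
    (WorkerStatusSketch.IDLE, "dequeue_rollout"): WorkerStatusSketch.BUSY,
    (WorkerStatusSketch.BUSY, "complete"): WorkerStatusSketch.IDLE,
    (WorkerStatusSketch.BUSY, "heartbeat"): WorkerStatusSketch.BUSY,
    (WorkerStatusSketch.IDLE, "shutdown"): WorkerStatusSketch.STOPPED,
    (WorkerStatusSketch.BUSY, "shutdown"): WorkerStatusSketch.STOPPED,
}


def next_status(status: WorkerStatusSketch, event: str) -> WorkerStatusSketch:
    """Return the next status, or raise if the diagram has no such arrow."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"no transition from {status.value} on {event!r}") from None


status = WorkerStatusSketch.IDLE
for event in ["dequeue_rollout", "heartbeat", "complete", "shutdown"]:
    status = next_status(status, event)
```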

Filtering and Pagination

Query Models

The store supports filtered and paginated queries for efficient data access:

| Class | Purpose |
|---|---|
| FilterOptions | Criteria for filtering results |
| FilterField | Individual filter condition |
| SortOptions | Sorting configuration |
| PaginatedResult | Paginated response wrapper |

Source: agentlightning/types/core.py:220-260

Operation Context

The @operation decorator provides a simplified span creation interface for user code:

@operation(name="my_operation")
async def my_function():
    # Automatically creates and manages a span
    pass

OperationContext Parameters

| Parameter | Type | Description |
|---|---|---|
| propagate | bool | Whether spans should use active span processor |
| name | Optional[str] | Alias populating OPERATION_NAME attribute |

The decorator supports two usage patterns:

  1. As a bare decorator: @operation
  2. As a context manager factory: with operation(name="custom"):

Source: agentlightning/emitter/annotation.py:1-60
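
One common way to let a single callable act as both a bare decorator and a context manager factory is to build on `contextlib.ContextDecorator`. The sketch below illustrates that general technique with a hypothetical `operation_sketch`; it records start and end markers in a list instead of creating spans, and it is not the library's implementation. It handles synchronous functions only.

```python
from contextlib import ContextDecorator
from typing import Callable, List, Optional, Tuple

RECORDED: List[Tuple[str, Optional[str]]] = []


class _OperationSketch(ContextDecorator):
    """Records a start and end marker around a block or a function call."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name

    def __enter__(self) -> "_OperationSketch":
        RECORDED.append(("start", self.name))
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        RECORDED.append(("end", self.name))
        return False  # never swallow exceptions


def operation_sketch(fn: Optional[Callable] = None, *, name: Optional[str] = None):
    if callable(fn):
        # Bare decorator form: @operation_sketch
        return _OperationSketch(name or fn.__name__)(fn)
    # Factory form: with operation_sketch(name="..."):
    return _OperationSketch(name)


@operation_sketch
def classify() -> str:
    return "ok"


result = classify()
with operation_sketch(name="custom"):
    pass
```

Because the factory form returns a `ContextDecorator`, `@operation_sketch(name="x")` also works as a decorator with arguments.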

Data Flow Summary

graph TD
    subgraph Input
        A[Dataset] --> B[Task]
        B --> C[EnqueueRolloutRequest]
    end
    
    subgraph Execution
        C --> D[Runner]
        D --> E[Worker]
        E --> F[Attempt]
        F --> G[Span]
    end
    
    subgraph Storage
        G --> H[LightningStore]
        H --> I[PaginatedResult]
    end
    
    subgraph Training
        H --> J[Algorithm]
        J --> K[Improved Policy]
    end

Key Type Exports

The agentlightning/types/core.py module exports the following public API:

__all__ = [
    "Triplet",
    "RolloutLegacy",
    "Task",
    "TaskInput",
    "TaskIfAny",
    "RolloutRawResultLegacy",
    "RolloutRawResult",
    "RolloutMode",
    "GenericResponse",
    "ParallelWorkerBase",
    "Dataset",
    "AttemptStatus",
    "RolloutStatus",
    "RolloutConfig",
    "Rollout",
    "Attempt",
    "AttemptedRollout",
    "EnqueueRolloutRequest",
    "Hook",
    "Worker",
    "WorkerStatus",
    "PaginatedResult",
    "FilterOptions",
    "SortOptions",
    "FilterField",
]

Source: agentlightning/types/core.py:40-60

Usage Patterns

Creating a Rollout Request

request = EnqueueRolloutRequest(
    task_id="task-001",
    config=RolloutConfig(mode=RolloutMode.TRAINING),
    priority=1
)

Querying with Filters

filters = FilterOptions(
    fields=[FilterField(name="status", operator="eq", value="completed")],
    sort=SortOptions(field="create_time", direction="desc"),
    offset=0,
    limit=50
)

Source: agentlightning/types/core.py:260-300
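
To show what a filtered, sorted, and paginated query does to the data, here is an in-memory sketch over a list of dictionaries. The `query_rows` function and the shape of its return value are assumptions for illustration; the store's real query interface may differ.

```python
from typing import Any, Dict, List


def query_rows(
    rows: List[Dict[str, Any]],
    *,
    filter_field: str,
    filter_value: Any,
    sort_field: str,
    descending: bool = False,
    offset: int = 0,
    limit: int = 50,
) -> Dict[str, Any]:
    """Filter by equality, sort, then slice out one page."""
    matched = [row for row in rows if row.get(filter_field) == filter_value]
    matched.sort(key=lambda row: row[sort_field], reverse=descending)
    return {
        "items": matched[offset : offset + limit],
        "total": len(matched),  # count before pagination
        "offset": offset,
        "limit": limit,
    }


rows = [
    {"rollout_id": "ro-1", "status": "completed", "create_time": 10.0},
    {"rollout_id": "ro-2", "status": "failed", "create_time": 20.0},
    {"rollout_id": "ro-3", "status": "completed", "create_time": 30.0},
]
page = query_rows(
    rows,
    filter_field="status",
    filter_value="completed",
    sort_field="create_time",
    descending=True,
    limit=1,
)
```

Reporting the total separately from the page items lets a caller compute how many pages remain.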

Source: https://github.com/microsoft/agent-lightning / Human Manual

Tutorial: Train Your First Agent

Related topics: Tutorial: Writing Agents, Algorithm Zoo

Overview

This tutorial guides you through training your first AI agent with Agent Lightning. You will learn how to set up a training pipeline, define prompts and resources, create a dataset, and run the APO (Automatic Prompt Optimization) algorithm to improve your agent's behavior through feedback-driven learning.

Agent Lightning provides a complete training loop where runners and tracers emit spans, LightningStore keeps them synchronized, and algorithms consume those traces to improve behavior. Source: CLAUDE.md

Prerequisites

Before starting this tutorial, ensure you have:

  • Python 3.10+ installed
  • Agent Lightning installed following the installation guide
  • An OpenAI-compatible API service available
  • APO extra dependencies installed

Architecture Overview

Agent Lightning trains agents through a continuous feedback loop:

graph TD
    A[Runner - Executes Agent] --> B[Tracer - Emits Spans]
    B --> C[LightningStore - Synchronizes Data]
    C --> D[Algorithm - Consumes Traces]
    D --> E[Improved Agent Behavior]
    E --> A
    
    F[Dataset - Training Data] --> D
    G[Resources - Prompts/Models] --> A

Source: CLAUDE.md

Step 1: Create Your Agent

Begin by defining a simple room-booking agent that uses function calling. The agent receives a user request and selects an appropriate room from the available options. The listing below mocks the LLM response, so it always selects Meeting Room B.

# examples/apo/room_selector.py

from agentlightning import Runner, DataProto
from typing import Any
import json

class RoomSelector(Runner):
    """Room booking agent using function calling."""

    def run(self, task: str, context: dict | None = None) -> DataProto:
        # Define available rooms
        rooms = [
            {"id": "R001", "name": "Conference A", "capacity": 10},
            {"id": "R002", "name": "Meeting Room B", "capacity": 4},
            {"id": "R003", "name": "Board Room", "capacity": 20},
        ]
        
        # Mock LLM response selecting a room
        selected_room = rooms[1]  # Default to Meeting Room B
        
        return DataProto(
            data={
                "selected_room": selected_room["name"],
                "room_id": selected_room["id"],
            },
            raw_response=json.dumps(selected_room),
        )

Source: examples/apo/room_selector.py

Supported Agent Components

| Component | Description | Usage |
|---|---|---|
| Runner | Base class for agent execution | Extend to define custom agent logic |
| Trainer | Training orchestration | Manages training loop and workers |
| LightningStore | Data synchronization | Stores traces and spans |
| OtelTracer | OpenTelemetry span emission | Records execution traces |

Source: examples/apo/apo_debug.py

Step 2: Prepare Your Dataset

Create a training dataset with room booking scenarios. Each task should include the user request and expected room selection.

# examples/apo/room_selector_apo.py

def create_room_dataset() -> list[dict[str, str]]:
    """Create dataset for room booking tasks."""
    
    # Example tasks for room booking
    tasks = [
        {
            "task": "I need to schedule a meeting for 3 people tomorrow at 2 PM",
            "expected_room": "Meeting Room B",
        },
        {
            "task": "We are hosting a team event for 15 team members",
            "expected_room": "Board Room",
        },
        {
            "task": "Quick 1-on-1 sync needed this afternoon",
            "expected_room": "Meeting Room B",
        },
    ]
    
    return tasks

Source: examples/apo/room_selector_apo.py

Step 3: Define Training Resources

Resources define the prompts and model configurations used by your agent during training. You can tune any resource, typically prompt templates, as part of the training loop.

from agentlightning import PromptTemplate

# Define a tunable prompt template
main_prompt = PromptTemplate(
    template="""You are a helpful assistant that helps users book meeting rooms.
    
    Available rooms:
    - Conference A: capacity 10
    - Meeting Room B: capacity 4
    - Board Room: capacity 20
    
    User request: {user_request}
    
    Select the most appropriate room and explain your choice.""",
    engine="f-string",
)

Source: examples/apo/apo_debug.py
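
To illustrate what variable substitution means for a template like the one above, the sketch below renders a shortened version with Python's built-in `str.format`. It assumes the "f-string" engine performs placeholder substitution of this kind; `render_prompt` is a hypothetical helper, not a library function.

```python
ROOM_TEMPLATE = (
    "You are a helpful assistant that helps users book meeting rooms.\n"
    "User request: {user_request}\n"
    "Select the most appropriate room and explain your choice."
)


def render_prompt(template: str, **variables: str) -> str:
    """Substitute {placeholders}; a missing variable raises KeyError."""
    return template.format(**variables)


prompt = render_prompt(ROOM_TEMPLATE, user_request="Room for 3 people at 2 PM")
```

Failing loudly on a missing variable is useful during training, because a silently unfilled placeholder would be sent to the model as literal text.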

Resource Types

| Type | Description | Tunable |
|---|---|---|
| PromptTemplate | Text templates with variable substitution | Yes |
| LLM | Model configuration (endpoint, sampling params) | No |
| SystemPrompt | System-level instructions | Yes |
| SamplingParameters | Temperature, top_p, max_tokens | No |

Source: examples/apo/README.md

Step 4: Configure the Trainer

The Trainer class orchestrates the training loop. It manages workers, coordinates with the LightningStore, and applies the optimization algorithm.

from agentlightning import Trainer

# Initialize trainer with one worker
trainer = Trainer(
    n_workers=1,
    # Resources to tune - only these will be optimized
    initial_resources={
        "main_prompt": main_prompt,
    },
)

# Configure the APO algorithm
trainer.configure(
    algorithm="APO",
    lr=1e-3,
    epochs=10,
)

Source: examples/apo/apo_debug.py

Trainer Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| n_workers | int | 1 | Number of parallel training workers |
| initial_resources | dict | Required | Resources to optimize |
| algorithm | str | Required | Optimization algorithm name |
| lr | float | 1e-3 | Learning rate |
| epochs | int | 10 | Number of training epochs |

Source: examples/apo/apo_debug.py

Step 5: Implement Reward Function

The reward function evaluates agent outputs and provides feedback signals for reinforcement learning.

from typing import Any

def room_booking_reward(output: Any, expected: dict) -> float:
    """
    Calculate reward based on room selection accuracy.
    
    Args:
        output: Agent's room selection
        expected: Expected room from dataset
    
    Returns:
        float: Reward score between 0.0 and 1.0
    """
    if not output or not output.data:
        return 0.0
    
    selected_room = output.data.get("selected_room", "")
    expected_room = expected.get("expected_room", "")
    
    # Exact match gets full reward
    if selected_room == expected_room:
        return 1.0
    
    # Partial match gets partial reward
    if expected_room.lower() in selected_room.lower():
        return 0.5
    
    return 0.0

Source: examples/apo/room_selector_apo.py
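
It can help to exercise the reward function on a few hand-built outputs before training. The sketch below does so with a compact copy of the function so that it runs on its own. `fake_output` is a hypothetical stand-in that provides only the `.data` attribute the function reads.

```python
from types import SimpleNamespace


def room_booking_reward(output, expected: dict) -> float:
    # Compact copy of the reward function above so this block runs on its own.
    if not output or not output.data:
        return 0.0
    selected = output.data.get("selected_room", "")
    wanted = expected.get("expected_room", "")
    if selected == wanted:
        return 1.0
    if wanted.lower() in selected.lower():
        return 0.5
    return 0.0


def fake_output(room: str) -> SimpleNamespace:
    # Hypothetical stand-in for the agent's output: only `.data` is needed.
    return SimpleNamespace(data={"selected_room": room})


expected = {"expected_room": "Meeting Room B"}
exact = room_booking_reward(fake_output("Meeting Room B"), expected)
partial = room_booking_reward(fake_output("meeting room b (2nd floor)"), expected)
wrong = room_booking_reward(fake_output("Board Room"), expected)
empty = room_booking_reward(None, expected)
# A dataset entry without "expected_room" still earns partial credit,
# because the empty string is a substring of every string.
unlabeled = room_booking_reward(fake_output("Board Room"), {})
```

The last case is worth guarding against: validating that every dataset entry has a non-empty expected_room avoids rewarding arbitrary selections.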

Step 6: Run the Training Loop

Execute the training with your runner, dataset, and reward function.

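import asyncio
from agentlightning import Trainer, setup_logging

# RoomSelector, create_room_dataset, main_prompt, and room_booking_reward
# are the definitions from Steps 1, 2, 3, and 5 above.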

async def train_room_selector():
    setup_logging()
    
    # Initialize agent and trainer
    agent = RoomSelector()
    dataset = create_room_dataset()
    
    trainer = Trainer(
        n_workers=1,
        initial_resources={"main_prompt": main_prompt},
    )
    
    # Run training
    results = await trainer.train(
        runner=agent,
        dataset=dataset,
        reward_fn=room_booking_reward,
        max_iterations=100,
    )
    
    print(f"Training completed: {results}")

if __name__ == "__main__":
    asyncio.run(train_room_selector())

Source: examples/apo/apo_debug.py

Understanding the Training Flow

sequenceDiagram
    participant User as User Code
    participant Trainer as Trainer
    participant Runner as RoomSelector
    participant Store as LightningStore
    participant Algo as APO Algorithm
    
    User->>Trainer: train(runner, dataset, reward_fn)
    Trainer->>Runner: execute_task(task)
    Runner->>Runner: select_room()
    Runner-->>Trainer: output
    Trainer->>Store: record_span(rollout_id, attempt_id)
    Trainer->>Trainer: calculate_reward(output, expected)
    Trainer->>Algo: optimize_step(rewards, traces)
    Algo-->>Trainer: updated_resources
    Trainer->>Runner: update_resources()
    Note over Trainer,Algo: Repeat for max_iterations

Debugging Your Training

Agent Lightning provides multiple debugging approaches:

Approach 1: Runner Mode

Direct execution without training to verify agent logic:

python apo_debug.py --mode runner

Source: examples/apo/apo_debug.py

Approach 2: Hook Mode

Debug with tracing hooks enabled:

python apo_debug.py --mode hook

Approach 3: Trainer Mode

Full training debug with detailed logging:

python apo_debug.py --mode trainer

Viewing Training Traces

During and after training, spans are recorded to the LightningStore. View them in the dashboard:

graph LR
    A[Training Run] --> B[Spans Emitted]
    B --> C[LightningStore]
    C --> D[Dashboard]
    D --> E[Trace Visualization]
    D --> F[Span Details]

The dashboard displays:

| View | Description |
|---|---|
| Rollouts | Complete training iterations |
| Spans | Individual function calls and operations |
| Resources | Tunable prompt templates |
| Metrics | Reward scores and training statistics |

Source: examples/minimal/README.md

Common Issues and Solutions

Issue: Tracer Conflicts

Running multiple modes consecutively in one process may cause tracer conflicts.

Solution: Run each mode in a separate process or ensure proper tracer cleanup between runs.

Source: examples/apo/apo_debug.py

Issue: Missing Dependencies

APO requires additional dependencies not in the core installation.

Solution: Install with extras:

pip install agentlightning[apo]

Source: examples/apo/README.md

Next Steps

After completing this tutorial:

  1. Advanced Algorithms: Explore custom algorithms in apo_custom_algorithm.py
  2. Integration: Learn Agent-OS integration for policy-aware training
  3. Dashboard: Use the dashboard to visualize training progress
  4. Production: Scale training with multiple workers and distributed execution

Summary

This tutorial covered the essential steps to train your first agent with Agent Lightning:

  • Define a Runner implementing your agent logic
  • Prepare a dataset with tasks and expected outputs
  • Configure PromptTemplate resources for tuning
  • Implement a reward function for RL feedback
  • Use Trainer to orchestrate the training loop
  • Debug with multiple modes and visualize traces in the dashboard

The training loop continuously improves your agent by optimizing prompt resources based on reward signals, enabling agents to learn from feedback without manual prompt engineering.

Tutorial: Writing Agents

Related topics: Tutorial: Train Your First Agent, Runner Component

This tutorial provides a comprehensive guide to building AI agents using the Agent Lightning framework. It covers the core concepts, architecture, and practical implementation patterns for creating agents that can be trained with reinforcement learning.

Overview

Agent Lightning is a framework designed to train AI agents using reinforcement learning. The framework provides a complete execution stack including tracing, storage, and algorithm components that work together in a continuous loop. Source: CLAUDE.md:1-5

Agents in this framework are built using the LightningStore architecture, which synchronizes data between runners, tracers, and algorithms. The tracers emit spans that capture the agent's execution behavior, and these spans are consumed by algorithms to improve the agent's performance over time. Source: AGENTS.md:1-5

Architecture Overview

The Agent Lightning framework follows a continuous loop architecture where multiple components interact to enable training of AI agents.

graph TD
    A[Agent / Runner] -->|Emits Spans| B[Tracer]
    B -->|Traces| C[LightningStore]
    C -->|Synchronized Data| D[Algorithms]
    D -->|Training Signals| A
    E[Dashboard] -->|Inspect & Debug| C

Core Components

| Component | Purpose | Location |
|---|---|---|
| LightningStore | Central data store for traces and rollouts | agentlightning/store/ |
| OtelTracer | OpenTelemetry-based span emission | Via OtelTracer class |
| AgentOpsTracer | AgentOps integration for tracing | Via AgentOpsTracer class |
| Span | Individual trace unit | Data model |
| emit_reward | Reward signal emission | API function |

Source: examples/minimal/write_traces.py:1-40

Writing Your First Agent

Basic Agent Structure

An agent in Agent Lightning is built around the tracing and store infrastructure. The minimal component showcase in examples/minimal/ demonstrates how individual building blocks behave in isolation. Source: examples/minimal/README.md:1-10

Setting Up the Tracer

The framework supports two primary tracing mechanisms:

  1. OtelTracer: OpenTelemetry-based tracing that can forward spans to a remote store client
  2. AgentOpsTracer: AgentOps integration for agent operations tracking

from agentlightning import OtelTracer, LightningStoreClient, setup_logging

# Initialize logging
setup_logging()

# Create tracer with optional remote store client
tracer = OtelTracer(
    rollout_id="ro-001",
    attempt_id="at-001",
    store_client=None  # Or LightningStoreClient(endpoint="...")
)

Source: examples/minimal/write_traces.py:40-60

Opening Rollouts and Emitting Spans

A rollout represents a single execution of a task by an agent. Each attempt within a rollout is one try at that task, which allows for retry logic and per-try tracking.

# Open a new rollout
tracer.open_rollout(rollout_id="ro-001", user_id="user-123")

# Open an attempt within the rollout
tracer.open_attempt(attempt_id="at-001", sequence_id=1)

# Emit spans during agent execution
tracer.emit_span(
    name="tool_execution",
    attributes={
        "tool": "web_search",
        "query": "onboarding summary"
    }
)

# Close attempt and rollout
tracer.close_attempt()
tracer.close_rollout()

Source: examples/minimal/write_traces.py:60-85

Span Data Model

Spans are the fundamental unit of tracing in Agent Lightning. Each span captures a discrete unit of work within an agent's execution.

Span Attributes

| Attribute | Type | Description |
|---|---|---|
| rollout_id | string | Unique identifier for the rollout |
| attempt_id | string | Unique identifier for the attempt |
| sequence_id | integer | Order of the span within the attempt |
| trace_id | string | Trace grouping identifier |
| span_id | string | Unique span identifier |
| parent_id | string | Parent span ID for hierarchy |
| name | string | Human-readable span name |
| status | TraceStatus | Execution status (OK, ERROR) |
| attributes | dict | Key-value metadata |
| start_time | float | Span start timestamp (Unix time, matching SpanCoreFields) |
| end_time | float | Span end timestamp (Unix time, matching SpanCoreFields) |

Source: dashboard/test-utils/python-server.py:1-100

Example Span Creation

import time

from agentlightning import Span, TraceStatus
from agentlightning.types.tracer import OtelResource

# Timestamps are floats (Unix time), matching SpanCoreFields.
now = time.time()

span = Span(
    rollout_id="ro-story-001",
    attempt_id="at-story-010",
    sequence_id=3,
    trace_id="trace-001-main",
    span_id="span-003-tool",
    parent_id="span-001-root",
    name="tool_execution",
    status=TraceStatus(status_code="OK", description=None),
    attributes={"tool": "web_search", "query": "onboarding summary"},
    events=[],
    links=[],
    start_time=now,
    end_time=now,
    context=None,
    parent=None,
    resource=OtelResource(attributes={"service.name": "tool-service"}, schema_url="")
)

Source: dashboard/test-utils/python-server.py:100-130

Using Operations

The framework provides an operation decorator for recording synthetic operation spans with additional linking capabilities.

from agentlightning import emit_reward
from agentlightning.operation import operation
from agentlightning.utils.otel import make_link_attributes, make_tag_attributes

# `llm` below is a placeholder for your own model client.

# Record an operation span
@operation(name="classify_ticket")
def classify_ticket(ticket: str):
    with make_link_attributes(linked_rollout_id="ro-001", linked_attempt_id="at-001"):
        # Operation execution
        result = llm.classify(ticket)
    
    # Tag the reward
    make_tag_attributes(tags={"accuracy": 0.95})
    emit_reward(reward=0.95, name="classification_accuracy")
    
    return result

Source: examples/minimal/write_traces.py:20-35

LightningStore Integration

The LightningStore keeps tracers and runners synchronized, serving as the central data repository.

from agentlightning import LightningStoreClient
from agentlightning.store import InMemoryLightningStore

# Use in-memory store for local development
store = InMemoryLightningStore()

# Or connect to a remote store server
store = LightningStoreClient(endpoint="http://localhost:45993")

Source: examples/minimal/write_traces.py:25-35

Store Server CLI

Start a LightningStore server with OTLP enabled:

agl store --port 45993 --log-level DEBUG

Source: examples/minimal/write_traces.py:15-20

Workflow Execution Model

Agents in Agent Lightning follow a structured execution model with rollouts, attempts, and spans.

graph LR
    subgraph Rollout[Rollout: ro-001]
        subgraph Attempt1[Attempt: at-001]
            S1[Span: root]
            S2[Span: preprocess]
            S3[Span: classify]
            S1 --> S2
            S2 --> S3
        end
        subgraph Attempt2[Attempt: at-002]
            S4[Span: root]
            S5[Span: preprocess]
            S6[Span: classify]
            S4 --> S5
            S5 --> S6
        end
    end

State Transitions

| State | Description |
|---|---|
| pending | Rollout/attempt created but not started |
| running | Currently executing |
| completed | Successfully finished |
| failed | Execution failed |
| cancelled | Manually cancelled |

Source: dashboard/src/components/RolloutTable.component.tsx:1-50

Reward Emission

Agents emit reward signals that algorithms consume during training.

from agentlightning import emit_reward

# Emit a reward with metadata
emit_reward(
    reward=0.85,
    name="task_success",
    attributes={
        "task_id": "classification",
        "accuracy": 0.85,
        "latency_ms": 150
    }
)

Reward Span Attributes

| Attribute | Type | Description |
|---|---|---|
| reward.value | float | Numeric reward value |
| reward.name | string | Reward signal identifier |
| reward.attributes | dict | Additional metadata |

Dashboard Integration

The Agent Lightning Dashboard provides real-time inspection of store data and debugging capabilities for running experiments. Source: dashboard/README.md:1-10

Drawer Components

The dashboard uses drawer components to display detailed information:

// Worker detail drawer
if (content.type === 'worker-detail') {
    const { worker } = content;
    const title = <WorkerDrawerTitle worker={worker} />;
    const body = <JsonEditor value={worker} />;
    return { title, body };
}

// Trace detail drawer
if (content.type === 'trace-detail') {
    const { span } = content;
    const title = <TraceDrawerTitle span={span} />;
    const body = <JsonEditor value={span} />;
    return { title, body };
}

Source: dashboard/src/components/AppDrawer.component.tsx:1-50

Minimal Examples Reference

The examples/minimal/ directory provides documented examples for each building block:

| Component | File | Purpose |
|---|---|---|
| LightningStore + OTLP | write_traces.py | Shows OtelTracer and AgentOpsTracer for rollouts and spans |
| MultiMetrics | write_metrics.py | Console and Prometheus metrics backends |
| LLM Proxying | llm_proxy.py | Request routing through /rollout/<id>/attempt/<id> namespaces |
| vLLM Lifecycle | vllm_server.py | Context manager for vLLM server lifecycle |

Source: examples/minimal/README.md:10-30

Best Practices

  1. Use descriptive span names: Names like tool_execution and classification_pipeline make debugging easier in the dashboard.
  2. Set appropriate parent IDs: Maintain span hierarchy for better trace visualization.
  3. Emit rewards consistently: Use emit_reward after each task completion to enable algorithm training.
  4. Handle failures explicitly: Set appropriate TraceStatus codes and descriptions for failed spans.
  5. Use operations for complex workflows: The @operation decorator simplifies recording complex multi-step processes.

Trainer Component

Related topics: Runner Component, LightningStore, Algorithm Zoo

The Trainer is the core orchestration component in Agent Lightning responsible for managing the reinforcement learning training loop. It coordinates runners, algorithms, and the LightningStore to execute agent training with scalable execution strategies.

Overview

The Trainer serves as the central control plane that:

  • Manages worker processes for parallel rollout execution
  • Coordinates between the agent runner and learning algorithm
  • Persists training traces to the LightningStore
  • Provides pluggable execution strategies for different deployment scenarios

Source: agentlightning/trainer/registry.py:1-6

Architecture

Component Interactions

graph TD
    T[Trainer] --> R[Runner<br/>Agent Execution]
    T --> A[Algorithm<br/>Policy Update]
    T --> S[LightningStore<br/>Trace Storage]
    T --> E[ExecutionStrategy]
    
    E --> SHM[SharedMemory<br/>Local Workers]
    E --> CS[ClientServer<br/>Remote Workers]
    
    R --> S
    A --> S

Training Loop Flow

sequenceDiagram
    participant T as Trainer
    participant R as Runner
    participant S as LightningStore
    participant A as Algorithm
    
    T->>R: Initialize with config
    T->>A: Load algorithm
    T->>S: Connect store
    
    loop Training Steps
        T->>R: Execute rollouts
        R->>S: Emit spans
        T->>S: Retrieve traces
        T->>A: Process traces
        A->>T: Policy update
    end

Core Configuration

Constructor Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| n_workers | int | 1 | Number of parallel worker processes |
| algorithm | `Algorithm \| str` | None | Learning algorithm (name or instance) |
| runner | `Runner \| None` | None | Agent runner for execution |
| reward_fn | `RewardFn \| None` | None | Reward function for training |
| execution_strategy | str | "shm" | Strategy: "shm", "cs" |

Source: examples/apo/apo_custom_algorithm_trainer.py:35-37

Execution Strategy Registry

The Trainer supports multiple execution strategies through a registry pattern:

ExecutionStrategyRegistry = {
    "shm": "agentlightning.execution.shared_memory.SharedMemoryExecutionStrategy",
    "cs": "agentlightning.execution.client_server.ClientServerExecutionStrategy",
}

Source: agentlightning/trainer/registry.py:1-6

| Strategy | Description | Use Case |
|---|---|---|
| shm | Shared Memory - Local multi-process execution | Single-node GPU training |
| cs | Client-Server - Remote worker communication | Distributed deployments |
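
A registry that maps short names to dotted import paths is typically resolved lazily with `importlib`. The sketch below shows that general technique. It uses standard-library classes as targets so that it runs anywhere; `REGISTRY_SKETCH` and `resolve` are hypothetical names, not the library's own lookup code.

```python
import importlib
from typing import Any, Dict

# Short name -> dotted path, in the same shape as the registry above.
# Standard-library targets are used here so the example is runnable.
REGISTRY_SKETCH: Dict[str, str] = {
    "ordered": "collections.OrderedDict",
    "fraction": "fractions.Fraction",
}


def resolve(name: str) -> Any:
    """Import the module part of the path and return the named attribute."""
    try:
        dotted_path = REGISTRY_SKETCH[name]
    except KeyError:
        known = ", ".join(sorted(REGISTRY_SKETCH))
        raise ValueError(f"unknown strategy {name!r}; known: {known}") from None
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


strategy_cls = resolve("ordered")
instance = strategy_cls()
```

Storing paths as strings means a strategy's module is imported only when that strategy is actually selected.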

Usage Patterns

Basic Training with GRPO Algorithm

from agentlightning import Trainer

trainer = Trainer(
    runner=runner,
    reward_fn=reward_fn,
    algorithm="GRPO"
)

trainer.train()

Source: contrib/recipes/agentos/README.md:40-47

Custom Algorithm Integration

The Trainer accepts custom algorithms decorated with the @algo decorator:

from agentlightning import Trainer
from agentlightning.algorithm import algo
from agentlightning.store import LightningStore

@algo
async def custom_algorithm(*, store: LightningStore):
    # Process traces from store
    return policy_update

trainer = Trainer(n_workers=1, algorithm=custom_algorithm)
trainer.fit(rollout_fn)

Source: examples/apo/apo_custom_algorithm_trainer.py:28-37

Parallel Training with Multiple Workers

from agentlightning import Trainer

trainer = Trainer(
    n_workers=4,           # 4 parallel workers
    execution_strategy="shm",  # Shared memory for local execution
    algorithm="PPO",
    runner=runner
)

trainer.train()

Integration with Agent-OS

The Trainer integrates with Agent-OS for policy-governed training:

from agentlightning import Trainer
from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward
from agent_os import KernelSpace
from agent_os.policies import SQLPolicy

# Create governed kernel
kernel = KernelSpace(policy=SQLPolicy(deny=["DROP", "DELETE"]))

# Wrap in Agent-OS runner
runner = AgentOSRunner(kernel)

# Train with policy-aware rewards
trainer = Trainer(
    runner=runner,
    reward_fn=PolicyReward(kernel),
    algorithm="GRPO"
)

trainer.train()

Source: contrib/recipes/agentos/README.md:25-45

Workflow Phases

| Phase | Description |
|---|---|
| Initialization | Load algorithm, connect store, spawn workers |
| Rollout | Execute agent episodes in parallel workers |
| Trace Collection | Retrieve spans from LightningStore |
| Algorithm Update | Process traces and update policy |
| Iteration | Repeat rollout-collect-update cycle |
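
The phases above can be sketched as a single-process loop. In this toy version a list stands in for LightningStore, a stub function stands in for the agent, and the "algorithm" simply keeps the candidate prompt with the highest mean reward. Every name here is hypothetical, and the selection rule is a simplification rather than GRPO or APO.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rollout(prompt: str, task: Dict[str, str]) -> float:
    # Stub agent: rewards prompts that mention capacity, ignoring the task.
    return 1.0 if "capacity" in prompt else 0.0


def training_loop(
    candidates: List[str], tasks: List[Dict[str, str]], iterations: int = 2
) -> Tuple[str, List[dict]]:
    store: List[dict] = []  # Initialization: stands in for LightningStore
    best = candidates[0]
    for step in range(iterations):
        # Rollout: run every candidate on every task and record a trace.
        for prompt in candidates:
            for task in tasks:
                store.append(
                    {"step": step, "prompt": prompt, "reward": rollout(prompt, task)}
                )
        # Trace collection: read back only this step's traces.
        traces = [t for t in store if t["step"] == step]
        # Algorithm update: keep the prompt with the highest mean reward.
        best = max(
            candidates,
            key=lambda p: mean(t["reward"] for t in traces if t["prompt"] == p),
        )
    return best, store


candidates = ["Pick a room.", "Pick a room that fits the required capacity."]
tasks = [{"task": "meeting for 3"}, {"task": "event for 15"}]
best_prompt, traces = training_loop(candidates, tasks)
```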

LightningStore Integration

The Trainer maintains bidirectional synchronization with LightningStore:

  • Span Emission: Workers emit execution traces during rollout
  • Trace Retrieval: Algorithm reads completed traces for learning
  • Persistence: Training state survives worker restarts

Source: CLAUDE.md:4-6

Command-Line Interface

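Training can run as separate processes: start the store with the agl CLI, then launch the algorithm and the runner from your training script:

# Start the store, then the algorithm and runner processes
agl store
python my_training_script.py algo
python my_training_script.py runner

Or run everything in a single process:

python my_training_script.py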

Source: examples/apo/apo_custom_algorithm_trainer.py:12-20

Extending the Trainer

Custom Execution Strategy

Add new strategies to the registry:

# In agentlightning/trainer/registry.py
ExecutionStrategyRegistry["custom"] = "mymodule.CustomExecutionStrategy"

Custom Algorithm

Decorate async functions with @algo:

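from agentlightning import LightningStore
from agentlightning.algorithm import algo

@algo
async def my_algorithm(*, store: LightningStore):
    # Illustrative query; check the store interface for the actual retrieval method
    traces = await store.traces.get_all()
    # Process traces and derive a policy update (placeholder value here)
    update = None
    return update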

Dependencies

| Dependency | Purpose |
| --- | --- |
| LightningStore | Trace persistence and retrieval |
| Algorithm | Policy learning logic |
| Runner | Agent execution environment |
| ExecutionStrategy | Worker orchestration |
| RewardFn | Training signal computation |

Source: https://github.com/microsoft/agent-lightning / Human Manual

Runner Component

Related topics: Trainer Component, Tutorial: Writing Agents

The Runner component is the core execution engine in Agent Lightning responsible for managing agent lifecycle, task processing, and telemetry collection. Runners serve as the bridge between the high-level Trainer orchestration and the underlying LitAgent implementation, handling initialization, worker management, and graceful shutdown.

Overview

Runners execute agents in a continuous loop where they poll the LightningStore for tasks, execute agent logic, and emit tracing spans for algorithm consumption. The Runner architecture supports both standard execution through LitAgentRunner and legacy compatibility through LegacyAgentRunner.

Source: agentlightning/runner/__init__.py:1-11

from .agent import LitAgentRunner
from .base import Runner
from .legacy import LegacyAgentRunner

__all__ = [
    "Runner",
    "LegacyAgentRunner",
    "LitAgentRunner",
]

Architecture

graph TD
    A[Trainer] --> B[Runner Fleet]
    B --> C[LitAgentRunner]
    B --> D[LegacyAgentRunner]
    C --> E[LitAgent]
    D --> F[AgentLightningClient]
    E --> G[LightningStore]
    F --> G
    E --> H[Tracer]
    H --> G

Runner Hierarchy

| Class | Purpose | Source |
| --- | --- | --- |
| Runner | Abstract base class defining the runner interface | base.py |
| LitAgentRunner | Primary runner implementation for standard agent execution | agent.py |
| LegacyAgentRunner | Runner for backward compatibility with AgentOps integration | legacy.py |

Runner Base Class

The Runner class defines the core interface that all runner implementations must follow. It establishes the lifecycle methods and execution patterns.

Source: agentlightning/runner/base.py:1-20

Lifecycle Methods

The runner lifecycle moves through five stages: initialization, per-worker initialization, execution (iter/step), per-worker teardown, and teardown:

graph LR
    A[init] --> B[init_worker]
    B --> C[iter/step]
    C --> D[teardown_worker]
    D --> E[teardown]

| Method | Purpose | Must Implement |
| --- | --- | --- |
| init(agent, hooks) | Initialize runner with agent and hooks | Yes |
| init_worker(worker_id, store) | Per-worker initialization with store | Yes |
| teardown() | Release resources from init() | Yes |
| teardown_worker(worker_id) | Release per-worker resources | Yes |
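
The required methods can be modeled with an abstract base class. This is a minimal sketch of the pattern, not the library's `Runner`: `SketchRunner` and `RecordingRunner` are hypothetical, and the method bodies only record the call order.

```python
from abc import ABC, abstractmethod


class SketchRunner(ABC):
    """Hypothetical base class listing the four methods marked 'Must Implement'."""

    @abstractmethod
    def init(self, agent, hooks=None): ...

    @abstractmethod
    def init_worker(self, worker_id, store): ...

    @abstractmethod
    def teardown_worker(self, worker_id): ...

    @abstractmethod
    def teardown(self): ...


class RecordingRunner(SketchRunner):
    """Concrete runner that records the order in which it is driven."""

    def __init__(self):
        self.calls = []

    def init(self, agent, hooks=None):
        self.calls.append("init")

    def init_worker(self, worker_id, store):
        self.calls.append("init_worker")

    def step(self):
        self.calls.append("step")

    def teardown_worker(self, worker_id):
        self.calls.append("teardown_worker")

    def teardown(self):
        self.calls.append("teardown")


# Drive the lifecycle in the order shown in the diagram
runner = RecordingRunner()
runner.init(agent=object(), hooks=[])
runner.init_worker(worker_id=0, store={})
runner.step()
runner.teardown_worker(worker_id=0)
runner.teardown()
```

Teardown mirrors initialization in reverse: per-worker resources are released before the resources acquired in `init()`.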

Context Manager Pattern

Runners support a context manager pattern for automatic resource management:

with runner.run_context(agent=agent, store=store, hooks=hooks) as runner:
    # Runner is initialized and ready
    await runner.iter()
# Automatic teardown on exit

Source: agentlightning/runner/base.py:52-86

The run_context helper ensures proper cleanup even when exceptions occur:

try:
    self.init(agent=agent, hooks=hooks)
    _initialized = True
    self.init_worker(worker_id=0, store=store)
    _worker_initialized = True
    yield self
finally:
    try:
        if _worker_initialized:
            self.teardown_worker(worker_id=worker_id if worker_id is not None else 0)
    except Exception:
        logger.error("Error during runner worker teardown", exc_info=True)

    try:
        if _initialized:
            self.teardown()
    except Exception:
        logger.error("Error during runner teardown", exc_info=True)

Execution Methods

| Method | Description | Behavior |
| --- | --- | --- |
| iter(event) | Run continuously until event or no tasks | Abstract - subclasses implement |
| step() | Execute single unit of work | Abstract - subclasses implement |
| run() | Legacy run method | Raises RuntimeError - use iter() or step() |

Source: agentlightning/runner/base.py:88-102

Warning: The run() method raises RuntimeError because its behavior is undefined. Always use iter() or step() instead.
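
The relationship between `step()`, `iter(event)`, and `run()` can be sketched with a queue-backed runner. `QueueRunner` is hypothetical: it pulls tasks from a local deque rather than a store, and it assumes the stop signal is an `asyncio.Event`, which may differ from the event type the library uses.

```python
import asyncio
from collections import deque
from typing import Optional


class QueueRunner:
    """Hypothetical runner that processes string tasks from a local queue."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.done = []

    async def step(self) -> bool:
        """Process one task; return False when nothing is left."""
        if not self.pending:
            return False
        task = self.pending.popleft()
        await asyncio.sleep(0)  # yield control, as real agent work would
        self.done.append(task.upper())
        return True

    async def iter(self, event: Optional[asyncio.Event] = None) -> None:
        """Keep stepping until the event is set or no tasks remain."""
        while event is None or not event.is_set():
            if not await self.step():
                break

    def run(self):
        raise RuntimeError("run() is not supported; use iter() or step()")


async def demo():
    runner = QueueRunner(["a", "b", "c"])
    await runner.step()        # one unit of work
    stop = asyncio.Event()
    await runner.iter(stop)    # drain the remaining tasks
    return runner.done


processed = asyncio.run(demo())
```

`step()` suits debugging one rollout at a time, while `iter()` is the long-running form that a supervising process can stop by setting the event.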

LitAgentRunner

LitAgentRunner is the primary runner implementation that manages the agent-runner relationship, hook registration, and tracer integration.

Source: agentlightning/runner/agent.py:1-30

Initialization Flow

sequenceDiagram
    participant Trainer
    participant LitAgentRunner
    participant LitAgent
    participant Tracer
    participant LightningStore

    Trainer->>LitAgentRunner: init(agent, hooks)
    LitAgentRunner->>LitAgent: set_runner(self)
    LitAgentRunner->>Tracer: init()
    Trainer->>LitAgentRunner: init_worker(worker_id, store)
    LitAgentRunner->>Tracer: init_worker(worker_id, store)

Key Properties

| Property | Type | Description |
| --- | --- | --- |
| agent | LitAgent[T_task] | The agent instance (via get_agent()) |
| store | LightningStore | The backing store (via get_store()) |
| worker_id | Optional[int] | Unique worker identifier |
| tracer | Tracer | Tracer for span emission |

Source: agentlightning/runner/agent.py:90-110

Accessor Methods

def get_agent(self) -> LitAgent[T_task]:
    """Get the agent instance."""
    if self._agent is None:
        raise ValueError("Agent not initialized. Call init() first.")
    return self._agent

def get_store(self) -> LightningStore:
    """Get the store instance."""
    if self._store is None:
        raise ValueError("Store not initialized. Call init_worker() first.")
    return self._store

def get_worker_id(self) -> str:
    """Get the formatted worker ID string."""
    return f"Worker-{self.worker_id}" if self.worker_id is not None else "Worker-Unknown"

Logging Prefix

The _log_prefix() method generates consistent log prefixes for traceability:

def _log_prefix(self, rollout_id: Optional[str] = None) -> str:
    """Generate a standardized log prefix for the current worker."""
    # Returns format: "[Worker-{id}] [{rollout_id}]"
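
Since the excerpt above shows only the signature and the documented format, here is one way such a prefix could be built. This is a sketch, not the library's implementation: `format_worker_id` and `log_prefix` are hypothetical free functions, and omitting the rollout segment when no rollout ID is given is an assumption.

```python
from typing import Optional


def format_worker_id(worker_id: Optional[int]) -> str:
    # Same formatting rule as get_worker_id() above
    return f"Worker-{worker_id}" if worker_id is not None else "Worker-Unknown"


def log_prefix(worker_id: Optional[int], rollout_id: Optional[str] = None) -> str:
    """Build "[Worker-{id}] [{rollout_id}]"; the rollout part is optional (assumption)."""
    prefix = f"[{format_worker_id(worker_id)}]"
    if rollout_id is not None:
        prefix += f" [{rollout_id}]"
    return prefix


example_prefix = log_prefix(3, "ro-1")
```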

LegacyAgentRunner

LegacyAgentRunner provides backward compatibility for workflows using the AgentOps integration and AgentLightningClient communication pattern.

Source: agentlightning/runner/legacy.py:1-35

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| agent | LitAgent[Any] | The agent instance |
| client | AgentLightningClient | Server communication client |
| tracer | Tracer | Tracer instance for span emission |
| worker_id | Optional[str] | Worker identifier |
| max_tasks | Optional[int] | Maximum tasks before stopping |

Architecture

graph TD
    A[LegacyAgentRunner] --> B[LitAgent]
    A --> C[AgentLightningClient]
    A --> D[Tracer]
    C --> E[Server]
    D --> F[LightningStore]
    B --> F

Hook System Integration

Runners integrate with the hook system to provide extensibility at key lifecycle points:

Source: agentlightning/types/core.py:1-30

| Hook | Timing | Purpose |
| --- | --- | --- |
| on_trace_start | Before tracer enters trace context | Logging, metric collection, resource setup |
| on_trace_end | After rollout completes, before tracer exits | Logging, cleanup |
| on_rollout_start | Before rollout attempt begins | Per-attempt initialization |
| on_rollout_end | After rollout attempt completes | Result processing, cleanup |

Hooks are registered during initialization and called by the runner at appropriate points during execution.
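
A hook dispatcher along these lines can be sketched in a few lines. The classes below are hypothetical and synchronous for brevity, and the nesting shown (rollout hooks wrapping trace hooks) is an assumption made for illustration; the table above gives the timing of each hook but not their relative order.

```python
class Hook:
    """Hypothetical base class: every callback is a no-op by default."""

    def on_trace_start(self, rollout_id): pass
    def on_trace_end(self, rollout_id): pass
    def on_rollout_start(self, rollout_id): pass
    def on_rollout_end(self, rollout_id): pass


class RecordingHook(Hook):
    """Overrides two callbacks and records when they fire."""

    def __init__(self):
        self.events = []

    def on_rollout_start(self, rollout_id):
        self.events.append(("rollout_start", rollout_id))

    def on_rollout_end(self, rollout_id):
        self.events.append(("rollout_end", rollout_id))


def trigger(hooks, name, rollout_id):
    # Call the named callback on every registered hook, in registration order
    for hook in hooks:
        getattr(hook, name)(rollout_id)


def run_rollout(hooks, rollout_id):
    trigger(hooks, "on_rollout_start", rollout_id)
    trigger(hooks, "on_trace_start", rollout_id)
    result = f"result-for-{rollout_id}"  # stands in for the agent's work
    trigger(hooks, "on_trace_end", rollout_id)
    trigger(hooks, "on_rollout_end", rollout_id)
    return result


hook = RecordingHook()
outcome = run_rollout([hook], "ro-7")
```

Giving the base class no-op defaults lets a hook override only the callbacks it cares about.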

Trainer Integration

Runners are instantiated and managed by the Trainer class, which orchestrates the entire training loop:

Source: agentlightning/trainer/trainer.py:40-60

class Trainer(TrainerLegacy):
    """High-level orchestration layer that wires Algorithm <-> Runner <-> Store."""
    
    # Runner fleet configuration
    n_runners: int  # Number of agent runners to run in parallel
    max_rollouts: Optional[int]  # Maximum rollouts per runner
    strategy: ExecutionStrategy  # Process management strategy
    tracer: Tracer  # Tracer instance for telemetry
    hooks: Sequence[Hook]  # Lifecycle callbacks

Training Configuration

| Parameter | Type | Description |
| --- | --- | --- |
| n_runners | int | Number of parallel agent runners |
| max_rollouts | Optional[int] | Stop after N rollouts (None = unlimited) |
| strategy | ExecutionStrategy | Spawning strategy (shared memory, client/server) |
| tracer | Tracer | Tracer class or config for span collection |
| hooks | Sequence[Hook] | Lifecycle callback instances |

Execution Flow

graph TD
    A[Trainer.fit/dev] --> B[Spawn Runner Fleet]
    B --> C[For each Runner]
    C --> D[runner.run_context]
    D --> E[init + init_worker]
    E --> F[iter/event loop]
    F --> G{Tasks available?}
    G -->|Yes| H[Execute step]
    H --> I[Emit spans to Store]
    I --> F
    G -->|No| J[Exit loop]
    J --> K[teardown_worker]
    K --> L[teardown]

Context Manager Usage

For debugging or standalone usage outside the Trainer stack:

import asyncio

from agentlightning import AgentOpsTracer, InMemoryLightningStore, LitAgentRunner


async def main():
    # Create store and agent (MyLitAgent is your own LitAgent subclass)
    store = InMemoryLightningStore()
    agent = MyLitAgent()

    # Use context manager
    runner = LitAgentRunner(tracer=AgentOpsTracer())
    with runner.run_context(agent=agent, store=store) as runner:
        # Runner initialized and ready
        worker_id = runner.get_worker_id()
        print(f"Running on {worker_id}")

        # Run until complete
        await runner.iter()
    # Automatic cleanup


asyncio.run(main())

Source: agentlightning/runner/base.py:88-113

Error Handling

Runners implement robust error handling during teardown:

| Phase | Error Behavior | Recovery |
| --- | --- | --- |
| teardown_worker | Logged but doesn't propagate | Continue to teardown |
| teardown | Logged but doesn't propagate | Context manager completes |

This ensures that multiple cleanup errors don't mask the original failure and that partial cleanup still occurs.
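
The behavior in the table can be reproduced with `contextlib.contextmanager`. The sketch below is a simplified model, not the library's `run_context`: `FlakyRunner` is hypothetical and its `teardown_worker` always fails, so the example shows that teardown still runs and the original exception still reaches the caller.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("runner_sketch")
logger.addHandler(logging.NullHandler())  # keep the example's output quiet


class FlakyRunner:
    """Hypothetical runner whose per-worker cleanup always fails."""

    def __init__(self):
        self.events = []

    def init(self):
        self.events.append("init")

    def init_worker(self, worker_id):
        self.events.append("init_worker")

    def teardown_worker(self, worker_id):
        self.events.append("teardown_worker")
        raise OSError("worker cleanup failed")

    def teardown(self):
        self.events.append("teardown")


@contextmanager
def run_context(runner, worker_id=0):
    initialized = worker_initialized = False
    try:
        runner.init()
        initialized = True
        runner.init_worker(worker_id)
        worker_initialized = True
        yield runner
    finally:
        # Each cleanup step is isolated so one failure cannot skip the next
        try:
            if worker_initialized:
                runner.teardown_worker(worker_id)
        except Exception:
            logger.error("Error during runner worker teardown", exc_info=True)
        try:
            if initialized:
                runner.teardown()
        except Exception:
            logger.error("Error during runner teardown", exc_info=True)


flaky = FlakyRunner()
caught = None
try:
    with run_context(flaky):
        raise ValueError("rollout failed")
except ValueError as exc:
    caught = exc  # the original failure, not the cleanup error
```

The caller sees the `ValueError` from the rollout, while the `OSError` from cleanup is only logged.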

Summary

The Runner component provides:

  1. Lifecycle Management - Consistent init/teardown patterns via context managers
  2. Worker Isolation - Per-worker initialization with dedicated store connections
  3. Hook Integration - Extensibility through lifecycle callbacks
  4. Telemetry - Built-in tracer integration for span emission
  5. Trainer Integration - Seamless orchestration within the training loop

Runners are the execution backbone of Agent Lightning, translating high-level training commands into agent task processing while maintaining observability through distributed tracing.

LightningStore

Related topics: System Architecture, Trainer Component

LightningStore is the central data persistence and synchronization layer in Agent Lightning. It manages the lifecycle of AI agent training workflows, including rollouts, attempts, spans, resources, and worker state. The store serves as the backbone for the training loop, enabling distributed execution, tracing, and experiment tracking.

Overview

LightningStore provides a unified interface for:

  • Rollout Management: Tracking agent task executions from enqueue to completion
  • Span Recording: Capturing fine-grained traces of agent operations via OpenTelemetry
  • Resource Management: Storing and versioning agent configurations, prompts, and model definitions
  • Worker Coordination: Managing distributed worker states and heartbeats
  • Metrics Collection: Aggregating training metrics through Prometheus integration

Source: agentlightning/store/base.py

Architecture

LightningStore follows a pluggable backend architecture with a unified async interface.

graph TD
    subgraph "Client Layer"
        Runner[Runner] --> Tracer[Tracers<br/>OtelTracer<br/>AgentOpsTracer]
        Tracer --> Client[LightningStoreClient]
    end
    
    subgraph "Server Layer"
        Client --> |HTTP/gRPC| Server[LightningStoreServer]
        Server --> Collections[LightningCollections]
    end
    
    subgraph "Storage Backends"
        Collections --> InMemory[InMemoryLightningStore]
        Collections --> SQLite[SQLiteLightningStore]
        Collections --> Mongo[MongoLightningStore]
    end
    
    subgraph "Thread Safety"
        Store[Any Store] --> Threaded[LightningStoreThreaded]
    end

Core Components

| Component | Purpose |
| --- | --- |
| LightningStore | Abstract base class defining the store interface |
| LightningStoreClient | HTTP client for remote store communication |
| LightningStoreServer | FastAPI-based server handling store operations |
| LightningCollections | Organized data collections (rollouts, spans, resources, workers) |
| LightningStoreThreaded | Thread-safe wrapper for concurrent access |

Source: agentlightning/store/client_server.py

Data Models

Core Types

The store operates on these fundamental data structures:

| Model | Description |
| --- | --- |
| Rollout | A complete task execution with status, timestamps, and metadata |
| Attempt | A single attempt within a rollout (supports retries) |
| Span | Fine-grained trace data for agent operations |
| TaskInput | Input data for a task (prompt, parameters) |
| Worker | Worker node state and heartbeat information |
| ResourcesUpdate | Versioned resource configuration storage |
| RolloutConfig | Configuration for rollout execution |

Source: agentlightning/types/core.py

Rollout Status Lifecycle

stateDiagram-v2
    [*] --> Pending: enqueue_rollout
    Pending --> Running: start_rollout
    Running --> Completed: finish_rollout
    Running --> Failed: fail_rollout
    Completed --> [*]
    Failed --> [*]
    
    Running --> Attempted: start_attempt
    Attempted --> Running: finish_attempt

The status values are:

  • pending - Queued for execution
  • running - Currently executing
  • completed - Successfully finished
  • failed - Execution failed

Source: agentlightning/store/base.py
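
The four status values and the transitions between them can be written as a small state machine. This is an illustrative sketch: `RolloutStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, and the attempt sub-cycle from the diagram is left out for brevity.

```python
from enum import Enum


class RolloutStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Which statuses may follow each status; terminal states allow none
ALLOWED_TRANSITIONS = {
    RolloutStatus.PENDING: {RolloutStatus.RUNNING},
    RolloutStatus.RUNNING: {RolloutStatus.COMPLETED, RolloutStatus.FAILED},
    RolloutStatus.COMPLETED: set(),
    RolloutStatus.FAILED: set(),
}


def transition(current: RolloutStatus, target: RolloutStatus) -> RolloutStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = RolloutStatus.PENDING
status = transition(status, RolloutStatus.RUNNING)
status = transition(status, RolloutStatus.COMPLETED)
```

Encoding the transitions as data makes an invalid move, such as restarting a completed rollout, fail loudly instead of silently corrupting state.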

API Endpoints

The server exposes REST endpoints under /v1/agl:

| Endpoint | Method | Description |
| --- | --- | --- |
| /rollouts | POST | Enqueue new rollouts |
| /rollouts/{id} | GET | Retrieve rollout by ID |
| /rollouts/{id}/start | POST | Mark rollout as started |
| /rollouts/{id}/finish | POST | Complete a rollout |
| /rollouts/{id}/attempt | POST | Start a new attempt |
| /rollouts/{id}/attempt/{aid}/finish | POST | Finish an attempt |
| /spans | POST | Record span data |
| /spans/search | POST | Query spans with filters |
| /resources | POST | Add new resources |
| /resources/{id} | GET/PUT | Get or update resources |
| /workers | POST | Register worker |
| /workers/{id}/heartbeat | POST | Worker heartbeat |
| /statistics | GET | Store statistics |

Source: agentlightning/store/client_server.py
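
When calling these endpoints by hand, the full URL is the server address, the `/v1/agl` prefix, and the endpoint path. The helper below is a hypothetical convenience, not part of the library; the host, the port, and the rollout ID are example values.

```python
from urllib.parse import quote

API_PREFIX = "/v1/agl"


def endpoint_url(base_url: str, *segments: str) -> str:
    """Join path segments under the API prefix, escaping each segment."""
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"{base_url.rstrip('/')}{API_PREFIX}/{path}"


# Example: the "mark rollout as started" endpoint for a made-up rollout ID
start_url = endpoint_url("http://localhost:45993", "rollouts", "ro-123", "start")
```

Escaping each segment separately keeps an ID that contains a slash from being read as an extra path level.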

Implementation Backends

In-Memory Store

The InMemoryLightningStore provides a lightweight, zero-dependency backend suitable for single-node execution and testing.

Key characteristics:

  • All data stored in process memory
  • Supports collections with atomic transactions
  • Built-in size estimation for memory monitoring
  • Fast for development and small-scale experiments

from agentlightning import InMemoryLightningStore

store = InMemoryLightningStore()

Source: agentlightning/store/memory.py

SQLite Store

SQLite backend provides persistent storage with ACID guarantees, suitable for single-node deployments requiring durability.

MongoDB Store

MongoDB backend supports distributed deployments with horizontal scaling, providing high throughput for large-scale training runs.

Thread Safety

The LightningStoreThreaded class wraps any store implementation to provide thread-safe access:

from agentlightning.store.threading import LightningStoreThreaded

# Wrap any store with thread safety
threaded_store = LightningStoreThreaded(store)

Thread safety features:

  • Uses threading.Lock for synchronization
  • Guarantees atomic operations across concurrent requests
  • Maintains all original store capabilities
  • Exposes thread_safe: True and async_safe: True in capabilities

Source: agentlightning/store/threading.py
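
The wrapping idea can be shown with a generic lock proxy. This is a sketch of the technique only: `CounterStore` and `LockedStore` are hypothetical, the wrapped methods are synchronous, and the real `LightningStoreThreaded` may be structured differently.

```python
import threading


class CounterStore:
    """Hypothetical store with a read-modify-write method that is unsafe on its own."""

    def __init__(self):
        self.count = 0

    def increment(self):
        current = self.count
        self.count = current + 1


class LockedStore:
    """Proxy that runs every method of the wrapped store under one lock."""

    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself
        attr = getattr(self._store, name)
        if not callable(attr):
            return attr

        def locked(*args, **kwargs):
            with self._lock:
                return attr(*args, **kwargs)

        return locked


shared = LockedStore(CounterStore())


def hammer():
    for _ in range(1000):
        shared.increment()


threads = [threading.Thread(target=hammer) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = shared.count
```

Because the proxy forwards every attribute, the wrapped store keeps its full interface while callers gain mutual exclusion.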

Collection Operations

LightningStore uses a collection-based data organization pattern:

# Atomic write operation
async with store.collections.atomic(mode="w", snapshot=..., labels=["resources"]) as collections:
    await collections.resources.insert([update])

Supported Collections

| Collection | Purpose |
| --- | --- |
| rollouts | Task execution records |
| attempts | Individual attempt tracking |
| spans | OpenTelemetry trace spans |
| resources | Versioned configurations |
| workers | Worker state management |

Source: agentlightning/store/collection_based.py

Decorators and Instrumentation

The store layer uses several decorators for observability and reliability:

| Decorator | Purpose |
| --- | --- |
| @tracked | Records operation metrics and timing |
| @healthcheck_before | Validates store health before operations |
| @_with_collections_execute | Manages collection lifecycle and error handling |

Integration with Tracers

LightningStore integrates with OpenTelemetry through tracers:

from agentlightning import OtelTracer, AgentOpsTracer

tracer = OtelTracer(store=store)

Tracing workflow:

sequenceDiagram
    participant Agent
    participant Tracer
    participant Store
    participant OTLP
    
    Agent->>Tracer: Create span
    Tracer->>Store: Record span data
    Store->>Store: Persist to backend
    Tracer->>OTLP: Export spans (optional)

Source: examples/minimal/write_traces.py

Usage Examples

Basic Store Operations

from agentlightning import InMemoryLightningStore

# Create store
store = InMemoryLightningStore()

# Enqueue a task
rollout = await store.enqueue_rollout(
    input={"prompt": "Solve this problem"},
    mode="train"
)

# Dequeue for processing
task = await store.dequeue_rollout(worker_id="worker-1")

# Complete the rollout
await store.finish_rollout(
    rollout_id=task.rollout.rollout_id,
    attempt_id=task.attempt.attempt_id,
    response={"answer": "42"}
)

Server Setup

# Start a LightningStore server
agl store --port 45993 --log-level DEBUG

Client Connection

from agentlightning import LightningStoreClient

client = LightningStoreClient(base_url="http://localhost:45993")

# All operations work through the client
rollouts = await client.list_rollouts()

Capabilities

The store reports its capabilities through the capabilities property:

| Capability | Description |
| --- | --- |
| async_safe | Supports async operations |
| thread_safe | Supports concurrent thread access |
| distributed | Supports multi-node deployment |
| persistence | Data survives restarts |

Source: agentlightning/store/threading.py

CLI Commands

The agl CLI provides store management:

# Start store server
agl store --port 45993 --log-level DEBUG

# Prometheus metrics endpoint
agl prometheus

Source: agentlightning/cli/__init__.py

Algorithm Zoo

Related topics: Tutorial: Train Your First Agent

Overview

The Algorithm Zoo is a modular collection of training algorithms that consume execution traces from the Agent Lightning runtime to improve agent behavior through reinforcement learning and prompt optimization. Source: CLAUDE.md

Agent Lightning runs through a continuous loop where runners and tracers emit spans, LightningStore keeps them synchronized, and algorithms in agentlightning/algorithm/ consume those traces to improve behavior. Source: CLAUDE.md

Architecture

The Algorithm Zoo follows a producer-consumer pattern where the store acts as the central synchronization hub:

graph TD
    A[Runners] -->|emit spans| B[LightningStore]
    C[Tracers] -->|emit spans| B
    B -->|traces| D[Algorithm Zoo]
    D -->|policy updates| E[Improved Agent Behavior]
    B -->|traces| F[Dashboard]

Available Algorithms

APO (Automatic Prompt Optimization)

APO is a prompt optimization algorithm that iteratively refines prompt templates based on reward signals collected from agent rollouts.

How APO Works

The APO algorithm maintains a collection of prompt candidates and evaluates each one against task objectives. Based on the reward signals, it selects and refines the most effective prompts. Source: examples/apo/apo_custom_algorithm.py:34-37

async def apo_algorithm(*, store: agl.LightningStore):
    """
    An example of how a prompt optimization works.
    """
    prompt_candidates = [
        "You are a helpful assistant. {any_question}",
        "You are a knowledgeable AI. {any_question}",
        "You are a friendly chatbot. {any_question}",
    ]

    prompt_and_rewards: list[tuple[str, float]] = []

Custom APO Algorithm

To create a custom algorithm, wrap your async function with the @algo decorator. Source: examples/apo/apo_custom_algorithm_trainer.py:28-39

from agentlightning.algorithm import algo

@algo
async def apo_algorithm_usable_in_trainer(*, store: LightningStore):
    """
    You need to wrap the apo_algorithm in an algo decorator to make it usable in trainer.
    """
    return await apo_algorithm(store=store)

VERL (Volcano Engine Reinforcement Learning)

VERL is a full training algorithm that integrates with the VERL library for GPU-accelerated reinforcement learning. Source: examples/tinker/q20_train.py:43-52

algo_verl_parser = subparsers.add_parser("verl", help="Launch the full training algorithm with VERL.")
algo_verl_parser.add_argument("--port", type=int, default=4747, help="Port for the AgentLightning store.")
algo_verl_parser.add_argument(
    "--model",
    choices=("qwen25", "qwen3"),
    default="qwen3",
    help="Model variant to train.",
)
algo_verl_parser.add_argument("--search", action="store_true", help="Enable search tool.")
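
The excerpt above assumes that a top-level parser and a `subparsers` object already exist. A self-contained version might look like the sketch below; `build_parser`, the parser description, and the `dest="command"` name are assumptions, while the `verl` arguments are copied from the excerpt.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    subparsers = parser.add_subparsers(dest="command", required=True)

    verl = subparsers.add_parser("verl", help="Launch the full training algorithm with VERL.")
    verl.add_argument("--port", type=int, default=4747, help="Port for the AgentLightning store.")
    verl.add_argument(
        "--model",
        choices=("qwen25", "qwen3"),
        default="qwen3",
        help="Model variant to train.",
    )
    verl.add_argument("--search", action="store_true", help="Enable search tool.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable
args = build_parser().parse_args(["verl", "--model", "qwen25", "--search"])
```

Options that are not passed keep their defaults, so `args.port` is 4747 here.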

FAST (Fast Algorithm Suite Toolkit)

The FAST algorithm provides lightweight optimization capabilities for rapid experimentation.

Running Algorithms

Option A: Separate Components

Start the store, algorithm, and runner in three separate terminals: Source: examples/apo/README.md:10-24

# Terminal 1: Start the store
agl store

# Terminal 2: Run the algorithm
python apo_custom_algorithm.py algo

# Terminal 3: Run the rollout runner
python apo_custom_algorithm.py runner

Option B: Integrated Trainer

Use the integrated trainer that handles all components: Source: examples/apo/apo_custom_algorithm_trainer.py:47-49

from agentlightning import Trainer, setup_logging

trainer = Trainer(n_workers=1, algorithm=apo_algorithm_usable_in_trainer)
trainer.fit(apo_rollout)

Algorithm Decorator

The @algo decorator transforms any async algorithm function into a component that can be used with the Trainer. It injects the LightningStore as a keyword argument. Source: examples/apo/apo_custom_algorithm_trainer.py:28-39
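
To make the idea concrete, here is a minimal sketch of a decorator that turns an async function into an object which receives a store and passes it along as a keyword argument. It is not the library's `@algo`: `algo_sketch`, `FunctionalAlgorithm`, `set_store`, and `run` are hypothetical names, and a plain dict stands in for the store.

```python
import asyncio
import inspect


class FunctionalAlgorithm:
    """Hypothetical wrapper holding an async function and the store it should receive."""

    def __init__(self, fn):
        if not inspect.iscoroutinefunction(fn):
            raise TypeError("algorithm must be an async function")
        self._fn = fn
        self._store = None
        self.__name__ = fn.__name__

    def set_store(self, store) -> None:
        self._store = store

    async def run(self):
        if self._store is None:
            raise ValueError("store not set; call set_store() first")
        # The store is injected as a keyword argument
        return await self._fn(store=self._store)


def algo_sketch(fn):
    """Decorator form of the wrapper above."""
    return FunctionalAlgorithm(fn)


@algo_sketch
async def count_traces(*, store):
    return len(store["traces"])


count_traces.set_store({"traces": ["t1", "t2"]})
trace_count = asyncio.run(count_traces.run())
```

Declaring `store` as keyword-only (the `*` in the signature) matches the convention used by the algorithm functions in this section.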

Algorithm Configuration

Common Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| store | LightningStore | Central store for traces and resources |
| n_workers | int | Number of parallel workers |
| port | int | Port for store connection (default: 4747) |

VERL-Specific Options

| Option | Choices | Default | Description |
| --- | --- | --- | --- |
| --model | qwen25, qwen3 | qwen3 | Model variant to train |
| --port | int | 4747 | Store connection port |
| --search | flag | False | Enable search tool |

Workflow

graph LR
    A[Define Prompt Candidates] --> B[Loop Through Candidates]
    B --> C[Update Resources in Store]
    C --> D[Run Rollout with Runner]
    D --> E[Collect Reward Signal]
    E --> F[Update Prompt Template]
    F --> B
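
The loop in the diagram can be sketched without the library. Everything below is illustrative: the dict standing in for the store, `fake_rollout`, and its reward rule are assumptions, while the prompt candidates and the "main_prompt" resource name come from the examples in this section.

```python
PROMPT_CANDIDATES = [
    "You are a helpful assistant. {any_question}",
    "You are a knowledgeable AI. {any_question}",
    "You are a friendly chatbot. {any_question}",
]


def fake_rollout(resources: dict, question: str) -> float:
    """Hypothetical reward: stands in for running the agent and scoring its answer."""
    prompt = resources["main_prompt"].format(any_question=question)
    return 1.0 if "knowledgeable" in prompt else 0.5


def optimize(candidates, questions):
    resources = {}                 # stand-in for resources kept in the store
    prompt_and_rewards = []
    for template in candidates:    # loop through candidates
        resources["main_prompt"] = template  # update resources
        rewards = [fake_rollout(resources, q) for q in questions]  # run rollouts
        mean_reward = sum(rewards) / len(rewards)  # collect reward signal
        prompt_and_rewards.append((template, mean_reward))
    # Keep the template with the highest mean reward
    best_prompt, _ = max(prompt_and_rewards, key=lambda pair: pair[1])
    return best_prompt, prompt_and_rewards


best_prompt, scores = optimize(PROMPT_CANDIDATES, ["What is 2 + 2?", "Name a prime."])
```

In a real run the reward would come from spans recorded during the rollout rather than from a string check.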

Extending the Algorithm Zoo

Creating Custom Algorithms

  1. Define an async function that takes store: LightningStore as a keyword argument
  2. Wrap it with the @algo decorator
  3. Implement your optimization logic
  4. Use the trainer or run separately

Example pattern: Source: examples/apo/apo_custom_algorithm.py:54-72

async def apo_algorithm(*, store: agl.LightningStore):
    for prompt in prompt_candidates:
        # 1. The optimization algorithm updates the prompt template
        console.print(f"[Algo] Updating prompt template to: '{prompt}'")
        resources: agl.NamedResources = {
            # The "main_prompt" can be replaced with any name
        }
        # 2. Update resources in store
        # 3. Collect reward signals
        # 4. Refine prompt based on rewards

Requirements for Custom Algorithms

  • Must be async functions
  • Must accept store as keyword argument
  • Should be wrapped with @algo decorator for trainer integration
  • Must interact with LightningStore for state synchronization

Integration with RAG

The Algorithm Zoo can be extended to work with retrieval-augmented generation systems. See the RAG example for integrating FAISS-based retrieval with prompt optimization. Source: examples/rag/README.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 7 source-linked risk signals. Review them before installing or handing real data to the project.

1. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | README/documentation is current enough for a first validation pass.

2. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | last_activity_observed missing

3. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | no_demo; severity=medium

4. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.

  • Severity: medium
  • Finding: No sandbox install has been executed yet; downstream must verify before user use.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.safety_notes | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | No sandbox install has been executed yet; downstream must verify before user use.

5. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | no_demo; severity=medium

6. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | issue_or_pr_quality=unknown

7. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using agent-lightning with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence