Doramagic Project Pack · Human Manual
agent-lightning
Introduction to Agent Lightning
Related topics: System Architecture, Installation Guide
Agent Lightning is a reinforcement learning framework designed to train any AI agent with RL algorithms. The project provides a unified execution stack, instrumentation capabilities, and training infrastructure that enables researchers and developers to improve agent behavior through reward-based learning. Source: README.md:1
What is Agent Lightning?
Agent Lightning bridges the gap between raw agent execution and RL-based training by providing:
- Instrumentation Layer: Transparent tracing and logging of agent interactions
- Training Infrastructure: Built-in support for RL algorithms like GRPO
- Distributed Execution: Multi-worker rollout management with state synchronization
- Integration Points: Adapters for popular agent frameworks and execution environments
The framework treats agent training as a continuous feedback loop where traces collected from agent execution are consumed by training algorithms to improve policy behavior over time. Source: CLAUDE.md:3
Architecture Overview
The Agent Lightning architecture follows a producer-consumer pattern centered around trace collection and consumption.
Core Loop
graph TD
A[Runner] -->|emits spans| B[Tracers]
B -->|writes traces| C[LightningStore]
C -->|serves traces| D[Algorithms]
D -->|updates policy| A
C -->|serves traces| E[Dashboard]
The continuous execution loop works as follows:
- Runners execute agents and emit execution spans
- Tracers capture and format these spans with semantic conventions
- LightningStore maintains synchronized state across all components
- Algorithms consume traces to compute rewards and update agent policies
- Dashboard provides real-time visualization for debugging
Source: CLAUDE.md:3
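The loop above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Agent Lightning's API: `ToyStore`, `run_agent`, and `update_policy` are hypothetical names, the "policy" is a single number, and the target behavior ("double the input") is invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical stand-in for LightningStore: buffers spans in memory."""
    spans: deque = field(default_factory=deque)

    def write(self, span: dict) -> None:
        self.spans.append(span)

    def drain(self) -> list:
        items = list(self.spans)
        self.spans.clear()
        return items


def run_agent(policy: float, task: float) -> dict:
    """Runner plus tracer: execute one task and emit a span with a reward."""
    answer = policy * task
    reward = -abs(answer - 2 * task)  # toy target: double the input
    return {"task": task, "answer": answer, "reward": reward}


def update_policy(policy: float, spans: list, lr: float = 0.1) -> float:
    """Algorithm: consume the collected spans and nudge the policy."""
    for span in spans:
        error = span["answer"] - 2 * span["task"]
        policy -= lr * error * span["task"]
    return policy


store = ToyStore()
policy = 0.0
for _ in range(50):
    for task in (1.0, 2.0):
        store.write(run_agent(policy, task))      # producer side
    policy = update_policy(policy, store.drain())  # consumer side
```

The store is the only point of contact between the producer (runner) and the consumer (algorithm), which is what lets the two sides run in separate processes in a real deployment.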
Component Hierarchy
| Layer | Components | Responsibility |
|---|---|---|
| Execution | Runner, LitAgent | Execute agent logic and manage lifecycle |
| Instrumentation | Tracer, OtelTracer, AgentOpsTracer | Capture execution traces |
| Storage | LightningStore, LightningStoreClient | Synchronized state management |
| Training | Algorithms in agentlightning/algorithm/ | Process traces, compute rewards |
| CLI | agl command | User-facing interface |
Source: agentlightning/cli/__init__.py:13-16
Core Data Models
The framework defines several fundamental data structures in agentlightning/types/core.py.
Task and Rollout
classDiagram
class Task {
+str task_id
+Any input
+Optional~str~ instance_id
+Optional~str~ dataset
}
class Rollout {
+str rollout_id
+str status
+Optional~str~ worker_id
+List~Attempt~ attempts
}
class Attempt {
+str attempt_id
+str status
+List~Span~ spans
+Optional~float~ reward
}
class Triplet {
+Any prompt
+Any response
+Optional reward
}
Task "1" --> "*" Rollout
Rollout "1" --> "*" Attempt
Attempt --> Triplet
Core Type Exports
| Type | Purpose |
|---|---|
| Task | Represents a unit of work to be executed by an agent |
| Rollout | Collection of attempts for a single task execution |
| Attempt | Single execution attempt with spans and reward |
| Triplet | Prompt-response-reward tuple for RL training |
| LightningStore | Synchronized state store for distributed execution |
Source: agentlightning/types/core.py:1-60
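To make the relationships in the diagram concrete, here is a minimal sketch using standard-library dataclasses. It only mirrors the field names shown above and is not the library's actual definition; for brevity it attaches triplets directly to an attempt instead of deriving them from spans, and `training_triplets` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Triplet:
    prompt: Any
    response: Any
    reward: Optional[float] = None


@dataclass
class Attempt:
    attempt_id: str
    status: str
    triplets: List[Triplet] = field(default_factory=list)  # simplification
    reward: Optional[float] = None


@dataclass
class Rollout:
    rollout_id: str
    status: str
    attempts: List[Attempt] = field(default_factory=list)


def training_triplets(rollout: Rollout) -> List[Triplet]:
    """Collect the triplets that carry a reward, across all attempts."""
    return [t for a in rollout.attempts for t in a.triplets if t.reward is not None]


rollout = Rollout(
    "r1",
    "succeeded",
    attempts=[
        Attempt("a1", "failed", triplets=[Triplet("q", "bad")]),
        Attempt("a2", "succeeded", triplets=[Triplet("q", "good", reward=1.0)]),
    ],
)
usable = training_triplets(rollout)
```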
Runner System
The runner system provides the execution context for agents with integrated lifecycle management.
Runner Lifecycle
sequenceDiagram
participant User
participant Runner
participant Store
participant Agent
User->>Runner: async with Runner(agent, store)
Runner->>Runner: init(agent)
Runner->>Runner: init_worker(store)
Runner->>Store: Register worker
loop Until event
Runner->>Agent: Execute task
Agent-->>Runner: Result
Runner->>Store: Update state
end
Runner->>Runner: teardown_worker()
Runner->>Runner: teardown()
Runner Base Class
The Runner class provides context manager support for safe initialization and cleanup:
async with runner:
    runner.init(agent=agent, hooks=hooks)
    runner.init_worker(worker_id=0, store=store)
    # Execute tasks...
Key runner responsibilities:
- Initialization: Set up agent and worker state
- Execution: Poll store for tasks and execute them
- Cleanup: Graceful teardown of worker and agent resources
Source: agentlightning/runner/base.py:1-80
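The ordering guarantee in the lifecycle diagram (setup, execute, then teardown in reverse order even when the agent fails) can be sketched with a plain async context manager. `SketchRunner` is a hypothetical class written for this illustration; it does not reproduce the real `Runner` interface.

```python
import asyncio


class SketchRunner:
    """Hypothetical runner that records lifecycle calls in order."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def __aenter__(self) -> "SketchRunner":
        self.calls.append("init")
        self.calls.append("init_worker")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Teardown mirrors setup in reverse order and runs even on errors.
        self.calls.append("teardown_worker")
        self.calls.append("teardown")
        return False  # do not swallow the exception

    async def step(self, task: str) -> str:
        self.calls.append(f"execute:{task}")
        return task.upper()


async def main() -> list[str]:
    runner = SketchRunner()
    try:
        async with runner:
            await runner.step("t1")
            raise RuntimeError("agent crashed")
    except RuntimeError:
        pass
    return runner.calls


calls = asyncio.run(main())
```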
Tracing and Instrumentation
Agent Lightning provides multiple tracing backends for capturing agent execution.
Supported Tracers
| Tracer | Use Case | Backend |
|---|---|---|
| OtelTracer | OpenTelemetry-compatible tracing | OTLP endpoint |
| AgentOpsTracer | AgentOps platform integration | AgentOps service |
| Custom Tracer | Framework integration | Pluggable |
Semantic Conventions
The framework defines semantic conventions in agentlightning/semconv.py for consistent span attributes:
| Attribute | Description |
|---|---|
| LightningSpanAttributes.REWARD | Reward values for RL spans |
| LightningSpanAttributes.LINK | Span linking relationships |
| LightningSpanAttributes.TAG | Custom span tagging |
| LightningResourceAttributes.ROLLOUT_ID | Rollout identification |
| LightningResourceAttributes.ATTEMPT_ID | Attempt identification |
Source: agentlightning/semconv.py:1-40
Trace Writing Example
The minimal examples demonstrate trace writing with LightningStore:
from agentlightning import AgentOpsTracer, InMemoryLightningStore, LightningStoreClient, OtelTracer, Span
# Write traces directly to in-memory store
store = InMemoryLightningStore()
tracer = OtelTracer(store=store)
# Or connect to a server-side store
client = LightningStoreClient(endpoint="http://localhost:45993")
Source: examples/minimal/write_traces.py:1-50
LightningStore
LightningStore is the central state management component that keeps all components synchronized.
Store Capabilities
graph LR
A[Runners] -->|enqueue/dequeue| B[Rollouts]
A -->|register| C[Workers]
D[Tracers] -->|write spans| B
E[Algorithms] -->|query traces| B
F[Dashboard] -->|inspect state| B
Store Collections
| Collection | Data Type | Access Pattern |
|---|---|---|
| rollouts | Rollout | Enqueue/dequeue by worker |
| attempts | Attempt | Link to rollout |
| spans | Span | Query by attempt |
| workers | Worker | Heartbeat management |
| resources | ResourcesUpdate | Model/prompt versioning |
Source: dashboard/test-utils/python-server.py:1-100
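The "enqueue/dequeue by worker" access pattern for rollouts can be sketched as a FIFO queue plus a lookup table. This is an assumption-laden illustration: `SketchStore` and its method names are hypothetical, and the real store also handles attempts, spans, and concurrency, which are omitted here.

```python
from collections import deque
from typing import Dict, Optional


class SketchStore:
    """Hypothetical in-memory store covering only the rollouts collection."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self.rollouts: Dict[str, dict] = {}

    def enqueue_rollout(self, rollout_id: str, task_input: str) -> None:
        self.rollouts[rollout_id] = {
            "input": task_input,
            "status": "pending",
            "worker_id": None,
        }
        self._queue.append(rollout_id)

    def dequeue_rollout(self, worker_id: str) -> Optional[str]:
        """Hand the oldest pending rollout to a worker, or None if idle."""
        if not self._queue:
            return None
        rollout_id = self._queue.popleft()
        self.rollouts[rollout_id].update(status="running", worker_id=worker_id)
        return rollout_id


sketch_store = SketchStore()
sketch_store.enqueue_rollout("r1", "task A")
sketch_store.enqueue_rollout("r2", "task B")
first = sketch_store.dequeue_rollout("worker-0")
second = sketch_store.dequeue_rollout("worker-1")
third = sketch_store.dequeue_rollout("worker-0")
```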
Training Algorithms
Agent Lightning integrates with reinforcement learning algorithms to improve agent behavior.
Algorithm Integration
The framework supports pluggable algorithms defined in agentlightning/algorithm/. Algorithms consume traces from the LightningStore and compute policy updates.
Agent-OS Integration
For production safety-critical deployments, Agent Lightning integrates with Agent-OS:
from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward
runner = AgentOSRunner(kernel, fail_on_violation=False, emit_violations=True)
reward_fn = PolicyReward(kernel)
This integration provides:
- Policy enforcement: Kernel-level safety during training
- Violation penalties: Unsafe actions convert to negative RL rewards
- Audit trail: Complete visibility from training to production
Source: contrib/recipes/agentos/README.md:1-60
Minimal Component Showcase
The examples/minimal/ directory provides isolated demonstrations of individual building blocks.
Available Examples
| Component | File | Demonstrates |
|---|---|---|
| LightningStore + OTLP | write_traces.py | OtelTracer, AgentOpsTracer, rollout/span emission |
| MultiMetrics backend | write_metrics.py | Console and Prometheus metrics simultaneously |
| LLM proxying | llm_proxy.py | Request routing through /rollout/<id>/attempt/<id> |
| vLLM lifecycle | vllm_server.py | Server startup, readiness monitoring, teardown |
Each example is self-documenting with CLI arguments and environment variables embedded in module docstrings.
Source: examples/minimal/README.md:1-30
Command-Line Interface
The agl CLI provides entry points for all major framework operations.
Available Subcommands
| Command | Module | Description |
|---|---|---|
| agl vllm | agentlightning.cli.vllm | vLLM server with instrumentation |
| agl store | agentlightning.cli.store | LightningStore server |
| agl prometheus | agentlightning.cli.prometheus | Prometheus metrics endpoint |
| agl agentops | agentlightning.cli.agentops_server | AgentOps server manager |
Starting a LightningStore Server
agl store --port 45993 --log-level DEBUG
The store server enables distributed execution where multiple workers can connect and synchronize state.
Source: agentlightning/cli/__init__.py:1-35
Dashboard
The Agent Lightning Dashboard is a React-based web application for inspecting store state and debugging experiments.
Features
- Real-time state inspection: View rollouts, attempts, and spans
- Worker monitoring: Track worker status and heartbeat statistics
- Resource visualization: Inspect model configurations and prompts
- Experiment debugging: Analyze trace sequences and reward flows
Technology Stack
| Layer | Technology |
|---|---|
| Framework | React |
| UI Components | Mantine UI |
| Documentation | Storybook |
| Testing | Vitest |
Source: dashboard/README.md:1-35
Project Structure
agent-lightning/
├── agentlightning/ # Core library
│ ├── algorithm/ # RL training algorithms
│ ├── cli/ # Command-line interface
│ ├── contrib/ # Third-party integrations
│ ├── runner/ # Execution runners
│ ├── store/ # LightningStore implementations
│ ├── tracer/ # Tracing backends
│ ├── types/ # Data models
│ └── semconv.py # Semantic conventions
├── contrib/
│ └── recipes/ # Integration examples (webshop, agentos)
├── dashboard/ # React web application
├── docs/ # Documentation (mkdocs)
├── examples/ # Runnable workflows
├── scripts/ # Automation scripts
└── tests/ # Test suite
Source: CLAUDE.md:5-15
Development Workflow
Setup
uv sync --group dev
Testing
# Full test suite
uv run --no-sync pytest -v
# Specific tests
uv run --no-sync pytest -v tests/path/to/test.py
uv run --no-sync pytest -v -k "test_pattern"
Type Checking
uv run --no-sync pyright
Pre-commit Checks
uv run --no-sync pre-commit run --all-files --show-diff-on-failure
Documentation
uv run --no-sync mkdocs build --strict
Source: CLAUDE.md:18-30
Contributing
Agent Lightning welcomes contributions through a structured process:
- Branch naming: feature/<slug>, fix/<slug>, docs/<slug>, or chore/<slug>
- Commits: Imperative, scoped commits with issue references (e.g., Fixes #123)
- Pre-submission: Run pre-commit hooks and relevant pytest/doc builds
- CLA: Contributor License Agreement required (automatically prompted by CLA bot)
Source: README.md:50-70
Citation
If you use Agent Lightning in research, please cite:
@misc{luo2025agentlightningtrainai,
title={Agent Lightning: Train ANY AI Agents with Reinforcement Learning},
author={Xufang Luo and Yuge Zhang and Zhiyuan He and Zilong Wang and Siyun Zhao and Dongsheng Li and Luna K. Qiu and Yuqing Yang},
year={2025},
eprint={2508.03680},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2508.03680},
}
Source: README.md:15-25
Further Reading
- Minimal Examples Guide - Hands-on with individual components
- Claude Code Integration - Example: Training with SWE-bench
- Agent-OS Integration - Safety-critical training
- API Reference - Detailed type and function documentation
Source: https://github.com/microsoft/agent-lightning / Human Manual
Installation Guide
Related topics: Introduction to Agent Lightning, Tutorial: Train Your First Agent
This guide covers all supported methods for installing and configuring Agent Lightning in your environment. Agent Lightning is a reinforcement learning framework for training AI agents, with support for GPU acceleration, distributed training, and various algorithm backends.
Prerequisites
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.10+ | 3.11 or 3.12 |
| OS | Linux (Ubuntu 20.04+), macOS | Linux with CUDA |
| RAM | 8 GB | 32 GB+ |
| GPU | Optional | NVIDIA GPU with CUDA 11.8+ |
| Disk Space | 5 GB | 20 GB+ |
Source: contrib/recipes/webshop/agl/requirements.txt:1-3
Required Tools
- uv: Modern Python package manager (recommended)
- Git: For cloning the repository
- CUDA Toolkit (for GPU training): Version 11.8 or later
Installation Methods
Method 1: Install from Source
This is the recommended approach for development and contributing.
# Clone the repository
git clone https://github.com/microsoft/agent-lightning.git
cd agent-lightning
# Install all dependencies including development tools
uv sync --group dev
# Install optional GPU dependencies
uv sync --group GPU
Source: CLAUDE.md:20-22
Method 2: Install with Specific Algorithm Backends
Agent Lightning supports multiple reinforcement learning algorithms through optional dependency groups:
# Install with VERL backend (recommended for GPU training)
uv sync --group VERL
# Install with APO backend
uv sync --group APO
# Install with GPU optimizations
uv sync --group GPU
Source: CLAUDE.md:22
Method 3: Using setup.sh for GPU Training
For GPU-accelerated training with the webshop recipe:
# From the contrib/recipes/webshop directory
./setup.sh
This script installs VERL extras for GPU training support. Source: contrib/recipes/webshop/agl/requirements.txt:1-8
Dependency Groups
The pyproject.toml defines several optional dependency groups:
| Group | Purpose | Installation Command |
|---|---|---|
| dev | Development tools (pytest, pyright, pre-commit) | uv sync --group dev |
| GPU | GPU acceleration packages | uv sync --group GPU |
| VERL | VERL algorithm backend | uv sync --group VERL |
| APO | APO algorithm backend | uv sync --group APO |
Source: CLAUDE.md:20-23
Environment Setup
Creating a Virtual Environment
Using uv (recommended):
# Create and activate a new virtual environment
uv venv
source .venv/bin/activate # Linux/macOS
# or
.venv\Scripts\activate # Windows
Verifying Installation
Run the test suite to verify your installation:
# Run all tests
uv run --no-sync pytest -v
# Run specific test
uv run --no-sync pytest -v tests/path/to/test.py
# Run tests matching a pattern
uv run --no-sync pytest -v -k "test_pattern"
Source: CLAUDE.md:21
Type Checking
Verify type annotations are correct:
uv run --no-sync pyright
Source: CLAUDE.md:22
Pre-commit Hooks
Before committing code, run pre-commit checks:
uv run --no-sync pre-commit run --all-files --show-diff-on-failure
Source: CLAUDE.md:23
Dashboard Installation
The Agent Lightning Dashboard is a separate React application:
cd dashboard
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build
Source: dashboard/README.md:npm scripts section
Dashboard npm Scripts
| Script | Purpose |
|---|---|
| dev | Start development server |
| build | Build production bundle |
| preview | Preview production build locally |
| storybook | Start Storybook dev server |
| build-storybook | Build Storybook bundle |
| eslint | Run ESLint |
| stylelint | Run Stylelint |
| prettier | Run Prettier |
| typecheck | Run TypeScript typecheck |
| vitest | Run vitest tests |
Source: dashboard/README.md:npm scripts
Recipe-Specific Installation
Webshop Recipe
The webshop recipe has specific dependencies:
cd contrib/recipes/webshop/agl
# Install requirements
pip install -r requirements.txt
# For GPU training
./setup.sh
Required dependencies include:
- pandas>=2.0.0 - Data manipulation
- pyarrow>=14.0.0 - Parquet file support
- rich>=13.0.0 - Terminal formatting
- tqdm>=4.64.0 - Progress bars
Source: contrib/recipes/webshop/agl/requirements.txt:1-15
Development Workflow
Branching Conventions
Create feature branches from a fresh main:
| Branch Type | Naming Convention |
|---|---|
| Feature | feature/<slug> |
| Fix | fix/<slug> |
| Documentation | docs/<slug> |
| Maintenance | chore/<slug> |
Source: CLAUDE.md:8, AGENTS.md:8
Commit and PR Guidelines
- Write imperative, scoped commit messages
- Reference issues with Fixes #123
- Rerun pre-commit and relevant pytest/doc builds before pushing
- Include verification commands in PR descriptions
- Update documentation via mkdocs.yml or examples/README.md
Source: CLAUDE.md:9-13, AGENTS.md:9-13
GPU Configuration
For optimal GPU training performance:
- Install NVIDIA drivers (CUDA 11.8+)
- Install the GPU dependency group
- For VERL-based training, use uv sync --group GPU
GPU metrics are tracked via heartbeat statistics in worker nodes:
heartbeat_stats={"queue_depth": 2, "gpu_utilization": 0.82}
Source: dashboard/test-utils/python-server.py:Worker class
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| uv command not found | Install uv: pip install uv |
| CUDA not found | Ensure NVIDIA drivers and CUDA toolkit are installed |
| Import errors | Run uv sync to ensure all dependencies are installed |
| Type checking failures | Run uv run --no-sync pyright to identify issues |
Source: CLAUDE.md:26-30
Lock File Updates
When dependencies change, commit the refreshed uv.lock:
git add uv.lock
git commit -m "chore: update lock file"
Source: CLAUDE.md:24
Next Steps
After installation:
- Explore Minimal Component Showcase to understand individual components
- Set up the LightningStore for trace storage
- Configure tracers for your agent execution
- Review the Algorithm Documentation for training options
System Architecture
Related topics: Trainer Component, Runner Component, LightningStore
Overview
Agent Lightning is a reinforcement learning framework for training AI agents, with a distributed system architecture that supports multi-worker training orchestration, resource management, and distributed tracing. The system consists of three primary layers: a Backend Training Engine, a State Store, and a Dashboard Frontend.
The architecture enables parallel training across multiple workers, centralized resource configuration, and real-time monitoring of training workflows through traces and metrics.
Source: dashboard/src/layouts/AppLayout.tsx:1-50
High-Level Architecture Components
The Agent Lightning system comprises the following core entities:
| Component | Description | Key Attributes |
|---|---|---|
| Resources | Configuration templates for prompts, models, and sampling parameters | resources_id, version, resources (dict with PromptTemplate/LLM) |
| Workers | Runner processes that execute training rollouts | worker_id, status, heartbeat_stats, current_rollout_id |
| Rollouts | Complete training episodes with multiple attempts | rollout_id, status, mode, attempts |
| Attempts | Individual training attempts within a rollout | attempt_id, status, metrics |
| Spans | Distributed tracing spans for observability | trace_id, span_id, status, attributes, start_time, end_time |
Source: dashboard/test-utils/python-server.py:1-300
Frontend Dashboard Architecture
The dashboard is a React-based frontend built with Mantine UI components that communicates with the backend via REST APIs.
Navigation Structure
The application uses a sidebar navigation layout with the following sections:
graph TD
A[AppLayout] --> B[Navbar]
A --> C[Main Content Area]
B --> D[Rollouts]
B --> E[Workers]
B --> F[Resources]
B --> G[Traces]
B --> H[Settings]
C --> I[Outlet Component]
Source: dashboard/src/layouts/AppLayout.tsx:20-50
Page Components
| Page | File Path | Purpose |
|---|---|---|
| Rollouts | dashboard/src/pages/Rollouts.page.tsx | Display and manage training rollouts with status filtering |
| Workers | dashboard/src/pages/Workers.page.tsx | Monitor worker health and current assignments |
| Resources | dashboard/src/pages/Resources.page.tsx | View and manage configuration resources |
| Traces | dashboard/src/components/TracesTable.component.tsx | Analyze distributed tracing spans |
Source: dashboard/src/pages/Rollouts.page.tsx:1-80
Data Flow Architecture
Worker Heartbeat Flow
Workers periodically send heartbeat signals to indicate their operational state. The dashboard monitors these heartbeats to determine worker availability.
sequenceDiagram
participant W as Worker
participant S as Store
participant D as Dashboard
W->>S: Heartbeat (status, queue_depth, gpu_utilization)
S->>S: Update last_heartbeat_time
D->>S: Poll /workers endpoint
S-->>D: Worker list with status
Source: dashboard/test-utils/python-server.py:100-150
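One way a consumer of the /workers data might turn heartbeat timestamps into an availability signal is sketched below. The 30-second timeout, the `WorkerRecord` class, and the `effective_status` helper are all assumptions made for illustration; the source does not state how staleness is decided.

```python
from dataclasses import dataclass


@dataclass
class WorkerRecord:
    worker_id: str
    status: str  # as reported by the worker: idle or busy
    last_heartbeat_time: float  # Unix timestamp, seconds


def effective_status(worker: WorkerRecord, now: float, timeout_s: float = 30.0) -> str:
    """Report 'offline' when the last heartbeat is older than timeout_s."""
    if now - worker.last_heartbeat_time > timeout_s:
        return "offline"
    return worker.status


worker = WorkerRecord("worker-0", "busy", last_heartbeat_time=100.0)
fresh = effective_status(worker, now=110.0)
stale = effective_status(worker, now=200.0)
```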
Rollout Execution Flow
Training rollouts follow a multi-attempt execution model:
graph LR
A[Rollout Created] --> B[Attempt 1]
B --> C{Success?}
C -->|Yes| D[Rollout Complete]
C -->|No| E[Attempt 2]
E --> F{Success?}
F -->|Yes| D
F -->|No| G[Attempt N]
G --> H[Max Attempts Reached]
Source: dashboard/src/components/TracesTable.component.tsx:50-150
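The multi-attempt flow in the diagram amounts to a bounded retry loop. The sketch below assumes a hypothetical `attempt_fn` callback that returns True on success and a default of three attempts; neither comes from the source.

```python
from typing import Callable, List, Tuple


def run_rollout(
    attempt_fn: Callable[[int], bool], max_attempts: int = 3
) -> Tuple[str, List[str]]:
    """Retry until one attempt succeeds or max_attempts is reached."""
    statuses: List[str] = []
    for sequence_id in range(1, max_attempts + 1):
        ok = attempt_fn(sequence_id)
        statuses.append("succeeded" if ok else "failed")
        if ok:
            return "succeeded", statuses
    return "failed", statuses


# Succeeds on the second attempt.
recovered = run_rollout(lambda n: n == 2)
# Never succeeds, so the rollout fails after two attempts.
exhausted = run_rollout(lambda n: False, max_attempts=2)
```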
Core Entity Schemas
Resources Entity
Resources define reusable configuration templates used by workers during training.
| Field | Type | Description |
|---|---|---|
| resources_id | string | Unique identifier for the resource |
| version | integer | Version number for tracking changes |
| create_time | timestamp | Creation timestamp |
| update_time | timestamp | Last modification timestamp |
| resources | dict | Configuration dictionary (PromptTemplate, LLM configs) |
Source: dashboard/test-utils/python-server.py:50-100
Workers Entity
| Field | Type | Description |
|---|---|---|
| worker_id | string | Unique worker identifier |
| status | enum | Current status: idle, busy, offline |
| heartbeat_stats | dict | Metrics including queue_depth, gpu_utilization |
| last_heartbeat_time | timestamp | Time of last heartbeat |
| current_rollout_id | string | Currently assigned rollout (if busy) |
| current_attempt_id | string | Currently executing attempt |
Source: dashboard/src/components/AppDrawer.component.tsx:1-60
Spans Entity (Distributed Tracing)
| Field | Type | Description |
|---|---|---|
| rollout_id | string | Associated rollout |
| attempt_id | string | Associated attempt |
| trace_id | string | Distributed trace identifier |
| span_id | string | Unique span identifier |
| parent_id | string | Parent span ID for hierarchy |
| name | string | Operation name (e.g., classification_pipeline) |
| status | TraceStatus | Status with status_code (OK, ERROR) and description |
| attributes | dict | Key-value metadata (model, batch_size, accuracy) |
| start_time | timestamp | Span start time |
| end_time | timestamp | Span end time |
Source: dashboard/src/components/TracesTable.component.tsx:50-120
Component Architecture (Frontend)
Table Components Pattern
The dashboard uses a consistent table component pattern across all pages:
graph TD
A[Page Component] --> B[Table Component]
B --> C[Column Definitions]
B --> D[Filtering Logic]
B --> E[Pagination Controls]
A --> F[useQuery Hook]
F --> G[API Endpoints]
| Component | Props | Purpose |
|---|---|---|
| RolloutTable | rollouts, totalRecords, statusFilters, onViewTraces | Training rollout display |
| WorkersTable | workers, onShowDetails | Worker monitoring |
| ResourcesTable | resourcesList, renderRowExpansion | Resource configuration |
| TracesTable | spans, onShowSpanDetail | Trace analysis |
Source: dashboard/src/components/WorkersTable.component.tsx:1-80
Drawer Container Pattern
The application uses an AppDrawerContainer for displaying detailed information:
graph TD
A[AppDrawerContainer] --> B[Redux State]
B --> C{Content Type}
C -->|worker-detail| D[WorkerDrawerTitle]
C -->|rollout-detail| E[RolloutDrawer]
C -->|span-detail| F[SpanDetailDrawer]
D --> G[ConnectionIndicator]
G --> H[baseUrl, status, isRefreshing]
Source: dashboard/src/components/AppDrawer.component.tsx:60-120
State Management
The frontend uses Redux for state management with the following key selectors:
| Selector | Purpose |
|---|---|
| selectConfig | Application configuration (baseUrl, autoRefreshMs) |
| selectDrawerIsOpen | Drawer visibility state |
| selectDrawerContent | Current drawer content type and data |
| selectConnectionState | Backend connection status |
Source: dashboard/src/layouts/AppLayout.tsx:50-80
Connection Management
The dashboard includes a ConnectionIndicator component that displays the connection status to the backend:
| Status | Description |
|---|---|
| connected | Successfully connected to backend |
| disconnected | Cannot reach backend |
| refreshing | Actively reconnecting |
Source: dashboard/src/layouts/AppLayout.tsx:40-45
Training Workflow Integration
Status Lifecycle
Rollouts and attempts follow a defined status lifecycle:
| Status | Description |
|---|---|
| pending | Initial state, not yet started |
| running | Currently executing |
| succeeded | Completed successfully |
| failed | Execution failed |
| cancelled | Manually cancelled |
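The status values above can be modeled as an enum with an explicit transition table. The source lists the states but not the allowed moves between them, so the `ALLOWED` table below is an assumption made for illustration, as are the names `Status` and `can_transition`.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transitions: terminal states have no outgoing edges.
ALLOWED = {
    Status.PENDING: {Status.RUNNING, Status.CANCELLED},
    Status.RUNNING: {Status.SUCCEEDED, Status.FAILED, Status.CANCELLED},
}


def can_transition(src: Status, dst: Status) -> bool:
    """Return True if moving from src to dst is permitted by ALLOWED."""
    return dst in ALLOWED.get(src, set())
```

Keeping the table separate from the enum makes it easy to adjust once the real lifecycle rules are confirmed against the store implementation.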
Mode Types
| Mode | Description |
|---|---|
| train | Training mode with gradient updates |
| eval | Evaluation mode without updates |
| inference | Production inference mode |
Source: dashboard/src/pages/Rollouts.page.tsx:30-60
Observability Architecture
Trace Hierarchy
Traces are organized in a hierarchical structure:
Trace
└── Span (root)
    ├── Span (child - preprocess)
    ├── Span (child - classifier)
    └── Span (child - formatter)
Each span captures:
- Execution timing (start_time, end_time, duration)
- Status and error information
- Custom attributes (model, batch_size, accuracy)
- Resource metadata (service name)
Source: dashboard/test-utils/python-server.py:200-300
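Given a flat list of spans, the hierarchy can be rebuilt from parent_id alone. The sketch below uses plain dictionaries with the field names from the Spans entity table; the `render_tree` helper and the sample timings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def render_tree(spans: List[dict]) -> List[str]:
    """Group spans by parent_id and emit an indented outline with durations."""
    children: Dict[Optional[str], List[dict]] = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(children[parent_id], key=lambda s: s["start_time"]):
            duration = span["end_time"] - span["start_time"]
            lines.append(f"{'  ' * depth}{span['name']} ({duration:.1f}s)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # root spans have parent_id None
    return lines


spans = [
    {"span_id": "b", "parent_id": "a", "name": "preprocess",
     "start_time": 0.0, "end_time": 0.5},
    {"span_id": "a", "parent_id": None, "name": "classification_pipeline",
     "start_time": 0.0, "end_time": 2.0},
    {"span_id": "c", "parent_id": "a", "name": "classifier",
     "start_time": 0.5, "end_time": 1.5},
]
tree = render_tree(spans)
```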
Attribute Keys
Common span attributes include:
| Attribute | Example Value | Description |
|---|---|---|
| type | classification | Operation type |
| model | bert-classifier | Model used |
| batch_size | 10 | Processing batch size |
| accuracy | 0.95 | Achieved accuracy |
| timeout | true | Whether operation timed out |
| retry | true | Whether this was a retry attempt |
Source: dashboard/src/components/TracesTable.component.tsx:30-50
Resource Configuration Templates
Resources support multiple template engines:
| Engine | Syntax | Example |
|---|---|---|
| f-string | {variable} | "Classify: {ticket}" |
| jinja | {{ variable }} or {% for %} | "{% for r in results %}{{ r }}{% endfor %}" |
Source: dashboard/test-utils/python-server.py:50-90
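For the f-string engine, substitution behaves like Python's `str.format`. The helper below is a hypothetical illustration of that behavior using only the standard library; it is not the framework's PromptTemplate implementation, and the jinja engine is left out because it needs a third-party package.

```python
def render_fstring_template(template: str, **variables: str) -> str:
    """Fill {name} placeholders, as the f-string engine row describes."""
    return template.format(**variables)


prompt = render_fstring_template("Classify: {ticket}", ticket="printer is offline")
```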
Summary
The Agent Lightning system architecture provides:
- Distributed Training - Multiple workers executing rollouts in parallel
- Centralized Configuration - Versioned resource templates for prompts and models
- Real-time Monitoring - Worker heartbeat tracking and status dashboards
- Full Observability - Distributed tracing with hierarchical spans
- State Persistence - Store-based architecture for maintaining system state
The architecture is designed for horizontal scalability, allowing additional workers to be added to increase training throughput while maintaining centralized configuration management and monitoring through the dashboard frontend.
Core Abstractions and Data Models
Related topics: System Architecture, Trainer Component
The Agent Lightning framework relies on a set of foundational abstractions and data models that enable the coordination between runners, tracers, the LightningStore, and training algorithms. These core types are defined in agentlightning/types/ and serve as the canonical data structures used throughout the system for representing tasks, rollouts, attempts, traces, and resources.
Architecture Overview
Agent Lightning operates through a continuous execution loop where multiple components interact. The core abstractions facilitate:
- Trace Emission - Runners and tracers emit spans during execution
- State Synchronization - LightningStore maintains synchronized state
- Algorithm Consumption - Training algorithms in agentlightning/algorithm/ consume traces to improve agent behavior
graph TD
A[Runners] -->|emit spans| B[Tracers]
B --> C[LightningStore]
C --> D[Algorithms]
D -->|improve behavior| A
C --> E[Dashboard]
F[Resources] -->|configure| A
Source: CLAUDE.md
Task and Rollout Models
Task Representation
The Task and related classes define the fundamental unit of work in Agent Lightning. Tasks represent the objectives that agents attempt to accomplish during training and evaluation.
| Class | Purpose |
|---|---|
| Task | Core task definition containing input and configuration |
| TaskInput | Input data passed to a task |
| TaskIfAny | Conditional task input supporting optional parameters |
| Dataset | Collection of tasks for batch processing |
Source: agentlightning/types/core.py:1-50
Rollout Lifecycle
A rollout represents one complete execution of a task, which may span several attempts. The rollout model captures the entire lifecycle from enqueue to completion.
stateDiagram-v2
[*] --> Enqueued: EnqueueRolloutRequest
Enqueued --> InProgress: Runner picks up
InProgress --> Attempted: First attempt completes
Attempted --> InProgress: Retry triggered
InProgress --> [*]: Final attempt
Attempted --> [*]: Success/Failure
| Class | Description |
|---|---|
| Rollout | Represents a single task execution instance |
| RolloutConfig | Configuration for rollout execution |
| RolloutMode | Execution mode (training, evaluation, etc.) |
| RolloutStatus | Current state of the rollout |
Source: agentlightning/types/core.py:50-100
Attempt Model
Attempts represent individual tries within a rollout, enabling retry mechanisms and granular progress tracking.
| Property | Type | Description |
|---|---|---|
| attempt_id | str | Unique identifier for the attempt |
| rollout_id | str | Parent rollout identifier |
| status | AttemptStatus | Current attempt status |
| sequence_id | int | Order within the rollout |
Source: agentlightning/types/core.py:100-150
AttemptedRollout
The AttemptedRollout class aggregates results from all attempts within a rollout:
class AttemptedRollout(BaseModel):
    rollout: Rollout
    attempts: List[Attempt]
    # Aggregated metrics and results
Source: agentlightning/types/core.py:150-180
Tracing Abstractions
OpenTelemetry Integration
Agent Lightning uses OpenTelemetry for distributed tracing. The tracer types provide serialization and interoperability with the broader observability ecosystem.
| Class | Purpose |
|---|---|
| Span | Single unit of work in a trace |
| SpanCoreFields | Core fields shared across span implementations |
| OtelResource | Serializable OpenTelemetry resource representation |
| TraceStatus | Span completion status with error information |
Source: agentlightning/types/tracer.py:1-80
Span Structure
Spans form the atomic tracing unit, capturing timing, status, attributes, and relationships:
graph LR
subgraph Span
A[name] --> B[status]
B --> C[attributes]
C --> D[start_time/end_time]
D --> E[parent_id/span_id]
E --> F[resource]
end
| Attribute | Description |
|---|---|
| name | Human-readable span identifier |
| status | TraceStatus with status_code and optional description |
| attributes | Key-value metadata dictionary |
| parent_id | Reference to parent span (None for root) |
| resource | OtelResource containing service metadata |
Source: agentlightning/types/tracer.py:80-120
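Because each span carries a `parent_id`, a flat list of spans can be regrouped into a hierarchy. The following is a minimal sketch under stated assumptions: `MiniSpan` is a hypothetical stand-in that keeps only three fields, and `children_by_parent` is an illustrative helper, not part of Agent Lightning.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniSpan:
    # Hypothetical stand-in for Span with only the fields needed here.
    span_id: str
    parent_id: Optional[str]
    name: str


def children_by_parent(spans: list[MiniSpan]) -> dict[Optional[str], list[str]]:
    """Group span names under their parent_id; root spans are grouped under None."""
    tree: dict[Optional[str], list[str]] = defaultdict(list)
    for span in spans:
        tree[span.parent_id].append(span.name)
    return dict(tree)


spans = [
    MiniSpan("s1", None, "root"),
    MiniSpan("s2", "s1", "preprocess"),
    MiniSpan("s3", "s1", "classify"),
]
tree = children_by_parent(spans)
```

Here `tree[None]` holds the root span and `tree["s1"]` holds its two children in emission order.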
OtelResource Model
The OtelResource class provides a serializable representation of OpenTelemetry resources:
class OtelResource(BaseModel):
    attributes: Attributes
    schema_url: str
This model avoids confusion with the application's Resource class and enables span serialization for store persistence.
Source: agentlightning/types/tracer.py:120-150
Span Creation Patterns
#### SpanCoreFields for Lightweight Creation
For span creators that don't require the full span model, SpanCoreFields provides a minimal interface:
class SpanCoreFields(BaseModel):
    name: str
    status: TraceStatus
    attributes: Attributes
    start_time: Optional[float]
    end_time: Optional[float]
Source: agentlightning/types/tracer.py:150-180
#### Weave Tracer Span Creation
The Weave tracer implementation demonstrates proper span construction with resource attributes:
resource=OtelResource(
attributes={
LightningResourceAttributes.ROLLOUT_ID.value: rollout_id,
LightningResourceAttributes.ATTEMPT_ID.value: attempt_id,
LightningResourceAttributes.SPAN_SEQUENCE_ID.value: sequence_id,
LightningResourceAttributes.TRACER_NAME.value: "weave",
},
schema_url="",
)
Source: agentlightning/tracer/weave.py:1-50
Resource Management
ResourcesUpdate Model
Resources define configurable components that can be versioned and updated:
class ResourcesUpdate(BaseModel):
    resources_id: str
    version: int
    create_time: float
    update_time: float
    resources: Dict[str, Any]
| Field | Type | Description |
|---|---|---|
| resources_id | str | Unique identifier for the resource set |
| version | int | Version number for optimistic concurrency |
| create_time | float | Unix timestamp of creation |
| update_time | float | Unix timestamp of last update |
| resources | Dict[str, Any] | Arbitrary resource configuration |
Source: dashboard/test-utils/python-server.py:1-80
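The `version` field is described as supporting optimistic concurrency. The source does not show how the store enforces it, so the sketch below only illustrates the general technique: an update is accepted when the caller's expected version matches the stored one, and the version is then incremented. `ResourcesRecord`, `VersionConflict`, and `apply_update` are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ResourcesRecord:
    # Hypothetical plain-dataclass mirror of the ResourcesUpdate fields.
    resources_id: str
    version: int
    create_time: float
    update_time: float
    resources: Dict[str, Any] = field(default_factory=dict)


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


def apply_update(
    record: ResourcesRecord, expected_version: int, new_resources: Dict[str, Any]
) -> ResourcesRecord:
    """Return a new record with version + 1, or raise if the version moved on."""
    if record.version != expected_version:
        raise VersionConflict(
            f"expected version {expected_version}, found {record.version}"
        )
    return ResourcesRecord(
        resources_id=record.resources_id,
        version=record.version + 1,
        create_time=record.create_time,  # creation time never changes
        update_time=time.time(),
        resources=new_resources,
    )


original = ResourcesRecord("rs-1", 1, time.time(), time.time(), {"prompt": "v1"})
updated = apply_update(original, expected_version=1, new_resources={"prompt": "v2"})
```

A second writer that still holds version 1 would get `VersionConflict` and would need to re-read before retrying.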
Resource Types
Resources support flexible configuration through templates and model definitions:
| Resource Type | Description |
|---|---|
| PromptTemplate | Templated prompts with jinja2 or f-string engines |
| LLM | Language model configuration with endpoint and sampling parameters |
| Custom Dict[str, Any] | Arbitrary configuration dictionaries |
Source: dashboard/test-utils/python-server.py:80-150
Worker Abstraction
Workers represent execution agents that process rollouts:
classDiagram
class Worker {
+worker_id: str
+status: WorkerStatus
+heartbeat_stats: Dict
+last_heartbeat_time: float
+current_rollout_id: Optional[str]
+current_attempt_id: Optional[str]
}
| Property | Type | Description |
|---|---|---|
| worker_id | str | Unique worker identifier |
| status | WorkerStatus | Current status (busy, idle, etc.) |
| heartbeat_stats | Dict | Runtime metrics (queue_depth, gpu_utilization) |
| last_heartbeat_time | float | Last check-in timestamp |
| current_rollout_id | Optional[str] | Currently executing rollout |
Source: agentlightning/types/core.py:180-220
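The source does not describe how `last_heartbeat_time` is consumed. One common use of such a field is stale-worker detection, sketched below as an assumption; `is_stale` and the timeout value are hypothetical and not taken from Agent Lightning.

```python
import time
from typing import Optional


def is_stale(
    last_heartbeat_time: float, timeout_s: float, now: Optional[float] = None
) -> bool:
    """Return True if more than timeout_s seconds passed since the last heartbeat."""
    # Allow the caller to inject `now` so the check is deterministic in tests.
    current = time.time() if now is None else now
    return (current - last_heartbeat_time) > timeout_s
```

Passing `now` explicitly keeps the function free of hidden clock reads, which makes it easy to test with fixed timestamps.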
Worker Status States
stateDiagram-v2
[*] --> Idle: Startup
Idle --> Busy: Dequeue rollout
Busy --> Idle: Complete
Busy --> Busy: Heartbeat
Idle --> [*]: Shutdown
Busy --> [*]: Shutdown
Source: dashboard/test-utils/python-server.py:150-200
Filtering and Pagination
Query Models
The store supports filtered and paginated queries for efficient data access:
| Class | Purpose |
|---|---|
| FilterOptions | Criteria for filtering results |
| FilterField | Individual filter condition |
| SortOptions | Sorting configuration |
| PaginatedResult | Paginated response wrapper |
Source: agentlightning/types/core.py:220-260
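The usual order of operations for a filtered, paginated query is filter, then sort, then slice. The sketch below shows that order on plain dictionaries; the `query` function and its return shape are hypothetical and only loosely mirror the idea of a `PaginatedResult`, not its real fields.

```python
from typing import Any, Dict, List


def query(
    rows: List[Dict[str, Any]],
    filters: Dict[str, Any],
    sort_key: str,
    descending: bool,
    offset: int,
    limit: int,
) -> Dict[str, Any]:
    """Filter by equality, sort, then return one page plus the total match count."""
    matched = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    matched.sort(key=lambda r: r[sort_key], reverse=descending)
    page = matched[offset : offset + limit]
    # `total` counts matches before slicing so clients can compute page counts.
    return {"items": page, "total": len(matched), "offset": offset, "limit": limit}


rows = [
    {"id": "a", "status": "completed", "create_time": 1.0},
    {"id": "b", "status": "failed", "create_time": 2.0},
    {"id": "c", "status": "completed", "create_time": 3.0},
]
result = query(rows, {"status": "completed"}, "create_time", True, offset=0, limit=1)
```

Reporting the pre-slice total alongside the page is what lets a dashboard render "1 of 2" without fetching every row.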
Operation Context
The @operation decorator provides a simplified span creation interface for user code:
@operation(name="my_operation")
async def my_function():
    # Automatically creates and manages a span
    pass
OperationContext Parameters
| Parameter | Type | Description |
|---|---|---|
| propagate | bool | Whether spans should use active span processor |
| name | Optional[str] | Alias populating OPERATION_NAME attribute |
The decorator supports two usage patterns:
- As a bare decorator: @operation
- As a context manager factory: with operation(name="custom"):
Source: agentlightning/emitter/annotation.py:1-60
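An object that works both as a decorator and as a `with` block can be built on the standard library's `contextlib.ContextDecorator`. The sketch below demonstrates that general technique only; `recorded_operation` is a hypothetical class that appends to a list instead of emitting a span, and it is not how `operation` is implemented.

```python
import time
from contextlib import ContextDecorator


class recorded_operation(ContextDecorator):
    """Hypothetical stand-in: records (name, duration) instead of emitting a span."""

    log: list = []  # shared across instances, for demonstration only

    def __init__(self, name: str):
        self.name = name

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        elapsed = time.perf_counter() - self._start
        recorded_operation.log.append((self.name, elapsed))
        return False  # never swallow exceptions raised inside the block


# Decorator form: the whole function body runs inside the context.
@recorded_operation(name="decorated")
def add(a: int, b: int) -> int:
    return a + b


result = add(2, 3)

# Context manager form: only the indented block is recorded.
with recorded_operation(name="block"):
    total = sum(range(10))
```

`ContextDecorator` wraps synchronous functions; an async function would need a separate wrapper, which this sketch does not cover.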
Data Flow Summary
graph TD
subgraph Input
A[Dataset] --> B[Task]
B --> C[EnqueueRolloutRequest]
end
subgraph Execution
C --> D[Runner]
D --> E[Worker]
E --> F[Attempt]
F --> G[Span]
end
subgraph Storage
G --> H[LightningStore]
H --> I[PaginatedResult]
end
subgraph Training
H --> J[Algorithm]
J --> K[Improved Policy]
end
Key Type Exports
The agentlightning/types/core.py module exports the following public API:
__all__ = [
"Triplet",
"RolloutLegacy",
"Task",
"TaskInput",
"TaskIfAny",
"RolloutRawResultLegacy",
"RolloutRawResult",
"RolloutMode",
"GenericResponse",
"ParallelWorkerBase",
"Dataset",
"AttemptStatus",
"RolloutStatus",
"RolloutConfig",
"Rollout",
"Attempt",
"AttemptedRollout",
"EnqueueRolloutRequest",
"Hook",
"Worker",
"WorkerStatus",
"PaginatedResult",
"FilterOptions",
"SortOptions",
"FilterField",
]
Source: agentlightning/types/core.py:40-60
Usage Patterns
Creating a Rollout Request
request = EnqueueRolloutRequest(
task_id="task-001",
config=RolloutConfig(mode=RolloutMode.TRAINING),
priority=1
)
Querying with Filters
filters = FilterOptions(
fields=[FilterField(name="status", operator="eq", value="completed")],
sort=SortOptions(field="create_time", direction="desc"),
offset=0,
limit=50
)
Source: https://github.com/microsoft/agent-lightning / Human Manual
Tutorial: Train Your First Agent
Related topics: Tutorial: Writing Agents, Algorithm Zoo
Overview
This tutorial guides you through training your first AI agent using Agent Lightning's reinforcement learning framework. You will learn how to set up a training pipeline, define prompts and resources, create a dataset, and run the APO (Automatic Prompt Optimization) algorithm to improve your agent's behavior through feedback-driven learning.
Agent Lightning provides a complete training loop where runners and tracers emit spans, LightningStore keeps them synchronized, and algorithms consume those traces to improve behavior. Source: CLAUDE.md
Prerequisites
Before starting this tutorial, ensure you have:
- Python 3.10+ installed
- Agent Lightning installed following the installation guide
- An OpenAI-compatible API service available
- APO extra dependencies installed
Architecture Overview
Agent Lightning trains agents through a continuous feedback loop:
graph TD
A[Runner - Executes Agent] --> B[Tracer - Emits Spans]
B --> C[LightningStore - Synchronizes Data]
C --> D[Algorithm - Consumes Traces]
D --> E[Improved Agent Behavior]
E --> A
F[Dataset - Training Data] --> D
G[Resources - Prompts/Models] --> A
Source: CLAUDE.md
Step 1: Create Your Agent
Begin by defining a simple room booking agent that uses function calling. The agent receives a user request and selects an appropriate room from available options.
# examples/apo/room_selector.py
from agentlightning import Runner, DataProto
from typing import Any
import json
class RoomSelector(Runner):
    """Room booking agent using function calling."""
    def run(self, task: str, context: dict | None = None) -> DataProto:
        # Define available rooms
        rooms = [
            {"id": "R001", "name": "Conference A", "capacity": 10},
            {"id": "R002", "name": "Meeting Room B", "capacity": 4},
            {"id": "R003", "name": "Board Room", "capacity": 20},
        ]
        # Mock LLM response selecting a room
        selected_room = rooms[1]  # Default to Meeting Room B
        return DataProto(
            data={
                "selected_room": selected_room["name"],
                "room_id": selected_room["id"],
            },
            raw_response=json.dumps(selected_room),
        )
Source: examples/apo/room_selector.py
Supported Agent Components
| Component | Description | Usage |
|---|---|---|
| Runner | Base class for agent execution | Extend to define custom agent logic |
| Trainer | Training orchestration | Manages training loop and workers |
| LightningStore | Data synchronization | Stores traces and spans |
| OtelTracer | OpenTelemetry span emission | Records execution traces |
Source: examples/apo/apo_debug.py
Step 2: Prepare Your Dataset
Create a training dataset with room booking scenarios. Each task should include the user request and expected room selection.
# examples/apo/room_selector_apo.py
def create_room_dataset():
    """Create dataset for room booking tasks."""
    # Example tasks for room booking; each pairs a request with the expected room
    tasks = [
        {
            "task": "I need to schedule a meeting for 3 people tomorrow at 2 PM",
            "expected_room": "Meeting Room B",
        },
        {
            "task": "We are hosting a team event for 15 team members",
            "expected_room": "Board Room",
        },
        {
            "task": "Quick 1-on-1 sync needed this afternoon",
            "expected_room": "Meeting Room B",
        },
    ]
    return tasks
Source: examples/apo/room_selector_apo.py
Step 3: Define Training Resources
Resources define the prompts and model configurations used by your agent during training. Any resource can be tuned, although prompt templates are the typical target.
from agentlightning.prompts import PromptTemplate
from agentlightning.models import LLM
# Define a tunable prompt template
main_prompt = PromptTemplate(
template="""You are a helpful assistant that helps users book meeting rooms.
Available rooms:
- Conference A: capacity 10
- Meeting Room B: capacity 4
- Board Room: capacity 20
User request: {user_request}
Select the most appropriate room and explain your choice.""",
engine="f-string",
)
Source: examples/apo/apo_debug.py
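The `engine="f-string"` setting suggests that placeholders such as `{user_request}` are filled by name. How `PromptTemplate` renders internally is not shown in the source, so the sketch below only assumes behavior similar to Python's `str.format`; `TEMPLATE` and `render` are hypothetical names.

```python
# Shortened copy of the tutorial prompt, kept small for illustration.
TEMPLATE = (
    "You are a helpful assistant that helps users book meeting rooms.\n"
    "User request: {user_request}\n"
    "Select the most appropriate room and explain your choice."
)


def render(template: str, **variables: str) -> str:
    """Fill named placeholders; raises KeyError if a variable is missing."""
    return template.format(**variables)


prompt = render(TEMPLATE, user_request="Meeting for 3 people at 2 PM")
```

With `str.format`, a literal brace in the template has to be doubled, which is worth keeping in mind when a prompt contains JSON examples.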
Resource Types
| Type | Description | Tunable |
|---|---|---|
| PromptTemplate | Text templates with variable substitution | Yes |
| LLM | Model configuration (endpoint, sampling params) | No |
| SystemPrompt | System-level instructions | Yes |
| SamplingParameters | Temperature, top_p, max_tokens | No |
Source: examples/apo/README.md
Step 4: Configure the Trainer
The Trainer class orchestrates the training loop. It manages workers, coordinates with the LightningStore, and applies the optimization algorithm.
from agentlightning import Trainer
# Initialize trainer with one worker
trainer = Trainer(
n_workers=1,
# Resources to tune - only these will be optimized
initial_resources={
"main_prompt": main_prompt,
},
)
# Configure the APO algorithm
trainer.configure(
algorithm="APO",
lr=1e-3,
epochs=10,
)
Source: examples/apo/apo_debug.py
Trainer Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_workers | int | 1 | Number of parallel training workers |
| initial_resources | dict | Required | Resources to optimize |
| algorithm | str | Required | Optimization algorithm name |
| lr | float | 1e-3 | Learning rate |
| epochs | int | 10 | Number of training epochs |
Source: examples/apo/apo_debug.py
Step 5: Implement Reward Function
The reward function evaluates agent outputs and provides feedback signals for reinforcement learning.
from typing import Any
def room_booking_reward(output: Any, expected: dict) -> float:
    """
    Calculate reward based on room selection accuracy.
    Args:
        output: Agent's room selection
        expected: Expected room from dataset
    Returns:
        float: Reward score between 0.0 and 1.0
    """
    if not output or not output.data:
        return 0.0
    selected_room = output.data.get("selected_room", "")
    expected_room = expected.get("expected_room", "")
    # Exact match gets full reward
    if selected_room == expected_room:
        return 1.0
    # Partial match gets partial reward
    if expected_room.lower() in selected_room.lower():
        return 0.5
    return 0.0
Source: examples/apo/room_selector_apo.py
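To see how the three reward tiers behave, the function can be exercised without running an agent. The sketch below repeats the reward logic so it runs on its own and uses `types.SimpleNamespace` as a hypothetical stand-in for the agent's return value; `fake_output` is an illustrative helper, not part of the framework.

```python
from types import SimpleNamespace
from typing import Any


def room_booking_reward(output: Any, expected: dict) -> float:
    # Same logic as the tutorial function, repeated so this snippet is standalone.
    if not output or not output.data:
        return 0.0
    selected = output.data.get("selected_room", "")
    wanted = expected.get("expected_room", "")
    if selected == wanted:
        return 1.0
    if wanted.lower() in selected.lower():
        return 0.5
    return 0.0


def fake_output(room: str) -> SimpleNamespace:
    # Hypothetical stand-in for the agent output; only `.data` is read.
    return SimpleNamespace(data={"selected_room": room})


expected = {"expected_room": "Board Room"}
exact = room_booking_reward(fake_output("Board Room"), expected)
partial = room_booking_reward(fake_output("The Board Room (floor 2)"), expected)
miss = room_booking_reward(fake_output("Conference A"), expected)
empty = room_booking_reward(None, expected)
```

One edge case to keep in mind: an empty `expected_room` is a substring of every selection, so a dataset row without an expected value would score 0.5 for any non-empty answer.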
Step 6: Run the Training Loop
Execute the training with your runner, dataset, and reward function.
import asyncio
from agentlightning import setup_logging
async def train_room_selector():
    setup_logging()
    # Initialize agent and trainer
    agent = RoomSelector()
    dataset = create_room_dataset()
    trainer = Trainer(
        n_workers=1,
        initial_resources={"main_prompt": main_prompt},
    )
    # Run training
    results = await trainer.train(
        runner=agent,
        dataset=dataset,
        reward_fn=room_booking_reward,
        max_iterations=100,
    )
    print(f"Training completed: {results}")
if __name__ == "__main__":
    asyncio.run(train_room_selector())
Source: examples/apo/apo_debug.py
Understanding the Training Flow
sequenceDiagram
participant User as User Code
participant Trainer as Trainer
participant Runner as RoomSelector
participant Store as LightningStore
participant Algo as APO Algorithm
User->>Trainer: train(runner, dataset, reward_fn)
Trainer->>Runner: execute_task(task)
Runner->>Runner: select_room()
Runner-->>Trainer: output
Trainer->>Store: record_span(rollout_id, attempt_id)
Trainer->>Trainer: calculate_reward(output, expected)
Trainer->>Algo: optimize_step(rewards, traces)
Algo-->>Trainer: updated_resources
Trainer->>Runner: update_resources()
Note over Trainer,Algo: Repeat for max_iterations
Debugging Your Training
Agent Lightning provides multiple debugging approaches:
Approach 1: Runner Mode
Direct execution without training to verify agent logic:
python apo_debug.py --mode runner
Source: examples/apo/apo_debug.py
Approach 2: Hook Mode
Debug with tracing hooks enabled:
python apo_debug.py --mode hook
Approach 3: Trainer Mode
Full training debug with detailed logging:
python apo_debug.py --mode trainer
Viewing Training Traces
During and after training, spans are recorded to the LightningStore. View them in the dashboard:
graph LR
A[Training Run] --> B[Spans Emitted]
B --> C[LightningStore]
C --> D[Dashboard]
D --> E[Trace Visualization]
D --> F[Span Details]
The dashboard displays:
| View | Description |
|---|---|
| Rollouts | Complete training iterations |
| Spans | Individual function calls and operations |
| Resources | Tunable prompt templates |
| Metrics | Reward scores and training statistics |
Source: examples/minimal/README.md
Common Issues and Solutions
Issue: Tracer Conflicts
Running multiple modes consecutively in one process may cause tracer conflicts.
Solution: Run each mode in a separate process or ensure proper tracer cleanup between runs.
Source: examples/apo/apo_debug.py
Issue: Missing Dependencies
APO requires additional dependencies not in the core installation.
Solution: Install with extras:
pip install agentlightning[apo]
Source: examples/apo/README.md
Next Steps
After completing this tutorial:
- Advanced Algorithms: Explore custom algorithms in apo_custom_algorithm.py
- Integration: Learn Agent-OS integration for policy-aware training
- Dashboard: Use the dashboard to visualize training progress
- Production: Scale training with multiple workers and distributed execution
Summary
This tutorial covered the essential steps to train your first agent with Agent Lightning:
- Define a Runner implementing your agent logic
- Prepare a dataset with tasks and expected outputs
- Configure PromptTemplate resources for tuning
- Implement a reward function for RL feedback
- Use Trainer to orchestrate the training loop
- Debug with multiple modes and visualize traces in the dashboard
The training loop continuously improves your agent by optimizing prompt resources based on reward signals, enabling agents to learn from feedback without manual prompt engineering.
Source: https://github.com/microsoft/agent-lightning / Human Manual
Tutorial: Writing Agents
Related topics: Tutorial: Train Your First Agent, Runner Component
This tutorial provides a comprehensive guide to building AI agents using the Agent Lightning framework. It covers the core concepts, architecture, and practical implementation patterns for creating agents that can be trained with reinforcement learning.
Overview
Agent Lightning is a framework designed to train AI agents using reinforcement learning. The framework provides a complete execution stack including tracing, storage, and algorithm components that work together in a continuous loop. Source: CLAUDE.md:1-5
Agents in this framework are built using the LightningStore architecture, which synchronizes data between runners, tracers, and algorithms. The tracers emit spans that capture the agent's execution behavior, and these spans are consumed by algorithms to improve the agent's performance over time. Source: AGENTS.md:1-5
Architecture Overview
The Agent Lightning framework follows a continuous loop architecture where multiple components interact to enable training of AI agents.
graph TD
A[Agent / Runner] -->|Emits Spans| B[Tracer]
B -->|Traces| C[LightningStore]
C -->|Synchronized Data| D[Algorithms]
D -->|Training Signals| A
E[Dashboard] -->|Inspect & Debug| C
Core Components
| Component | Purpose | Location |
|---|---|---|
| LightningStore | Central data store for traces and rollouts | agentlightning/store/ |
| OtelTracer | OpenTelemetry-based span emission | Via OtelTracer class |
| AgentOpsTracer | AgentOps integration for tracing | Via AgentOpsTracer class |
| Span | Individual trace unit | Data model |
| emit_reward | Reward signal emission | API function |
Source: examples/minimal/write_traces.py:1-40
Writing Your First Agent
Basic Agent Structure
An agent in Agent Lightning is built around the tracing and store infrastructure. The minimal component showcase in examples/minimal/ demonstrates how individual building blocks behave in isolation. Source: examples/minimal/README.md:1-10
Setting Up the Tracer
The framework supports two primary tracing mechanisms:
- OtelTracer: OpenTelemetry-based tracing that can forward spans to a remote store client
- AgentOpsTracer: AgentOps integration for agent operations tracking
from agentlightning import OtelTracer, LightningStoreClient, setup_logging
# Initialize logging
setup_logging()
# Create tracer with optional remote store client
tracer = OtelTracer(
rollout_id="ro-001",
attempt_id="at-001",
store_client=None # Or LightningStoreClient(endpoint="...")
)
Source: examples/minimal/write_traces.py:40-60
Opening Rollouts and Emitting Spans
A rollout represents a single task execution, and each attempt within a rollout is one try at that task, which enables retry logic and per-attempt tracking.
# Open a new rollout
tracer.open_rollout(rollout_id="ro-001", user_id="user-123")
# Open an attempt within the rollout
tracer.open_attempt(attempt_id="at-001", sequence_id=1)
# Emit spans during agent execution
tracer.emit_span(
name="tool_execution",
attributes={
"tool": "web_search",
"query": "onboarding summary"
}
)
# Close attempt and rollout
tracer.close_attempt()
tracer.close_rollout()
Source: examples/minimal/write_traces.py:60-85
Span Data Model
Spans are the fundamental unit of tracing in Agent Lightning. Each span captures a discrete unit of work within an agent's execution.
Span Attributes
| Attribute | Type | Description |
|---|---|---|
| rollout_id | string | Unique identifier for the rollout |
| attempt_id | string | Unique identifier for the attempt |
| sequence_id | integer | Order of the span within the attempt |
| trace_id | string | Trace grouping identifier |
| span_id | string | Unique span identifier |
| parent_id | string | Parent span ID for hierarchy |
| name | string | Human-readable span name |
| status | TraceStatus | Execution status (OK, ERROR) |
| attributes | dict | Key-value metadata |
| start_time | datetime | Span start timestamp |
| end_time | datetime | Span end timestamp |
Source: dashboard/test-utils/python-server.py:1-100
Example Span Creation
from agentlightning import Span, TraceStatus
from agentlightning.types.tracer import OtelResource
from datetime import datetime
span = Span(
    rollout_id="ro-story-001",
    attempt_id="at-story-010",
    sequence_id=3,
    trace_id="trace-001-main",
    span_id="span-003-tool",
    parent_id="span-001-root",
    name="tool_execution",
    status=TraceStatus(status_code="OK", description=None),
    attributes={"tool": "web_search", "query": "onboarding summary"},
    events=[],
    links=[],
    start_time=datetime.now(),
    end_time=datetime.now(),
    context=None,
    parent=None,
    resource=OtelResource(attributes={"service.name": "tool-service"}, schema_url="")
)
Source: dashboard/test-utils/python-server.py:100-130
Using Operations
The framework provides an operation decorator for recording synthetic operation spans with additional linking capabilities.
from agentlightning import emit_reward
from agentlightning.operation import operation
from agentlightning.utils.otel import make_link_attributes, make_tag_attributes
# Record an operation span
@operation(name="classify_ticket")
def classify_ticket(ticket: str):
    with make_link_attributes(linked_rollout_id="ro-001", linked_attempt_id="at-001"):
        # Operation execution; `llm` is a placeholder for your own model client
        result = llm.classify(ticket)
        # Tag the reward
        make_tag_attributes(tags={"accuracy": 0.95})
        emit_reward(reward=0.95, name="classification_accuracy")
        return result
Source: examples/minimal/write_traces.py:20-35
LightningStore Integration
The LightningStore keeps tracers and runners synchronized, serving as the central data repository.
from agentlightning import LightningStoreClient
from agentlightning.store import InMemoryLightningStore
# Use in-memory store for local development
store = InMemoryLightningStore()
# Or connect to a remote store server
store = LightningStoreClient(endpoint="http://localhost:45993")
Source: examples/minimal/write_traces.py:25-35
Store Server CLI
Start a LightningStore server with OTLP enabled:
agl store --port 45993 --log-level DEBUG
Source: examples/minimal/write_traces.py:15-20
Workflow Execution Model
Agents in Agent Lightning follow a structured execution model with rollouts, attempts, and spans.
graph LR
subgraph Rollout[Rollout: ro-001]
subgraph Attempt1[Attempt: at-001]
S1[Span: root]
S2[Span: preprocess]
S3[Span: classify]
S1 --> S2
S2 --> S3
end
subgraph Attempt2[Attempt: at-002]
S4[Span: root]
S5[Span: preprocess]
S6[Span: classify]
S4 --> S5
S5 --> S6
end
end
State Transitions
| State | Description |
|---|---|
| pending | Rollout/attempt created but not started |
| running | Currently executing |
| completed | Successfully finished |
| failed | Execution failed |
| cancelled | Manually cancelled |
Source: dashboard/src/components/RolloutTable.component.tsx:1-50
Reward Emission
Agents emit reward signals that algorithms consume during training.
from agentlightning import emit_reward
# Emit a reward with metadata
emit_reward(
reward=0.85,
name="task_success",
attributes={
"task_id": "classification",
"accuracy": 0.85,
"latency_ms": 150
}
)
Reward Span Attributes
| Attribute | Type | Description |
|---|---|---|
| reward.value | float | Numeric reward value |
| reward.name | string | Reward signal identifier |
| reward.attributes | dict | Additional metadata |
Dashboard Integration
The Agent Lightning Dashboard provides real-time inspection of store data and debugging capabilities for running experiments. Source: dashboard/README.md:1-10
Drawer Components
The dashboard uses drawer components to display detailed information:
// Worker detail drawer
if (content.type === 'worker-detail') {
const { worker } = content;
const title = <WorkerDrawerTitle worker={worker} />;
const body = <JsonEditor value={worker} />;
return { title, body };
}
// Trace detail drawer
if (content.type === 'trace-detail') {
const { span } = content;
const title = <TraceDrawerTitle span={span} />;
const body = <JsonEditor value={span} />;
return { title, body };
}
Source: dashboard/src/components/AppDrawer.component.tsx:1-50
Minimal Examples Reference
The examples/minimal/ directory provides documented examples for each building block:
| Component | File | Purpose |
|---|---|---|
| LightningStore + OTLP | write_traces.py | Shows OtelTracer and AgentOpsTracer for rollouts and spans |
| MultiMetrics | write_metrics.py | Console and Prometheus metrics backends |
| LLM Proxying | llm_proxy.py | Request routing through /rollout/<id>/attempt/<id> namespaces |
| vLLM Lifecycle | vllm_server.py | Context manager for vLLM server lifecycle |
Source: examples/minimal/README.md:10-30
Best Practices
- Use descriptive span names: Names like tool_execution and classification_pipeline make debugging easier in the dashboard.
- Set appropriate parent IDs: Maintain span hierarchy for better trace visualization.
- Emit rewards consistently: Use emit_reward after each task completion to enable algorithm training.
- Handle failures explicitly: Set appropriate TraceStatus codes and descriptions for failed spans.
- Use operations for complex workflows: The @operation decorator simplifies recording complex multi-step processes.
Next Steps
- Explore the API Reference for detailed method signatures
- Learn about Training Algorithms that consume traces
- Set up the Dashboard for real-time monitoring
- Review Examples for complete agent implementations
Source: https://github.com/microsoft/agent-lightning / Human Manual
Trainer Component
Related topics: Runner Component, LightningStore, Algorithm Zoo
The Trainer is the core orchestration component in Agent Lightning responsible for managing the reinforcement learning training loop. It coordinates runners, algorithms, and the LightningStore to execute agent training with scalable execution strategies.
Overview
The Trainer serves as the central control plane that:
- Manages worker processes for parallel rollout execution
- Coordinates between the agent runner and learning algorithm
- Persists training traces to the LightningStore
- Provides pluggable execution strategies for different deployment scenarios
Source: agentlightning/trainer/registry.py:1-6
Architecture
Component Interactions
graph TD
T[Trainer] --> R[Runner<br/>Agent Execution]
T --> A[Algorithm<br/>Policy Update]
T --> S[LightningStore<br/>Trace Storage]
T --> E[ExecutionStrategy]
E --> SHM[SharedMemory<br/>Local Workers]
E --> CS[ClientServer<br/>Remote Workers]
R --> S
A --> S
Training Loop Flow
sequenceDiagram
participant T as Trainer
participant R as Runner
participant S as LightningStore
participant A as Algorithm
T->>R: Initialize with config
T->>A: Load algorithm
T->>S: Connect store
loop Training Steps
T->>R: Execute rollouts
R->>S: Emit spans
T->>S: Retrieve traces
T->>A: Process traces
A->>T: Policy update
end
Core Configuration
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_workers | int | 1 | Number of parallel worker processes |
| algorithm | `Algorithm \| str` | None | Learning algorithm (name or instance) |
| runner | `Runner \| None` | None | Agent runner for execution |
| reward_fn | `RewardFn \| None` | None | Reward function for training |
| execution_strategy | str | "shm" | Strategy: "shm" or "cs" |
Source: examples/apo/apo_custom_algorithm_trainer.py:35-37
Execution Strategy Registry
The Trainer supports multiple execution strategies through a registry pattern:
ExecutionStrategyRegistry = {
"shm": "agentlightning.execution.shared_memory.SharedMemoryExecutionStrategy",
"cs": "agentlightning.execution.client_server.ClientServerExecutionStrategy",
}
Source: agentlightning/trainer/registry.py:1-6
| Strategy | Description | Use Case |
|---|---|---|
| shm | Shared Memory - Local multi-process execution | Single-node GPU training |
| cs | Client-Server - Remote worker communication | Distributed deployments |
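A registry that maps short names to dotted import paths is typically resolved lazily with `importlib`. How the Trainer performs this lookup is not shown in the source, so the following is a sketch of the general pattern; `REGISTRY` and `resolve` are hypothetical names, and standard-library classes stand in for the strategy classes so the example runs anywhere.

```python
import importlib
from typing import Any, Dict

REGISTRY: Dict[str, str] = {
    # Stdlib classes stand in for the real strategy classes (assumption).
    "ordered": "collections.OrderedDict",
    "counter": "collections.Counter",
}


def resolve(name: str, registry: Dict[str, str] = REGISTRY) -> Any:
    """Import and return the object registered under `name`."""
    try:
        dotted_path = registry[name]
    except KeyError:
        raise ValueError(
            f"Unknown strategy {name!r}; choose from {sorted(registry)}"
        ) from None
    # Split "package.module.ClassName" into module path and attribute name.
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


strategy_cls = resolve("counter")
```

Storing strings rather than classes means a strategy's module is imported only when that strategy is selected, which keeps optional dependencies out of the default import path.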
Usage Patterns
Basic Training with GRPO Algorithm
from agentlightning import Trainer
trainer = Trainer(
runner=runner,
reward_fn=reward_fn,
algorithm="GRPO"
)
trainer.train()
Source: contrib/recipes/agentos/README.md:40-47
Custom Algorithm Integration
The Trainer accepts custom algorithms decorated with the @algo decorator:
from agentlightning import Trainer
from agentlightning.algorithm import algo
from agentlightning.store import LightningStore
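@algo
async def custom_algorithm(*, store: LightningStore):
    # Process traces from store; `policy_update` is a placeholder for the result
    return policy_update
# `rollout_fn` is a placeholder for your own rollout function
trainer = Trainer(n_workers=1, algorithm=custom_algorithm)
trainer.fit(rollout_fn)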
Source: examples/apo/apo_custom_algorithm_trainer.py:28-37
Parallel Training with Multiple Workers
from agentlightning import Trainer
trainer = Trainer(
n_workers=4, # 4 parallel workers
execution_strategy="shm", # Shared memory for local execution
algorithm="PPO",
runner=runner
)
trainer.train()
Integration with Agent-OS
The Trainer integrates with Agent-OS for policy-governed training:
from agentlightning import Trainer
from agentlightning.contrib.runner.agentos import AgentOSRunner
from agentlightning.contrib.reward.agentos import PolicyReward
from agent_os import KernelSpace
from agent_os.policies import SQLPolicy
# Create governed kernel
kernel = KernelSpace(policy=SQLPolicy(deny=["DROP", "DELETE"]))
# Wrap in Agent-OS runner
runner = AgentOSRunner(kernel)
# Train with policy-aware rewards
trainer = Trainer(
runner=runner,
reward_fn=PolicyReward(kernel),
algorithm="GRPO"
)
trainer.train()
Source: contrib/recipes/agentos/README.md:25-45
Workflow Phases
| Phase | Description |
|---|---|
| Initialization | Load algorithm, connect store, spawn workers |
| Rollout | Execute agent episodes in parallel workers |
| Trace Collection | Retrieve spans from LightningStore |
| Algorithm Update | Process traces and update policy |
| Iteration | Repeat rollout-collect-update cycle |
LightningStore Integration
The Trainer maintains bidirectional synchronization with LightningStore:
- Span Emission: Workers emit execution traces during rollout
- Trace Retrieval: Algorithm reads completed traces for learning
- Persistence: Training state survives worker restarts
Source: CLAUDE.md:4-6
Command-Line Interface
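Training can be launched from the command line. The commands below start the store with the agl CLI and then run the algorithm and runner sides of a training script as separate processes (the algo and runner arguments are handled by the script itself):
# Start the store, then the algorithm and runner processes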
agl store
python my_training_script.py algo
python my_training_script.py runner
Or programmatically:
python my_training_script.py
Source: examples/apo/apo_custom_algorithm_trainer.py:12-20
Extending the Trainer
Custom Execution Strategy
Add new strategies to the registry:
# In agentlightning/trainer/registry.py
ExecutionStrategyRegistry["custom"] = "mymodule.CustomExecutionStrategy"
Custom Algorithm
Decorate async functions with @algo:
from agentlightning.algorithm import algo
from agentlightning.store import LightningStore
@algo
async def my_algorithm(*, store: LightningStore):
    traces = await store.traces.get_all()
    # Process traces; `update` is a placeholder for the computed result
    return update
Dependencies
| Dependency | Purpose |
|---|---|
| LightningStore | Trace persistence and retrieval |
| Algorithm | Policy learning logic |
| Runner | Agent execution environment |
| ExecutionStrategy | Worker orchestration |
| RewardFn | Training signal computation |
See Also
- Agent Lightning Architecture - System-wide architecture overview
- Algorithm Component - Learning algorithm details
- LightningStore - Trace storage system
- Execution Strategies - Available execution modes
Source: https://github.com/microsoft/agent-lightning / Human Manual
Runner Component
Related topics: Trainer Component, Tutorial: Writing Agents
The Runner component is the core execution engine in Agent Lightning responsible for managing agent lifecycle, task processing, and telemetry collection. Runners serve as the bridge between the high-level Trainer orchestration and the underlying LitAgent implementation, handling initialization, worker management, and graceful shutdown.
Overview
Runners execute agents in a continuous loop where they poll the LightningStore for tasks, execute agent logic, and emit tracing spans for algorithm consumption. The Runner architecture supports both standard execution through LitAgentRunner and legacy compatibility through LegacyAgentRunner.
Source: agentlightning/runner/__init__.py:1-11
from .agent import LitAgentRunner
from .base import Runner
from .legacy import LegacyAgentRunner
__all__ = [
    "Runner",
    "LegacyAgentRunner",
    "LitAgentRunner",
]
Architecture
graph TD
A[Trainer] --> B[Runner Fleet]
B --> C[LitAgentRunner]
B --> D[LegacyAgentRunner]
C --> E[LitAgent]
D --> F[AgentLightningClient]
E --> G[LightningStore]
F --> G
E --> H[Tracer]
H --> G
Runner Hierarchy
| Class | Purpose | Source |
|---|---|---|
Runner | Abstract base class defining the runner interface | base.py |
LitAgentRunner | Primary runner implementation for standard agent execution | agent.py |
LegacyAgentRunner | Runner for backward compatibility with AgentOps integration | legacy.py |
Runner Base Class
The Runner class defines the core interface that all runner implementations must follow. It establishes the lifecycle methods and execution patterns.
Source: agentlightning/runner/base.py:1-20
Lifecycle Methods
The runner lifecycle wraps the execution phase (iter/step) in four setup and teardown methods:
graph LR
A[init] --> B[init_worker]
B --> C[iter/step]
C --> D[teardown_worker]
D --> E[teardown]
| Method | Purpose | Must Implement |
|---|---|---|
init(agent, hooks) | Initialize runner with agent and hooks | Yes |
init_worker(worker_id, store) | Per-worker initialization with store | Yes |
teardown() | Release resources from init() | Yes |
teardown_worker(worker_id) | Release per-worker resources | Yes |
Context Manager Pattern
Runners support a context manager pattern for automatic resource management:
with runner.run_context(agent=agent, store=store, hooks=hooks) as runner:
    # Runner is initialized and ready
    await runner.iter()
# Automatic teardown on exit
Source: agentlightning/runner/base.py:52-86
The run_context helper ensures proper cleanup even when exceptions occur:
try:
    self.init(agent=agent, hooks=hooks)
    _initialized = True
    self.init_worker(worker_id=0, store=store)
    _worker_initialized = True
    yield self
finally:
    try:
        if _worker_initialized:
            self.teardown_worker(worker_id=worker_id if worker_id is not None else 0)
    except Exception:
        logger.error("Error during runner worker teardown", exc_info=True)
    try:
        if _initialized:
            self.teardown()
    except Exception:
        logger.error("Error during runner teardown", exc_info=True)
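The cleanup ordering above can be exercised with a small stand-alone sketch. `ToyRunner` is a hypothetical class, not part of Agent Lightning; it records lifecycle events in a list, simulates a failing worker teardown, and shows that the remaining teardown still runs while the original exception from the body propagates.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ToyRunner:
    """Hypothetical runner that records lifecycle events instead of doing work."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def init(self) -> None:
        self.events.append("init")

    def init_worker(self, worker_id: int) -> None:
        self.events.append(f"init_worker:{worker_id}")

    def teardown_worker(self, worker_id: int) -> None:
        self.events.append(f"teardown_worker:{worker_id}")
        raise RuntimeError("simulated teardown failure")

    def teardown(self) -> None:
        self.events.append("teardown")

    @contextmanager
    def run_context(self, worker_id: int = 0) -> Iterator["ToyRunner"]:
        initialized = worker_initialized = False
        try:
            self.init()
            initialized = True
            self.init_worker(worker_id)
            worker_initialized = True
            yield self
        finally:
            # Each teardown step is guarded so one failure cannot skip the next.
            try:
                if worker_initialized:
                    self.teardown_worker(worker_id)
            except Exception as exc:
                self.events.append(f"logged:{exc}")
            try:
                if initialized:
                    self.teardown()
            except Exception as exc:
                self.events.append(f"logged:{exc}")


runner = ToyRunner()
propagated = None
try:
    with runner.run_context() as active:
        raise ValueError("failure inside the body")
except ValueError as exc:
    propagated = str(exc)
```

After this runs, `runner.events` contains both teardown steps plus the logged teardown error, and `propagated` holds the message of the body's `ValueError` rather than the teardown failure.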
Execution Methods
| Method | Description | Behavior |
|---|---|---|
iter(event) | Run continuously until event or no tasks | Abstract - subclasses implement |
step() | Execute single unit of work | Abstract - subclasses implement |
run() | Legacy run method | Raises RuntimeError - use iter() or step() |
Source: agentlightning/runner/base.py:88-102
Warning: The run() method raises RuntimeError because its behavior is undefined. Always use iter() or step() instead.
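The relationship between the three execution methods can be illustrated with a simplified sketch. It is synchronous and uses hypothetical classes (`BaseRunner`, `QueueRunner`); the real Runner methods are abstract and may be asynchronous, so treat this only as an outline of the control flow.

```python
import threading
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List, Optional


class BaseRunner(ABC):
    """Simplified, synchronous stand-in for the Runner interface."""

    @abstractmethod
    def step(self) -> bool:
        """Process one task; return False when no task was available."""

    def iter(self, event: Optional[threading.Event] = None) -> int:
        """Call step() until the stop event is set or the tasks run out."""
        processed = 0
        while event is None or not event.is_set():
            if not self.step():
                break
            processed += 1
        return processed

    def run(self) -> None:
        raise RuntimeError("run() is not supported; use iter() or step().")


class QueueRunner(BaseRunner):
    """Toy runner that upper-cases strings taken from an in-memory queue."""

    def __init__(self, tasks: List[str]) -> None:
        self.pending: Deque[str] = deque(tasks)
        self.done: List[str] = []

    def step(self) -> bool:
        if not self.pending:
            return False
        self.done.append(self.pending.popleft().upper())
        return True


runner = QueueRunner(["a", "b", "c"])
processed = runner.iter()
```

Here `iter()` is just a loop over `step()`, which is why the two are interchangeable for callers that want either continuous or single-task execution.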
LitAgentRunner
LitAgentRunner is the primary runner implementation that manages the agent-runner relationship, hook registration, and tracer integration.
Source: agentlightning/runner/agent.py:1-30
Initialization Flow
sequenceDiagram
participant Trainer
participant LitAgentRunner
participant LitAgent
participant Tracer
participant LightningStore
Trainer->>LitAgentRunner: init(agent, hooks)
LitAgentRunner->>LitAgent: set_runner(self)
LitAgentRunner->>Tracer: init()
Trainer->>LitAgentRunner: init_worker(worker_id, store)
LitAgentRunner->>Tracer: init_worker(worker_id, store)
Key Properties
| Property | Type | Description |
|---|---|---|
agent | LitAgent[T_task] | The agent instance (via get_agent()) |
store | LightningStore | The backing store (via get_store()) |
worker_id | Optional[int] | Unique worker identifier |
tracer | Tracer | Tracer for span emission |
Source: agentlightning/runner/agent.py:90-110
Accessor Methods
def get_agent(self) -> LitAgent[T_task]:
    """Get the agent instance."""
    if self._agent is None:
        raise ValueError("Agent not initialized. Call init() first.")
    return self._agent
def get_store(self) -> LightningStore:
    """Get the store instance."""
    if self._store is None:
        raise ValueError("Store not initialized. Call init_worker() first.")
    return self._store
def get_worker_id(self) -> str:
    """Get the formatted worker ID string."""
    return f"Worker-{self.worker_id}" if self.worker_id is not None else "Worker-Unknown"
Logging Prefix
The _log_prefix() method generates consistent log prefixes for traceability:
def _log_prefix(self, rollout_id: Optional[str] = None) -> str:
    """Generate a standardized log prefix for the current worker."""
    # Returns format: "[Worker-{id}] [{rollout_id}]"
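Since the method body is not shown, here is a sketch of a function that produces the documented format. `log_prefix` is a hypothetical free function, and omitting the second bracket when no rollout ID is given is an assumption, not confirmed behavior.

```python
from typing import Optional


def log_prefix(worker_id: Optional[int], rollout_id: Optional[str] = None) -> str:
    """Build a prefix in the documented "[Worker-{id}] [{rollout_id}]" format."""
    worker = f"Worker-{worker_id}" if worker_id is not None else "Worker-Unknown"
    prefix = f"[{worker}]"
    # Assumption: the rollout part is dropped when no rollout ID is available.
    if rollout_id is not None:
        prefix += f" [{rollout_id}]"
    return prefix


with_rollout = log_prefix(3, "rollout-42")
without_worker = log_prefix(None)
```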
LegacyAgentRunner
LegacyAgentRunner provides backward compatibility for workflows using the AgentOps integration and AgentLightningClient communication pattern.
Source: agentlightning/runner/legacy.py:1-35
Attributes
| Attribute | Type | Description |
|---|---|---|
agent | LitAgent[Any] | The agent instance |
client | AgentLightningClient | Server communication client |
tracer | Tracer | Tracer instance for span emission |
worker_id | Optional[str] | Worker identifier |
max_tasks | Optional[int] | Maximum tasks before stopping |
Architecture
graph TD
A[LegacyAgentRunner] --> B[LitAgent]
A --> C[AgentLightningClient]
A --> D[Tracer]
C --> E[Server]
D --> F[LightningStore]
B --> F
Hook System Integration
Runners integrate with the hook system to provide extensibility at key lifecycle points:
Source: agentlightning/types/core.py:1-30
| Hook | Timing | Purpose |
|---|---|---|
on_trace_start | Before tracer enters trace context | Logging, metric collection, resource setup |
on_trace_end | After rollout completes, before tracer exits | Logging, cleanup |
on_rollout_start | Before rollout attempt begins | Per-attempt initialization |
on_rollout_end | After rollout attempt completes | Result processing, cleanup |
Hooks are registered during initialization and called by the runner at appropriate points during execution.
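One way to picture the hook timing is a toy dispatcher that records each callback. `RecordingHook` and `run_rollout` are hypothetical, the callback signatures are simplified to a single rollout ID, and the relative order of `on_rollout_start` and `on_trace_start` is an assumption because the table does not specify it.

```python
from typing import Callable, List, Sequence


class RecordingHook:
    """Hypothetical hook that records the order in which callbacks fire."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def on_rollout_start(self, rollout_id: str) -> None:
        self.calls.append(f"on_rollout_start:{rollout_id}")

    def on_trace_start(self, rollout_id: str) -> None:
        self.calls.append(f"on_trace_start:{rollout_id}")

    def on_trace_end(self, rollout_id: str) -> None:
        self.calls.append(f"on_trace_end:{rollout_id}")

    def on_rollout_end(self, rollout_id: str) -> None:
        self.calls.append(f"on_rollout_end:{rollout_id}")


def run_rollout(rollout_id: str, hooks: Sequence[RecordingHook], work: Callable[[], str]) -> str:
    """Run `work` and notify every hook at each lifecycle point."""
    for hook in hooks:
        hook.on_rollout_start(rollout_id)
    for hook in hooks:
        hook.on_trace_start(rollout_id)  # before the trace context would open
    result = work()
    for hook in hooks:
        hook.on_trace_end(rollout_id)  # rollout done, trace context still open
    for hook in hooks:
        hook.on_rollout_end(rollout_id)
    return result


hook = RecordingHook()
result = run_rollout("r1", [hook], lambda: "answer")
```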
Trainer Integration
Runners are instantiated and managed by the Trainer class, which orchestrates the entire training loop:
Source: agentlightning/trainer/trainer.py:40-60
class Trainer(TrainerLegacy):
    """High-level orchestration layer that wires Algorithm <-> Runner <-> Store."""
    # Runner fleet configuration
    n_runners: int  # Number of agent runners to run in parallel
    max_rollouts: Optional[int]  # Maximum rollouts per runner
    strategy: ExecutionStrategy  # Process management strategy
    tracer: Tracer  # Tracer instance for telemetry
    hooks: Sequence[Hook]  # Lifecycle callbacks
Training Configuration
| Parameter | Type | Description |
|---|---|---|
n_runners | int | Number of parallel agent runners |
max_rollouts | Optional[int] | Stop after N rollouts (None = unlimited) |
strategy | ExecutionStrategy | Spawning strategy (shared memory, client/server) |
tracer | Tracer | Tracer class or config for span collection |
hooks | Sequence[Hook] | Lifecycle callback instances |
Execution Flow
graph TD
A[Trainer.fit/dev] --> B[Spawn Runner Fleet]
B --> C[For each Runner]
C --> D[runner.run_context]
D --> E[init + init_worker]
E --> F[iter/event loop]
F --> G{Tasks available?}
G -->|Yes| H[Execute step]
H --> I[Emit spans to Store]
I --> F
G -->|No| J[Exit loop]
J --> K[teardown_worker]
K --> L[teardown]
Context Manager Usage
For debugging or standalone usage outside the Trainer stack:
import asyncio
from agentlightning import AgentOpsTracer, InMemoryLightningStore, LitAgentRunner
async def main():
    # Create store and agent (MyLitAgent is your own LitAgent subclass)
    store = InMemoryLightningStore()
    agent = MyLitAgent()
    # Use context manager
    runner = LitAgentRunner(tracer=AgentOpsTracer())
    with runner.run_context(agent=agent, store=store) as runner:
        # Runner initialized and ready
        worker_id = runner.get_worker_id()
        print(f"Running on {worker_id}")
        # Run until complete
        await runner.iter()
    # Automatic cleanup
asyncio.run(main())
Source: agentlightning/runner/base.py:88-113
Error Handling
Runners implement robust error handling during teardown:
| Phase | Error Behavior | Recovery |
|---|---|---|
teardown_worker | Logged but doesn't propagate | Continue to teardown |
teardown | Logged but doesn't propagate | Context manager completes |
This ensures that multiple cleanup errors don't mask the original failure and that partial cleanup still occurs.
Summary
The Runner component provides:
- Lifecycle Management - Consistent init/teardown patterns via context managers
- Worker Isolation - Per-worker initialization with dedicated store connections
- Hook Integration - Extensibility through lifecycle callbacks
- Telemetry - Built-in tracer integration for span emission
- Trainer Integration - Seamless orchestration within the training loop
Runners are the execution backbone of Agent Lightning, translating high-level training commands into agent task processing while maintaining observability through distributed tracing.
LightningStore
Related topics: System Architecture, Trainer Component
LightningStore is the central data persistence and synchronization layer in Agent Lightning. It manages the lifecycle of AI agent training workflows, including rollouts, attempts, spans, resources, and worker state. The store serves as the backbone for the training loop, enabling distributed execution, tracing, and experiment tracking.
Overview
LightningStore provides a unified interface for:
- Rollout Management: Tracking agent task executions from enqueue to completion
- Span Recording: Capturing fine-grained traces of agent operations via OpenTelemetry
- Resource Management: Storing and versioning agent configurations, prompts, and model definitions
- Worker Coordination: Managing distributed worker states and heartbeats
- Metrics Collection: Aggregating training metrics through Prometheus integration
Source: agentlightning/store/base.py
Architecture
LightningStore follows a pluggable backend architecture with a unified async interface.
graph TD
subgraph "Client Layer"
Runner[Runner] --> Tracer[Tracers<br/>OtelTracer<br/>AgentOpsTracer]
Tracer --> Client[LightningStoreClient]
end
subgraph "Server Layer"
Client --> |HTTP/gRPC| Server[LightningStoreServer]
Server --> Collections[LightningCollections]
end
subgraph "Storage Backends"
Collections --> InMemory[InMemoryLightningStore]
Collections --> SQLite[SQLiteLightningStore]
Collections --> Mongo[MongoLightningStore]
end
subgraph "Thread Safety"
Store[Any Store] --> Threaded[LightningStoreThreaded]
end
Core Components
| Component | Purpose |
|---|---|
LightningStore | Abstract base class defining the store interface |
LightningStoreClient | HTTP client for remote store communication |
LightningStoreServer | FastAPI-based server handling store operations |
LightningCollections | Organized data collections (rollouts, spans, resources, workers) |
LightningStoreThreaded | Thread-safe wrapper for concurrent access |
Source: agentlightning/store/client_server.py
Data Models
Core Types
The store operates on these fundamental data structures:
| Model | Description |
|---|---|
Rollout | A complete task execution with status, timestamps, and metadata |
Attempt | A single attempt within a rollout (supports retries) |
Span | Fine-grained trace data for agent operations |
TaskInput | Input data for a task (prompt, parameters) |
Worker | Worker node state and heartbeat information |
ResourcesUpdate | Versioned resource configuration storage |
RolloutConfig | Configuration for rollout execution |
Source: agentlightning/types/core.py
Rollout Status Lifecycle
stateDiagram-v2
[*] --> Pending: enqueue_rollout
Pending --> Running: start_rollout
Running --> Completed: finish_rollout
Running --> Failed: fail_rollout
Completed --> [*]
Failed --> [*]
Running --> Attempted: start_attempt
Attempted --> Running: finish_attempt
The status values are:
- pending: Queued for execution
- running: Currently executing
- completed: Successfully finished
- failed: Execution failed
Source: agentlightning/store/base.py
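The four status values and the transitions between them can be modeled as a small state machine. This is an illustrative sketch only: `RolloutStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, and the intermediate attempt state from the diagram is left out.

```python
from enum import Enum
from typing import Dict, Set


class RolloutStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Transitions taken from the lifecycle diagram; terminal states allow none.
ALLOWED_TRANSITIONS: Dict[RolloutStatus, Set[RolloutStatus]] = {
    RolloutStatus.PENDING: {RolloutStatus.RUNNING},
    RolloutStatus.RUNNING: {RolloutStatus.COMPLETED, RolloutStatus.FAILED},
    RolloutStatus.COMPLETED: set(),
    RolloutStatus.FAILED: set(),
}


def transition(current: RolloutStatus, target: RolloutStatus) -> RolloutStatus:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


status = RolloutStatus.PENDING
status = transition(status, RolloutStatus.RUNNING)
status = transition(status, RolloutStatus.COMPLETED)
```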
API Endpoints
The server exposes REST endpoints under /v1/agl:
| Endpoint | Method | Description |
|---|---|---|
/rollouts | POST | Enqueue new rollouts |
/rollouts/{id} | GET | Retrieve rollout by ID |
/rollouts/{id}/start | POST | Mark rollout as started |
/rollouts/{id}/finish | POST | Complete a rollout |
/rollouts/{id}/attempt | POST | Start a new attempt |
/rollouts/{id}/attempt/{aid}/finish | POST | Finish an attempt |
/spans | POST | Record span data |
/spans/search | POST | Query spans with filters |
/resources | POST | Add new resources |
/resources/{id} | GET/PUT | Get or update resources |
/workers | POST | Register worker |
/workers/{id}/heartbeat | POST | Worker heartbeat |
/statistics | GET | Store statistics |
Source: agentlightning/store/client_server.py
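To make the path layout concrete, the sketch below builds full URLs for a few of the endpoints listed above. `endpoint_url` is a hypothetical helper, and the host and port are taken from the example server command elsewhere in this manual; no request is sent.

```python
from urllib.parse import quote

API_PREFIX = "/v1/agl"


def endpoint_url(base_url: str, *segments: str) -> str:
    """Join path segments under the API prefix, escaping each segment."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base_url.rstrip('/')}{API_PREFIX}/{path}"


base = "http://localhost:45993"
start_url = endpoint_url(base, "rollouts", "r-1", "start")
heartbeat_url = endpoint_url(base, "workers", "worker 1", "heartbeat")
```

Escaping each segment separately keeps identifiers containing spaces or slashes from changing the route.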
Implementation Backends
In-Memory Store
The InMemoryLightningStore provides a lightweight, zero-dependency backend suitable for single-node execution and testing.
Key characteristics:
- All data stored in process memory
- Supports collections with atomic transactions
- Built-in size estimation for memory monitoring
- Fast for development and small-scale experiments
from agentlightning import InMemoryLightningStore
store = InMemoryLightningStore()
Source: agentlightning/store/memory.py
SQLite Store
SQLite backend provides persistent storage with ACID guarantees, suitable for single-node deployments requiring durability.
MongoDB Store
MongoDB backend supports distributed deployments with horizontal scaling, providing high throughput for large-scale training runs.
Thread Safety
The LightningStoreThreaded class wraps any store implementation to provide thread-safe access:
from agentlightning.store.threading import LightningStoreThreaded
# Wrap any store with thread safety
threaded_store = LightningStoreThreaded(store)
Thread safety features:
- Uses threading.Lock for synchronization
- Guarantees atomic operations across concurrent requests
- Maintains all original store capabilities
- Exposes thread_safe: True and async_safe: True in capabilities
Source: agentlightning/store/threading.py
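The lock-wrapper idea can be shown with a toy example. `ToyStore` and `LockedStore` are hypothetical and synchronous; the real LightningStoreThreaded wraps the store's own interface, so this only sketches the technique of serializing every method call behind one threading.Lock.

```python
import threading
from typing import Any


class ToyStore:
    """Hypothetical store with a non-atomic read-modify-write operation."""

    def __init__(self) -> None:
        self.counter = 0

    def increment(self) -> None:
        current = self.counter
        self.counter = current + 1


class LockedStore:
    """Wrap a store so that every method call runs under a single lock."""

    def __init__(self, store: Any) -> None:
        self._store = store
        self._lock = threading.Lock()

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the wrapper itself.
        attribute = getattr(self._store, name)
        if not callable(attribute):
            return attribute

        def locked(*args: Any, **kwargs: Any) -> Any:
            with self._lock:
                return attribute(*args, **kwargs)

        return locked


store = LockedStore(ToyStore())


def worker() -> None:
    for _ in range(1000):
        store.increment()


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
final_count = store.counter
```

Delegating through `__getattr__` is what lets the wrapper keep all of the original store's capabilities without re-declaring each method.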
Collection Operations
LightningStore uses a collection-based data organization pattern:
# Atomic write operation
async with store.collections.atomic(mode="w", snapshot=..., labels=["resources"]) as collections:
    await collections.resources.insert([update])
Supported Collections
| Collection | Purpose |
|---|---|
rollouts | Task execution records |
attempts | Individual attempt tracking |
spans | OpenTelemetry trace spans |
resources | Versioned configurations |
workers | Worker state management |
Source: agentlightning/store/collection_based.py
Decorators and Instrumentation
The store layer uses several decorators for observability and reliability:
| Decorator | Purpose |
|---|---|
@tracked | Records operation metrics and timing |
@healthcheck_before | Validates store health before operations |
@_with_collections_execute | Manages collection lifecycle and error handling |
Integration with Tracers
LightningStore integrates with OpenTelemetry through tracers:
from agentlightning import OtelTracer, AgentOpsTracer
tracer = OtelTracer(store=store)
Tracing workflow:
sequenceDiagram
participant Agent
participant Tracer
participant Store
participant OTLP
Agent->>Tracer: Create span
Tracer->>Store: Record span data
Store->>Store: Persist to backend
Tracer->>OTLP: Export spans (optional)
Source: examples/minimal/write_traces.py
Usage Examples
Basic Store Operations
from agentlightning import InMemoryLightningStore
# Create store
store = InMemoryLightningStore()
# Enqueue a task
rollout = await store.enqueue_rollout(
    input={"prompt": "Solve this problem"},
    mode="train"
)
# Dequeue for processing
task = await store.dequeue_rollout(worker_id="worker-1")
# Complete the rollout
await store.finish_rollout(
    rollout_id=task.rollout.rollout_id,
    attempt_id=task.attempt.attempt_id,
    response={"answer": "42"}
)
Server Setup
# Start a LightningStore server
agl store --port 45993 --log-level DEBUG
Client Connection
from agentlightning import LightningStoreClient
client = LightningStoreClient(base_url="http://localhost:45993")
# All operations work through the client
rollouts = await client.list_rollouts()
Capabilities
The store reports its capabilities through the capabilities property:
| Capability | Description |
|---|---|
async_safe | Supports async operations |
thread_safe | Supports concurrent thread access |
distributed | Supports multi-node deployment |
persistence | Data survives restarts |
Source: agentlightning/store/threading.py
CLI Commands
The agl CLI provides store management:
# Start store server
agl store --port 45993 --log-level DEBUG
# Prometheus metrics endpoint
agl prometheus
Source: agentlightning/cli/__init__.py
Algorithm Zoo
Related topics: Tutorial: Train Your First Agent
Overview
The Algorithm Zoo is a modular collection of training algorithms that consume execution traces from the Agent Lightning runtime to improve agent behavior through reinforcement learning and prompt optimization. Source: CLAUDE.md
Agent Lightning runs through a continuous loop where runners and tracers emit spans, LightningStore keeps them synchronized, and algorithms in agentlightning/algorithm/ consume those traces to improve behavior. Source: CLAUDE.md
Architecture
The Algorithm Zoo follows a producer-consumer pattern where the store acts as the central synchronization hub:
graph TD
A[Runners] -->|emit spans| B[LightningStore]
C[Tracers] -->|emit spans| B
B -->|traces| D[Algorithm Zoo]
D -->|policy updates| E[Improved Agent Behavior]
B -->|traces| F[Dashboard]
Available Algorithms
APO (Automatic Prompt Optimization)
APO is a prompt optimization algorithm that iteratively refines prompt templates based on reward signals collected from agent rollouts.
#### How APO Works
The APO algorithm maintains a collection of prompt candidates and evaluates each one against task objectives. Based on the reward signals, it selects and refines the most effective prompts. Source: examples/apo/apo_custom_algorithm.py:34-37
async def apo_algorithm(*, store: agl.LightningStore):
    """
    An example of how a prompt optimization works.
    """
    prompt_candidates = [
        "You are a helpful assistant. {any_question}",
        "You are a knowledgeable AI. {any_question}",
        "You are a friendly chatbot. {any_question}",
    ]
    prompt_and_rewards: list[tuple[str, float]] = []
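The excerpt stops before the selection step. A minimal sketch of how candidates might be scored and the best one chosen is shown below; `pick_best_prompt` and `toy_reward` are hypothetical, and the reward function is a deterministic stand-in for rewards that would come from real rollouts.

```python
from typing import Callable, List, Tuple


def pick_best_prompt(
    candidates: List[str], evaluate: Callable[[str], float]
) -> Tuple[str, float]:
    """Score every candidate and return the (prompt, reward) pair with the top reward."""
    prompt_and_rewards: List[Tuple[str, float]] = [
        (prompt, evaluate(prompt)) for prompt in candidates
    ]
    return max(prompt_and_rewards, key=lambda pair: pair[1])


def toy_reward(prompt: str) -> float:
    # Hypothetical reward: in practice this comes from rollouts, not a string check.
    return 1.0 if "knowledgeable" in prompt else 0.5


prompt_candidates = [
    "You are a helpful assistant. {any_question}",
    "You are a knowledgeable AI. {any_question}",
    "You are a friendly chatbot. {any_question}",
]
best_prompt, best_reward = pick_best_prompt(prompt_candidates, toy_reward)
```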
#### Custom APO Algorithm
To create a custom algorithm, wrap your async function with the @algo decorator. Source: examples/apo/apo_custom_algorithm_trainer.py:28-39
from agentlightning.algorithm import algo
@algo
async def apo_algorithm_usable_in_trainer(*, store: LightningStore):
    """
    You need to wrap the apo_algorithm in an algo decorator to make it usable in trainer.
    """
    return await apo_algorithm(store=store)
VERL (Volcano Engine Reinforcement Learning)
VERL is a full training algorithm that integrates with the VERL library for GPU-accelerated reinforcement learning. Source: examples/tinker/q20_train.py:43-52
algo_verl_parser = subparsers.add_parser("verl", help="Launch the full training algorithm with VERL.")
algo_verl_parser.add_argument("--port", type=int, default=4747, help="Port for the AgentLightning store.")
algo_verl_parser.add_argument(
    "--model",
    choices=("qwen25", "qwen3"),
    default="qwen3",
    help="Model variant to train.",
)
algo_verl_parser.add_argument("--search", action="store_true", help="Enable search tool.")
FAST (Fast Algorithm Suite Toolkit)
The FAST algorithm provides lightweight optimization capabilities for rapid experimentation.
Running Algorithms
Option A: Separate Components
Start the store, algorithm, and runner in three separate terminals: Source: examples/apo/README.md:10-24
# Terminal 1: Start the store
agl store
# Terminal 2: Run the algorithm
python apo_custom_algorithm.py algo
# Terminal 3: Run the rollout runner
python apo_custom_algorithm.py runner
Option B: Integrated Trainer
Use the integrated trainer that handles all components: Source: examples/apo/apo_custom_algorithm_trainer.py:47-49
from agentlightning import Trainer, setup_logging
trainer = Trainer(n_workers=1, algorithm=apo_algorithm_usable_in_trainer)
trainer.fit(apo_rollout)
Algorithm Decorator
The @algo decorator transforms any async algorithm function into a component that can be used with the Trainer. It injects the LightningStore as a keyword argument. Source: examples/apo/apo_custom_algorithm_trainer.py:28-39
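The injection pattern can be sketched without the library. `toy_algo` and `ToyAlgorithm` below are hypothetical and are not the implementation of @algo; they only illustrate wrapping an async function in an object that holds a store and passes it in as a keyword argument. A plain dict stands in for the store.

```python
import asyncio
import functools
import inspect
from typing import Any, Awaitable, Callable, Optional


class ToyAlgorithm:
    """Hypothetical wrapper that injects a store into an async function."""

    def __init__(self, func: Callable[..., Awaitable[Any]]) -> None:
        if not inspect.iscoroutinefunction(func):
            raise TypeError("Algorithm functions must be async")
        self._func = func
        self._store: Optional[Any] = None
        functools.update_wrapper(self, func)  # keep the function's name and docstring

    def set_store(self, store: Any) -> None:
        self._store = store

    async def run(self, **kwargs: Any) -> Any:
        if self._store is None:
            raise ValueError("Store not set. Call set_store() first.")
        return await self._func(store=self._store, **kwargs)


def toy_algo(func: Callable[..., Awaitable[Any]]) -> ToyAlgorithm:
    return ToyAlgorithm(func)


@toy_algo
async def count_traces(*, store: dict) -> int:
    return len(store["traces"])


count_traces.set_store({"traces": ["a", "b", "c"]})
trace_count = asyncio.run(count_traces.run())
```

Making `store` keyword-only, as the documented signature does, means the wrapper can supply it without depending on positional order.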
Algorithm Configuration
Common Parameters
| Parameter | Type | Description |
|---|---|---|
store | LightningStore | Central store for traces and resources |
n_workers | int | Number of parallel workers |
port | int | Port for store connection (default: 4747) |
VERL-Specific Options
| Option | Choices | Default | Description |
|---|---|---|---|
--model | qwen25, qwen3 | qwen3 | Model variant to train |
--port | int | 4747 | Store connection port |
--search | flag | False | Enable search tool |
Workflow
graph LR
A[Define Prompt Candidates] --> B[Loop Through Candidates]
B --> C[Update Resources in Store]
C --> D[Run Rollout with Runner]
D --> E[Collect Reward Signal]
E --> F[Update Prompt Template]
F --> B
Extending the Algorithm Zoo
Creating Custom Algorithms
- Define an async function that takes store: LightningStore as a keyword argument
- Wrap it with the @algo decorator
- Implement your optimization logic
- Use the trainer or run separately
Example pattern: Source: examples/apo/apo_custom_algorithm.py:54-72
async def apo_algorithm(*, store: agl.LightningStore):
    for prompt in prompt_candidates:
        # 1. The optimization algorithm updates the prompt template
        console.print(f"[Algo] Updating prompt template to: '{prompt}'")
        resources: agl.NamedResources = {
            # The "main_prompt" can be replaced with any name
        }
        # 2. Update resources in store
        # 3. Collect reward signals
        # 4. Refine prompt based on rewards
Requirements for Custom Algorithms
- Must be async functions
- Must accept store as keyword argument
- Should be wrapped with @algo decorator for trainer integration
- Must interact with LightningStore for state synchronization
Integration with RAG
The Algorithm Zoo can be extended to work with retrieval-augmented generation systems. See the RAG example for integrating FAISS-based retrieval with prompt optimization. Source: examples/rag/README.md
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 7 source-linked risk signals. Review them before installing or handing real data to the project.
1. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | README/documentation is current enough for a first validation pass.
2. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | last_activity_observed missing
3. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | no_demo; severity=medium
4. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.
- Severity: medium
- Finding: No sandbox install has been executed yet; downstream must verify before user use.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.safety_notes | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | No sandbox install has been executed yet; downstream must verify before user use.
5. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | no_demo; severity=medium
6. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | issue_or_pr_quality=unknown
7. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | art_9b504779cfa046a894eeb7c9d3a298c6 | https://github.com/microsoft/agent-lightning#readme | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using agent-lightning with real data or production workflows.
- Intermittent missing openai.chat.completion spans from query_spans (RLin - github / github_issue
- Question about Code Availability for EMPO^2 Paper - github / github_issue
- calc-x example fails on next - github / github_issue
- APO's TraceToMessages adapter fails with multi-turn agent rollouts (KeyE - github / github_issue
- Installation Problem - github / github_issue
- GRPO grouping in multi-turn agent RL: is it valid to mix samples with di - github / github_issue
- Announcing Solantra: Next-Gen Blockchain on Solana - github / github_issue
- Add interaction scripts and token utilities - github / github_issue
- blockchain project - github / github_issue
- Agent Lightning v0.3.0 - github / github_release
- Agent Lightning v0.2.2 - github / github_release
- Agent Lightning v0.2.1 - github / github_release
Source: Project Pack community evidence and pitfall evidence