# https://github.com/letta-ai/letta Project Manual

Generated at: 2026-06-21 16:59:43 UTC

## Table of Contents

- [Project Overview & System Architecture](#page-1)
- [Agent Loop, Memory Blocks & Multi-Agent Groups](#page-2)
- [LLM Provider Integration: Cloud APIs & Local Models](#page-3)
- [Tools, Sandboxes, MCP Servers & Extensibility](#page-4)

<a id='page-1'></a>

## Project Overview & System Architecture

### Related Pages

Related topics: [Agent Loop, Memory Blocks & Multi-Agent Groups](#page-2), [LLM Provider Integration: Cloud APIs & Local Models](#page-3), [Tools, Sandboxes, MCP Servers & Extensibility](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/letta-ai/letta/blob/main/README.md)
- [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)
- [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)
- [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py)
- [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py)
- [letta/agents/helpers.py](https://github.com/letta-ai/letta/blob/main/letta/agents/helpers.py)
- [letta/client/streaming.py](https://github.com/letta-ai/letta/blob/main/letta/client/streaming.py)
- [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py)
- [letta/schemas/providers/sglang.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/sglang.py)
- [letta/schemas/providers/cerebras.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/cerebras.py)
- [sandbox/resources/server/README.md](https://github.com/letta-ai/letta/blob/main/sandbox/resources/server/README.md)
- [sandbox/resources/server/package.json](https://github.com/letta-ai/letta/blob/main/sandbox/resources/server/package.json)
</details>

# Project Overview & System Architecture

## Purpose and Scope

Letta (formerly MemGPT) is an open-source framework for building AI agents with advanced, stateful memory and self-improvement capabilities. As described in [README.md](https://github.com/letta-ai/letta/blob/main/README.md), the project ships two primary entry points: the **Letta Code** CLI for local terminal-based agents, and the **Letta API** with Python and TypeScript SDKs for application integration. The framework is model-agnostic and exposes a full-featured agents API centered on `memory_blocks`, `tools`, and a long-lived `agent_state`.

The repository combines a server runtime, a sandboxed tool-execution environment, multiple LLM provider adapters, and a streaming client. The community's long-standing interest in local LLM support (e.g., [issue #18 "Support for local LLMs like Ollama"](https://github.com/letta-ai/letta/issues/18)) and Azure model compatibility ([issue #2582](https://github.com/letta-ai/letta/issues/2582)) has driven much of the provider-layer design seen in the codebase.

## High-Level Architecture

Letta is organized into cooperating subsystems: agent loops, LLM provider adapters, message and step persistence, streaming clients, and a sandboxed tool runtime.

```mermaid
flowchart LR
    User[User / SDK Client] -->|messages.create| API[Letta API / REST + WS]
    API --> Agent[Agent Loop v1 / v2 / v3]
    Agent -->|build request| Provider[Provider Adapter<br/>OpenAI / Anthropic / Ollama / SGLang / Cerebras / Azure]
    Provider -->|HTTP / SSE| Upstream[Upstream LLM]
    Upstream --> Provider
    Provider --> Agent
    Agent -->|tool calls| Sandbox[Sandboxed TS Tool Server]
    Sandbox --> Agent
    Agent -->|persist| DB[(Messages, Steps, Memory)]
    Agent -->|SSE stream| Client[Streaming Client]
    Client --> User
```

Each request flows from the SDK through the agent loop, which dispatches to a configured provider, persists results to the data store, and streams events back to the client. The sandbox runs user-defined tool functions in an isolated TypeScript process, returning JSON-encoded results to the host.

## Core Subsystems

### Agent Loops

Three agent-loop implementations coexist to support different execution models:

- **`LettaAgent` (v1)** — the classical streaming loop in [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py). It creates a `Step` early with `StepStatus.PENDING`, performs the LLM request, runs `_handle_ai_response`, updates the step with usage statistics, and emits Server-Sent Events including a final `LettaStopReason` chunk.
- **`LettaAgentV2`** — refactored for tool-call correctness in [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py), adding explicit approval/denial message construction, `tool_rule_violated` enforcement, and finer-grained `pre_computed_assistant_message_id` handling.
- **`LettaAgentV3`** — adds advanced context management and parallel-tool-use gating in [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py). It toggles `disable_parallel_tool_use` for `anthropic`/`bedrock` and `parallel_tool_calls` for `openai` only when no non-approval tool rules are attached, and surfaces `logprobs` plus token IDs for RL training via the SGLang native path.
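The V3 gating rule can be sketched as a small helper that builds provider-specific request fields. This is a hypothetical illustration, not Letta's code. The function name, the `ToolRule` stand-in, and the `"requires_approval"` rule-type string are all assumptions. Only the request fields `disable_parallel_tool_use` (Anthropic-style) and `parallel_tool_calls` (OpenAI) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ToolRule:
    # Hypothetical stand-in for a Letta tool rule; only the type tag matters here.
    rule_type: str


def parallel_tool_kwargs(provider: str, tool_rules: list[ToolRule]) -> dict:
    """Return provider-specific kwargs enabling or disabling parallel tool calls.

    Assumption: parallel calls are allowed only when every attached rule is an
    approval rule; any other rule type forces one tool call per step.
    """
    allow_parallel = all(rule.rule_type == "requires_approval" for rule in tool_rules)
    if provider in ("anthropic", "bedrock"):
        # Anthropic-style APIs phrase the switch negatively.
        return {"disable_parallel_tool_use": not allow_parallel}
    if provider == "openai":
        return {"parallel_tool_calls": allow_parallel}
    return {}  # other providers: leave the request unchanged
```

Routing through a single helper keeps the positive (`parallel_tool_calls`) and negative (`disable_parallel_tool_use`) spellings of the same decision in one place.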

A specialized `VoiceAgent` in [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py) builds OpenAI-style `stream=True` completions and exposes tools like `search_memory` for voice-driven recall.

### Provider Adapters

The provider layer in [letta/schemas/providers/](https://github.com/letta-ai/letta/tree/main/letta/schemas/providers) abstracts model discovery, base URL construction, and prompt formatting.

| Provider | File | Key Behavior |
|---|---|---|
| Ollama | [ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py) | Strips trailing `/v1` for native `/api/tags` and `/api/show` calls; avoids capability filtering for older versions. |
| SGLang | [sglang.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/sglang.py) | Treats SGLang as an OpenAI-compatible endpoint and ensures the base URL ends in `/v1`. |
| Cerebras | [cerebras.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/cerebras.py) | Returns a tier-dependent context window (8K on free, 128K on paid). |

These adapters directly address community demand: Ollama support resolves [#18](https://github.com/letta-ai/letta/issues/18), while Azure-style env wiring is handled alongside other base providers in the same layer (see [#2582](https://github.com/letta-ai/letta/issues/2582)).

### Streaming Client

[letta/client/streaming.py](https://github.com/letta-ai/letta/blob/main/letta/client/streaming.py) parses SSE chunks into typed message objects: `AssistantMessage`, `HiddenReasoningMessage`, `ToolCallMessage`, `ToolReturnMessage`, and `LettaUsageStatistics`. On `SSEError` with an `application/json` body, it falls back to a POST retry and logs the structured error, providing resilient reconnection for long agent runs.
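Here is a minimal sketch of the parsing step. It assumes that each SSE `data:` line carries a JSON object with a `message_type` discriminator and that the stream ends with a `[DONE]` sentinel. The dataclasses are simplified stand-ins for the client's message types, and their field names are assumptions. The retry-on-`SSEError` path is not shown.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class AssistantMessage:
    content: str


@dataclass
class ToolCallMessage:
    name: str
    arguments: str


@dataclass
class ToolReturnMessage:
    tool_return: str


@dataclass
class UsageStatistics:
    total_tokens: int


Parsed = Union[AssistantMessage, ToolCallMessage, ToolReturnMessage, UsageStatistics, dict]


def parse_sse(lines: Iterable[str]) -> Iterator[Parsed]:
    """Turn raw SSE lines into typed objects, keyed on an assumed `message_type` field."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, and `event:` lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        kind = chunk.get("message_type")
        if kind == "assistant_message":
            yield AssistantMessage(chunk["content"])
        elif kind == "tool_call_message":
            call = chunk["tool_call"]
            yield ToolCallMessage(call["name"], call["arguments"])
        elif kind == "tool_return_message":
            yield ToolReturnMessage(chunk["tool_return"])
        elif kind == "usage_statistics":
            yield UsageStatistics(chunk["total_tokens"])
        else:
            yield chunk  # unknown message types pass through as plain dicts
```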

### Sandboxed Tool Runtime

User-defined tool functions run in a Modal-hosted TypeScript container. The skeleton in [sandbox/resources/server/](https://github.com/letta-ai/letta/tree/main/sandbox/resources/server) listens on a Unix socket, deserializes JSON input, and dispatches to `user-function.ts`. As of [v0.16.8](https://github.com/letta-ai/letta/releases), the host now uses JSON instead of pickle for sandbox→server transport, addressing a security hardening item in the recent release notes.
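JSON transport over a stream socket can be sketched with length-prefixed framing. The 4-byte length prefix is an assumption made for illustration, since the source only states that JSON replaced pickle. The security benefit is that `json.loads` only reconstructs plain data, whereas unpickling untrusted bytes can execute code embedded in the payload.

```python
import json
import socket
import struct


def send_json(sock: socket.socket, obj) -> None:
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def recv_json(sock: socket.socket):
    """Read one length-prefixed JSON message; yields only dicts, lists, and scalars."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

For a quick local check, `socket.socketpair()` can stand in for the Unix domain socket. Call `send_json` on one end and `recv_json` on the other.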

## Configuration and Lifecycle

`StepProgression` constants in [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py) (`START`, `RESPONSE_RECEIVED`, `STEP_LOGGED`, `FINISHED`) drive the lifecycle: the loop logs a pending step, records the LLM response, persists tool-call messages via `message_manager.create_many_messages_async`, and finalizes the step with `stop_reason` and usage. Helpers in [letta/agents/helpers.py](https://github.com/letta-ai/letta/blob/main/letta/agents/helpers.py) build rule-violation messages, decode the last function response, and detect paired approval request/response messages for human-in-the-loop flows.
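Because the constants are ordered, an error handler can compare how far a step progressed before it failed. The sketch below assumes an `IntEnum` holding the four names listed above. `cleanup_actions` and its action strings are hypothetical and only illustrate ordering-based branching, not Letta's actual error handling.

```python
from enum import IntEnum, auto


class StepProgression(IntEnum):
    # Names as listed in the manual; the numeric values are illustrative.
    START = auto()
    RESPONSE_RECEIVED = auto()
    STEP_LOGGED = auto()
    FINISHED = auto()


def cleanup_actions(progress: StepProgression) -> list[str]:
    """Hypothetical: what an error handler still needs to do for a step
    that stopped at `progress`."""
    if progress >= StepProgression.FINISHED:
        return []  # the step completed; nothing to repair
    actions = ["mark_step_failed"]
    if progress < StepProgression.RESPONSE_RECEIVED:
        # No LLM response ever arrived for this step.
        actions.append("record_missing_llm_response")
    return actions
```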

## See Also

- Agents API Reference — REST endpoints exposed to the SDKs
- Provider Configuration — how to add or tune a provider
- Memory Blocks & Context Window Management
- Sandboxed Tool Execution
- Streaming Client Protocol

---

<a id='page-2'></a>

## Agent Loop, Memory Blocks & Multi-Agent Groups

### Related Pages

Related topics: [Project Overview & System Architecture](#page-1), [LLM Provider Integration: Cloud APIs & Local Models](#page-3), [Tools, Sandboxes, MCP Servers & Extensibility](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)
- [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)
- [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py)
- [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py)
- [letta/agents/voice_sleeptime_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_sleeptime_agent.py)
- [letta/agents/ephemeral_summary_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/ephemeral_summary_agent.py)
- [letta/agents/letta_agent_batch.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_batch.py)
- [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py)
- [README.md](https://github.com/letta-ai/letta/blob/main/README.md)
</details>

# Agent Loop, Memory Blocks & Multi-Agent Groups

Letta (formerly MemGPT) is an open-source framework for building stateful agents with advanced memory that can learn and self-improve over time (Source: [README.md:1-3](https://github.com/letta-ai/letta/blob/main/README.md)). At the heart of the system sits a recursive agent loop that consumes user input, rebuilds in-context memory, calls an LLM, executes any returned tool calls, and persists messages. This page documents how the loop is structured across the v1/v2/v3 implementations, how memory blocks are compiled and refreshed, and how multiple agent variants (batch, voice, sleeptime) coordinate work in groups.

## Agent Loop Architecture

The core execution pattern is implemented in three parallel files that share the same contract but differ in transport and tooling. The v1 implementation in `letta_agent.py` exposes the canonical streaming loop. Each iteration (1) rebuilds memory, (2) generates an LLM request, (3) fetches a response, and (4) processes the response (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)). It uses a `ToolRulesSolver` to enforce tool ordering and instantiates an `LLMClient` per provider: `LLMClient.create(provider_type=agent_state.llm_config.model_endpoint_type, put_inner_thoughts_first=True, actor=self.actor)` (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)).

The loop drives a finite state machine via the `StepProgression` enum, advancing from `START` to `RESPONSE_RECEIVED`, `STEP_LOGGED`, and finally `FINISHED`. Each step is logged early with `StepStatus.PENDING`, then updated as the LLM call completes and usage statistics arrive (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)). On failure, the step is updated with the exception type, message, and `traceback.format_exc()`. On success, token details (cached, cache-creation, reasoning) are populated from `LettaUsageStatistics` only when the provider reported them (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)).

The v2 implementation in `letta_agent_v2.py` factors the loop into `_execute_step` and adds an explicit `_decide_continuation` helper that checks `request_heartbeat`, tool-rule violations, and `is_final_step` to decide whether to keep stepping (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)). v2 also introduces an approval/denial branch where the assistant can pause for user confirmation, with `is_approval` and `is_denial` flags propagated into persisted messages via `create_letta_messages_from_llm_response` (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)).
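A simplified, hypothetical version of the continuation check is sketched below using only the three inputs named above. The precedence order (step limit first) and the stop-reason strings `"max_steps"` and `"end_turn"` are assumptions, so treat this as an illustration of the decision's shape rather than a copy of `_decide_continuation`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Continuation:
    keep_stepping: bool
    stop_reason: Optional[str]  # None while the loop keeps going


def decide_continuation(
    request_heartbeat: bool,
    tool_rule_violated: bool,
    is_final_step: bool,
) -> Continuation:
    """Simplified, hypothetical continuation check."""
    if is_final_step:
        # Assumed precedence: the step budget always wins.
        return Continuation(False, "max_steps")
    if tool_rule_violated:
        # Give the model another turn to correct the violation.
        return Continuation(True, None)
    if request_heartbeat:
        # The model explicitly asked to keep going after this tool call.
        return Continuation(True, None)
    return Continuation(False, "end_turn")
```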

The v3 implementation in `letta_agent_v3.py` introduces an `LLMAdapter` abstraction that wraps the LLM client for blocking, streaming, OpenAI Responses WebSocket, and SGLang-native RL training transports (Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py)). It also exposes a `compaction_trigger_threshold` via `get_compaction_trigger_threshold(llm_config)` so the loop can decide when to summarize before the context window fills (Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py)).
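The compaction decision reduces to comparing the in-context token count against a threshold derived from the model's context window. In this sketch, the 90% ratio, the `LLMConfig` stand-in, and both function names are assumptions. The sketch does not reproduce `get_compaction_trigger_threshold` itself.

```python
from dataclasses import dataclass


@dataclass
class LLMConfig:
    # Hypothetical subset of a model configuration.
    context_window: int


def compaction_threshold(llm_config: LLMConfig, percent: int = 90) -> int:
    """Token count at which to compact; `percent` is an assumed tuning knob.
    Integer arithmetic avoids floating-point rounding surprises."""
    return llm_config.context_window * percent // 100


def should_compact(in_context_tokens: int, llm_config: LLMConfig) -> bool:
    return in_context_tokens >= compaction_threshold(llm_config)
```

With a 128,000-token window, this sketch starts summarizing once the context reaches 115,200 tokens. That leaves headroom for the summarization request itself.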

```mermaid
flowchart TD
    A[User Input] --> B[Prepare In-Context Messages]
    B --> C{Memory or System Changed?}
    C -- Yes --> D[Rebuild Memory via memory.compile]
    C -- No --> E[Reuse cached system message]
    D --> F[Build LLM Request]
    E --> F
    F --> G[LLMCallType.agent_step]
    G --> H{Tool Call Returned?}
    H -- Yes --> I[Execute Tool via ToolRulesSolver]
    I --> J[Persist Tool Messages]
    H -- No --> K[Persist Assistant Message]
    J --> L{Continue Stepping?}
    K --> L
    L -- Yes --> B
    L -- No --> M[Update Step FINISHED + Token Details]
```

## Memory Blocks, Refresh & Summarization

Every agent owns an in-memory representation of memory blocks (e.g. `human`, `persona`, summary) that are compiled into the system prompt on each step. The v2 loop calls `agent_state.memory.compile(tool_usage_rules=..., sources=..., max_files_open=..., llm_config=...)` and short-circuits the rebuild when neither the system prompt nor the compiled memory string changed (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)). Before compilation, `refresh_memory_async` updates memory references and `refresh_file_blocks` re-syncs attached file content (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)). Archival memory is loaded via `archive_manager.get_default_archive_for_agent_async`, and its tag list is injected as `archive_tags` (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)).
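The short-circuit can be sketched as follows: compile the blocks, build the candidate system message, and skip the rebuild when it equals the cached one. The `Block` dataclass and the `<label>...</label>` rendering are assumptions for illustration. The real `memory.compile` also takes more inputs (tool rules, sources, file limits) than are shown here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    label: str
    value: str


def compile_memory(blocks: list[Block]) -> str:
    """Render blocks as tagged sections (the tag format is an assumption)."""
    return "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in blocks)


def rebuild_system_message(system_prompt: str, blocks: list[Block], current: str) -> tuple[str, bool]:
    """Return (system_message, changed), skipping the rebuild when nothing differs."""
    candidate = f"{system_prompt}\n\n{compile_memory(blocks)}"
    if candidate == current:
        return current, False  # short-circuit: reuse the cached system message
    return candidate, True
```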

When the in-context buffer grows past the configured threshold, the v1 agent instantiates an `EphemeralSummaryAgent` (only when `enable_summarization` is set and an OpenAI key is present) alongside a `Summarizer` configured with `partial_evict_summarizer_percentage`, `message_buffer_limit`, and `message_buffer_min` (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)). The ephemeral summary agent constructs a `MessageCreate` carrying the `summary_system_prompt`, prepends `--- Previous Summary ---` to the target block, and writes the condensed text back via `block_manager.update_block_async` (Source: [letta/agents/ephemeral_summary_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/ephemeral_summary_agent.py)).
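The eviction arithmetic behind a partial-evict summarizer might look like the sketch below. The function names are hypothetical. Treating `partial_evict_summarizer_percentage` as a fraction of messages, with `message_buffer_min` as a floor, is an assumption about how those settings interact. Only the `--- Previous Summary ---` marker comes from the source.

```python
def partial_evict(
    messages: list[str], evict_fraction: float, buffer_min: int
) -> tuple[list[str], list[str]]:
    """Split messages into (to_summarize, to_keep), oldest first.

    Assumption: evict `evict_fraction` of the messages, but never shrink the
    kept buffer below `buffer_min` messages.
    """
    n_evict = int(len(messages) * evict_fraction)
    n_evict = min(n_evict, max(len(messages) - buffer_min, 0))
    return messages[:n_evict], messages[n_evict:]


def build_summary_input(previous_summary: str, evicted: list[str]) -> str:
    """Text handed to the summarizer; any prior summary is kept under a marker."""
    transcript = "\n".join(evicted)
    if not previous_summary:
        return transcript
    return f"--- Previous Summary ---\n{previous_summary}\n\n{transcript}"
```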

## Multi-Agent Groups and Specialized Variants

Letta exposes several agent variants that share the same step contract but specialize in different transports or group semantics. The **batch agent** in `letta_agent_batch.py` runs many agents in lockstep, grouping them by their current tool call. It then calls `bulk_update_block_values_async` to apply memory updates once per round and `_persist_tool_messages` to fan tool results back to each agent's message list (Source: [letta/agents/letta_agent_batch.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_batch.py)). This lockstep design suits workloads in which many agents step concurrently.
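The following sketch shows how agents can be grouped by their pending tool call and how memory edits can be flattened into a single bulk write. `PendingCall`, both helpers, and the tool names in the example are hypothetical. `collect_block_updates` only prepares the rows that something like `bulk_update_block_values_async` would then persist.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PendingCall:
    agent_id: str
    tool_name: str


def group_by_tool(calls: list[PendingCall]) -> dict[str, list[str]]:
    """Bucket agent IDs by the tool they want to run next in this round."""
    groups: dict[str, list[str]] = defaultdict(list)
    for call in calls:
        groups[call.tool_name].append(call.agent_id)
    return dict(groups)


def collect_block_updates(results: dict[str, dict[str, str]]) -> list[tuple[str, str, str]]:
    """Flatten per-agent {block_label: new_value} edits into (agent, label, value)
    rows so the round can issue one bulk write instead of one write per agent."""
    return [
        (agent_id, label, value)
        for agent_id, edits in results.items()
        for label, value in edits.items()
    ]
```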

The **voice agent** in `voice_agent.py` filters tools down to a voice-safe subset (`ToolType.CUSTOM`, `LETTA_FILES_CORE`, `LETTA_BUILTIN`, `EXTERNAL_MCP`) and injects a special `search_memory` tool whose description instructs the model to surface conversational filler while memory is being re-contextualized (Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py)). Strict mode is applied per-agent via `enable_strict_mode`, gated on `agent_state.llm_config.strict` (Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py)).
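The voice-safe filter amounts to a set-membership check on `tool_type`. The enum's string values and the excluded `LETTA_CORE` member are assumptions for illustration. The four allowed members match the list above.

```python
from dataclasses import dataclass
from enum import Enum


class ToolType(str, Enum):
    # Member names follow the manual; the string values are assumptions.
    CUSTOM = "custom"
    LETTA_CORE = "letta_core"  # hypothetical excluded type
    LETTA_FILES_CORE = "letta_files_core"
    LETTA_BUILTIN = "letta_builtin"
    EXTERNAL_MCP = "external_mcp"


VOICE_SAFE = {
    ToolType.CUSTOM,
    ToolType.LETTA_FILES_CORE,
    ToolType.LETTA_BUILTIN,
    ToolType.EXTERNAL_MCP,
}


@dataclass
class Tool:
    name: str
    tool_type: ToolType


def voice_safe_tools(tools: list[Tool]) -> list[str]:
    """Keep only tools whose type is in the voice-safe set, preserving order."""
    return [t.name for t in tools if t.tool_type in VOICE_SAFE]
```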

The **voice sleeptime agent** in `voice_sleeptime_agent.py` stores transcript ranges into memory and then forces a `rebuild_system_prompt(force=True)`. Its streaming entry point is unimplemented: `step_stream` raises `NotImplementedError("VoiceSleeptimeAgent does not support async step.")` (Source: [letta/agents/voice_sleeptime_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_sleeptime_agent.py)).

## Provider Plug-Ins and Community-Reported Limitations

Letta is model-agnostic. The Ollama provider in `schemas/providers/ollama.py` exposes `raw_base_url` (strips `/v1`) for native `/api/tags` and `/api/show` calls and `openai_compat_base_url` for compatibility-mode clients. It deliberately avoids filtering on the `capabilities` field because older Ollama builds do not emit it (Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py)). Community request [#18](https://github.com/letta-ai/letta/issues/18) ("Support for local LLMs like Ollama") is therefore addressed at the provider-schema layer. Azure models have had recurring issues: [issue #2582](https://github.com/letta-ai/letta/issues/2582) reports that `AZURE_API_KEY` / `AZURE_BASE_URL` are detected but the model list does not populate in the agent dropdown. As a workaround, the model must be configured explicitly during agent creation.

## See Also

- [README.md](https://github.com/letta-ai/letta/blob/main/README.md) — project overview and SDK quickstarts
- `letta/agents/base_agent.py` — shared base class for all agent variants
- `letta/services/step_manager.py` — persistent storage of step records
- `letta/schemas/agent.py` — `AgentState`, `LLMConfig`, and memory-block schemas

---

<a id='page-3'></a>

## LLM Provider Integration: Cloud APIs & Local Models

### Related Pages

Related topics: [Project Overview & System Architecture](#page-1), [Agent Loop, Memory Blocks & Multi-Agent Groups](#page-2), [Tools, Sandboxes, MCP Servers & Extensibility](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/letta-ai/letta/blob/main/README.md)
- [letta/local_llm/README.md](https://github.com/letta-ai/letta/blob/main/letta/local_llm/README.md)
- [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py)
- [letta/schemas/providers/vllm.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/vllm.py)
- [letta/schemas/providers/sglang.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/sglang.py)
- [letta/schemas/providers/cerebras.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/cerebras.py)
- [letta/schemas/providers/deepseek.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/deepseek.py)
- [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)
- [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)
- [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py)
</details>

# LLM Provider Integration: Cloud APIs & Local Models

## Overview

Letta is model-agnostic and integrates with a broad spectrum of language model backends, ranging from hosted cloud APIs (OpenAI, Anthropic, Google Vertex, DeepSeek, Cerebras) to self-hosted inference engines (Ollama, vLLM, SGLang). The [README.md](https://github.com/letta-ai/letta/blob/main/README.md) states that "Letta is fully model-agnostic" and recommends Opus 4.5 and GPT-5.2 for best performance.

Provider configuration is exposed through a typed schema layer at `letta/schemas/providers/`, where each provider is implemented as a subclass of `Provider` carrying its own base URL, API key handling, context-window logic, and model-discovery routine. Agent code then selects an appropriate `LLMClient` and adapter at runtime. Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py) creates the client via `LLMClient.create(provider_type=agent_state.llm_config.model_endpoint_type, ...)`.

## Provider Schema Architecture

Each provider implementation follows the same shape: a Pydantic model inheriting from `Provider`, parameterized by `ProviderType`, and exposing two key methods, `get_model_context_window_size` and the async `list_llm_models_async`. The `ProviderType` enum identifies which backend each provider targets.

| Provider | File | Endpoint Style | Notable Behavior |
|---|---|---|---|
| Ollama | `letta/schemas/providers/ollama.py` | Native `/api/tags` and `/api/show` | Strips trailing `/v1` from `base_url` via `raw_base_url`; avoids filtering on `capabilities` for older Ollama versions |
| vLLM | `letta/schemas/providers/vllm.py` | OpenAI-compatible `/v1` | Appends `/v1` if missing; supports `default_prompt_formatter` for completions endpoint |
| SGLang | `letta/schemas/providers/sglang.py` | OpenAI-compatible `/v1` | Same `/v1` normalization as vLLM |
| Cerebras | `letta/schemas/providers/cerebras.py` | OpenAI-compatible | Context window hardcoded to 8192 (free tier) or 128000 (paid); note `api_key` is marked `deprecated` |
| DeepSeek | `letta/schemas/providers/deepseek.py` | OpenAI-compatible | `deepseek-reasoner` uses `put_inner_thoughts_in_kwargs=False` |

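Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py), [letta/schemas/providers/vllm.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/vllm.py), [letta/schemas/providers/sglang.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/sglang.py), [letta/schemas/providers/cerebras.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/cerebras.py), [letta/schemas/providers/deepseek.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/deepseek.py).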
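The two URL conventions in the table can be sketched as small normalizers. These helper names are hypothetical, since the providers expose this logic through properties such as `raw_base_url`. Both helpers strip trailing slashes first, so `.../v1/` and `.../v1` are treated the same.

```python
def ollama_raw_base_url(base_url: str) -> str:
    """Strip a trailing '/v1' so native endpoints such as /api/tags resolve."""
    url = base_url.rstrip("/")
    return url[: -len("/v1")] if url.endswith("/v1") else url


def openai_compat_base_url(base_url: str) -> str:
    """Ensure the URL ends in '/v1', the vLLM/SGLang convention."""
    url = base_url.rstrip("/")
    return url if url.endswith("/v1") else f"{url}/v1"
```

For example, `ollama_raw_base_url("http://localhost:11434/v1") + "/api/tags"` yields `http://localhost:11434/api/tags`.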

## Cloud API Providers

Cloud providers rely on the OpenAI-compatible request envelope where possible. DeepSeek and Cerebras both delegate to `openai_get_model_list_async` for model discovery and only override context-window heuristics. Source: [letta/schemas/providers/deepseek.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/deepseek.py) and [letta/schemas/providers/cerebras.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/cerebras.py).

Cerebras exposes a context-window override that depends on plan tier. Source: [letta/schemas/providers/cerebras.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/cerebras.py) defines `get_model_context_window_size`, which returns 8192 for the free tier and 128000 otherwise. Because tier detection is hardcoded as `is_free_tier = True` (a placeholder), the method currently always returns 8192.

For the agent loop, the V2/V3 agents inject the provider into `LLMConfig` and propagate `run_id`, `agent_id`, and billing context to the LLM client. Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py) constructs `LettaLLMRequestAdapter` or `LettaLLMStreamAdapter` depending on `stream_tokens`.

## Local LLM Providers

Local inference backends are first-class providers in the schema layer. Source: [letta/local_llm/README.md](https://github.com/letta-ai/letta/blob/main/letta/local_llm/README.md) directs users to the external docs at `letta.readme.io/docs/local_llm` for setup.

For model discovery, Ollama uses its native `/api` endpoints rather than the OpenAI-compatible `/v1` API. Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py) defines `raw_base_url` to drop a trailing `/v1` and queries `/api/tags` to enumerate models. It deliberately skips a `capabilities` filter because older Ollama builds do not expose that field.

vLLM and SGLang both expose OpenAI-style chat completions under the `/v1` path prefix. Source: [letta/schemas/providers/vllm.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/vllm.py) appends `/v1` if absent, and [letta/schemas/providers/sglang.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/sglang.py) applies the same convention. SGLang additionally supports an `SGLangNativeAdapter` path used for multi-turn RL training. Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py) selects this adapter when `use_sglang_native` is set, and the adapter captures per-turn `TurnTokenData` (output_ids, logprobs) for RL workloads.

## Adapter Selection Flow

```mermaid
flowchart TD
    A[Agent Step] --> B{stream_tokens?}
    B -- yes --> C[LettaLLMStreamAdapter]
    B -- no --> D{use_sglang_native?}
    D -- yes --> E[SGLangNativeAdapter]
    D -- no --> F[SimpleLLMRequestAdapter / LettaLLMRequestAdapter]
    C --> G[Provider HTTP Client]
    E --> G
    F --> G
    G --> H[Persist messages + tool calls]
```

Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py) branches on `stream_tokens`; [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py) adds the `use_sglang_native` branch.

## Community Considerations

Two long-standing community threads shape how providers are consumed:

- **Local LLM support** ([issue #18](https://github.com/letta-ai/letta/issues/18), 19 comments): users requested Ollama-style backends to avoid vendor lock-in. Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py) implements Ollama as a typed provider, and the `[Feature Request] Support for local LLMs like Ollama` discussion motivated the local-llm module.
- **Azure model selection** ([issue #2582](https://github.com/letta-ai/letta/issues/2582), 11 comments): users report that setting `AZURE_API_KEY` and `AZURE_BASE_URL` does not activate Azure models in the model picker, so the model must be configured explicitly when each agent is created. This pattern is consistent with how `LLMConfig.model_endpoint` is bound to a specific `Provider` instance.

## Common Failure Modes

1. **Trailing slash / missing `/v1`**: vLLM and SGLang append `/v1` defensively, but a misconfigured `base_url` can still cause `openai_get_model_list_async` to return a 404. Source: [letta/schemas/providers/vllm.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/vllm.py).
2. **Ollama capabilities filter**: filtering on missing fields silently drops models. Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py) avoids filtering and infers support from `model_info`.
3. **Parallel tool calls**: when `parallel_tool_calls=False`, some providers ignore the flag, so the agent truncates client-side. Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py) logs a warning and reduces to the first tool call.
4. **Deprecated API keys**: Cerebras marks `api_key` as `deprecated`, so callers must supply credentials via the encrypted path. Source: [letta/schemas/providers/cerebras.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/cerebras.py).
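The client-side guard in item 3 can be sketched as follows. The `ToolCall` dataclass, the function name, and the log message are hypothetical. The behavior of warning and keeping only the first call follows the description above.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ToolCall:
    id: str
    name: str


def enforce_single_tool_call(tool_calls: list[ToolCall], parallel_tool_calls: bool) -> list[ToolCall]:
    """If parallel calls were disabled but the provider returned several anyway,
    keep only the first one and log a warning."""
    if parallel_tool_calls or len(tool_calls) <= 1:
        return tool_calls
    logger.warning(
        "Provider returned %d tool calls with parallel_tool_calls=False; keeping the first",
        len(tool_calls),
    )
    return tool_calls[:1]
```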

## See Also

- Agents API overview in [README.md](https://github.com/letta-ai/letta/blob/main/README.md)
- Local LLM setup guide referenced by [letta/local_llm/README.md](https://github.com/letta-ai/letta/blob/main/letta/local_llm/README.md)
- Sandbox tool-result transport (the v0.16.8 release notes mention switching from pickle to JSON)
- Voice agent tool schemas in [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py)

---

<a id='page-4'></a>

## Tools, Sandboxes, MCP Servers & Extensibility

### Related Pages

Related topics: [Project Overview & System Architecture](#page-1), [Agent Loop, Memory Blocks & Multi-Agent Groups](#page-2), [LLM Provider Integration: Cloud APIs & Local Models](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py)
- [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py)
- [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py)
- [letta/agents/helpers.py](https://github.com/letta-ai/letta/blob/main/letta/agents/helpers.py)
- [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py)
- [letta/client/streaming.py](https://github.com/letta-ai/letta/blob/main/letta/client/streaming.py)
- [sandbox/resources/server/README.md](https://github.com/letta-ai/letta/blob/main/sandbox/resources/server/README.md)
- [sandbox/resources/server/package.json](https://github.com/letta-ai/letta/blob/main/sandbox/resources/server/package.json)
- [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py)
- [letta/schemas/providers/vllm.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/vllm.py)
- [letta/schemas/providers/sglang.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/sglang.py)
- [letta/schemas/providers/bedrock.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/bedrock.py)
- [README.md](https://github.com/letta-ai/letta/blob/main/README.md)
</details>

# Tools, Sandboxes, MCP Servers & Extensibility

## Overview

Letta exposes a layered extensibility model that lets agents reach outside the LLM: **tools** are callable functions attached to an agent, **sandboxes** provide a secure, isolated runtime for executing user-defined tool code, and **MCP servers** (plus other external transports) let third-party tool ecosystems plug into the same agent loop. The agent runtime (`letta_agent.py`, `letta_agent_v2.py`, `letta_agent_v3.py`) is responsible for calling these tools, enforcing approval and tool-rule policies, persisting the resulting messages, and streaming them back to clients.

This page describes how the four pieces fit together, based strictly on the source files in this repository.

## Tool Taxonomy and Registration

The voice agent's `_build_tool_schemas` filters an `AgentState`'s `tools` list by `tool_type`, which gives a canonical list of where a tool can come from. Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py).

| `ToolType` | Origin | Typical use |
|---|---|---|
| `CUSTOM` | User-defined function | Tool calls authored by the developer |
| `LETTA_BUILTIN` | Bundled with Letta | First-party capabilities shipped with Letta |
| `LETTA_FILES_CORE` | Core file API | File-system style helpers |
| `EXTERNAL_MCP` | MCP server | Third-party tool ecosystem |

`voice_agent.py` adds a *virtual* tool, `search_memory`, by wrapping a function schema with `add_pre_execution_message` and `enable_strict_mode`. The injected `pre_exec_msg` is the filler phrase the agent should say aloud ("Let me double-check my notes—one moment, please.") while the memory search runs in the background. Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py).
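The schema wrapping can be sketched on a plain OpenAI-style function schema. `with_pre_exec_msg` and `with_strict_mode` are hypothetical re-creations of what `add_pre_execution_message` and `enable_strict_mode` are described as doing, and the exact fields Letta adds are assumptions. Strict mode here follows OpenAI's structured-output rules, which require `additionalProperties: false` and require every declared property to be listed as required.

```python
import copy


def with_pre_exec_msg(schema: dict, description: str) -> dict:
    """Return a copy of `schema` with a required `pre_exec_msg` string argument,
    which the model fills with a filler phrase to speak before the tool runs."""
    out = copy.deepcopy(schema)
    params = out.setdefault("parameters", {"type": "object", "properties": {}})
    params.setdefault("properties", {})["pre_exec_msg"] = {
        "type": "string",
        "description": description,
    }
    required = params.setdefault("required", [])
    if "pre_exec_msg" not in required:
        required.append("pre_exec_msg")
    return out


def with_strict_mode(schema: dict) -> dict:
    """Return a copy marked `strict`: no extra properties, all properties required."""
    out = copy.deepcopy(schema)
    out["strict"] = True
    params = out.setdefault("parameters", {"type": "object", "properties": {}})
    params["additionalProperties"] = False
    params["required"] = list(params.get("properties", {}))
    return out
```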

## Sandbox Runtime (Modal + TypeScript Server)

Custom tool code must run somewhere safe. Letta ships a skeleton TypeScript server that executes user-defined functions inside a Modal container. The transport is a Unix domain socket, and the wire format is JSON (the v0.16.8 release specifically fixed a security issue by switching from pickle to JSON for sandbox→server tool-result transport). Source: [sandbox/resources/server/README.md](https://github.com/letta-ai/letta/blob/main/sandbox/resources/server/README.md), [sandbox/resources/server/package.json](https://github.com/letta-ai/letta/blob/main/sandbox/resources/server/package.json).

The skeleton has three files:

- `server.ts` — a Node process listening on a Unix socket.
- `entrypoint.ts` — deserializes a JSON-encoded input string into the user-defined function's arguments.
- `user-function.ts` — fully defined by the end user.
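The real files are TypeScript, and their exact envelope format is not shown here. The sketch below is a Python mirror of the roles of `entrypoint.ts` and `user-function.ts`: it decodes a JSON input string into keyword arguments, calls the user's function, and wraps the outcome in a hypothetical `{"status", "result" | "error"}` envelope. All names are illustrative.

```python
import json
from typing import Any, Callable


def user_function(city: str, units: str = "metric") -> dict[str, Any]:
    """Stand-in for user-function.ts: fully defined by the end user (hypothetical example)."""
    return {"city": city, "units": units}


def run_entrypoint(raw_input: str, fn: Callable[..., Any]) -> str:
    """Hypothetical entrypoint: JSON string in, keyword arguments out, JSON envelope back."""
    args = json.loads(raw_input)
    if not isinstance(args, dict):
        raise TypeError("tool input must be a JSON object of keyword arguments")
    try:
        result = fn(**args)
        return json.dumps({"status": "success", "result": result})
    except Exception as exc:  # report user-code failures instead of crashing the server
        return json.dumps({"status": "error", "error": f"{type(exc).__name__}: {exc}"})


if __name__ == "__main__":
    print(run_entrypoint('{"city": "Oslo"}', user_function))
    # {"status": "success", "result": {"city": "Oslo", "units": "metric"}}
```

Catching exceptions inside the entrypoint keeps a buggy user function from taking down the long-running socket server; the error is returned to the caller as data instead.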

To build and run the server from `sandbox/resources/server/`:

```bash
npm install
npm run build
npm run start
```

Source: [sandbox/resources/server/README.md](https://github.com/letta-ai/letta/blob/main/sandbox/resources/server/README.md).

## MCP and External Tool Execution

Not every tool runs in Letta's sandbox. `EXTERNAL_MCP` is a first-class `ToolType` in `_build_tool_schemas`, so tools served by an MCP server pass through the same schema-generation path as `CUSTOM` tools. Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py).
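In the MCP specification, each entry returned by `tools/list` has a `name`, an optional `description`, and an `inputSchema` (a JSON Schema object). That maps almost directly onto an OpenAI-style function schema. The converter below is a hypothetical sketch of the mapping, not Letta's code. Once converted, an MCP tool's schema has the same shape as a `CUSTOM` tool's.

```python
import copy
from typing import Any


def mcp_tool_to_function_schema(mcp_tool: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: convert one MCP tools/list entry into an OpenAI-style function schema."""
    # Fall back to an empty object schema if the server omits inputSchema.
    input_schema = mcp_tool.get("inputSchema") or {"type": "object", "properties": {}}
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        # Deep-copy so later edits (e.g. strict mode) do not mutate the cached MCP listing.
        "parameters": copy.deepcopy(input_schema),
    }


if __name__ == "__main__":
    mcp_tool = {
        "name": "get_issue",
        "description": "Fetch an issue by number.",
        "inputSchema": {"type": "object", "properties": {"number": {"type": "integer"}}, "required": ["number"]},
    }
    print(mcp_tool_to_function_schema(mcp_tool)["parameters"]["required"])  # ['number']
```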

The provider layer (`letta/schemas/providers/`) is similarly pluggable, so you can bring your own LLM backend. For example, `OllamaProvider` derives both the raw and the OpenAI-compatible base URLs from a single `base_url` field. Its `list_llm_models_async` calls Ollama's native `/api/tags` endpoint rather than `/v1/models`, because Ollama is not a drop-in OpenAI replacement. This is a recurring community concern (issue #18). Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py).

`SGLangProvider` and `VLLMProvider` follow the OpenAI-compatible pattern and append `/v1` to the base URL automatically. Source: [letta/schemas/providers/sglang.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/sglang.py), [letta/schemas/providers/vllm.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/vllm.py).
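The URL handling can be sketched with plain string operations. The helper names are hypothetical. The idempotent `/v1` check in `openai_compatible_base` is an assumption, since the text above only says `/v1` is appended. `model_names_from_tags` assumes Ollama's `/api/tags` response shape, `{"models": [{"name": ...}, ...]}`.

```python
from typing import Any


def ollama_urls(base_url: str) -> dict[str, str]:
    """Hypothetical: derive Ollama's native and OpenAI-compatible endpoints from one base_url."""
    root = base_url.rstrip("/")
    return {
        "tags": f"{root}/api/tags",         # native model listing, used instead of /v1/models
        "openai_compatible": f"{root}/v1",  # OpenAI-style API root
    }


def openai_compatible_base(base_url: str) -> str:
    """Hypothetical: append /v1 for SGLang/vLLM-style servers unless it is already present."""
    root = base_url.rstrip("/")
    return root if root.endswith("/v1") else f"{root}/v1"


def model_names_from_tags(payload: dict[str, Any]) -> list[str]:
    """Extract model names from an /api/tags JSON response."""
    return [m["name"] for m in payload.get("models", [])]


if __name__ == "__main__":
    print(ollama_urls("http://localhost:11434/")["tags"])  # http://localhost:11434/api/tags
    print(openai_compatible_base("http://localhost:30000"))  # http://localhost:30000/v1
    print(model_names_from_tags({"models": [{"name": "llama3.1:8b"}, {"name": "qwen2.5:7b"}]}))
    # ['llama3.1:8b', 'qwen2.5:7b']
```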

## Tool Execution Lifecycle: Approval, Rule Validation, and Streaming

The steps below summarize the tool-call lifecycle, whether the tool is a `CUSTOM` function executed in a sandbox or an `EXTERNAL_MCP` tool served by an MCP server. Each step cites the agent implementation where it appears:

1. **Schema construction.** `_build_tool_schemas` produces OpenAI-function-call schemas. Strict JSON-schema output is enabled via `enable_strict_mode` when the agent's `llm_config.strict` flag is set. Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/voice_agent.py).
2. **Parallel-tool gating.** The v3 agent (`letta_agent_v3.py`) sets `disable_parallel_tool_use` for Anthropic/Bedrock or `parallel_tool_calls` for OpenAI. Which value it uses depends on whether the agent declares any `tool_rules` other than `requires_approval`. Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v3.py).
3. **Rule violation handling.** If a model emits a tool name outside the allowed set, `_build_rule_violation_result` produces a `ToolConstraintError` whose `func_return` includes a hint generated by `ToolRulesSolver.guess_rule_violation`. Source: [letta/agents/helpers.py](https://github.com/letta-ai/letta/blob/main/letta/agents/helpers.py).
4. **Approval handling.** When `requires_approval` rules are present, v2 emits an approval request message. On the next turn, `_maybe_get_approval_messages` checks for a back-to-back pair of messages that both have the `approval` role (the request followed by the response). The agent loop then records `is_approval` / `is_denial` on the resulting `Message` and threads `denial_reason` into telemetry. Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py), [letta/agents/helpers.py](https://github.com/letta-ai/letta/blob/main/letta/agents/helpers.py).
5. **Persistence.** Persisted messages are produced by `create_letta_messages_from_llm_response` (v2) or the equivalent v1 path in `letta_agent.py`, with `step_id` and `run_id` stamped on each message. Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent.py), [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/main/letta/agents/letta_agent_v2.py).
6. **Streaming.** The Python streaming client parses each SSE chunk as JSON and dispatches on its shape. It handles chunks carrying a `tool_call` or `tool_return` key, chunks carrying a `step_count` key (parsed as `LettaUsageStatistics`), and OpenAI-style `ChatCompletionChunk` objects. If an SSE error message mentions `application/json`, the client falls back to a plain POST so it can log the JSON error returned by the server. Source: [letta/client/streaming.py](https://github.com/letta-ai/letta/blob/main/letta/client/streaming.py).
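Step 2 can be sketched as follows. The request fields are real API parameters: `tool_choice.disable_parallel_tool_use` in Anthropic's Messages API and the top-level `parallel_tool_calls` boolean in OpenAI's Chat Completions API. The direction of the toggle is an assumption for illustration: here, parallel calls are disabled when any non-approval rule exists. So are the hypothetical `ToolRule` dataclass and its rule-type strings.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolRule:
    tool_name: str
    rule_type: str  # e.g. "requires_approval" or "exit_loop" (illustrative labels)


def parallel_tool_settings(provider: str, tool_rules: list[ToolRule]) -> dict[str, Any]:
    """Hypothetical: return request fields that enable or disable parallel tool calls.

    Assumption: any rule other than requires_approval turns parallel calls off.
    """
    has_other_rules = any(rule.rule_type != "requires_approval" for rule in tool_rules)
    if provider in ("anthropic", "bedrock"):
        # Anthropic Messages API: the flag lives inside tool_choice.
        return {"tool_choice": {"type": "auto", "disable_parallel_tool_use": has_other_rules}}
    if provider == "openai":
        # OpenAI Chat Completions API: a top-level boolean.
        return {"parallel_tool_calls": not has_other_rules}
    return {}  # other providers: leave the request unchanged


if __name__ == "__main__":
    print(parallel_tool_settings("openai", [ToolRule("send_message", "exit_loop")]))
    # {'parallel_tool_calls': False}
    print(parallel_tool_settings("anthropic", [ToolRule("delete_file", "requires_approval")]))
    # {'tool_choice': {'type': 'auto', 'disable_parallel_tool_use': False}}
```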

```mermaid
sequenceDiagram
    participant Client
    participant LLM
    participant Agent as Agent Loop
    participant Rules as ToolRulesSolver
    participant Sandbox
    participant MCP
    LLM->>Agent: tool_call(name, args)
    Agent->>Rules: check tool name
    alt allowed
        alt CUSTOM tool
            Agent->>Sandbox: execute tool code
            Sandbox-->>Agent: ToolExecutionResult
        else EXTERNAL_MCP tool
            Agent->>MCP: call tool
            MCP-->>Agent: ToolExecutionResult
        end
    else requires_approval
        Agent-->>Client: approval_request message
        Client->>Agent: approval or denial (next turn)
    else rule violated
        Agent->>Rules: guess_rule_violation
        Agent-->>LLM: ToolConstraintError + hint
    end
    Agent->>Agent: persist with step_id / run_id
    Agent-->>Client: SSE tool_return / step_count
```
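The three branches in the sequence diagram, plus the approval-pair check from step 4, can be sketched as two small functions. Everything here is hypothetical. The `Message` dataclass and its role labels are simplified, and the hint string is a stand-in for whatever `ToolRulesSolver.guess_rule_violation` actually generates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    role: str  # e.g. "user", "assistant", "tool", "approval" (simplified labels)
    content: str = ""


def classify_tool_call(name: str, allowed: set[str], needs_approval: set[str]) -> tuple[str, Optional[str]]:
    """Hypothetical: return (branch, hint) for a tool call emitted by the model."""
    if name not in allowed:
        # Stand-in for the hint produced by ToolRulesSolver.guess_rule_violation.
        hint = f"Tool '{name}' is not allowed here. Allowed tools: {sorted(allowed)}"
        return "rule_violated", hint
    if name in needs_approval:
        return "requires_approval", None
    return "allowed", None


def find_approval_pair(messages: list[Message]) -> tuple[Optional[Message], Optional[Message]]:
    """Hypothetical: return (request, response) if the last two messages both have the approval role."""
    if len(messages) >= 2 and messages[-2].role == "approval" and messages[-1].role == "approval":
        return messages[-2], messages[-1]
    return None, None


if __name__ == "__main__":
    branch, hint = classify_tool_call("rm_rf", {"search", "send_message"}, set())
    print(branch)  # rule_violated
    history = [Message("user", "delete it"), Message("approval", "request"), Message("approval", "approved")]
    request, response = find_approval_pair(history)
    print(response.content)  # approved
```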

## Common Failure Modes

- **Sandbox serialization change**: the v0.16.8 release fixed a vulnerability by replacing pickle with JSON between the sandbox and server. Operators upgrading from older versions should re-verify that no legacy tool result payloads rely on Python-specific (pickle) serialization. Source: community release notes (v0.16.8).
- **Provider misconfiguration**: community reports (issue #2582) describe Azure models not appearing in the agent's model dropdown after the environment variables are set; the models only worked when passed at agent creation. Do not assume environment variables alone are enough. Confirm that the provider's models are actually listed, since provider configuration is defined by the schemas under `letta/schemas/providers/`. Source: community issue #2582; see [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py) for an example of the provider-config pattern.
- **Ollama capability detection**: older Ollama versions do not expose a `capabilities` field on `/api/show`. For that reason, `OllamaProvider.list_llm_models_async` deliberately avoids filtering on capabilities and instead infers support from `model_info`. Deployments on stale Ollama builds should not assume capability-based filtering. Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/ollama.py).
- **Bedrock model naming**: `BedrockProvider.extract_anthropic_model_name` strips the inference-profile prefix so a Bedrock model can go through the same checks as native Anthropic. A misconfigured `inference_profile_id` can break this extraction. Source: [letta/schemas/providers/bedrock.py](https://github.com/letta-ai/letta/blob/main/letta/schemas/providers/bedrock.py).
- **SSE parse errors**: the streaming client raises `ValueError` on unknown chunk shapes, so downstream consumers should be prepared to handle it while iterating. Source: [letta/client/streaming.py](https://github.com/letta-ai/letta/blob/main/letta/client/streaming.py).
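Below is a simplified sketch of the chunk dispatch from step 6, together with a consumer that tolerates the `ValueError` described in the SSE parse-error item. It works on raw JSON strings and returns string labels instead of Letta's typed message objects. `classify_chunk`, `iter_chunks`, `consume`, and the label names are hypothetical. Because `json.JSONDecodeError` subclasses `ValueError`, the same handler also catches malformed payloads.

```python
import json
from typing import Any, Iterable, Iterator


def classify_chunk(data: str) -> tuple[str, dict[str, Any]]:
    """Hypothetical: decode one SSE data payload and label it by the keys it contains."""
    chunk = json.loads(data)
    if "tool_call" in chunk:
        return "tool_call", chunk
    if "tool_return" in chunk:
        return "tool_return", chunk
    if "step_count" in chunk:
        return "usage", chunk  # the real client builds a LettaUsageStatistics here
    if chunk.get("object") == "chat.completion.chunk":
        return "chat_completion_chunk", chunk
    raise ValueError(f"Unknown message type in chunk: {chunk}")


def iter_chunks(payloads: Iterable[str]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Lazily classify each payload, raising on the first unknown shape."""
    for data in payloads:
        yield classify_chunk(data)


def consume(payloads: Iterable[str]) -> tuple[list[str], list[str]]:
    """Collect chunk kinds; on the first unknown or malformed chunk, record the error and stop."""
    kinds: list[str] = []
    errors: list[str] = []
    try:
        for kind, _chunk in iter_chunks(payloads):
            kinds.append(kind)
    except ValueError as exc:  # also covers json.JSONDecodeError
        errors.append(str(exc))
    return kinds, errors


if __name__ == "__main__":
    stream = ['{"tool_call": {"name": "search"}}', '{"tool_return": "3 results"}', '{"step_count": 1}', '{"mystery": true}']
    kinds, errors = consume(stream)
    print(kinds)  # ['tool_call', 'tool_return', 'usage']
    print(len(errors))  # 1
```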

## See Also

- [README.md](https://github.com/letta-ai/letta/blob/main/README.md): top-level product overview and SDK install steps.
- Provider schemas in `letta/schemas/providers/` for the full list of supported LLM backends.
- Agent-loop internals: `letta/agents/letta_agent.py`, `letta/agents/letta_agent_v2.py`, `letta/agents/letta_agent_v3.py`.

---

## Pitfall Log

Project: letta-ai/letta

Summary: Found 6 structured pitfall items, none of them high-severity or blocking. Top priority: capability evidence risk, which requires verification.

## 1. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/letta-ai/letta

## 2. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta

## 3. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no runnable demo was found (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/letta-ai/letta

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no runnable demo was found (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/letta-ai/letta

## 5. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta

