Doramagic Project Pack · Human Manual
letta
Platform for stateful agents: AI with advanced memory that can learn and self-improve over time.
Project Overview & System Architecture
Related topics: Agent Loop, Memory Blocks & Multi-Agent Groups, LLM Provider Integration: Cloud APIs & Local Models, Tools, Sandboxes, MCP Servers & Extensibility
Purpose and Scope
Letta (formerly MemGPT) is an open-source framework for building AI agents with advanced, stateful memory and self-improvement capabilities. As described in README.md, the project ships two primary entry points: the Letta Code CLI for local terminal-based agents, and the Letta API with Python and TypeScript SDKs for application integration. The framework is model-agnostic and exposes a full-featured agents API centered on memory_blocks, tools, and a long-lived agent_state.
The repository combines a server runtime, a sandboxed tool-execution environment, multiple LLM provider adapters, and a streaming client. The community's long-standing interest in local LLM support (e.g., issue #18 "Support for local LLMs like Ollama") and Azure model compatibility (issue #2582) has driven much of the provider-layer design seen in the codebase.
High-Level Architecture
Letta is organized into cooperating subsystems: agent loops, LLM provider adapters, message and step persistence, streaming clients, and a sandboxed tool runtime.
flowchart LR
User[User / SDK Client] -->|messages.create| API[Letta API / REST + WS]
API --> Agent[Agent Loop v1 / v2 / v3]
Agent -->|build request| Provider[Provider Adapter<br/>OpenAI / Anthropic / Ollama / SGLang / Cerebras / Azure]
Provider -->|HTTP / SSE| Upstream[Upstream LLM]
Upstream --> Provider
Provider --> Agent
Agent -->|tool calls| Sandbox[Sandboxed TS Tool Server]
Sandbox --> Agent
Agent -->|persist| DB[(Messages, Steps, Memory)]
Agent -->|SSE stream| Client[Streaming Client]
Client --> User
Each request flows from the SDK through the agent loop, which dispatches to a configured provider, persists results to the data store, and streams events back to the client. The sandbox runs user-defined tool functions in an isolated TypeScript process, returning JSON-encoded results to the host.
Core Subsystems
Agent Loops
Three agent-loop implementations coexist to support different execution models:
- LettaAgent (v1): the classical streaming loop in letta/agents/letta_agent.py. It creates a Step early with StepStatus.PENDING, performs the LLM request, runs _handle_ai_response, updates the step with usage statistics, and emits Server-Sent Events including a final LettaStopReason chunk.
- LettaAgentV2: refactored for tool-call correctness in letta/agents/letta_agent_v2.py, adding explicit approval/denial message construction, tool_rule_violated enforcement, and finer-grained pre_computed_assistant_message_id handling.
- LettaAgentV3: adds advanced context management and parallel-tool-use gating in letta/agents/letta_agent_v3.py. It toggles disable_parallel_tool_use for anthropic/bedrock and parallel_tool_calls for openai only when no non-approval tool rules are attached, and surfaces logprobs plus token IDs for RL training via the SGLang native path.
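The parallel-tool-use gating described for LettaAgentV3 can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not Letta's code: ToolRule, its is_approval field, and parallel_flags are hypothetical names; only the flag names disable_parallel_tool_use and parallel_tool_calls and the provider strings come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolRule:
    """Hypothetical stand-in for a tool rule attached to an agent."""
    tool_name: str
    is_approval: bool = False  # True for requires-approval rules


def parallel_flags(provider: str, rules: List[ToolRule]) -> Dict[str, bool]:
    """Return provider-specific request flags for parallel tool use.

    Assumption: parallel calls are allowed only when every attached rule
    is an approval rule (or there are no rules at all).
    """
    allow_parallel = all(rule.is_approval for rule in rules)
    if provider in ("anthropic", "bedrock"):
        # Anthropic-style requests express the setting as a negative flag.
        return {"disable_parallel_tool_use": not allow_parallel}
    if provider == "openai":
        return {"parallel_tool_calls": allow_parallel}
    return {}  # other providers: leave the request untouched
```

Keeping the decision in one function makes the asymmetry between the two flag polarities easy to test in isolation.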
A specialized VoiceAgent in letta/agents/voice_agent.py builds OpenAI-style stream=True completions and exposes tools like search_memory for voice-driven recall.
Provider Adapters
The provider layer in letta/schemas/providers/ abstracts model discovery, base URL construction, and prompt formatting.
| Provider | File | Key Behavior |
|---|---|---|
| Ollama | ollama.py | Strips trailing /v1 for native /api/tags and /api/show calls; avoids capability filtering for older versions. |
| SGLang | sglang.py | Treats SGLang as an OpenAI-compatible endpoint and ensures the base URL ends in /v1. |
| Cerebras | cerebras.py | Returns a tier-dependent context window (8K on free, 128K on paid). |
These adapters directly address community demand: Ollama support resolves #18, while Azure-style env wiring is handled alongside other base providers in the same layer (see #2582).
Streaming Client
letta/client/streaming.py parses SSE chunks into typed message objects: AssistantMessage, HiddenReasoningMessage, ToolCallMessage, ToolReturnMessage, and LettaUsageStatistics. On SSEError with an application/json body, it falls back to a POST retry and logs the structured error, providing resilient reconnection for long agent runs.
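As a rough illustration of the chunk parsing described above, the sketch below decodes SSE data lines into dictionaries and classifies them by key. It is not the code in letta/client/streaming.py: parse_sse_lines and classify are hypothetical helpers, the key names are taken from the chunk kinds mentioned in this manual, and the [DONE] sentinel is an assumption borrowed from OpenAI-style streams.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_sse_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON payload per SSE data line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        yield json.loads(payload)


def classify(chunk: Dict[str, Any]) -> str:
    """Map a payload to a coarse message kind based on its keys."""
    for key in ("tool_call", "tool_return", "step_count"):
        if key in chunk:
            return key
    return "unknown"
```

A real client would map each kind to a typed message object instead of a string; the string keeps the sketch self-contained.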
Sandboxed Tool Runtime
User-defined tool functions run in a Modal-hosted TypeScript container. The skeleton in sandbox/resources/server/ listens on a Unix socket, deserializes JSON input, and dispatches to user-function.ts. As of v0.16.8, the host now uses JSON instead of pickle for sandbox→server transport, addressing a security hardening item in the recent release notes.
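To show why a JSON wire format is the safer choice, here is a minimal sketch of a JSON envelope for tool results. The envelope shape (result and error keys) and both function names are assumptions made for illustration; the actual message layout used by the sandbox is not documented here.

```python
import json
from typing import Any, Dict


def encode_tool_result(result: Any, error: str = "") -> bytes:
    """Serialize a tool result as one newline-terminated JSON message."""
    envelope: Dict[str, Any] = {"result": result, "error": error or None}
    # json.dumps raises TypeError on non-JSON values, surfacing bad payloads early.
    return (json.dumps(envelope) + "\n").encode("utf-8")


def decode_tool_result(data: bytes) -> Dict[str, Any]:
    """Parse a message produced by encode_tool_result.

    Unlike pickle.loads, json.loads only builds plain dicts, lists,
    strings, numbers, booleans, and None, so decoding cannot run code.
    """
    envelope = json.loads(data.decode("utf-8"))
    if not isinstance(envelope, dict) or "result" not in envelope:
        raise ValueError("malformed tool result envelope")
    return envelope
```

The trade-off is that tool results must be JSON-serializable; arbitrary Python objects have to be converted before they cross the boundary.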
Configuration and Lifecycle
StepProgression constants in letta/agents/letta_agent.py (START, RESPONSE_RECEIVED, STEP_LOGGED, FINISHED) drive the lifecycle: the loop logs a pending step, records the LLM response, persists tool-call messages via message_manager.create_many_messages_async, and finalizes the step with stop_reason and usage. Helpers in letta/agents/helpers.py build rule-violation messages, decode the last function response, and detect paired approval request/response messages for human-in-the-loop flows.
See Also
- Agents API Reference — REST endpoints exposed to the SDKs
- Provider Configuration — how to add or tune a provider
- Memory Blocks & Context Window Management
- Sandboxed Tool Execution
- Streaming Client Protocol
Source: https://github.com/letta-ai/letta / Human Manual
Agent Loop, Memory Blocks & Multi-Agent Groups
Related topics: Project Overview & System Architecture, LLM Provider Integration: Cloud APIs & Local Models, Tools, Sandboxes, MCP Servers & Extensibility
Letta (formerly MemGPT) is an open-source framework for building stateful agents with advanced memory that can learn and self-improve over time Source: [README.md:1-3](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/README.md). At the heart of the system sits a recursive agent loop that consumes user input, rebuilds in-context memory, calls an LLM, executes any returned tool calls, and persists messages. This page documents how the loop is structured across the v1/v2/v3 implementations, how memory blocks are compiled and refreshed, and how multiple agent variants (batch, voice, sleeptime) coordinate work in groups.
Agent Loop Architecture
The core execution pattern is implemented in three parallel files that share the same contract but differ in transport and tooling. The v1 implementation in letta_agent.py exposes the canonical streaming loop. Each iteration: (1) rebuilds memory, (2) generates an LLM request, (3) fetches a response, and (4) processes the response Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py). It uses a ToolRulesSolver to enforce tool ordering and instantiates an LLMClient per provider: LLMClient.create(provider_type=agent_state.llm_config.model_endpoint_type, put_inner_thoughts_first=True, actor=self.actor) Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py).
The loop drives a finite state machine via the StepProgression enum, advancing from START to RESPONSE_RECEIVED, STEP_LOGGED, and finally FINISHED. Each step is logged early with StepStatus.PENDING, then mutated as the LLM call completes and usage statistics arrive Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py). On failure, the step is updated with the exception type, message, and traceback.format_exc(); on success, token details (cached, cache-creation, reasoning) are populated from LettaUsageStatistics only when the provider reported them Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py).
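The pattern of logging a pending step first and then recording the outcome can be sketched as follows. StepRecord and run_step are hypothetical, in-memory stand-ins; of the status values, only PENDING is named in the text above, and SUCCESS and FAILED are assumed for the illustration.

```python
import traceback
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class StepStatus(str, Enum):
    PENDING = "pending"
    SUCCESS = "success"  # assumed value
    FAILED = "failed"    # assumed value


@dataclass
class StepRecord:
    """Hypothetical in-memory stand-in for a persisted step row."""
    status: StepStatus = StepStatus.PENDING
    error_type: Optional[str] = None
    error_message: Optional[str] = None
    error_traceback: Optional[str] = None
    completion_tokens: Optional[int] = None


def run_step(llm_call: Callable[[], int]) -> StepRecord:
    """Create the step as pending, then record success or failure."""
    step = StepRecord()  # exists before the LLM call, so failures stay visible
    try:
        step.completion_tokens = llm_call()
        step.status = StepStatus.SUCCESS
    except Exception as exc:
        step.status = StepStatus.FAILED
        step.error_type = type(exc).__name__
        step.error_message = str(exc)
        step.error_traceback = traceback.format_exc()
    return step
```

Creating the record before the call means a crashed request still leaves a row that can be inspected, which is the point of logging the step early.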
The v2 implementation in letta_agent_v2.py factors the loop into _execute_step and adds an explicit _decide_continuation helper that checks request_heartbeat, tool-rule violations, and is_final_step to decide whether to keep stepping Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py). v2 also introduces an approval/denial branch where the assistant can pause for user confirmation, with is_approval and is_denial flags propagated into persisted messages via create_letta_messages_from_llm_response Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py).
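A continuation check over the three inputs named above might look like the sketch below. The precedence between the inputs is an assumption made for illustration and is not taken from letta_agent_v2.py; decide_continuation is a hypothetical free function, not the private helper itself.

```python
def decide_continuation(
    request_heartbeat: bool,
    tool_rule_violated: bool,
    is_final_step: bool,
) -> bool:
    """Return True if the loop should run another step.

    Assumed precedence: the step budget wins, a rule violation earns one
    more step so the model can correct itself, and otherwise the model's
    own heartbeat request decides.
    """
    if is_final_step:
        return False
    if tool_rule_violated:
        return True
    return request_heartbeat
```

For example, decide_continuation(False, True, False) continues so the violation hint can be read by the model, while any call with is_final_step=True stops.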
The v3 implementation in letta_agent_v3.py introduces an LLMAdapter abstraction that wraps the LLM client for blocking, streaming, OpenAI Responses WebSocket, and SGLang-native RL training transports Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v3.py). It also exposes a compaction_trigger_threshold via get_compaction_trigger_threshold(llm_config) so the loop can decide when to summarize before the context window fills Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v3.py).
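The idea of a compaction trigger can be sketched as a fraction of the context window. The 0.9 default and both function signatures are assumptions for illustration; the real get_compaction_trigger_threshold takes an llm_config and its formula is not documented here.

```python
def compaction_trigger_threshold(context_window: int, fraction: float = 0.9) -> int:
    """Token count at which to summarize (the fraction is an assumed default)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(context_window * fraction)


def should_compact(tokens_in_context: int, context_window: int) -> bool:
    """True once the in-context token count reaches the trigger threshold."""
    return tokens_in_context >= compaction_trigger_threshold(context_window)
```

Triggering below the hard limit leaves headroom for the summarization request itself and for the next model response.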
flowchart TD
A[User Input] --> B[Prepare In-Context Messages]
B --> C{Memory or System Changed?}
C -- Yes --> D[Rebuild Memory via memory.compile]
C -- No --> E[Reuse cached system message]
D --> F[Build LLM Request]
E --> F
F --> G[LLMCallType.agent_step]
G --> H{Tool Call Returned?}
H -- Yes --> I[Execute Tool via ToolRulesSolver]
I --> J[Persist Tool Messages]
H -- No --> K[Persist Assistant Message]
J --> L{Continue Stepping?}
K --> L
L -- Yes --> B
L -- No --> M[Update Step FINISHED + Token Details]
Memory Blocks, Refresh & Summarization
Every agent owns an in-memory representation of memory blocks (e.g. human, persona, summary) that are compiled into the system prompt on each step. The v2 loop calls agent_state.memory.compile(tool_usage_rules=..., sources=..., max_files_open=..., llm_config=...) and short-circuits the rebuild when neither the system prompt nor the compiled memory string changed Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py). Before compilation, refresh_memory_async updates memory references and refresh_file_blocks re-syncs attached file content Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py). Archival memory is loaded via archive_manager.get_default_archive_for_agent_async and its tag list is injected as archive_tags Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py).
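The short-circuit on unchanged memory can be illustrated with a small cache check. This is a sketch under assumptions: compile_memory uses a made-up tag format, and rebuild_system_message with its tuple-based cache is hypothetical rather than the structure used by agent_state.memory.compile.

```python
from typing import Dict, Optional, Tuple

Cached = Tuple[str, str, str]  # (system_prompt, compiled_memory, system_message)


def compile_memory(blocks: Dict[str, str]) -> str:
    """Render memory blocks into one string (hypothetical tag format)."""
    return "\n".join(f"<{label}>\n{value}\n</{label}>" for label, value in blocks.items())


def rebuild_system_message(
    system_prompt: str,
    blocks: Dict[str, str],
    cached: Optional[Cached] = None,
) -> Tuple[Cached, bool]:
    """Return (state, rebuilt).

    The cached state is reused when neither the system prompt nor the
    compiled memory string has changed since the previous step.
    """
    memory = compile_memory(blocks)
    if cached is not None and cached[0] == system_prompt and cached[1] == memory:
        return cached, False
    message = f"{system_prompt}\n\n{memory}"
    return (system_prompt, memory, message), True
```

Comparing the compiled string rather than the block objects means any edit that leaves the rendered text identical does not force a rebuild.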
When the in-context buffer grows past the configured threshold, the v1 agent instantiates an EphemeralSummaryAgent (only when enable_summarization is set and an OpenAI key is present) alongside a Summarizer configured with partial_evict_summarizer_percentage, message_buffer_limit, and message_buffer_min Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py). The ephemeral summary agent constructs a MessageCreate carrying the summary_system_prompt, prepends --- Previous Summary --- to the target block, and writes the condensed text back via block_manager.update_block_async Source: [letta/agents/ephemeral_summary_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/ephemeral_summary_agent.py).
Multi-Agent Groups and Specialized Variants
Letta exposes several agent variants that share the same step contract but specialize in different transports or group semantics. The batch agent in letta_agent_batch.py runs many agents in lockstep, grouping them by current tool call, then calls bulk_update_block_values_async to apply memory updates once per round and _persist_tool_messages to fan tool results back to each agent's message list Source: [letta/agents/letta_agent_batch.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_batch.py). This is the implementation behind multi-agent groups where several agents process the same user message concurrently.
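Grouping agents by their current tool call, as the batch agent is described as doing, can be sketched with a dictionary of lists. The input shape (agent ID mapped to a tool name and arguments) and the group_by_tool name are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_tool(
    pending: Dict[str, Tuple[str, dict]]
) -> Dict[str, List[Tuple[str, dict]]]:
    """Group agents by the tool they are about to call.

    pending maps agent_id -> (tool_name, tool_args); the result maps
    tool_name -> [(agent_id, tool_args), ...] in insertion order.
    """
    groups: Dict[str, List[Tuple[str, dict]]] = defaultdict(list)
    for agent_id, (tool_name, tool_args) in pending.items():
        groups[tool_name].append((agent_id, tool_args))
    return dict(groups)
```

Once calls are grouped, all memory updates for one tool can be applied in a single bulk write per round instead of one write per agent.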
The voice agent in voice_agent.py filters tools down to a voice-safe subset (ToolType.CUSTOM, LETTA_FILES_CORE, LETTA_BUILTIN, EXTERNAL_MCP) and injects a special search_memory tool whose description instructs the model to surface conversational filler while memory is being re-contextualized Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/voice_agent.py). Strict mode is applied per-agent via enable_strict_mode, gated on agent_state.llm_config.strict Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/voice_agent.py).
The voice sleeptime agent in voice_sleeptime_agent.py stores transcript ranges into memory and then forces a rebuild_system_prompt(force=True). It only exposes a synchronous path — step_stream raises NotImplementedError("VoiceSleeptimeAgent does not support async step.") Source: [letta/agents/voice_sleeptime_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/voice_sleeptime_agent.py).
Provider Plug-Ins and Community-Reported Limitations
Letta is model-agnostic. The Ollama provider in schemas/providers/ollama.py exposes raw_base_url (strips /v1) for native /api/tags and /api/show calls and openai_compat_base_url for compatibility-mode clients; it deliberately avoids filtering on the capabilities field because older Ollama builds do not emit it Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/schemas/providers/ollama.py). Community request #18 ("Support for local LLMs like Ollama") is therefore addressed at the provider-schema layer. Azure models have had recurring issues — issue #2582 reports that AZURE_API_KEY / AZURE_BASE_URL are detected but the model list does not populate in the agent dropdown, and explicit model configuration during agent creation is required as a workaround.
See Also
- README.md: project overview and SDK quickstarts
- letta/agents/base_agent.py: shared base class for all agent variants
- letta/services/step_manager.py: persistent storage of step records
- letta/schemas/agent.py: AgentState, LLMConfig, and memory-block schemas
LLM Provider Integration: Cloud APIs & Local Models
Related topics: Project Overview & System Architecture, Agent Loop, Memory Blocks & Multi-Agent Groups, Tools, Sandboxes, MCP Servers & Extensibility
Overview
Letta is model-agnostic and integrates with a broad spectrum of language model backends, ranging from hosted cloud APIs (OpenAI, Anthropic, Google Vertex, DeepSeek, Cerebras) to self-hosted inference engines (Ollama, vLLM, SGLang). Source: README.md explicitly states "Letta is fully model-agnostic" and recommends Opus 4.5 and GPT-5.2 for best performance.
Provider configuration is exposed through a typed schema layer at letta/schemas/providers/, where each provider is implemented as a subclass of Provider carrying its own base URL, API key handling, context-window logic, and model-discovery routine. Agent code then selects an appropriate LLMClient and adapter at runtime. Source: letta/agents/letta_agent.py creates the client via LLMClient.create(provider_type=agent_state.llm_config.model_endpoint_type, ...).
Provider Schema Architecture
Each provider implementation follows the same shape: a Pydantic model inheriting from Provider, parameterized by ProviderType, and exposing two key methods — get_model_context_window_size and async list_llm_models_async. The enum ProviderType discriminates between categories.
| Provider | File | Endpoint Style | Notable Behavior |
|---|---|---|---|
| Ollama | letta/schemas/providers/ollama.py | Native /api/tags and /api/show | Strips trailing /v1 from base_url via raw_base_url; avoids filtering on capabilities for older Ollama versions |
| vLLM | letta/schemas/providers/vllm.py | OpenAI-compatible /v1 | Appends /v1 if missing; supports default_prompt_formatter for completions endpoint |
| SGLang | letta/schemas/providers/sglang.py | OpenAI-compatible /v1 | Same /v1 normalization as vLLM |
| Cerebras | letta/schemas/providers/cerebras.py | OpenAI-compatible | Context window hardcoded to 8192 (free tier) or 128000 (paid); note api_key is marked deprecated |
| DeepSeek | letta/schemas/providers/deepseek.py | OpenAI-compatible | deepseek-reasoner uses put_inner_thoughts_in_kwargs=False |
Source: letta/schemas/providers/ollama.py, letta/schemas/providers/vllm.py, letta/schemas/providers/sglang.py, letta/schemas/providers/cerebras.py, letta/schemas/providers/deepseek.py.
Cloud API Providers
Cloud providers rely on the OpenAI-compatible request envelope where possible. DeepSeek and Cerebras both delegate to openai_get_model_list_async for model discovery and only override context-window heuristics. Source: letta/schemas/providers/deepseek.py and letta/schemas/providers/cerebras.py.
Cerebras exposes a context-window override that depends on plan tier. Source: letta/schemas/providers/cerebras.py defines get_model_context_window_size, returning 8192 for the free tier and 128000 otherwise, hardcoded with is_free_tier = True as a placeholder.
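The tier-dependent window reduces to a single conditional. The sketch below is a standalone illustration with hypothetical constant and function names; only the two sizes and the is_free_tier placeholder defaulting to True come from the description above.

```python
FREE_TIER_CONTEXT = 8192
PAID_TIER_CONTEXT = 128000


def context_window_for_tier(is_free_tier: bool = True) -> int:
    """Pick a context window by plan tier.

    The default of True mirrors the placeholder described above, so the
    conservative 8192-token window applies unless the caller overrides it.
    """
    return FREE_TIER_CONTEXT if is_free_tier else PAID_TIER_CONTEXT
```

Because the placeholder defaults to the free tier, paid-tier users would see the smaller window until the flag is wired to real account data.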
For the agent loop, the V2/V3 agents inject the provider into LLMConfig and propagate run_id, agent_id, and billing context to the LLM client. Source: letta/agents/letta_agent_v2.py constructs LettaLLMRequestAdapter or LettaLLMStreamAdapter depending on stream_tokens.
Local LLM Providers
Local inference backends are first-class providers in the schema layer. Source: letta/local_llm/README.md directs users to the external docs at letta.readme.io/docs/local_llm for setup.
Ollama uses native /api endpoints rather than OpenAI compatibility. Source: letta/schemas/providers/ollama.py defines raw_base_url to drop a trailing /v1 and queries /api/tags to enumerate models, deliberately skipping a capabilities filter because older Ollama builds do not expose it.
vLLM and SGLang both expose OpenAI-style chat completions under the /v1 path prefix. Source: letta/schemas/providers/vllm.py appends /v1 if absent; letta/schemas/providers/sglang.py applies the same convention. SGLang additionally supports a SGLangNativeAdapter path used for multi-turn RL training. Source: letta/agents/letta_agent_v3.py selects this adapter when use_sglang_native is set, and it captures per-turn TurnTokenData (output_ids, logprobs) for RL workloads.
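The two base-URL conventions can be sketched as a pair of string helpers. These are freestanding re-implementations written for illustration; they borrow the names raw_base_url and openai_compat_base_url from the Ollama provider description but are not the provider properties themselves.

```python
def raw_base_url(base_url: str) -> str:
    """Ollama-style: drop a trailing /v1 so native /api/... paths resolve."""
    url = base_url.rstrip("/")
    return url[: -len("/v1")] if url.endswith("/v1") else url


def openai_compat_base_url(base_url: str) -> str:
    """vLLM/SGLang-style: make sure the base URL ends in exactly one /v1."""
    return raw_base_url(base_url) + "/v1"
```

With these helpers, raw_base_url("http://localhost:11434/v1") + "/api/tags" yields the native model-listing URL, while openai_compat_base_url("http://localhost:8000") yields the OpenAI-compatible root.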
Adapter Selection Flow
flowchart TD
A[Agent Step] --> B{stream_tokens?}
B -- yes --> C[LettaLLMStreamAdapter]
B -- no --> D{use_sglang_native?}
D -- yes --> E[SGLangNativeAdapter]
D -- no --> F[SimpleLLMRequestAdapter / LettaLLMRequestAdapter]
C --> G[Provider HTTP Client]
E --> G
F --> G
G --> H[Persist messages + tool calls]
Source: letta/agents/letta_agent_v2.py branches on stream_tokens; letta/agents/letta_agent_v3.py adds the use_sglang_native branch.
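Read as code, the two branches of the adapter selection flow amount to the sketch below. select_adapter is a hypothetical helper that returns adapter names as strings; it mirrors the flowchart only and says nothing about how the adapters are constructed.

```python
def select_adapter(stream_tokens: bool, use_sglang_native: bool = False) -> str:
    """Return the adapter name chosen by the flowchart's two branches."""
    if stream_tokens:
        return "LettaLLMStreamAdapter"
    if use_sglang_native:
        return "SGLangNativeAdapter"
    return "LettaLLMRequestAdapter"
```

Note that, as drawn, the streaming check comes first, so use_sglang_native only matters for non-streaming requests.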
Community Considerations
Two long-standing community threads shape how providers are consumed:
- Local LLM support (Issue #18, 19 comments): users requested Ollama-style backends to avoid vendor lock-in. Source: letta/schemas/providers/ollama.py implements Ollama as a typed provider, and the "[Feature Request] Support for local LLMs like Ollama" discussion motivated the local-llm module.
- Azure model selection (Issue #2582, 11 comments): users report that setting AZURE_API_KEY and AZURE_BASE_URL does not activate Azure models in the model picker, so providers must be created explicitly per agent. This pattern is consistent with how LLMConfig.model_endpoint is bound to a specific Provider instance.
Common Failure Modes
- Trailing slash / missing /v1: vLLM and SGLang append /v1 defensively, but a misconfigured base_url causes openai_get_model_list_async to 404. Source: letta/schemas/providers/vllm.py.
- Ollama capabilities filter: filtering on missing fields silently drops models. Source: letta/schemas/providers/ollama.py avoids filtering and infers support from model_info.
- Parallel tool calls: when parallel_tool_calls=False, some providers ignore the flag, so the agent truncates client-side. Source: letta/agents/letta_agent_v3.py logs a warning and reduces to the first tool call.
- Deprecated API keys: Cerebras marks api_key as deprecated; callers must supply credentials via the encrypted path. Source: letta/schemas/providers/cerebras.py.
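The client-side truncation mentioned under parallel tool calls can be sketched as follows. enforce_single_tool_call and the dictionary shape of a tool call are assumptions for illustration; the warning text is invented and does not reproduce Letta's log output.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def enforce_single_tool_call(
    tool_calls: List[Dict[str, Any]], parallel_tool_calls: bool
) -> List[Dict[str, Any]]:
    """Keep only the first tool call when parallel calls are disabled."""
    if parallel_tool_calls or len(tool_calls) <= 1:
        return tool_calls
    logger.warning(
        "Provider returned %d tool calls with parallel calls disabled; keeping the first.",
        len(tool_calls),
    )
    return tool_calls[:1]
```

Truncating on the client keeps downstream tool-rule checks simple, at the cost of silently discarding work the model proposed, which is why a warning is logged.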
See Also
- Agents API overview in README.md
- Local LLM setup guide referenced by letta/local_llm/README.md
- Sandbox tool-result transport (v0.16.8 release notes mention a fix that replaced pickle with JSON)
- Voice agent tool schemas in letta/agents/voice_agent.py
Tools, Sandboxes, MCP Servers & Extensibility
Related topics: Project Overview & System Architecture, Agent Loop, Memory Blocks & Multi-Agent Groups, LLM Provider Integration: Cloud APIs & Local Models
Overview
Letta exposes a layered extensibility model that lets agents reach outside the LLM: tools are callable functions attached to an agent, sandboxes provide a secure, isolated runtime for executing user-defined tool code, and MCP servers (plus other external transports) let third-party tool ecosystems plug into the same agent loop. The agent runtime (letta_agent.py, letta_agent_v2.py, letta_agent_v3.py) is responsible for calling these tools, enforcing approval and tool-rule policies, persisting the resulting messages, and streaming them back to clients.
This page describes how the four pieces fit together, based strictly on the source files in this repository.
Tool Taxonomy and Registration
The voice agent's _build_tool_schemas filters an AgentState's tools list by tool_type, which gives a canonical list of where a tool can come from. Source: letta/agents/voice_agent.py.
| ToolType | Origin | Typical use |
|---|---|---|
| CUSTOM | User-defined function | Tool calls authored by the developer |
| LETTA_BUILTIN | Bundled with Letta | First-party capabilities (e.g., search_memory) |
| LETTA_FILES_CORE | Core file API | File-system style helpers |
| EXTERNAL_MCP | MCP server | Third-party tool ecosystem |
voice_agent.py adds a *virtual* tool, search_memory, by wrapping a function schema with add_pre_execution_message and enable_strict_mode. The injected pre_exec_msg is the filler phrase the agent should say aloud ("Let me double-check my notes—one moment, please.") while the memory search runs in the background. Source: letta/agents/voice_agent.py.
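Filtering a tool list by type, as _build_tool_schemas is described as doing, can be sketched with an allowlist. The Tool dataclass, the enum string values, and the extra LETTA_CORE member (included only to show a type outside the subset) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ToolType(str, Enum):
    CUSTOM = "custom"
    LETTA_BUILTIN = "letta_builtin"
    LETTA_FILES_CORE = "letta_files_core"
    EXTERNAL_MCP = "external_mcp"
    LETTA_CORE = "letta_core"  # assumed example of a type outside the voice subset


VOICE_TOOL_TYPES = {
    ToolType.CUSTOM,
    ToolType.LETTA_BUILTIN,
    ToolType.LETTA_FILES_CORE,
    ToolType.EXTERNAL_MCP,
}


@dataclass
class Tool:
    """Hypothetical minimal tool record."""
    name: str
    tool_type: ToolType


def voice_tools(tools: List[Tool]) -> List[Tool]:
    """Keep only tools whose type is in the voice-safe subset."""
    return [tool for tool in tools if tool.tool_type in VOICE_TOOL_TYPES]
```

An allowlist fails closed: a tool type added later is excluded from voice agents until someone opts it in.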
Sandbox Runtime (Modal + TypeScript Server)
Custom tool code must run somewhere safe. Letta ships a skeleton TypeScript server that executes user-defined functions inside a Modal container. The transport is a Unix domain socket, and the wire format is JSON (the v0.16.8 release specifically fixed a security issue by switching from pickle to JSON for sandbox→server tool-result transport). Source: sandbox/resources/server/README.md, sandbox/resources/server/package.json.
The skeleton has three files:
- server.ts: a Node process listening on a Unix socket.
- entrypoint.ts: deserializes a JSON-encoded input string into the user-defined function's arguments.
- user-function.ts: fully defined by the end user.
Build and run:
npm install
npm run build
npm run start
Source: sandbox/resources/server/README.md.
MCP and External Tool Execution
Beyond sandboxes, Letta supports bring-your-own providers and tool transports. EXTERNAL_MCP is a first-class ToolType in _build_tool_schemas, meaning MCP-served tools flow through the same schema-generation and dispatch path as CUSTOM tools. Source: letta/agents/voice_agent.py.
The provider layer (letta/schemas/providers/) is similarly pluggable. For example, OllamaProvider derives both the raw and OpenAI-compatible base URLs from a single base_url field, and list_llm_models_async calls /api/tags rather than a /v1/models endpoint, reflecting the fact that Ollama is not a drop-in OpenAI replacement, a recurring community concern (see issue #18 in the community context). Source: letta/schemas/providers/ollama.py. SGLangProvider and VLLMProvider follow the OpenAI-compatible pattern and append /v1 to the base URL automatically. Source: letta/schemas/providers/sglang.py, letta/schemas/providers/vllm.py.
Tool Execution Lifecycle: Approval, Rule Validation, and Streaming
Every tool call passes through the same lifecycle regardless of whether it originated from a CUSTOM function, a sandbox, or an MCP server:
- Schema construction. _build_tool_schemas produces OpenAI-function-call schemas. Strict JSON-schema output is enabled via enable_strict_mode when the agent's llm_config.strict flag is set. Source: letta/agents/voice_agent.py.
- Parallel-tool gating. v3's letta_agent_v3.py toggles disable_parallel_tool_use for Anthropic/Bedrock and parallel_tool_calls for OpenAI based on whether the agent declares tool_rules (excluding requires_approval). Source: letta/agents/letta_agent_v3.py.
- Rule violation handling. If a model emits a tool name outside the allowed set, _build_rule_violation_result produces a ToolConstraintError whose func_return includes a hint generated by ToolRulesSolver.guess_rule_violation. Source: letta/agents/helpers.py.
- Approval handling. When requires_approval rules are present, v2 emits an approval request message. On the next turn, _maybe_get_approval_messages detects the back-to-back approval/approval pair (an approval request followed by its response), and the agent loop records is_approval / is_denial on the resulting Message and threads denial_reason into telemetry. Source: letta/agents/letta_agent_v2.py, letta/agents/helpers.py.
- Persistence. Persisted messages are produced by create_letta_messages_from_llm_response (v2) or the equivalent v1 path in letta_agent.py, with step_id and run_id stamped on each message. Source: letta/agents/letta_agent.py, letta/agents/letta_agent_v2.py.
- Streaming. The Python streaming client parses SSE chunks, dispatching on tool_call, tool_return, step_count (a LettaUsageStatistics), and OpenAI-style ChatCompletionChunk. On SSE errors whose message mentions application/json, it falls back to a POST and logs the JSON error. Source: letta/client/streaming.py.
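The approval-pair check from the lifecycle above can be sketched as a look at the last two messages. This is a simplified illustration: the Message dataclass is hypothetical, the role string "approval" is inferred from the approval/approval wording, and the function is a freestanding sketch rather than the helper in letta/agents/helpers.py.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    """Hypothetical minimal message record."""
    role: str
    content: str = ""


def maybe_get_approval_messages(
    messages: List[Message],
) -> Tuple[Optional[Message], Optional[Message]]:
    """Return (request, response) if the last two messages are both approvals."""
    if len(messages) >= 2:
        request, response = messages[-2], messages[-1]
        if request.role == "approval" and response.role == "approval":
            return request, response
    return None, None
```

Returning a pair of None values when no approval exchange is pending lets the caller unpack the result unconditionally and branch on the request.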
sequenceDiagram
participant LLM
participant Agent as Agent Loop
participant Rules as ToolRulesSolver
participant Sandbox
participant MCP
LLM->>Agent: tool_call(name, args)
Agent->>Rules: validate(name)
alt allowed
Agent->>Sandbox: dispatch (CUSTOM)
Agent->>MCP: dispatch (EXTERNAL_MCP)
Sandbox-->>Agent: ToolExecutionResult
MCP-->>Agent: ToolExecutionResult
else requires_approval
Agent-->>LLM: approval_request message
LLM->>Agent: approval response
else rule violated
Agent->>Rules: guess_rule_violation
Agent-->>LLM: ToolConstraintError + hint
end
Agent->>Agent: persist with step_id / run_id
Agent-->>Client: SSE tool_return / step_count
Common Failure Modes
- Sandbox transport regression: the v0.16.8 release explicitly fixed a vulnerability by replacing pickle with JSON between the sandbox and server. Operators upgrading from older versions should re-verify that no legacy tool result payloads rely on Python-specific serialization. Source: community release notes (v0.16.8).
- Provider misconfiguration: community reports (issue #2582) describe Azure models not appearing in the agent's model dropdown after env variables are set, only working when passed at agent creation. Provider configuration must be applied through the provider schema, not just environment variables. Source: letta/schemas/providers/ollama.py (illustrative of the provider-config pattern).
- Ollama capability detection: older Ollama versions do not expose a capabilities field on /api/show, so OllamaProvider.list_llm_models_async deliberately avoids filtering on capabilities and instead infers support from model_info. Deployments on stale Ollama builds should not assume capability-based filtering. Source: letta/schemas/providers/ollama.py.
- Bedrock model naming: BedrockProvider.extract_anthropic_model_name strips the inference-profile prefix so a Bedrock model can be used with the same checks as native Anthropic. Misconfigured inference_profile_id values will fail this extraction. Source: letta/schemas/providers/bedrock.py.
- SSE parse errors: the streaming client raises on unknown chunk shapes, so downstream consumers must handle ValueError from the iterator. Source: letta/client/streaming.py.
See Also
- README.md: top-level product overview and SDK install steps.
- Provider schemas in letta/schemas/providers/ for the full list of supported LLM backends.
- Agent-loop internals: letta/agents/letta_agent.py, letta/agents/letta_agent_v2.py, letta/agents/letta_agent_v3.py.
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.
1. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/letta-ai/letta
2. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta
3. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/letta-ai/letta
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/letta-ai/letta
5. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta
6. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using letta with real data or production workflows.
- Error wrapper mislabels upstream provider rate limits as 'Rate limited b - github / github_issue
- Tool execution sandbox: NameError 'DynamicModel' when args_json_schema h - github / github_issue
- Graph causal support + contradiction detection + financial domain exampl - github / github_issue
- Bedrock provider ignores encrypted AWS credentials - github / github_issue
- [[Bug]: Cross-Session State Leakage via Persistent Core Memory Poisoning](https://github.com/letta-ai/letta/issues/3388) - github / github_issue
- VLLM provider ignores encrypted API keys during model discovery - github / github_issue
- VLLM provider crashes when /v1/models omits max_model_len - github / github_issue
- LM Studio provider stores /api/v0 as the inference endpoint - github / github_issue
- LM Studio chat-completions wrapper hides HTTP failures and mutates calle - github / github_issue
- OpenAI-compatible agent chat completions returns 500 for empty messages - github / github_issue
- Documented all-extras install fails by building psycopg2 from source - github / github_issue
- Local LLM completion settings are mutated across requests - github / github_issue
Source: Project Pack community evidence and pitfall evidence