Doramagic Project Pack · Human Manual

letta

Platform for stateful agents: AI with advanced memory that can learn and self-improve over time.

Project Overview & System Architecture

Related topics: Agent Loop, Memory Blocks & Multi-Agent Groups, LLM Provider Integration: Cloud APIs & Local Models, Tools, Sandboxes, MCP Servers & Extensibility

Purpose and Scope

Letta (formerly MemGPT) is an open-source framework for building AI agents with advanced, stateful memory and self-improvement capabilities. As described in README.md, the project ships two primary entry points: the Letta Code CLI for local terminal-based agents, and the Letta API with Python and TypeScript SDKs for application integration. The framework is model-agnostic and exposes a full-featured agents API centered on memory_blocks, tools, and a long-lived agent_state.

The repository combines a server runtime, a sandboxed tool-execution environment, multiple LLM provider adapters, and a streaming client. The community's long-standing interest in local LLM support (e.g., issue #18 "Support for local LLMs like Ollama") and Azure model compatibility (issue #2582) has driven much of the provider-layer design seen in the codebase.

High-Level Architecture

Letta is organized into cooperating subsystems: agent loops, LLM provider adapters, message and step persistence, streaming clients, and a sandboxed tool runtime.

```mermaid
flowchart LR
    User[User / SDK Client] -->|messages.create| API[Letta API / REST + WS]
    API --> Agent[Agent Loop v1 / v2 / v3]
    Agent -->|build request| Provider[Provider Adapter<br/>OpenAI / Anthropic / Ollama / SGLang / Cerebras / Azure]
    Provider -->|HTTP / SSE| Upstream[Upstream LLM]
    Upstream --> Provider
    Provider --> Agent
    Agent -->|tool calls| Sandbox[Sandboxed TS Tool Server]
    Sandbox --> Agent
    Agent -->|persist| DB[(Messages, Steps, Memory)]
    Agent -->|SSE stream| Client[Streaming Client]
    Client --> User
```

Each request flows from the SDK through the agent loop, which dispatches to a configured provider, persists results to the data store, and streams events back to the client. The sandbox runs user-defined tool functions in an isolated TypeScript process, returning JSON-encoded results to the host.

Core Subsystems

Agent Loops

Three agent-loop implementations coexist to support different execution models:

  • LettaAgent (v1) — the classical streaming loop in letta/agents/letta_agent.py. It creates a Step early with StepStatus.PENDING, performs the LLM request, runs _handle_ai_response, updates the step with usage statistics, and emits Server-Sent Events including a final LettaStopReason chunk.
  • LettaAgentV2 — refactored for tool-call correctness in letta/agents/letta_agent_v2.py, adding explicit approval/denial message construction, tool_rule_violated enforcement, and finer-grained pre_computed_assistant_message_id handling.
  • LettaAgentV3 — adds advanced context management and parallel-tool-use gating in letta/agents/letta_agent_v3.py. It toggles disable_parallel_tool_use for anthropic/bedrock and parallel_tool_calls for openai only when no non-approval tool rules are attached, and surfaces logprobs plus token IDs for RL training via the SGLang native path.
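
The parallel-tool-use gating described for LettaAgentV3 can be sketched roughly as follows. This is a minimal illustration, not code from letta_agent_v3.py: the `ToolRule` dataclass, the `kind` labels, and the `build_parallel_flags` helper are hypothetical, and the exact boolean values sent to each provider are an assumption based on the flag names.

```python
from dataclasses import dataclass


@dataclass
class ToolRule:
    """Hypothetical stand-in for a tool rule attached to an agent."""
    tool_name: str
    kind: str  # e.g. "requires_approval" or "run_first" (labels are assumed)


def build_parallel_flags(provider: str, tool_rules: list[ToolRule]) -> dict[str, bool]:
    """Return provider-specific request flags controlling parallel tool use."""
    # Parallel tool use is allowed only when no non-approval rule is attached.
    has_non_approval_rule = any(r.kind != "requires_approval" for r in tool_rules)
    allow_parallel = not has_non_approval_rule

    if provider in ("anthropic", "bedrock"):
        # Anthropic-style APIs express this as a "disable" flag.
        return {"disable_parallel_tool_use": not allow_parallel}
    if provider == "openai":
        return {"parallel_tool_calls": allow_parallel}
    return {}  # other providers: no flag is set in this sketch


approval_only = [ToolRule("send_email", "requires_approval")]
ordered = [ToolRule("search", "run_first")]
```

With these assumptions, an agent that only carries approval rules keeps parallel calls enabled, while any ordering rule switches them off for the providers that support the flag.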

A specialized VoiceAgent in letta/agents/voice_agent.py builds OpenAI-style stream=True completions and exposes tools like search_memory for voice-driven recall.

Provider Adapters

The provider layer in letta/schemas/providers/ abstracts model discovery, base URL construction, and prompt formatting.

| Provider | File | Key Behavior |
| --- | --- | --- |
| Ollama | ollama.py | Strips trailing /v1 for native /api/tags and /api/show calls; avoids capability filtering for older versions. |
| SGLang | sglang.py | Treats SGLang as an OpenAI-compatible endpoint and ensures the base URL ends in /v1. |
| Cerebras | cerebras.py | Returns a tier-dependent context window (8K on free, 128K on paid). |

These adapters directly address community demand: Ollama support resolves #18, while Azure-style env wiring is handled alongside other base providers in the same layer (see #2582).

Streaming Client

letta/client/streaming.py parses SSE chunks into typed message objects: AssistantMessage, HiddenReasoningMessage, ToolCallMessage, ToolReturnMessage, and LettaUsageStatistics. On SSEError with an application/json body, it falls back to a POST retry and logs the structured error, providing resilient reconnection for long agent runs.
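
To make the chunk dispatch concrete, here is a minimal sketch of parsing a single SSE line into a typed object. It is not the code in letta/client/streaming.py: the dataclasses, the `parse_sse_line` helper, the `[DONE]` end marker, and the exact JSON field layout are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ToolCall:
    name: str
    arguments: str


@dataclass
class ToolReturn:
    value: str


@dataclass
class Usage:
    step_count: int


Parsed = Union[ToolCall, ToolReturn, Usage]


def parse_sse_line(line: str) -> Optional[Parsed]:
    """Parse one SSE line; return None for non-data lines and the end marker."""
    if not line.startswith("data:"):
        return None  # comments, event names, and keep-alives carry no payload
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":  # assumed end-of-stream marker
        return None
    chunk = json.loads(payload)
    # Dispatch on which key is present in the decoded chunk.
    if "tool_call" in chunk:
        call = chunk["tool_call"]
        return ToolCall(name=call["name"], arguments=call["arguments"])
    if "tool_return" in chunk:
        return ToolReturn(value=chunk["tool_return"])
    if "step_count" in chunk:
        return Usage(step_count=chunk["step_count"])
    raise ValueError(f"Unknown chunk shape: {sorted(chunk)}")
```

Raising on unknown shapes keeps protocol drift visible to the caller instead of silently dropping events.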

Sandboxed Tool Runtime

User-defined tool functions run in a Modal-hosted TypeScript container. The skeleton in sandbox/resources/server/ listens on a Unix socket, deserializes JSON input, and dispatches to user-function.ts. As of v0.16.8, the host now uses JSON instead of pickle for sandbox→server transport, addressing a security hardening item in the recent release notes.

Configuration and Lifecycle

StepProgression constants in letta/agents/letta_agent.py (START, RESPONSE_RECEIVED, STEP_LOGGED, FINISHED) drive the lifecycle: the loop logs a pending step, records the LLM response, persists tool-call messages via message_manager.create_many_messages_async, and finalizes the step with stop_reason and usage. Helpers in letta/agents/helpers.py build rule-violation messages, decode the last function response, and detect paired approval request/response messages for human-in-the-loop flows.
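
The lifecycle above can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the enum lists only the four constants named here (the real enum may define more), the step is a plain dict rather than a persisted record, and the "SUCCESS" and "FAILED" status labels and the `run_step` helper are hypothetical.

```python
import traceback
from enum import IntEnum, auto
from typing import Callable


class StepProgression(IntEnum):
    START = auto()
    RESPONSE_RECEIVED = auto()
    STEP_LOGGED = auto()
    FINISHED = auto()


def run_step(call_llm: Callable[[], dict]) -> dict:
    """Run one step and return the step record (a dict in this sketch)."""
    # Create the record early so a crash still leaves a pending step behind.
    step = {"status": "PENDING", "progression": StepProgression.START}
    try:
        response = call_llm()
        step["progression"] = StepProgression.RESPONSE_RECEIVED
        step["usage"] = response.get("usage", {})
        step["progression"] = StepProgression.STEP_LOGGED
        step["stop_reason"] = response.get("stop_reason", "end_turn")
        step["status"] = "SUCCESS"
        step["progression"] = StepProgression.FINISHED
    except Exception as exc:
        # On failure, keep the last progression reached and record the error.
        step.update(
            status="FAILED",
            error_type=type(exc).__name__,
            error_message=str(exc),
            traceback=traceback.format_exc(),
        )
    return step
```

Because the progression value is preserved on failure, a reader of the step record can tell how far the loop got before the error.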

See Also

  • Agents API Reference — REST endpoints exposed to the SDKs
  • Provider Configuration — how to add or tune a provider
  • Memory Blocks & Context Window Management
  • Sandboxed Tool Execution
  • Streaming Client Protocol

Source: https://github.com/letta-ai/letta / Human Manual

Agent Loop, Memory Blocks & Multi-Agent Groups

Related topics: Project Overview & System Architecture, LLM Provider Integration: Cloud APIs & Local Models, Tools, Sandboxes, MCP Servers & Extensibility

Letta (formerly MemGPT) is an open-source framework for building stateful agents with advanced memory that can learn and self-improve over time (Source: [README.md:1-3](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/README.md)). At the heart of the system sits a recursive agent loop that consumes user input, rebuilds in-context memory, calls an LLM, executes any returned tool calls, and persists messages. This page documents how the loop is structured across the v1/v2/v3 implementations, how memory blocks are compiled and refreshed, and how multiple agent variants (batch, voice, sleeptime) coordinate work in groups.

Agent Loop Architecture

The core execution pattern is implemented in three parallel files that share the same contract but differ in transport and tooling. The v1 implementation in letta_agent.py exposes the canonical streaming loop. Each iteration (1) rebuilds memory, (2) generates an LLM request, (3) fetches a response, and (4) processes the response (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py)). It uses a ToolRulesSolver to enforce tool ordering and instantiates an LLMClient per provider: LLMClient.create(provider_type=agent_state.llm_config.model_endpoint_type, put_inner_thoughts_first=True, actor=self.actor) (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py)).

The loop drives a finite state machine via the StepProgression enum, advancing from START to RESPONSE_RECEIVED, STEP_LOGGED, and finally FINISHED. Each step record is created early with StepStatus.PENDING, then updated as the LLM call completes and usage statistics arrive (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py)). On failure, the step is updated with the exception type, message, and traceback.format_exc(); on success, token details (cached, cache-creation, reasoning) are populated from LettaUsageStatistics only when the provider reported them (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py)).

The v2 implementation in letta_agent_v2.py factors the loop into _execute_step and adds an explicit _decide_continuation helper that checks request_heartbeat, tool-rule violations, and is_final_step to decide whether to keep stepping (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py)). v2 also introduces an approval/denial branch where the assistant can pause for user confirmation, with is_approval and is_denial flags propagated into persisted messages via create_letta_messages_from_llm_response (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py)).
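
A continuation check of this kind might look like the sketch below. The three inputs come from the description above, but the precedence between them, the returned reason labels, and the `decide_continuation` name are assumptions for illustration; the real _decide_continuation helper may weigh additional signals.

```python
def decide_continuation(
    request_heartbeat: bool,
    tool_rule_violated: bool,
    is_final_step: bool,
) -> tuple[bool, str]:
    """Return (continue_stepping, reason) for the next loop iteration."""
    if is_final_step:
        # The step budget is exhausted, so stop regardless of other signals.
        return False, "max_steps"
    if tool_rule_violated:
        # Assumed behavior: step again so the model can correct its tool choice.
        return True, "tool_rule_violated"
    if request_heartbeat:
        # The model explicitly asked for another turn.
        return True, "heartbeat_requested"
    return False, "end_turn"
```

Returning a reason alongside the boolean makes the stop condition easy to log or surface as a stop-reason event.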

The v3 implementation in letta_agent_v3.py introduces an LLMAdapter abstraction that wraps the LLM client for blocking, streaming, OpenAI Responses WebSocket, and SGLang-native RL training transports (Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v3.py)). It also exposes a compaction_trigger_threshold via get_compaction_trigger_threshold(llm_config) so the loop can decide when to summarize before the context window fills (Source: [letta/agents/letta_agent_v3.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v3.py)).

```mermaid
flowchart TD
    A[User Input] --> B[Prepare In-Context Messages]
    B --> C{Memory or System Changed?}
    C -- Yes --> D[Rebuild Memory via memory.compile]
    C -- No --> E[Reuse cached system message]
    D --> F[Build LLM Request]
    E --> F
    F --> G[LLMCallType.agent_step]
    G --> H{Tool Call Returned?}
    H -- Yes --> I[Execute Tool via ToolRulesSolver]
    I --> J[Persist Tool Messages]
    H -- No --> K[Persist Assistant Message]
    J --> L{Continue Stepping?}
    K --> L
    L -- Yes --> B
    L -- No --> M[Update Step FINISHED + Token Details]
```

Memory Blocks, Refresh & Summarization

Every agent owns an in-memory representation of memory blocks (e.g. human, persona, summary) that are compiled into the system prompt on each step. The v2 loop calls agent_state.memory.compile(tool_usage_rules=..., sources=..., max_files_open=..., llm_config=...) and short-circuits the rebuild when neither the system prompt nor the compiled memory string changed (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py)). Before compilation, refresh_memory_async updates memory references and refresh_file_blocks re-syncs attached file content (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py)). Archival memory is loaded via archive_manager.get_default_archive_for_agent_async and its tag list is injected as archive_tags (Source: [letta/agents/letta_agent_v2.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_v2.py)).
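
The compile-then-compare short-circuit can be illustrated with a small sketch. The `Block` dataclass, the tag-style rendering, and the `maybe_rebuild` helper are hypothetical; the actual memory.compile output format is not shown in this manual, so the layout here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Block:
    label: str
    value: str


def compile_memory(blocks: list[Block]) -> str:
    """Render blocks into one string (the tag layout is an assumed format)."""
    return "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in blocks)


def maybe_rebuild(
    cached_system_message: str,
    system_prompt: str,
    blocks: list[Block],
) -> tuple[str, bool]:
    """Return (system_message, rebuilt); skip the rebuild when nothing changed."""
    candidate = f"{system_prompt}\n\n{compile_memory(blocks)}"
    if candidate == cached_system_message:
        return cached_system_message, False  # reuse the cached message
    return candidate, True
```

Comparing the fully rendered string means that both prompt edits and block edits trigger a rebuild, while an unchanged step reuses the cached message.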

When the in-context buffer grows past the configured threshold, the v1 agent instantiates an EphemeralSummaryAgent (only when enable_summarization is set and an OpenAI key is present) alongside a Summarizer configured with partial_evict_summarizer_percentage, message_buffer_limit, and message_buffer_min (Source: [letta/agents/letta_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent.py)). The ephemeral summary agent constructs a MessageCreate carrying the summary_system_prompt, prepends --- Previous Summary --- to the target block, and writes the condensed text back via block_manager.update_block_async (Source: [letta/agents/ephemeral_summary_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/ephemeral_summary_agent.py)).

Multi-Agent Groups and Specialized Variants

Letta exposes several agent variants that share the same step contract but specialize in different transports or group semantics. The batch agent in letta_agent_batch.py runs many agents in lockstep, grouping them by current tool call, then calls bulk_update_block_values_async to apply memory updates once per round and _persist_tool_messages to fan tool results back to each agent's message list (Source: [letta/agents/letta_agent_batch.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/letta_agent_batch.py)). This is the implementation behind multi-agent groups where several agents process the same user message concurrently.
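
The grouping step of the batch agent can be pictured with the sketch below. It only shows the idea of bucketing agents by their pending tool call; the `group_agents_by_tool` helper and the dict-based call shape are hypothetical and do not mirror the data structures in letta_agent_batch.py.

```python
from collections import defaultdict


def group_agents_by_tool(pending_calls: dict[str, dict]) -> dict[str, list[str]]:
    """Map each tool name to the agent IDs currently waiting on that tool."""
    groups: dict[str, list[str]] = defaultdict(list)
    for agent_id, call in pending_calls.items():
        groups[call["name"]].append(agent_id)
    return dict(groups)


pending = {
    "agent-1": {"name": "core_memory_append", "args": {"content": "likes tea"}},
    "agent-2": {"name": "web_search", "args": {"query": "weather"}},
    "agent-3": {"name": "core_memory_append", "args": {"content": "lives in Oslo"}},
}
groups = group_agents_by_tool(pending)
```

Once agents are bucketed this way, memory-editing tools can be applied with a single bulk update per round rather than one write per agent.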

The voice agent in voice_agent.py filters tools down to a voice-safe subset (ToolType.CUSTOM, LETTA_FILES_CORE, LETTA_BUILTIN, EXTERNAL_MCP) and injects a special search_memory tool whose description instructs the model to surface conversational filler while memory is being re-contextualized (Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/voice_agent.py)). Strict mode is applied per-agent via enable_strict_mode, gated on agent_state.llm_config.strict (Source: [letta/agents/voice_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/voice_agent.py)).

The voice sleeptime agent in voice_sleeptime_agent.py stores transcript ranges into memory and then forces a rebuild_system_prompt(force=True). It exposes only a non-streaming path: step_stream raises NotImplementedError("VoiceSleeptimeAgent does not support async step.") (Source: [letta/agents/voice_sleeptime_agent.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/agents/voice_sleeptime_agent.py)).

Provider Plug-Ins and Community-Reported Limitations

Letta is model-agnostic. The Ollama provider in schemas/providers/ollama.py exposes raw_base_url (strips /v1) for native /api/tags and /api/show calls and openai_compat_base_url for compatibility-mode clients; it deliberately avoids filtering on the capabilities field because older Ollama builds do not emit it (Source: [letta/schemas/providers/ollama.py](https://github.com/letta-ai/letta/blob/1131535716e8a31c9a437f8695e25ac98f203a24/letta/schemas/providers/ollama.py)). Community request #18 ("Support for local LLMs like Ollama") is therefore addressed at the provider-schema layer. Azure models have had recurring issues: issue #2582 reports that AZURE_API_KEY / AZURE_BASE_URL are detected but the model list does not populate in the agent dropdown, and explicit model configuration during agent creation is required as a workaround.

See Also

  • README.md: project overview and SDK quickstarts
  • letta/agents/base_agent.py: shared base class for all agent variants
  • letta/services/step_manager.py: persistent storage of step records
  • letta/schemas/agent.py: AgentState, LLMConfig, and memory-block schemas

Source: https://github.com/letta-ai/letta / Human Manual

LLM Provider Integration: Cloud APIs & Local Models

Related topics: Project Overview & System Architecture, Agent Loop, Memory Blocks & Multi-Agent Groups, Tools, Sandboxes, MCP Servers & Extensibility

Overview

Letta is model-agnostic and integrates with a broad spectrum of language model backends, ranging from hosted cloud APIs (OpenAI, Anthropic, Google Vertex, DeepSeek, Cerebras) to self-hosted inference engines (Ollama, vLLM, SGLang). README.md states that "Letta is fully model-agnostic" and recommends Opus 4.5 and GPT-5.2 for best performance. Source: README.md.

Provider configuration is exposed through a typed schema layer at letta/schemas/providers/, where each provider is implemented as a subclass of Provider carrying its own base URL, API key handling, context-window logic, and model-discovery routine. Agent code then selects an appropriate LLMClient and adapter at runtime. Source: letta/agents/letta_agent.py creates the client via LLMClient.create(provider_type=agent_state.llm_config.model_endpoint_type, ...).

Provider Schema Architecture

Each provider implementation follows the same shape: a Pydantic model inheriting from Provider, parameterized by ProviderType, and exposing two key methods — get_model_context_window_size and async list_llm_models_async. The enum ProviderType discriminates between categories.

| Provider | File | Endpoint Style | Notable Behavior |
| --- | --- | --- | --- |
| Ollama | letta/schemas/providers/ollama.py | Native /api/tags and /api/show | Strips trailing /v1 from base_url via raw_base_url; avoids filtering on capabilities for older Ollama versions |
| vLLM | letta/schemas/providers/vllm.py | OpenAI-compatible /v1 | Appends /v1 if missing; supports default_prompt_formatter for completions endpoint |
| SGLang | letta/schemas/providers/sglang.py | OpenAI-compatible /v1 | Same /v1 normalization as vLLM |
| Cerebras | letta/schemas/providers/cerebras.py | OpenAI-compatible | Context window hardcoded to 8192 (free tier) or 128000 (paid); note api_key is marked deprecated |
| DeepSeek | letta/schemas/providers/deepseek.py | OpenAI-compatible | deepseek-reasoner uses put_inner_thoughts_in_kwargs=False |

Source: letta/schemas/providers/ollama.py, letta/schemas/providers/vllm.py, letta/schemas/providers/sglang.py, letta/schemas/providers/cerebras.py, letta/schemas/providers/deepseek.py.
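
The two base-URL conventions in the table can be sketched as a pair of helpers. These are illustrative functions written for this manual, not the provider properties themselves; the names `ollama_raw_base_url` and `openai_compat_base_url` are borrowed loosely from the descriptions above, and the trailing-slash handling is an assumption.

```python
def ollama_raw_base_url(base_url: str) -> str:
    """Drop a trailing /v1 so native /api/tags and /api/show paths resolve."""
    return base_url.rstrip("/").removesuffix("/v1")


def openai_compat_base_url(base_url: str) -> str:
    """Ensure the URL ends in /v1, as OpenAI-compatible servers expect."""
    trimmed = base_url.rstrip("/")
    return trimmed if trimmed.endswith("/v1") else trimmed + "/v1"


# Native Ollama model listing uses the raw URL plus /api/tags.
tags_url = ollama_raw_base_url("http://localhost:11434/v1") + "/api/tags"
```

Normalizing in both directions lets a user enter either form of the URL without producing a doubled or missing /v1 segment. The sketch uses str.removesuffix, which requires Python 3.9 or later.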

Cloud API Providers

Cloud providers rely on the OpenAI-compatible request envelope where possible. DeepSeek and Cerebras both delegate to openai_get_model_list_async for model discovery and only override context-window heuristics. Source: letta/schemas/providers/deepseek.py and letta/schemas/providers/cerebras.py.

Cerebras exposes a context-window override that depends on plan tier. Source: letta/schemas/providers/cerebras.py defines get_model_context_window_size, returning 8192 for the free tier and 128000 otherwise, hardcoded with is_free_tier = True as a placeholder.
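
The tier-dependent lookup amounts to a two-way branch, as in the sketch below. The 8192 and 128000 values come from the description above; turning is_free_tier into a keyword parameter is an assumption made for illustration, since the source describes it as a hardcoded placeholder.

```python
FREE_TIER_CONTEXT = 8192
PAID_TIER_CONTEXT = 128000


def get_model_context_window_size(model_name: str, is_free_tier: bool = True) -> int:
    """Return the context window for a Cerebras model based on plan tier."""
    # model_name is accepted for interface parity but not used in this sketch.
    return FREE_TIER_CONTEXT if is_free_tier else PAID_TIER_CONTEXT
```

With the placeholder defaulting to the free tier, paid-plan users would see the smaller window unless the value is overridden.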

For the agent loop, the V2/V3 agents inject the provider into LLMConfig and propagate run_id, agent_id, and billing context to the LLM client. Source: letta/agents/letta_agent_v2.py constructs LettaLLMRequestAdapter or LettaLLMStreamAdapter depending on stream_tokens.

Local LLM Providers

Local inference backends are first-class providers in the schema layer. Source: letta/local_llm/README.md directs users to the external docs at letta.readme.io/docs/local_llm for setup.

Ollama uses native /api endpoints rather than OpenAI compatibility. Source: letta/schemas/providers/ollama.py defines raw_base_url to drop a trailing /v1 and queries /api/tags to enumerate models, deliberately skipping a capabilities filter because older Ollama builds do not expose it.

vLLM and SGLang both expose OpenAI-style chat completions under the /v1 path. Source: letta/schemas/providers/vllm.py appends /v1 if absent; letta/schemas/providers/sglang.py applies the same convention. SGLang additionally supports a SGLangNativeAdapter path used for multi-turn RL training. Source: letta/agents/letta_agent_v3.py selects this adapter when use_sglang_native is set, and it captures per-turn TurnTokenData (output_ids, logprobs) for RL workloads.

Adapter Selection Flow

```mermaid
flowchart TD
    A[Agent Step] --> B{stream_tokens?}
    B -- yes --> C[LettaLLMStreamAdapter]
    B -- no --> D{use_sglang_native?}
    D -- yes --> E[SGLangNativeAdapter]
    D -- no --> F[SimpleLLMRequestAdapter / LettaLLMRequestAdapter]
    C --> G[Provider HTTP Client]
    E --> G
    F --> G
    G --> H[Persist messages + tool calls]
```

Source: letta/agents/letta_agent_v2.py branches on stream_tokens; letta/agents/letta_agent_v3.py adds the use_sglang_native branch.
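
The branch order in the flow above can be written as a short selection function. This sketch returns adapter names as strings rather than constructing real adapter objects, and `select_adapter` is a hypothetical helper; the precedence (streaming first, then the SGLang native path) is read off the diagram.

```python
def select_adapter(stream_tokens: bool, use_sglang_native: bool) -> str:
    """Pick an adapter name following the order shown in the flow above."""
    if stream_tokens:
        return "LettaLLMStreamAdapter"
    if use_sglang_native:
        return "SGLangNativeAdapter"
    # Non-streaming default; the flow also names SimpleLLMRequestAdapter here.
    return "LettaLLMRequestAdapter"
```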

Community Considerations

Two long-standing community threads shape how providers are consumed:

  • Local LLM support (Issue #18, 19 comments) — users requested Ollama-style backends to avoid vendor lock-in. Source: letta/schemas/providers/ollama.py implements Ollama as a typed provider, and the [Feature Request] Support for local LLMs like Ollama discussion motivated the local-llm module.
  • Azure model selection (Issue #2582, 11 comments) — users report that setting AZURE_API_KEY and AZURE_BASE_URL does not activate Azure models in the model picker, so providers must be created explicitly per agent. This pattern is consistent with how LLMConfig.model_endpoint is bound to a specific Provider instance.

Common Failure Modes

  1. Trailing slash / missing /v1 — vLLM and SGLang append /v1 defensively, but a misconfigured base_url causes openai_get_model_list_async to 404. Source: letta/schemas/providers/vllm.py.
  2. Ollama capabilities filter — filtering on missing fields silently drops models. Source: letta/schemas/providers/ollama.py avoids filtering and infers support from model_info.
  3. Parallel tool calls — when parallel_tool_calls=False, some providers ignore the flag, so the agent truncates client-side. Source: letta/agents/letta_agent_v3.py logs a warning and reduces to the first tool call.
  4. Deprecated API keys — Cerebras marks api_key as deprecated; callers must supply credentials via the encrypted path. Source: letta/schemas/providers/cerebras.py.
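
The client-side truncation mentioned in item 3 can be sketched as follows. This is an illustration of the idea only: the `enforce_single_tool_call` helper, the logger name, and the dict-based tool-call shape are hypothetical and not taken from letta_agent_v3.py.

```python
import logging

logger = logging.getLogger("letta_sketch")  # hypothetical logger name


def enforce_single_tool_call(tool_calls: list[dict], parallel_tool_calls: bool) -> list[dict]:
    """Keep only the first tool call when parallel calls are disabled."""
    if not parallel_tool_calls and len(tool_calls) > 1:
        logger.warning(
            "Provider returned %d tool calls with parallel calls disabled; keeping the first.",
            len(tool_calls),
        )
        return tool_calls[:1]
    return tool_calls


calls = [{"name": "search"}, {"name": "send_message"}]
```

Truncating on the client keeps tool-rule ordering intact even when a provider ignores the request flag.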

Source: https://github.com/letta-ai/letta / Human Manual

Tools, Sandboxes, MCP Servers & Extensibility

Related topics: Project Overview & System Architecture, Agent Loop, Memory Blocks & Multi-Agent Groups, LLM Provider Integration: Cloud APIs & Local Models

Overview

Letta exposes a layered extensibility model that lets agents reach outside the LLM: tools are callable functions attached to an agent, sandboxes provide a secure, isolated runtime for executing user-defined tool code, and MCP servers (plus other external transports) let third-party tool ecosystems plug into the same agent loop. The agent runtime (letta_agent.py, letta_agent_v2.py, letta_agent_v3.py) is responsible for calling these tools, enforcing approval and tool-rule policies, persisting the resulting messages, and streaming them back to clients.

This page describes how the four pieces fit together, based strictly on the source files in this repository.

Tool Taxonomy and Registration

The voice agent's _build_tool_schemas filters an AgentState's tools list by tool_type, which gives a canonical list of where a tool can come from. Source: letta/agents/voice_agent.py.

| ToolType | Origin | Typical use |
| --- | --- | --- |
| CUSTOM | User-defined function | Tool calls authored by the developer |
| LETTA_BUILTIN | Bundled with Letta | First-party capabilities (e.g., search_memory) |
| LETTA_FILES_CORE | Core file API | File-system style helpers |
| EXTERNAL_MCP | MCP server | Third-party tool ecosystem |
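
Filtering a tool list down to the four types above might look like this sketch. The enum string values, the extra LETTA_CORE member (included only to show an excluded type), and the `filter_voice_tools` helper are assumptions for illustration rather than the definitions used in the repository.

```python
from enum import Enum


class ToolType(str, Enum):
    CUSTOM = "custom"
    LETTA_BUILTIN = "letta_builtin"
    LETTA_FILES_CORE = "letta_files_core"
    EXTERNAL_MCP = "external_mcp"
    LETTA_CORE = "letta_core"  # assumed extra type, shown only as an excluded case


VOICE_SAFE_TYPES = {
    ToolType.CUSTOM,
    ToolType.LETTA_BUILTIN,
    ToolType.LETTA_FILES_CORE,
    ToolType.EXTERNAL_MCP,
}


def filter_voice_tools(tools: list[dict]) -> list[dict]:
    """Keep only tools whose tool_type is in the voice-safe subset."""
    return [t for t in tools if t["tool_type"] in VOICE_SAFE_TYPES]


tools = [
    {"name": "lookup_order", "tool_type": ToolType.CUSTOM},
    {"name": "send_message", "tool_type": ToolType.LETTA_CORE},
    {"name": "github_search", "tool_type": ToolType.EXTERNAL_MCP},
]
```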

voice_agent.py adds a *virtual* tool, search_memory, by wrapping a function schema with add_pre_execution_message and enable_strict_mode. The injected pre_exec_msg is the filler phrase the agent should say aloud ("Let me double-check my notes—one moment, please.") while the memory search runs in the background. Source: letta/agents/voice_agent.py.

Sandbox Runtime (Modal + TypeScript Server)

Custom tool code must run somewhere safe. Letta ships a skeleton TypeScript server that executes user-defined functions inside a Modal container. The transport is a Unix domain socket, and the wire format is JSON (the v0.16.8 release specifically fixed a security issue by switching from pickle to JSON for sandbox→server tool-result transport). Source: sandbox/resources/server/README.md, sandbox/resources/server/package.json.

The skeleton has three files:

  • server.ts — a Node process listening on a Unix socket.
  • entrypoint.ts — deserializes a JSON-encoded input string into the user-defined function's arguments.
  • user-function.ts — fully defined by the end user.

Build and run:

```shell
npm install
npm run build
npm run start
```

Source: sandbox/resources/server/README.md.
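
To illustrate the JSON-over-socket transport described in this section, the sketch below sends a JSON request across a connected socket pair and reads a JSON reply. It is not the Letta host or server code: the newline framing, the payload fields, and the helper names are assumptions, and socket.socketpair() stands in for the two ends of the Unix socket.

```python
import json
import socket


def send_json(sock: socket.socket, payload: dict) -> None:
    # One JSON document per line; the newline framing is an assumption.
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")


def recv_json(sock: socket.socket) -> dict:
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before a full message arrived")
        buf += chunk
    return json.loads(buf.decode("utf-8"))


def roundtrip_demo() -> dict:
    # socketpair() stands in for the host and server ends of the Unix socket.
    host, server = socket.socketpair()
    try:
        send_json(host, {"args": {"a": 2, "b": 3}})
        request = recv_json(server)
        # Stand-in for the user-defined function: add the two arguments.
        result = request["args"]["a"] + request["args"]["b"]
        send_json(server, {"result": result})
        return recv_json(host)
    finally:
        host.close()
        server.close()
```

Unlike pickle, decoding JSON cannot execute code on the receiving side, which is the property the transport change relies on.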

MCP and External Tool Execution

Beyond sandboxes, Letta supports bring-your-own providers and tool transports. EXTERNAL_MCP is a first-class ToolType in _build_tool_schemas, meaning MCP-served tools flow through the same schema-generation and dispatch path as CUSTOM tools. Source: letta/agents/voice_agent.py.

The provider layer (letta/schemas/providers/) is similarly pluggable. For example, OllamaProvider derives both the raw and OpenAI-compatible base URLs from a single base_url field, and list_llm_models_async calls /api/tags rather than a /v1/models endpoint. This reflects the fact that Ollama is not a drop-in OpenAI replacement, a recurring community concern (see issue #18 in the community context). Source: letta/schemas/providers/ollama.py. SGLangProvider and VLLMProvider follow the OpenAI-compatible pattern and append /v1 to the base URL automatically. Source: letta/schemas/providers/sglang.py, letta/schemas/providers/vllm.py.

Tool Execution Lifecycle: Approval, Rule Validation, and Streaming

Every tool call passes through the same lifecycle regardless of whether it originated from a CUSTOM function, a sandbox, or an MCP server:

  1. Schema construction. _build_tool_schemas produces OpenAI-function-call schemas. Strict JSON-schema output is enabled via enable_strict_mode when the agent's llm_config.strict flag is set. Source: letta/agents/voice_agent.py.
  2. Parallel-tool gating. v3's letta_agent_v3.py toggles disable_parallel_tool_use for Anthropic/Bedrock and parallel_tool_calls for OpenAI based on whether the agent declares tool_rules (excluding requires_approval). Source: letta/agents/letta_agent_v3.py.
  3. Rule violation handling. If a model emits a tool name outside the allowed set, _build_rule_violation_result produces a ToolConstraintError whose func_return includes a hint generated by ToolRulesSolver.guess_rule_violation. Source: letta/agents/helpers.py.
  4. Approval handling. When requires_approval rules are present, v2 emits an approval request message. On the next turn, _maybe_get_approval_messages detects the back-to-back approval request and approval response messages, and the agent loop records is_approval / is_denial on the resulting Message and threads denial_reason into telemetry. Source: letta/agents/letta_agent_v2.py, letta/agents/helpers.py.
  5. Persistence. Persisted messages are produced by create_letta_messages_from_llm_response (v2) or the equivalent v1 path in letta_agent.py, with step_id and run_id stamped on each message. Source: letta/agents/letta_agent.py, letta/agents/letta_agent_v2.py.
  6. Streaming. The Python streaming client parses SSE chunks, dispatching on tool_call, tool_return, step_count (a LettaUsageStatistics), and OpenAI-style ChatCompletionChunk. On SSE errors whose message mentions application/json, it falls back to a POST and logs the JSON error. Source: letta/client/streaming.py.

```mermaid
sequenceDiagram
    participant LLM
    participant Agent as Agent Loop
    participant Rules as ToolRulesSolver
    participant Sandbox
    participant MCP
    LLM->>Agent: tool_call(name, args)
    Agent->>Rules: validate(name)
    alt allowed
        Agent->>Sandbox: dispatch (CUSTOM)
        Agent->>MCP: dispatch (EXTERNAL_MCP)
        Sandbox-->>Agent: ToolExecutionResult
        MCP-->>Agent: ToolExecutionResult
    else requires_approval
        Agent-->>LLM: approval_request message
        LLM->>Agent: approval response
    else rule violated
        Agent->>Rules: guess_rule_violation
        Agent-->>LLM: ToolConstraintError + hint
    end
    Agent->>Agent: persist with step_id / run_id
    Agent-->>Client: SSE tool_return / step_count
```
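
The approval branch of the lifecycle can be sketched as a check on the last two messages. This is a simplified illustration, not the helper in letta/agents/helpers.py: the `Message` dataclass, the "approval" role label, the `approve` and `denial_reason` fields, and both function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    role: str
    approve: Optional[bool] = None       # set on the response message only
    denial_reason: Optional[str] = None


def maybe_get_approval_pair(messages: list[Message]) -> Optional[tuple[Message, Message]]:
    """Return (request, response) if the last two messages form an approval pair."""
    if len(messages) < 2:
        return None
    request, response = messages[-2], messages[-1]
    if request.role == "approval" and response.role == "approval":
        return request, response
    return None


def classify_response(response: Message) -> dict:
    """Derive the flags that would be stamped on the persisted message."""
    return {
        "is_approval": response.approve is True,
        "is_denial": response.approve is False,
        "denial_reason": response.denial_reason,
    }


history = [
    Message(role="user"),
    Message(role="approval"),
    Message(role="approval", approve=False, denial_reason="wrong recipient"),
]
```

Checking only the tail of the history keeps the detection cheap and avoids matching stale approval exchanges from earlier turns.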

Common Failure Modes

  • Sandbox transport regression: the v0.16.8 release explicitly fixed a vulnerability by replacing pickle with JSON between the sandbox and server. Operators upgrading from older versions should re-verify that no legacy tool result payloads rely on Python-specific serialization. Source: community release notes (v0.16.8).
  • Provider misconfiguration: community reports (issue #2582) describe Azure models not appearing in the agent's model dropdown after env variables are set, only working when passed at agent creation. Provider configuration must be applied through the provider schema, not just environment variables. Source: letta/schemas/providers/ollama.py (illustrative of the provider-config pattern).
  • Ollama capability detection: older Ollama versions do not expose a capabilities field on /api/show, so OllamaProvider.list_llm_models_async deliberately avoids filtering on capabilities and instead infers support from model_info. Deployments on stale Ollama builds should not assume capability-based filtering. Source: letta/schemas/providers/ollama.py.
  • Bedrock model naming: BedrockProvider.extract_anthropic_model_name strips the inference-profile prefix so a Bedrock model can be used with the same checks as native Anthropic. Misconfigured inference_profile_ids will fail this extraction. Source: letta/schemas/providers/bedrock.py.
  • SSE parse errors: the streaming client raises on unknown chunk shapes, so downstream consumers must handle ValueError from the iterator. Source: letta/client/streaming.py.

Source: https://github.com/letta-ai/letta / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.

1. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/letta-ai/letta

2. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta

3. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/letta-ai/letta

4. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/letta-ai/letta

5. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta

6. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/letta-ai/letta

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using letta with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence