Doramagic Project Pack · Human Manual
pydantic-ai
AI Agent Framework, the Pydantic way
Overview & Core Agent System
Related topics: Models, Providers & Structured Outputs, Tools, Toolsets, MCP & Durable Execution
Pydantic AI is a Python agent framework designed to help developers build production-grade applications and workflows with Generative AI. Its core abstraction is the Agent, a model-agnostic, type-safe object that coordinates model calls, tool execution, dependency injection, structured outputs, and observability. The core agent system lives inside the pydantic_ai package (shipped as pydantic-ai-slim for minimal installs) and is extended by companion libraries such as pydantic-evals and pydantic-graph. Source: README.md, pydantic_ai_slim/README.md.
Purpose and Scope
The core agent system provides the foundational building blocks for constructing AI agents:
- A generic, parameterized Agent[DependenciesT, OutputT] that lets static type checkers enforce dependency and output types as the code is written, moving entire classes of errors from runtime to type-check time. Source: pydantic_ai_slim/pydantic_ai/agent/abstract.py.
- A composable capabilities layer (Thinking, WebSearch, MCP integrations, etc.) that bundles tools, hooks, instructions, and model settings into reusable units. Source: pydantic_ai_slim/pydantic_ai/capabilities/capability.py.
- A wrapper hierarchy (AbstractAgent → concrete Agent → WrapperAgent) that allows agents to delegate to other agents, enabling the sub-agent delegation and handoff patterns frequently requested by the community. Source: pydantic_ai_slim/pydantic_ai/agent/wrapper.py.
- A YAML/JSON agent spec format (AgentSpec) for defining agents declaratively, with no code required. Source: pydantic_ai_slim/pydantic_ai/agent/spec.py.
The README positions Pydantic AI as "the FastAPI feeling for GenAI," built on top of Pydantic Validation and modern Python type hints. Source: README.md.
Architecture and Component Layout
The core agent package is organized into a small set of focused modules under pydantic_ai_slim/pydantic_ai/agent/. The __init__ module re-exports the public Agent class and supporting types, while internal concerns are split across dedicated files. The diagram below shows the high-level relationship between the principal components:
graph TD
A[AbstractAgent] --> B[Agent]
B --> C[WrapperAgent]
D[AgentSpec YAML/JSON] --> B
E[Capabilities] --> B
F[Tools / RunContext] --> B
G[Model Providers] --> B
B --> H[Structured Output Pydantic]
B --> I[Logfire / OTel Observability]
C --> B
C --> C
Key architectural notes:
- AbstractAgent defines the contract (run, run_sync, iter, hooks) implemented by all agent variants. Source: pydantic_ai_slim/pydantic_ai/agent/abstract.py.
- WrapperAgent composes one or more inner agents, supporting the sub-agent delegation pattern discussed in community issue #1978 ("Handoffs / sub-agent delegation"). Source: pydantic_ai_slim/pydantic_ai/agent/wrapper.py.
- Capabilities are a first-class composition primitive introduced and stabilized across the v1.10x releases, allowing tools, hooks, and model settings to be packaged together. Source: pydantic_ai_slim/pydantic_ai/capabilities/capability.py.
- AgentSpec decouples the agent definition from Python source, making YAML/JSON-defined agents loadable at runtime. Source: pydantic_ai_slim/pydantic_ai/agent/spec.py.
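The contract-plus-wrapper arrangement can be pictured with a small standard-library sketch. Every class name here (AgentContract, EchoAgent, Wrapper, Router) is hypothetical and chosen for illustration; this is not the library's AbstractAgent or WrapperAgent code, only the general delegation pattern those notes describe.

```python
from abc import ABC, abstractmethod


class AgentContract(ABC):
    """Hypothetical stand-in for a shared agent contract."""

    @abstractmethod
    def run_sync(self, prompt: str) -> str: ...


class EchoAgent(AgentContract):
    """A concrete agent; a real one would call a model here."""

    def __init__(self, name: str):
        self.name = name

    def run_sync(self, prompt: str) -> str:
        return f'{self.name} handled: {prompt}'


class Wrapper(AgentContract):
    """Delegates every call to the wrapped agent; subclasses override selectively."""

    def __init__(self, wrapped: AgentContract):
        self.wrapped = wrapped

    def run_sync(self, prompt: str) -> str:
        return self.wrapped.run_sync(prompt)


class Router(Wrapper):
    """Hands invoice questions to a specialist, everything else to the wrapped agent."""

    def __init__(self, wrapped: AgentContract, billing: AgentContract):
        super().__init__(wrapped)
        self.billing = billing

    def run_sync(self, prompt: str) -> str:
        if 'invoice' in prompt.lower():
            return self.billing.run_sync(prompt)
        return super().run_sync(prompt)


router = Router(EchoAgent('general'), billing=EchoAgent('billing'))
print(router.run_sync('Where is my invoice?'))
print(router.run_sync('Hello'))
```

Because a wrapper satisfies the same contract as the agent it wraps, wrappers can be stacked, which is what the C --> C self-edge in the diagram suggests.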
Core Usage Patterns
The most common patterns exposed by the core agent system are summarized below.
| Pattern | Primary API | Notes |
|---|---|---|
| Hello-world agent | Agent(model).run_sync(prompt) | Minimal entry point, no dependencies or output type. Source: README.md. |
| Typed output | Agent(..., output_type=SupportOutput) | Uses Pydantic models for validated structured output; re-prompts the model on validation failure. Source: README.md. |
| Dependency injection | deps_type=SupportDependencies and @agent.instructions/@agent.tool decorators | RunContext[D] carries dependencies to instructions and tools. Source: README.md. |
| Capabilities | capabilities=[Thinking(), WebSearch()] | Bundles tools, hooks, and settings; supports on-demand/deferred loading as added in v1.105.0. Source: pydantic_ai_slim/pydantic_ai/capabilities/__init__.py. |
| Declarative agent | YAML/JSON loaded by AgentSpec | Useful for tooling and non-code agent definitions. Source: pydantic_ai_slim/pydantic_ai/agent/spec.py. |
| CLI | clai -m openai:gpt-5.2 "prompt" and clai web | Command-line and web chat interfaces for one-shot or interactive runs. Source: clai/README.md. |
Extensibility and Community Considerations
The core agent system is deliberately extensible. New model providers, toolsets (MCP, OpenAPI), graphs, and durable execution layers all plug into the same Agent interface, which is one reason the project advertises an "extensible by design" philosophy. Source: README.md. Companion packages extend the system in orthogonal directions:
- pydantic-evals provides dataset-based evaluation of stochastic functions (including but not limited to Pydantic AI agents) and integrates with Pydantic Logfire for trace-aware evaluators. Source: pydantic_evals/README.md.
- pydantic-graph is a type-hint-driven graph and finite state machine library that uses the same dependency-injection and validation conventions as the agent system. Source: pydantic_graph/README.md.
- A growing catalog of runnable examples demonstrates banking support agents, RAG pipelines, and other real-world patterns. Source: examples/README.md.
Several open community issues track limitations that interact with the core agent system and are worth knowing about:
- Issue #5760 reports that the model_request_parameters OpenTelemetry span attribute serializes the entire ModelRequestParameters dataclass, which can bloat traces when large fields are present. Source: Issue #5760.
- Issue #5764 highlights silent data loss in Vercel AI/AG-UI adapters, where FileUrl.vendor_metadata and BinaryContent.vendor_metadata are dropped on round-trip. These fields are documented as load-bearing for several model providers, so round-trip behavior is a core-system concern, not just an adapter concern. Source: Issue #5764.
- Issue #5095 requests support for Vertex AI Priority PayGo under google_service_tier, building on the earlier Flex PayGo pattern in v1.x. Source: Issue #5095.
- Issue #5730 proposes a policy/audit gating layer on top of the existing deferred-tools mechanism, indicating ongoing community interest in human-in-the-loop and approval workflows. Source: Issue #5730.
See Also
- Models overview and provider-specific adapters
- Tools, toolsets, and the Model Context Protocol (MCP) integration
- Capabilities (Thinking, WebSearch, deferred loading) introduced in v1.105.0
- Durable execution and human-in-the-loop tool approval
- pydantic-evals and pydantic-graph companion packages
Source: https://github.com/pydantic/pydantic-ai / Human Manual
Models, Providers & Structured Outputs
Related topics: Overview & Core Agent System, Tools, Toolsets, MCP & Durable Execution
1. Overview and Scope
Pydantic AI is a model-agnostic agent framework built on top of Pydantic Validation. Its "Models, Providers & Structured Outputs" layer is the boundary between user code and the dozens of LLM vendors the framework supports. The same Agent class is reused across providers; the provider-specific adapter translates between Pydantic AI's internal message/tool schema and the vendor's wire format. Source: README.md.
The core slim package keeps required dependencies minimal; the model adapters live in optional extras (e.g. pydantic-ai[openai], pydantic-ai[anthropic]). Source: pydantic_ai_slim/README.md.
Structured outputs are not a separate subsystem — they are produced by the same model layer, but the returned payload is validated against a user-supplied BaseModel and re-prompted on failure. The README explicitly demonstrates this with the SupportOutput example, where output_type=SupportOutput guarantees a typed result. Source: README.md.
2. Provider Surface
2.1 First-Party and Hosted Providers
Pydantic AI ships adapters for the major hosted and self-hosted vendors. The README's "Why use Pydantic AI" section enumerates them: OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, Perplexity, plus hosted platforms (Azure AI Foundry, Amazon Bedrock, Google Cloud, Ollama, LiteLLM, Groq, OpenRouter, Together AI, Fireworks AI, Cerebras, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud, Alibaba Cloud, SambaNova). Source: README.md.
If a provider is missing, a custom model can be implemented against the Model protocol. Source: README.md.
2.2 Provider Selection at Runtime
Models are addressed with the <provider>:<model> string format. The CLI (clai) accepts -m/--model in the same shape, defaulting to openai-chat:gpt-5. Source: clai/README.md. The same syntax is accepted in Agent(...) constructors and in YAML/JSON agent specs, so a deployment can switch providers without code changes. Source: README.md.
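The shape of the identifier can be illustrated with a few lines of standard-library Python. The helper name split_model_id is hypothetical and not part of Pydantic AI; the library's own resolution logic may differ, so treat this only as a sketch of the string convention.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split '<provider>:<model>' on the first colon (hypothetical helper)."""
    provider, sep, model = model_id.partition(':')
    if not sep or not provider or not model:
        raise ValueError(f'expected "<provider>:<model>", got {model_id!r}')
    return provider, model


print(split_model_id('openai:gpt-5.2'))
```

Splitting on the first colon only is a deliberate choice in this sketch: a model name that itself contains a colon stays intact on the right-hand side.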
2.3 Architecture at a Glance
flowchart LR
User[User Code / Agent] --> Core[pydantic-ai Core<br/>messages, tools, output schema]
Core --> Adapter[Provider Adapter]
Adapter --> OpenAI[OpenAI / Anthropic / Gemini / ...]
Adapter --> Bedrock[Bedrock / Vertex / Azure]
Adapter --> Custom[Custom Model Implementation]
Core --> Validator[Pydantic Validation<br/>structured output retry]
Source: README.md, pydantic_ai_slim/README.md.
3. Structured Outputs and the Output Schema
The structured-output contract is a central feature of Pydantic AI for users who need type-safe results. The README's bank-support example defines:
from pydantic import BaseModel, Field
from pydantic_ai import Agent

# SupportDependencies is defined elsewhere in the README example.
class SupportOutput(BaseModel):
    support_advice: str = Field(description='Advice returned to the customer')
    block_card: bool = Field(description="Whether to block the customer's card")
    risk: int = Field(description='Risk level of query', ge=0, le=10)

support_agent = Agent(
    'openai:gpt-5.2',
    deps_type=SupportDependencies,
    output_type=SupportOutput,
    ...
)
Validation failures are passed back to the LLM as a retry prompt. Source: README.md. Streamed variants exist (agent.run_stream) so consumers can receive partial validated output continuously. Source: README.md.
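The validate-then-retry behaviour can be sketched without the library. The code below is a standard-library illustration under stated assumptions: run_with_retries, validate, and fake_model are hypothetical names, a dataclass stands in for the Pydantic model, and the retry wording is invented. It is not how Pydantic AI implements retries internally.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class SupportOutput:
    support_advice: str
    block_card: bool
    risk: int


def validate(raw: str) -> SupportOutput:
    """Parse JSON and enforce the same 0..10 bound on `risk` as the example."""
    out = SupportOutput(**json.loads(raw))
    if not isinstance(out.risk, int) or not 0 <= out.risk <= 10:
        raise ValueError('risk must be an integer between 0 and 10')
    return out


def run_with_retries(model: Callable[[str], str], prompt: str, retries: int = 2) -> SupportOutput:
    """Call `model`; on a validation error, feed the error back as a retry prompt."""
    message = prompt
    for _ in range(retries + 1):
        raw = model(message)
        try:
            return validate(raw)
        except (ValueError, TypeError) as exc:
            message = f'{prompt}\nPrevious answer was invalid: {exc}. Please fix it.'
    raise RuntimeError('model did not produce valid output')


# Stand-in "model": the first reply violates the bound, the second is valid.
replies = iter([
    '{"support_advice": "Block it", "block_card": true, "risk": 42}',
    '{"support_advice": "Block it", "block_card": true, "risk": 8}',
])
seen_prompts = []


def fake_model(message: str) -> str:
    seen_prompts.append(message)
    return next(replies)


result = run_with_retries(fake_model, 'I lost my card')
print(result)
```

The second prompt seen by the stand-in model carries the validation error text, which is the essential idea: the model is told what was wrong instead of the caller receiving an unvalidated payload.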
A long-running community thread (issue #582, "Structured outputs as an alternative to Tool Calling") discusses inconsistencies between vendors for the same Pydantic schema. The maintainer response points at toolsets and the capabilities layer as the forward path, but the immediate workaround is provider-specific option tuning (output_type + output_tool / output_schema). Source: community context, issue #582.
4. Capabilities, Tools, and Observability Hooks
In v1.105 / v2-beta, the model layer is wrapped by a composable capabilities system. Each capability bundles instructions, tools, model settings, and hooks. Built-ins include Thinking, WebSearch, and MCP-backed providers. The README's "Hello World" example demonstrates the on-demand capability flow:
from pydantic_ai import Agent
from pydantic_ai.capabilities import Thinking, WebSearch

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    instructions='Be concise, reply with one sentence.',
    capabilities=[Thinking(), WebSearch()],
)
Source: README.md. This was reinforced in v1.105.0, which added "On-demand (deferred loading) capabilities, including instructions, tools, model settings, and hooks" via PR #5230. Source: community context, release v1.105.0.
4.1 Known Sharp Edges
| Issue | Symptom | Workaround / Status |
|---|---|---|
| Vercel AI / AG-UI adapters drop FileUrl.vendor_metadata and BinaryContent.vendor_metadata on round-trip | Provider-specific metadata is silently lost | Issue #5764 |
| model_request_parameters OTel attribute serializes the whole dataclass on every model-invoke span | Inflated traces, slow serialisation | Issue #5760 |
| Vertex AI "Priority PayGo" not yet a value of google_service_tier | Header X-Vertex-AI-LLM-Shared-Request-Type: priority is dropped | Issue #5095 |
| xAI adapter missing newer SDK options (conversation_id, seed, ...) | Provider options are silently dropped | Issue #5662 |
| OpenRouter / xAI / Bedrock hybrid routes drop thinking=False | Forwarded as True even when caller disables | Fixed in v1.104.0 (release notes) |
Source: community context, linked issues and release notes.
5. Related Subsystems
The model layer does not stand alone — three sibling packages in the monorepo interact with it:
- Pydantic Evals: measures accuracy/performance of the agent; designed to be used with any stochastic function but integrates directly with Pydantic AI agents and emits OTel traces that can be viewed in Logfire. Source: pydantic_evals/README.md.
- Pydantic Graph: a standalone graph/finite-state-machine library usable independently of the model layer, but reachable from agents. Source: pydantic_graph/README.md.
- clai: a CLI front-end that picks a model with the same <provider>:<model> syntax and supports an interactive web chat UI. Source: clai/README.md.
End-user examples, including RAG, are collected under examples/README.md and rendered on the docs site via Cloudflare Workers static assets (see docs-site/README.md and the release aggregator in docs-site/src/index.ts).
Tools, Toolsets, MCP & Durable Execution
Related topics: Overview & Core Agent System, UI Adapters, Embeddings & Evaluation
Pydantic AI provides a layered mechanism for extending an Agent with capabilities that are external to the LLM itself: callable Python functions ("tools"), composable collections of tools ("toolsets"), the Model Context Protocol for cross-process tool access, and durable execution for long-running and resumable workflows. The project is designed so that tools and toolsets integrate naturally with Pydantic Validation, dependency injection, and observability, while remaining model-agnostic across providers such as OpenAI, Anthropic, Google Vertex, Bedrock, and others. Source: README.md
Tools
Purpose and Registration
A *tool* in Pydantic AI is a Python function that the LLM is allowed to invoke during a run. Tools are registered on an Agent via the @agent.tool decorator. Their function signature is introspected to build the JSON schema that is sent to the model, and arguments are validated through Pydantic before the function executes. Source: README.md
The simplest registration takes only the function:
@support_agent.tool
async def customer_balance(
    ctx: RunContext[SupportDependencies], include_pending: bool
) -> float:
    """Returns the customer's current account balance."""
    return await ctx.deps.db.customer_balance(
        id=ctx.deps.customer_id,
        include_pending=include_pending,
    )
The function's docstring is forwarded to the LLM as the tool description, and individual parameter descriptions are extracted from the docstring and merged into the tool schema sent to the model. Source: README.md
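To make the introspection step concrete, here is a standard-library sketch of turning a function signature and docstring into a JSON-schema-like description. The names tool_schema and JSON_TYPES are hypothetical, only four primitive types are handled, and the output shape is an assumption for illustration rather than the schema Pydantic AI actually emits.

```python
import inspect
from typing import get_type_hints

JSON_TYPES = {str: 'string', int: 'integer', float: 'number', bool: 'boolean'}


def tool_schema(func, skip_first: bool = False) -> dict:
    """Build a JSON-schema-like description of `func` from its signature."""
    hints = get_type_hints(func)
    params = list(inspect.signature(func).parameters.values())
    if skip_first:  # e.g. a context parameter that the model never supplies
        params = params[1:]
    properties = {p.name: {'type': JSON_TYPES[hints[p.name]]} for p in params}
    required = [p.name for p in params if p.default is inspect.Parameter.empty]
    return {
        'name': func.__name__,
        'description': inspect.getdoc(func) or '',
        'parameters': {'type': 'object', 'properties': properties, 'required': required},
    }


def customer_balance(ctx: object, include_pending: bool) -> float:
    """Returns the customer's current account balance."""
    return 0.0


schema = tool_schema(customer_balance, skip_first=True)
print(schema)
```

The context parameter is skipped because it is supplied by the framework at call time, so only include_pending appears in the description offered to the model.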
Dependency Injection via `RunContext`
Tools and dynamic instructions can request runtime context by typing their first parameter as RunContext[DepT], where DepT is the agent's declared deps_type. The agent carries the dependency instance from agent.run_sync(..., deps=...) (or its async equivalent) into the tool, providing a type-safe way to thread database connections, user identity, configuration, or any other state into tool implementations. Static type checkers validate that the declared deps_type matches the RunContext annotation. Source: README.md
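The typing relationship described above can be sketched with a generic dataclass. Context and run_tool are hypothetical stand-ins written for this illustration, not the library's RunContext or run machinery; the point is only how a type parameter ties the dependency object to the tool's first argument.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

DepsT = TypeVar('DepsT')


@dataclass
class Context(Generic[DepsT]):
    """Stand-in for a run context: carries the per-run dependency object."""
    deps: DepsT


@dataclass
class SupportDependencies:
    customer_id: int
    balances: dict[int, float]


def customer_balance(ctx: Context[SupportDependencies]) -> float:
    # A type checker knows ctx.deps is SupportDependencies here.
    return ctx.deps.balances[ctx.deps.customer_id]


def run_tool(tool: Callable[[Context[DepsT]], float], deps: DepsT) -> float:
    """Wrap `deps` in a context and pass it as the tool's first argument."""
    return tool(Context(deps=deps))


balance = run_tool(customer_balance, SupportDependencies(customer_id=7, balances={7: 123.45}))
print(balance)
```

Passing a dependency object of the wrong type to run_tool would be flagged by a static type checker, because DepsT must match in both arguments.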
Deferred (Human-in-the-Loop) Tools
Pydantic AI supports flagging tool calls so that they require explicit approval before they execute. This "deferred tools" mechanism is useful for human-in-the-loop scenarios and for gated, policy-driven tool authorization, as discussed by the community in the Agent_Sudo proposal (issue #5730). When a tool call is deferred, the agent pauses until the call is approved or rejected, at which point execution resumes. Source: README.md
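The pause, approve, and resume cycle can be sketched in plain Python. DeferredRunner and PendingCall are hypothetical names for this illustration, and the in-memory queue is an assumption; the library's deferred-tools API is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingCall:
    tool_name: str
    args: dict


@dataclass
class DeferredRunner:
    tools: dict[str, Callable[..., object]]
    needs_approval: set[str]
    pending: list[PendingCall] = field(default_factory=list)

    def request(self, tool_name: str, **args):
        """Run the tool now, or park the call if it requires approval."""
        if tool_name in self.needs_approval:
            self.pending.append(PendingCall(tool_name, args))
            return None  # nothing executed yet; the call waits for a decision
        return self.tools[tool_name](**args)

    def resolve(self, approved: bool):
        """Resume the oldest parked call: execute it if approved, else reject."""
        call = self.pending.pop(0)
        if not approved:
            return f'{call.tool_name} was rejected'
        return self.tools[call.tool_name](**call.args)


runner = DeferredRunner(
    tools={'block_card': lambda card_id: f'card {card_id} blocked'},
    needs_approval={'block_card'},
)
first = runner.request('block_card', card_id='1234')   # parked, not executed
second = runner.resolve(approved=True)                 # executed after approval
print(first, second)
```

Keeping the parked call as data (name plus arguments) is what makes it possible to show the request to a human or a policy engine before anything runs.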
`McpServer` Prompt Access
The v1.103.0 release added list_prompts and get_prompt functionality to McpServer, allowing agents to discover and pull reusable prompt templates exposed by an MCP server and treat them like any other instruction. Source: v1.103.0 release notes
Toolsets
Why Toolsets?
A single agent often needs to combine tools from multiple sources: locally defined functions, remote services via MCP, third-party APIs, and per-run overrides. A "toolset" is a reusable abstraction that groups tools together behind a uniform interface, so an agent can be configured with a list of toolsets rather than a flat list of functions. The "toolsets" concept was first described publicly in the Toolsets/OpenAPI/MCP planning issue (#110) as the unifying layer for tool sources. Source: Issue #110
Core Toolset Types
The toolsets package exposes several building blocks. The abstract base class, AbstractToolset, defines the contract that all toolsets implement, including how tools are listed and how tool calls are dispatched at run time. Source: pydantic_ai_slim/pydantic_ai/toolsets/abstract.py
FunctionToolset is the standard way to register a set of plain Python functions as tools; it backs the @agent.tool decorator and provides the schema-generation, validation, and dispatch logic shared with the legacy tools.py API. Source: pydantic_ai_slim/pydantic_ai/toolsets/function.py
CombinedToolset composes multiple toolsets into one logical set, so an agent configured with a CombinedToolset sees the union of the tools from each child toolset. This is the recommended way to mix local functions, MCP servers, and other sources in a single agent. Source: pydantic_ai_slim/pydantic_ai/toolsets/combined.py
FilteredToolset wraps another toolset and exposes only a subset of its tools, optionally transforming names or descriptions. It is useful when an agent has a large general-purpose toolset but should only see a curated subset for a particular task, or when exposing the same toolset to multiple agents with different permissions. Source: pydantic_ai_slim/pydantic_ai/toolsets/filtered.py
The package's public API is re-exported from the toolsets module's __init__.py, so users import concrete toolset classes (and any future ones) from a single location. Source: pydantic_ai_slim/pydantic_ai/toolsets/__init__.py
The legacy tools.py module continues to host the lower-level Tool, ToolDefinition, and argument validation primitives, which the FunctionToolset builds on top of. Source: pydantic_ai_slim/pydantic_ai/tools.py
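The composition idea behind these building blocks can be shown with a short standard-library sketch. The classes FunctionTools, Combined, and Filtered and their list_tools/call methods are hypothetical simplifications written for this illustration; they do not mirror the real AbstractToolset interface or its method names.

```python
from typing import Callable, Protocol


class Toolset(Protocol):
    def list_tools(self) -> list[str]: ...
    def call(self, name: str, **args) -> object: ...


class FunctionTools:
    """Holds plain Python functions keyed by name."""

    def __init__(self, *funcs: Callable[..., object]):
        self._funcs = {f.__name__: f for f in funcs}

    def list_tools(self) -> list[str]:
        return list(self._funcs)

    def call(self, name: str, **args) -> object:
        return self._funcs[name](**args)


class Combined:
    """Exposes the union of several toolsets and dispatches to the owner."""

    def __init__(self, *toolsets: Toolset):
        self._toolsets = toolsets

    def list_tools(self) -> list[str]:
        return [name for ts in self._toolsets for name in ts.list_tools()]

    def call(self, name: str, **args) -> object:
        for ts in self._toolsets:
            if name in ts.list_tools():
                return ts.call(name, **args)
        raise KeyError(name)


class Filtered:
    """Wraps a toolset and exposes only the tools accepted by `keep`."""

    def __init__(self, inner: Toolset, keep: Callable[[str], bool]):
        self._inner, self._keep = inner, keep

    def list_tools(self) -> list[str]:
        return [n for n in self._inner.list_tools() if self._keep(n)]

    def call(self, name: str, **args) -> object:
        if not self._keep(name):
            raise KeyError(name)
        return self._inner.call(name, **args)


def add(a: int, b: int) -> int:
    return a + b


def delete_account(user: str) -> str:
    return f'deleted {user}'


def now() -> str:
    return 'noon'


everything = Combined(FunctionTools(add, delete_account), FunctionTools(now))
safe = Filtered(everything, keep=lambda name: not name.startswith('delete'))
print(everything.list_tools(), safe.list_tools(), safe.call('add', a=2, b=3))
```

Because each wrapper exposes the same two methods as the thing it wraps, filtering a combined set or combining filtered sets both work without special cases.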
Composable Capabilities
The v1.105.0 release introduced on-demand ("deferred loading") capabilities that bundle tools, instructions, model settings, and hooks into reusable units. Capabilities such as Thinking and WebSearch are passed via the capabilities=[...] argument of Agent, allowing an agent to be assembled from composable building blocks. Source: v1.105.0 release notes and README.md
MCP (Model Context Protocol)
Pydantic AI integrates the Model Context Protocol as a first-class toolset implementation. An MCP server — local subprocess or remote endpoint — is wrapped by an McpServer toolset and added to the agent just like any other toolset, typically inside a CombinedToolset. Once attached, the agent can list and call any tool that the MCP server advertises, and from v1.103.0 onward it can also list and fetch server-defined prompt templates. Source: README.md and v1.103.0 release notes
MCP tools share the same RunContext and validation pipeline as locally defined tools, so a tool call that fails validation is reported back to the LLM as an error, allowing the model to retry. Source: README.md
Durable Execution
For workflows that must survive transient API failures, application restarts, long pauses (for example while waiting on human-in-the-loop approval), or asynchronous external events, Pydantic AI provides a durable execution layer. Durable agents preserve their progress across failures and restarts, and support long-running and human-in-the-loop workflows with production-grade reliability. The mechanism relies on serializable agent state, persisted tool results, and resumption from the last completed step rather than re-running the entire conversation. Source: README.md
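Resumption from the last completed step can be sketched with a JSON checkpoint file. This is a standard-library illustration under stated assumptions: run_durably, the step names, and the checkpoint format are all hypothetical, and real durable execution backends handle far more (concurrency, serialization of rich state, timers).

```python
import json
import os
import tempfile


def run_durably(steps, checkpoint_path: str) -> dict:
    """Run named steps in order, persisting each result so a rerun skips finished steps."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for name, step in steps:
        if name in done:
            continue  # result was persisted by an earlier attempt
        done[name] = step()
        with open(checkpoint_path, 'w') as f:
            json.dump(done, f)
    return done


calls = []
fail_once = {'armed': True}


def fetch():
    calls.append('fetch')
    return 'data'


def summarize():
    calls.append('summarize')
    if fail_once['armed']:
        fail_once['armed'] = False
        raise ConnectionError('transient API failure')
    return 'summary'


steps = [('fetch', fetch), ('summarize', summarize)]
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.json')

try:
    run_durably(steps, path)        # first attempt fails in the second step
except ConnectionError:
    pass
results = run_durably(steps, path)  # resumes; 'fetch' is not re-run
print(results, calls)
```

The second attempt reads the persisted result of the first step and only repeats the step that failed, which is the property the paragraph above calls resumption from the last completed step.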
For complex flows whose control flow cannot be expressed as a single linear agent run, Pydantic AI also offers Pydantic Graph: a graph and finite state machine library whose nodes are typed Python callables, including Pydantic AI agents. Graphs are useful for stateful multi-step workflows, retries, and explicit handoffs between agents. Source: pydantic_graph/README.md
Architecture at a Glance
flowchart LR
A[Agent] --> TS[CombinedToolset]
TS --> FTS[FunctionToolset<br/>@agent.tool functions]
TS --> MCP[McpServer<br/>MCP / A2A / UI]
TS --> F[FilteredToolset<br/>curated subset]
FTS --> CTX["RunContext[DepsType]"]
MCP --> CTX
A --> DUR[Durable Execution<br/>resume on failure]
A --> OBS[Logfire / OTel<br/>observability]
Common Failure Modes and Considerations
- Round-trip data loss in UI adapters. The community reports (issue #5764) that FileUrl.vendor_metadata and BinaryContent.vendor_metadata are silently dropped by the Vercel AI and AG-UI adapters, even though these fields are documented as load-bearing for several model providers. Treat vendor metadata as adapter-dependent and verify it survives the round trip when using a UI stream. Source: Issue #5764
- Span attribute bloat. Issue #5760 reports that the model_request_parameters span attribute serializes the entire ModelRequestParameters dataclass on every model-invoke span, including fields that are not actually sent to the model. This can bloat traces and should be considered when sizing your observability backend. Source: Issue #5760
- Tool gating and audit. A community proposal (issue #5730) suggests introducing a policy/audit layer that wraps the deferred-tools mechanism to authorize tool calls externally (e.g., via a local permission gateway) and produce cryptographic audit logs. Source: Issue #5730
See Also
- README.md: top-level overview of features, providers, and examples
- pydantic_ai_slim/README.md: the slim package that ships the tools and toolsets modules
- pydantic_graph/README.md: graph and finite state machine library for complex workflows
- pydantic_evals/README.md: evaluating agent and tool behavior with datasets
- Release v1.105.0: on-demand (deferred loading) capabilities
- Release v1.103.0: list_prompts and get_prompt for McpServer
- Issue #110: original "Toolsets, OpenAPI and MCP" design discussion
UI Adapters, Embeddings & Evaluation
Related topics: Overview & Core Agent System, Models, Providers & Structured Outputs
Pydantic AI ships three independent but complementary surfaces that sit on top of the core agent runtime: a family of UI adapters that translate between Pydantic AI message streams and external UI protocols, an embeddings story that is largely surfaced through the model-provider layer, and a dedicated evaluation library (pydantic_evals) for systematically measuring non-deterministic agent behavior. This page maps each surface to the source files that implement it and highlights the community-reported failure modes the maintainers have been actively working on.
1. UI Adapters and Round-Trip Semantics
UI adapters expose Pydantic AI agents behind standardized streaming event protocols so that the same agent can drive a Vercel AI chat UI, an AG-UI frontend, an A2A endpoint, or a custom web client without rewriting the transport. The project advertises this as one of the headline capabilities of the framework, alongside MCP and durable execution. Source: [README.md:80-100].
flowchart LR
A[Pydantic AI Agent] -->|stream events| B(UI Adapter Base)
B --> C[Vercel AI Adapter]
B --> D[AG-UI Adapter]
B --> E[A2A Adapter]
C --> F[Vercel AI SDK / UIMessage]
D --> G[AG-UI Event Stream]
E --> H[Agent-to-Agent Transport]
The most active area of community concern in 2026 has been round-trip fidelity between the internal Pydantic AI message model and the external protocol payloads. Issue #5764 documents that FileUrl.vendor_metadata and BinaryContent.vendor_metadata are silently dropped on round-trip through the Vercel AI and AG-UI adapters, even though those fields are documented as load-bearing for several model providers. Release v1.103.0 (2026-06-02) explicitly added "Round-trip message timestamps through VercelAIAdapter's UIMessage.metadata", which is a step in the same direction but does not yet close the gap on vendor_metadata. Source: [README.md:120-160]. A dedicated GitHub Actions workflow called *Pydantic AI Round-Trip Sweep* exists to catch these regressions, and its failures are tracked in issues #5755 and #5685.
For a developer integrating an adapter, the practical guidance is: any provider-specific metadata that is not part of the standard UIMessage schema should be considered loss-prone until the round-trip sweep covers it, and should be re-attached server-side if the frontend needs it.
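One way to picture the server-side re-attachment is a small lookup keyed by part identifier. The sketch below uses plain dictionaries with hypothetical id, url, text, and vendor_metadata keys, and the helper names remember_metadata and reattach_metadata are invented for illustration; the adapters' real payload shapes are not reproduced here.

```python
def remember_metadata(parts: list[dict]) -> dict[str, dict]:
    """Record vendor_metadata per part id before messages leave the server."""
    return {p['id']: p['vendor_metadata'] for p in parts if p.get('vendor_metadata')}


def reattach_metadata(parts: list[dict], store: dict[str, dict]) -> list[dict]:
    """Restore vendor_metadata on parts that came back from the UI without it."""
    restored = []
    for p in parts:
        if 'vendor_metadata' not in p and p['id'] in store:
            p = {**p, 'vendor_metadata': store[p['id']]}
        restored.append(p)
    return restored


outgoing = [
    {'id': 'file-1', 'url': 'https://example.com/a.pdf', 'vendor_metadata': {'detail': 'high'}},
    {'id': 'text-1', 'text': 'hello'},
]
store = remember_metadata(outgoing)

# Simulate a lossy round trip: the adapter drops vendor_metadata.
returned = [{k: v for k, v in p.items() if k != 'vendor_metadata'} for p in outgoing]
fixed = reattach_metadata(returned, store)
print(fixed)
```

The approach assumes each part carries a stable identifier that survives the round trip; without one, the server has nothing to key the stored metadata on.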
2. Embeddings and the Vector-Search Gap
Pydantic AI does not yet expose a first-class embeddings API. The current recommendation is to call provider SDKs directly — for example, the in-tree RAG example previously used OpenAI's plain embeddings.create API to generate vectors. Issue #58 ("Vector search and embeddings API") has 17 comments and remains the canonical thread tracking demand for a unified surface that mirrors the model abstraction (Model → EmbeddingsModel), with consistent settings, retries, and observability hooks. As of the v1.x and v2 beta releases, no module under pydantic_ai.models is dedicated to embeddings; everything flows through ModelRequestParameters and provider-specific request shapes instead.
Because of this gap, embeddings calls show up in telemetry as regular model invocations, which surfaces the related issue #5760: the model_request_parameters span attribute serializes the *entire* ModelRequestParameters dataclass on every model-invoke span, including large fields that are not actually sent to the model. For RAG workloads that pass many tool schemas, this can balloon trace storage costs and obscure what the model really saw. The recommended mitigation today is to keep embeddings flows on a dedicated tracer scope and to avoid stuffing transient retrieval results into the agent's RunContext if they are not needed for downstream tool calls.
3. Evaluation with `pydantic_evals`
pydantic_evals is a separate PyPI package that lives in its own subdirectory and is intentionally not tied to Pydantic AI at runtime. The README is explicit: "this library only uses Pydantic AI for a small subset of generative functionality internally, and it is designed to be used with arbitrary 'stochastic function' implementations". Source: [pydantic_evals/README.md:10-18]. The core abstractions are Case (a single test input plus optional expected output), Dataset (a collection of cases), and Evaluator[T, U] (an async function that scores a case and returns a value, often normalized to a float in [0, 1]). Evaluators receive an EvaluatorContext that exposes the output, the expected output, the inputs, and the full case metadata, so path-aware evaluators can be written without monkey-patching.
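The relationship between cases, a task, and an evaluator can be sketched in the standard library. The Case dataclass, exact_match, and evaluate below are hypothetical simplifications written for this illustration; they are synchronous and do not reproduce the pydantic_evals classes or their signatures.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Case:
    name: str
    inputs: str
    expected_output: Optional[str] = None


def exact_match(output: str, case: Case) -> float:
    """Score 1.0 when the output equals the expected output, else 0.0."""
    return 1.0 if output == case.expected_output else 0.0


def evaluate(cases: list[Case], task: Callable[[str], str],
             evaluator: Callable[[str, Case], float]) -> dict[str, float]:
    """Run `task` on every case and collect one score per case name."""
    return {c.name: evaluator(task(c.inputs), c) for c in cases}


cases = [
    Case('capital_fr', 'Capital of France?', 'Paris'),
    Case('capital_de', 'Capital of Germany?', 'Berlin'),
]


def always_paris(question: str) -> str:
    return 'Paris'


scores = evaluate(cases, always_paris, exact_match)
print(scores, sum(scores.values()) / len(scores))
```

The task here is an ordinary function, which reflects the framework-agnostic design: anything that maps inputs to an output can be scored against the same set of cases.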
Integration with Pydantic Logfire is the recommended path: logfire.configure(send_to_logfire='if-token-present', environment='development', service_name='evals') followed by my_dataset.evaluate_sync(my_task). Source: [pydantic_evals/README.md:30-55]. Logfire then renders dedicated "Evals Overview" and "Case View" dashboards that show inputs, outputs, token usage, durations, and the full OTel trace for each case. Because the library is framework-agnostic, you can evaluate a LangChain chain, a hand-rolled prompt, or a Pydantic AI agent with the same Dataset definition.
4. CLI and the `clai` Web Chat
The clai command (pronounced "clay") provides one-shot and interactive command-line runs as well as a streaming web chat interface. The web mode defaults to http://127.0.0.1:7932 and is launched with clai web -m openai:gpt-5.2, or pointed at an existing agent via clai web --agent my_agent:my_agent. Source: [clai/README.md:30-70]. The web UI is a thin consumer of the same UI-adapter layer described in §1, which is why improvements to adapter round-trip semantics (timestamps in UIMessage.metadata, deferred approval hooks, etc.) show up in the chat UI "for free".
See Also
- README.md — overview of capabilities, models, and observability
- pydantic_evals/README.md — full evaluation API and Logfire integration
- clai/README.md — CLI and web chat usage
- pydantic_ai_slim/README.md — minimal-dependency core install
- examples/README.md — runnable patterns including RAG
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 38 structured pitfall item(s), including 14 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_1ba1e19b3084422e8159cd03a471ff22 | https://github.com/pydantic/pydantic-ai/issues/530
2. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_33fbdccc2f504c9b8e403096036955b2 | https://github.com/pydantic/pydantic-ai/issues/4580
3. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_541e0749d4804613aefb3265ecdbffd3 | https://github.com/pydantic/pydantic-ai/issues/4773
4. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_2fc34eb4512b462e8dae7f13caca7503 | https://github.com/pydantic/pydantic-ai/issues/5764
5. Configuration risk: Configuration risk requires verification
- Severity: high
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_e96ced1f99cd4c5fa1d03d941a794046 | https://github.com/pydantic/pydantic-ai/issues/5755
6. Configuration risk: Configuration risk requires verification
- Severity: high
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_dc917318a7ca470d83f95ba2d1c75343 | https://github.com/pydantic/pydantic-ai/issues/5760
7. Runtime risk: Runtime risk requires verification
- Severity: high
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_435f54efc7884881b6a2d5da81e14c64 | https://github.com/pydantic/pydantic-ai/issues/5160
8. Maintenance risk: Maintenance risk requires verification
- Severity: high
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_be644079d636479eb07771abb7e71b3f | https://github.com/pydantic/pydantic-ai/issues/5765
9. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Developers should review this security or permission risk before relying on the project. Linked issue: "Proposal: Gating tool execution with a policy/audit layer".
- User impact: Developers may expose sensitive permissions or credentials.
- Recommended check: Before packaging this project, run the relevant install, configuration, and quickstart checks against the concern raised in the linked issue. The source discussion did not expose a precise runtime context.
- Evidence: failure_mode_cluster:github_issue | fmev_a90f4d42528dc6e6897fd66b928ca7f3 | https://github.com/pydantic/pydantic-ai/issues/5730
10. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_aa4ad5bf22b842c8858396f7ed4269af | https://github.com/pydantic/pydantic-ai/issues/5769
11. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_61823dcdcb09428da7bf968335da3cb6 | https://github.com/pydantic/pydantic-ai/issues/5770
12. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_19737eac52114aa19adaf7cac89b18e1 | https://github.com/pydantic/pydantic-ai/issues/5685
Source: Doramagic discovery, validation, and Project Pack records
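The recommended check that recurs above ("reproduce the official install and quickstart path in an isolated environment") can be approximated with the standard library alone. The following is a minimal sketch, not an official procedure: the helper names `venv_python`, `install_check_commands`, and `reproduce_install` are hypothetical, and it assumes the minimal `pydantic-ai-slim` distribution exposes the `pydantic_ai` import package, as the overview states. It covers only the install and import steps and does not run the quickstart itself.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_check_commands(
    env_dir: Path, package: str = "pydantic-ai-slim"
) -> list[list[str]]:
    """Build the commands for a clean install and an import smoke test."""
    python = str(venv_python(env_dir))
    return [
        [python, "-m", "pip", "install", package],  # needs network access
        [python, "-c", "import pydantic_ai"],  # fails if the install is broken
    ]


def reproduce_install(package: str = "pydantic-ai-slim") -> bool:
    """Create a throwaway venv, install the package, and try importing it."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        # A fresh venv keeps the check isolated from the current site-packages.
        venv.create(env_dir, with_pip=True)
        for command in install_check_commands(env_dir, package):
            if subprocess.run(command).returncode != 0:
                return False
    return True
```

Calling `reproduce_install()` returns `True` only if both commands exit cleanly; a `False` result is a prompt to open the linked issue for that pitfall rather than proof of a defect, since network or platform problems can also cause failures.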
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using pydantic-ai with real data or production workflows.
- Gateway: Adaptive routing — latency-aware provider selection - github / github_issue
- [[aw] Pydantic AI Round-Trip Sweep failed](https://github.com/pydantic/pydantic-ai/issues/5755) - github / github_issue
- [[aw] Pydantic AI Stale Issues Finder failed](https://github.com/pydantic/pydantic-ai/issues/5676) - github / github_issue
- RFC: Pluggable cross-run memory layer (AbstractMemoryStore) - github / github_issue
- Feature request: Structured inter-agent message passing for multi-agent - github / github_issue
- GoogleModel returns empty responses (0 tokens) after v1.92.0 streaming c - github / github_issue
- [[Feature] Add `/usage` slash command to `clai` CLI to display cumulative](https://github.com/pydantic/pydantic-ai/issues/5770) - github / github_issue
- Ability to Persist Messages in External Stores - github / github_issue
- Installation risk requires verification - GitHub / issue
- Configuration risk requires verification - GitHub / issue
- Maintenance risk requires verification - GitHub / issue
- Security or permission risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence