# deepeval - Doramagic AI Context Pack

> Purpose: pre-work context for the user's host AI. This pack does not prove that the project has been installed, run, or validated.

## Project

- canonical_name: `confident-ai/deepeval`
- capability: The LLM Evaluation Framework
- expected_user_outcome: The LLM Evaluation Framework

## Operating Boundaries

- Do not claim that the project has been installed, run, called through an API, or used on local files unless separate evidence proves it.
- Project facts must come from repo evidence, Claim Graph, or explicit source references.
- When a capability is not verified, mark it as unverified instead of completing it as fact.
- publish_status: `publishable`
- blocking_gaps: none

---

## Doramagic Context Augmentation

The following sections strengthen the repository context for a host AI. Human Manual data is a reading route, and pitfall notes become operating constraints.

## Human Manual Outline

Usage rule: this is only a reading route and salience signal, not factual authority. Concrete claims must still return to repo evidence or Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance values as factual project evidence.
- When explaining the Human Manual outline, state that it is only a reading route or salience signal.
- Capability, installation, compatibility, runtime state, and risk claims must cite repo evidence, source paths, or Claim Graph.

- **DeepEval Overview and Core Architecture**: importance `high`
  - source_paths: deepeval/__init__.py, deepeval/metrics/base_metric.py, deepeval/metrics/indicator.py, deepeval/config/settings.py, deepeval/config/settings_manager.py
- **Tracing, Observability and Framework Integrations**: importance `high`
  - source_paths: deepeval/tracing/__init__.py, deepeval/tracing/tracing.py, deepeval/tracing/trace_context.py, deepeval/tracing/context.py, deepeval/tracing/types.py
- **Evaluation Engine, Metrics and Synthetic Data**: importance `high`
  - source_paths: deepeval/evaluate/evaluate.py, deepeval/evaluate/execute/loop.py, deepeval/evaluate/execute/agentic.py, deepeval/evaluate/execute/e2e.py, deepeval/evaluate/execute/trace_scope.py
- **CLI, Tooling, Extensibility and TypeScript**: importance `medium`
  - source_paths: deepeval/cli/main.py, deepeval/cli/server.py, deepeval/cli/inspect.py, deepeval/inspect/app.py, deepeval/inspect/loader.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `c399fb4034ae7a321544826f5fcc6624abf9cc57`
- inspected_files: `pyproject.toml`, `README.md`, `docs/vercel.json`, `docs/README.md`, `docs/package.json`, `docs/proxy.ts`, `docs/tsconfig.json`, `docs/source.config.ts`, `docs/home/read-me.mdx`, `docs/app/robots.ts`, `docs/app/sitemap.ts`, `docs/enterprise/read-me.mdx`, `docs/lib/source.ts`, `docs/lib/remark-admonitions.ts`, `docs/lib/cn.ts`, `docs/lib/authors.ts`, `docs/lib/defaults.ts`, `docs/lib/shared.ts`, `docs/lib/llms-route.ts`, `docs/lib/blog-categories.ts`

Host AI hard rules:
- Without repo_clone_verified=true, do not claim that the source code has been read.
- Without repo_inspection_verified=true, do not write README, docs, or package-file conclusions as facts.
- Without quick_start_verified=true, do not claim that the Quick Start path has run successfully.
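
The three rules above act as gates on what a host AI may state as fact. A minimal sketch of that gating follows, assuming the flags are loaded into a small record. The class name, function name, and claim labels are hypothetical and not part of the pack format. `quick_start_verified` is not listed in this pack, so the sketch treats it as false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionEvidence:
    """Evidence flags; field names mirror the keys used in this pack."""
    repo_clone_verified: bool = False
    repo_inspection_verified: bool = False
    quick_start_verified: bool = False  # absent from this pack, so it stays False


def claims_not_blocked(evidence: InspectionEvidence) -> dict[str, bool]:
    """Map each claim type (hypothetical labels) to whether a hard rule blocks it.

    True means the matching rule does not forbid the claim; it does not
    prove the claim on its own.
    """
    return {
        "source_code_was_read": evidence.repo_clone_verified,
        "readme_docs_package_conclusions": evidence.repo_inspection_verified,
        "quick_start_ran_successfully": evidence.quick_start_verified,
    }


# Flags as they appear in this pack: clone and inspection are true, quick start is absent.
pack_evidence = InspectionEvidence(
    repo_clone_verified=True,
    repo_inspection_verified=True,
)
claims = claims_not_blocked(pack_evidence)
```

Under these flags, the quick-start claim stays blocked while the other two are not forbidden by these rules.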

## Doramagic Pitfall Constraints

These rules come from Doramagic discovery, validation, or compilation findings. The host AI must treat them as operating constraints, not background notes.

### Constraint 1: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/confident-ai/deepeval/issues/1235
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
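
One way to approach the "isolated environment" rule is a throwaway virtual environment built with Python's standard `venv` module. The sketch below only prepares the commands and does not run them. It assumes the package installs from PyPI under the name `deepeval`, which this pack does not confirm; the import name is taken from the `deepeval/__init__.py` source path listed earlier. The helper names are hypothetical.

```python
import os
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_install_plan(env_dir: Path, package: str = "deepeval") -> list[list[str]]:
    """Return the commands to run inside the environment, without running them.

    The default package name is an assumption; check the project's README
    for the official install command before relying on it.
    """
    python = str(venv_python(env_dir))
    return [
        [python, "-m", "pip", "install", "--upgrade", package],
        [python, "-c", f"import {package}"],  # smoke test: does the import succeed?
    ]


def create_isolated_env(env_dir: Path) -> None:
    """Create a fresh virtual environment with pip available."""
    venv.create(env_dir, with_pip=True, clear=True)
```

To actually perform the check, call `create_isolated_env(path)` and then pass each command from `build_install_plan(path)` to `subprocess.run(cmd, check=True)`. This needs network access, and nothing in this pack shows that it has been done.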

### Constraint 2: Installation risk requires verification

- Trigger: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/confident-ai/deepeval/issues/2508
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 3: Security or permission risk requires verification

- Trigger: Developers should check this security_permissions risk before relying on the project: Security: request for a submitting security vulnerabilities.
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: Security: request for a submitting security vulnerabilities. Context: Source discussion did not expose a precise runtime context.
- Why it matters: Developers may expose sensitive permissions or credentials: Security: request for a submitting security vulnerabilities.
- Evidence: failure_mode_cluster:github_issue | https://github.com/confident-ai/deepeval/issues/2744
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 4: Security or permission risk requires verification

- Trigger: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/confident-ai/deepeval/issues/2594
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 5: Configuration risk requires verification

- Trigger: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | github_repo:676829188 | https://github.com/confident-ai/deepeval
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 6: Configuration risk requires verification

- Trigger: Developers should check this configuration risk before relying on the project: 🎉 New Interfaces, Reduce ETL Code < 50%!
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: 🎉 New Interfaces, Reduce ETL Code < 50%! Context: Observed when using Python.
- Why it matters: Upgrade or migration may change expected behavior: 🎉 New Interfaces, Reduce ETL Code < 50%!
- Evidence: failure_mode_cluster:github_release | https://github.com/confident-ai/deepeval/releases/tag/v3.7.2
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 7: Configuration risk requires verification

- Trigger: Developers should check this configuration risk before relying on the project: 🔥 DeepEval 4.0: Eval Harness for Coding Agents, 1-line integrations, TUI for trace inspection!
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: 🔥 DeepEval 4.0: Eval Harness for Coding Agents, 1-line integrations, TUI for trace inspection! Context: Observed when using Python.
- Why it matters: Upgrade or migration may change expected behavior: 🔥 DeepEval 4.0: Eval Harness for Coding Agents, 1-line integrations, TUI for trace inspection!
- Evidence: failure_mode_cluster:github_release | https://github.com/confident-ai/deepeval/releases/tag/v4.0.2
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 8: Capability evidence risk requires verification

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | github_repo:676829188 | https://github.com/confident-ai/deepeval
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 9: Runtime risk requires verification

- Trigger: Developers should check this runtime risk before relying on the project: ConfidentInstrumentationSettings with pydantic-ai: tools_called, expected_tools, and actual_output are all None when using OpenAIResponsesModel
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: ConfidentInstrumentationSettings with pydantic-ai: tools_called, expected_tools, and actual_output are all None when using OpenAIResponsesModel. Context: Observed when using Python.
- Why it matters: Developers may hit a documented source-backed failure mode: ConfidentInstrumentationSettings with pydantic-ai: tools_called, expected_tools, and actual_output are all None when using OpenAIResponsesModel
- Evidence: failure_mode_cluster:github_issue | https://github.com/confident-ai/deepeval/issues/2508
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 10: Maintenance risk requires verification

- Trigger: Developers should check this migration risk before relying on the project: 🎉 New Decision Graph Logic for Granular Simulation Control
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check for: 🎉 New Decision Graph Logic for Granular Simulation Control. Context: Source discussion did not expose a precise runtime context.
- Why it matters: Upgrade or migration may change expected behavior: 🎉 New Decision Graph Logic for Granular Simulation Control
- Evidence: failure_mode_cluster:github_release | https://github.com/confident-ai/deepeval/releases/tag/v4.0.3
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
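
Several constraints above cite the same evidence link, so a host AI may want to review each link once and apply the result to every constraint that cites it. The sketch below groups constraint numbers by evidence URL. The evidence strings are copied from the constraints above; the function names and the assumption that the URL is always the last pipe-separated field are illustrative only.

```python
from collections import defaultdict

BASE = "https://github.com/confident-ai/deepeval"

# (constraint number, evidence line) pairs copied from the constraints above.
EVIDENCE = [
    (1, f"community_evidence:github | {BASE}/issues/1235"),
    (2, f"community_evidence:github | {BASE}/issues/2508"),
    (3, f"failure_mode_cluster:github_issue | {BASE}/issues/2744"),
    (4, f"community_evidence:github | {BASE}/issues/2594"),
    (5, f"capability.host_targets | github_repo:676829188 | {BASE}"),
    (6, f"failure_mode_cluster:github_release | {BASE}/releases/tag/v3.7.2"),
    (7, f"failure_mode_cluster:github_release | {BASE}/releases/tag/v4.0.2"),
    (8, f"capability.assumptions | github_repo:676829188 | {BASE}"),
    (9, f"failure_mode_cluster:github_issue | {BASE}/issues/2508"),
    (10, f"failure_mode_cluster:github_release | {BASE}/releases/tag/v4.0.3"),
]


def evidence_url(line: str) -> str:
    """Assume the URL is the last pipe-separated field of an evidence line."""
    return line.split("|")[-1].strip()


def group_by_url(pairs: list[tuple[int, str]]) -> dict[str, list[int]]:
    """Collect the constraint numbers that cite each evidence URL."""
    groups: dict[str, list[int]] = defaultdict(list)
    for number, line in pairs:
        groups[evidence_url(line)].append(number)
    return dict(groups)


# Keep only the links cited by more than one constraint.
shared = {url: nums for url, nums in group_by_url(EVIDENCE).items() if len(nums) > 1}
```

Here `shared` maps issue 2508 to Constraints 2 and 9, and the repository root to Constraints 5 and 8. Sharing a link does not close any constraint; each hard boundary still applies until later evidence explicitly closes it.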