Doramagic Project Pack · Human Manual
evidently
Evidently is \u200b\u200ban open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Overview and System Architecture
Related topics: Core Evaluation Engine: Reports, Metrics, Presets, and Datasets, LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails, UI Service, Storage Backends, and Deployment
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Core Evaluation Engine: Reports, Metrics, Presets, and Datasets, LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails, UI Service, Storage Backends, and Deployment
Overview and System Architecture
Evidently is an open-source framework for evaluating, testing, and monitoring machine learning and LLM-powered systems. The repository is organized as a monorepo containing a Python evaluation library, a TypeScript/React user interface, a self-hosted backend service, supporting examples, and a documentation generation toolchain. Source: README.md.
Purpose and Scope
Evidently targets the full lifecycle of ML and LLM systems, from offline experiments to production monitoring. The framework operates on tabular and text data and supports both predictive (classification, regression) and generative (RAG, agentic) tasks. Its feature set includes 100+ built-in metrics, descriptor-based feature analysis, LLM judges, guardrails, and a prompt registry. Source: README.md.
The system is intentionally modular. Users can run one-off evaluations in Python, embed pass/fail test conditions in CI/CD pipelines, or operate a full self-hosted monitoring service backed by a web UI. The README frames Evidently around two primitives — Reports (summaries of metrics) and Test Suites (Reports augmented with pass/fail conditions), with presets and custom metrics as the extension points. Source: README.md.
Repository Layout
The top-level structure separates concerns between the evaluation core, the UI, examples, and tooling:
| Directory | Role |
|---|---|
src/evidently/ | Python library implementing reports, metrics, descriptors, and the self-hosted UI backend (ui/service/) |
ui/ | Frontend workspaces: evidently-ui-lib (shared React components) and service (the Vite-built SPA served by the backend) |
examples/ | End-to-end tutorials, cookbook snippets, service demos, Grafana integrations, and sample datasets |
api-reference/ | pdoc-based API reference generation with uv and live-reload support |
Source: README.md; examples/README.md; api-reference/README.md.
The examples/ directory is itself divided into cookbook/ (short, focused recipes), tutorials/ (longer end-to-end walkthroughs), service/ (local full-stack demo), and grafana/ (dashboard integrations). Source: examples/README.md; examples/cookbook/README.md.
Core Subsystems
The following diagram summarizes how the Python evaluation core, the backend service, and the React UI interact, with supporting subsystems for prompts, traces, and descriptors.
flowchart LR
A[Python Client<br/>Reports / Test Suites / Descriptors] --> B[evidently library<br/>src/evidently]
B --> C[UI Backend Service<br/>src/evidently/ui/service]
C --> D[evidently-ui-lib<br/>React components]
D --> E[service SPA<br/>Vite + React 18]
C --> F[Artifact Manager<br/>Prompts / Datasets]
B --> G[Traces & Spans]
G --> C
F --> D
B --> H[API Reference<br/>pdoc + uv]Reports and Test Suites. Reports are collections of metrics that can be exported as JSON, Python dicts, or HTML. Test Suites add pass/fail conditions on top of Reports. In the UI, test results are rendered via TestData, which maps backend TestState values (success, warning, fail, unknown, error) to MUI AlertColor severities and supports collapsible detail blocks. Source: ui/packages/evidently-ui-lib/src/widgets/TestSuiteWidget/TestData.tsx.
Descriptors and LLM judges. Descriptors are column-level feature extractors used to compute things like text length, sentiment, or LLM-based classifications. The UI's makeEmptyLLMJudgeDescriptorTemplate helper bootstraps a binary classification prompt template with target_category, non_target_category, include_category, and uncertainty fields, indicating that LLM judges are first-class prompt artifacts in the system. Source: ui/packages/evidently-ui-lib/src/components/Descriptors/_utils/utils.tsx.
Traces and agentic evaluation. Traces are composed of spans and can be viewed as a table with token usage, cost, guardrail indicators, and per-trace actions. The TraceTable component hides the guardrails column when no spans carry guardrail data, demonstrating UI-side schema awareness. Source: ui/packages/evidently-ui-lib/src/components/Traces/TraceViewer/components/TraceTable.tsx. Token and cost extraction is centralized in the UsageData helper. Source: ui/packages/evidently-ui-lib/src/components/Traces/TraceViewer/components/UsageData.tsx.
Prompt Registry. The backend exposes endpoints to list, fetch, and create prompt artifacts. The prompts.py module converts between PromptVersion domain objects and the generic Artifact/ArtifactVersion storage model, and a get_prompt_by_name route resolves prompts within a project scope. Source: src/evidently/ui/service/api/prompts.py. The frontend mirrors this with a PromptInfoHeader that shows prompt ID, version chip, and a ToggleViewEdit mode switcher. Source: ui/packages/evidently-ui-lib/src/components/Prompts/PromptInfoHeader.tsx.
Frontend Stack and Data Contracts
The UI is a TypeScript/React monorepo. The service package is a Vite-built SPA depending on the shared evidently-ui-lib workspace, React 18, dayjs, and tiny-invariant. It uses @vitejs/plugin-react-swc and Playwright for testing, with Biome for linting. Source: ui/service/package.json.
Frontend data contracts are generated from the backend's OpenAPI schema. api/types/index.ts re-exports paths and components from ~/api/types/endpoints and derives higher-level TypeScript aliases such as ProjectModel, ReportModel, DashboardInfoModel, and DatasetFilter, which are consumed by widgets like Widget.tsx (the generic dashboard widget that renders title, content, alerts, and insights). Source: ui/packages/evidently-ui-lib/src/api/types/index.ts; ui/packages/evidently-ui-lib/src/widgets/Widget.tsx.
Known Architectural Concerns
Several open issues reflect real architectural boundaries users should be aware of. The "legacy" metrics module under evidently.legacy.metrics is not yet ported to the new Report API, so users with custom regression metrics must wait for parity work tracked in issue #1805. Descriptor plots generated by LLMEval are not always populated when logged through Tests, as documented in issue #1292. The self-hosted UI's dataset materialization endpoint has a path traversal vulnerability that was disclosed in issue #1887, and the framework currently pins sentence-transformers to 5.3.0 due to a SemanticSimilarity regression in later versions (issue #1888). Test suites attached to text descriptors do not always surface in the monitoring dashboard (issue #731), and Python 3.13 support is gated on pydantic v1 removal (issue #1612).
See Also
- Reports and Metrics
- Test Suites and Pass/Fail Conditions
- Descriptors and LLM Judges
- Prompt Registry
- Self-Hosted UI Service
- API Reference Generation
Source: https://github.com/evidentlyai/evidently / Human Manual
Core Evaluation Engine: Reports, Metrics, Presets, and Datasets
Related topics: Overview and System Architecture, LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Overview and System Architecture, LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails
Core Evaluation Engine: Reports, Metrics, Presets, and Datasets
Purpose and Scope
Evidently is an open-source Python library for evaluating, testing, and monitoring ML and LLM systems, from offline experiments to production. The Core Evaluation Engine is the layer that turns raw tabular or text data into structured, visualizable results. According to README.md, the engine is intentionally modular and exposes four primary abstractions:
| Abstraction | Role |
|---|---|
| Report | A container that aggregates one or more metrics and renders their outputs. |
| Metric | An individual computation (e.g., drift, data quality, LLM judge). |
| Preset | A pre-bundled set of metrics tuned for a common task (regression, classification, RAG, etc.). |
| Dataset | The typed input view describing columns, embeddings, text fields, and target roles. |
The library ships 100+ built-in metrics, supports both predictive and generative workloads, and produces outputs that can be viewed in Python, exported as JSON / HTML, or surfaced in the self-hosted UI. Source: README.md.
Architecture and Data Flow
The engine follows a layered design: datasets flow into metrics, metrics are grouped into Reports, and Reports optionally attach pass/fail conditions to become Test Suites. The same outputs can be visualized through widgets in the React-based UI service.
flowchart LR
A[Dataset / DataFrame] --> B[Column Mapping]
B --> C[Metric 1]
B --> D[Metric 2]
B --> E[Preset Bundle]
C --> F[Report]
D --> F
E --> F
F --> G{Add pass/fail?}
G -- yes --> H[Test Suite]
G -- no --> I[Interactive Report]
H --> J[Monitoring UI / JSON / HTML]
I --> JThe UI side is built as a TypeScript/React workspace. The service package depends on the shared evidently-ui-lib and React, and exposes the dashboard that lists projects, snapshots, dashboards, test suites, and traces (see ui/service/package.json). Backend endpoints for projects, snapshots, dashboards, and reports are wired in the legacy API module src/evidently/legacy/ui/api/projects.py, which exposes both read-only and guarded (auth-required) routes.
Core Abstractions in Detail
Reports
A Report is the top-level entry point. It accepts a list of metrics (or presets) and produces a structured document. Reports are best suited for exploratory analysis, debugging, and experiments. They can be turned into Test Suites by adding pass/fail conditions (gt, lt, and similar comparators). The README states that auto-generated conditions from a reference dataset are supported with "zero setup". Source: README.md.
Example pattern (from examples/cookbook/README.md):
from evidently import Report
report = Report(metrics=[...])
result = report.run(reference_data=ref, current_data=cur)
Metrics
Metrics are the atomic units of evaluation. They range from data drift and data quality checks to LLM-as-a-judge descriptors. The metric system is pluggable — users can write custom metrics through the Python interface. The widget layer in the UI consumes RichDataParams, BigTableWidgetParams, and TestDataInfo shapes defined in ui/packages/evidently-ui-lib/src/api/index.tsx, including a TestState enum of 'unknown' | 'error' | 'success' | 'warning' | 'fail'. This enum drives how test rows are rendered by ui/packages/evidently-ui-lib/src/widgets/TestSuiteWidget/TestData.tsx, which maps states to MUI AlertColor severities.
Text-only widgets are rendered through a Markdown component defined in ui/packages/evidently-ui-lib/src/widgets/TextWidgetContent.tsx.
Presets
Presets are opinionated bundles of metrics that target a specific ML/LLM use case. The cookbook exposes presets like regression_preset (see examples/cookbook/regression_preset.ipynb) and recsys_metrics. The intended pattern is to start with a preset and then narrow down to individual metrics for custom workflows. Source: examples/cookbook/README.md.
Datasets
The dataset layer normalizes input DataFrames into typed columns (numerical, categorical, text, embedding, target, prediction). A ColumnMapping declares which columns fall into each role, which is then consumed by every metric. The top-level examples/README.md recommends three starter notebooks: agentic tracing, LLM input/output validation, and classic ML validation — each demonstrates a different dataset configuration.
Community-Reported Pain Points
Several recurring issues surfaced in the community and are worth understanding before adopting the engine:
- Legacy to new Report API migration — Metrics such as
RegressionErrorBiasTable,RegressionPredictedVsActualScatter, andRegressionTopErrorMetricstill live underevidently.legacy.metricsand are not bundled by the modernRegressionPreset(issue #1805). Users must explicitly include them when migrating older workflows.
- LLMEval descriptors not plottable from Tests — The descriptor plot on the Descriptors tab can render empty when tests like
TestShareOfOutRangeValuesorTestCategoryCountare logged to a Project (issue #1292). The UI widget for descriptors relies on the structure defined in ui/packages/evidently-ui-lib/src/components/Descriptors/_utils/utils.tsx, so missing plot data usually points to a metric payload gap rather than a UI bug.
- SemanticSimilarity dependency pin —
SemanticSimilarityfails withsentence-transformers> 5.3.0 (issue #1888). Pin the dependency or wait for a compatibility fix.
- Python 3.13 support — Imports break on 3.13 because of
pydantic.v1references insrc/evidently/__init__.py(issue #1612). Stay on 3.12 unless this is resolved.
- Plotly deprecation — The engine still imports the deprecated
plotly.graph_objsalias (issue #1884); downstream code should migrate toplotly.graph_objects.
- Plot scale customization — There is no built-in option to switch Y-axis to percentages or rescale distribution plots (issue #700); custom rendering is required.
- Test suite visibility in monitoring — Test suites built with text descriptors do not always appear in the monitoring dashboard (issue #731), typically because of column-mapping gaps in
text_features.
- Security advisory — An unauthenticated path traversal in the UI dataset-materialization endpoint was disclosed (issue #1887); self-hosted deployments must upgrade past the fix.
Common Failure Modes
- Empty dashboards — usually caused by an incomplete
ColumnMapping, not by the engine itself (see issue #731). - Crashes on import — check Python version and pydantic compatibility (issue #1612).
- Missing preset metrics — confirm whether you are on the legacy or new Report API (issue #1805).
- Silent plot gaps — descriptor payloads can be incomplete when invoked through tests (issue #1292).
See Also
Source: https://github.com/evidentlyai/evidently / Human Manual
LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails
Related topics: Core Evaluation Engine: Reports, Metrics, Presets, and Datasets, UI Service, Storage Backends, and Deployment
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Core Evaluation Engine: Reports, Metrics, Presets, and Datasets, UI Service, Storage Backends, and Deployment
LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails
Overview
Evidently positions itself as an open-source framework for evaluating, testing, and monitoring ML and LLM-powered systems. The library spans tabular and text data, supports both predictive and generative tasks, and exposes 100+ built-in metrics that range from data drift detection to LLM judges. Source: README.md:9-20. Descriptors, prompts, RAG utilities, and guardrails are the four LLM-focused subsystems that make it possible to score free-form model outputs, version control instructions, ground generation in retrieved context, and block unsafe traffic.
The framework is modular, so the LLM subsystems can be combined with classical ML validation flows or used standalone. The repository ships a dedicated cookbook to make these subsystems approachable:
| Cookbook Notebook | Subsystem |
|---|---|
descriptors.ipynb | Descriptors and feature analysis |
guardrails.ipynb | LLM application guardrails |
prompt_registry.ipynb | Storing and versioning prompts |
prompt_optimization_*.ipynb | Prompt optimization workflows |
recsys_metrics.ipynb | Recommendation metrics |
Source: examples/cookbook/README.md:7-29.
Descriptors and LLM Evaluation
Descriptors are reusable column-level scorers that turn raw text columns into evaluable metrics. The cookbook ships a dedicated descriptors.ipynb example, and the UI ships a Descriptors/_utils module that contains factories such as makeEmptyLLMJudgeDescriptorTemplate for constructing empty judge templates with target/non-target categories and an uncertainty placeholder. Source: ui/packages/evidently-ui-lib/src/components/Descriptors/_utils/utils.tsx:5-13.
Common descriptor use cases include:
- Built-in text descriptors — e.g.
TextLength()— for sanity checks on input and output sizes. - Semantic similarity — comparing model output against a reference column via sentence embeddings.
- Custom
LLMEvaljudges — calling an LLM with a templated prompt to classify text into target/non-target categories.
A community-reported failure mode is that SemanticSimilarity breaks on sentence-transformers > 5.3.0; pinning the dependency to 5.3.0 is the current workaround. Source: Issue #1888. A related tutorial bug describes the corrected Report API usage:
text_evals_report = Report(metrics=[
TextEvals(column_name="response", descriptors=[
SemanticSimilarity(with_column="question", display_name="Semantic Similarity")
])
])
Source: Issue #1367.
Descriptors integrate with Reports and Test Suites. They can also be plottable, but a known UX gap is that descriptors like TextLength() or custom LLMEval() outputs do not populate the Descriptors tab in the monitoring UI when wrapped in a TestShareOfOutRangeValues or TestCategoryCount Test. Source: Issue #1292.
Prompts and Prompt Registry
The Prompt Registry stores and versions LLM instructions so that experiments and production prompts can be tracked, reused, and audited. The backend exposes CRUD endpoints under prompts.py, which list, fetch, and create prompts via an ArtifactManager. The route GET /by-name/{name:str} resolves prompts by name within a project. Source: src/evidently/ui/service/api/prompts.py:69-95.
Prompt versioning uses ArtifactVersion records that wrap a typed PromptContent payload. Three content types are supported by the viewer, dispatched on the type discriminator:
Content type | Rendered as |
|---|---|
evidently:prompt_content:TextPromptContent | Single Prompt text block |
evidently:prompt_content:MessagesPromptContent | Stack of role-tagged Prompt messages |
evidently:prompt_content:TemplatePromptContent | LLMJudgeTemplateViewer for judge templates |
Source: ui/packages/evidently-ui-lib/src/components/Prompts/Versions/View/index.tsx:14-41. The prompt header surfaces the prompt ID and a chip-styled version number (#${promptVersion.version}), and toggles between view and edit modes. Source: ui/packages/evidently-ui-lib/src/components/Prompts/PromptInfoHeader.tsx:24-32.
A first-version creation form lets users choose between a plain text-messages layout and an LLMJudge template, validating fields such as category names and uncertainty before submit. Source: ui/packages/evidently-ui-lib/src/components/Prompts/Versions/Forms/CreateFirstPromptVersionForm.tsx:21-43.
RAG, Guardrails, and Tracing
The cookbook includes a guardrails.ipynb notebook dedicated to LLM guardrails. Descriptors such as context relevance feed into RAG evaluation, and tracing captures token usage, costs, and span-level metadata for multi-step agentic workflows.
Token usage and cost are computed per trace and surfaced with a tooltip breakdown in the UI:
const totalTokens = Array.from(extractUsageData.entries())
.reduce((acc, it) => acc + it[1][0], 0)
const totalCost = Array.from(extractUsageData.entries())
.reduce((acc, it) => acc + it[1][1], 0)
Source: ui/packages/evidently-ui-lib/src/components/Traces/TraceViewer/components/UsageData.tsx:9-14. Guardrail hits and other metadata are extracted from span attributes and rendered as chips in the trace table. Source: ui/packages/evidently-ui-lib/src/components/Traces/TraceViewer/components/TraceTable.tsx:43-62.
For chat-style traces, the SessionCardContent component selects an input span and output span by attribute name (SplitField), then renders the user and agent messages. Source: ui/packages/evidently-ui-lib/src/components/Traces/DialogViewer/components/SessionCardContent.tsx:11-26.
flowchart LR A[Input text columns] --> B[Descriptors] B --> C[Report / TestSuite] C --> D[Evidently UI] P[Prompt Registry] --> J[LLMEval descriptors] J --> C R[RAG traces] --> G[Guardrail descriptors] G --> C T[Token / cost data] --> D
Known Issues and Community Notes
Several recurring pain points affect the LLM subsystems:
- Sentence-transformers compatibility.
SemanticSimilarityfails onsentence-transformers > 5.3.0; pin to 5.3.0 as a workaround. Source: Issue #1888. - Plots from Test Suites. Descriptor plots are not populated when wrapping an
LLMEvaldescriptor in a Test that is then logged to a Project. Source: Issue #1292. - Test suite visibility. Test suites created against text columns with text descriptor functions sometimes do not appear in the monitoring dashboard. Source: Issue #731.
- Plotly deprecation. The codebase still imports from
plotly.graph_objs, which is fully aliased tograph_objectsupstream; switch toplotly.graph_objectsto avoid deprecation warnings. Source: Issue #1884. - Python 3.13. Python 3.12 works, but 3.13 currently raises a
pydantic.v1-related error during import. Source: Issue #1612. - Plot axis scales. Distribution plots do not yet expose a percentage-based Y-axis override. Source: Issue #700.
See Also
- README.md — High-level project overview and feature list.
- examples/README.md — Entry points for tutorials, cookbook, and service examples.
- examples/cookbook/README.md — Focused notebooks for descriptors, guardrails, and prompt workflows.
- api-reference/README.md — Instructions for generating API reference docs with
pdoc.
Source: https://github.com/evidentlyai/evidently / Human Manual
UI Service, Storage Backends, and Deployment
Related topics: Overview and System Architecture, Core Evaluation Engine: Reports, Metrics, Presets, and Datasets, LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Overview and System Architecture, Core Evaluation Engine: Reports, Metrics, Presets, and Datasets, LLM Evaluation, Descriptors, Prompts, RAG, and Guardrails
UI Service, Storage Backends, and Deployment
The Evidently UI is the operational front-end for an open-source ML and LLM evaluation framework. It is shipped as a React/TypeScript application and split across a workspace containing two deployable surfaces: a long-running monitoring service and a smaller standalone package used for embedding rendered widgets (for example, inside Jupyter notebooks). The UI consumes a typed OpenAPI contract that mirrors the backend schema, and renders Projects, Reports, Dashboards, Datasets, Traces, Prompt versions, and Tests widgets on top of that contract. Source: README.md:1-19, ui/README.md:1-9.
1. UI Repository Layout and Workspace
The ui/ folder is a pnpm v9 workspace and requires Node.js 20 or higher. Two projects live inside it: service, the full monitoring UI, and standalone, an embeddable widget surface. Source: ui/README.md:1-13.
| Package | Purpose | Key dependency |
|---|---|---|
service | Monitoring UI served by Vite | evidently-ui-lib (workspace) |
standalone | Embedded widgets (e.g. notebooks) | shares evidently-ui-lib |
evidently-ui-lib | Shared component/widget library | React 18, MUI, Plotly |
The shared library evidently-ui-lib is the only place where widget renderers, dashboard panels, and typed API models live, so the two surfaces never drift in behaviour. Source: ui/packages/evidently-ui-lib/README.md:1-3, ui/service/package.json:1-35.
The Vite-based build pipeline is configured per package; the service exposes dev, build, preview, and a sync-back script that re-fetches backend OpenAPI types via ./.github/scripts/get-types-from-back.sh. Source: ui/service/package.json:5-20.
2. Typed API Surface and Widget Rendering
The frontend is generated from the backend OpenAPI schema. src/api/types/index.ts re-exports paths and components['schemas'] from auto-generated endpoints, then derives domain aliases such as ProjectModel, ReportModel, DashboardModel, DatasetPaginationModel, and SeriesModel. This means every UI prop is type-checked against the live backend contract and changes in the Python API surface automatically propagate to TypeScript. Source: ui/packages/evidently-ui-lib/src/api/types/index.ts:1-30.
The widget rendering layer is a small discriminated union. WidgetRenderer.tsx switches on info.type and dispatches to the right content component (list, text, test_suite, etc.), wrapping each in a Widget card that supports alerts and insights. Source: ui/packages/evidently-ui-lib/src/widgets/WidgetRenderer.tsx:1-30, ui/packages/evidently-ui-lib/src/widgets/Widget.tsx:1-40.
Markdown-rich text panels are rendered through TextWidgetContent, which uses react-markdown to render arbitrary Markdown supplied by the backend (used for documentation-style panels in dashboards). Source: ui/packages/evidently-ui-lib/src/widgets/TextWidgetContent.tsx:1-13.
3. Domain Modules: Projects, Datasets, Traces, Prompts, Dashboards
The library exposes feature-grouped component folders that match the backend entities:
- Datasets –
components/Datasets/hooks.tsxbuilds Material-UI DataGrid columns from theDatasetPaginationschema, mapping column types (string,datetime,prompt_result) to specialized cells/editors, and supports row virtualization viagetRowIdGeneratedOnFront. Source: ui/packages/evidently-ui-lib/src/components/Datasets/hooks.tsx:1-60. - Traces –
TraceTable.tsxandUsageData.tsxrender per-trace metrics (duration, token usage, cost) and link to a per-trace dialog viewer (SessionCardContent.tsx,Message.tsx) that flattens spans into user/agent messages. Guardrails data is extracted from spans and shown as a per-trace button when any span contains guardrail attributes. Source: ui/packages/evidently-ui-lib/src/components/Traces/TraceViewer/components/TraceTable.tsx:1-30, ui/packages/evidently-ui-lib/src/components/Traces/TraceViewer/components/UsageData.tsx:1-30, ui/packages/evidently-ui-lib/src/components/Traces/DialogViewer/components/SessionCardContent.tsx:1-30. - Prompts –
PromptInfoHeader.tsxshows the prompt ID and version chip with a copy-to-clipboard helper, whileCreateFirstPromptVersionForm.tsxbuilds a typed form for the first prompt version, validating viauseCustomFormValidatorand supporting two template variants (text-messages,judge). Source: ui/packages/evidently-ui-lib/src/components/Prompts/PromptInfoHeader.tsx:1-30, ui/packages/evidently-ui-lib/src/components/Prompts/Versions/Forms/CreateFirstPromptVersionForm.tsx:1-40. - Dashboard panels –
Dashboard/Panels/implementations/Text.tsxdefines a typed panel descriptor with a discriminatedtype: 'text'and asize: 'full' | 'half'flag, rendered throughPanelCardGeneral. Source: ui/packages/evidently-ui-lib/src/components/Dashboard/Panels/implementations/Text.tsx:1-15.
4. Deployment, Tooling, and Operational Caveats
Local development of the service uses Vite with pnpm dev; CI-style code checks run through Biome for formatting, import sorting, and linting (pnpm code-check or pnpm code-check --fix). The recommended VS Code setup points editor.defaultFormatter at Biome and turns on format-on-save. Source: ui/README.md:13-39.
The Python library and the optional self-hosted UI/collector service ship together on PyPI. A self-hosted operator wiring up the collector via API has reported missing-key errors during config creation — a frequent blocker when bringing up the service against a fresh backend, see issue #1413. Test suites built from text descriptors do not always surface in the monitoring dashboard when added via TestSuite(tests=[...]); this is tracked in issue #731. Custom LLMEval descriptors also do not plot on the Descriptors tab when used inside TestCategoryCount/TestShareOfOutRangeValues, see issue #1292. Plot Y-axis scaling cannot yet be overridden in distribution plots — see issue #700. Finally, an unauthenticated path-traversal bug in dataset materialization was disclosed in issue #1887; self-hosted operators should pin to a patched release before exposing the service publicly. For users who want to embed widgets rather than run the full service, standalone is the recommended distribution channel. Source: ui/README.md:1-9, README.md:1-25.
See Also
- README.md – top-level project overview
- examples/README.md – tutorial and cookbook index
- api-reference/README.md – pdoc-based API reference generator
- ui/README.md – UI workspace, Biome setup, and service run instructions
Source: https://github.com/evidentlyai/evidently / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
Doramagic Pitfall Log
Found 20 structured pitfall item(s), including 9 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/670
2. Configuration risk: Configuration risk requires verification
- Severity: high
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/491
3. Runtime risk: Runtime risk requires verification
- Severity: high
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1367
4. Runtime risk: Runtime risk requires verification
- Severity: high
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1805
5. Runtime risk: Runtime risk requires verification
- Severity: high
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1292
6. Maintenance risk: Maintenance risk requires verification
- Severity: high
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/686
7. Maintenance risk: Maintenance risk requires verification
- Severity: high
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1884
8. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1879
9. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1888
10. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1413
11. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/evidentlyai/evidently/issues/1612
12. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/evidentlyai/evidently
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Count of project-level external discussion links exposed on this manual page.
Open the linked issues or discussions before treating the pack as ready for your environment.
Community Discussion Evidence
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using evidently with real data or production workflows.
- SemanticSimilarity fails with sentence-transformers > 5.3.0 - github / github_issue
- Make
LLMEvaldescriptors plottable from Tests - github / github_issue - Legacy metrics to new Report API - github / github_issue
- Unauthenticated path traversal arbitrary file read in Evidently UI datas - github / github_issue
- Plotly Graph Objects - Deprecated module is in use. - github / github_issue
- Protect this repo from AI-generated PRs - github / github_issue
- Fix semantic similarity in LLM eval tutorial - github / github_issue
- The fixed value for feel_zeroes in get_binned_data may lead to deviation - github / github_issue
- Error when trying to create collector config in self-hosted environment - github / github_issue
- python 3.13 support - github / github_issue
- Modify scales of plots generated in report - github / github_issue
- Installation risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence