Doramagic Project Pack · Human Manual

opik

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

What is Opik? Platform Overview and Key Capabilities

Related topics: System Architecture and Repository Layout, Deployment, Self-Hosting, and Operations

1. Platform Purpose and Scope

Opik is an open-source platform built by Comet that streamlines the entire lifecycle of LLM (Large Language Model) applications. It is designed to help developers evaluate, test, monitor, and optimize LLM models and agentic systems, both during development and in production. Source: README.md:5-13.

The platform addresses common pain points when building LLM-powered systems — lack of visibility into model behavior, difficulty measuring quality, and operational blind spots in production traffic. Opik bundles tracing, evaluation, online monitoring, prompt optimization, and safety guardrails into a single ecosystem.

The project is organized as a monorepo containing:

  • apps/opik-backend — Java-based backend service (see apps/opik-backend/README.md).
  • apps/opik-frontend — React + TypeScript web UI built on shadcn/ui, Radix, and Zustand (see apps/opik-frontend/README.md:3-44).
  • apps/opik-documentation — User-facing docs and integration templates.
  • sdks/python — Python SDK distributed on PyPI.
  • sdks/typescript — TypeScript/JavaScript SDK with first-class Node.js support.
  • sdks/opik_optimizer — Standalone prompt and agent optimization package (see sdks/opik_optimizer/README.md).

2. Core Capabilities

The platform groups its capabilities into five areas:

2.1 Tracing and Observability

Opik provides deep tracing of LLM calls, conversation turns, and agent activity. The Python SDK exposes a @opik.track decorator and track_*() wrappers that automatically capture inputs, outputs, latency, and token usage (see sdks/python/design/README.md:5-17). Batched, asynchronous message processing ensures tracing does not block application code. Source: README.md:14-22.

The TypeScript SDK mirrors this with a track decorator, a native OpenTelemetry exporter, and a batch-queue client (see sdks/typescript/design/README.md:5-17).
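The decorator idea described above can be illustrated with a small standard-library sketch. This is not Opik's implementation: `toy_track`, `RECORDS`, and the record fields are hypothetical names chosen for illustration, token usage is left out, and a real SDK would hand the record to a background queue rather than an in-memory list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink; a real SDK would enqueue records for upload.
RECORDS: list[dict[str, Any]] = []


def toy_track(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the inputs, output (or error), and latency of each call."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        record: dict[str, Any] = {
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
        }
        try:
            result = func(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            # Keep the failure visible in the record, then re-raise unchanged.
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_s"] = time.perf_counter() - start
            RECORDS.append(record)

    return wrapper


@toy_track
def answer(question: str) -> str:
    # Stand-in for an LLM call.
    return f"echo: {question}"


answer("What is Opik?")
```

The wrapped function behaves exactly as before; the only side effect is one appended record per call, which is why this style of instrumentation needs no changes at the call sites.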

2.2 Evaluation

Opik supports prompt evaluation, LLM-as-a-judge scoring, and experiment management. The Python design docs describe four evaluation methods and a metrics architecture that separates metric definitions from evaluation engines (see sdks/python/design/README.md:5-9). TypeScript support includes prompt evaluation flows as well (see sdks/typescript/design/README.md:5-17).

2.3 Production Monitoring

Production-grade dashboards expose LLM-as-a-Judge metrics and online evaluation rules so teams can identify issues in live traffic. Source: README.md:20-22.

2.4 Opik Agent Optimizer

The Opik Agent Optimizer is a dedicated SDK (opik-optimizer) that improves prompts and agents. It uses LiteLLM under the hood, which means any provider supported by LiteLLM can be used as an optimization target. Source: sdks/opik_optimizer/README.md:11-44.

2.5 Opik Guardrails

Opik Guardrails are built-in features that help teams implement safe and responsible AI practices, which is particularly relevant for production deployments. Source: README.md:24-26.

3. SDKs, Integrations, and Configuration

Opik ships with first-party Python and TypeScript SDKs and a growing library of integrations.

The TypeScript SDK environment variables are documented in sdks/typescript/src/opik/configure/src/lib/env-constants.ts:1-26. The four main variables are summarized below; note that OPIK_PROJECT_NAME has a default and is therefore optional.

| Variable | Purpose |
| --- | --- |
| OPIK_API_KEY | API key for authentication against the Opik server. |
| OPIK_URL_OVERRIDE | Base URL for the Opik API (Cloud or self-hosted). |
| OPIK_WORKSPACE | Workspace name. |
| OPIK_PROJECT_NAME | Project name for organizing traces (default: "Default Project"). |

A companion CLI (npx opik-ts configure) interactively writes these values to the user's environment, including a --use-local flag for local development. Source: sdks/typescript/design/README.md:30-40.
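As a rough illustration of how a client might consume these variables, the sketch below reads the four names from a mapping and applies the documented default for the project name. `OpikEnvConfig` and `load_config` are hypothetical helpers written for this manual, not part of either SDK; only the variable names and the "Default Project" default come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OpikEnvConfig:
    """Hypothetical container for the four documented variables."""

    api_key: Optional[str]
    url_override: Optional[str]
    workspace: Optional[str]
    project_name: str


def load_config(env: Mapping[str, str] = os.environ) -> OpikEnvConfig:
    """Read the documented variable names; unset values become None."""
    return OpikEnvConfig(
        api_key=env.get("OPIK_API_KEY"),
        url_override=env.get("OPIK_URL_OVERRIDE"),
        workspace=env.get("OPIK_WORKSPACE"),
        # The only documented default in the table is the project name.
        project_name=env.get("OPIK_PROJECT_NAME", "Default Project"),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
config = load_config({"OPIK_WORKSPACE": "demo"})
```

Accepting a mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests without mutating process state.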

Integrations are split into first-party packages with their own npm artifacts. The packages covered in this manual are opik-openai, opik-langchain, and opik-gemini.

For Python, the integration documentation templates (apps/opik-documentation/documentation/templates/README.md) define a decision matrix covering four patterns: code-based wrappers, OpenAI-compatible wrappers, LiteLLM callbacks, and pure OpenTelemetry. This matrix is the canonical reference for new integrations.

4. Deployment and High-Level Architecture

Opik can be deployed in three ways, listed after the architecture diagram below.

flowchart LR
    A[LLM Application] --> B[Opik SDK<br/>Python or TypeScript]
    B --> C[Opik Backend<br/>Java service]
    C --> D[(Storage<br/>ClickHouse + MySQL)]
    C --> E[Opik Frontend<br/>React UI]
    F[Comet.com Cloud] -.alternative.-> A
    A -.uses.-> G[opik-optimizer<br/>Prompt optimization]
    A -.uses.-> H[Guardrails<br/>Safety checks]

  • Cloud (managed): Recommended path; sign up at comet.com. Source: README.md:32-34.
  • Self-hosted Docker Compose: The ./opik.sh script brings up the full stack for local development. Source: README.md:36-46.
  • Kubernetes: For production-scale self-hosting (referenced in the same installation section).

The backend exposes a REST API consumed by the frontend and the SDKs; SDKs batch events locally before flushing, which keeps instrumentation overhead minimal (see sdks/python/design/README.md:5-17 and sdks/typescript/design/README.md:5-17).

5. Community Context and Known Gaps

Several frequently requested features shape the platform's roadmap:

  • Authentication and Authorization — A community proposal (#949) requests an auth layer in front of the open-source deployment similar to Langfuse, indicating that production users want self-hosted Opik to be secure by default.
  • n8n integration — A widely followed request (#1587) calls for native support in the n8n workflow automation tool, which would let non-engineers route prompts and completions through Opik without code changes.
  • Project name propagation bug — Issue #420 shows that project_name is not correctly forwarded to the @opik.track decorator when used inside opik.evaluation.evaluate, an SDK/UI interaction gap that is worth understanding when designing evaluation pipelines.
  • Cost-tracking coverage — Issue #4507 requests additional entries in model_prices_and_context_window.json (e.g., qwen/qwen3-235b-a22b-2507 via OpenRouter) so cost analytics stay accurate for new providers.

The latest release referenced in these discussions is 2.0.73, which also surfaces OTel span errors, tool outputs, and Google cost data for Pydantic AI (see README.md release notes block).

Source: https://github.com/comet-ml/opik / Human Manual

System Architecture and Repository Layout

Related topics: What is Opik? Platform Overview and Key Capabilities, Tracing, Spans, and Framework Integrations, Frontend Application (React/TypeScript)

Overview and Scope

The comet-ml/opik repository is a polyglot monorepo that delivers Opik — an open-source LLM evaluation, tracing, and monitoring platform developed by Comet. It ships several independently versioned components under a single source tree: a Java backend service, a React-based frontend, a documentation application, a guardrails microservice, plus Python and TypeScript SDKs and their respective integration packages. The repository's self-describing READMEs emphasize that the project is organized so contributors can navigate to the specific application or SDK they need to modify without coupling concerns across languages.

The release cadence is tracked centrally — the most recent published version visible in the community changelog is 2.0.73, which bundles backend fixes, SDK improvements, and OpenAPI/Fern code generation automation in a single release. Source: scripts/README.md (release notes summary).

Top-Level Repository Layout

The root of the repository is divided into four primary top-level directories that map cleanly to deliverable surfaces. Each top-level area contains its own README pointing contributors to a CONTRIBUTING.md guide.

| Directory | Purpose | Source |
| --- | --- | --- |
| apps/ | Deployable services (backend, frontend, documentation, guardrails) | apps/opik-backend/README.md, apps/opik-frontend/README.md |
| sdks/ | Client libraries (Python, TypeScript) and third-party integrations | sdks/python/README.md, sdks/typescript/design/README.md |
| scripts/ | Repository-level tooling: OpenAPI generation, dev-runner, codex sync | scripts/README.md |
| tests_end_to_end/ | Browser-level Playwright tests with agentic planning/healer tooling | tests_end_to_end/typescript-tests/README.md |

The architecture is intentionally layered so that cross-cutting concerns (such as the API contract and generated client code) are owned by scripts/generate_openapi.sh and consumed by both the SDKs and the documentation app. Source: scripts/README.md.

Application Services (`apps/`)

The apps/ directory hosts the runtime surfaces that compose the Opik platform.

Backend

apps/opik-backend is the Java service that powers Opik's API. Its README defers contribution guidance to CONTRIBUTING.md, and the surrounding tooling — including Liquibase migrations and the Dockerfile referenced in the build pipeline — establishes it as the canonical server implementation. The backend exposes the OpenAPI specification that downstream SDKs are generated against. Source: apps/opik-backend/README.md.

Frontend

apps/opik-frontend is a React/TypeScript SPA built around Zustand stores and a shadcn/ui + Radix component layer. Notably, the frontend ships two parallel navigation generations under a single entry point:

  • src/v1/ — Opik 1: feature-organized navigation (layout/, pages/, pages-shared/).
  • src/v2/ — Opik 2: project-first navigation with the same substructure.

Both versions share a common import-direction rule: ui → shared → v1/pages-shared → v1/pages (and analogously for v2). The router in src/router.tsx and entry point src/index.tsx orchestrate both. Source: apps/opik-frontend/README.md. Each version also has its own lib/utils.ts that prefixes documentation URLs with /v1 or /v2 respectively, indicating the docs site mirrors the same versioning. Source: apps/opik-frontend/src/v1/lib/utils.ts, apps/opik-frontend/src/v2/lib/utils.ts.

Documentation

apps/opik-documentation is a Fern-based docs site. A templates subdirectory formalizes four integration documentation archetypes (Code, OpenAI-Based, LiteLLM, OpenTelemetry) and prescribes screenshot placement under fern/img/tracing/. Source: apps/opik-documentation/documentation/templates/README.md.

Guardrails Backend

apps/opik-guardrails-backend is a separate microservice exposing a /guardrails endpoint for TOPIC and PII validation. It accepts a JSON payload of validations and returns a structured result, with entity/topic thresholds configurable per request. Source: apps/opik-guardrails-backend/README.md.

SDKs and Integration Packages (`sdks/`)

Python SDK

sdks/python/ ships the opik PyPI package. The Python SDK design is documented in sdks/python/design/README.md, which exposes four contributor guides: *API and Data Flow*, *Testing*, *Integrations*, and *Evaluation*. Each describes a 3-layer architecture with sync vs. async paths and batched message processing. Source: sdks/python/design/README.md.

TypeScript SDK and Integrations

The TypeScript SDK mirrors the Python design, with guides for *API and Data Flow*, *Tracing*, *Testing*, *Integrations*, and *Evaluation*. Under sdks/typescript/src/opik/integrations/, each integration is published as a standalone npm package with explicit peer-dependency pinning:

| Integration | Peer Dependency | Source |
| --- | --- | --- |
| opik-openai | openai: ^6.0.1, opik: ^1.8.61 | package.json |
| opik-langchain | `@langchain/core: ^0.3.78 \|\| ^1.0.0, opik: ^1.8.75` | package.json |
| opik-gemini | @google/genai: >=1.0.0, opik: ^1.7.25 | package.json |
| opik-configure | (CLI; depends on axios, inquirer, magicast, posthog-node) | package.json |

Each integration package bundles its own build/lint/test scripts via tsup, eslint, and vitest, allowing them to ship independently. Source: sdks/typescript/src/opik/integrations/opik-openai/package.json. Community requests such as #1587 (n8n support) and #4507 (additional model cost entries) demonstrate ongoing demand for new integration packages and provider cost data. Source: [community context — issues #1587, #4507].
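The caret pins in the table above follow npm's range rules: a caret range accepts any version that does not change the left-most non-zero component. The sketch below is a simplified checker for plain `MAJOR.MINOR.PATCH` versions (no pre-release tags, no `||` alternatives); `parse` and `satisfies_caret` are illustrative helpers, not part of any Opik package.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, caret_range: str) -> bool:
    """Return True if `version` falls inside an npm-style caret range."""
    if not caret_range.startswith("^"):
        raise ValueError("expected a caret range like '^1.2.3'")
    low = parse(caret_range[1:])
    major, minor, patch = low
    # The upper bound bumps the left-most non-zero component.
    if major > 0:
        high = (major + 1, 0, 0)
    elif minor > 0:
        high = (0, minor + 1, 0)
    else:
        high = (0, 0, patch + 1)
    return low <= parse(version) < high


# The three opik pins from the table: one installed version can satisfy all of them.
opik_pins = ["^1.8.61", "^1.8.75", "^1.7.25"]
compatible = all(satisfies_caret("1.8.75", pin) for pin in opik_pins)
```

Under these rules a single installed `opik` 1.8.75 satisfies every pin listed, while any 2.x release would satisfy none of them.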

Supporting Tooling

Repository Scripts

scripts/dev-runner.sh provides a Docker-backed local development orchestrator supporting --restart, --backend, --frontend, and related subcommands for managing infrastructure, backend, and frontend processes. scripts/generate_openapi.sh and scripts/start_openapi_server.sh keep the OpenAPI specification (used by the SDKs via Fern) and a Redoc preview server in lockstep. scripts/sync-codex.sh syncs .agents/rules/*.mdc into a Codex-friendly AGENTS.override.md. Source: scripts/README.md.

End-to-End Testing

tests_end_to_end/typescript-tests/ contains a Playwright-based suite augmented with AI agents, including a Planner that turns feature specs into *.spec.ts files (saving under tests/{feature-area}/{test-name}.spec.ts) and a Healer that retries and patches failing tests. The generated tests use the same fixtures and page-object patterns as the manual suite. Source: tests_end_to_end/typescript-tests/README.md.

High-Level Architecture

flowchart LR
    subgraph Clients
        PySDK["Python SDK<br/>(sdks/python)"]
        TSSDK["TypeScript SDK<br/>(sdks/typescript)"]
        Integ["Integrations<br/>opik-openai / opik-langchain / opik-gemini"]
    end
    subgraph Apps["apps/"]
        Backend["opik-backend<br/>(Java API)"]
        Frontend["opik-frontend<br/>(v1 + v2 SPA)"]
        Docs["opik-documentation<br/>(Fern)"]
        Guards["opik-guardrails-backend"]
    end
    Scripts["scripts/<br/>OpenAPI · dev-runner"]
    E2E["tests_end_to_end/<br/>Playwright + AI agents"]
    PySDK --> Backend
    TSSDK --> Backend
    Integ --> TSSDK
    Frontend --> Backend
    Guards --> Backend
    Scripts -. generates .-> PySDK
    Scripts -. generates .-> TSSDK
    Scripts -. generates .-> Docs
    E2E --> Frontend
    E2E --> Backend

Common Cross-Cutting Concerns

  • API contract: The backend's OpenAPI spec is the single source of truth; both SDKs and documentation are regenerated from it via scripts/generate_openapi.sh. Source: scripts/README.md.
  • Versioning: Independent package versioning is evident in the peer-dependency pins (e.g., opik-openai requires opik ^1.8.61, while opik-gemini requires opik ^1.7.25). Source: sdks/typescript/src/opik/integrations/opik-openai/package.json.
  • Authentication: Currently, the open-source distribution does not include an authentication layer — community issue #949 explicitly requests one comparable to Langfuse. This is a known architectural gap worth tracking when evaluating deployment. Source: [community context — issue #949].
  • SDK evaluation API: Issue #420 reports that project_name cannot be passed through the @opik.track decorator when invoked inside opik.evaluation.evaluate, illustrating tight coupling between the tracing decorator and evaluation entry points in the Python SDK. Source: [community context — issue #420].

Tracing, Spans, and Framework Integrations

Related topics: System Architecture and Repository Layout, Evaluation, Datasets, Experiments, and LLM-as-Judge, Authentication, Authorization, and Workspaces

Overview

Opik is an open-source LLM evaluation and observability platform by Comet that centers on tracing as its primary signal for understanding LLM applications. Every API call, retrieval step, tool invocation, and agent turn can be recorded as a hierarchical record composed of traces and spans, then visualized in the Opik UI for debugging and evaluation. The repository ships two parallel SDKs — Python and TypeScript — and a growing catalog of integrations that adopt different patching strategies depending on the target framework.

The integration matrix in the main README.md lists supported frameworks such as LangChain, LlamaIndex, OpenAI, Anthropic, Haystack, CrewAI, DSPy, Semantic Kernel, Strands Agents, Spring AI, Vercel AI SDK, and many others. A separate column for OpenTelemetry-based capture extends coverage to languages without a first-class SDK, and community requests (e.g. issue #1587 asking for n8n support) continue to expand the matrix.

The Trace and Span Data Model

A trace in Opik represents one end-to-end execution of a logical unit of work — typically one user request, one agent run, or one evaluation task. Spans are the children of a trace and represent a single operation inside it, such as an LLM call, a vector search, or a tool call. The Python SDK exposes @opik.track to instrument functions, while the TypeScript SDK exposes a track decorator backed by AsyncLocalStorage for correct async context propagation, as documented in sdks/typescript/design/TRACING.md.
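A minimal way to picture this data model is a trace that owns a tree of spans. The classes below are an illustrative sketch only: the field names, the `span_type` labels, and the nesting of spans under other spans are assumptions made for this example, not the SDK's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Span:
    """One operation inside a trace, e.g. an LLM call or a tool call."""

    name: str
    span_type: str = "general"  # illustrative labels such as "llm" or "tool"
    input: dict[str, Any] = field(default_factory=dict)
    output: dict[str, Any] = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[tuple[int, "Span"]]:
        """Yield this span and all descendants, depth-first."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


@dataclass
class Trace:
    """One end-to-end execution, such as a single user request."""

    name: str
    spans: list[Span] = field(default_factory=list)

    def outline(self) -> list[str]:
        """Render the hierarchy as indented lines for quick inspection."""
        lines = [self.name]
        for span in self.spans:
            for depth, node in span.walk(1):
                lines.append("  " * depth + f"{node.name} [{node.span_type}]")
        return lines


trace = Trace(
    name="answer_question",
    spans=[
        Span("retrieve_context", "tool"),
        Span("agent_step", children=[Span("call_model", "llm")]),
    ],
)
print("\n".join(trace.outline()))
```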

Internally, the SDKs push trace and span events through asynchronous batch queues before flushing them to the Opik backend. The Python SDK design doc (sdks/python/design/README.md) emphasizes a three-layer architecture and a batching system tuned for throughput, while the TypeScript design doc describes the equivalent batch-queue pipeline. Both SDKs auto-generate their REST clients from the server's OpenAPI specification using Fern — see scripts/generate_openapi.sh — so any new trace or span field added to the backend becomes available to SDK users on the next regeneration cycle.
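The batching idea can be sketched with the standard library alone. The class below is a simplified illustration, not the SDKs' actual queue: `BatchQueue`, its parameters, and the `send` callback are hypothetical, and a production implementation would also need retries, size limits, and error handling.

```python
import queue
import threading
from typing import Any, Callable


class BatchQueue:
    """Collect events and flush them in batches on a background thread."""

    def __init__(self, send: Callable[[list[Any]], None], batch_size: int = 3,
                 flush_interval_s: float = 0.05) -> None:
        self._send = send
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self._events: queue.Queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def put(self, event: Any) -> None:
        # The caller only enqueues; network work happens on the worker thread.
        self._events.put(event)

    def _drain(self) -> list[Any]:
        """Take up to batch_size events without blocking."""
        batch: list[Any] = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._events.get_nowait())
            except queue.Empty:
                break
        return batch

    def _run(self) -> None:
        # Keep going until asked to stop and the queue has been emptied.
        while not self._stop.is_set() or not self._events.empty():
            batch = self._drain()
            if batch:
                self._send(batch)
            else:
                self._stop.wait(self._flush_interval_s)

    def close(self) -> None:
        """Flush remaining events and stop the worker."""
        self._stop.set()
        self._worker.join()


sent: list[list[Any]] = []
events = BatchQueue(send=sent.append, batch_size=3)
for i in range(7):
    events.put({"span": i})
events.close()
```

How the seven events are split across batches depends on thread timing, but no batch exceeds the configured size, and because a single worker drains a FIFO queue, the events arrive in the order they were enqueued.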

Framework Integration Patterns

Opik supports four distinct integration patterns, classified in apps/opik-documentation/documentation/templates/README.md:

| Pattern | When used | Example |
| --- | --- | --- |
| Code integration | User modifies code; uses track_*() wrappers | LangChain, CrewAI, DSPy, Haystack |
| OpenAI-compatible integration | Target exposes an OpenAI-style client | BytePlus, OpenRouter, any OpenAI-compatible API |
| LiteLLM integration | Provider is supported by LiteLLM; uses OpikLogger callback | OpenAI, Anthropic, Groq, Fireworks, Cohere, Mistral, xAI Grok |
| OpenTelemetry integration | No code changes; configured via OTEL env vars | Ruby SDK, Pydantic AI via Logfire, direct OTEL Python |

The TypeScript SDK further distinguishes Proxy, Callback, and Exporter patterns in sdks/typescript/design/INTEGRATIONS.md. Each integration package is published as its own npm module — for example, sdks/typescript/src/opik/integrations/opik-gemini/package.json declares opik as a peer dependency and uses tsup to produce both ESM and CommonJS bundles, mirroring how every official integration is structured.

flowchart LR
    UserCode[User Code / Framework] -->|track decorator or callback| SDK[Opik SDK]
    SDK -->|batched events| Queue[Async Batch Queue]
    Queue -->|HTTP REST| Backend
    OTEL[OpenTelemetry SDK] -->|OTLP| Backend
    Backend --> UI[Opik UI / API]
    UI --> Eval[Evaluation & Metrics]

OpenTelemetry and Cross-Language Capture

For runtimes where Opik does not yet provide a first-class SDK, the backend accepts OpenTelemetry traces via the OTLP endpoint configured through environment variables such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and OTEL_TRACES_EXPORTER. Recent releases (e.g. PR #7152, mentioned in the community changelog) improved surfacing of OTel span errors, tool outputs, and Google provider cost data for Pydantic AI users — illustrating that the OTEL path is treated as a first-class ingestion route rather than a fallback. The frontend's Nginx configuration (apps/opik-frontend/README.md) forwards traces to a dedicated otel-collector service, separating ingest from the application HTTP traffic.

Community-Driven Extensions

Several highly-discussed community issues shape the tracing roadmap. Issue #949 requests an authentication and authorization layer for the open-source deployment, which would protect trace data behind per-project access control. Issue #420 reports that project_name cannot be passed to the @opik.track decorator when used together with opik.evaluation.evaluate, causing traces to land in the wrong project — a notable failure mode for users who run evaluations inside scripts. Issue #4507 highlights the importance of model_prices_and_context_window.json, the central registry used by integrations to compute cost; missing entries produce zero-cost spans in the UI.
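The zero-cost failure mode is easy to see in a small sketch. The lookup below assumes a LiteLLM-style registry with per-token prices stored under `input_cost_per_token` and `output_cost_per_token`; the model key and the prices are placeholders invented for the example, and `span_cost` is a hypothetical helper rather than Opik's cost code.

```python
from typing import Mapping

# Illustrative registry entry; the prices are placeholders, not real rates.
PRICES: dict[str, dict[str, float]] = {
    "example/known-model": {
        "input_cost_per_token": 0.000001,
        "output_cost_per_token": 0.000002,
    },
}


def span_cost(model: str, prompt_tokens: int, completion_tokens: int,
              prices: Mapping[str, Mapping[str, float]] = PRICES) -> float:
    """Estimate a span's cost; unknown models silently cost nothing."""
    entry = prices.get(model)
    if entry is None:
        # A missing registry entry is indistinguishable from a free call.
        return 0.0
    return (prompt_tokens * entry["input_cost_per_token"]
            + completion_tokens * entry["output_cost_per_token"])


known = span_cost("example/known-model", 1000, 500)
unknown = span_cost("qwen/qwen3-235b-a22b-2507", 1000, 500)
```

Because the fallback is a plain zero, a dashboard cannot tell an unpriced model from a free one, which is why requests for new registry entries matter for cost analytics.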

See Also

  • Evaluation and metrics
  • Python SDK architecture
  • TypeScript SDK architecture
  • Deployment and Docker infrastructure

Source: README.md:1-50, sdks/python/design/README.md:1-40, sdks/typescript/design/README.md:1-50, apps/opik-documentation/documentation/templates/README.md:1-40, apps/opik-frontend/README.md:1-60, scripts/README.md:1-30

Evaluation, Datasets, Experiments, and LLM-as-Judge

Related topics: Tracing, Spans, and Framework Integrations, Agent and Prompt Optimization (Opik Optimizer), Frontend Application (React/TypeScript)

Overview and Purpose

Opik is an open-source LLM evaluation and observability platform. The project exposes a unified system in which datasets hold representative test inputs, experiments record the output of running a candidate pipeline (or prompt) against those inputs, and evaluation functions score those outputs. The LLM-as-Judge pattern is one of the supported scoring strategies and is used when a metric is best approximated by another language model rather than deterministic logic.

The repository separates this concern across three layers:

  • A Python SDK that ships the user-facing evaluation API, dataset management, and metric base classes.
  • A TypeScript SDK that mirrors the core flow (dataset/, experiment/, evaluation/ modules) for browser and Node.js consumers.
  • A backend service and frontend that persist datasets, experiments, and feedback scores and render them in the UI.

The Python design guide lists Evaluation.md as a dedicated contributor document covering the evaluation engine, the four evaluation methods, and the metrics architecture (Source: sdks/python/design/README.md). The TypeScript design guide does the same and points at a BaseMetric abstraction for adding new metrics (Source: sdks/typescript/design/README.md).

The opik_optimizer package sits on top of this stack: it consumes the same dataset and metric primitives to run prompt-optimization algorithms (including GepaOptimizer) that rely on evaluation feedback to drive search. Source: sdks/opik_optimizer/README.md.

Datasets and Experiments

Datasets and experiments are the durable artefacts of the evaluation system. A dataset is a versioned collection of items (typically input/output pairs) used as test cases; an experiment is the result of executing a candidate pipeline, prompt, or model against a dataset and capturing the traces, outputs, and feedback scores that were produced.

In the Python SDK, the design guide calls out dataset/, experiment/, and evaluation/ as first-class submodules inside opik/, alongside tracer/, prompt/, and query/. Source: sdks/python/design/README.md. The TypeScript SDK mirrors this layout, grouping Dataset management, Experiment tracking, and Evaluation engine and metrics under their own directories. Source: sdks/typescript/design/README.md.

flowchart LR
    A[Dataset items] --> B[Experiment run]
    B --> C[Traces / Spans]
    C --> D[Metric scoring]
    D --> E[Feedback scores]
    E --> F[Experiment report]

The data flow is consistent across SDKs: items leave the dataset, are fed into a tracked function or model call, become traces and spans, are scored by one or more metrics, and finally surface as feedback scores attached to the experiment. The TypeScript design guide notes that the tracer/ module owns the Trace and Span objects that the evaluation layer reads from. Source: sdks/typescript/design/README.md, sdks/python/design/README.md.
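The flow from dataset items to feedback scores can be sketched without any SDK. Everything below is a toy stand-in: `run_experiment`, `summarize`, `FeedbackScore`, and the item keys `input` and `expected_output` are hypothetical names chosen for this example, and the tracing step is omitted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable


@dataclass
class FeedbackScore:
    name: str
    value: float


def exact_match(expected: str, actual: str) -> FeedbackScore:
    """Deterministic metric: 1.0 when the strings match after trimming."""
    matched = expected.strip() == actual.strip()
    return FeedbackScore("exact_match", 1.0 if matched else 0.0)


def run_experiment(dataset: list[dict[str, str]],
                   task: Callable[[str], str],
                   metrics: list[Callable[[str, str], FeedbackScore]]) -> list[dict[str, Any]]:
    """Run the task on every item and attach one score per metric."""
    rows = []
    for item in dataset:
        output = task(item["input"])
        scores = [metric(item["expected_output"], output) for metric in metrics]
        rows.append({"item": item, "output": output, "scores": scores})
    return rows


def summarize(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Average each metric across the experiment."""
    by_name: dict[str, list[float]] = {}
    for row in rows:
        for score in row["scores"]:
            by_name.setdefault(score.name, []).append(score.value)
    return {name: mean(values) for name, values in by_name.items()}


dataset = [
    {"input": "2+2", "expected_output": "4"},
    {"input": "capital of France", "expected_output": "Paris"},
]
answers = {"2+2": "4", "capital of France": "Lyon"}  # stand-in for a model
rows = run_experiment(dataset, task=lambda q: answers[q], metrics=[exact_match])
report = summarize(rows)
```

Here one of the two items matches its expected output, so the experiment-level average for `exact_match` is 0.5.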

A community-reported bug illustrates the close coupling between project names, the track decorator, and the evaluation entry point: passing project_name to the decorator while running opik.evaluation.evaluate was reported as broken in version 0.2.1 and was tracked in issue #420. The fix path lies in correctly threading project context through the evaluation entry point, not just the decorator.

Evaluation Methods and Metrics

Opik's evaluation engine supports multiple methods so users can choose the granularity that matches their pipeline. The Python design guide enumerates "all 4 evaluation methods" and a dedicated Metrics Architecture section. Source: sdks/python/design/README.md. Concretely, these methods typically include: scoring an entire experiment offline, scoring a single task invocation, scoring using a custom mapping function, and using a built-in LLM-as-Judge metric.

The metrics architecture is deliberately pluggable. In TypeScript, a BaseMetric class is the canonical extension point, and the design guide recommends reading the "Metrics Architecture, BaseMetric" section when adding a new scorer. Source: sdks/typescript/design/README.md. In Python, the equivalent is documented in the Evaluation.md contributor doc listed by the design guide. Source: sdks/python/design/README.md.

The optimizer package reinforces this design: it consumes the same ChatPrompt and metric objects that evaluation does, then uses metric output to drive prompt search. Source: sdks/opik_optimizer/README.md — its GepaOptimizer example runs an agent and a tool-equipped prompt against a dataset and a user-supplied metric, exactly the same contract evaluation uses.

LLM-as-Judge and Integration Templates

LLM-as-Judge is the metric pattern in which a language model is prompted with the candidate output (and optionally reference output) and asked to return a structured verdict — typically a score plus a reason. The Integration Documentation Templates in the documentation app make the available metric integration patterns explicit, which is relevant because a judge metric is itself implemented as an integration with a model provider. Source: apps/opik-documentation/documentation/templates/README.md.
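The "score plus reason" contract can be sketched as follows. The prompt wording, the JSON shape, and the names `judge`, `Verdict`, and `fake_model` are assumptions made for illustration; a real judge metric would call a model provider where the stub is used here.

```python
import json
from dataclasses import dataclass
from typing import Callable

JUDGE_TEMPLATE = (
    "Rate how well the answer matches the reference on a 0-1 scale.\n"
    "Answer: {output}\nReference: {reference}\n"
    'Reply with JSON: {{"score": <number>, "reason": "<text>"}}'
)


@dataclass
class Verdict:
    score: float
    reason: str


def judge(output: str, reference: str, call_model: Callable[[str], str]) -> Verdict:
    """Ask a model for a structured verdict and parse it defensively."""
    prompt = JUDGE_TEMPLATE.format(output=output, reference=reference)
    data = json.loads(call_model(prompt))
    # Clamp the score so a misbehaving judge cannot leave the 0-1 range.
    score = min(1.0, max(0.0, float(data["score"])))
    return Verdict(score=score, reason=str(data.get("reason", "")))


def fake_model(prompt: str) -> str:
    """Stub standing in for a provider call; always returns the same verdict."""
    return json.dumps({"score": 0.9, "reason": "Matches the reference."})


verdict = judge("Paris", "Paris", fake_model)
```

Injecting `call_model` as a parameter keeps the parsing logic testable and makes the provider choice (direct client, OpenAI-compatible endpoint, or LiteLLM) a separate decision.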

The template matrix in that README defines four integration shapes that a judge metric can take:

| Integration Type | Judge Pattern |
| --- | --- |
| Code Integration | User-supplied Python class implementing the metric interface, then registered with track_*() wrappers |
| OpenAI-Based Integration | Reuses track_openai() to score completions from any OpenAI-compatible endpoint |
| LiteLLM Integration | Routes the judge call through OpikLogger and LiteLLM's unified interface |
| OpenTelemetry Integration | The judge emits OTel spans consumed by Opik through the OTEL endpoint, requiring no code changes |

Source: apps/opik-documentation/documentation/templates/README.md.

A note on community gaps: a frequent request in the project's issue tracker is to keep the upstream model_prices_and_context_window.json registry current (for example, to add openrouter/qwen/qwen3-235b-a22b-2507 with the Cerebras provider, issue #4507). LLM-as-Judge metrics inherit this dependency because accurate cost tracking requires the registry to know about the judge model. Source: issue #4507.

Operational Considerations

  • End-to-end coverage. The Playwright-based end-to-end test harness ships a "Planner / Generator / Healer" agent workflow that can produce tests for evaluation features such as project-metrics, dataset-upload, and experiment comparison. This means an evaluation change in the backend typically expects a matching E2E spec under tests/. Source: tests_end_to_end/typescript-tests/README.md.
  • Configuration tooling. The TypeScript SDK ships a configure CLI (invoked as npx opik-ts configure) used to bootstrap project setup, including any environment variables the evaluation engine reads. Source: sdks/typescript/src/opik/configure/package.json.
  • Backend persistence. Datasets, experiments, and feedback scores are persisted by the Java backend and surfaced through the React frontend; the frontend README groups project-level views under v1/pages and v2/pages, and a "project-first navigation" in v2 is the new entry point for evaluation dashboards. Source: apps/opik-frontend/README.md.

Agent and Prompt Optimization (Opik Optimizer)

Related topics: Evaluation, Datasets, Experiments, and LLM-as-Judge, Frontend Application (React/TypeScript)

Overview and Purpose

The Opik Agent Optimizer is a dedicated Python package, distributed on PyPI as opik-optimizer, that refines prompts and LLM call parameters to improve model performance. It lives alongside the core Opik Python SDK and is described as a component of the broader Opik evaluation platform by Comet (sdks/opik_optimizer/README.md:1-7).

Its position in the Opik ecosystem is complementary to the tracing and evaluation features of the main SDK. The main Opik Python SDK focuses on capturing, tracing, and evaluating LLM calls (sdks/python/README.md:1-15), while the optimizer operates on prompts themselves — iteratively improving them through automated search strategies. The high-level platform goal, as expressed in the top-level README, is to help teams "build, evaluate, and optimize LLM systems that run better, faster, and cheaper," and the optimizer is the component that addresses the "optimize" portion of that workflow (README.md:1-30).

The optimizer exposes a standardized API, which is its central design principle: every algorithm in the package implements the same optimize_prompt() method, returns a standardized OptimizationResult object, and supports chaining, multimodal prompts (text, image, audio, video), Model Context Protocol (MCP) tool calling via ChatPrompt.tools, and built-in LLM/tool call counters (sdks/opik_optimizer/README.md:30-45). This uniformity means that swapping one algorithm for another is a single-line change.

Optimizer Algorithms

Six optimizer algorithms ship in the package, each targeting a different strategy for prompt improvement (sdks/opik_optimizer/README.md:9-20):

| Algorithm | Strategy |
| --- | --- |
| EvolutionaryOptimizer | Genetic algorithms for prompt evolution |
| FewShotBayesianOptimizer | Few-shot learning combined with Bayesian optimization |
| GepaOptimizer | GEPA (Genetic-Pareto) optimization approach |
| HRPO | Hierarchical root-cause analysis to refine prompts from synthesized failure modes |
| MetaPromptOptimizer | Meta-prompting techniques |
| ParameterOptimizer | Bayesian optimization of LLM call parameters (e.g. temperature, top_p) |

Because all six share the same optimize_prompt() interface, the user picks an algorithm based on the *kind* of change that is suspected to help — wording changes (meta-prompting, evolutionary), demonstration selection (few-shot Bayesian), parameter tuning (Bayesian on temperature/top-p), or root-cause-driven rewrites (HRPO) — rather than learning a new API per algorithm. The "Optimizer Chaining" feature lets the output of one optimizer become the input of the next, so practitioners can compose, for example, a MetaPromptOptimizer pass followed by a ParameterOptimizer pass (sdks/opik_optimizer/README.md:35-40).
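The shared-interface and chaining idea can be illustrated with a small, self-contained sketch. Everything here is a stand-in: `Result`, `AppendHintOptimizer`, `chain`, and `keyword_metric` are hypothetical names, and the simplified `optimize_prompt(prompt, metric)` signature is an assumption that does not reproduce the real opik-optimizer API. The point is only that optimizers sharing one method and one result type can be composed in a loop.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol

Metric = Callable[[str], float]


@dataclass(frozen=True)
class Result:
    """Stand-in for OptimizationResult; the field names are assumptions."""
    prompt: str
    score: float
    llm_calls: int


class Optimizer(Protocol):
    """Any object with this method can take part in a chain."""
    def optimize_prompt(self, prompt: str, metric: Metric) -> Result: ...


class AppendHintOptimizer:
    """Toy optimizer: appends fixed hints and keeps the best-scoring prompt."""

    def __init__(self, hints: List[str]) -> None:
        self.hints = hints

    def optimize_prompt(self, prompt: str, metric: Metric) -> Result:
        candidates = [prompt] + [f"{prompt} {hint}" for hint in self.hints]
        # One metric evaluation per candidate stands in for one LLM call.
        scored = [(metric(candidate), candidate) for candidate in candidates]
        score, best = max(scored, key=lambda pair: pair[0])
        return Result(prompt=best, score=score, llm_calls=len(candidates))


def chain(optimizers: List[Optimizer], prompt: str, metric: Metric) -> Result:
    """Feed each optimizer's best prompt into the next one."""
    result = Result(prompt=prompt, score=metric(prompt), llm_calls=1)  # baseline
    total_calls = result.llm_calls
    for optimizer in optimizers:
        result = optimizer.optimize_prompt(result.prompt, metric)
        total_calls += result.llm_calls
    return Result(result.prompt, result.score, total_calls)


def keyword_metric(prompt: str) -> float:
    """Toy metric: counts how many target keywords the prompt contains."""
    return float(sum(word in prompt.lower() for word in ("concise", "cite")))


first = AppendHintOptimizer(["Be concise.", "Be friendly."])
second = AppendHintOptimizer(["Cite sources.", "Use emojis."])
final = chain([first, second], "Answer the question.", keyword_metric)
print(final.prompt, final.score, final.llm_calls)
```

Because `chain` only relies on the shared method and result type, reordering or replacing the optimizers in the list requires no other change, which is the property the standardized interface is meant to provide.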

Standardized API and Result Model

The optimize_prompt() method is the single entry point. It accepts a ChatPrompt (which can carry tools for MCP tool calling and multimodal content parts), runs the chosen search strategy, and returns an OptimizationResult (sdks/opik_optimizer/README.md:30-45). The result is consumed identically regardless of the algorithm used, which is what enables chaining and comparison.

LLM access is delegated to LiteLLM, so the optimizer inherits LiteLLM's broad provider support. Provider credentials are passed through standard environment variables such as OPENAI_API_KEY, configured before the optimizer is invoked (sdks/opik_optimizer/README.md:50-90). The TypeScript SDK's environment-constant module demonstrates the same pattern of standardized environment variable names — OPIK_API_KEY, OPIK_URL_OVERRIDE, OPIK_WORKSPACE, OPIK_PROJECT_NAME — which the Python ecosystem mirrors (sdks/typescript/src/opik/configure/src/lib/env-constants.ts:1-30).
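Since credentials and Opik settings arrive through environment variables, a short preflight check before a long run can save a failed search. The sketch below uses only the standard library; `missing_variables` is a hypothetical helper, and which variables count as required is an assumption that depends on the provider and on whether results are logged to Opik.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_variables(
    required: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name)]


# Example with an explicit mapping so the check is reproducible.
example_env = {"OPENAI_API_KEY": "", "OPIK_API_KEY": "placeholder"}
missing = missing_variables(
    ["OPENAI_API_KEY", "OPIK_API_KEY", "OPIK_WORKSPACE"], example_env
)
print(missing)  # ['OPENAI_API_KEY', 'OPIK_WORKSPACE']
```

Calling `missing_variables([...])` without the second argument checks the real process environment instead of the example mapping.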

A high-level data flow for a typical optimization run is:

flowchart LR
    A[Define ChatPrompt] --> B[Choose Optimizer]
    B --> C[optimize_prompt]
    C --> D[LiteLLM provider calls]
    D --> E[Evaluation signal]
    E --> F[OptimizationResult]
    F --> G{Chain next optimizer?}
    G -- yes --> B
    G -- no --> H[Deploy refined prompt]

Configuration, Setup, and Integration

The package is installed separately from the main SDK. Standard installation uses pip install opik-optimizer, with uv pip install opik-optimizer offered as a faster alternative (sdks/opik_optimizer/README.md:60-80). For teams that want optimizer runs to be recorded alongside traces and datasets, the documentation recommends also installing the main opik package and running opik configure to set the API key and workspace.

The integration template catalog in the documentation repo classifies the optimizer as a code integration: users install the Opik Python SDK, modify their code, and use Opik wrapper functions directly (apps/opik-documentation/documentation/templates/README.md:5-15). Optimizer code is therefore expected to live in the same Python project as the application being optimized.

The optimizer is a Python-only component at the time of writing. The TypeScript SDK has no equivalent optimization engine; its design docs cover tracing, evaluation, and integrations, but not prompt optimization (sdks/typescript/design/README.md:1-15). Teams with TypeScript/Node-based agents can run the optimizer on the Python side and copy the refined prompt back into the application.

Common Failure Modes and Operational Notes

  1. Missing provider credentials. Because the optimizer routes through LiteLLM, an unset OPENAI_API_KEY (or the equivalent for the chosen provider) will cause every candidate evaluation to fail. Set the relevant environment variable before invoking optimize_prompt() (sdks/opik_optimizer/README.md:85-95).
  2. Confusing tracing and optimization. The opik SDK and opik-optimizer package are separate. Calling opik configure configures tracing, evaluation, and dataset logging — not the optimizer itself. For full experiment tracking, configure *both* the SDK and the optimizer's LiteLLM credentials (sdks/opik_optimizer/README.md:65-80).
  3. Deprecated parameters. The optimizer emits deprecation warnings and preserves old parameters via kwargs extraction, so code using older signatures still runs. Users should still address the warnings to benefit from the standardized interface (sdks/opik_optimizer/README.md:40-45).
  4. No optimizer on the TypeScript side. Attempting to call optimization routines from a TypeScript or Node application will not work; the optimizer is Python-only at this time (sdks/typescript/design/README.md:1-15).
  5. Cost amplification. Bayesian and evolutionary search strategies issue many LLM calls per run. Built-in LLM and tool call counters on the result object exist precisely so users can monitor usage; large search budgets should be sized with these counters in mind (sdks/opik_optimizer/README.md:35-45).
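To make the cost-amplification point concrete, the following is a rough back-of-the-envelope estimate rather than how any Opik optimizer actually counts calls. It assumes every candidate prompt is scored once against every dataset item with exactly one LLM call, and ignores calls spent generating candidates; `estimate_llm_calls` and its parameter names are hypothetical.

```python
def estimate_llm_calls(candidates_per_round: int, rounds: int, dataset_size: int) -> int:
    """Rough upper bound: one LLM call per candidate per dataset item."""
    if min(candidates_per_round, rounds, dataset_size) < 0:
        raise ValueError("all arguments must be non-negative")
    return candidates_per_round * rounds * dataset_size


def within_budget(estimated_calls: int, max_calls: int) -> bool:
    """Check an estimate against a call budget chosen by the team."""
    return estimated_calls <= max_calls


estimate = estimate_llm_calls(candidates_per_round=8, rounds=5, dataset_size=40)
print(estimate)                       # 1600
print(within_budget(estimate, 1000))  # False
```

Comparing such an estimate with the call counters reported after a small trial run gives a more reliable basis for sizing a full search.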

Source: https://github.com/comet-ml/opik / Human Manual

Frontend Application (React/TypeScript)

Related topics: System Architecture and Repository Layout, Tracing, Spans, and Framework Integrations, Evaluation, Datasets, Experiments, and LLM-as-Judge, Agent and Prompt Optimization (Opik Optimizer)

Overview and Purpose

The Opik frontend is a single-page React/TypeScript application that serves as the primary user interface for the open-source LLM observability platform. It lives in apps/opik-frontend and is consumed alongside the Python and TypeScript SDKs, the backend services, and the documentation site. Source: apps/opik-frontend/package.json.

The frontend is responsible for rendering trace explorers, experiment comparisons, prompt playgrounds, evaluation dashboards, and project management screens. It calls the Opik REST API directly and depends on the same domain model exposed by the TypeScript SDK. Source: sdks/typescript/design/README.md.

Community issues confirm that this UI is the surface users interact with daily. For example, bug #420 ("Cannot pass project_name to track decorator when using opik.evaluation.evaluate") lists "Opik UI" as one of the affected components, and feature request #949 asks for an authentication layer to be added "in front of the application" for the open-source build — both touch the frontend directly.

Tech Stack and Build Tooling

The frontend is a Vite-powered React 18 application written in TypeScript, styled with Tailwind CSS and SCSS modules. Key runtime and tooling dependencies declared in apps/opik-frontend/package.json include:

| Concern | Library |
| --- | --- |
| UI primitives / layout | react, react-dom, react-grid-layout, react-resizable-panels |
| Forms | react-hook-form |
| Data fetching / state | use-query-params, use-local-storage-state, react-intersection-observer |
| Charts | recharts |
| Markdown / sanitization | react-markdown, remark-gfm, rehype-sanitize, sanitize-html |
| Media playback | react-player, react-h5-audio-player, react-pdf |
| Linting / build | eslint, eslint-plugin-react, stylelint, vite, tsup (in SDKs), dependency-cruiser |

The same file declares React 18 types (@types/react ^18.3.3, @types/react-dom ^18.3.0) and TypeScript-aware ESLint plugins. Lint-staged with Prettier is wired in for pre-commit formatting. Source: apps/opik-frontend/package.json.

v1 / v2 Module Structure

A noteworthy architectural detail is the presence of a v2 source tree alongside the legacy v1 code. The thin wrapper file apps/opik-frontend/src/v2/lib/utils.ts re-exports documentation URL builders from the v1 helpers, demonstrating that the v2 layer composes on top of v1 utilities rather than duplicating them:

import {
  buildDocsUrl as buildDocsUrlBase,
  buildDocsMarkdownUrl as buildDocsMarkdownUrlBase,
} from "@/lib/utils";

export const buildDocsUrl = (path: string = "", hash: string = "") =>
  buildDocsUrlBase(path, hash);

export const buildDocsMarkdownUrl = (path: string = "") =>
  buildDocsMarkdownUrlBase(path);

This incremental migration pattern lets new screens (likely the rewritten project/evaluation views mentioned in the Opik roadmap) reuse battle-tested helpers while introducing a cleaner module boundary. Source: apps/opik-frontend/src/v2/lib/utils.ts.

flowchart LR
    UI["Opik Frontend (React/TS)"] -- "REST + OpenAPI" --> API[Opik Backend]
    UI -- "Vercel AI SDK telemetry" --> Exporter[opik-vercel Exporter]
    UI -- "Documentation deep-links" --> Docs[Docs site]
    Exporter -- "OTLP/HTTP" --> API

Integration with the TypeScript SDK Family

Although the frontend does not import the SDK packages directly, the entire observability surface it visualizes is produced by the TypeScript SDK and its integrations. The core SDK (sdks/typescript/package.json) declares peer dependencies on zod and ai (Vercel AI SDK v6), and pulls in @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, and @ai-sdk/google-vertex for typed model calls.

Integration packages live under sdks/typescript/src/opik/integrations/.

The internal design documentation (sdks/typescript/design/README.md) frames these around three patterns: Proxy, Callback, and Exporter, and groups tracing, evaluation, and integrations as first-class domains.

End-to-End Testing

UI regressions are guarded by the Playwright suite under tests_end_to_end/typescript-tests. The README (tests_end_to_end/typescript-tests/README.md) describes an "agentic" workflow where a planner produces a markdown spec, a generator writes a Playwright spec that imports page objects and fixtures, and a healer retries failed runs:

import { test, expect } from '../../fixtures/projects.fixture';
import { ProjectsPage } from '../../page-objects/projects.page';

test.describe('Feature Name @fullregression @feature', () => {
  test('should perform main user flow', async ({ page, projectName }) => {
    const projectsPage = new ProjectsPage(page);
    await projectsPage.goto();
    await projectsPage.clickProject(projectName);
    await expect(page.locator('[data-testid="result"]')).toBeVisible();
  });
});

This convention — fixtures per resource, page objects per screen, data-testid locators — is what the UI must conform to in order to remain testable. Source: tests_end_to_end/typescript-tests/README.md.

Common Gaps and Community Considerations

Several top community threads map directly to frontend work that is still open:

  • Authentication — Issue #949 requests a login layer for the OSS build, similar to Langfuse; the frontend would need a route guard and token storage. Source: community context, issue #949.
  • Workflow-tool integrations — Issue #1587 asks for n8n support; the docs template matrix (apps/opik-documentation/documentation/templates/README.md) already defines Code / OpenAI-compatible / LiteLLM / OpenTelemetry templates, but no n8n template yet.
  • Bug parity between SDK and UI — Issue #420 shows how decorator behavior (project_name) must stay in sync between the Python SDK and the project switcher shown in the UI.

See Also

  • Backend service and REST API surface
  • Python SDK tracing and evaluation
  • TypeScript SDK design (API and Data Flow, Tracing, Integrations, Evaluation)

Deployment, Self-Hosting, and Operations

Related topics: What is Opik? Platform Overview and Key Capabilities, System Architecture and Repository Layout, Authentication, Authorization, and Workspaces

Overview

Opik is shipped as a multi-application platform comprising a Java backend, a React/TypeScript frontend, and SDK clients for Python and TypeScript. Self-hosters run the backend and frontend together — typically behind Nginx — while client SDKs are pointed at the deployed endpoint via configuration. The repository exposes tooling that orchestrates the local stack, per-service container assets, and runtime configuration through environment variables.

Repository Topology for Deployment

The polyglot monorepo places each deployable unit under a top-level apps/ or sdks/ directory. The backend lives at apps/opik-backend/, the frontend SPA at apps/opik-frontend/, and the user-facing documentation site at apps/opik-documentation/. The Python SDK is rooted at sdks/python/, while the TypeScript SDK — including per-integration subpackages such as opik-gemini — lives at sdks/typescript/. Source: apps/opik-backend/README.md, apps/opik-frontend/README.md, apps/opik-documentation/README.md, sdks/python/README.md.

Local Development Runner

scripts/dev-runner.sh is the canonical entry point for spinning up a full local Opik stack. The repository documents it as a "Development environment runner script for local Opik development. This script manages Docker infrastructure, backend, and frontend services for development workflows." Source: scripts/README.md.

The default invocation performs a full restart with rebuild and must be run from the repository root:

./scripts/dev-runner.sh
# or explicitly
./scripts/dev-runner.sh --restart

The same README documents additional command modes — including a "Standard Mode" that runs the backend and frontend as local processes — useful for iterating on a single service without rebuilding the full Docker infrastructure. Source: scripts/README.md.

Runtime Configuration

Frontend Nginx Variables

The frontend is containerized and served behind Nginx. A patch-nginx.conf.sh script consumes environment variables to rewrite the Nginx configuration at container start, exposing the following knobs:

| Variable | Default | Description |
| --- | --- | --- |
| NGINX_PID | /run/nginx.pid | Path to the Nginx PID file |
| NGINX_PORT | 8080 | Nginx listening port |
| OTEL_COLLECTOR_HOST | otel-collector | Hostname of the OpenTelemetry collector |
| OTEL_COLLECTOR_PORT | 4317 | Port of the OpenTelemetry collector |
| OTEL_TRACES_EXPORTER | otlp | Exporter type for OpenTelemetry |
| OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | http://${OTEL_COLLECTOR_HOST}:${OTEL_COLLECTOR_PORT} | Full endpoint URL for OTLP traces |

Source: apps/opik-frontend/README.md.

The defaults imply that a self-hosted Opik deployment is expected to ship with an OTLP-compatible collector reachable at otel-collector:4317, and that frontend telemetry is forwarded through standard OpenTelemetry environment variables rather than an in-process exporter. Frontend build and test tooling — Vite, Vitest, Tailwind, and the lint stack — is pinned in apps/opik-frontend/package.json. Source: apps/opik-frontend/package.json.
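The way the last table row derives its default from the host and port rows can be modelled in a few lines. This is a Python illustration of the defaulting rule only, not a transcription of patch-nginx.conf.sh; `resolve_nginx_settings` is a hypothetical name, and treating an unset variable as "use the default" is an assumption about the script's behaviour.

```python
from typing import Dict, Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "NGINX_PID": "/run/nginx.pid",
    "NGINX_PORT": "8080",
    "OTEL_COLLECTOR_HOST": "otel-collector",
    "OTEL_COLLECTOR_PORT": "4317",
    "OTEL_TRACES_EXPORTER": "otlp",
}
ENDPOINT = "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"


def resolve_nginx_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Apply table defaults, then derive the OTLP endpoint unless it is set."""
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    derived = (
        f"http://{settings['OTEL_COLLECTOR_HOST']}:{settings['OTEL_COLLECTOR_PORT']}"
    )
    # An explicit endpoint wins over the host/port-derived value.
    settings[ENDPOINT] = env.get(ENDPOINT, derived)
    return settings


print(resolve_nginx_settings({})[ENDPOINT])
print(resolve_nginx_settings({"OTEL_COLLECTOR_HOST": "collector.internal"})[ENDPOINT])
```

Under these assumptions, overriding only the collector host is enough to move the traces endpoint, while setting the endpoint variable directly bypasses the host and port values.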

Client SDK Configuration

The Python SDK is the primary integration target for self-hosted deployments and is distributed via PyPI:

pip install opik

Source: sdks/python/README.md.

The TypeScript SDK ships a separate configure subpackage whose package.json declares dependencies on posthog-node, inquirer, zod, read-env, yargs, and dotenv. This tooling enables a CLI-driven initialization flow that helps operators point a Node.js SDK at a self-hosted instance without hard-coding endpoints. Source: sdks/typescript/src/opik/configure/package.json.

Operational Concerns and Community Gaps

Authentication

Authentication for the open-source distribution is a recurring community request. Issue #949 — "Authentication and Authorization" — explicitly compares the gap to Langfuse and is motivated by the desire for "a more secure approach" when exposing Opik publicly. Until upstream authentication lands, self-hosters who expose the application on an untrusted network should plan for an external reverse-proxy auth layer (OAuth proxy, mTLS, VPN, or IP allow-listing) in front of Nginx.

Workflow Integrations (n8n)

Issue #1587 requests n8n integration so that LLM workflow traces flow into Opik for evaluation. Until a native integration exists, one possible workaround is to export n8n spans to the same OpenTelemetry collector that the frontend already targets via its OTEL_* environment variables, provided the n8n deployment can emit OTLP spans.

SDK Behavior Bugs Affecting Routing

Bug report #420 documents that project_name cannot be passed through the @opik.track decorator when invoked inside opik.evaluation.evaluate (Opik 0.2.1). Operators planning multi-tenant trace routing on the Python SDK should track the upstream fix and validate project assignment end-to-end in evaluation pipelines.

Roadmap Visibility

The community roadmap thread (issue #535) links to the live Opik Roadmap document and surfaces deployment-adjacent milestones; self-hosters should monitor it for security, scalability, and integration-related changes. Source: issue context in the community conversation.

Authentication, Authorization, and Workspaces

Related topics: System Architecture and Repository Layout, Frontend Application (React/TypeScript), Deployment, Self-Hosting, and Operations

Overview

Opik organizes access control around three cooperating concerns: authentication (verifying who is calling), authorization (deciding what they may do), and workspaces (the tenancy boundary inside which traces, projects, datasets, and experiments are scoped). These concerns live in the backend service under the priv (private) API namespace, which is gated behind authenticated sessions in production deployments.

The community has actively asked for stronger out-of-the-box authentication in the open-source distribution (issue #949, "Authentication and Authorization"), comparing the desired experience to Langfuse. As of release 2.0.73 (referenced in the latest release notes), the backend already ships the Java/Jersey resource classes described in the rest of this section; the question for self-hosted operators is therefore less *whether* the hooks exist and more *how* to wire them up in front of the application.

The relationship between these subsystems is shown below.

flowchart LR
    Client[Client / SDK] -->|HTTPS + API key or cookie| AuthRes[AuthenticationResource]
    AuthRes --> AuthSvc[AuthService]
    AuthSvc -->|resolves principal| Filter[Auth Filter / AuthModule]
    Filter -->|enforces tenancy| WRes[WorkspacesResource]
    Filter -->|checks role| WPerm[WorkspacePermissionsResource]
    Filter -->|guards premium features| Toggle[ServiceTogglesResource]
    WRes --> Workspace[(Workspace store)]
    WPerm --> Perms[(Permission store)]

Authentication Layer

`AuthenticationResource` and `AuthService`

The HTTP surface for identity sits in AuthenticationResource, a Jersey resource exposed under /v1/priv/auth. It delegates the actual identity work to AuthService, which centralizes credential validation, session issuance, and principal resolution.

Typical responsibilities wired through these two classes include:

  • Verifying an incoming API key or username/password pair against the configured identity provider.
  • Issuing a session token (cookie or bearer header) on success.
  • Returning the resolved User and Workspace context for downstream resources.

Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/AuthenticationResource.java

Source: apps/opik-backend/src/main/java/com/comet/opik/infrastructure/auth/AuthService.java

`AuthModule` Wiring

AuthModule is the Guice/DI binding module that registers AuthService and any associated filters, providers, and request-scoped bindings. It is the single integration point for swapping or extending the auth stack (for example, plugging in an OIDC provider, an SSO IdP, or a header-based trust for an internal reverse proxy).

In practice, operators who want a hardened open-source deployment — the gap called out in #949 — typically:

  1. Front the backend with an authenticating reverse proxy (e.g. oauth2-proxy, Authentik, Keycloak) that injects a trusted header.
  2. Extend AuthModule to bind a custom AuthService that consumes that header and resolves the principal.
  3. Reuse the existing WorkspacePermissionsResource for per-workspace checks rather than re-implementing them.

Source: apps/opik-backend/src/main/java/com/comet/opik/infrastructure/auth/AuthModule.java
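The header-based trust described in steps 1 and 2 comes down to a small piece of logic, sketched here in Python for brevity even though a real implementation would be a Java class bound through AuthModule. The header names (`X-Proxy-Secret`, `X-Forwarded-User`, `X-Workspace`), the shared-secret check, and the `resolve_principal` function are all assumptions made for illustration; none of them is taken from the Opik backend.

```python
import hmac
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Principal:
    user: str
    workspace: str


def resolve_principal(
    headers: Mapping[str, str],
    proxy_secret: str,
    default_workspace: str = "default",
) -> Optional[Principal]:
    """Trust identity headers only when the request proves it came via the proxy."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    presented = lowered.get("x-proxy-secret", "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(presented.encode(), proxy_secret.encode()):
        return None

    user = lowered.get("x-forwarded-user", "").strip()
    if not user:
        return None

    workspace = lowered.get("x-workspace", default_workspace)
    return Principal(user=user, workspace=workspace)


trusted = resolve_principal(
    {"X-Proxy-Secret": "s3cret", "X-Forwarded-User": "alice"}, proxy_secret="s3cret"
)
spoofed = resolve_principal({"X-Forwarded-User": "alice"}, proxy_secret="s3cret")
print(trusted, spoofed)
```

The design choice worth noting is that identity headers are ignored unless the request also proves it passed through the proxy; otherwise any client that can reach the backend directly could forge them.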

Workspaces and Permissions

Workspaces as the Tenancy Boundary

A *workspace* in Opik is the top-level container that owns projects, datasets, experiments, prompts, and traces. WorkspacesResource exposes CRUD-style endpoints for creating, listing, and configuring workspaces, and is the only path through which a workspace is materialized server-side.

Every authenticated request is associated with exactly one workspace context, propagated by AuthService. That context flows into the data layer so that traces logged from the Python or TypeScript SDK never bleed across tenants.

Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/WorkspacesResource.java
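The tenancy guarantee above amounts to keying every read and write by the workspace resolved during authentication. A minimal in-memory sketch, assuming a hypothetical `WorkspaceScopedStore` class that is not part of the Opik backend:

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class WorkspaceScopedStore:
    """Toy trace store in which every operation requires a workspace name."""

    def __init__(self) -> None:
        self._traces: DefaultDict[str, List[Dict[str, str]]] = defaultdict(list)

    def log_trace(self, workspace: str, trace: Dict[str, str]) -> None:
        self._traces[workspace].append(trace)

    def list_traces(self, workspace: str) -> List[Dict[str, str]]:
        # .get avoids creating an empty entry for workspaces never written to;
        # returning a copy keeps callers from mutating the stored list.
        return list(self._traces.get(workspace, []))


store = WorkspaceScopedStore()
store.log_trace("team-a", {"name": "checkout-agent"})
store.log_trace("team-b", {"name": "support-bot"})
print(store.list_traces("team-a"))  # [{'name': 'checkout-agent'}]
```

Because no method accepts a request without a workspace, there is no code path that returns another tenant's traces by accident.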

Per-Workspace Authorization

WorkspacePermissionsResource handles role-based checks inside a workspace — who may read traces, who may run evaluations, who may invite users, and so on. It pairs with AuthService so that authorization decisions use the same resolved principal as authentication.

This is also the layer that interacts with the frontend's v1/v2 routing in apps/opik-frontend/README.md, where the shared → v1/pages-shared → v1/pages import direction ensures authorization-aware components do not silently bypass checks.

Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/WorkspacePermissionsResource.java
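A role-based check of the kind described above can be sketched as a lookup from role to permitted actions. The role names, action names, and the `is_allowed` helper are hypothetical and chosen only to mirror the examples in the text (reading traces, running evaluations, inviting users); the actual permission model lives in the backend.

```python
from typing import Dict, Set

# Hypothetical roles and actions, for illustration only.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "viewer": {"read_traces"},
    "member": {"read_traces", "run_evaluations"},
    "admin": {"read_traces", "run_evaluations", "invite_users"},
}

# workspace -> user -> role
Memberships = Dict[str, Dict[str, str]]


def is_allowed(memberships: Memberships, user: str, workspace: str, action: str) -> bool:
    """Allow an action only if the user's role in that workspace grants it."""
    role = memberships.get(workspace, {}).get(user)
    if role is None:
        return False  # not a member of this workspace
    return action in ROLE_ACTIONS.get(role, set())


memberships: Memberships = {"team-a": {"alice": "admin", "bob": "viewer"}}
print(is_allowed(memberships, "alice", "team-a", "invite_users"))  # True
print(is_allowed(memberships, "bob", "team-a", "run_evaluations"))  # False
print(is_allowed(memberships, "alice", "team-b", "read_traces"))    # False
```

The check takes the workspace as an explicit argument, so a role held in one workspace grants nothing in another.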

Service Toggles and Feature Gating

Some capabilities — notably advanced guardrails, certain integrations, and parts of the OpenTelemetry-based proxy surface documented in apps/opik-guardrails-backend/README.md — are gated behind feature flags. ServiceTogglesResource exposes the current toggle state to the frontend so the UI can hide controls the workspace cannot use, and to other backend services that need to short-circuit calls when a feature is disabled.

This resource is also useful for staged rollouts: an operator can enable a feature for a single workspace without exposing it elsewhere, which is a common pattern when hardening auth itself.

Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/ServiceTogglesResource.java

Common Failure Modes and Configuration Notes

  • No auth in front of the open-source build. Until issue #949 is fully resolved for self-hosted users, operators must put an authenticating reverse proxy in front of the backend. Skipping this step exposes every priv endpoint to the public network.
  • Project scoping in evaluation flows. A frequent bug class — see #420 — is that a project_name passed to opik.evaluation.evaluate may not propagate through the same workspace context used by the @opik.track decorator. Always verify the resolved workspace after authentication when debugging cross-resource traces.
  • Missing cost metadata. When a model is missing from model_prices_and_context_window.json (see #4507), the guardrails/auth services that depend on cost thresholds will silently skip enforcement. Treat feature toggles as load-bearing and verify them after changes.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.

1. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/comet-ml/opik

2. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/comet-ml/opik

3. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/comet-ml/opik

4. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/comet-ml/opik

5. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/comet-ml/opik

6. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/comet-ml/opik

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 11 (the count of project-level external discussion links exposed on this manual page).

Use: review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using opik with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence