Doramagic Project Pack · Human Manual
opik
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
What is Opik? Platform Overview and Key Capabilities
Related topics: System Architecture and Repository Layout, Deployment, Self-Hosting, and Operations
1. Platform Purpose and Scope
Opik is an open-source platform built by Comet that streamlines the entire lifecycle of LLM (Large Language Model) applications. It is designed to help developers evaluate, test, monitor, and optimize LLM models and agentic systems, both during development and in production. Source: README.md:5-13.
The platform addresses common pain points when building LLM-powered systems — lack of visibility into model behavior, difficulty measuring quality, and operational blind spots in production traffic. Opik bundles tracing, evaluation, online monitoring, prompt optimization, and safety guardrails into a single ecosystem.
The project is organized as a monorepo containing:
- `apps/opik-backend`: Java-based backend service (see apps/opik-backend/README.md).
- `apps/opik-frontend`: React + TypeScript web UI built on shadcn/ui, Radix, and Zustand (see apps/opik-frontend/README.md:3-44).
- `apps/opik-documentation`: User-facing docs and integration templates.
- `sdks/python`: Python SDK distributed on PyPI.
- `sdks/typescript`: TypeScript/JavaScript SDK with first-class Node.js support.
- `sdks/opik_optimizer`: Standalone prompt and agent optimization package (see sdks/opik_optimizer/README.md).
2. Core Capabilities
The platform groups its capabilities into five areas:
2.1 Tracing and Observability
Opik provides deep tracing of LLM calls, conversation turns, and agent activity. The Python SDK exposes a @opik.track decorator and track_*() wrappers that automatically capture inputs, outputs, latency, and token usage (see sdks/python/design/README.md:5-17). Batched, asynchronous message processing ensures tracing does not block application code. Source: README.md:14-22.
The TypeScript SDK mirrors this with a track decorator, a native OpenTelemetry exporter, and a batch-queue client (see sdks/typescript/design/README.md:5-17).
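To make the capture behavior concrete, here is a minimal sketch of a tracking decorator in plain Python. It is not the `@opik.track` implementation: `track_sketch` and the in-memory `CAPTURED` list are hypothetical stand-ins, and a real SDK would hand each record to a background sender instead of appending to a list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink (assumption); a real SDK would enqueue records
# for asynchronous delivery rather than keep them in a list.
CAPTURED: list[dict[str, Any]] = []


def track_sketch(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record the inputs, output or error, and latency of every call."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        record: dict[str, Any] = {
            "name": fn.__name__,
            "input": {"args": args, "kwargs": kwargs},
        }
        start = time.perf_counter()
        try:
            record["output"] = fn(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            # Keep the failure on the record, then let it propagate unchanged.
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_s"] = time.perf_counter() - start
            CAPTURED.append(record)

    return wrapper


@track_sketch
def answer(question: str) -> str:
    # Placeholder for an LLM call.
    return f"echo: {question}"
```

Calling `answer("hi")` returns the function's normal result and leaves one record in `CAPTURED` with the name, input, output, and latency of the call.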
2.2 Evaluation
Opik supports prompt evaluation, LLM-as-a-judge scoring, and experiment management. The Python design docs describe four evaluation methods and a metrics architecture that separates metric definitions from evaluation engines (see sdks/python/design/README.md:5-9). TypeScript support includes prompt evaluation flows as well (see sdks/typescript/design/README.md:5-17).
2.3 Production Monitoring
Production-grade dashboards expose LLM-as-a-Judge metrics and online evaluation rules so teams can identify issues in live traffic. Source: README.md:20-22.
2.4 Opik Agent Optimizer
A dedicated SDK (opik-optimizer) that enhances prompts and agents. It uses LiteLLM under the hood, which means any provider supported by LiteLLM can be used as an optimization target. Source: sdks/opik_optimizer/README.md:11-44.
2.5 Opik Guardrails
Built-in features that help teams implement safe and responsible AI practices, particularly relevant for production deployments. Source: README.md:24-26.
3. SDKs, Integrations, and Configuration
Opik ships with first-party Python and TypeScript SDKs and a growing library of integrations.
The TypeScript SDK environment variables are documented in sdks/typescript/src/opik/configure/src/lib/env-constants.ts:1-26. The four main variables are summarized below; `OPIK_PROJECT_NAME` falls back to a default, so it is optional in practice.

| Variable | Purpose |
|---|---|
| `OPIK_API_KEY` | API key for authentication against the Opik server. |
| `OPIK_URL_OVERRIDE` | Base URL for the Opik API (Cloud or self-hosted). |
| `OPIK_WORKSPACE` | Workspace name. |
| `OPIK_PROJECT_NAME` | Project name for organizing traces (default: "Default Project"). |
A companion CLI (npx opik-ts configure) interactively writes these values to the user's environment, including a --use-local flag for local development. Source: sdks/typescript/design/README.md:30-40.
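As a rough illustration of how the four variables might be resolved at startup, the sketch below reads them from a mapping and applies the documented default for the project name. It is written in Python for illustration only; `OpikSettings` and `load_settings` are hypothetical names, and the actual resolution logic in the TypeScript SDK may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OpikSettings:
    """Hypothetical container for the four documented variables."""

    api_key: Optional[str]
    url_override: Optional[str]
    workspace: Optional[str]
    project_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> OpikSettings:
    """Read settings from `env`, falling back to the documented project default."""
    return OpikSettings(
        api_key=env.get("OPIK_API_KEY"),
        url_override=env.get("OPIK_URL_OVERRIDE"),
        workspace=env.get("OPIK_WORKSPACE"),
        project_name=env.get("OPIK_PROJECT_NAME", "Default Project"),
    )
```

Passing the environment as an argument keeps the function easy to test: `load_settings({})` yields the default project name without touching the real process environment.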
Integrations are split into first-party packages with their own npm artifacts. The current official set includes:
- `opik-openai`: OpenAI client wrapper (see sdks/typescript/src/opik/integrations/opik-openai/package.json:1-35).
- `opik-langchain`: LangChain instrumentation (see sdks/typescript/src/opik/integrations/opik-langchain/package.json:1-35).
- `opik-vercel`: Vercel AI SDK integration with hierarchical trace visualization, metadata capture, error handling, and streaming support (see sdks/typescript/src/opik/integrations/opik-vercel/README.md:1-32).
- `opik-otel`: Generic OpenTelemetry exporter for any OTEL-instrumented runtime (see sdks/typescript/src/opik/integrations/opik-otel/package.json:1-35).
For Python, the integration documentation templates define a decision matrix covering four patterns: code-based wrappers, OpenAI-compatible wrappers, LiteLLM callbacks, and pure OpenTelemetry. This matrix is the canonical reference for new integrations.
4. Deployment and High-Level Architecture
Opik can be deployed in three ways, summarized in the diagram below.
flowchart LR
A[LLM Application] --> B[Opik SDK<br/>Python or TypeScript]
B --> C[Opik Backend<br/>Java service]
C --> D[(Storage<br/>ClickHouse + MySQL)]
C --> E[Opik Frontend<br/>React UI]
F[Comet.com Cloud] -.alternative.-> A
A -.uses.-> G[opik-optimizer<br/>Prompt optimization]
A -.uses.-> H[Guardrails<br/>Safety checks]

- Cloud (managed): Recommended path; sign up at comet.com. Source: README.md:32-34.
- Self-hosted Docker Compose: The `./opik.sh` script brings up the full stack for local development. Source: README.md:36-46.
- Kubernetes: For production-scale self-hosting (referenced in the same installation section).
The backend exposes a REST API consumed by the frontend and the SDKs; SDKs batch events locally before flushing, which keeps instrumentation overhead minimal (see sdks/python/design/README.md:5-17 and sdks/typescript/design/README.md:5-17).
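The local batching step can be sketched as a small buffer that collects events and hands them to a sender once a size threshold is reached. This is an illustrative model only: `BatchBuffer`, its `max_size` parameter, and the `send` callback are hypothetical, and the real SDKs also flush on timers and from background workers, which is omitted here.

```python
from typing import Any, Callable

Event = dict[str, Any]


class BatchBuffer:
    """Collect events and deliver them in batches (size-triggered only)."""

    def __init__(self, send: Callable[[list[Event]], None], max_size: int = 3) -> None:
        self._send = send
        self._max_size = max_size
        self._pending: list[Event] = []

    def add(self, event: Event) -> None:
        """Queue one event; flush automatically when the batch is full."""
        self._pending.append(event)
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending as one batch; do nothing if empty."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._send(batch)
```

Application code only ever calls `add`, so the cost of a tracked call is a list append; the network round trip happens once per batch, and an explicit `flush()` at shutdown drains the remainder.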
5. Community Context and Known Gaps
Several frequently requested features shape the platform's roadmap:
- Authentication and Authorization: A community proposal (#949) requests an auth layer in front of the open-source deployment similar to Langfuse, indicating that production users want self-hosted Opik to be secure by default.
- n8n integration: A widely followed request (#1587) calls for native support in the n8n workflow automation tool, which would let non-engineers route prompts and completions through Opik without code changes.
- Project name propagation bug: Issue #420 shows that `project_name` is not correctly forwarded to the `@opik.track` decorator when used inside `opik.evaluation.evaluate`, an SDK/UI interaction gap that is worth understanding when designing evaluation pipelines.
- Cost-tracking coverage: Issue #4507 requests additional entries in `model_prices_and_context_window.json` (e.g., `qwen/qwen3-235b-a22b-2507` via OpenRouter) so cost analytics stay accurate for new providers.
The latest release referenced in these discussions is 2.0.73, which also surfaces OTel span errors, tool outputs, and Google cost data for Pydantic AI (see README.md release notes block).
Source: https://github.com/comet-ml/opik / Human Manual
System Architecture and Repository Layout
Related topics: What is Opik? Platform Overview and Key Capabilities, Tracing, Spans, and Framework Integrations, Frontend Application (React/TypeScript)
Overview and Scope
The comet-ml/opik repository is a polyglot monorepo that delivers Opik — an open-source LLM evaluation, tracing, and monitoring platform developed by Comet. It ships several independently versioned components under a single source tree: a Java backend service, a React-based frontend, a documentation application, a guardrails microservice, plus Python and TypeScript SDKs and their respective integration packages. The repository's self-describing READMEs emphasize that the project is organized so contributors can navigate to the specific application or SDK they need to modify without coupling concerns across languages.
The release cadence is tracked centrally — the most recent published version visible in the community changelog is 2.0.73, which bundles backend fixes, SDK improvements, and OpenAPI/Fern code generation automation in a single release. Source: scripts/README.md (release notes summary).
Top-Level Repository Layout
The root of the repository is divided into four primary top-level directories that map cleanly to deliverable surfaces. Each top-level area contains its own README pointing contributors to a CONTRIBUTING.md guide.
| Directory | Purpose | Source |
|---|---|---|
| `apps/` | Deployable services (backend, frontend, documentation, guardrails) | apps/opik-backend/README.md, apps/opik-frontend/README.md |
| `sdks/` | Client libraries (Python, TypeScript) and third-party integrations | sdks/python/README.md, sdks/typescript/design/README.md |
| `scripts/` | Repository-level tooling: OpenAPI generation, dev-runner, codex sync | scripts/README.md |
| `tests_end_to_end/` | Browser-level Playwright tests with agentic planning/healer tooling | tests_end_to_end/typescript-tests/README.md |
The architecture is intentionally layered so that cross-cutting concerns (such as the API contract and generated client code) are owned by scripts/generate_openapi.sh and consumed by both the SDKs and the documentation app. Source: scripts/README.md.
Application Services (`apps/`)
The apps/ directory hosts the runtime surfaces that compose the Opik platform.
Backend
apps/opik-backend is the Java service that powers Opik's API. Its README defers contribution guidance to CONTRIBUTING.md, and the surrounding tooling — including Liquibase migrations and the Dockerfile referenced in the build pipeline — establishes it as the canonical server implementation. The backend exposes the OpenAPI specification that downstream SDKs are generated against. Source: apps/opik-backend/README.md.
Frontend
apps/opik-frontend is a React/TypeScript SPA built around Zustand stores and a shadcn/ui + Radix component layer. Notably, the frontend ships two parallel navigation generations under a single entry point:
- `src/v1/`: Opik 1, feature-organized navigation (`layout/`, `pages/`, `pages-shared/`).
- `src/v2/`: Opik 2, project-first navigation with the same substructure.
Both versions share a common import-direction rule: ui → shared → v1/pages-shared → v1/pages (and analogously for v2). The router in src/router.tsx and entry point src/index.tsx orchestrate both. Source: apps/opik-frontend/README.md. Each version also has its own lib/utils.ts that prefixes documentation URLs with /v1 or /v2 respectively, indicating the docs site mirrors the same versioning. Source: apps/opik-frontend/src/v1/lib/utils.ts, apps/opik-frontend/src/v2/lib/utils.ts.
Documentation
apps/opik-documentation is a Fern-based docs site. A templates subdirectory formalizes four integration documentation archetypes (Code, OpenAI-Based, LiteLLM, OpenTelemetry) and prescribes screenshot placement under fern/img/tracing/. Source: apps/opik-documentation/documentation/templates/README.md.
Guardrails Backend
apps/opik-guardrails-backend is a separate microservice exposing a /guardrails endpoint for TOPIC and PII validation. It accepts a JSON payload of validations and returns a structured result, with entity/topic thresholds configurable per request. Source: apps/opik-guardrails-backend/README.md.
SDKs and Integration Packages (`sdks/`)
Python SDK
sdks/python/ ships the opik PyPI package. The Python SDK design is documented in sdks/python/design/README.md, which indexes four contributor guides: *API and Data Flow*, *Testing*, *Integrations*, and *Evaluation*. Together they describe a 3-layer architecture with sync and async paths and batched message processing. Source: sdks/python/design/README.md.
TypeScript SDK and Integrations
The TypeScript SDK mirrors the Python design, with guides for *API and Data Flow*, *Tracing*, *Testing*, *Integrations*, and *Evaluation*. Under sdks/typescript/src/opik/integrations/, each integration is published as a standalone npm package with explicit peer-dependency pinning:
| Integration | Peer Dependency | Source |
|---|---|---|
| `opik-openai` | `openai: ^6.0.1`, `opik: ^1.8.61` | package.json |
| `opik-langchain` | `@langchain/core: ^0.3.78 \|\| ^1.0.0`, `opik: ^1.8.75` | package.json |
| `opik-gemini` | `@google/genai: >=1.0.0`, `opik: ^1.7.25` | package.json |
| `opik-configure` | (CLI; depends on axios, inquirer, magicast, posthog-node) | package.json |
Each integration package bundles its own build/lint/test scripts via tsup, eslint, and vitest, allowing them to ship independently. Source: sdks/typescript/src/opik/integrations/opik-openai/package.json. Community requests such as #1587 (n8n support) and #4507 (additional model cost entries) demonstrate ongoing demand for new integration packages and provider cost data. Source: [community context — issues #1587, #4507].
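The caret (`^`) ranges used in these peer-dependency pins follow npm's semver rule: for a major version of 1 or higher, `^X.Y.Z` accepts any version from `X.Y.Z` up to, but not including, the next major. The sketch below checks that rule in Python for plain `MAJOR.MINOR.PATCH` strings. The helper names are hypothetical, and pre-release tags, `||` unions, and the narrower rules for `0.x` versions are deliberately not handled.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, caret_range: str) -> bool:
    """Return True if `version` falls inside a caret range such as '^1.8.61'."""
    base = parse_version(caret_range.lstrip("^"))
    if base[0] == 0:
        # Caret ranges on 0.x versions are stricter; out of scope for this sketch.
        raise ValueError("caret ranges with major version 0 are not handled")
    candidate = parse_version(version)
    # Same major version, and not older than the stated minimum.
    return candidate[0] == base[0] and candidate >= base
```

Under this rule an `opik` package at `1.8.75` satisfies both `^1.8.61` and `^1.7.25`, while `1.8.60` fails the first range and any `2.x` version fails both.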
Supporting Tooling
Repository Scripts
scripts/dev-runner.sh provides a Docker-backed local development orchestrator supporting --restart, --backend, --frontend, and related options for managing infrastructure, backend, and frontend processes. scripts/generate_openapi.sh and scripts/start_openapi_server.sh keep the OpenAPI specification (used by the SDKs via Fern) and a Redoc preview server in lockstep. scripts/sync-codex.sh syncs .agents/rules/*.mdc into a Codex-friendly AGENTS.override.md. Source: scripts/README.md.
End-to-End Testing
tests_end_to_end/typescript-tests/ contains a Playwright-based suite augmented with two AI agents: a Planner that turns feature specs into *.spec.ts files (saving under tests/{feature-area}/{test-name}.spec.ts) and a Healer that retries and patches failing tests. The generated tests use the same fixtures and page-object patterns as the manual suite. Source: tests_end_to_end/typescript-tests/README.md.
High-Level Architecture
flowchart LR
subgraph Clients
PySDK["Python SDK<br/>(sdks/python)"]
TSSDK["TypeScript SDK<br/>(sdks/typescript)"]
Integ["Integrations<br/>opik-openai / opik-langchain / opik-gemini"]
end
subgraph Apps["apps/"]
Backend["opik-backend<br/>(Java API)"]
Frontend["opik-frontend<br/>(v1 + v2 SPA)"]
Docs["opik-documentation<br/>(Fern)"]
Guards["opik-guardrails-backend"]
end
Scripts["scripts/<br/>OpenAPI · dev-runner"]
E2E["tests_end_to_end/<br/>Playwright + AI agents"]
PySDK --> Backend
TSSDK --> Backend
Integ --> TSSDK
Frontend --> Backend
Guards --> Backend
Scripts -. generates .-> PySDK
Scripts -. generates .-> TSSDK
Scripts -. generates .-> Docs
E2E --> Frontend
E2E --> Backend

Common Cross-Cutting Concerns
- API contract: The backend's OpenAPI spec is the single source of truth; both SDKs and documentation are regenerated from it via `scripts/generate_openapi.sh`. Source: scripts/README.md.
- Versioning: Independent package versioning is evident in the peer-dependency pins (e.g., `opik-openai` requires `opik ^1.8.61`, while `opik-gemini` requires `opik ^1.7.25`). Source: sdks/typescript/src/opik/integrations/opik-openai/package.json.
- Authentication: Currently, the open-source distribution does not include an authentication layer; community issue #949 explicitly requests one comparable to Langfuse. This is a known architectural gap worth tracking when evaluating deployment. Source: [community context: issue #949].
- SDK evaluation API: Issue #420 reports that `project_name` cannot be passed through the `@opik.track` decorator when invoked inside `opik.evaluation.evaluate`, illustrating tight coupling between the tracing decorator and evaluation entry points in the Python SDK. Source: [community context: issue #420].
Tracing, Spans, and Framework Integrations
Related topics: System Architecture and Repository Layout, Evaluation, Datasets, Experiments, and LLM-as-Judge, Authentication, Authorization, and Workspaces
Overview
Opik is an open-source LLM evaluation and observability platform by Comet that centers on tracing as its primary signal for understanding LLM applications. Every API call, retrieval step, tool invocation, and agent turn can be recorded as a hierarchical record composed of traces and spans, then visualized in the Opik UI for debugging and evaluation. The repository ships two parallel SDKs — Python and TypeScript — and a growing catalog of integrations that adopt different patching strategies depending on the target framework.
The integration matrix in the main README.md lists supported frameworks such as LangChain, LlamaIndex, OpenAI, Anthropic, Haystack, CrewAI, DSPy, Semantic Kernel, Strands Agents, Spring AI, Vercel AI SDK, and many others. A separate column for OpenTelemetry-based capture extends coverage to languages without a first-class SDK, and community requests (e.g. issue #1587 asking for n8n support) continue to expand the matrix.
The Trace and Span Data Model
A trace in Opik represents one end-to-end execution of a logical unit of work, typically one user request, one agent run, or one evaluation task. Spans are the children of a trace; each span represents a single operation inside it, such as an LLM call, a vector search, or a tool call. The Python SDK exposes @opik.track to instrument functions, while the TypeScript SDK exposes a track decorator backed by AsyncLocalStorage for correct async context propagation, as documented in sdks/typescript/design/TRACING.md.
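The parent/child relationship can be illustrated with a small model that tracks the "current span" in a context variable, which is Python's closest analogue to AsyncLocalStorage. This is a sketch under stated assumptions, not the SDK's data model: the `Span` fields, the `span()` context manager, and the `RECORDED` list are all hypothetical, and the trace is represented only by a shared `trace_id`.

```python
import contextvars
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    """Hypothetical span record: one operation inside a trace."""

    name: str
    trace_id: str
    parent_span_id: Optional[str]
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


# Holds the innermost open span for the current execution context.
_current_span = contextvars.ContextVar("current_span", default=None)
RECORDED: list[Span] = []


@contextmanager
def span(name: str) -> Iterator[Span]:
    """Open a span; nest it under the current span if one is active."""
    parent = _current_span.get()
    # A span with no parent starts a new trace.
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    current = Span(name=name, trace_id=trace_id,
                   parent_span_id=parent.id if parent else None)
    token = _current_span.set(current)
    try:
        yield current
    finally:
        # Restore the previous span and record this one as finished.
        _current_span.reset(token)
        RECORDED.append(current)
```

Nesting `with span("llm_call")` inside `with span("agent_run")` gives the inner span the outer span's `id` as its parent and the same `trace_id`, which is the hierarchy the UI later renders.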
Internally, the SDKs push trace and span events through asynchronous batch queues before flushing them to the Opik backend. The Python SDK design doc (sdks/python/design/README.md) emphasizes a three-layer architecture and a batching system tuned for throughput, while the TypeScript design doc describes the equivalent batch-queue pipeline. Both SDKs auto-generate their REST clients from the server's OpenAPI specification using Fern — see scripts/generate_openapi.sh — so any new trace or span field added to the backend becomes available to SDK users on the next regeneration cycle.
Framework Integration Patterns
Opik supports four distinct integration patterns, classified in apps/opik-documentation/documentation/templates/README.md:
| Pattern | When used | Example |
|---|---|---|
| Code integration | User modifies code; uses track_*() wrappers | LangChain, CrewAI, DSPy, Haystack |
| OpenAI-compatible integration | Target exposes an OpenAI-style client | BytePlus, OpenRouter, any OpenAI-compatible API |
| LiteLLM integration | Provider is supported by LiteLLM; uses OpikLogger callback | OpenAI, Anthropic, Groq, Fireworks, Cohere, Mistral, xAI Grok |
| OpenTelemetry integration | No code changes; configured via OTEL env vars | Ruby SDK, Pydantic AI via Logfire, direct OTEL Python |
The TypeScript SDK further distinguishes Proxy, Callback, and Exporter patterns in sdks/typescript/design/INTEGRATIONS.md. Each integration package is published as its own npm module — for example, sdks/typescript/src/opik/integrations/opik-gemini/package.json declares opik as a peer dependency and uses tsup to produce both ESM and CommonJS bundles, mirroring how every official integration is structured.
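One way to read the four-pattern matrix is as a short decision procedure. The sketch below encodes such a procedure in Python. The `Target` fields and the order in which the checks run are assumptions made for illustration; the templates README describes when each pattern applies but is not quoted here as prescribing this precedence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical description of the framework or provider being integrated."""

    can_modify_code: bool
    openai_compatible: bool
    litellm_supported: bool


def choose_pattern(target: Target) -> str:
    """Pick an integration pattern; the check order is an assumption."""
    if not target.can_modify_code:
        # Configured purely through OTEL environment variables.
        return "OpenTelemetry integration"
    if target.openai_compatible:
        return "OpenAI-compatible integration"
    if target.litellm_supported:
        return "LiteLLM integration"
    # Fall back to wrapping the framework's own calls.
    return "Code integration"
```

For example, a provider that exposes an OpenAI-style client and whose calling code can be edited maps to the OpenAI-compatible pattern, while a runtime that cannot be modified maps to OpenTelemetry regardless of the other flags.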
flowchart LR
UserCode[User Code / Framework] -->|track decorator or callback| SDK[Opik SDK]
SDK -->|batched events| Queue[Async Batch Queue]
Queue -->|REST over HTTP| Backend[Opik Backend]
OTEL[OpenTelemetry SDK] -->|OTLP| Backend
Backend --> UI[Opik UI / API]
UI --> Eval[Evaluation & Metrics]

OpenTelemetry and Cross-Language Capture
For runtimes where Opik does not yet provide a first-class SDK, the backend accepts OpenTelemetry traces via the OTLP endpoint configured through environment variables such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and OTEL_TRACES_EXPORTER. Recent releases (e.g. PR #7152, mentioned in the community changelog) improved surfacing of OTel span errors, tool outputs, and Google provider cost data for Pydantic AI users — illustrating that the OTEL path is treated as a first-class ingestion route rather than a fallback. The frontend's Nginx configuration (apps/opik-frontend/README.md) forwards traces to a dedicated otel-collector service, separating ingest from the application HTTP traffic.
Community-Driven Extensions
Several highly-discussed community issues shape the tracing roadmap. Issue #949 requests an authentication and authorization layer for the open-source deployment, which would protect trace data behind per-project access control. Issue #420 reports that project_name cannot be passed to the @opik.track decorator when used together with opik.evaluation.evaluate, causing traces to land in the wrong project — a notable failure mode for users who run evaluations inside scripts. Issue #4507 highlights the importance of model_prices_and_context_window.json, the central registry used by integrations to compute cost; missing entries produce zero-cost spans in the UI.
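The zero-cost failure mode follows directly from how a price lookup behaves when the model is missing. The sketch below assumes per-token price fields named `input_cost_per_token` and `output_cost_per_token`, as used in LiteLLM's registry; whether Opik reads exactly these fields is an assumption, and the model name and prices shown are invented for illustration.

```python
from typing import Mapping

# Illustrative registry entry; the model name and prices are made up.
PRICES: dict[str, dict[str, float]] = {
    "example/model-a": {
        "input_cost_per_token": 1e-06,
        "output_cost_per_token": 2e-06,
    },
}


def span_cost(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    prices: Mapping[str, Mapping[str, float]] = PRICES,
) -> float:
    """Estimate the cost of one LLM span; unknown models cost 0.0."""
    entry = prices.get(model)
    if entry is None:
        # No registry entry: the span is still recorded, but with zero cost.
        return 0.0
    return (
        prompt_tokens * entry["input_cost_per_token"]
        + completion_tokens * entry["output_cost_per_token"]
    )
```

With this shape, adding a registry entry for a new model is enough to turn its spans from zero cost into a real estimate, which is why requests like issue #4507 target the registry rather than the integrations.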
See Also
- Evaluation and metrics
- Python SDK architecture
- TypeScript SDK architecture
- Deployment and Docker infrastructure
Source: README.md:1-50, sdks/python/design/README.md:1-40, sdks/typescript/design/README.md:1-50, apps/opik-documentation/documentation/templates/README.md:1-40, apps/opik-frontend/README.md:1-60, scripts/README.md:1-30
Evaluation, Datasets, Experiments, and LLM-as-Judge
Related topics: Tracing, Spans, and Framework Integrations, Agent and Prompt Optimization (Opik Optimizer), Frontend Application (React/TypeScript)
Overview and Purpose
Opik is an open-source LLM evaluation and observability platform. The project exposes a unified system in which datasets hold representative test inputs, experiments record the output of running a candidate pipeline (or prompt) against those inputs, and evaluation functions score those outputs. The LLM-as-Judge pattern is one of the supported scoring strategies and is used when a metric is best approximated by another language model rather than deterministic logic.
The repository separates this concern across three layers:
- A Python SDK that ships the user-facing evaluation API, dataset management, and metric base classes.
- A TypeScript SDK that mirrors the core flow (`dataset/`, `experiment/`, `evaluation/` modules) for browser and Node.js consumers.
- A backend service and frontend that persist datasets, experiments, and feedback scores and render them in the UI.
Source: sdks/python/design/README.md. The Python design guide lists Evaluation.md as a dedicated contributor document covering the evaluation engine, the four evaluation methods, and the metrics architecture. Source: sdks/typescript/design/README.md — the TypeScript design guide does the same and points at a BaseMetric abstraction for adding new metrics.
The opik_optimizer package sits on top of this stack: it consumes the same dataset and metric primitives to run prompt-optimization algorithms (including GepaOptimizer) that rely on evaluation feedback to drive search. Source: sdks/opik_optimizer/README.md.
Datasets and Experiments
Datasets and experiments are the durable artefacts of the evaluation system. A dataset is a versioned collection of items (typically input/output pairs) used as test cases; an experiment is the result of executing a candidate pipeline, prompt, or model against a dataset and capturing the traces, outputs, and feedback scores that were produced.
In the Python SDK, the design guide calls out dataset/, experiment/, and evaluation/ as first-class submodules inside opik/, alongside tracer/, prompt/, and query/. Source: sdks/python/design/README.md. The TypeScript SDK mirrors this layout, grouping Dataset management, Experiment tracking, and Evaluation engine and metrics under their own directories. Source: sdks/typescript/design/README.md.
flowchart LR
A[Dataset items] --> B[Experiment run]
B --> C[Traces / Spans]
C --> D[Metric scoring]
D --> E[Feedback scores]
E --> F[Experiment report]

The data flow is consistent across SDKs: items leave the dataset, are fed into a tracked function or model call, become traces and spans, are scored by one or more metrics, and finally surface as feedback scores attached to the experiment. Source: sdks/typescript/design/README.md. The TypeScript design notes that the tracer/ module owns Trace and Span objects that the evaluation layer reads from. Source: sdks/python/design/README.md.
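The dataset-to-scores flow can be sketched as a plain loop. This is an illustrative model, not the `opik.evaluation` API: `run_experiment`, the item shape with `input` and `expected` keys, and the metric signature are all hypothetical, and tracing is omitted so the example stays self-contained.

```python
from typing import Any, Callable

DatasetItem = dict[str, Any]
Metric = Callable[[DatasetItem, str], float]


def run_experiment(
    dataset: list[DatasetItem],
    task: Callable[[DatasetItem], str],
    metrics: dict[str, Metric],
) -> dict[str, Any]:
    """Run `task` on every item, score each output, and average the scores."""
    rows = []
    for item in dataset:
        output = task(item)
        scores = {name: metric(item, output) for name, metric in metrics.items()}
        rows.append({"input": item["input"], "output": output, "scores": scores})
    averages = (
        {name: sum(row["scores"][name] for row in rows) / len(rows) for name in metrics}
        if rows
        else {}
    )
    return {"rows": rows, "averages": averages}


def exact_match(item: DatasetItem, output: str) -> float:
    """Deterministic metric: 1.0 when the output equals the expected value."""
    return 1.0 if output == item["expected"] else 0.0
```

The per-row `scores` play the role of feedback scores, and `averages` stands in for the experiment-level summary that a report would display.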
A community-reported bug illustrates the close coupling between project names, the track decorator, and the evaluation entry point: passing project_name to the decorator while running opik.evaluation.evaluate was reported as broken in version 0.2.1 and was tracked in issue #420. The fix path lies in correctly threading project context through the evaluation entry point, not just the decorator.
Evaluation Methods and Metrics
Opik's evaluation engine supports multiple methods so users can choose the granularity that matches their pipeline. The Python design guide enumerates "all 4 evaluation methods" and a dedicated Metrics Architecture section. Source: sdks/python/design/README.md. Concretely, these methods typically include: scoring an entire experiment offline, scoring a single task invocation, scoring using a custom mapping function, and using a built-in LLM-as-Judge metric.
The metrics architecture is deliberately pluggable. In TypeScript, a BaseMetric class is the canonical extension point, and the design guide recommends reading the "Metrics Architecture, BaseMetric" section when adding a new scorer. Source: sdks/typescript/design/README.md. In Python, the equivalent is documented in the Evaluation.md contributor doc listed by the design guide. Source: sdks/python/design/README.md.
The optimizer package reinforces this design: it consumes the same ChatPrompt and metric objects that evaluation does, then uses metric output to drive prompt search. Source: sdks/opik_optimizer/README.md — its GepaOptimizer example runs an agent and a tool-equipped prompt against a dataset and a user-supplied metric, exactly the same contract evaluation uses.
LLM-as-Judge and Integration Templates
LLM-as-Judge is the metric pattern in which a language model is prompted with the candidate output (and optionally reference output) and asked to return a structured verdict — typically a score plus a reason. The Integration Documentation Templates in the documentation app make the available metric integration patterns explicit, which is relevant because a judge metric is itself implemented as an integration with a model provider. Source: apps/opik-documentation/documentation/templates/README.md.
The template matrix in that README defines four integration shapes that a judge metric can take:
| Integration Type | Judge Pattern |
|---|---|
| Code Integration | User-supplied Python class implementing the metric interface, then registered with track_*() wrappers |
| OpenAI-Based Integration | Reuses track_openai() to score completions from any OpenAI-compatible endpoint |
| LiteLLM Integration | Routes the judge call through OpikLogger and LiteLLM's unified interface |
| OpenTelemetry Integration | The judge emits OTel spans consumed by Opik through the OTEL endpoint, requiring no code changes |
Source: apps/opik-documentation/documentation/templates/README.md.
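The judge pattern itself, prompting a model and parsing a score plus a reason, can be sketched independently of any provider. In the example below the model call is injected as a plain callable so the code runs without network access. `JudgeMetric`, `JudgeScore`, and the prompt template are hypothetical and are not Opik's built-in metric classes; the JSON reply format is likewise an assumption.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Doubled braces keep the literal JSON example intact through str.format.
JUDGE_TEMPLATE = (
    "Rate the answer from 0 to 1. "
    'Reply as JSON: {{"score": <number>, "reason": <string>}}.\n'
    "Answer: {output}\nReference: {reference}"
)


@dataclass(frozen=True)
class JudgeScore:
    """Hypothetical verdict container: a score plus the judge's reason."""

    name: str
    value: float
    reason: str


class JudgeMetric:
    """Score an output by asking a model, supplied as a text-in/text-out callable."""

    def __init__(self, name: str, call_model: Callable[[str], str]) -> None:
        self.name = name
        self._call_model = call_model

    def score(self, output: str, reference: str = "") -> JudgeScore:
        prompt = JUDGE_TEMPLATE.format(output=output, reference=reference)
        verdict = json.loads(self._call_model(prompt))
        # Clamp to [0, 1] in case the judge drifts outside the requested scale.
        value = min(1.0, max(0.0, float(verdict["score"])))
        return JudgeScore(self.name, value, str(verdict.get("reason", "")))
```

In practice `call_model` would wrap whichever provider client or integration is in use; keeping it injectable also makes the metric easy to unit test with a canned reply.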
A note on community gaps: a frequent request in the project's issue tracker is to keep the upstream model_prices_and_context_window.json registry current (for example, to add openrouter/qwen/qwen3-235b-a22b-2507 with the Cerebras provider, issue #4507). LLM-as-Judge metrics inherit this dependency because accurate cost tracking requires the registry to know about the judge model. Source: issue #4507.
Operational Considerations
- End-to-end coverage. The Playwright-based end-to-end test harness ships a "Planner / Generator / Healer" agent workflow that can produce tests for evaluation features such as `project-metrics`, `dataset-upload`, and `experiment comparison`. This means an evaluation change in the backend typically expects a matching E2E spec under `tests/`. Source: tests_end_to_end/typescript-tests/README.md.
- Configuration tooling. The TypeScript SDK ships a `configure` CLI (invoked as `npx opik-ts configure`) used to bootstrap project setup, including any environment variables the evaluation engine reads. Source: sdks/typescript/src/opik/configure/package.json.
- Backend persistence. Datasets, experiments, and feedback scores are persisted by the Java backend and surfaced through the React frontend; the frontend README groups project-level views under `v1/pages` and `v2/pages`, and a "project-first navigation" in `v2` is the new entry point for evaluation dashboards. Source: apps/opik-frontend/README.md.
Agent and Prompt Optimization (Opik Optimizer)
Related topics: Evaluation, Datasets, Experiments, and LLM-as-Judge, Frontend Application (React/TypeScript)
Overview and Purpose
The Opik Agent Optimizer is a dedicated Python package, distributed on PyPI as opik-optimizer, that refines prompts and LLM call parameters to improve model performance. It lives alongside the core Opik Python SDK and is described as a component of the broader Opik evaluation platform by Comet (sdks/opik_optimizer/README.md:1-7).
Its position in the Opik ecosystem is complementary to the tracing and evaluation features of the main SDK. The main Opik Python SDK focuses on capturing, tracing, and evaluating LLM calls (sdks/python/README.md:1-15), while the optimizer operates on prompts themselves — iteratively improving them through automated search strategies. The high-level platform goal, as expressed in the top-level README, is to help teams "build, evaluate, and optimize LLM systems that run better, faster, and cheaper," and the optimizer is the component that addresses the "optimize" portion of that workflow (README.md:1-30).
The optimizer exposes a standardized API, which is its central design principle: every algorithm in the package implements the same optimize_prompt() method, returns a standardized OptimizationResult object, and supports chaining, multimodal prompts (text, image, audio, video), Model Context Protocol (MCP) tool calling via ChatPrompt.tools, and built-in LLM/tool call counters (sdks/opik_optimizer/README.md:30-45). This uniformity means that swapping one algorithm for another is a single-line change.
Optimizer Algorithms
Six optimizer algorithms ship in the package, each targeting a different strategy for prompt improvement (sdks/opik_optimizer/README.md:9-20):
| Algorithm | Strategy |
|---|---|
| EvolutionaryOptimizer | Genetic algorithms for prompt evolution |
| FewShotBayesianOptimizer | Few-shot learning combined with Bayesian optimization |
| GepaOptimizer | GEPA (Genetic-Pareto) optimization approach |
| HRPO | Hierarchical root-cause analysis to refine prompts from synthesized failure modes |
| MetaPromptOptimizer | Meta-prompting techniques |
| ParameterOptimizer | Bayesian optimization of LLM call parameters (e.g. temperature, top_p) |
Because all six share the same optimize_prompt() interface, the user picks an algorithm based on the *kind* of change that is suspected to help — wording changes (meta-prompting, evolutionary), demonstration selection (few-shot Bayesian), parameter tuning (Bayesian on temperature/top-p), or root-cause-driven rewrites (HRPO) — rather than learning a new API per algorithm. The "Optimizer Chaining" feature lets the output of one optimizer become the input of the next, so practitioners can compose, for example, a MetaPromptOptimizer pass followed by a ParameterOptimizer pass (sdks/opik_optimizer/README.md:35-40).
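The shared-interface idea can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins written for this manual, not the opik-optimizer API: `ToyPrompt`, `ToyResult`, `PrefixOptimizer`, `SuffixOptimizer`, and `chain` are assumed names, and the real optimize_prompt() takes arguments that are not shown here. The sketch only shows why a uniform result shape reduces chaining to a loop.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class ToyPrompt:
    """Hypothetical stand-in for a chat prompt (not opik-optimizer's ChatPrompt)."""
    text: str


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical stand-in for a standardized optimization result."""
    prompt: ToyPrompt
    llm_calls: int  # illustrative call counter


class ToyOptimizer(Protocol):
    """Every strategy exposes the same method and returns the same result type."""
    def optimize_prompt(self, prompt: ToyPrompt) -> ToyResult: ...


class PrefixOptimizer:
    """Toy strategy: prepends a role instruction and reports 3 pretend LLM calls."""
    def optimize_prompt(self, prompt: ToyPrompt) -> ToyResult:
        return ToyResult(ToyPrompt("You are a careful assistant. " + prompt.text), llm_calls=3)


class SuffixOptimizer:
    """Toy strategy: appends a style instruction and reports 2 pretend LLM calls."""
    def optimize_prompt(self, prompt: ToyPrompt) -> ToyResult:
        return ToyResult(ToyPrompt(prompt.text + " Answer concisely."), llm_calls=2)


def chain(prompt: ToyPrompt, optimizers: Sequence[ToyOptimizer]) -> ToyResult:
    """Feed each optimizer's output prompt into the next and sum the call counters."""
    current, total_calls = prompt, 0
    for optimizer in optimizers:
        result = optimizer.optimize_prompt(current)
        current = result.prompt
        total_calls += result.llm_calls
    return ToyResult(current, total_calls)


final = chain(ToyPrompt("Summarize the ticket."), [PrefixOptimizer(), SuffixOptimizer()])
print(final.prompt.text, final.llm_calls)
```

Swapping the order of the list, or replacing one entry with another strategy, requires no change to `chain`, which is the property the standardized interface is meant to provide.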
Standardized API and Result Model
The optimize_prompt() method is the single entry point. Internally, it accepts a ChatPrompt (which can carry tools for MCP tool calling and multimodal content parts), runs the chosen search strategy, and returns an OptimizationResult (sdks/opik_optimizer/README.md:30-45). The result is consumed identically regardless of the algorithm used, which is what enables chaining and comparison.
LLM access is delegated to LiteLLM, so the optimizer inherits LiteLLM's broad provider support. Provider credentials are passed through standard environment variables such as OPENAI_API_KEY, configured before the optimizer is invoked (sdks/opik_optimizer/README.md:50-90). The TypeScript SDK's environment-constant module demonstrates the same pattern of standardized environment variable names — OPIK_API_KEY, OPIK_URL_OVERRIDE, OPIK_WORKSPACE, OPIK_PROJECT_NAME — which the Python ecosystem mirrors (sdks/typescript/src/opik/configure/src/lib/env-constants.ts:1-30).
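Because credentials are read from the environment, a small pre-flight check can catch a missing key before any optimization work starts. The following is a sketch using only the standard library; the helper name `missing_env_vars` is hypothetical, and which variables are actually required depends on the provider and deployment in use.

```python
import os
from typing import Mapping, Optional, Sequence


def missing_env_vars(
    required: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> list:
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
missing = missing_env_vars(
    ["OPENAI_API_KEY", "OPIK_API_KEY"],
    env={"OPENAI_API_KEY": "sk-test"},
)
print(missing)
```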
A high-level data flow for a typical optimization run is:
flowchart LR
A[Define ChatPrompt] --> B[Choose Optimizer]
B --> C[optimize_prompt]
C --> D[LiteLLM provider calls]
D --> E[Evaluation signal]
E --> F[OptimizationResult]
F --> G{Chain next optimizer?}
G -- yes --> B
G -- no --> H[Deploy refined prompt]
Configuration, Setup, and Integration
The package is installed separately from the main SDK. Standard installation uses pip install opik-optimizer, with uv pip install opik-optimizer offered as a faster alternative (sdks/opik_optimizer/README.md:60-80). For teams that want optimizer runs to be recorded alongside traces and datasets, the documentation recommends also installing the main opik package and running opik configure to set the API key and workspace.
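Since the two packages are installed separately, it can help to confirm which of them is present in the active environment. The sketch below uses only `importlib.metadata` from the standard library; the helper names are hypothetical, and the distribution names are taken from the install commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions=("opik", "opik-optimizer")) -> dict:
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in distributions}


for name, version in report().items():
    print(f"{name}: {version or 'not installed'}")
```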
The integration template catalog in the documentation repo classifies the optimizer as a code integration: users install the Opik Python SDK, modify their code, and use Opik wrapper functions directly (apps/opik-documentation/documentation/templates/README.md:5-15). Optimizer code is therefore expected to live in the same Python project as the application being optimized.
The optimizer is a Python-only component at the time of writing. The TypeScript SDK has no equivalent optimization engine; its design docs cover tracing, evaluation, and integrations, but not prompt optimization (sdks/typescript/design/README.md:1-15). Teams with TypeScript/Node-based agents typically run the optimizer on the Python side and import the refined prompt back.
Common Failure Modes and Operational Notes
- Missing provider credentials. Because the optimizer routes through LiteLLM, an unset OPENAI_API_KEY (or the equivalent for the chosen provider) will cause every candidate evaluation to fail. Set the relevant environment variable before invoking optimize_prompt() (sdks/opik_optimizer/README.md:85-95).
- Confusing tracing and optimization. The opik SDK and the opik-optimizer package are separate. Calling opik configure configures tracing, evaluation, and dataset logging, not the optimizer itself. For full experiment tracking, configure *both* the SDK and the optimizer's LiteLLM credentials (sdks/opik_optimizer/README.md:65-80).
- Deprecated parameters. The optimizer emits deprecation warnings and preserves old parameters via kwargs extraction, so code using older signatures still runs. Users should still address the warnings to benefit from the standardized interface (sdks/opik_optimizer/README.md:40-45).
- No optimizer on the TypeScript side. Attempting to call optimization routines from a TypeScript or Node application will not work; the optimizer is Python-only at this time (sdks/typescript/design/README.md:1-15).
- Cost amplification. Bayesian and evolutionary search strategies issue many LLM calls per run. Built-in LLM and tool call counters on the result object exist precisely so users can monitor usage; large search budgets should be sized with these counters in mind (sdks/opik_optimizer/README.md:35-45).
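To make the cost-amplification point concrete, a back-of-the-envelope estimate can be computed before launching a run. The model below is an assumption for illustration, not how any particular optimizer counts calls: it supposes each candidate prompt is scored once per dataset item and that each round spends a fixed number of extra calls generating candidates. The function name and parameters are hypothetical.

```python
def estimate_llm_calls(
    rounds: int,
    candidates_per_round: int,
    dataset_items: int,
    generation_calls_per_round: int = 1,
) -> int:
    """Rough upper-level estimate of LLM calls for a search budget.

    Assumed cost model: every candidate is evaluated on every dataset item,
    plus a fixed number of candidate-generation calls per round.
    """
    values = (rounds, candidates_per_round, dataset_items, generation_calls_per_round)
    if min(values) < 0:
        raise ValueError("all budget parameters must be non-negative")
    evaluation_calls = rounds * candidates_per_round * dataset_items
    generation_calls = rounds * generation_calls_per_round
    return evaluation_calls + generation_calls


# 5 rounds x 8 candidates x 50 dataset items, plus 1 generation call per round.
budget = estimate_llm_calls(rounds=5, candidates_per_round=8, dataset_items=50)
print(budget)
```

Comparing an estimate like this with the counters reported after a small trial run gives a sanity check before scaling the search budget up.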
See Also
- Tracing Integrations (Vercel AI SDK) — example of a code-side integration that the optimizer can target.
- Integration Templates — decision matrix for choosing integration styles.
- Opik Python SDK Design — context on the 3-layer SDK architecture that hosts the optimizer's client calls.
- Opik TypeScript SDK Design — clarifies which subsystems are Python-only.
- Opik Project README — overall platform positioning of tracing, evaluation, and optimization.
Source: https://github.com/comet-ml/opik / Human Manual
Frontend Application (React/TypeScript)
Related topics: System Architecture and Repository Layout, Tracing, Spans, and Framework Integrations, Evaluation, Datasets, Experiments, and LLM-as-Judge, Agent and Prompt Optimization (Opik Optimizer)
Overview and Purpose
The Opik frontend is a single-page React/TypeScript application that serves as the primary user interface for the open-source LLM observability platform. It lives in apps/opik-frontend and is consumed alongside the Python and TypeScript SDKs, the backend services, and the documentation site. Source: apps/opik-frontend/package.json.
The frontend is responsible for rendering trace explorers, experiment comparisons, prompt playgrounds, evaluation dashboards, and project management screens. It calls the Opik REST API directly and depends on the same domain model exposed by the TypeScript SDK. Source: sdks/typescript/design/README.md.
Community issues confirm that this UI is the surface users interact with daily. For example, bug #420 ("Cannot pass project_name to track decorator when using opik.evaluation.evaluate") lists "Opik UI" as one of the affected components, and feature request #949 asks for an authentication layer to be added "in front of the application" for the open-source build — both touch the frontend directly.
Tech Stack and Build Tooling
The frontend is a Vite-powered React 18 application written in TypeScript, styled with Tailwind CSS and SCSS modules. Key runtime and tooling dependencies declared in apps/opik-frontend/package.json include:
| Concern | Library |
|---|---|
| UI primitives / layout | react, react-dom, react-grid-layout, react-resizable-panels |
| Forms | react-hook-form |
| Data fetching / state | use-query-params, use-local-storage-state, react-intersection-observer |
| Charts | recharts |
| Markdown / sanitization | react-markdown, remark-gfm, rehype-sanitize, sanitize-html |
| Media playback | react-player, react-h5-audio-player, react-pdf |
| Linting / build | eslint, eslint-plugin-react, stylelint, vite, tsup (in SDKs), dependency-cruiser |
The same file declares React 18 types (@types/react ^18.3.3, @types/react-dom ^18.3.0) and TypeScript-aware ESLint plugins. Lint-staged with Prettier is wired in for pre-commit formatting. Source: apps/opik-frontend/package.json.
v1 / v2 Module Structure
A noteworthy architectural detail is the presence of a v2 source tree alongside the legacy v1 code. The thin wrapper file apps/opik-frontend/src/v2/lib/utils.ts re-exports documentation URL builders from the v1 helpers, demonstrating that the v2 layer composes on top of v1 utilities rather than duplicating them:
import {
  buildDocsUrl as buildDocsUrlBase,
  buildDocsMarkdownUrl as buildDocsMarkdownUrlBase,
} from "@/lib/utils";

export const buildDocsUrl = (path: string = "", hash: string = "") =>
  buildDocsUrlBase(path, hash);

export const buildDocsMarkdownUrl = (path: string = "") =>
  buildDocsMarkdownUrlBase(path);
This incremental migration pattern lets new screens (likely the rewritten project/evaluation views mentioned in the Opik roadmap) reuse battle-tested helpers while introducing a cleaner module boundary. Source: apps/opik-frontend/src/v2/lib/utils.ts.
flowchart LR
UI["Opik Frontend (React/TS)"] -- "REST + OpenAPI" --> API[Opik Backend]
UI -- "Vercel AI SDK telemetry" --> Exporter[opik-vercel Exporter]
UI -- "Documentation deep-links" --> Docs[Docs site]
Exporter -- "OTLP/HTTP" --> API
Integration with the TypeScript SDK Family
Although the frontend does not import the SDK packages directly, the entire observability surface it visualizes is produced by the TypeScript SDK and its integrations. The core SDK (sdks/typescript/package.json) declares peer dependencies on zod and ai (Vercel AI SDK v6), and pulls in @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, and @ai-sdk/google-vertex for typed model calls.
Integration packages live under sdks/typescript/src/opik/integrations/:
- opik-openai: wraps the official openai SDK (peer openai ^6.0.1). Source: sdks/typescript/src/opik/integrations/opik-openai/package.json.
- opik-langchain: peer @langchain/core ^0.3.78 || ^1.0.0. Source: sdks/typescript/src/opik/integrations/opik-langchain/package.json.
- opik-gemini: peer @google/genai >=1.0.0. Source: sdks/typescript/src/opik/integrations/opik-gemini/package.json.
- opik-vercel: ships an OpikExporter for @opentelemetry/sdk-node, accepting tags, metadata, and threadId per trace. Source: sdks/typescript/src/opik/integrations/opik-vercel/README.md.
- opik-otel: generic OpenTelemetry span exporter. Source: sdks/typescript/src/opik/integrations/opik-otel/package.json.
The internal design documentation (sdks/typescript/design/README.md) frames these around three patterns: Proxy, Callback, and Exporter, and groups tracing, evaluation, and integrations as first-class domains.
End-to-End Testing
UI regressions are guarded by the Playwright suite under tests_end_to_end/typescript-tests. The README (tests_end_to_end/typescript-tests/README.md) describes an "agentic" workflow where a planner produces a markdown spec, a generator writes a Playwright spec that imports page objects and fixtures, and a healer retries failed runs:
import { test, expect } from '../../fixtures/projects.fixture';
import { ProjectsPage } from '../../page-objects/projects.page';

test.describe('Feature Name @fullregression @feature', () => {
  test('should perform main user flow', async ({ page, projectName }) => {
    const projectsPage = new ProjectsPage(page);
    await projectsPage.goto();
    await projectsPage.clickProject(projectName);
    await expect(page.locator('[data-testid="result"]')).toBeVisible();
  });
});
This convention — fixtures per resource, page objects per screen, data-testid locators — is what the UI must conform to in order to remain testable. Source: tests_end_to_end/typescript-tests/README.md.
Common Gaps and Community Considerations
Several top community threads map directly to frontend work that is still open:
- Authentication: Issue #949 requests a login layer for the OSS build, similar to Langfuse; the frontend would need a route guard and token storage. Source: community context, issue #949.
- Workflow-tool integrations: Issue #1587 asks for n8n support; the docs template matrix (apps/opik-documentation/documentation/templates/README.md) already defines Code / OpenAI-compatible / LiteLLM / OpenTelemetry templates, but no n8n template yet.
- Bug parity between SDK and UI: Issue #420 shows how decorator behavior (project_name) must stay in sync between the Python SDK and the project switcher shown in the UI.
See Also
- Backend service and REST API surface
- Python SDK tracing and evaluation
- TypeScript SDK design (API and Data Flow, Tracing, Integrations, Evaluation)
Source: https://github.com/comet-ml/opik / Human Manual
Deployment, Self-Hosting, and Operations
Related topics: What is Opik? Platform Overview and Key Capabilities, System Architecture and Repository Layout, Authentication, Authorization, and Workspaces
Overview
Opik is shipped as a multi-application platform comprising a Java backend, a React/TypeScript frontend, and SDK clients for Python and TypeScript. Self-hosters run the backend and frontend together — typically behind Nginx — while client SDKs are pointed at the deployed endpoint via configuration. The repository exposes tooling that orchestrates the local stack, per-service container assets, and runtime configuration through environment variables.
Repository Topology for Deployment
The polyglot monorepo places each deployable unit under a top-level apps/ or sdks/ directory. The backend lives at apps/opik-backend/, the frontend SPA at apps/opik-frontend/, and the user-facing documentation site at apps/opik-documentation/. The Python SDK is rooted at sdks/python/, while the TypeScript SDK — including per-integration subpackages such as opik-gemini — lives at sdks/typescript/. Source: apps/opik-backend/README.md, apps/opik-frontend/README.md, apps/opik-documentation/README.md, sdks/python/README.md.
Local Development Runner
scripts/dev-runner.sh is the canonical entry point for spinning up a full local Opik stack. The repository documents it as a "Development environment runner script for local Opik development. This script manages Docker infrastructure, backend, and frontend services for development workflows." Source: scripts/README.md.
The default invocation performs a full restart with rebuild and must be run from the repository root:
./scripts/dev-runner.sh
# or explicitly
./scripts/dev-runner.sh --restart
The same README documents additional command modes — including a "Standard Mode" that runs the backend and frontend as local processes — useful for iterating on a single service without rebuilding the full Docker infrastructure. Source: scripts/README.md.
Runtime Configuration
Frontend Nginx Variables
The frontend is containerized and served behind Nginx. A patch-nginx.conf.sh script consumes environment variables to rewrite the Nginx configuration at container start, exposing the following knobs:
| Variable | Default | Description |
|---|---|---|
| NGINX_PID | /run/nginx.pid | Path to the Nginx PID file |
| NGINX_PORT | 8080 | Nginx listening port |
| OTEL_COLLECTOR_HOST | otel-collector | Hostname of the OpenTelemetry collector |
| OTEL_COLLECTOR_PORT | 4317 | Port of the OpenTelemetry collector |
| OTEL_TRACES_EXPORTER | otlp | Exporter type for OpenTelemetry |
| OTEL_EXPORTER_OTLP_TRACES_ENDPOINT | http://${OTEL_COLLECTOR_HOST}:${OTEL_COLLECTOR_PORT} | Full endpoint URL for OTLP traces |
Source: apps/opik-frontend/README.md.
The defaults imply that a self-hosted Opik deployment is expected to ship with an OTLP-compatible collector reachable at otel-collector:4317, and that frontend telemetry is forwarded through standard OpenTelemetry environment variables rather than an in-process exporter. Frontend build and test tooling — Vite, Vitest, Tailwind, and the lint stack — is pinned in apps/opik-frontend/package.json. Source: apps/opik-frontend/package.json.
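The way the endpoint default is derived from the host and port variables can be sketched as follows. This is an illustration of the defaulting rules in the table, not a port of patch-nginx.conf.sh: the function name is hypothetical, and the real script's handling of empty (as opposed to unset) variables may differ.

```python
from typing import Mapping

# Defaults copied from the variable table in this section.
DEFAULTS = {
    "NGINX_PID": "/run/nginx.pid",
    "NGINX_PORT": "8080",
    "OTEL_COLLECTOR_HOST": "otel-collector",
    "OTEL_COLLECTOR_PORT": "4317",
    "OTEL_TRACES_EXPORTER": "otlp",
}


def resolve_frontend_settings(env: Mapping[str, str]) -> dict:
    """Apply table defaults, then derive the OTLP traces endpoint if it is unset."""
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    derived = "http://{host}:{port}".format(
        host=settings["OTEL_COLLECTOR_HOST"],
        port=settings["OTEL_COLLECTOR_PORT"],
    )
    # An explicit endpoint wins over the host/port-derived value.
    settings["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = env.get(
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", derived
    )
    return settings


default_settings = resolve_frontend_settings({})
custom_settings = resolve_frontend_settings({"OTEL_COLLECTOR_HOST": "collector.internal"})
print(default_settings["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"])
print(custom_settings["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"])
```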
Client SDK Configuration
The Python SDK is the primary integration target for self-hosted deployments and is distributed via PyPI:
pip install opik
Source: sdks/python/README.md.
The TypeScript SDK ships a separate configure subpackage whose package.json declares dependencies on posthog-node, inquirer, zod, read-env, yargs, and dotenv. This tooling enables a CLI-driven initialization flow that helps operators point a Node.js SDK at a self-hosted instance without hard-coding endpoints. Source: sdks/typescript/src/opik/configure/package.json.
Operational Concerns and Community Gaps
Authentication
Authentication for the open-source distribution is a recurring community request. Issue #949 — "Authentication and Authorization" — explicitly compares the gap to Langfuse and is motivated by the desire for "a more secure approach" when exposing Opik publicly. Until upstream authentication lands, self-hosters who expose the application on an untrusted network should plan for an external reverse-proxy auth layer (OAuth proxy, mTLS, VPN, or IP allow-listing) in front of Nginx.
Workflow Integrations (n8n)
Issue #1587 requests n8n integration so that LLM workflow traces flow into Opik for evaluation. Until a native integration exists, operators can bridge n8n to Opik through the OpenTelemetry collector that the frontend already targets via OTEL_* environment variables, by exporting n8n spans to the same collector.
SDK Behavior Bugs Affecting Routing
Bug report #420 documents that project_name cannot be passed through the @opik.track decorator when invoked inside opik.evaluation.evaluate (Opik 0.2.1). Operators planning multi-tenant trace routing on the Python SDK should track the upstream fix and validate project assignment end-to-end in evaluation pipelines.
Roadmap Visibility
The community roadmap thread (issue #535) links to the live Opik Roadmap document and surfaces deployment-adjacent milestones; self-hosters should monitor it for security, scalability, and integration-related changes. Source: issue context in the community conversation.
See Also
- Python SDK Overview
- TypeScript SDK Design
- Frontend Architecture
- Opik Roadmap (community)
- Contributing to Opik
Source: https://github.com/comet-ml/opik / Human Manual
Authentication, Authorization, and Workspaces
Related topics: System Architecture and Repository Layout, Frontend Application (React/TypeScript), Deployment, Self-Hosting, and Operations
Overview
Opik organizes access control around three cooperating concerns: authentication (verifying who is calling), authorization (deciding what they may do), and workspaces (the tenancy boundary inside which traces, projects, datasets, and experiments are scoped). These concerns live in the backend service under the priv (private) API namespace, which is gated behind authenticated sessions in production deployments.
The community has actively asked for stronger out-of-the-box authentication in the open-source distribution (issue #949, "Authentication and Authorization"), comparing the desired experience to Langfuse. As of release 2.0.73 (referenced in the latest release notes), the backend already ships the Java/Jersey resource classes described in this section; the question for self-hosted operators is therefore less *whether* the hooks exist and more *how* to wire them up in front of the application.
The relationship between these subsystems is shown below.
flowchart LR
Client[Client / SDK] -->|HTTPS + API key or cookie| AuthRes[AuthenticationResource]
AuthRes --> AuthSvc[AuthService]
AuthSvc -->|resolves principal| Filter[Auth Filter / AuthModule]
Filter -->|enforces tenancy| WRes[WorkspacesResource]
Filter -->|checks role| WPerm[WorkspacePermissionsResource]
Filter -->|guards premium features| Toggle[ServiceTogglesResource]
WRes --> Workspace[(Workspace store)]
WPerm --> Perms[(Permission store)]
Authentication Layer
`AuthenticationResource` and `AuthService`
The HTTP surface for identity sits in AuthenticationResource, a Jersey resource exposed under /v1/priv/auth. It delegates the actual identity work to AuthService, which centralizes credential validation, session issuance, and principal resolution.
Typical responsibilities wired through these two classes include:
- Verifying an incoming API key or username/password pair against the configured identity provider.
- Issuing a session token (cookie or bearer header) on success.
- Returning the resolved User and Workspace context for downstream resources.
Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/AuthenticationResource.java
Source: apps/opik-backend/src/main/java/com/comet/opik/infrastructure/auth/AuthService.java
`AuthModule` Wiring
AuthModule is the Guice/DI binding module that registers AuthService and any associated filters, providers, and request-scoped bindings. It is the single integration point for swapping or extending the auth stack (for example, plugging in an OIDC provider, an SSO IdP, or a header-based trust for an internal reverse proxy).
In practice, operators who want a hardened open-source deployment — the gap called out in #949 — typically:
- Front the backend with an authenticating reverse proxy (e.g. oauth2-proxy, Authentik, Keycloak) that injects a trusted header.
- Extend AuthModule to bind a custom AuthService that consumes that header and resolves the principal.
- Reuse the existing WorkspacePermissionsResource for per-workspace checks rather than re-implementing them.
Source: apps/opik-backend/src/main/java/com/comet/opik/infrastructure/auth/AuthModule.java
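The header-based trust pattern in the steps above can be sketched in a language-neutral way. The snippet below is written in Python for brevity and is not the backend's Java implementation: the function name, the header name `X-Forwarded-User`, and the CIDR range are all assumptions chosen for illustration. The key idea is that the identity header is honored only when the request arrives from a known proxy address.

```python
import ipaddress
from typing import Mapping, Optional, Sequence

TRUSTED_USER_HEADER = "X-Forwarded-User"  # assumed header name; proxies differ


def resolve_principal(
    headers: Mapping[str, str],
    remote_addr: str,
    trusted_proxies: Sequence[str],
) -> Optional[str]:
    """Return the user from the trusted header, or None if the request is not trusted.

    The header is ignored unless the direct peer address falls inside one of
    the trusted proxy networks, so clients cannot spoof an identity directly.
    """
    peer = ipaddress.ip_address(remote_addr)
    from_trusted_proxy = any(
        peer in ipaddress.ip_network(cidr) for cidr in trusted_proxies
    )
    if not from_trusted_proxy:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    user = normalized.get(TRUSTED_USER_HEADER.lower(), "").strip()
    return user or None


proxies = ["10.0.0.0/8"]
via_proxy = resolve_principal({"X-Forwarded-User": "alice"}, "10.0.0.5", proxies)
direct = resolve_principal({"X-Forwarded-User": "mallory"}, "203.0.113.9", proxies)
print(via_proxy, direct)
```

Rejecting the header from untrusted peers matters as much as reading it: without that check, any client that can reach the backend directly could assert an arbitrary identity.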
Workspaces and Permissions
Workspaces as the Tenancy Boundary
A *workspace* in Opik is the top-level container that owns projects, datasets, experiments, prompts, and traces. WorkspacesResource exposes CRUD-style endpoints for creating, listing, and configuring workspaces, and is the only path through which a workspace is materialized server-side.
Every authenticated request is associated with exactly one workspace context, propagated by AuthService. That context flows into the data layer so that traces logged from the Python or TypeScript SDK never bleed across tenants.
Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/WorkspacesResource.java
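As a conceptual illustration of workspace scoping, the toy store below keys every read and write by workspace so that one tenant's traces are never returned to another. It is a hypothetical in-memory sketch, not the backend's data layer; the class and method names are assumptions.

```python
from collections import defaultdict


class WorkspaceScopedStore:
    """Toy in-memory store where every operation requires a workspace key."""

    def __init__(self) -> None:
        self._traces = defaultdict(list)

    def log_trace(self, workspace: str, trace: dict) -> None:
        """Record a trace under the caller's workspace only."""
        self._traces[workspace].append(trace)

    def list_traces(self, workspace: str) -> list:
        """Return a copy of the traces for one workspace; others stay invisible."""
        return list(self._traces.get(workspace, []))


store = WorkspaceScopedStore()
store.log_trace("team-a", {"id": 1})
store.log_trace("team-b", {"id": 2})
print(store.list_traces("team-a"))
```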
Per-Workspace Authorization
WorkspacePermissionsResource handles role-based checks inside a workspace — who may read traces, who may run evaluations, who may invite users, and so on. It pairs with AuthService so that authorization decisions use the same resolved principal as authentication.
This is also the layer that interacts with the frontend's v1/v2 routing in apps/opik-frontend/README.md, where the shared → v1/pages-shared → v1/pages import direction ensures authorization-aware components do not silently bypass checks.
Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/WorkspacePermissionsResource.java
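A role-based check of the kind described above can be sketched as a lookup from role to permitted actions. The role names and permission set here are hypothetical examples, not the roles Opik defines; the sketch only shows the deny-by-default shape such a check usually takes.

```python
from enum import Enum


class Permission(Enum):
    READ_TRACES = "read_traces"
    RUN_EVALUATIONS = "run_evaluations"
    INVITE_USERS = "invite_users"


# Hypothetical role definitions for illustration only.
ROLE_PERMISSIONS = {
    "viewer": frozenset({Permission.READ_TRACES}),
    "member": frozenset({Permission.READ_TRACES, Permission.RUN_EVALUATIONS}),
    "admin": frozenset(Permission),
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Deny by default: unknown roles have no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


print(is_allowed("member", Permission.RUN_EVALUATIONS))
print(is_allowed("viewer", Permission.INVITE_USERS))
```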
Service Toggles and Feature Gating
Some capabilities — notably advanced guardrails, certain integrations, and parts of the OpenTelemetry-based proxy surface documented in apps/opik-guardrails-backend/README.md — are gated behind feature flags. ServiceTogglesResource exposes the current toggle state to the frontend so the UI can hide controls the workspace cannot use, and to other backend services that need to short-circuit calls when a feature is disabled.
This resource is also useful for staged rollouts: an operator can enable a feature for a single workspace without exposing it elsewhere, which is a common pattern when hardening auth itself.
Source: apps/opik-backend/src/main/java/com/comet/opik/api/resources/v1/priv/ServiceTogglesResource.java
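The staged-rollout pattern mentioned above can be sketched as a two-level lookup: a per-workspace override takes precedence, and the global toggle is the fallback. The feature and workspace names below are hypothetical, and this is not the schema ServiceTogglesResource actually returns.

```python
from typing import Mapping


def is_feature_enabled(
    feature: str,
    workspace: str,
    global_toggles: Mapping[str, bool],
    workspace_overrides: Mapping[str, Mapping[str, bool]],
) -> bool:
    """Resolve a toggle: workspace override first, then global value, else off."""
    override = workspace_overrides.get(workspace, {}).get(feature)
    if override is not None:
        return override
    return global_toggles.get(feature, False)


global_toggles = {"guardrails": False}
overrides = {"pilot-team": {"guardrails": True}}

print(is_feature_enabled("guardrails", "pilot-team", global_toggles, overrides))
print(is_feature_enabled("guardrails", "other-team", global_toggles, overrides))
```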
Common Failure Modes and Configuration Notes
- No auth in front of the open-source build. Until issue #949 is fully resolved for self-hosted users, operators must put an authenticating reverse proxy in front of the backend. Skipping this step exposes every priv endpoint to the public network.
- Project scoping in evaluation flows. A frequent bug class (see #420) is that a project_name passed to opik.evaluation.evaluate may not propagate through the same workspace context used by the @opik.track decorator. Always verify the resolved workspace after authentication when debugging cross-resource traces.
- Missing cost metadata. When a model is missing from model_prices_and_context_window.json (see #4507), the guardrails/auth services that depend on cost thresholds will silently skip enforcement. Treat feature toggles as load-bearing and verify them after changes.
Source: https://github.com/comet-ml/opik / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.
1. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/comet-ml/opik
2. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/comet-ml/opik
3. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/comet-ml/opik
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/comet-ml/opik
5. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/comet-ml/opik
6. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/comet-ml/opik
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using opik with real data or production workflows.
- 2.0.73 - github / github_release
- 2.0.72 - github / github_release
- 2.0.71 - github / github_release
- 2.0.70 - github / github_release
- 2.0.69 - github / github_release
- 2.0.68 - github / github_release
- 2.0.67 - github / github_release
- 2.0.66 - github / github_release
- 2.0.65 - github / github_release
- 2.0.64 - github / github_release
- Capability evidence risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence