Doramagic Project Pack · Human Manual
langfuse
Project Introduction
Related topics: System Architecture
Langfuse is an open-source observability and analytics platform designed for Large Language Model (LLM) applications. It provides comprehensive tracing, evaluation, and prompt management capabilities that enable developers to monitor, debug, and optimize their AI-powered applications.
Overview
Langfuse serves as a centralized platform for capturing and analyzing interactions between AI models and end users. The project is MIT licensed and supports both cloud-hosted and self-hosted deployment models, making it accessible for teams of various sizes and requirements.
Core Purpose
The platform addresses several critical needs in LLM application development:
- Observability: Track and visualize traces, observations, and model interactions in real-time
- Evaluation: Measure and analyze AI application performance through configurable scoring systems
- Prompt Management: Create, version, and manage prompts with support for complex dependency resolution
- Collaboration: Enable team collaboration through commenting and sharing features
- Analytics: Provide insights into AI application behavior through comprehensive analytics dashboards
High-Level Architecture
Langfuse follows a modern microservices-inspired architecture with clear separation between frontend, backend processing, and data storage components.
graph TD
subgraph Frontend["Frontend (Next.js/React)"]
UI[User Interface]
DesignSystem[Design System Components]
FeatureFlags[Feature Flags]
end
subgraph Backend["Backend Services"]
API[API Server]
MCP[MCP Server]
Worker[Worker/Queue Processing]
end
subgraph Storage["Data Layer"]
Postgres[(PostgreSQL)]
ClickHouse[(ClickHouse)]
Redis[(Redis)]
S3[(S3 Storage)]
end
UI --> API
MCP --> API
Worker --> API
API --> Postgres
API --> ClickHouse
API --> Redis
API --> S3
Technology Stack
Langfuse is built using a modern technology stack optimized for performance and developer experience.
Frontend Stack
| Component | Technology | Purpose |
|---|---|---|
| Framework | Next.js | Server-side rendering and routing |
| UI Library | React | Component-based UI development |
| State Management | React Context + Hooks | Local and global state |
| Virtualization | TanStack Virtual | Efficient rendering of large lists |
| Styling | Tailwind CSS + cva | Utility-first styling with variant handling |
| Forms | Zod | Schema validation |
| Data Fetching | tRPC | Type-safe API communication |
Sources: web/src/components/design-system/README.md
Backend Stack
| Component | Technology | Purpose |
|---|---|---|
| Runtime | Node.js/TypeScript | Server-side logic |
| Database | PostgreSQL | Primary data storage |
| Analytics | ClickHouse | High-performance analytics queries |
| Cache | Redis | Caching and queue management |
| Queue | BullMQ | Background job processing |
| Storage | S3-compatible | File and event storage |
Key Frontend Components
The frontend architecture is organized around several key systems:
#### Design System
The design system (web/src/components/design-system/) provides reusable, primitive UI components following strict principles:
- Presentational only: No business logic in components
- Explicit, typed APIs: Strict TypeScript definitions
- No className/style props: Prevents style leakage
- cva for variants: Consistent variant handling
graph LR
A[Design System] --> B[Button]
A --> C[Input]
A --> D[Modal]
B --> E[Consistent Styling]
C --> E
D --> E
Sources: web/src/components/design-system/README.md
#### Layout System
All pages use a standardized Page wrapper component that ensures:
- Consistent layout structure
- Sticky header behavior
- Proper scroll management ("content-scroll" or "page-scroll")
- Breadcrumb navigation support
- Custom header actions
graph TD
Page[Page Component] --> Header[Sticky Header]
Page --> Content[Scrollable Content]
Header --> Breadcrumb[Breadcrumb Navigation]
Header --> Actions[Action Buttons]
Sources: web/src/components/layouts/README.md
#### JSON Viewer Component
The AdvancedJsonViewer component provides efficient rendering of large JSON datasets:
- Virtualization: Uses TanStack Virtual for row-based rendering
- Iterative algorithms: Explicit stack-based iteration to prevent stack overflow
- Client-side search: In-memory matching with binary search navigation
- Theme support: Customizable JSON syntax highlighting
graph TD
Input[Large JSON Data] --> Parser[JSON Parser]
Parser --> TreeBuilder[Tree Builder]
TreeBuilder --> Virtualizer[TanStack Virtual]
Virtualizer --> Renderer[Row Renderer]
Search[Search Query] --> Matcher[In-Memory Matcher]
Matcher --> Navigator[Binary Search Navigator]
Sources: web/src/components/ui/AdvancedJsonViewer/README.md
Core Features
Tracing and Observability
Langfuse provides comprehensive tracing capabilities that capture the full lifecycle of AI interactions:
- Traces: Complete request/response cycles
- Observations: Individual components within a trace (spans, events, generations)
- Metadata: Custom metadata attachment for context
- Tree Structure: Hierarchical representation of nested observations
The tree-building system uses iterative algorithms to handle millions of observations without stack overflow:
// Iterative traversal pattern
function traverse(rootNode: TreeNode) {
  const stack = [rootNode];
  while (stack.length > 0) {
    const node = stack.pop()!;
    process(node);
    node.children.forEach((child) => stack.push(child));
  }
}
Sources: web/src/components/trace/lib/tree-building.clienttest.ts
Prompt Management
Langfuse supports sophisticated prompt management with dependency resolution:
- Prompt Stacking: Compose prompts from multiple sources
- Dependency Tags: Reference other prompts using @@@langfusePrompt:...@@@ syntax
- Resolution Modes:
  - getPromptResolved: Returns fully resolved prompt with dependencies inlined
  - getPromptUnresolved: Returns raw prompt with tags preserved for analysis
graph LR
A[Prompt A] -->|references| B[Prompt B]
A -->|references| C[Prompt C]
B -->|references| D[Prompt D]
A -->|resolved| E[Final Prompt]
Sources: web/src/features/mcp/README.md
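The resolution step can be pictured as a small recursive replace loop. The sketch below is illustrative only: the tag payload format (beyond the @@@langfusePrompt:...@@@ delimiters), the depth guard, and the lookup function are assumptions, not the production resolver.

// Hypothetical sketch of recursive dependency resolution; TAG_RE and lookup
// are assumptions, not the actual code in web/src/features/mcp.
const TAG_RE = /@@@langfusePrompt:([^@]+)@@@/g;

async function resolvePrompt(
  content: string,
  lookup: (ref: string) => Promise<string>, // hypothetical prompt fetcher
  depth = 0,
): Promise<string> {
  if (depth > 10) throw new Error("dependency chain too deep (possible cycle)");
  let resolved = content;
  for (const match of [...content.matchAll(TAG_RE)]) {
    // inline each referenced prompt, resolving its own dependencies first
    const child = await resolvePrompt(await lookup(match[1]), lookup, depth + 1);
    resolved = resolved.replace(match[0], child);
  }
  return resolved;
}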
Score Analytics
The scoring system enables quantitative evaluation of AI application performance:
- Multiple Score Types: Supports numeric, categorical, and boolean scores
- Time Series Analysis: Track score changes over configurable intervals
- Distribution Analysis: Visualize score distributions with bins and categories
- Comparison Mode: Compare two scores side-by-side
The analytics layer provides interpretive functions for common metrics:
| Metric | Interpretation | Threshold |
|---|---|---|
| Agreement (Cohen's Kappa) | Excellent | ≥ 0.9 |
| Agreement (Cohen's Kappa) | Good | ≥ 0.8 |
| Agreement (Cohen's Kappa) | Fair | ≥ 0.6 |
| Agreement (Cohen's Kappa) | Poor | ≥ 0.4 |
Sources: web/src/features/score-analytics/lib/statistics-utils.ts
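As a rough sketch, the interpretation lookup amounts to a threshold cascade over the table above; the function name and the label for values below 0.4 are illustrative, not the actual exports of statistics-utils.ts.

// Threshold cascade matching the table above; illustrative only.
function interpretKappa(kappa: number): string {
  if (kappa >= 0.9) return "Excellent";
  if (kappa >= 0.8) return "Good";
  if (kappa >= 0.6) return "Fair";
  if (kappa >= 0.4) return "Poor";
  return "Below Poor"; // the band below 0.4 is not named in the table; placeholder label
}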
Entitlements System
Access control is managed through a hierarchical entitlements system:
graph TD
Plan[Plan] -->|contains| Entitlements[Entitlements]
Plan -->|contains| Limits[Entitlement Limits]
Entitlements -->|grants| Features[Feature Access]
Limits -->|restricts| Resources[Resource Quotas]
PlanTypes[Plan Types] --> OSS[OSS]
PlanTypes --> CloudPro[Cloud Pro]
PlanTypes --> SelfHostedEnterprise[Self-Hosted Enterprise]
Available entitlements include:
- Feature Flags: Enable/disable features via the useIsFeatureEnabled hook
- Entitlement Limits: Quotas on resources (e.g., annotation queue count)
- Plan-based Access: Cloud and self-hosted enterprise plans
Sources: web/src/features/entitlements/README.md
Feature Flags
Feature flags control feature availability dynamically:
const isFeatureEnabled = useIsFeatureEnabled("feature-flag-name");
A feature flag is enabled when:
- Flag is in the user's feature_flags list
- LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES environment variable is set
- User has admin privileges
Sources: web/src/features/feature-flags/README.md
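Combining the three conditions with OR logic gives a minimal sketch of the enablement check; the user shape and direct environment access here are simplifying assumptions, not the hook's real inputs.

// Minimal sketch: a flag is on if any of the three conditions holds.
interface FlagUser {
  feature_flags: string[];
  admin: boolean;
}

function isFlagEnabled(flag: string, user: FlagUser): boolean {
  return (
    user.feature_flags.includes(flag) ||
    process.env.LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES !== undefined ||
    user.admin
  );
}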
Collaboration Features
Langfuse includes team collaboration capabilities:
- Mention Parser: Extract and resolve user mentions in comments
- User References: @Display Name syntax for linking users
- Sanitization: Clean user-generated content for safe display
// Mention format: @[Alice](user:alice123)
// Parser extracts: alice123
Sources: web/src/features/comments/lib/mentionParser.clienttest.ts
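A minimal sketch of the extraction step, assuming only the @[Display Name](user:id) storage format shown above; the real parser additionally resolves and sanitizes mentions.

// Extracts user ids from @[Display Name](user:id) tokens; illustrative only.
const MENTION_RE = /@\[([^\]]+)\]\(user:([^)]+)\)/g;

function extractMentionedUserIds(comment: string): string[] {
  return [...comment.matchAll(MENTION_RE)].map((match) => match[2]);
}

// extractMentionedUserIds("cc @[Alice](user:alice123)") returns ["alice123"]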
Data Flow Architecture
Ingestion Pipeline
Events flow through the system in a structured pipeline:
graph LR
S3[S3 Event Storage] --> Worker[Worker Processing]
Worker -->|Standard| IngestionQueue[IngestionSecondaryQueue]
Worker -->|OTEL| OtelQueue[OtelIngestionQueue]
IngestionQueue --> Postgres[(PostgreSQL)]
OtelQueue --> ClickHouse[(ClickHouse)]
Event processing includes:
- Checkpointing: Resume from failures using .checkpoint files
- Rate Limiting: Client-side and server-side throttling
- Retry Logic: Exponential backoff with jitter for transient failures (see the sketch below)
- Error Logging: Failed events appended to errors.csv
Sources: worker/src/scripts/replayIngestionEventsV2/README.md
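For the retry logic, a generic helper illustrates exponential backoff with jitter; the attempt count, base delay, and jitter factor are assumptions, not the script's actual configuration.

// Generic retry helper: doubles the delay per attempt and adds random jitter.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // permanent failure: caller logs it
      const base = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      const jitter = Math.random() * base * 0.5; // up to +50% random spread
      await new Promise((resolve) => setTimeout(resolve, base + jitter));
    }
  }
}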
API Architecture
Langfuse uses tRPC for type-safe API communication:
- Server-side validation: Zod schemas for input validation
- tRPC routers: Organized endpoint handlers
- API versioning: V1 (legacy) and V2 (current) API support
| API Version | GET Support | Notes |
|---|---|---|
| V1 | traceId required | Legacy, trace-focused |
| V2 | traceId optional | Adds sessionId support |
Sources: packages/shared/src/features/scores/interfaces/README.md
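The table's distinction can be expressed as two Zod query schemas. These are illustrative shapes based only on the table above, not the actual exported interfaces.

import { z } from "zod";

// V1: trace-focused, traceId mandatory.
const getScoresV1Query = z.object({
  traceId: z.string(),
});

// V2: traceId optional, sessionId added.
const getScoresV2Query = z.object({
  traceId: z.string().optional(),
  sessionId: z.string().optional(),
});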
MCP Server Architecture
The Model Context Protocol (MCP) server provides external access to Langfuse:
- Stateless per-request: Fresh server instance for each request
- Context via closures: Authentication captured in handler closures
- No session storage: Request-disposable architecture
graph TD
Request[MCP Request] --> Instance[New Server Instance]
Instance --> Auth[Auth Context Closure]
Auth --> Handler[Request Handler]
Handler --> Response[Response]
Response --> Discard[Instance Discarded]
Sources: web/src/features/mcp/README.md
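The request-disposable pattern can be sketched as a per-request factory whose handlers close over the authentication context. Names and types here are illustrative, not the actual MCP server code.

interface AuthContext {
  projectId: string;
  apiKey: string;
}

// Hypothetical prompt fetcher standing in for the real data access layer.
declare function fetchPrompt(name: string, auth: AuthContext): Promise<string>;

function createMcpServer(auth: AuthContext) {
  // every handler closes over `auth`; nothing is stored between requests
  return {
    getPrompt: (name: string) => fetchPrompt(name, auth),
  };
}

async function handleRequest(auth: AuthContext, promptName: string) {
  const server = createMcpServer(auth); // fresh instance per request
  const prompt = await server.getPrompt(promptName);
  return prompt; // server goes out of scope and is discarded
}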
Filtering System
Langfuse implements a sophisticated filtering system with type-safe encoding:
Filter Types
| Type | Description | Example |
|---|---|---|
| string | Simple string matching | Trace name |
| number | Numeric comparison | Latency values |
| datetime | Date/time filtering | Time ranges |
| boolean | True/false matching | Flag states |
| arrayOptions | Multi-value selection | Tags |
| categoryOptions | Categorical filtering | Status values |
| positionInTrace | Nested location | Span hierarchy |
State Management
Filters support multiple storage locations:
- URL: Persisted in query parameters
- Session Storage: In-memory per session
- Peek Context: Temporary state for preview panels
const filterOptions: UseSidebarFilterStateOptions = {
  stateLocation: "urlAndSessionStorage",
  sessionFilterContextId: projectId,
  implicitDefaultConfig: DEFAULT_SIDEBAR_IMPLICIT_ENVIRONMENT_CONFIG,
};
Sources: web/src/components/table/peek/README.md
Deployment Models
Langfuse supports multiple deployment configurations:
| Model | Plan Options | Authentication | License |
|---|---|---|---|
| Cloud | cloud:pro | JWT via NextAuth | Proprietary |
| Self-Hosted | self-hosted:enterprise | JWT + License Key | Proprietary |
| Open Source | oss | Basic auth | MIT |
Self-Hosted Configuration
Self-hosted deployments require:
- PostgreSQL database
- ClickHouse for analytics
- Redis for caching/queues
- S3-compatible storage for events
- License key for enterprise features
Sources: web/src/features/entitlements/README.md
Performance Considerations
Large Dataset Handling
Langfuse handles large datasets through several mechanisms:
| Scale | Mechanism | Threshold |
|---|---|---|
| 10k+ rows | TanStack Virtual | Row-based rendering |
| 1M+ nodes | Iterative algorithms | No stack overflow |
| 10k+ search matches | Binary search | Efficient navigation |
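Jumping between search matches works by binary-searching a sorted array of matching row indices. A minimal sketch, assuming matches are stored as ascending row numbers; the component's actual navigator handles wrap-around and direction.

// Returns the first match row strictly after currentRow, or undefined.
function nextMatch(matchRows: number[], currentRow: number): number | undefined {
  let lo = 0;
  let hi = matchRows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (matchRows[mid] <= currentRow) lo = mid + 1;
    else hi = mid;
  }
  return matchRows[lo]; // undefined when currentRow is past the last match
}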
Known Limitations
- No horizontal virtualization for wide rows
- Client-side search only (can be slow with many matches)
- Memory constraints at 1M+ nodes
- Read-only JSON viewer (no inline editing)
- Wrap mode may cause layout thrashing
Sources: web/src/components/ui/AdvancedJsonViewer/README.md
Development Guidelines
Testing
The project uses Jest with custom extensions:
| File Pattern | Purpose | Location |
|---|---|---|
| .clienttest.ts | Client-side tests | Colocated with components |
| *.test.ts | Standard unit tests | __tests__ or inline |
# Run client tests
pnpm --filter=web run test-client --testPathPattern="ComponentName"
Debugging
Enable debug logging via localStorage:
localStorage.setItem("debug:ComponentName", "true");
Summary
Langfuse is a comprehensive observability platform that bridges the gap between AI application development and operational monitoring. Its modular architecture, built on proven technologies like PostgreSQL, ClickHouse, and React, provides a scalable foundation for teams to understand, evaluate, and optimize their LLM applications.
The platform's emphasis on type safety, performance optimization, and developer experience makes it suitable for both small development teams and large-scale enterprise deployments.
Sources: [web/src/components/design-system/README.md](https://github.com/langfuse/langfuse/blob/main/web/src/components/design-system/README.md)
Project Structure
Related topics: Monorepo Configuration, System Architecture
Overview
Langfuse is a comprehensive observability and analytics platform for LLM applications, structured as a monorepo using pnpm workspaces and Turborepo for build orchestration. The repository contains multiple packages including the frontend web application, backend worker services, and shared libraries.
Monorepo Architecture
Workspace Configuration
Langfuse uses pnpm workspaces defined in pnpm-workspace.yaml to manage multiple packages within a single repository.
packages:
  - "packages/*"
  - "web"
  - "worker"
Sources: pnpm-workspace.yaml
Build System (Turborepo)
The project uses Turborepo for efficient incremental builds and task caching. The turbo configuration defines the build pipeline and dependencies between packages.
Sources: turbo.json
Package Structure
1. Web Application (`/web`)
The frontend application built with Next.js and React, containing the user interface and client-side logic.
| Directory | Purpose |
|---|---|
| src/components/ | Reusable UI components including layouts, tables, and specialized viewers |
| src/features/ | Feature-specific modules with their own logic and components |
| src/lib/ | Utility functions and helpers |
| src/hooks/ | Custom React hooks |
Sources: web/package.json
2. Worker Service (`/worker`)
Backend service handling background processing, event ingestion, and queue management.
| Directory | Purpose |
|---|---|
| src/scripts/ | Utility scripts for data operations and migrations |
| src/queues/ | Queue handlers for async processing |
Sources: worker/package.json
3. Shared Packages (`/packages/shared`)
Common utilities, types, and validation schemas shared across web and worker packages.
| Module | Purpose |
|---|---|
| Validation | Zod schemas for type-safe data validation |
| Types | Shared TypeScript type definitions |
| Utilities | Common helper functions |
Sources: packages/shared/package.json
Web Application Structure
Component Architecture
graph TD
A[Web Application] --> B[Layout Components]
A --> C[UI Components]
A --> D[Feature Modules]
B --> B1[Page Wrapper]
B --> B2[ContainerPage]
B --> B3[Breadcrumb]
C --> C1[AdvancedJsonViewer]
C --> C2[Table Components]
C --> C3[Peek Components]
D --> D1[MCP]
D --> D2[Score Analytics]
D --> D3[Comments]
D --> D4[Filters]
D --> D5[Entitlements]
Layout System
The Page component is the standard wrapper for all pages, providing:
- Sticky Header: Consistent header across pages
- Scroll Management: Supports "content-scroll" and "page-scroll" modes
- Breadcrumb Navigation: Easy navigation path display
- Custom Header Actions: Flexible button/link placement
For content that doesn't scale well with page width (e.g., settings pages), use ContainerPage instead.
Sources: web/src/components/layouts/README.md
Key Feature Modules
#### AdvancedJsonViewer
A virtualized JSON tree viewer with the following characteristics:
- Performance: Uses TanStack Virtual for rendering large JSON structures
- Search: Client-side search with regex support
- Theme Support: Multiple color themes (GitHub, Monokai, Solarized)
- Tree Navigation: Binary search for efficient node access
Sources: web/src/components/ui/AdvancedJsonViewer/README.md
#### Score Analytics
Provides analytics dashboard capabilities for score data:
- Score Comparison: Compare two scores over time
- Distribution Analysis: Histogram and heatmap visualizations
- Time Series: Temporal trend analysis with configurable intervals
- Data Transformation: Pure functions for data processing
Sources: web/src/features/score-analytics/README.md
#### MCP (Model Context Protocol)
Enables integration with external systems through MCP:
- Stateless Architecture: Fresh server instance per request
- Prompt Management: Support for getPrompt and getPromptUnresolved
- Resource Handling: MCP resources and tool support
Sources: web/src/features/mcp/README.md
#### Entitlements
Feature availability control system:
- Plan-based Access: oss, cloud:pro, self-hosted:enterprise
- Entitlement Limits: Resource quotas per plan
- Server/Client Support: Available in both frontend hooks and backend
Sources: web/src/features/entitlements/README.md
Table and Peek System
#### PeekTableStateProvider
Manages table state for peek views (slide-over panels showing item details):
graph LR
A[PeekTableStateProvider] --> B[Filters]
A --> C[Sorting]
A --> D[Pagination]
A --> E[Search]
State Persistence: Filter, sort, and pagination state persists across K/J navigation between items of the same type.
State Reset: State clears when the peek view closes (via X button, Escape, or click outside).
Sources: web/src/components/table/peek/README.md
Worker Service Structure
Scripts
The worker contains utility scripts for data operations:
| Script | Purpose |
|---|---|
| replayIngestionEventsV2 | Replay events from CSV to ingestion queues |
| refillQueueEvent | Backfill queue with events from local files |
#### Replay Ingestion Events V2
Replays S3-stored events back to Langfuse:
- Batch Processing: Processes events in configurable batches
- Checkpoint Support: Resume capability via checkpoint files
- Rate Limiting: Respects server-side rate limits with exponential backoff
- Error Handling: Retries transient failures, logs permanent failures
Sources: worker/src/scripts/replayIngestionEventsV2/README.md
#### Refill Queue Event
Backfills queues with events from local JSONL files:
- Create ./worker/events.jsonl with one JSON event per line
- Configure Redis connection and supporting services
- Run via pnpm run --filter=worker refill-queue-event
Sources: worker/src/scripts/refillQueueEvent/README.md
Public API Architecture
Adding New API Routes
The project follows a structured pattern for public API development:
- Implementation: Wrap routes with withMiddleware and createAuthedAPIRoute
- Type Definition: Add Zod types to /features/public-api/types using coerce for primitives
- Validation: Use validateZodSchema for response validation
- Documentation: Add to Fern with docs attributes
- SDK Updates: Copy types to Python and JS SDKs
// Response type example
import { z } from "zod";

const responseSchema = z
  .object({
    data: z.string(),
    timestamp: z.coerce.date(),
  })
  .strict();
Sources: web/src/features/README.md
Testing Infrastructure
Client-Side Testing
Tests use .clienttest.ts extension and are colocated with components:
pnpm --filter=web run test-client --testPathPattern="ComponentName"
Example: AdvancedJsonViewer tests cover tree building, navigation, expansion, and search operations.
Sources: web/src/components/ui/AdvancedJsonViewer/README.md
Trace Tree Building Tests
Performance tests for large observation sets:
| Scale | Threshold | Structure Types |
|---|---|---|
| 10k | 500ms | flat, deep, balanced, realistic |
| 25k | 2s | flat, realistic |
| 50k | 5s | flat, realistic |
| 500k | 60s | realistic |
Sources: web/src/components/trace/lib/tree-building.clienttest.ts
Development Workflow
Component Development Guidelines
- Use Page Wrapper: Always wrap pages with the <Page> component
- Use ContainerPage: For settings/setup pages with non-scalable content
- Follow Naming: Use .clienttest.ts for client-side tests
- State Management: Use useSidebarFilterState for filters, useOrderByState for sorting
Prompt Composition
For MCP prompt features:
- Resolved Prompt: Use getPrompt for executable prompts with dependencies resolved
- Unresolved Prompt: Use getPromptUnresolved for debugging and analysis
Sources: web/src/features/mcp/README.md
Summary
The Langfuse project is organized as a well-structured monorepo with clear separation between the frontend web application, backend worker services, and shared packages. The architecture emphasizes:
- Modularity: Feature-based organization with isolated modules
- Performance: Virtualization and incremental builds
- Type Safety: Zod schemas and TypeScript throughout
- Observability: Built-in tracing and analytics capabilities
Sources: [pnpm-workspace.yaml](https://github.com/langfuse/langfuse/blob/main/pnpm-workspace.yaml)
Quickstart Guide
Related topics: Project Introduction
Langfuse is an open-source LLM engineering platform that provides observability, analytics, and prompt management for LLM applications. This guide walks you through setting up a local development environment, understanding the core architecture, and deploying Langfuse for production use.
Overview
Langfuse supports multiple deployment scenarios:
| Deployment Type | Use Case | Key Components |
|---|---|---|
| Local Development | Testing and development | Docker Compose with all services |
| Self-Hosted | Production deployment | Docker/Kubernetes with externalized services |
| Cloud | Managed SaaS offering | Langfuse-hosted infrastructure |
Sources: README.md:1-20
Architecture Overview
Langfuse consists of three main components:
graph TD
A[Web UI] --> B[Server API]
B --> C[(PostgreSQL)]
B --> D[(ClickHouse)]
B --> E[(Redis)]
F[Workers] --> C
F --> D
F --> E
G[S3 Storage] --> F
Core Components
| Component | Purpose | Technology |
|---|---|---|
| web/ | Frontend UI application | Next.js, React, TanStack |
| server/ | API server and business logic | Node.js, tRPC, Prisma |
| worker/ | Background job processing | BullMQ, Redis |
| clickhouse/ | Analytics storage | ClickHouse |
Sources: docker-compose.yml:1-50
Local Development Setup
Prerequisites
- Node.js 20+ (using pnpm as package manager)
- Docker and Docker Compose
- Git
1. Clone the Repository
git clone https://github.com/langfuse/langfuse.git
cd langfuse
2. Environment Configuration
Create a .env file in the root directory with the following required variables:
# Database
DATABASE_URL=postgresql://langfuse:langfuse@localhost:5432/langfuse
# ClickHouse
CLICKHOUSE_URL=http://localhost:8123
CLICKHOUSE_USER=langfuse
CLICKHOUSE_PASSWORD=langfuse
# Redis
REDIS_URL=redis://localhost:6379
# Auth (NextAuth.js)
NEXTAUTH_SECRET=your-secret-key
NEXTAUTH_URL=http://localhost:3000
# S3 Storage (MinIO for local dev)
S3_ACCESS_KEY=langfuse
S3_SECRET_KEY=langfuse
S3_REGION=us-east-1
S3_ENDPOINT_URL=http://localhost:9000
S3_EVENT_UPLOAD_BUCKET=langfuse
Sources: docker-compose.yml:50-120
3. Start Infrastructure Services
Launch all supporting services using Docker Compose:
docker compose up -d
This starts the following services:
| Service | Port | Purpose |
|---|---|---|
| postgres | 5432 | Primary database |
| clickhouse | 8123 | Analytics storage |
| redis | 6379 | Job queue broker |
| minio | 9000/9001 | S3-compatible storage |
Sources: docker-compose.yml:120-180
4. Install Dependencies
pnpm install
5. Run Database Migrations
pnpm db:migrate
6. Start Development Servers
Langfuse uses a monorepo structure with multiple development servers:
# Start all services in development mode
pnpm run dev
# Or start individual services
pnpm --filter=server run dev # API server
pnpm --filter=web run dev # Web UI
pnpm --filter=worker run dev # Background workers
Sources: CONTRIBUTING.md:50-100
Project Structure
langfuse/
├── web/                      # Next.js frontend
│   ├── src/
│   │   ├── components/       # Reusable UI components
│   │   ├── features/         # Feature modules
│   │   ├── pages/            # Next.js pages
│   │   └── lib/              # Utilities
│   └── public/               # Static assets
├── server/                   # API server
│   └── src/
│       ├── api/              # tRPC routers
│       ├── services/         # Business logic
│       └── lib/              # Utilities
├── worker/                   # Background workers
│   └── src/
│       ├── workers/          # Queue processors
│       └── scripts/          # Utility scripts
├── clickhouse/               # ClickHouse migrations
└── docker-compose.yml        # Local infrastructure
Page Component Pattern
The Page component is the standard wrapper for all pages in the application:
import Page from "@/src/components/layouts/Page";

export default function MyPage() {
  return (
    <Page
      title="My Page"
      scrollable
      headerProps={{
        breadcrumb: [{ name: "Home", href: "/" }, { name: "My Page" }],
      }}
    >
      <div>Content here...</div>
    </Page>
  );
}
Important: Every page must be wrapped inside <Page>; do not use <main> directly.
Sources: web/src/components/layouts/README.md:1-40
Development Workflow
Running Tests
Langfuse uses different test patterns for client and server code:
# Run all tests
pnpm run test
# Client-side tests (Vitest with .clienttest.ts extension)
pnpm --filter=web run test-client --testPathPattern="ComponentName"
# Server-side tests (Jest)
pnpm --filter=server run test
#### Client Test Pattern
Client tests use stack-based iteration to avoid stack overflow:
// ✅ Safe for deep trees - iterative approach
function traverse(rootNode: TreeNode) {
  const stack = [rootNode];
  while (stack.length > 0) {
    const node = stack.pop()!;
    process(node);
    node.children.forEach((child) => stack.push(child));
  }
}
Sources: web/src/components/ui/AdvancedJsonViewer/README.md:80-100
Debug Mode
Enable detailed logging for specific components:
localStorage.setItem("debug:AdvancedJsonViewer", "true");
Code Quality
# Lint code
pnpm run lint
# Format code
pnpm run format
# Type check
pnpm run typecheck
Sources: CONTRIBUTING.md:100-150
Feature Modules
Langfuse organizes functionality into feature modules under web/src/features/:
| Module | Purpose |
|---|---|
| comments/ | User comments and mentions |
| entitlements/ | Feature access control |
| feature-flags/ | Feature toggle system |
| filters/ | Query filtering and search |
| mcp/ | Model Context Protocol integration |
| score-analytics/ | Score analytics and visualization |
| slack/ | Slack integration |
| migrations/ | Database migrations |
Feature Flags
Enable experimental features using the useIsFeatureEnabled hook:
const isEnabled = useIsFeatureEnabled("feature-flag-name");
A feature is enabled when:
- The flag is in user.feature_flags
- LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES is set
- The user has admin privileges
Sources: web/src/features/feature-flags/README.md:1-15
Entitlements System
Feature availability is controlled through entitlements:
- Plans: Tiers of features (oss, cloud:pro, self-hosted:enterprise)
- Entitlements: Available features per plan (e.g., playground)
- EntitlementLimits: Resource limits (e.g., annotation-queue-count); see the sketch below
Sources: web/src/features/entitlements/README.md:1-25
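A minimal sketch of how plans, entitlements, and limits relate; the PlanDefinition shape and helper names are assumptions for illustration, not the feature's real API.

// Illustrative shapes only: which features a plan grants and what quotas apply.
interface PlanDefinition {
  entitlements: string[]; // e.g. "playground"
  limits: Record<string, number | "unlimited">; // e.g. "annotation-queue-count"
}

function hasEntitlement(plan: PlanDefinition, feature: string): boolean {
  return plan.entitlements.includes(feature);
}

function withinLimit(plan: PlanDefinition, limit: string, current: number): boolean {
  const max = plan.limits[limit];
  if (max === undefined || max === "unlimited") return true; // no quota configured
  return current < max;
}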
Worker Scripts
The worker module includes utility scripts for data operations:
Refill Queue Event
Backfill any queue with events from local machines:
# 1. Create events file (./worker/events.jsonl)
{"projectId": "project-123", "orgId": "org-456"}
# 2. Configure environment
REDIS_CONNECTION_STRING=redis://:[email protected]:6379
CLICKHOUSE_URL=http://localhost:8123
CLICKHOUSE_USER=clickhouse
CLICKHOUSE_PASSWORD=clickhouse
# 3. Run the script
pnpm run --filter=worker refill-queue-event
Sources: worker/src/scripts/refillQueueEvent/README.md:1-40
Replay Ingestion Events V2
Re-process S3-stored ingestion events:
npx tsx worker/src/scripts/replayIngestionEventsV2/index.ts \
--input=/path/to/events.csv \
--batch-size=500 \
--concurrency=4
| Parameter | Default | Description |
|---|---|---|
| --input | - | Path to CSV file (required) |
| --batch-size | 500 | Keys per API request |
| --concurrency | 4 | Parallel API requests |
| --rate-limit | 50 | Requests per second |
| --dry-run | false | Validate without sending |
| --resume | false | Continue from checkpoint |
Sources: worker/src/scripts/replayIngestionEventsV2/README.md:1-50
Slack Integration Setup
For local Slack OAuth development, HTTPS is required:
1. Generate SSL Certificates
# Install mkcert
brew install mkcert
mkcert -install
mkcert localhost 127.0.0.1
# Move certificates to web directory
mv localhost+1*.pem web/
2. Configure Environment
SLACK_CLIENT_ID=your_client_id
SLACK_CLIENT_SECRET=your_client_secret
SLACK_STATE_SECRET=your_state_secret
3. Start HTTPS Server
pnpm run dev:https
Sources: web/src/features/slack/README.md:1-50
Production Deployment
Docker Compose Production Mode
For production, use externalized services:
services:
  web:
    image: langfuse/langfuse-web:latest
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - CLICKHOUSE_URL=${CLICKHOUSE_URL}
      - REDIS_URL=${REDIS_URL}
      - NEXTAUTH_SECRET=${NEXTAUTH_SECRET}
      - S3_ACCESS_KEY=${S3_ACCESS_KEY}
      - S3_SECRET_KEY=${S3_SECRET_KEY}
    ports:
      - "3000:3000"
  server:
    image: langfuse/langfuse-server:latest
    # Configuration similar to web
  worker:
    image: langfuse/langfuse-worker:latest
    depends_on:
      - redis
      - postgres
      - clickhouse
S3 Event Storage
Configure S3 bucket for event storage:
-- Example: Create external table for S3 access logs
CREATE EXTERNAL TABLE s3_access_logs (
  bucketowner STRING,
  bucket_name STRING,
  requestdatetime STRING,
  remoteip STRING,
  requester STRING,
  requestid STRING,
  operation STRING,
  key STRING,
  uri STRING,
  statuscode INT,
  errorcode STRING,
  bytessent BIGINT,
  objectsize BIGINT,
  totaltime STRING,
  turnaroundtime STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ...'
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
LOCATION 's3://your-bucket/logs/'
Sources: docker-compose.yml:180-220
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Stack overflow in tree operations | Use iterative algorithms with explicit stacks |
| Large dataset performance | Enable virtualization (TanStack Virtual) |
| Horizontal scroll performance | Avoid wrap mode for wide datasets |
| Multiple tables in peek view | Share pagination state intentionally |
Sources: web/src/components/ui/AdvancedJsonViewer/README.md:40-60
Next Steps
After setting up your development environment:
- Explore the UI - Navigate through traces, observations, and evaluations
- Integrate SDK - Connect your LLM application using Langfuse Python/JS SDK
- Configure Features - Set up feature flags and entitlements for your organization
- Deploy - Move to production using Docker Compose or Kubernetes
Sources: CONTRIBUTING.md:1-50
Sources: README.md:1-20
System Architecture
Related topics: Project Structure, Database Schema (Prisma), Queue System (Redis/BullMQ)
Langfuse is a comprehensive observability and analytics platform designed for Large Language Model (LLM) applications. The system architecture is built on a modern, modular design that separates concerns across frontend, backend worker services, and shared infrastructure layers.
Overview
Langfuse follows a distributed architecture pattern with the following primary components:
| Layer | Technology Stack | Purpose |
|---|---|---|
| Frontend | Next.js, React, TypeScript | User interface and visualization |
| Backend Worker | Node.js, BullMQ, TypeScript | Event processing and queue management |
| Shared Packages | TypeScript | Common utilities, types, and infrastructure clients |
| Database | PostgreSQL | Primary data storage |
| Cache/Queue | Redis | Queue management and caching |
| Analytics | ClickHouse | High-performance analytics queries |
| Observability | OpenTelemetry | Distributed tracing |
High-Level Architecture
graph TD
subgraph Client["Frontend (Next.js)"]
UI[User Interface]
Pages[Page Components]
DesignSystem[Design System]
end
subgraph Shared["Shared Packages"]
DB[(PostgreSQL)]
Redis[(Redis)]
ClickHouse[(ClickHouse)]
Otel[OpenTelemetry]
end
subgraph Backend["Worker Service"]
Queues[Queue Workers]
Scripts[Utility Scripts]
end
Client <-->|tRPC API| Shared
Backend <-->|Event Processing| Shared
Client -->|Ingestion| Backend
Infrastructure Layer
Database Connection
The PostgreSQL database is the central data store for Langfuse, managed through a shared database client module. The connection is centralized in the packages/shared/src/db.ts module, which provides a unified interface for all database operations across the application.
Sources: packages/shared/src/db.ts:1-50
Redis Client
Redis serves dual purposes in the Langfuse architecture:
- Queue Management: BullMQ queues for asynchronous event processing
- Caching: Session and temporary data caching
The Redis client is configured in packages/shared/src/server/redis/redis.ts and is shared across worker services.
Sources: packages/shared/src/server/redis/redis.ts:1-30
ClickHouse Integration
ClickHouse provides the analytical query engine for high-performance aggregations. The client is initialized in packages/shared/src/server/clickhouse/client.ts and is primarily used for:
- Score analytics aggregations
- Time-series data analysis
- Large dataset transformations
Sources: packages/shared/src/server/clickhouse/client.ts:1-40
OpenTelemetry
The OpenTelemetry integration (packages/shared/src/server/otel/index.ts) provides distributed tracing across all services. This enables:
- Request tracing across frontend and backend
- Event processing workflow visibility
- Performance monitoring
Sources: packages/shared/src/server/otel/index.ts:1-60
Frontend Architecture
Page Structure
All frontend pages use a standardized layout system defined in web/src/components/layouts/. The Page component is the required wrapper for all application pages, ensuring consistent layout behavior.
Key layout patterns:
| Pattern | Component | Use Case |
|---|---|---|
| Standard Pages | Page | Most application pages |
| Wide Content | ContainerPage | Settings, setup pages with wide content |
The page wrapper provides:
- Sticky header management
- Scroll behavior control (content-scroll or page-scroll)
- Breadcrumb navigation
- Custom header actions
Sources: web/src/components/layouts/README.md:1-60
Design System
The design system (web/src/components/design-system/) provides primitive, reusable UI components following strict architectural principles:
Principles:
- Presentational only (no business logic)
- Explicit, strictly typed APIs
- Props over context (no React Context)
Component Structure:
design-system/
  Button/
    Button.tsx
    Button.stories.tsx
Styling Rules:
- No arbitrary CSS values
- Explicit enums for variants (size: "sm" | "md" | "lg")
- CVA (Class Variance Authority) for variant management
- Boolean props use positive naming (isLoading, shouldTruncate)
Sources: web/src/components/design-system/README.md:1-80
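Putting these rules together, a variant definition with cva might look like the following sketch; the class lists are placeholders rather than the design system's real styles.

import { cva, type VariantProps } from "class-variance-authority";

// Explicit size enum, no arbitrary values; cva resolves the variant classes.
const buttonVariants = cva("inline-flex items-center justify-center rounded-md", {
  variants: {
    size: {
      sm: "h-8 px-3 text-sm",
      md: "h-10 px-4 text-base",
      lg: "h-12 px-6 text-lg",
    },
  },
  defaultVariants: { size: "md" },
});

type ButtonVariants = VariantProps<typeof buttonVariants>;
// buttonVariants({ size: "lg" }) yields the base classes plus "h-12 px-6 text-lg"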
State Management Patterns
Langfuse uses a sophisticated state management approach with the Peek Table State system:
graph TD
A[Page Load] --> B{Peek Context?}
B -->|Yes| C[PeekTableStateProvider]
B -->|No| D[URL/Session State]
C --> E[Table State Preserved]
D --> F[Standard State]
E --> G[K/J Navigation]
G --> H[State Retained โ]The PeekTableStateProvider maintains table state (filters, sorting, pagination) across K/J keyboard navigation between items of the same type. State resets only when the peek view closes.
Sources: web/src/components/table/peek/README.md:1-100
Feature Modules
Entitlements System
The entitlements feature controls feature availability at the organization level:
| Concept | Definition |
|---|---|
| Plan | Feature tier (OSS, cloud:pro, self-hosted:enterprise) |
| Entitlement | Available feature (e.g., playground, score analytics) |
| EntitlementLimit | Resource limits (e.g., annotation-queue-count) |
Plan Resolution:
- Cloud: Added to organization via JWT from NextAuth
- Self-hosted: From license key or environment configuration
Sources: web/src/features/entitlements/README.md:1-50
Score Analytics
The score analytics feature provides comprehensive statistical analysis of evaluation scores:
Architecture Components:
| Component | Location | Responsibility |
|---|---|---|
| Provider | ScoreAnalyticsProvider.tsx | Context management |
| Hook | useScoreAnalyticsQuery | Data fetching and transformation |
| Transformers | scoreAnalyticsTransformers.ts | Data transformation pipeline |
| Router | scoreAnalyticsRouter.ts | tRPC API endpoint |
Data Flow:
graph LR
A[API Request] --> B[tRPC Router]
B --> C[ClickHouse Query]
C --> D[Transformers]
D --> E[ScoreAnalyticsProvider]
E --> F[Chart Components]
Sources: packages/shared/src/features/scores/interfaces/README.md:1-80
Sources: web/src/features/score-analytics/README.md:1-100
Score Interfaces Architecture
Langfuse maintains a versioned API structure for scores:
interfaces/
├── api/
│   ├── v1/        # Legacy API (trace-focused)
│   ├── v2/        # Current API (supports traces, sessions)
│   └── shared.ts
├── application/
├── ingestion/
└── ui/
API Versioning Strategy:
- POST/DELETE APIs: Support all score types across v1 and v2
- GET APIs:
  - V1: Requires traceId, trace-level only
  - V2: traceId optional, adds sessionId support
Sources: packages/shared/src/features/scores/interfaces/README.md:1-50
Backend Worker Architecture
Event Processing
The worker service processes ingestion events asynchronously using BullMQ queues:
Queue Types:
| Queue | Purpose | Consumer |
|---|---|---|
| IngestionSecondaryQueue | Standard event processing | Worker |
| OtelIngestionQueue | OpenTelemetry events | Worker |
Event Flow:
graph LR
A[Ingestion API] --> B{Event Type?}
B -->|Standard| C[S3 Key Parse]
B -->|OTEL| D[OTEL Key Parse]
C --> E[Queue Payload]
D --> E
E --> F[BullMQ]
F --> G[Worker Processing]
G --> H[(ClickHouse)]
G --> I[(PostgreSQL)]Sources: worker/src/scripts/replayIngestionEventsV2/README.md:1-60
Replay Ingestion Events V2
The replayIngestionEventsV2 script enables replaying historical events from S3 storage:
Key Features:
- Batch processing with configurable size
- Checkpoint/resume capability
- Rate limiting with exponential backoff
- Error handling with detailed logging
Differences from V1:
| Aspect | V1 | V2 |
|---|---|---|
| Infrastructure | Redis, ClickHouse, PostgreSQL, S3 | Langfuse host URL only |
| Setup | Full repo clone + .env | npx tsx + env vars |
| Event Delivery | BullMQ addBulk to Redis | HTTP POST to admin API |
| Resume | Manual | Built-in checkpoint |
Event Transformation:
- Standard keys: {projectId}/{type}/{eventBodyId}/{eventId}.json
- OTEL keys: otel/{projectId}/{yyyy}/{mm}/{dd}/{hh}/{mm}/{eventId}.json
Sources: worker/src/scripts/replayIngestionEventsV2/README.md:1-120
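Parsing a standard key reduces to splitting the four path segments. A sketch, assuming well-formed keys and ignoring the OTEL date-partitioned variant; the ParsedEventKey shape is illustrative.

interface ParsedEventKey {
  projectId: string;
  type: string;
  eventBodyId: string;
  eventId: string;
}

// Matches {projectId}/{type}/{eventBodyId}/{eventId}.json; illustrative only.
function parseStandardKey(key: string): ParsedEventKey | undefined {
  const match = key.match(/^([^/]+)\/([^/]+)\/([^/]+)\/([^/]+)\.json$/);
  if (!match) return undefined;
  const [, projectId, type, eventBodyId, eventId] = match;
  return { projectId, type, eventBodyId, eventId };
}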
Refill Queue Event
The refillQueueEvent utility script backfills queues with events from local files:
Usage Pattern:
pnpm run --filter=worker refill-queue-event
Requirements:
- ./worker/events.jsonl file with JSON events
- Redis connection via REDIS_CONNECTION_STRING
- Supporting services: S3, ClickHouse
Sources: worker/src/scripts/refillQueueEvent/README.md:1-60
MCP Server Architecture
Langfuse includes an MCP (Model Context Protocol) server for programmatic prompt management:
Stateless Design:
- Fresh server instance per request
- Authentication context captured in handler closures
- Server discarded after request completes
- No state between requests
Available Tools:
| Tool | Purpose |
|---|---|
| getPrompt | Fetch resolved prompt with dependencies |
| getPromptUnresolved | Fetch raw prompt without resolution |
| listPrompts | List prompts with filtering |
| createTextPrompt | Create text prompt version |
| createChatPrompt | Create chat prompt version |
| updatePromptLabels | Manage prompt labels |
Prompt Resolution:
- Resolved: Recursively replaces @@@langfusePrompt:...@@@ tags
- Unresolved: Returns raw content with tags intact
Sources: web/src/features/mcp/README.md:1-100
Component Communication Flow
graph TD
subgraph Pages["Page Layer"]
P[Page Component]
PC[PeekTableStateProvider]
end
subgraph Features["Feature Layer"]
FP[Feature Provider]
FH[Feature Hook]
FT[Feature Transformers]
end
subgraph Services["Service Layer"]
TRPC[tRPC Router]
API[API Routes]
end
subgraph Data["Data Layer"]
Repo[Repositories]
DB[(PostgreSQL)]
CH[(ClickHouse)]
Redis[(Redis)]
end
P --> PC
PC --> FP
FP --> FH
FH --> FT
FT --> TRPC
TRPC --> Repo
Repo --> DB
Repo --> CH
Repo --> Redis
Configuration Management
Langfuse uses environment-based configuration across layers:
| Variable | Scope | Purpose |
|---|---|---|
| REDIS_CONNECTION_STRING | Worker, Queue | Redis URL |
| CLICKHOUSE_URL | Analytics | ClickHouse connection |
| LANGFUSE_S3_EVENT_UPLOAD_BUCKET | Storage | S3 bucket name |
| ADMIN_API_KEY | Admin API | Authentication |
Security Architecture
Key Security Components:
- JWT Authentication: Organization and user context embedded in JWT tokens
- API Key Validation: Admin API uses dedicated key authentication
- Scope-based Authorization: Project-level access control
- Plan Entitlements: Feature availability based on subscription tier
Self-hosted Considerations:
- License key validation for enterprise features
- Environment-based plan override capability
Performance Optimizations
Frontend Optimizations:
- Memoized transformations in hooks
- Virtualized table rendering (TanStack Virtual)
- Iterative algorithms (no recursion, preventing stack overflow)
Backend Optimizations:
- Batch processing for event replay
- Checkpoint/resume for long-running operations
- Client-side rate limiting with exponential backoff
Summary
The Langfuse architecture demonstrates a well-structured approach to observability platforms:
- Separation of Concerns: Clear boundaries between UI, business logic, and data layers
- Scalability: Asynchronous processing via queues enables horizontal scaling
- Extensibility: Feature modules and MCP server support programmatic access
- Observability: Built-in OpenTelemetry integration for distributed tracing
- Performance: ClickHouse for analytics, iterative algorithms, and batch processing
Sources: packages/shared/src/db.ts:1-50
Monorepo Configuration
Related topics: Project Structure, System Architecture
Langfuse uses a monorepo architecture managed with Turborepo, pnpm workspaces, and shared configuration packages. This setup enables efficient builds, consistent code quality standards, and streamlined dependency management across the project's multiple packages.
Architecture Overview
The Langfuse repository is organized as a pnpm workspace monorepo with the following core structure:
langfuse/
├── web/                     # Next.js frontend application
├── worker/                  # Background job processing
├── packages/
│   ├── shared/              # Shared utilities and types
│   ├── config-eslint/       # Shared ESLint configuration
│   └── config-typescript/   # Shared TypeScript configurations
├── turbo.json               # Turborepo pipeline definition
└── pnpm-workspace.yaml      # Workspace package definitions
Turborepo Pipeline Configuration
The turbo.json file defines the build pipeline and task orchestration across packages.
Core Pipeline Tasks
| Task | Description | Cache Strategy |
|---|---|---|
| build | Compiles TypeScript and bundles assets | Enabled |
| dev | Starts development servers | Local only |
| test | Runs unit and integration tests | Enabled |
| lint | ESLint code quality checks | Enabled |
| typecheck | TypeScript type validation | Enabled |
Task Dependencies
Turborepo automatically resolves dependencies between packages. For example:
graph TD
A[packages/shared] -->|build| B[Type definitions]
B --> C[packages/config-*]
C --> D[web]
C --> E[worker]
D --> F[Build output]
E --> F
Shared ESLint Configuration
The packages/config-eslint/index.js provides standardized ESLint rules across all packages.
Configuration Features
- React/Next.js support for the web application
- TypeScript-aware linting via @typescript-eslint
- JSX accessibility checks
Usage
Packages extend the shared configuration:
// In package's .eslintrc.js
module.exports = {
  extends: ['@langfuse/config-eslint'],
  // Package-specific overrides
  rules: {
    // Custom rules
  },
};
Sources: packages/config-eslint/index.js
Shared TypeScript Configuration
The packages/config-typescript/ directory contains base TypeScript configurations.
Base Configuration (`base.json`)
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true
  }
}
Package-Specific Configurations
Individual packages extend the base configuration:
// packages/shared/tsconfig.json
{
  "extends": "@langfuse/config-typescript/base.json",
  "compilerOptions": {
    "outDir": "./dist",
    "rootDir": "./src"
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}
Sources: packages/config-typescript/base.json, packages/shared/tsconfig.json
Package Scripts and Commands
Each package defines its own scripts in package.json. The web package demonstrates the typical pattern:
Build Commands
| Command | Purpose |
|---|---|
| pnpm build | Production build with INLINE_RUNTIME_CHUNK=false |
| pnpm build:check | Build without emitting (type-checking) |
| pnpm dev | Development server on localhost:3000 |
| pnpm dev:https | HTTPS development server for local testing |
Quality Assurance Commands
| Command | Purpose |
|---|---|
| pnpm lint | ESLint with caching enabled |
| pnpm lint:fix | Auto-fix linting issues |
| pnpm typecheck | TypeScript validation with incremental compilation |
| pnpm test | Vitest with server and in-source test projects |
Sources: web/package.json
Development Workflow
Starting Development
# Install dependencies
pnpm install
# Start all dev servers based on turbo.json
pnpm dev
# Or start a specific package
cd web && pnpm dev
Running Tests
# All tests
pnpm test
# Client-side tests only (e.g., AdvancedJsonViewer)
pnpm --filter=web run test-client --testPathPattern="AdvancedJsonViewer"
# Server-side tests
pnpm --filter=web run test --project server
Build Pipeline
graph LR
A[pnpm build] --> B[turbo build]
B --> C{Cache hit?}
C -->|Yes| D[Use cached output]
C -->|No| E[Build dependencies]
E --> F[packages/shared]
F --> G[packages/config-*]
G --> H[web/worker]
H --> I[Save to cache]
I --> J[Build artifacts]
Environment Configuration
The project uses .env files for environment-specific configuration:
| File | Purpose |
|---|---|
| .env | Default environment variables |
| .env.local | Local overrides (git-ignored) |
| .env.test | Test environment variables |
Scripts use dotenv to load these files:
dotenv -e ../.env -- next build
dotenv -e ../.env.test -e ../.env -- vitest run
Sources: worker/src/scripts/replayIngestionEventsV2/README.md
Best Practices
Adding a New Package
- Create the package under packages/ or web/ directories
- Extend the shared TypeScript and ESLint configurations
- Add the package to pnpm-workspace.yaml if needed
- Define tasks in turbo.json if a custom pipeline is required
- Add appropriate scripts to package.json
Caching Strategy
- Build caching: Enabled by default via Turborepo
- Lint caching: ESLint caches to .next/cache/eslint/
- TypeScript incremental: Uses .tsbuildinfo files
CI/CD Integration
In CI environments, clear caches to ensure fresh builds:
# Clear turbo cache
rm -rf .turbo node_modules/.cache
# Clear ESLint cache
rm -rf .next/cache/eslint/
# Fresh build
pnpm install --frozen-lockfile
pnpm build
Key Configuration Files Summary
| File | Purpose |
|---|---|
turbo.json | Task pipeline and dependency graph |
pnpm-workspace.yaml | Workspace package definitions |
packages/config-eslint/index.js | Shared ESLint rules |
packages/config-typescript/base.json | Shared TypeScript base config |
.eslintrc.js (per package) | Package-specific lint overrides |
tsconfig.json (per package) | Package-specific TypeScript config |
Sources: packages/config-eslint/index.js
Database Schema (Prisma)
Related topics: System Architecture, ClickHouse Analytics Layer
Overview
Langfuse uses Prisma ORM to manage its PostgreSQL database schema. The schema defines the core data models for the application, including organizations, projects, users, traces, observations, and scores. Prisma serves as the primary interface between the application's business logic and the relational database.
The Prisma schema is located at packages/shared/prisma/schema.prisma and is shared across multiple packages in the monorepo structure. This centralized schema approach ensures consistency in data modeling across the web application, worker services, and shared libraries.
Design Philosophy
The schema follows several key principles:
- Normalized relationships: Related entities are linked through foreign keys with proper cascading behaviors
- Soft deletes: Key entities support soft deletion for data recovery and audit purposes
- Audit fields: Most tables include createdAt, updatedAt, and createdBy fields
- Multi-tenancy: The schema supports multi-tenant architecture with organization and project isolation
- Extensible metadata: JSON fields allow flexible storage of custom attributes
Core Data Models
Organization and User Models
The foundation of Langfuse's multi-tenant architecture begins with the Organization model, which represents the top-level tenant entity. Each organization can have multiple users with different roles and permission levels.
The User model stores authentication and profile information, linked to organizations through the Membership junction table. This many-to-many relationship enables users to belong to multiple organizations with potentially different roles in each.
erDiagram
Organization ||--o{ User : "contains"
Organization ||--o{ Project : "contains"
User ||--o{ Membership : "has"
Membership }o--|| Organization : "belongs to"
Membership }o--|| User : "belongs to"
Project Model
Projects serve as the primary container for observability data. Each project belongs to exactly one organization and contains all traces, observations, and scores related to a specific application or use case.
| Field | Type | Description |
|---|---|---|
| id | String | UUID primary key |
| name | String | Project display name |
| organizationId | String | Foreign key to organization |
| createdAt | DateTime | Creation timestamp |
| updatedAt | DateTime | Last modification timestamp |
| deletedAt | DateTime? | Soft delete timestamp |
| settings | Json | Project-specific configuration |
Sources: packages/shared/prisma/schema.prisma
Traces
Traces represent the top-level unit of observability in Langfuse. A trace encapsulates a complete interaction or request, typically corresponding to a single LLM call or a multi-step workflow.
Trace Model Schema
model Trace {
  id        String  @id @default(cuid())
  name      String?
  project   Project @relation(fields: [projectId], references: [id])
  projectId String
  user      String?
  metadata  Json?
  sessionId String?
  release   String?
  version   String?
  tags      String[]

  // Timestamps
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  // Soft delete
  deletedAt DateTime?

  // Relations
  observations Observation[]
  scores       Score[]

  @@index([projectId])
  @@index([sessionId])
  @@index([createdAt])
}
Sources: packages/shared/prisma/schema.prisma
Key Fields
| Field | Type | Description |
|---|---|---|
| id | String | Unique identifier using CUID algorithm |
| name | String? | Optional human-readable trace name |
| projectId | String | Reference to parent project |
| user | String? | Identifier for the end user |
| sessionId | String? | Groups related traces into sessions |
| release | String? | Application release version |
| version | String? | Trace format version |
| tags | String[] | Array of string tags for categorization |
Repository Pattern
Traces are accessed through the repository pattern defined in packages/shared/src/server/repositories/traces.ts. This abstraction provides a clean interface for CRUD operations while encapsulating query logic.
Sources: packages/shared/src/server/repositories/traces.ts
// Repository interface pattern (simplified)
interface ITraceRepository {
  create(data: CreateTraceInput): Promise<Trace>;
  getById(id: string, projectId: string): Promise<Trace | null>;
  list(projectId: string, options?: ListTracesOptions): Promise<Trace[]>;
  update(id: string, data: UpdateTraceInput): Promise<Trace>;
  softDelete(id: string): Promise<void>;
}
Observations
Observations represent the individual components within a trace, such as LLM calls, retrievals, or custom events. They form a hierarchical structure that can be nested to represent complex workflows.
Observation Model Schema
model Observation {
  id String @id @default(cuid())

  // Type discrimination
  type ObservationType

  // Relations
  trace    Trace         @relation(fields: [traceId], references: [id])
  traceId  String
  parent   Observation?  @relation("ObservationHierarchy", fields: [parentId], references: [id])
  parentId String?
  children Observation[] @relation("ObservationHierarchy")

  // Project reference for efficient querying
  projectId String

  // Core data
  name      String?
  startTime DateTime
  endTime   DateTime?
  status    String?
  metadata  Json?

  // LLM-specific fields
  model              String?
  modelId            String?
  provider           String?
  promptTokens       Int?
  completionTokens   Int?
  totalTokens        Int?
  unitPrice          Float?
  currency           String?
  calculatedUnitCost Float?

  // Retrieval-specific fields
  input  Json?
  output Json?

  // Timestamps
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt

  // Soft delete
  deletedAt DateTime?

  @@index([traceId])
  @@index([projectId])
  @@index([startTime])
}
Sources: packages/shared/prisma/schema.prisma Sources: packages/shared/src/server/repositories/observations.ts
Observation Types
Langfuse supports several observation types through an enum:
| Type | Description |
|---|---|
CHAT | Chat completion calls |
GENERATION | Text generation calls |
RETRIEVAL | Retrieval augmented generation steps |
EVENT | Custom events and markers |
TOOL | Tool/function calls |
Hierarchical Structure
Observations support nested hierarchies through self-referential relationships. This enables representing complex multi-step workflows where parent observations contain child observations representing sub-tasks or parallel operations.
graph TD
A[Trace] --> B[Observation: Chat]
B --> C[Observation: Retrieval]
B --> D[Observation: Generation]
C --> E[Observation: Event: Cache Hit]
D --> F[Observation: Tool: Calculator]
D --> G[Observation: Tool: Search]
Sources: packages/shared/prisma/schema.prisma
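To make the hierarchy concrete, the sketch below rebuilds a nested tree from flat observation rows. The row shape mirrors the Prisma model above, but the helper itself is illustrative and not part of the Langfuse codebase.
// Minimal row shape matching the self-referential Observation model above
interface ObservationRow {
  id: string;
  parentId: string | null;
  name?: string;
}
interface ObservationNode extends ObservationRow {
  children: ObservationNode[];
}
// Rebuild the nested hierarchy from a flat query result in O(n)
function buildObservationTree(rows: ObservationRow[]): ObservationNode[] {
  const byId = new Map<string, ObservationNode>(
    rows.map((r): [string, ObservationNode] => [r.id, { ...r, children: [] }]),
  );
  const roots: ObservationNode[] = [];
  for (const node of byId.values()) {
    const parent = node.parentId ? byId.get(node.parentId) : undefined;
    if (parent) parent.children.push(node); // nest under parent observation
    else roots.push(node); // top-level observations attach directly to the trace
  }
  return roots;
}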
Scores
Scores provide a mechanism for evaluating trace and observation quality. They can be human-generated or automated evaluations attached to specific traces or observations.
Score Model Schema
model Score {
id String @id @default(cuid())
// Target discrimination
traceId String?
observationId String?
// Project reference
projectId String
// Score data
name String
value Float
dataType ScoreDataType
comment String?
// Source tracking
source String?
// Author
authorId String?
// Timestamps
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
// Relations
trace Trace? @relation(fields: [traceId], references: [id])
observation Observation? @relation(fields: [observationId], references: [id])
@@index([projectId])
@@index([traceId])
@@index([observationId])
@@index([name, createdAt])
}
Sources: packages/shared/prisma/schema.prisma, packages/shared/src/server/repositories/scores.ts
Score Data Types
The ScoreDataType enum defines the type of value stored:
| Data Type | Description |
|---|---|
| NUMERIC | Continuous numerical value |
| CATEGORICAL | Categorical label or classification |
| BOOLEAN | True/false indicator |
Score Interfaces Architecture
Scores in Langfuse follow a layered interface architecture that separates concerns across different parts of the system:
graph LR
A[UI Types] --> B[Application Validation]
B --> C[Ingestion Validation]
C --> D[API v1 Schemas]
C --> E[API v2 Schemas]
D --> F[Database Models]
E --> F
Sources: packages/shared/src/features/scores/interfaces/README.md
Indexing Strategy
The schema defines strategic indexes to optimize common query patterns:
| Table | Indexes | Purpose |
|---|---|---|
| Trace | projectId; sessionId; createdAt | Fast project filtering and time-based queries |
| Observation | traceId; projectId; startTime | Trace traversal and time-series queries |
| Score | projectId; traceId; observationId; (name, createdAt) | Score lookups and time-series analytics |
The composite index on Score(name, createdAt) specifically supports the score analytics feature's need to retrieve scores by name over time intervals.
Sources: packages/shared/prisma/schema.prisma
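For example, the lookup pattern this composite index serves, scores of a given name over a time window, might look like the following with the generated Prisma Client (a hedged sketch; the actual Langfuse query may differ):
// Hypothetical query that the Score(name, createdAt) index accelerates:
// all "accuracy" scores for a project over the last seven days.
// `prisma` and `projectId` are assumed to be in scope, as in the snippets below.
const since = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000);
const scores = await prisma.score.findMany({
  where: {
    projectId,
    name: "accuracy",
    createdAt: { gte: since },
  },
  orderBy: { createdAt: "asc" },
});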
Prisma Client Usage
Prisma Client is generated from the schema and used throughout the application. The generated client provides type-safe access to all database operations.
Client Configuration
import { PrismaClient } from "@langfuse/shared/prisma";
const prisma = new PrismaClient({
log: process.env.NODE_ENV === "development" ? ["query", "error"] : ["error"],
});
Transaction Support
The schema supports atomic operations through Prisma's transaction API:
await prisma.$transaction([
prisma.trace.create({ data: traceData }),
prisma.observation.createMany({ data: observations }),
prisma.score.createMany({ data: scores }),
]);
Migrations
Database migrations are managed through Prisma Migrate. Migration files are stored in packages/shared/prisma/migrations/ and version-controlled alongside the schema.
Running Migrations
# Apply pending migrations
pnpm --filter=langfuse-prisma migrate deploy
# Create a new migration
pnpm --filter=langfuse-prisma migrate dev --name add_new_field
Related Components
Repository Layer
The repository pattern abstracts database access behind domain-specific interfaces:
| Repository | File | Purpose |
|---|---|---|
| TraceRepository | repositories/traces.ts | Trace CRUD and querying |
| ObservationRepository | repositories/observations.ts | Observation management |
| ScoreRepository | repositories/scores.ts | Score operations |
Sources: packages/shared/src/server/repositories/traces.ts, packages/shared/src/server/repositories/scores.ts
ClickHouse Integration
While PostgreSQL (via Prisma) stores transactional data like traces and scores, Langfuse also uses ClickHouse for analytics workloads. The Prisma schema defines PostgreSQL models for the primary application data, while ClickHouse handles high-volume analytical queries.
Sources: packages/shared/scripts/seeder/utils/README.md
Summary
The Prisma schema forms the backbone of Langfuse's data layer, defining:
- Multi-tenant structure: Organizations, projects, and user memberships
- Observability core: Traces and observations with hierarchical support
- Evaluation framework: Scores with multiple data types and sources
- Operational metadata: Timestamps, soft deletes, and JSON fields for flexibility
The schema design prioritizes query performance through strategic indexing, data integrity through proper relationships, and extensibility through JSON metadata fields.
Sources: packages/shared/prisma/schema.prisma
ClickHouse Analytics Layer
Related topics: Database Schema (Prisma), Worker Service
Overview
The ClickHouse Analytics Layer is a core infrastructure component in Langfuse that provides high-performance analytical capabilities for processing and querying large-scale observability data. ClickHouse serves as the primary OLAP (Online Analytical Processing) database for storing traces, observations, and score analytics with optimized columnar storage and efficient aggregation queries.
Langfuse leverages ClickHouse for:
- High-throughput event ingestion during trace collection
- Complex analytical queries for score comparisons and distributions
- Time-series analysis with efficient aggregation
- Large dataset sampling and optimization strategies
Sources: packages/shared/scripts/seeder/utils/README.md
Architecture Overview
graph TD
subgraph Ingestion["Ingestion Layer"]
W[Worker Service] --> CW[ClickhouseWriter]
CW --> CH[ClickHouse Cluster]
end
subgraph Storage["Storage Layer"]
CH --> TS[Traces Table]
CH --> OS[Observations Table]
CH --> SS[Scores Table]
end
subgraph Query["Query Layer"]
CR[ClickHouse Repository] --> CH
SA[Score Analytics] --> CR
WEB[Web Frontend] --> SA
end
subgraph Optimization["Optimization Layer"]
CR --> HASH[cityHash64 Sampling]
CR --> FINAL[Adaptive FINAL]
CR --> INTERVAL[Time Interval Alignment]
end
Components Overview
| Component | Location | Purpose |
|---|---|---|
| ClickhouseWriter | worker/src/services/ClickhouseWriter/index.ts | Writes ingestion events to ClickHouse |
| ClickHouse Repository | packages/shared/src/server/repositories/clickhouse.ts | Provides query interface and optimization |
| Score Analytics | packages/shared/src/server/repositories/score-analytics.ts | Specialized analytics queries |
| Schema Definitions | packages/shared/src/server/clickhouse/schema.ts | TypeScript types for ClickHouse data |
| Migrations | packages/shared/clickhouse/migrations/clustered/ | Database schema migrations |
Sources: worker/src/services/ClickhouseWriter/index.ts, packages/shared/src/server/repositories/clickhouse.ts
Data Schema
Traces Table
The traces table stores the fundamental trace records with hierarchical observation data. The clustered migration defines the primary schema with optimized column types for analytical queries.
Key columns include:
| Column | Type | Description |
|---|---|---|
| id | UUID | Unique trace identifier |
| project_id | String | Project association |
| timestamp | DateTime64 | Event timestamp with millisecond precision |
| name | String | Trace name |
| user_id | String | User identifier |
| metadata | JSON | Flexible metadata storage |
| tags | Array(String) | Tag-based categorization |
| input | Text | Input data |
| output | Text | Output data |
Sources: packages/shared/clickhouse/migrations/clustered/0001_traces.up.sql
Observations Table
Observations represent individual events within a trace, storing:
- Model inputs/outputs
- Function calls
- Embeddings
- Generation events
Each observation is linked to its parent trace via trace_id and supports nested hierarchies through parent_observation_id.
Sources: packages/shared/src/server/clickhouse/schema.ts
Scores Table
Scores store evaluation metrics associated with traces and observations:
| Column | Type | Purpose |
|---|---|---|
| trace_id | UUID | Associated trace |
| observation_id | UUID | Optional observation link |
| name | String | Score identifier |
| value | Float64 | Numeric score value |
| data_type | Enum | NUMERIC, BOOLEAN, or CATEGORICAL |
| source | String | Score origin (e.g., "framework-trace") |
Sources: packages/shared/src/server/repositories/score-analytics.ts
Ingestion Pipeline
Event Flow
sequenceDiagram
participant API as Ingestion API
participant Queue as Redis Queue
participant Worker as Worker Service
participant Writer as ClickhouseWriter
participant CH as ClickHouse
API->>Queue: Enqueue OtelIngestionEvent
Worker->>Queue: Dequeue Event
Worker->>Writer: Process Event
Writer->>CH: Insert Batch (ClickHouseQueryBuilder)
CH-->>Writer: Confirmation
Writer->>Worker: Acknowledge
ClickhouseWriter Service
The ClickhouseWriter handles the actual data insertion into ClickHouse:
// Simplified flow from worker/src/services/ClickhouseWriter/index.ts
class ClickhouseWriter {
async writeBatch(events: IngestionEvent[]): Promise<void> {
const queryBuilder = new ClickHouseQueryBuilder();
for (const event of events) {
queryBuilder.addEvent(event);
}
await this.executeQuery(queryBuilder.build());
}
}
Key responsibilities:
- Batch Processing: Aggregates multiple events for efficient insertion
- Schema Validation: Ensures events match expected schema
- Query Building: Uses ClickHouseQueryBuilder for optimized INSERT queries
- Error Recovery: Handles failed insertions with retry logic
Sources: worker/src/services/ClickhouseWriter/index.ts
ClickHouseQueryBuilder
The ClickHouseQueryBuilder class constructs optimized ClickHouse SQL queries with:
- Proper escaping for special characters
- Type-aware value formatting
- Batch insert optimization
- Efficient column mapping
Sources: packages/shared/scripts/seeder/utils/README.md
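For orientation, a batched insert with the official @clickhouse/client package looks roughly like the sketch below. The column names follow the traces table above; the snippet is illustrative and not the ClickhouseWriter implementation.
import { createClient } from "@clickhouse/client";

const clickhouse = createClient({
  url: process.env.CLICKHOUSE_URL, // e.g. http://localhost:8123
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASSWORD,
});

// Insert a batch of rows in a single round-trip; JSONEachRow lets the client
// serialize plain objects without hand-built INSERT statements.
await clickhouse.insert({
  table: "traces",
  values: [
    { id: "trace-1", project_id: "project-123", timestamp: new Date().toISOString(), name: "demo" },
  ],
  format: "JSONEachRow",
});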
Query Layer
Repository Pattern
The clickhouse.ts repository provides a clean interface for all ClickHouse operations:
// packages/shared/src/server/repositories/clickhouse.ts
class ClickHouseRepository {
// Query execution with automatic connection management
async query<T>(sql: string, params?: QueryParams): Promise<T[]>
// Stream processing for large result sets
async streamQuery<T>(sql: string, handler: (row: T) => void): Promise<void>
// Batch inserts with transaction support
async insertBatch(table: string, rows: Record<string, unknown>[]): Promise<void>
}
Sources: packages/shared/src/server/repositories/clickhouse.ts
Score Analytics Queries
The score analytics module provides specialized queries for evaluating model performance:
// packages/shared/src/server/repositories/score-analytics.ts
interface ScoreAnalyticsQuery {
getScoreIdentifiers(projectId: string): Promise<ScoreIdentifier[]>;
estimateScoreComparisonSize(
projectId: string,
score1Id: string,
score2Id?: string
): Promise<QueryEstimate>;
getScoreComparisonAnalytics(
params: ScoreAnalyticsParams
): Promise<ScoreAnalyticsResult>;
}
Query Estimation
Before executing expensive analytics queries, the system estimates query size:
| Metric | Description |
|---|---|
| scoreCount | Total number of scores matching criteria |
| matchedCount | Estimated rows that will match |
| willSample | Whether hash-based sampling is needed |
| estimatedQueryTime | Predicted query duration |
This estimation enables adaptive query optimization based on dataset size.
Sources: packages/shared/src/server/repositories/score-analytics.ts, web/src/features/score-analytics/README.md
Optimization Strategies
Hash-Based Sampling
For large datasets (>100,000 matches), Langfuse uses cityHash64 for consistent sampling:
SELECT * FROM scores
WHERE cityHash64(trace_id) % 10 = 0 -- deterministic 10% sample
Benefits:
- Consistent sampling across query executions
- Reproducible results for the same query parameters
- Reduced query load while maintaining statistical validity
Adaptive FINAL Optimization
ClickHouse's FINAL modifier ensures up-to-date data but adds significant overhead. Langfuse uses adaptive application:
| Dataset Size | FINAL Applied |
|---|---|
| ≤ 70,000 scores | Yes |
| > 70,000 scores | No |
Sources: web/src/features/score-analytics/README.md
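Taken together, the estimation step and these two optimizations amount to a single decision when building the query. The sketch below mirrors the thresholds from the tables above; the function and field names are illustrative, not Langfuse's actual API.
interface QueryEstimate {
  matchedCount: number; // estimated matching rows, from the estimation step above
}

function buildScoreAnalyticsQuery(estimate: QueryEstimate): string {
  // Skip the costly FINAL merge pass once the dataset grows large
  const final = estimate.matchedCount <= 70_000 ? "FINAL" : "";
  // Deterministic ~10% sample for very large result sets: cityHash64 maps each
  // trace_id to a stable bucket, so repeated runs sample the same rows
  const sample =
    estimate.matchedCount > 100_000 ? "AND cityHash64(trace_id) % 10 = 0" : "";
  return `
    SELECT name, avg(value) AS avg_value
    FROM scores ${final}
    WHERE project_id = {projectId: String} ${sample}
    GROUP BY name
  `;
}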
Time Interval Alignment
Time series queries use proper interval alignment for accurate aggregation:
// ISO 8601 weeks
const weekInterval = "1W";
// Calendar months
const monthInterval = "1MONTH";
Proper alignment ensures:
- Consistent bucket boundaries
- Accurate period-over-period comparisons
- Correct aggregation across daylight saving time transitions
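As an illustration, interval-aligned bucketing in ClickHouse can be expressed with the toStartOfWeek and toStartOfMonth functions; the mapping below is a hedged sketch rather than the exact Langfuse implementation.
// Hypothetical helper: map an interval token to a ClickHouse bucket expression.
// The tokens follow the examples above; the mapping is illustrative only.
function bucketExpression(column: string, interval: "1W" | "1MONTH"): string {
  switch (interval) {
    case "1W":
      // Mode 1 aligns buckets to ISO 8601 weeks (Monday start)
      return `toStartOfWeek(${column}, 1)`;
    case "1MONTH":
      // Calendar-month buckets, regardless of month length
      return `toStartOfMonth(${column})`;
  }
}
// Usage: SELECT ${bucketExpression("timestamp", "1W")} AS bucket, count() ... GROUP BY bucket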
Seeding and Testing
The seeder utility (packages/shared/scripts/seeder/) generates realistic test data for ClickHouse:
Data Types
| Type | Environment | Purpose |
|---|---|---|
| Experiment Traces | langfuse-prompt-experiment | Realistic traces from actual datasets |
| Evaluation Data | langfuse-evaluation | Metrics and scoring for evaluations |
| Synthetic Data | default | Large-scale hierarchical test data |
ID Patterns
- Experiment: trace-dataset-{datasetName}-{itemIndex}-{projectId}-{runNumber}
- Evaluation: trace-eval-{index}-{projectId}
- Synthetic: trace-synthetic-{index}-{projectId}
Sources: packages/shared/scripts/seeder/utils/README.md
DataGenerator
The DataGenerator class creates realistic data for all three types:
| Method | Output |
|---|---|
| generateDatasetTrace() | Traces linked to dataset items |
| generateSyntheticTraces() | Hierarchical traces with scores |
| generateEvaluationTraces() | Evaluation-focused traces |
Sources: packages/shared/scripts/seeder/utils/README.md
Framework Traces
Framework traces are real traces produced through official Langfuse framework instrumentation. They can be added to the system for UI testing and demo purposes.
Adding New Framework Traces
- Generate a trace using framework instrumentation
- Download it from the UI using the download button
- Convert it to JSON format via "Log View (Beta)"
- Merge observations using the provided script:
npx ts-node merge-observations.ts trace-file.json observations.json trace-merged.json
- Save the merged file with date-based naming
Discovery
Framework traces use the ID pattern framework-frameworkName-traceId. Filter by:
source: "framework-trace"in trace table- "All Time" date range (timestamps not rewritten)
Sources: packages/shared/scripts/seeder/utils/framework-traces/README.md
TypeScript Integration
Schema Types
The schema.ts file provides TypeScript type definitions:
// packages/shared/src/server/clickhouse/schema.ts
interface ClickHouseTrace {
id: string;
project_id: string;
timestamp: Date;
name: string;
user_id?: string;
metadata?: Record<string, unknown>;
tags?: string[];
input?: string;
output?: string;
session_id?: string;
}
interface ClickHouseObservation {
id: string;
trace_id: string;
parent_observation_id?: string;
type: ObservationType;
timestamp: Date;
name?: string;
// ... additional fields
}
These types ensure compile-time safety when interacting with ClickHouse data.
Sources: packages/shared/src/server/clickhouse/schema.ts
Configuration
Required Environment Variables
| Variable | Description | Example |
|---|---|---|
| CLICKHOUSE_URL | ClickHouse server URL | http://localhost:8123 |
| CLICKHOUSE_USER | Database user | clickhouse |
| CLICKHOUSE_PASSWORD | User password | clickhouse |
| CLICKHOUSE_DATABASE | Target database | default |
Cluster Configuration
Migrations support clustered deployments:
# Clustered migration path
packages/shared/clickhouse/migrations/clustered/
The clustered migrations ensure schema consistency across all nodes in a ClickHouse cluster.
Sources: packages/shared/clickhouse/migrations/clustered/0001_traces.up.sql
Best Practices
Query Optimization
- Use projections for frequently accessed columns
- Leverage data-skipping indexes for high-cardinality columns
- Batch inserts to reduce overhead
- Filter early to minimize data processed
Data Management
- Partition by date for efficient time-range queries
- Use TTL policies for automatic data expiration
- Compress data using ClickHouse's native compression
Integration Guidelines
- Always use the repository pattern for query abstraction
- Implement query estimation before expensive operations
- Use hash-based sampling for large analytical queries
- Consider adaptive FINAL optimization for query performance
Sources: packages/shared/scripts/seeder/utils/README.md
Queue System (Redis/BullMQ)
Related topics: System Architecture, Worker Service
Langfuse employs a distributed queue system built on Redis for storage and BullMQ for job orchestration. This architecture enables asynchronous processing of high-volume operations including event ingestion, evaluation execution, batch actions, and webhook delivery.
Architecture Overview
The queue system follows a producer-consumer pattern where the web application enqueues jobs and worker processes consume them asynchronously.
graph TD
subgraph "Langfuse Web Application"
A[API Request] --> B[Queue Client]
B --> C[Redis Queue]
end
subgraph "Langfuse Worker"
C --> D[Worker Manager]
D --> E1[Ingestion Worker]
D --> E2[Eval Worker]
D --> E3[Batch Worker]
D --> E4[Webhook Worker]
end
subgraph "Redis"
C --> F[(Redis Cluster)]
end
subgraph "External Services"
E1 --> G[(ClickHouse)]
E1 --> H[(PostgreSQL)]
E2 --> H
E3 --> H
E4 --> I[(External APIs)]
end
Queue Types
Langfuse defines multiple specialized queues for different workloads:
| Queue Name | Purpose | Processing Type | Priority |
|---|---|---|---|
| ingestion | Event ingestion and processing | Async batch | Medium |
| evalExecution | LLM evaluation execution | Async | Medium |
| batchAction | Bulk operations on data | Async batch | Low |
| webhook | Outbound webhook delivery | Async | High |
| OtelIngestion | OpenTelemetry event ingestion | Async | Medium |
| IngestionSecondary | Secondary ingestion processing | Async | Medium |
Sources: worker/src/queues/workerManager.ts
Queue Configuration
Redis Connection
All queues rely on a Redis connection string configured via environment variables:
REDIS_CONNECTION_STRING=redis://:<password>@<host>:6379
Queue Initialization
Each queue is initialized with specific BullMQ configuration:
const myQueue = new Queue(queueName, {
connection: {
host: redisConfig.host,
port: redisConfig.port,
password: redisConfig.password,
},
defaultJobOptions: {
attempts: 3,
backoff: {
type: "exponential",
delay: 1000,
},
removeOnComplete: true,
removeOnFail: false,
},
});
Sources: packages/shared/src/server/redis/ingestionQueue.ts
Queue Implementations
Ingestion Queue
The ingestion queue handles event processing from SDK clients and the OpenTelemetry protocol.
graph LR
A[SDK Events] --> B[API Endpoint]
B --> C[ingestionQueue]
C --> D[Validate Events]
D --> E[Parse & Transform]
E --> F[(ClickHouse)]
E --> G[(PostgreSQL)]
Key Features:
- Batch processing with configurable batch size
- Retry with exponential backoff
- Event validation against schema
- S3 file-based storage for large payloads
Sources: packages/shared/src/server/redis/ingestionQueue.ts
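Producing a job for this queue is a standard BullMQ add call. The sketch below uses the payload shape from the event-transformation examples later on this page; the job name and connection details are illustrative.
import { Queue } from "bullmq";

const ingestionQueue = new Queue("ingestion", {
  connection: { host: "localhost", port: 6379 }, // illustrative connection
});

// Enqueue one ingestion event; the payload mirrors the "Standard format"
// shown under Event Transformation below.
await ingestionQueue.add("ingest-event", {
  authCheck: { validKey: true, scope: { projectId: "project-123" } },
  data: { eventBodyId: "body-1", fileKey: "event-1", type: "trace-create" },
});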
Evaluation Execution Queue
Handles asynchronous execution of LLM-based evaluations:
graph TD
A[Create Eval Job] --> B[evalExecutionQueue]
B --> C[Worker Pickup]
C --> D[Fetch Traces]
D --> E[Run LLM Evaluation]
E --> F[Store Results]
F --> G[(PostgreSQL)]
Job Options:
{
attempts: 3,
backoff: {
type: "exponential",
delay: 2000,
},
removeOnComplete: 100, // Keep last 100 completed
removeOnFail: 1000, // Keep last 1000 failed
}
Sources: packages/shared/src/server/redis/evalExecutionQueue.ts
Batch Action Queue
Processes bulk operations such as batch updates and deletions:
| Parameter | Default | Description |
|---|---|---|
| batchSize | 100 | Items per batch |
| concurrency | 5 | Parallel workers |
| attempts | 3 | Retry count |
| backoffDelay | 1000 | Initial backoff in ms |
Sources: packages/shared/src/server/redis/batchActionQueue.ts
Webhook Queue
Manages outbound webhook deliveries with priority handling:
graph TD
A[Trigger Event] --> B[webhookQueue]
B --> C{Has Retry Config?}
C -->|Yes| D[Schedule Retry]
C -->|No| E[Immediate Delivery]
D --> E
E --> F{HTTP Response}
F -->|2xx| G[Log Success]
F -->|4xx| H[Log Failure]
F -->|5xx| D
Sources: packages/shared/src/server/redis/webhookQueue.ts
Worker Manager
The WorkerManager orchestrates all queue workers within the worker process:
export class WorkerManager {
private workers: Map<string, Worker>;
async initialize(): Promise<void> {
// Initialize all queue workers
}
async gracefulShutdown(): Promise<void> {
// Gracefully close all workers
}
}
Sources: worker/src/queues/workerManager.ts
Worker Lifecycle
graph TD
A[Start Worker Process] --> B[Load Configuration]
B --> C[Initialize Redis Connection]
C --> D[Create Queue Instances]
D --> E[Create Worker Instances]
E --> F[Register Event Handlers]
F --> G[Workers Ready]
H[Shutdown Signal] --> I[Close Workers]
I --> J[Process Pending Jobs]
J --> K[Close Redis Connection]
K --> L[Exit]
Event Handling
Workers register handlers for job lifecycle events:
| Event | Handler Purpose |
|---|---|
| completed | Log successful job completion |
| failed | Handle job failures and retries |
| progress | Track job progress updates |
| stalled | Detect and requeue stalled jobs |
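With BullMQ these handlers attach directly to the Worker instance. A minimal sketch of the registration follows; the queue name and logging are illustrative.
import { Worker } from "bullmq";

const worker = new Worker(
  "ingestion",
  async (job) => {
    // ... process job.data here
  },
  { connection: { host: "localhost", port: 6379 } },
);

// Lifecycle handlers corresponding to the table above
worker.on("completed", (job) => console.log(`job ${job.id} completed`));
worker.on("failed", (job, err) => console.error(`job ${job?.id} failed: ${err.message}`));
worker.on("progress", (job, progress) => console.log(`job ${job.id} progress`, progress));
worker.on("stalled", (jobId) => console.warn(`job ${jobId} stalled and will be requeued`));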
Job Data Flow
Standard Event Ingestion
Events flow through the system as follows:
sequenceDiagram
participant SDK
participant API
participant Redis
participant Worker
participant DB
SDK->>API: POST /api/public/ingestion
API->>Redis: Add to IngestionQueue
API-->>SDK: 202 Accepted
Worker->>Redis: Dequeue Job
Worker->>DB: Validate & Store
Worker->>Redis: Job Complete
Event Transformation
The ingestion endpoint transforms S3 keys into queue payloads:
Standard format:
{
"authCheck": {
"validKey": true,
"scope": { "projectId": "<projectId>" }
},
"data": {
"eventBodyId": "<eventBodyId>",
"fileKey": "<eventId>",
"type": "<type>-create"
}
}
OTEL format:
{
"authCheck": {
"validKey": true,
"scope": { "projectId": "<projectId>", "accessLevel": "project" }
},
"data": {
"fileKey": "otel/<projectId>/<yyyy>/<mm>/<dd>/<hh>/<mm>/<eventId>.json"
}
}
Sources: worker/src/scripts/replayIngestionEventsV2/README.md
Error Handling and Retries
Retry Strategy
All queues implement exponential backoff retry:
const jobOptions = {
attempts: 3,
backoff: {
type: "exponential",
delay: 1000, // 1s, 2s, 4s
},
};
Error Classification
| HTTP Status | Behavior | Retry |
|---|---|---|
| 2xx | Success | No |
| 429 | Rate limited | Yes (with backoff) |
| 5xx | Server error | Yes (up to 3 times) |
| 4xx (not 429) | Client error | No (logged and skipped) |
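This policy is easy to encode as a small classifier in the webhook worker. The sketch below restates the table; the names are illustrative rather than Langfuse's actual code.
type RetryDecision = "success" | "retry" | "drop";

// Encodes the status-code policy from the table above
function classifyWebhookResponse(status: number): RetryDecision {
  if (status >= 200 && status < 300) return "success"; // delivered, no retry
  if (status === 429 || status >= 500) return "retry"; // backoff, then retry
  return "drop"; // other 4xx: log and skip, retrying cannot help
}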
Monitoring and Debugging
Progress Tracking
The replay scripts provide progress updates:
[1200/45000] 2.7% → 498 queued, 2 skipped
Checkpoint System
Scripts write checkpoints to enable resume after failures:
# Checkpoint file location
./worker/.checkpoint
# Resume from checkpoint
pnpm run --filter=worker replay-ingestion --resume
Error Logging
Failed jobs are logged to errors.csv for manual inspection:
"operation","key","error"
"REST.PUT.OBJECT","projectId/type/eventBodyId/eventId.json","Connection timeout"
Admin API for Queue Management
`POST /api/admin/ingestion-replay`
Enqueues batches of S3 keys for reprocessing:
Request:
{
"keys": [
"projectId/trace/eventBodyId/eventId.json",
"otel/projectId/2025/07/09/14/30/some-uuid.json"
]
}
Response:
{
"queued": 498,
"skipped": 2,
"errors": []
}
Authentication
Requires Authorization: Bearer {ADMIN_API_KEY} header validated by AdminApiAuthService.
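A hedged example of calling the endpoint from a script: the path, payload, and header format come from this page, while the host variable is a placeholder.
// LANGFUSE_HOST is a placeholder for your deployment URL
const res = await fetch(`${process.env.LANGFUSE_HOST}/api/admin/ingestion-replay`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.ADMIN_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    keys: ["projectId/trace/eventBodyId/eventId.json"],
  }),
});
const { queued, skipped, errors } = await res.json();
console.log(`queued=${queued} skipped=${skipped} errors=${errors.length}`);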
Environment Variables
| Variable | Description | Required |
|---|---|---|
| REDIS_CONNECTION_STRING | Redis connection URL | Yes |
| LANGFUSE_S3_EVENT_UPLOAD_BUCKET | S3 bucket for event storage | Yes |
| CLICKHOUSE_URL | ClickHouse connection URL | Yes |
| CLICKHOUSE_USER | ClickHouse username | Yes |
| CLICKHOUSE_PASSWORD | ClickHouse password | Yes |
| ADMIN_API_KEY | Admin API authentication key | Yes (admin endpoints) |
Utility Scripts
Replay Ingestion Events V2
A streamlined replacement for v1 with improved features:
| Feature | v1 | v2 |
|---|---|---|
| Infrastructure | Redis, ClickHouse, PostgreSQL, S3 | Langfuse host URL only |
| Setup | Full repo clone | npx tsx + env vars |
| Event delivery | Direct BullMQ addBulk | HTTP POST to admin API |
| Resume support | Manual | Built-in checkpoint |
| Rate limiting | None | Client + server side |
Refill Queue Event
Backfills queues with events from local machines:
# 1. Create events file
echo '{"projectId": "project-123", "orgId": "org-456"}' > ./worker/events.jsonl
# 2. Configure environment
# Create .env with REDIS_CONNECTION_STRING and supporting services
# 3. Run the script
pnpm run --filter=worker refill-queue-event
Best Practices
- Connection Pooling: Reuse Redis connections across queue operations
- Graceful Shutdown: Always drain active jobs before stopping workers
- Monitoring: Track queue depth and processing times
- Error Boundaries: Isolate queue failures to prevent cascade
- Backoff Tuning: Adjust retry delays based on workload characteristics
Sources: worker/src/queues/workerManager.ts
Worker Service
Related topics: System Architecture, Queue System (Redis/BullMQ)
Overview
The Worker Service is a core backend component in Langfuse responsible for asynchronous processing of long-running tasks. It operates as a separate Node.js process that communicates with the main Langfuse server through message queues, primarily using BullMQ backed by Redis.
graph TB
subgraph "Langfuse Server"
A[API Endpoints]
B[tRPC Routers]
end
subgraph "Redis"
C[(Ingestion Queues)]
D[(Evaluation Queues)]
E[(Batch Action Queues)]
end
subgraph "Worker Service"
F[Worker Manager]
G[Evaluation Service]
H[Batch Action Handler]
I[Queue Processors]
end
A -->|"Enqueue Jobs"| C
B -->|"Dispatch Tasks"| C
F -->|"Process"| C
G -->|"Execute"| D
H -->|"Execute"| E
C --> F
D --> G
E --> H
Purpose and Scope
The Worker Service handles the following categories of work:
- Event Ingestion Processing: Processing and persisting trace events, observations, and spans from the ingestion queues
- Evaluation Execution: Running LLM-based evaluations on traces and observations
- Batch Actions: Executing bulk operations on datasets, traces, and other resources
- Queue Replay: Replaying historical ingestion events for data recovery or reprocessing
Sources: worker/src/app.ts
API Layer
Related topics: System Architecture
The API Layer is the central communication bridge between the Langfuse frontend and backend services. Built on tRPC (TypeScript RPC), it provides end-to-end type safety, enabling the web application to interact with server-side logic through strongly-typed procedure calls.
Overview
The API Layer serves multiple critical functions:
- Type-Safe Communication: All API calls are fully typed from server to client
- Authentication & Authorization: Every procedure is wrapped with auth middleware
- Business Logic Isolation: Procedures delegate to repository layer for data access
- Input Validation: Zod schemas validate all incoming requests
- Feature Organization: Procedures are grouped by domain (traces, observations, scores)
graph TD
subgraph Frontend
UI[React Components]
end
subgraph API Layer
TRPC[tRPC Client]
Procedures[tRPC Procedures]
Middleware[Auth Middleware]
end
subgraph Backend
Repositories[Repositories]
Database[(Database)]
end
UI --> TRPC
TRPC --> Procedures
Procedures --> Middleware
Middleware --> Repositories
Repositories --> Database
Architecture
Core Components
| Component | File | Purpose |
|---|---|---|
| tRPC Instance | trpc.ts | Initialize tRPC with middleware and context |
| Root Router | root.ts | Register all feature routers |
| Trace Router | routers/traces.ts | Trace CRUD and query operations |
| Observation Router | routers/observations.ts | Span/generation/event operations |
| Score Router | routers/scores.ts | Score management and analytics |
| Score Analytics | features/score-analytics/server/scoreAnalyticsRouter.ts | Score aggregation and statistics |
Router Registration Flow
The root router aggregates all feature routers under a namespace:
// Simplified from web/src/server/api/root.ts
export const rootRouter = createTRPCRouter({
trace: traceRouter,
observation: observationRouter,
score: scoreRouter,
scoreAnalytics: scoreAnalyticsRouter,
// ... other routers
});
Sources: web/src/server/api/root.ts:1-50
tRPC Configuration
Initialization
The tRPC instance is initialized in trpc.ts with:
- Context Creation: Builds request-scoped context with authentication
- Middleware Chain: Applies auth, rate limiting, and logging
- Error Handling: Transforms errors into HTTP-compatible responses
// From web/src/server/api/trpc.ts
export const createTRPCContext = async (opts: CreateNextContextOptions) => {
return {
session: await getServerSession(authOptions),
// ... additional context
};
};
const t = initTRPC.context<typeof createTRPCContext>().create();
Sources: web/src/server/api/trpc.ts:1-30
Middleware Stack
| Middleware | Purpose |
|---|---|
| isAuthed | Validates user session and project access |
| isProjectMember | Ensures user belongs to the project scope |
| isOwnerOrMember | Allows owner or member roles |
| rateLimit | Prevents abuse with configurable limits |
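Conceptually, each of these is composed onto a base procedure with t.middleware. A simplified sketch of the isAuthed layer follows; the actual Langfuse middleware carries more context.
import { TRPCError } from "@trpc/server";

// `t` is the initTRPC instance created in trpc.ts (see above)
const isAuthed = t.middleware(async ({ ctx, next }) => {
  if (!ctx.session?.user) {
    throw new TRPCError({ code: "UNAUTHORIZED" });
  }
  // Narrow the context: downstream procedures see a non-null session
  return next({ ctx: { ...ctx, session: ctx.session } });
});

export const protectedProcedure = t.procedure.use(isAuthed);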
API Routers
Trace Router
Handles all trace-related operations including retrieval, creation, and updates.
Key Procedures:
| Procedure | Type | Description |
|---|---|---|
| getById | Query | Fetch single trace with full details |
| list | Query | Paginated trace listing with filters |
| create | Mutation | Create new trace record |
| update | Mutation | Update trace metadata/tags |
| delete | Mutation | Soft-delete trace |
Sources: web/src/server/api/routers/traces.ts:1-100
Observation Router
Manages spans, generations, and events that belong to traces.
Key Procedures:
| Procedure | Type | Description |
|---|---|---|
| getById | Query | Fetch single observation |
| list | Query | List observations with trace/session filters |
| create | Mutation | Create observation linked to trace |
| update | Mutation | Update observation metadata |
Sources: web/src/server/api/routers/observations.ts:1-100
Score Router
Provides score management with support for multiple API versions (v1 and v2).
API Versioning Strategy:
| Version | traceId Required | Session Support | Dataset Run Support |
|---|---|---|---|
| v1 | Yes | No | No |
| v2 | Optional | Yes | Yes |
The Score router supports both trace-level and session-level scores through different API versions.
Key Procedures:
| Procedure | Type | Description |
|---|---|---|
create | Mutation | Create score (POST endpoint) |
delete | Mutation | Delete score |
getById | Query | Fetch single score |
list | Query | List scores with filters |
Sources: web/src/server/api/routers/scores.ts:1-100
Score Analytics Router
Provides aggregated statistics and time-series data for scores.
// From web/src/features/score-analytics/server/scoreAnalyticsRouter.ts
export const scoreAnalyticsRouter = createTRPCRouter({
timeSeries: protectedProcedure.query(...),
statistics: protectedProcedure.query(...),
heatmapData: protectedProcedure.query(...),
});
Key Procedures:
| Procedure | Type | Description |
|---|---|---|
| timeSeries | Query | Time-series score data with gap filling |
| statistics | Query | Statistical summaries (count, mean, p50/p95/p99) |
| heatmapData | Query | Heatmap matrix for visualization |
Sources: web/src/features/score-analytics/README.md
Request Flow
sequenceDiagram
participant Client
participant TRPC as tRPC Server
participant Middleware
participant Router
participant Repository
participant DB as Database
Client->>TRPC: Procedure Call
TRPC->>Middleware: Apply Chain
Middleware->>Middleware: Auth Check
Middleware->>Router: Validated Input
Router->>Repository: Domain Operation
Repository->>DB: SQL/Query
DB-->>Repository: Result
Repository-->>Router: Domain Object
Router-->>TRPC: Response
TRPC-->>Client: Typed Response
Input Validation
All procedures use Zod schemas for runtime validation:
// Example pattern from routers
const createTraceSchema = z.object({
name: z.string().optional(),
userId: z.string().optional(),
metadata: z.record(z.unknown()).optional(),
tags: z.array(z.string()).optional(),
});
protectedProcedure
.input(createTraceSchema)
.mutation(async ({ input, ctx }) => {
return ctx.repo.trace.create(input);
});
Type Flow
The API Layer maintains type consistency across the stack:
graph LR
Client[Client Input] --> InputZ[Zod Schema]
InputZ --> InputTS[TypeScript Type]
InputTS --> Handler[Procedure Handler]
Handler --> Repo[Repository Return]
Repo --> OutputZ[Zod Response Schema]
OutputZ --> OutputTS[API Response Type]
OutputTS --> ClientResponse[Client]
Type Transformation Points:
| Stage | Location | Purpose |
|---|---|---|
| Input | Routers | Zod validation + type inference |
| Domain | Repositories | Database models to domain objects |
| Output | Routers | Zod response schema validation |
| Client | React Hooks | Full type safety for UI |
Score Interface Architecture
Scores have a multi-layer type system:
| Layer | Location | Purpose |
|---|---|---|
| API v1 | interfaces/api/v1/ | Legacy trace-only scores |
| API v2 | interfaces/api/v2/ | Current with session/dataset support |
| Application | interfaces/application/ | Internal validation |
| UI | interfaces/ui/ | Simplified frontend types |
Sources: packages/shared/src/features/scores/interfaces/README.md
Public API Extension
The Langfuse Public API extends the internal API layer for external consumption. The pattern for adding new public API routes (from web/src/features/README.md):
1. Wrap with withMiddleware
2. Type-safe route with createAuthedAPIRoute
3. Add Zod types to /features/public-api/types
4. Use coerce for date handling
5. Use strict() on response objects
SDK Generation Pipeline:
graph TD
Fern[Fern Definition] --> PythonSDK[Python SDK]
Fern --> JSSDK[JS/TS SDK]
Fern --> Docs[API Documentation]
Best Practices
Procedure Design
- Use protectedProcedure for authenticated endpoints
- Apply input validation at the procedure level
- Return consistent response structures
- Handle errors with typed error classes
Error Handling
| Error Type | HTTP Code | Usage |
|---|---|---|
| UNAUTHORIZED | 401 | Missing/invalid session |
| FORBIDDEN | 403 | Insufficient permissions |
| NOT_FOUND | 404 | Resource doesn't exist |
| BAD_REQUEST | 400 | Invalid input |
| INTERNAL_SERVER_ERROR | 500 | Unexpected errors |
Performance Considerations
- Use cursor-based pagination for large datasets (see the sketch after this list)
- Leverage repository-level caching where applicable
- Batch database operations in mutations
- Limit response sizes with maxTake parameters
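As an illustration of the cursor-based pagination pattern in a tRPC procedure, here is a sketch; the input fields and the repository access on ctx are assumptions, not the actual Langfuse procedure.
import { z } from "zod";

// Hypothetical cursor-paginated listing; `protectedProcedure` and the Prisma
// client on `ctx` come from the surrounding setup
export const listTraces = protectedProcedure
  .input(
    z.object({
      cursor: z.string().nullish(), // id of the last row from the previous page
      limit: z.number().min(1).max(100).default(50),
    }),
  )
  .query(async ({ input, ctx }) => {
    const items = await ctx.prisma.trace.findMany({
      take: input.limit + 1, // fetch one extra row to detect a next page
      ...(input.cursor ? { cursor: { id: input.cursor }, skip: 1 } : {}),
      orderBy: { createdAt: "desc" },
    });
    const nextCursor = items.length > input.limit ? items.pop()!.id : null;
    return { items, nextCursor };
  });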
Sources: web/src/server/api/root.ts:1-50
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read as an unqualified recommendation.
Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.
1. Installation risk: bug: Using client with context manager breaks the scoring
- Severity: high
- Finding: Installation risk is backed by a source signal: bug: Using client with context manager breaks the scoring. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/issues/8138
2. Installation risk: bug: unnamed trace name in Langfuse UI
- Severity: high
- Finding: Installation risk is backed by a source signal: bug: unnamed trace name in Langfuse UI. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/issues/13416
3. Security or permission risk: bug: AsyncStream' object has no attribute 'usage' when integrated with Semantic Kernel and Openlit
- Severity: high
- Finding: Security or permission risk is backed by a source signal: bug: AsyncStream' object has no attribute 'usage' when integrated with Semantic Kernel and Openlit. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/issues/8173
4. Security or permission risk: bug: Worker shutdown takes ~1 hour in self hosted kubernetes
- Severity: high
- Finding: Security or permission risk is backed by a source signal: bug: Worker shutdown takes ~1 hour in self hosted kubernetes. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/issues/8156
5. Installation risk: bug: Socket timeout. Expecting data, but didn't receive any in 30000ms on idle BullMQ queues
- Severity: medium
- Finding: Installation risk is backed by a source signal: bug: Socket timeout. Expecting data, but didn't receive any in 30000ms on idle BullMQ queues. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/issues/13601
6. Installation risk: v3.169.0
- Severity: medium
- Finding: Installation risk is backed by a source signal: v3.169.0. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/releases/tag/v3.169.0
7. Installation risk: v3.172.0
- Severity: medium
- Finding: Installation risk is backed by a source signal: v3.172.0. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/releases/tag/v3.172.0
8. Installation risk: v3.173.0
- Severity: medium
- Finding: Installation risk is backed by a source signal: v3.173.0. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/langfuse/langfuse/releases/tag/v3.173.0
9. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:642497346 | https://github.com/langfuse/langfuse | README/documentation is current enough for a first validation pass.
10. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:642497346 | https://github.com/langfuse/langfuse | last_activity_observed missing
11. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:642497346 | https://github.com/langfuse/langfuse | no_demo; severity=medium
12. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:642497346 | https://github.com/langfuse/langfuse | no_demo; severity=medium
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using langfuse with real data or production workflows.
- bug: unnamed trace name in Langfuse UI - github / github_issue
- bug: Socket timeout. Expecting data, but didn't receive any in 30000ms on idle BullMQ queues - github / github_issue
- bug: Using client with context manager breaks the scoring - github / github_issue
- bug: Worker shutdown takes ~1 hour in self hosted kubernetes - github / github_issue
- bug: AsyncStream' object has no attribute 'usage' when integrated with Semantic Kernel and Openlit - github / github_issue
- v3.174.0 - github / github_release
- v3.173.0 - github / github_release
- v3.172.1 - github / github_release
- v3.172.0 - github / github_release
- v3.171.0 - github / github_release
- v3.170.0 - github / github_release
- v3.169.0 - github / github_release
Source: Project Pack community evidence and pitfall evidence