# openlit Project Documentation (https://github.com/openlit/openlit)

Generated: 2026-05-16 21:33:24 UTC

## Table of Contents

- [OpenLIT Overview](#overview)
- [Quick Start Guide](#quickstart)
- [System Architecture](#architecture)
- [Data Flow and Management](#data-flow)
- [Python SDK Architecture](#python-sdk)
- [TypeScript SDK Architecture](#typescript-sdk)
- [Go SDK Architecture](#go-sdk)
- [LLM and Framework Integrations](#integrations)
- [OpenLIT Controller](#controller)
- [GPU Collector](#gpu-collector)

<a id='overview'></a>

## OpenLIT Overview

### Related Pages

Related topics: [Quick Start Guide](#quickstart), [System Architecture](#architecture)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [src/client/README.md](https://github.com/openlit/openlit/blob/main/src/client/README.md)
- [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)
- [src/client/src/components/(playground)/getting-started/tracing/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)
- [src/client/src/components/(playground)/getting-started/secrets/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/secrets/index.tsx)
- [src/client/src/components/(playground)/getting-started/prompts/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/prompts/index.tsx)
</details>

# OpenLIT Overview

## What is OpenLIT?

OpenLIT is an **OpenTelemetry-native GenAI and LLM Application Observability tool** that simplifies sending OpenTelemetry traces and metrics from your LLM applications, providing comprehensive monitoring for GenAI workloads.

Source: [src/client/src/app/(playground)/getting-started/page.tsx:127]()

## Key Features

OpenLIT offers several core capabilities for observability:

| Feature Category | Description |
|------------------|-------------|
| Tracing | Capture detailed traces of LLM application requests |
| Metrics | Collect and analyze performance metrics |
| Evaluations | Assess response quality and model performance |
| Context Management | Manage evaluation contexts and prompts |
| Secrets Management | Securely store and manage API keys and credentials |

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx]()
Source: [src/client/src/components/(playground)/getting-started/secrets/index.tsx]()
Source: [src/client/src/components/(playground)/getting-started/prompts/index.tsx]()

## Architecture Overview

```mermaid
graph TD
    A[LLM Application] --> B[OpenLIT SDK]
    B --> C[OTLP Endpoint<br/>127.0.0.1:4318]
    C --> D[OpenLIT Backend]
    D --> E[OpenLIT UI<br/>127.0.0.1:3000]
    F[Database] <--> D
```

## SDK Support

OpenLIT provides official SDKs for multiple programming languages:

### Python SDK

The Python SDK enables Python-based LLM applications to send telemetry data to OpenLIT.

```python
import openlit

openlit.init()
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx]()

### TypeScript/JavaScript SDK

The TypeScript SDK provides similar functionality for Node.js and browser-based applications.

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

**Example Usage with OpenAI:**

```typescript
import OpenAI from 'openai';
import openlit from 'openlit';

openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" });

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

const chatCompletion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'What is LLM Observability?' }],
  model: 'gpt-3.5-turbo',
});
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx]()

## Configuration Options

### OTLP Endpoint Configuration

You can configure the OTLP endpoint in two ways:

| Method | Configuration |
|--------|---------------|
| Code | `openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" })` |
| Environment Variable | `OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"` |

Source: [src/client/src/app/(playground)/getting-started/page.tsx]()

### Environment Variables

| Variable | Purpose | Default Value |
|----------|---------|---------------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint | http://127.0.0.1:4318 |

## Deployment

### Docker Compose Deployment

OpenLIT can be deployed using Docker Compose from the root directory:

```bash
git clone git@github.com:openlit/openlit.git
cd openlit
docker compose up -d
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx]()

### Default Ports

| Service | Default Address |
|---------|-----------------|
| OpenLIT UI | http://127.0.0.1:3000 |
| OTLP Endpoint | http://127.0.0.1:4318 |

## Default Credentials

After deployment, access the OpenLIT UI using the following default credentials:

| Field | Default Value |
|-------|---------------|
| Email | user@openlit.io |
| Password | openlituser |

Source: [src/client/src/app/(playground)/getting-started/page.tsx]()

## SDK Repository Locations

| SDK | Repository Path |
|-----|-----------------|
| Python SDK | `sdk/python` |
| TypeScript SDK | `sdk/typescript` |

Source: [src/client/src/app/(playground)/getting-started/page.tsx]()

## Community and Support

OpenLIT maintains active community channels for support and discussions:

| Platform | Link |
|----------|------|
| GitHub | https://github.com/openlit/openlit |
| Documentation | https://docs.openlit.io |
| Slack | Join via invitation link |
| X (Twitter) | @openlit_io |

Source: [src/client/README.md]()

## Evaluation Features

OpenLIT supports custom evaluation types with configurable prompts and context:

```text
[Domain Accuracy evaluation context]
Consider: whether the response aligns with domain-specific knowledge and terminology.
Look for incorrect use of domain terms, inaccurate domain-specific claims, and deviations from established domain practices.
```

Evaluations provide the following metrics:
- **Score**: Numerical rating
- **Classification**: Categorical classification
- **Explanation**: Detailed reasoning
- **Verdict**: Pass/fail determination

Source: [src/client/src/app/(playground)/evaluations/types/new/page.tsx]()
Source: [src/client/src/components/(playground)/request/components/evaluations.tsx]()
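The four metrics above can be modeled as a small result type. The sketch below is purely illustrative; the field names are assumptions, not OpenLIT's actual evaluation schema:

```python
from dataclasses import dataclass

@dataclass
class EvaluationResult:
    # Hypothetical shape mirroring the metrics listed above
    score: float          # Score: numerical rating
    classification: str   # Classification: categorical label
    explanation: str      # Explanation: detailed reasoning
    verdict: str          # Verdict: pass/fail determination

result = EvaluationResult(
    score=0.92,
    classification="domain_accurate",
    explanation="Terminology and claims match established domain practice.",
    verdict="pass",
)
print(result.verdict)
```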

## Pricing Integration

OpenLIT can calculate costs for LLM usage based on token consumption:

```
cost = (input_tokens / 1M) × input_price + (output_tokens / 1M) × output_price
```

This includes:
- Input token pricing per million tokens
- Output token pricing per million tokens
- Context window size tracking

Source: [src/client/src/components/(playground)/chat/chat-settings-form.tsx]()
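The formula translates directly into code. A minimal sketch (the function name and pricing numbers are illustrative, not OpenLIT's implementation):

```python
def llm_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """cost = (input_tokens / 1M) x input_price + (output_tokens / 1M) x output_price"""
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)

# 1,200 input and 350 output tokens at $0.50 / $1.50 per million tokens
print(llm_cost(1200, 350, 0.50, 1.50))
```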

---

<a id='quickstart'></a>

## Quick Start Guide

### Related Pages

Related topics: [Python SDK Architecture](#python-sdk)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [src/client/src/components/(playground)/getting-started/tracing/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)
- [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)
- [src/client/src/app/(playground)/agents/no-controller.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/agents/no-controller.tsx)
- [src/client/src/app/(playground)/context/new/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/context/new/page.tsx)
- [src/client/src/app/(playground)/context/[id]/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/context/[id]/page.tsx)
- [src/client/src/components/(playground)/openground/sdk-usage-dialog.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/openground/sdk-usage-dialog.tsx)
- [src/client/src/app/not-found.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/not-found.tsx)
</details>

# Quick Start Guide

OpenLIT is an OpenTelemetry-native GenAI and LLM Application Observability tool designed to simplify the integration of tracing and metrics collection for AI applications. This guide provides comprehensive instructions for deploying OpenLIT and instrumenting your applications using the Python and TypeScript SDKs.

## Prerequisites

Before beginning, ensure you have the following installed:

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Docker | Latest | Container runtime for OpenLIT deployment |
| Docker Compose | Latest | Orchestration tool |
| Node.js | 18+ | Required for TypeScript SDK |
| Python | 3.8+ | Required for Python SDK |
| npm/pip | Latest | Package managers |

## Deployment Options

OpenLIT can be deployed using multiple methods depending on your infrastructure requirements.

### Docker Compose Deployment

The recommended approach for local development and testing is Docker Compose.

```bash
git clone git@github.com:openlit/openlit.git
cd openlit
docker compose up -d
```

Once deployed, access the OpenLIT UI at `http://127.0.0.1:3000` using the default credentials:

- **Email:** user@openlit.io
- **Password:** openlituser

Source: [src/client/src/app/(playground)/getting-started/page.tsx:50-55]()

### Controller Deployment

For infrastructure-level observability, the OpenLIT Controller can be deployed as a system service or containerized application.

#### Linux System Service

```bash
sudo tee /etc/systemd/system/openlit-controller.service <<EOF
[Unit]
Description=OpenLIT Controller
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/opt/openlit
ExecStart=/opt/openlit/openlit-controller
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now openlit-controller
```

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:12-25]()

#### Docker Deployment

```bash
docker run -d --privileged --pid=host \
  -e OPENLIT_URL="<openlit-url>" \
  -e OTEL_EXPORTER_OTLP_ENDPOINT="<openlit-url>:4318" \
  -v /proc:/host/proc:ro \
  -v /sys/kernel/debug:/sys/kernel/debug:ro \
  -v /sys/fs/bpf:/sys/fs/bpf:rw \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e OPENLIT_PROC_ROOT="/host/proc" \
  ghcr.io/openlit/controller:latest
```

#### Kubernetes Deployment

```bash
helm repo add openlit https://openlit.github.io/helm
helm repo update
helm upgrade --install openlit openlit/openlit \
  --set openlit-controller.enabled=true
```

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:27-45]()

## SDK Integration

OpenLIT provides SDKs for both Python and TypeScript environments to enable application-level observability.

### Python SDK

#### Installation

Install the Python SDK using pip:

```bash
pip install openlit
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx:85-92]()

#### Initialization

Add the following initialization code to your application:

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```

Alternatively, set the endpoint using the environment variable:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
```

#### Complete Example with OpenAI

```python
import os

import openlit
from openai import OpenAI

openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is LLM Observability?"}]
)
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:45-65]()

### TypeScript SDK

#### Installation

Install the TypeScript SDK using npm:

```bash
npm install openlit
```

#### Initialization

Add the following initialization code to your application:

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

Alternatively, set the endpoint using the environment variable `OTEL_EXPORTER_OTLP_ENDPOINT`.

#### Complete Example with OpenAI

```typescript
import OpenAI from 'openai';
import openlit from 'openlit';

openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" });

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

const chatCompletion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'What is LLM Observability?' }],
  model: 'gpt-3.5-turbo',
});
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:95-120]()

## Configuration Reference

### SDK Configuration Options

| Parameter | Type | Environment Variable | Description |
|-----------|------|---------------------|-------------|
| `otlp_endpoint` | string | `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP exporter endpoint URL |
| `api_key` | string | `OPENLIT_API_KEY` | API key for authenticated endpoints |

### Controller Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENLIT_URL` | Base URL for the OpenLIT instance |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for trace export |
| `OPENLIT_API_KEY` | API key for OpenLIT authentication |
| `OPENLIT_PROC_ROOT` | Root path for process information (default: `/host/proc`) |

## Application Workflow

```mermaid
graph TD
    A[Deploy OpenLIT with Docker Compose] --> B[Access OpenLIT UI]
    B --> C{Choose Deployment Mode}
    C -->|Local Development| D[Install SDK in Application]
    C -->|System-wide| E[Deploy Controller]
    D --> F[Initialize SDK]
    F --> G[Instrument LLM Calls]
    G --> H[View Traces & Metrics in UI]
    E --> I[Auto-discover Services]
    I --> J[View Infrastructure Metrics]
```

## Additional Resources

For more advanced configurations and use cases, refer to the following repositories:

- [OpenLIT Python SDK](https://github.com/openlit/openlit/tree/main/sdk/python)
- [OpenLIT TypeScript SDK](https://github.com/openlit/openlit/tree/main/sdk/typescript)
- [Official Documentation](https://docs.openlit.io)
- [GitHub Repository](https://github.com/openlit/openlit)

Source: [src/client/src/app/(playground)/getting-started/page.tsx:100-115]()
Source: [src/client/src/app/not-found.tsx:20-35]()

---

<a id='architecture'></a>

## System Architecture

### Related Pages

Related topics: [Data Flow and Management](#data-flow), [Python SDK Architecture](#python-sdk)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [src/client/src/lib/platform/clickhouse/clickhouse-client.ts](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/clickhouse/clickhouse-client.ts)
- [docker-compose.yml](https://github.com/openlit/openlit/blob/main/docker-compose.yml)
- [src/dev-docker-compose.yml](https://github.com/openlit/openlit/blob/main/src/dev-docker-compose.yml)
</details>

# System Architecture

## Overview

OpenLIT is an **OpenTelemetry-native GenAI and LLM Application Observability tool** designed to simplify the integration of observability into AI applications. The system enables developers to send OpenTelemetry traces and metrics from their LLM applications with minimal configuration changes.

The architecture follows a distributed microservices pattern with clear separation between data collection (SDK instrumentation), data transmission (OTLP protocol), and data visualization (frontend dashboard).

## High-Level Architecture

```mermaid
graph TB
    subgraph "Client Applications"
        PythonApp["Python Application"]
        TypeScriptApp["TypeScript/JS Application"]
    end

    subgraph "OpenLIT SDKs"
        PythonSDK["Python SDK<br/>pip install openlit"]
        TSSDK["TypeScript SDK<br/>npm install openlit"]
    end

    subgraph "Data Transport"
        OTLP["OTLP Endpoint<br/>:4318"]
    end

    subgraph "OpenLIT Backend"
        Frontend["Web Dashboard<br/>Port 3000"]
        API["API Services"]
        DB[("ClickHouse<br/>Database")]
    end

    PythonApp --> PythonSDK
    TypeScriptApp --> TSSDK
    PythonSDK --> OTLP
    TSSDK --> OTLP
    OTLP --> API
    API --> DB
    Frontend --> API
```

## Core Components

### SDK Layer

OpenLIT provides language-specific SDKs for instrumenting AI applications:

| SDK | Package Manager | Installation | Repository |
|-----|-----------------|--------------|------------|
| Python | pip | `pip install openlit` | [sdk/python](https://github.com/openlit/openlit/tree/main/sdk/python) |
| TypeScript | npm | `npm install openlit` | [sdk/typescript](https://github.com/openlit/openlit/tree/main/sdk/typescript) |

**Python SDK Initialization**

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```
Source: [src/client/src/app/(playground)/getting-started/page.tsx:73-74]()

**TypeScript SDK Initialization**

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```
Source: [src/client/src/app/(playground)/getting-started/page.tsx:115-118]()

### Data Transport Layer

The system uses the **OpenTelemetry Protocol (OTLP)** for transmitting telemetry data:

| Parameter | Default Value | Description |
|-----------|---------------|-------------|
| OTLP Endpoint | `http://127.0.0.1:4318` | OTLP/HTTP endpoint for traces and metrics (the gRPC variant conventionally uses port 4317) |
| Environment Variable | `OTEL_EXPORTER_OTLP_ENDPOINT` | Alternative endpoint configuration |

The OTLP endpoint can be configured either programmatically via SDK initialization or through environment variables.

### Backend Services

#### Web Dashboard (Frontend)

The frontend is a Next.js application providing the user interface for:

- **Tracing View** - Visualize request traces and spans
- **Agents Management** - Configure and monitor AI agents
- **Model Management** - Configure AI model providers and pricing
- **Getting Started** - Onboarding documentation
- **Chat Interface** - Interactive testing environment

The application runs on **port 3000** by default and provides a login interface with default credentials:

- **Email:** user@openlit.io
- **Password:** openlituser

Source: [src/client/src/app/(playground)/getting-started/page.tsx:40-44]()

#### Agent Lifecycle Management

OpenLIT supports managing AI agents with lifecycle operations:

```mermaid
stateDiagram-v2
    [*] --> Starting
    Starting --> Running
    Running --> Restarting
    Restarting --> Running
    Running --> Stopping
    Stopping --> [*]
```

Lifecycle actions include:
- **Start** - Initialize the agent service
- **Stop** - Terminate with confirmation dialog
- **Restart** - Restart the agent process

Source: [src/client/src/app/(playground)/agents/lifecycle-actions.tsx:1-60]()

### Controller Services

The OpenLIT Controller provides infrastructure-level observability for containerized and orchestrated environments:

| Deployment Method | Command/Configuration |
|-------------------|----------------------|
| Docker | `docker run -d --privileged --pid=host ... ghcr.io/openlit/controller:latest` |
| Kubernetes | `helm upgrade --install openlit openlit/openlit --set openlit-controller.enabled=true` |
| Systemd | Service unit file with systemctl enable |

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:45-60]()

#### Controller Environment Variables

| Variable | Purpose |
|----------|---------|
| `OPENLIT_URL` | Main OpenLIT instance URL |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for telemetry |
| `OPENLIT_API_KEY` | Authentication key (optional) |
| `OPENLIT_PROC_ROOT` | Process root for host monitoring |

## Deployment Architecture

### Docker Compose Deployment

For development and testing, OpenLIT can be deployed using Docker Compose:

```bash
git clone git@github.com:openlit/openlit.git
cd openlit
docker compose up -d
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx:50-55]()

### Multi-Platform Support

```mermaid
graph LR
    subgraph "Deployment Platforms"
        Docker["Docker"]
        K8s["Kubernetes"]
        SystemD["Systemd"]
    end

    subgraph "Monitoring Targets"
        Containers["Containers"]
        Processes["Host Processes"]
        Services["System Services"]
    end

    Docker --> Containers
    K8s --> Containers
    K8s --> Services
    SystemD --> Services
    SystemD --> Processes
```

## Feature Architecture

### Tracing Integration

OpenLIT's tracing feature provides comprehensive observability:

| Feature | Description |
|---------|-------------|
| **Auto-Instrumentation** | Automatic capture of LLM calls |
| **Span Attributes** | Model, provider, token usage, latency |
| **Context Propagation** | Request tracing across services |
| **Error Tracking** | Exception and failure monitoring |

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:1-100]()

### Agent Schema Capture

The system captures tool schemas from agents for documentation and analysis:

```typescript
interface ToolSchema {
  name: string;
  description?: string;
  schema: object;
}
```

Schemas are displayed in an expandable accordion format with JSON visualization.

Source: [src/client/src/components/(playground)/agents/tools-card.tsx:35-55]()

### Model Configuration

OpenLIT supports custom model configurations with pricing information:

| Field | Type | Description |
|-------|------|-------------|
| `providerName` | string | AI provider name |
| `modelId` | string | Model identifier |
| `modelName` | string | Display name |
| `inputPricePerMToken` | number | Input cost per million tokens |
| `outputPricePerMToken` | number | Output cost per million tokens |
| `contextWindow` | number | Maximum context length |

Source: [src/client/src/components/(playground)/chat/message-input.tsx:25-45]()

## Data Flow

```mermaid
sequenceDiagram
    participant App as Application
    participant SDK as OpenLIT SDK
    participant OTLP as OTLP Endpoint
    participant API as OpenLIT API
    participant CH as ClickHouse
    participant UI as Web Dashboard

    App->>SDK: Initialize with config
    App->>SDK: LLM API Call
    SDK->>SDK: Capture trace/metrics
    SDK->>OTLP: Export telemetry
    OTLP->>API: Process spans
    API->>CH: Store data
    UI->>API: Query traces
    API->>UI: Return results
    UI->>UI: Render dashboard
```

## Configuration Reference

### SDK Configuration Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `otlp_endpoint` | string | `http://127.0.0.1:4318` | OTLP collector endpoint |
| `service_name` | string | auto-detect | Service identifier |
| `api_key` | string | none | Authentication for hosted services |

### Environment Variables

| Variable | SDK Support | Description |
|----------|-------------|-------------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Python, TS | Global OTLP endpoint override |
| `OPENLIT_API_KEY` | All | API authentication key |
| `OPENLIT_SERVICE_NAME` | All | Override service name |

## Security Considerations

### Authentication

The system supports multiple authentication providers:

- **Email/Password** - Local authentication with default credentials
- **OAuth Providers** - Google and GitHub SSO integration

Source: [src/client/src/components/(auth)/auth-form.tsx:1-50]()

### API Security

API endpoints are protected and require valid session tokens. The controller service supports optional API key authentication:

```bash
-e OPENLIT_API_KEY="your-api-key"
```

## Technology Stack

| Layer | Technology |
|-------|------------|
| Frontend | Next.js, React, TypeScript, TailwindCSS |
| SDKs | Python, TypeScript |
| Telemetry | OpenTelemetry Protocol (OTLP) |
| Database | ClickHouse |
| Containerization | Docker, Kubernetes |
| Service Management | Systemd |

## External Resources

| Resource | URL |
|----------|-----|
| Documentation | https://docs.openlit.io |
| GitHub Repository | https://github.com/openlit/openlit |
| TypeScript SDK | https://github.com/openlit/openlit/tree/main/sdk/typescript |
| Python SDK | https://github.com/openlit/openlit/tree/main/sdk/python |

---

*Last updated: Based on repository state at main branch*

---

<a id='data-flow'></a>

## Data Flow and Management

### Related Pages

Related topics: [System Architecture](#architecture), [Python SDK Architecture](#python-sdk)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [sdk/python/src/openlit/otel/tracing.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/otel/tracing.py)
- [sdk/python/src/openlit/otel/metrics.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/otel/metrics.py)
- [src/client/src/lib/platform/clickhouse/helpers.ts](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/clickhouse/helpers.ts)
- [src/client/src/lib/platform/request/index.ts](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/request/index.ts)
- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)
- [sdk/python/src/openlit/instrumentation/langgraph/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/langgraph/__init__.py)
- [sdk/typescript/src/instrumentation/llamaindex/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/instrumentation/llamaindex/index.ts)
</details>

# Data Flow and Management

## Overview

OpenLIT is an OpenTelemetry-native observability platform designed for GenAI and LLM applications. The data flow architecture encompasses the entire lifecycle of telemetry data—from instrumentation at the application level through processing, storage, and visualization in the frontend UI.

The system follows a standard OpenTelemetry Collector pattern with platform-specific optimizations for handling GenAI-specific semantic conventions and metrics. Data flows through multiple layers: SDK instrumentation, OTLP export, backend processing, ClickHouse storage, and client-side data management for the playground UI.

## Architecture Overview

```mermaid
graph TD
    subgraph Application_Layer["Application Layer"]
        PySDK["Python SDK"]
        TsSDK["TypeScript SDK"]
    end
    
    subgraph Instrumentation["Instrumentation"]
        LangGraph["LangGraph"]
        ClaudeAgent["Claude Agent SDK"]
        LlamaIndex["LlamaIndex"]
        OpenAI["OpenAI"]
    end
    
    subgraph Export["OTLP Export"]
        OTLP["OTLP Endpoint<br/>:4318"]
    end
    
    subgraph Backend["OpenLIT Backend"]
        Processor["Data Processor"]
        Storage["ClickHouse"]
    end
    
    subgraph Frontend["Frontend Client"]
        Client["Playground UI"]
        APIClient["API Client"]
    end
    
    PySDK -->|HTTP/gRPC| OTLP
    TsSDK -->|HTTP/gRPC| OTLP
    LangGraph --> PySDK
    ClaudeAgent --> PySDK
    OpenAI --> PySDK
    LlamaIndex --> TsSDK
    OTLP --> Processor
    Processor --> Storage
    Storage --> APIClient
    APIClient --> Client
```

## Tracing Data Flow

### Python SDK Tracing Architecture

The Python SDK provides comprehensive tracing capabilities through the OpenTelemetry SDK integration. The tracing module (`tracing.py`) establishes the foundation for all trace collection and export operations.

**Core Tracing Components:**

| Component | Purpose | Location |
|-----------|---------|----------|
| `TracerProvider` | Manages trace creation and propagation | `sdk/python/src/openlit/otel/tracing.py` |
| `SpanProcessor` | Processes individual spans before export | `sdk/python/src/openlit/otel/tracing.py` |
| `OTLPExporter` | Exports spans to OTLP endpoint | `sdk/python/src/openlit/otel/tracing.py` |
| `ContextPropagation` | Maintains trace context across async operations | `sdk/python/src/openlit/otel/tracing.py` |

The tracing initialization follows a standard pattern:

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```

This initialization configures the tracer provider with the specified OTLP endpoint, enabling automatic span collection from all instrumented LLM frameworks.

**Source:** [sdk/python/src/openlit/otel/tracing.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/otel/tracing.py)

### Span Lifecycle

Spans are created and managed through a structured lifecycle that ensures complete telemetry capture:

```mermaid
sequenceDiagram
    participant App as Application Code
    participant SDK as OpenLIT SDK
    participant Inst as Instrumentation
    participant Exporter as OTLP Exporter
    participant Backend as OpenLIT Backend
    
    App->>Inst: LLM/Framework Call
    Inst->>SDK: Create Span
    SDK->>SDK: Set Attributes
    SDK->>SDK: Record Metrics
    App->>SDK: Response Received
    SDK->>SDK: Complete Span
    SDK->>Exporter: Export Span
    Exporter->>Backend: OTLP Stream
```

The span lifecycle includes:
1. **Creation**: Span is initialized with parent context
2. **Attribute Setting**: GenAI-specific attributes (model, tokens, cost) are attached
3. **Timing**: Start and end times are recorded for duration calculation
4. **Status**: Span status is set based on success/failure
5. **Export**: Spans are batched and exported to OTLP endpoint

**Source:** [sdk/python/src/openlit/instrumentation/langgraph/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/langgraph/__init__.py)
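The five lifecycle steps can be illustrated with a dependency-free sketch. The `Span` and `BatchExporter` classes below are stand-ins for the OpenTelemetry SDK's real span and batch processor types, not OpenLIT code:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    start_time: float = 0.0
    end_time: float = 0.0
    status: str = "UNSET"

class BatchExporter:
    def __init__(self) -> None:
        self.batch: list[Span] = []

    def export(self, span: Span) -> None:
        self.batch.append(span)  # spans are batched before OTLP export

exporter = BatchExporter()

span = Span("chat gpt-3.5-turbo")                          # 1. creation
span.attributes["gen_ai.request.model"] = "gpt-3.5-turbo"  # 2. attribute setting
span.start_time = time.time()                              # 3. timing starts
# ... the instrumented LLM call runs here ...
span.end_time = time.time()                                # 3. timing ends
span.status = "OK"                                         # 4. status on success
exporter.export(span)                                      # 5. export
print(len(exporter.batch))
```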

### Instrumentation Framework Integration

OpenLIT provides instrumentation for multiple LLM frameworks, each with framework-specific span attributes:

**Supported Instrumentations:**

| Framework | Operations Traced | Semantic Convention |
|-----------|-------------------|---------------------|
| OpenAI | chat completions, embeddings | `gen_ai.operation.type` |
| LangGraph | execution, checkpointing, construction | `framework` + `gen_ai` |
| Claude Agent SDK | invoke_agent, execute_tool | `gen_ai.operation.type` |
| LlamaIndex | query_engine, retriever, document | `retrieve` + `framework` |

**LangGraph Instrumentation Pattern:**

The LangGraph instrumentation wraps execution operations with both sync and async variants:

```python
# From langgraph/__init__.py
def _wrap_execution_operations(self, operations, ...):
    for module, method, operation_type, sync_type in operations:
        if sync_type == "async":
            wrapper = async_general_wrap(operation_type, ...)
        else:
            wrapper = general_wrap(operation_type, ...)
```

This pattern ensures consistent telemetry regardless of whether the underlying framework uses synchronous or asynchronous execution models.

**Source:** [sdk/python/src/openlit/instrumentation/langgraph/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/langgraph/__init__.py)

## Metrics Data Flow

### Metrics Collection Architecture

The metrics module handles quantitative measurements that complement trace data. Metrics provide aggregated views of system performance, cost, and usage patterns.

**Metrics Data Points:**

| Metric Type | Description | Aggregation |
|-------------|-------------|-------------|
| Request Count | Total number of LLM requests | Count |
| Token Usage | Input/output tokens consumed | Sum |
| Cost | Calculated cost based on pricing | Sum |
| Latency | Request duration in milliseconds | Histogram |
| Error Rate | Failed requests percentage | Ratio |

**Source:** [sdk/python/src/openlit/otel/metrics.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/otel/metrics.py)

### Metric Recording Flow

Metrics are recorded during span processing using the OpenTelemetry Metrics API:

```mermaid
graph LR
    A[LLM Request] --> B[Create Span]
    B --> C[Extract Request Data]
    C --> D[Calculate Pricing]
    D --> E[Record Metrics]
    E --> F[Complete Span]
    
    G[Pricing Info] --> D
    H[Model Config] --> D
```

The metric recording includes:
- `start_time` and `end_time` for duration calculation
- `request_model` for token and pricing lookup
- `environment` and `application_name` for filtering
- `pricing_info` dictionary for cost calculation

**资料来源：** [sdk/python/src/openlit/instrumentation/openai/async_openai.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/openai/async_openai.py)
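As a rough illustration of how these fields combine, the sketch below derives a duration and cost from them. The field names mirror the list above; the per-1k-token shape of `pricing_info` is an assumption for illustration, not OpenLIT's exact pricing schema:

```python
def record_request(start_time, end_time, request_model,
                   input_tokens, output_tokens, pricing_info):
    """Derive the duration and cost values that the metrics layer records."""
    duration_ms = (end_time - start_time) * 1000.0
    # pricing_info assumed to map model -> per-1k-token prices
    prices = pricing_info.get(request_model, {"input": 0.0, "output": 0.0})
    cost = (input_tokens / 1000.0) * prices["input"] \
         + (output_tokens / 1000.0) * prices["output"]
    return {"duration_ms": duration_ms, "cost": cost,
            "total_tokens": input_tokens + output_tokens}

metrics = record_request(
    start_time=10.0, end_time=10.5, request_model="gpt-4",
    input_tokens=1000, output_tokens=500,
    pricing_info={"gpt-4": {"input": 0.03, "output": 0.06}},
)
# duration_ms == 500.0, cost == 0.06
```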

## Client-Side Data Management

### Frontend API Client Architecture

The frontend client manages data fetching and state management for the playground UI. The API client layer provides a typed interface to the backend services.

**API Client Structure:**

```typescript
// Simplified from request/index.ts
export class RequestClient {
  async fetchTraces(params: TraceParams): Promise<Trace[]>;
  async fetchMetrics(params: MetricParams): Promise<Metrics>;
  async fetchSpans(traceId: string): Promise<Span[]>;
}
```

**Key Data Operations:**

| Operation | Endpoint | Purpose |
|-----------|----------|---------|
| Fetch Traces | `/api/traces` | List traces with filtering |
| Fetch Spans | `/api/traces/:id/spans` | Get detailed span data |
| Fetch Metrics | `/api/metrics` | Aggregated metrics data |
| Export Data | `/api/openground/models/export` | Export pricing data |

**资料来源：** [src/client/src/lib/platform/request/index.ts](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/request/index.ts)

### ClickHouse Data Access

The client uses ClickHouse as the primary data store and accesses it through helper functions that construct and execute queries.

**Query Helper Functions:**

| Function | Purpose |
|----------|---------|
| `buildTraceQuery()` | Construct trace listing query |
| `buildSpanQuery()` | Construct span detail query |
| `applyFilters()` | Apply time range and attribute filters |
| `parseResponse()` | Parse ClickHouse response format |

**资料来源：** [src/client/src/lib/platform/clickhouse/helpers.ts](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/clickhouse/helpers.ts)
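A conceptual sketch of the filter-building step, shown in Python for brevity (the real helpers are TypeScript, and the `otel_traces` table and column names here are assumptions):

```python
def apply_filters(base_query, time_from, time_to, attributes):
    """Append time-range and attribute filters to a ClickHouse query string."""
    clauses = [f"Timestamp >= '{time_from}'", f"Timestamp <= '{time_to}'"]
    for key, value in sorted(attributes.items()):
        # Attribute filters target the span attribute map column
        clauses.append(f"SpanAttributes['{key}'] = '{value}'")
    return f"{base_query} WHERE " + " AND ".join(clauses)

query = apply_filters(
    "SELECT TraceId FROM otel_traces",
    "2024-01-01", "2024-01-02",
    {"gen_ai.system": "openai"},
)
```

A production implementation would use parameterized queries rather than string interpolation to avoid injection issues.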

### State Management Pattern

The frontend uses React Query or similar state management for data fetching:

```mermaid
graph TD
    A[Component Mount] --> B[Trigger Query]
    B --> C[Show Loading State]
    C --> D{Request Succeeded?}
    D -->|Yes| E[Update Cache]
    E --> F[Render Data]
    D -->|No| G[Show Error]
    G --> H[Retry Option]
```

The state management includes:
- **Loading states**: Visual feedback during data fetch
- **Error handling**: Graceful degradation on failures
- **Cache invalidation**: Automatic refresh on mutations
- **Pagination**: Support for large result sets with "Load More" patterns

**资料来源：** [src/client/src/components/(playground)/agents/version-drawer.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/agents/version-drawer.tsx)

## Timeline View Data Structure

### Span Timeline Rendering

The timeline view component renders trace data as a visual timeline, parsing span data into a hierarchical structure.

**Span Data Model:**

```typescript
interface SpanData {
  spanId: string;
  parentSpanId?: string;
  startTime: number;
  endTime: number;
  name: string;
  kind: 'client' | 'server' | 'producer' | 'consumer';
  status: 'ok' | 'error';
  attributes: Record<string, any>;
  duration: number;
  cost?: number;
}
```

**Timeline Calculation:**

| Column | Width | Content |
|--------|-------|---------|
| Name Column | 30% | Span name and kind indicator |
| Timeline Column | 60% | Visual timeline bar |
| Stats Column | 10% | Duration and cost |

The timeline calculates relative positions using `traceWindowMs` to determine the overall trace window, then positions each span proportionally within that window.

**资料来源：** [src/client/src/components/(playground)/request/components/timeline-view.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/request/components/timeline-view.tsx)
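The proportional-positioning math can be sketched as follows (a simplified model; only the `traceWindowMs` naming follows the component, the rest is illustrative):

```python
def timeline_position(span_start, span_end, trace_start, trace_window_ms):
    """Return the span's left offset and width as percentages of the timeline."""
    left_pct = (span_start - trace_start) / trace_window_ms * 100.0
    width_pct = (span_end - span_start) / trace_window_ms * 100.0
    return round(left_pct, 2), round(width_pct, 2)

# A 250 ms span starting 500 ms into a 1000 ms trace window
left, width = timeline_position(1500, 1750, 1000, 1000)
# left == 50.0, width == 25.0
```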

## TypeScript SDK Data Flow

### LlamaIndex Instrumentation

The TypeScript SDK provides similar capabilities for JavaScript/TypeScript applications, particularly for LlamaIndex integration.

**LlamaIndex Traced Operations:**

| Operation | Semantic Convention | Description |
|-----------|---------------------|-------------|
| `document_load` | `retrieve` | Document loading operations |
| `document_split` | `framework` | Text splitting/chunking |
| `retriever_retrieve` | `retrieve` | Retrieval operations |
| `query_engine_query` | `retrieve` | Query execution |
| `response_synthesize` | `chat` | Response generation |

**资料来源：** [sdk/typescript/src/instrumentation/llamaindex/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/instrumentation/llamaindex/index.ts)

### TypeScript Initialization Pattern

```typescript
import openlit from 'openlit';

// Initialize with OTLP endpoint
openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});

// Or use environment variable
// OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
```

## Environment Configuration

### Data Flow Configuration Options

| Environment Variable | Default | Purpose |
|---------------------|---------|---------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://127.0.0.1:4318` | OTLP HTTP endpoint |
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `http/protobuf` | Transport protocol (`grpc`, `http/protobuf`, `http/json`) |
| `OTEL_SERVICE_NAME` | `default` | Service identification |
| `OTEL_EXPORTER_OTLP_HEADERS` | - | Authentication headers |

**资料来源：** [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)

## Data Management Best Practices

### Efficient Data Handling

1. **Batching**: Spans are batched before export to reduce network overhead
2. **Sampling**: Configure appropriate sampling rates for high-volume applications
3. **Filtering**: Apply attribute filters at the query layer to reduce data transfer
4. **Pagination**: Use paginated queries for large result sets
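The batching idea from point 1 can be sketched minimally (illustrative only; OpenTelemetry's real `BatchSpanProcessor` also flushes on a timer and bounds the queue size):

```python
class SimpleBatcher:
    """Buffer finished spans and export them in groups."""

    def __init__(self, export_fn, batch_size=3):
        self.export_fn = export_fn
        self.batch_size = batch_size
        self.buffer = []

    def on_end(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.export_fn(list(self.buffer))  # one network call per batch
            self.buffer.clear()

exported = []
batcher = SimpleBatcher(exported.append, batch_size=3)
for name in ("a", "b", "c", "d"):
    batcher.on_end(name)
# One batch of 3 exported; "d" stays buffered until the next flush
```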

### Error Handling Flow

```mermaid
graph TD
    A[Span Error] --> B[Record Exception]
    B --> C[Set Span Status ERROR]
    C --> D[Record Error Metrics]
    D --> E[Export Span]
    E --> F{Backend Available?}
    F -->|Yes| G[Store Data]
    F -->|No| H[Retry Queue]
    H -->|Retry| G
```

The error handling ensures that even when backend connectivity fails, error information is preserved for debugging.
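The retry path in the diagram can be sketched as a bounded retry loop backed by a queue (illustrative; real exporters add exponential backoff and may persist the queue):

```python
from collections import deque

def export_with_retry(spans, send, retry_queue, max_attempts=3):
    """Try to export spans; park them in a retry queue when the backend is down."""
    for _ in range(max_attempts):
        try:
            send(spans)
            return True
        except ConnectionError:
            continue  # backend unavailable, try again
    retry_queue.append(spans)  # preserved for a later flush
    return False

failures = {"n": 0}
def flaky_send(spans):
    failures["n"] += 1
    raise ConnectionError("backend down")

queue = deque()
ok = export_with_retry(["span-1"], flaky_send, queue)
# ok is False; the batch sits in the retry queue for debugging later
```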

## Summary

The data flow in OpenLIT follows a well-structured pipeline from SDK instrumentation through to frontend visualization. Key aspects include:

- **Unified Telemetry**: Both traces and metrics are collected through OpenTelemetry SDKs
- **Framework Integration**: Multiple LLM frameworks are automatically instrumented
- **Efficient Export**: OTLP protocol ensures standardized data transfer
- **Flexible Storage**: ClickHouse provides scalable storage and querying
- **Responsive UI**: The playground client efficiently fetches and displays telemetry data

This architecture enables comprehensive observability for GenAI applications while maintaining performance and scalability through batching, caching, and pagination strategies.

---

<a id='python-sdk'></a>

## Python SDK Architecture

### 相关页面

相关主题：[TypeScript SDK Architecture](#typescript-sdk), [Go SDK Architecture](#go-sdk), [LLM and Framework Integrations](#integrations)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [sdk/python/src/openlit/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/__init__.py)
- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)
- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/claude_agent_sdk.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/claude_agent_sdk.py)
- [sdk/python/src/openlit/instrumentation/agent_framework/utils.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/agent_framework/utils.py)
- [sdk/python/src/openlit/guard/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/__init__.py)
- [sdk/python/src/openlit/instrumentation/google_adk/utils.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/google_adk/utils.py)
</details>

# Python SDK Architecture

## Overview

The OpenLIT Python SDK is an OpenTelemetry-native observability tool for GenAI and LLM applications. It integrates with AI applications through an auto-instrumentation framework, capturing OpenTelemetry traces and metrics automatically with no manual instrumentation required.

Its core responsibilities include:

- Auto-instrumenting popular AI SDKs (OpenAI, Anthropic, LangChain, CrewAI, and more)
- Following the OTel GenAI semantic conventions
- Collecting traces and metrics via OpenTelemetry
- Providing production-grade guardrails (content safety, auditing)

资料来源：[sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:1-15]()

## Core Architecture Components

```mermaid
graph TD
    subgraph "OpenLIT Python SDK"
        A["openlit.init()"]
        B["Instrumentors<br/>BaseInstrumentor"]
        C["Guard System"]
        D["OTel Layer"]
    end
    
    subgraph "Instrumented Frameworks"
        E["OpenAI"]
        F["Anthropic"]
        G["Claude Agent SDK"]
        H["LangChain / CrewAI"]
        I["Google ADK"]
        J["Agent Framework"]
    end
    
    subgraph "OpenTelemetry Backend"
        K["OTLP Exporter"]
        L["Traces"]
        M["Metrics"]
    end
    
    A --> B
    A --> C
    B --> D
    C --> D
    D --> K
    K --> L
    K --> M
    
    B --> E
    B --> F
    B --> G
    B --> H
    B --> I
    B --> J
```

### Component Descriptions

| Component | Location | Responsibility |
|------|------|------|
| **Instrumentors** | `openlit.instrumentation.*` | Auto-instrumentation implementations for each AI framework |
| **Guard System** | `openlit.guard.*` | Content safety, auditing, and compliance checks |
| **OTel Layer** | `openlit.otel.*` | Core OpenTelemetry traces and metrics implementation |
| **Config** | `openlit._config` | Global configuration management and metrics dictionaries |
| **Semcov** | `openlit.semcov` | GenAI semantic convention constants |

## Initialization Flow

### Python SDK Initialization

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```

During initialization the SDK:

1. Configures the OpenTelemetry tracer provider
2. Loads global configuration (environment, application name, metrics toggles)
3. Applies all instrumentors whose dependencies are installed
4. Initializes the guard pipeline (if configured)

资料来源：[sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:30-42]()

### Configuration Parameters

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `otlp_endpoint` | str | `"http://127.0.0.1:4318"` | OTLP HTTP endpoint |
| `environment` | str | `"default"` | Deployment environment identifier |
| `application_name` | str | `"default"` | Application name |
| `pricing_info` | dict | `{}` | Model pricing information |
| `capture_message_content` | bool | `False` | Whether to capture message content |
| `metrics` | dict | None | Metrics configuration dictionary |
| `disable_metrics` | bool | None | Disable metrics collection |
| `guards` | list | None | List of guard configurations |

## Instrumentation System Architecture

### The BaseInstrumentor Pattern

All framework instrumentors inherit from `BaseInstrumentor` and follow a common pattern:

```python
from typing import Collection

from opentelemetry import trace
from opentelemetry.instrumentation.instrumentor import BaseInstrumentor
from wrapt import wrap_function_wrapper

_instruments = ("claude-agent-sdk >= 0.1.0",)

class ClaudeAgentSDKInstrumentor(BaseInstrumentor):
    def instrumentation_dependencies(self) -> Collection[str]:
        return _instruments

    def _instrument(self, **kwargs):
        # 1. Obtain the tracer and configuration
        tracer = trace.get_tracer(__name__)

        # 2. Wrap the target function with wrapt
        wrap_function_wrapper(
            "module.path",    # target module
            "function_name",  # target attribute
            wrap_query,       # wrapper defined in the same package
        )
```

资料来源：[sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:27-45]()

### Instrumentation Coverage

| Framework | Supported Versions | Traced Operations |
|------|----------|----------|
| Claude Agent SDK | >= 0.1.0 | `invoke_agent`, `execute_tool` |
| Google ADK | - | `execute_tool` |
| Agent Framework | - | `agent_init`, `agent_run`, `tool_execute`, `workflow_run` |
| CrewAI | - | Agent and tool invocations |
| LangGraph | - | Graph node execution |

### Span Naming Conventions

Normalized span names are generated per the OTel GenAI semantic conventions:

| Operation Type | Span Name Format | Example |
|----------|---------------|------|
| Agent creation | `create_agent {name}` | `create_agent my_agent` |
| Agent invocation | `invoke_agent {name}` | `invoke_agent my_agent` |
| Tool execution | `execute_tool {name}` | `execute_tool calculator` |
| Workflow | `invoke_workflow {name}` | `invoke_workflow pipeline` |

资料来源：[sdk/python/src/openlit/instrumentation/agent_framework/utils.py:1-60]()

### Semantic Convention Attributes

All spans follow the `gen_ai.*` semantic conventions:

| Attribute Key | Description | Example Value |
|--------|------|--------|
| `gen_ai.operation.name` | Operation type | `invoke_agent`, `execute_tool` |
| `gen_ai.operation.type` | Operation category | `agent`, `tool` |
| `gen_ai.system` | AI system | `openai`, `anthropic`, `google.adk` |
| `gen_ai.provider.name` | Provider name | `google` |
| `gen_ai.tool.name` | Tool name | `calculator` |
| `gen_ai.tool.type` | Tool type | `function` |
| `gen_ai.tool.description` | Tool description | Truncated description text |
| `gen_ai.tool.call.arguments` | Tool call arguments | JSON string |

资料来源：[sdk/python/src/openlit/instrumentation/google_adk/utils.py:1-50]()

## Guard System

OpenLIT provides production-grade guardrails for LLM application safety:

```python
import openlit

openlit.init(guards=[openlit.PII(action="redact")])
```

### Available Guard Types

| Guard Class | Location | Purpose |
|----------|------|------|
| `PII` | `openlit.guard.pii` | Detect and redact personally identifiable information |
| `PromptInjection` | `openlit.guard.prompt_injection` | Detect prompt injection attacks |
| `SensitiveTopic` | `openlit.guard.sensitive_topic` | Detect sensitive topics |
| `TopicRestriction` | `openlit.guard.topic_restriction` | Restrict allowed topics |
| `Moderation` | `openlit.guard.moderation` | Content moderation |
| `Schema` | `openlit.guard.schema` | Validate output structure |
| `Custom` | `openlit.guard.custom` | Custom guard logic |

### Core Guard Types

```python
from openlit.guard import (
    Guard,
    GuardAction,
    GuardConfigError,
    GuardDeniedError,
    GuardPhase,
    GuardResult,
    GuardTimeoutError,
    PipelineResult,
)
```

| Type | Description |
|------|------|
| `Guard` | Base guard class |
| `GuardAction` | Action taken by a guard |
| `GuardPhase` | Execution phase (pre/post) |
| `GuardResult` | Result of a single guard |
| `PipelineResult` | Aggregated pipeline result |

资料来源：[sdk/python/src/openlit/guard/__init__.py:1-60]()

### Pipeline Mechanism

Multiple guards are executed in order through a `Pipeline`:

```python
from openlit.guard import Pipeline

pipeline = Pipeline([
    PII(action="redact"),
    PromptInjection(threshold=0.8),
    Moderation()
])
```
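A hedged sketch of sequential pipeline execution. The deny short-circuit and the callable-guard shape below are assumptions for illustration; OpenLIT's actual `Pipeline`, `Guard`, and result classes may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    action: str              # "allow", "redact", or "deny"
    text: str
    reasons: list = field(default_factory=list)

def run_pipeline(guards, text):
    """Run each guard in order; stop early on a deny."""
    result = Result("allow", text)
    for guard in guards:
        action, new_text, reason = guard(result.text)
        result.text = new_text
        if reason:
            result.reasons.append(reason)
        if action == "deny":
            result.action = "deny"
            break                      # short-circuit on denial
        if action == "redact":
            result.action = "redact"
    return result

def redact_pii(text):
    if "@" in text:
        return "redact", text.replace("user@example.com", "[EMAIL]"), "pii"
    return "allow", text, None

def block_injection(text):
    if "ignore previous" in text.lower():
        return "deny", text, "prompt_injection"
    return "allow", text, None

out = run_pipeline([redact_pii, block_injection], "contact user@example.com")
# out.action == "redact", out.text == "contact [EMAIL]"
```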

## Claude Agent SDK Instrumentation in Detail

### Architecture

```mermaid
sequenceDiagram
    participant User as User Code
    participant SDK as Claude Agent SDK
    participant Wrap as wrap_query
    participant Hook as _ToolSpanTracker
    participant Span as OTel Span
    
    User->>SDK: query(...)
    SDK->>Wrap: invoke wrapper
    Wrap->>Span: create invoke_agent span
    Wrap->>SDK: proceed with query
    SDK->>Hook: PreToolUse event
    Hook->>Span: create execute_tool span
    SDK->>Hook: PostToolUse event
    Hook->>Span: finalize tool span
    SDK-->>Wrap: response
    Wrap->>Span: finalize agent span
    Wrap-->>User: return response
```

### Tool Span Tracking

`_ToolSpanTracker` manages in-flight tool spans:

```python
class _ToolSpanTracker:
    """Manages in-flight tool spans created by SDK hooks."""

    def __init__(
        self,
        tracer,
        parent_span,
        version,
        environment,
        application_name,
        capture_message_content,
    ):
        # Initialize tracker state
        ...
```

### Fallback Mechanism

When SDK hooks cannot be injected, the instrumentation falls back to tracing the message stream:

```python
# Check whether the SDK hooks were injected
if hasattr(client, _HOOKS_INJECTED_ATTR):
    # Trace tool use via hook events
    ...
else:
    # Fall back to tracing the message stream
    ...
```

资料来源：[sdk/python/src/openlit/instrumentation/claude_agent_sdk/claude_agent_sdk.py:1-80]()

## OpenTelemetry Integration

### Tracing Implementation

The SDK creates spans using the OpenTelemetry Python API:

```python
from opentelemetry import trace
from opentelemetry.trace import SpanKind, Status, StatusCode

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span(
    name="invoke_agent",
    kind=SpanKind.CLIENT
) as span:
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    # Perform the operation
```

### Metrics Implementation

The following metric types are supported:

| Metric Type | Metric Name | Description |
|----------|----------|------|
| Counter | `gen_ai.*.token_usage` | Token usage counts |
| Histogram | `gen_ai.*.duration` | Request duration distribution |
| Gauge | - | Currently active requests |

### Semantic Convention Constants

All semantic convention constants are defined centrally in the `openlit.semcov` module:

```python
class SemanticConvention:
    GEN_AI_OPERATION = "gen_ai.operation.name"
    GEN_AI_SYSTEM = "gen_ai.system"
    GEN_AI_TOOL_NAME = "gen_ai.tool.name"
    GEN_AI_TOOL_TYPE = "gen_ai.tool.type"
    GEN_AI_SYSTEM_VALUE = "gen_ai.system.openai"
```

## Error Handling

### Exception Propagation

The SDK uses a unified exception-handling mechanism:

```python
from openlit.__helpers import handle_exception

def some_wrapper(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except Exception as e:
        handle_exception(span, e)
        raise
```

### Guard-Specific Errors

| Error Type | Description |
|----------|------|
| `GuardError` | Base guard error |
| `GuardDeniedError` | A guard denied the request |
| `GuardTimeoutError` | Guard execution timed out |
| `GuardConfigError` | Invalid guard configuration |

## Usage Examples

### Basic Integration

```python
from openai import OpenAI
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI(api_key="YOUR_OPENAI_KEY")

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model="gpt-3.5-turbo"
)
```

### Integration with Guards

```python
import openlit
from openlit.guard import PII, PromptInjection

openlit.init(
    otlp_endpoint="http://127.0.0.1:4318",
    guards=[
        PII(action="redact"),
        PromptInjection(threshold=0.7)
    ]
)
```

### Environment Variable Configuration

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
```

```python
import openlit

openlit.init()  # reads the endpoint from environment variables
```

## Extension Development

### Custom Instrumentor

```python
from opentelemetry.instrumentation.instrumentor import BaseInstrumentor
from wrapt import wrap_function_wrapper

class CustomSDKInstrumentor(BaseInstrumentor):
    def instrumentation_dependencies(self):
        return ("custom-sdk >= 1.0.0",)
    
    def _instrument(self, **kwargs):
        tracer = kwargs.get("tracer")
        wrap_function_wrapper(
            "custom_sdk",
            "Client.query",
            wrap_custom_query
        )
```

### Custom Guard

```python
from openlit.guard import Guard, GuardAction, GuardResult

class CustomGuard(Guard):
    def _evaluate(self, text: str) -> GuardResult:
        # Custom detection logic
        if "forbidden" in text.lower():
            return GuardResult(
                action=GuardAction.DENY,
                reason="Forbidden content detected"
            )
        return GuardResult(action=GuardAction.ALLOW)
```

---

<a id='typescript-sdk'></a>

## TypeScript SDK Architecture

### 相关页面

相关主题：[Python SDK Architecture](#python-sdk), [Go SDK Architecture](#go-sdk), [LLM and Framework Integrations](#integrations)

<details>
<summary>Related Source Files</summary>

以下源码文件用于生成本页说明：

- [sdk/typescript/src/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/index.ts)
- [sdk/typescript/src/config.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/config.ts)
- [sdk/typescript/src/instrumentation/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/instrumentation/index.ts)
- [sdk/typescript/src/guard/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/guard/index.ts)
- [sdk/typescript/package.json](https://github.com/openlit/openlit/blob/main/sdk/typescript/package.json)

</details>

# TypeScript SDK Architecture

## Overview

The OpenLIT TypeScript SDK provides an OpenTelemetry-native observability solution for GenAI and LLM applications. It enables developers to instrument their TypeScript/JavaScript applications with automatic tracing and metrics collection, forwarding telemetry data to OpenLIT or any OTLP-compatible backend.

**Key Characteristics:**

| Attribute | Value |
|-----------|-------|
| Package Name | `openlit` |
| Installation | `npm install openlit` |
| Entry Point | `sdk/typescript/src/index.ts` |
| Primary Dependency | OpenTelemetry SDK |
| Transport Protocol | OTLP (OpenTelemetry Protocol) |

资料来源：[sdk/typescript/package.json](https://github.com/openlit/openlit/blob/main/sdk/typescript/package.json)

## Core Architecture

The SDK follows a modular architecture with clear separation of concerns:

```mermaid
graph TD
    A[Application Code] --> B[openlit.init]
    B --> C[Config Module]
    C --> D[Instrumentation Module]
    D --> E[Guard Module]
    E --> F[OTLP Exporter]
    F --> G[OpenLIT Backend / OTEL Collector]
    
    C --> C1[OTLP Endpoint]
    C --> C2[Custom Attributes]
    C --> C3[Service Name]
    
    D --> D1[LLM Instrumentation]
    D --> D2[Vector DB Instrumentation]
    D --> D3[Framework Hooks]
```

### Entry Point Module

The main entry point (`index.ts`) exposes a simple initialization API:

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

资料来源：[sdk/typescript/src/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/index.ts)

### Configuration Module

The config module (`config.ts`) handles SDK configuration including:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `otlpEndpoint` | `string` | Environment variable `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP-compatible endpoint URL |
| `serviceName` | `string` | Application-defined | Name of the instrumented service |
| `resourceAttributes` | `Record<string, string>` | `{}` | Custom resource attributes |

资料来源：[sdk/typescript/src/config.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/config.ts)

## Instrumentation Subsystem

The instrumentation module (`instrumentation/index.ts`) provides automatic observability for AI workloads:

### Supported Integrations

| Category | Instrumented Components |
|----------|-------------------------|
| LLM Providers | OpenAI, Anthropic, Azure OpenAI, Google AI, AWS Bedrock, Cohere, Ollama |
| Vector Databases | ChromaDB, Pinecone, Weaviate, Qdrant, Milvus, PGVector |
| Frameworks | LangChain, LlamaIndex, LangFlow, AutoGen |

资料来源：[sdk/typescript/src/instrumentation/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/instrumentation/index.ts)

### Tracing Capabilities

The SDK automatically captures:

- **LLM Request/Response traces** with prompt and completion data
- **Token usage metrics** (prompt tokens, completion tokens, total tokens)
- **Latency measurements** for API calls
- **Embeddings generation traces** with vector dimensions
- **Tool/function calling traces** with parameters and results

## Guard Module

The guard module (`guard/index.ts`) provides safety and compliance features:

```typescript
import openlit from 'openlit';

// Guard hooks are registered during initialization
openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

Guard capabilities include:

- Input/output validation for LLM interactions
- Content filtering hooks
- Rate limiting enforcement
- Custom rule application

资料来源：[sdk/typescript/src/guard/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/guard/index.ts)

## Initialization Flow

```mermaid
sequenceDiagram
    participant App as Application
    participant SDK as OpenLIT SDK
    participant Config as Config Module
    participant Inst as Instrumentation
    participant OTEL as OTEL SDK
    
    App->>SDK: openlit.init(options)
    SDK->>Config: Validate & merge config
    Config->>Config: Check env vars
    Config-->>SDK: Resolved config
    SDK->>OTEL: Initialize OTEL SDK
    SDK->>Inst: Register instrumentations
    Inst->>OTEL: Add span processors
    OTEL-->>SDK: Ready
    SDK-->>App: Initialization complete
```

## Environment Variable Support

The SDK supports configuration via environment variables as an alternative to programmatic configuration:

| Environment Variable | Description |
|----------------------|-------------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint URL |
| `OTEL_SERVICE_NAME` | Service name for traces |

资料来源：[src/client/src/components/(playground)/getting-started/tracing/index.tsx:42](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)

## Usage Patterns

### Basic Initialization

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

### OpenAI Integration Example

```typescript
import OpenAI from 'openai';
import openlit from 'openlit';

openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" });

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

const chatCompletion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'What is LLM Observability?' }],
  model: 'gpt-3.5-turbo',
});
```

资料来源：[src/client/src/components/(playground)/getting-started/tracing/index.tsx:28-39](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)

## Package Dependencies

Key dependencies in `package.json`:

```json
{
  "dependencies": {
    "@opentelemetry/sdk-node": "^0.50.0",
    "@opentelemetry/exporter-trace-otlp-http": "^0.50.0",
    "@opentelemetry/resources": "^1.22.0",
    "@opentelemetry/semantic-conventions": "^1.22.0"
  }
}
```

资料来源：[sdk/typescript/package.json](https://github.com/openlit/openlit/blob/main/sdk/typescript/package.json)

## Design Principles

1. **Zero-Configuration Defaults**: The SDK works out-of-the-box with sensible defaults
2. **OpenTelemetry Native**: Built on OTEL SDK for vendor-agnostic telemetry export
3. **Automatic Instrumentation**: No code changes required for supported libraries
4. **Environment Variable Fallback**: Configuration can be entirely environment-based
5. **Minimal Footprint**: Instrumentation adds minimal latency overhead

## Summary

The OpenLIT TypeScript SDK architecture provides a developer-friendly interface for adding observability to GenAI applications. By abstracting OpenTelemetry complexity and providing automatic instrumentation for popular LLM providers and vector databases, it enables comprehensive telemetry collection with minimal configuration. The SDK exports all data via OTLP, ensuring compatibility with OpenLIT's backend as well as any other OTEL-compatible observability platform.

---

<a id='go-sdk'></a>

## Go SDK Architecture

### 相关页面

相关主题：[Python SDK Architecture](#python-sdk), [TypeScript SDK Architecture](#typescript-sdk)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)
- [sdk/go/go.mod](https://github.com/openlit/openlit/blob/main/sdk/go/go.mod)
</details>

# Go SDK Architecture

## Overview

The OpenLIT Go SDK is a lightweight instrumentation library that enables observability for GenAI applications built with Go. It provides automatic tracing and metrics collection for LLM calls, supporting OpenAI and Anthropic providers out of the box. The SDK follows OpenTelemetry-native principles, allowing seamless integration with the OpenLIT observability platform.

## Core Components

The Go SDK is organized into several key packages:

| Component | Purpose |
|-----------|---------|
| `openlit` | Core initialization, configuration, and shutdown |
| `openlit.Config` | Central configuration struct for SDK settings |
| `openlit.EvaluateRule()` | Standalone rule engine evaluation function |
| `instrumentation/openai` | OpenAI client instrumentation |
| `instrumentation/anthropic` | Anthropic client instrumentation |

## Initialization Flow

The SDK must be initialized before instrumenting any LLM clients. The initialization process configures the OTLP endpoint and establishes the connection to the OpenLIT backend.

```go
err := openlit.Init(openlit.Config{
    OtlpEndpoint:    "http://127.0.0.1:4318",
    Environment:     "production",
    ApplicationName: "my-go-app",
})
if err != nil {
    log.Fatalf("Failed to initialize OpenLIT: %v", err)
}
defer openlit.Shutdown(context.Background())
```

资料来源：[sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)

## Configuration Options

The `openlit.Config` struct provides the following configuration parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `OtlpEndpoint` | `string` | OTLP collector endpoint (default: `http://127.0.0.1:4318`) |
| `Environment` | `string` | Deployment environment name |
| `ApplicationName` | `string` | Application identifier for grouping traces |
| `PricingInfo` | `map[string]ModelPricing` | Custom pricing configuration per model |
| `OtlpHeaders` | `map[string]string` | Custom headers for OTLP exports |

### Custom Pricing Configuration

The SDK supports custom pricing information for models that require non-default cost calculations:

```go
config := openlit.Config{
    PricingInfo: map[string]openlit.ModelPricing{
        "gpt-4-custom": {
            InputCostPerToken:  0.00003,
            OutputCostPerToken: 0.00006,
        },
    },
}
```

资料来源：[sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)

### Custom Headers for OTLP Exports

Authentication and custom headers can be added to OTLP exports:

```go
config := openlit.Config{
    OtlpHeaders: map[string]string{
        "Authorization": "Bearer token",
        "X-Custom-Header": "value",
    },
}
```

资料来源：[sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)

## Instrumentation Architecture

The SDK uses a decorator/wrapper pattern for instrumenting LLM clients. This approach allows automatic tracing without modifying the original client interface.

```mermaid
graph TD
    A[User Application] --> B[Instrumented Client]
    B --> C[Original SDK Client]
    B --> D[OpenLIT Tracer]
    D --> E[OTLP Exporter]
    E --> F[OpenLIT Backend]
    C --> G[LLM Provider API]
    G --> C
```

### OpenAI Instrumentation

The OpenAI instrumentation wraps the `sashabaranov/go-openai` client:

```go
import (
    "github.com/openlit/openlit/sdk/go/instrumentation/openai"
    openai_sdk "github.com/sashabaranov/go-openai"
)

// Create and instrument OpenAI client
client := openai_sdk.NewClient("your-api-key")
instrumentedClient := openai.Instrument(client)

// Use as normal - automatically traced!
resp, err := instrumentedClient.CreateChatCompletion(ctx, openai_sdk.ChatCompletionRequest{
    Model: openai_sdk.GPT4,
    Messages: []openai_sdk.ChatCompletionMessage{
        {
            Role:    openai_sdk.ChatMessageRoleUser,
            Content: "Hello!",
        },
    },
})
```

资料来源：[sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)

### Anthropic Instrumentation

The Anthropic instrumentation follows the same pattern:

```go
import (
    "github.com/openlit/openlit/sdk/go/instrumentation/anthropic"
)

// Create and instrument Anthropic client
client := anthropic.NewClient("your-api-key")
instrumentedClient := anthropic.Instrument(client)
```

## Rule Engine Integration

The SDK provides a standalone rule evaluation function that does not require initialization:

```go
// EvaluateRule does NOT require openlit.Init()
rules, err := openlit.EvaluateRule(ctx, &openlit.EvaluateRuleRequest{
    TraceAttributes: attributes,
})
```

This function evaluates trace attributes against the OpenLIT Rule Engine to retrieve matching rules and associated entities including contexts, prompts, and evaluation configurations.

资料来源：[sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)

## Integration with OpenLIT Dashboard

The complete observability workflow involves:

1. **Start OpenLIT Stack**: Deploy using Docker Compose
   ```bash
   docker compose up -d
   ```

2. **Configure SDK**: Initialize the Go SDK with the OTLP endpoint
   ```go
   openlit.Init(openlit.Config{
       OtlpEndpoint: "http://localhost:4318",
   })
   ```

3. **View Traces**: Access the dashboard at `http://localhost:3000`

资料来源：[sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)

## Example Projects

The SDK includes complete working examples in the `examples/` directory:

| Example | Path |
|---------|------|
| OpenAI Chat Completion | `examples/openai/chat/` |
| OpenAI Streaming | `examples/openai/streaming/` |
| Anthropic Messages | `examples/anthropic/messages/` |
| Anthropic Streaming | `examples/anthropic/streaming/` |

## Module Dependencies

The Go SDK depends on core OpenTelemetry packages for trace export and propagation:

- OpenTelemetry OTLP exporter
- OpenTelemetry trace propagation
- Context propagation utilities

Source: [sdk/go/go.mod](https://github.com/openlit/openlit/blob/main/sdk/go/go.mod)

---

<a id='integrations'></a>

## LLM and Framework Integrations

### Related Pages

Related topics: [Python SDK Architecture](#python-sdk), [TypeScript SDK Architecture](#typescript-sdk)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)
- [sdk/python/src/openlit/instrumentation/llamaindex/utils.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/llamaindex/utils.py)
- [sdk/python/src/openlit/instrumentation/llamaindex/llamaindex.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/llamaindex/llamaindex.py)
- [sdk/python/src/openlit/__helpers.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/__helpers.py)
- [sdk/python/src/openlit/guard/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/__init__.py)
</details>

# LLM and Framework Integrations

OpenLIT provides comprehensive instrumentation for a wide range of LLMs and AI frameworks, enabling automatic OpenTelemetry-native observability for GenAI applications. This page documents the architecture, supported integrations, and implementation patterns.

## Overview

OpenLIT's instrumentation layer wraps SDK calls from various LLM providers and AI frameworks to automatically capture traces and metrics without requiring manual instrumentation code.

### Supported Integrations

| Category | Integration | Python SDK | TypeScript SDK | Go SDK |
|----------|-------------|------------|----------------|--------|
| **LLM Providers** | OpenAI | ✅ | ✅ | ✅ |
| **LLM Providers** | Anthropic | ✅ | ✅ | ✅ |
| **LLM Providers** | Azure OpenAI | ✅ | ✅ | ✅ |
| **LLM Providers** | Vertex AI | ✅ | ✅ | ✅ |
| **LLM Providers** | Mistral AI | ✅ | ✅ | ✅ |
| **LLM Providers** | Cohere | ✅ | ✅ | ✅ |
| **LLM Providers** | HuggingFace | ✅ | ✅ | ✅ |
| **AI Frameworks** | LangChain | ✅ | ✅ | - |
| **AI Frameworks** | LlamaIndex | ✅ | - | - |
| **AI Frameworks** | CrewAI | ✅ | - | - |
| **AI Frameworks** | LangGraph | ✅ | - | - |
| **AI Frameworks** | Claude Agent SDK | ✅ | - | - |
| **Vector Stores** | Pinecone | ✅ | - | - |
| **Vector Stores** | Chroma | ✅ | - | - |
| **Vector Stores** | Qdrant | ✅ | - | - |
| **Vector Stores** | Weaviate | ✅ | - | - |

Source: [sdk/python/README.md](https://github.com/openlit/openlit/blob/main/sdk/python/README.md)

## Architecture

### Instrumentation Pattern

All instrumentations follow a consistent pattern based on OpenTelemetry's `BaseInstrumentor` class:

```mermaid
graph TD
    A[Application Code] --> B[Instrumented SDK]
    B --> C[Wrapper Function]
    C --> D[OpenTelemetry Tracer]
    C --> E[Metrics Recorder]
    D --> F[OTLP Exporter]
    E --> F
    F --> G[OpenLIT Backend]
```

### Core Components

| Component | Purpose | Location |
|-----------|---------|----------|
| `BaseInstrumentor` | Base class for all instrumentors | `opentelemetry.instrumentation.instrumentor` |
| `wrap_function_wrapper` | Wraps SDK functions dynamically | `wrapt` library |
| `OpenlitConfig` | Singleton configuration management | `sdk/python/src/openlit/_config.py` |
| Semantic Conventions | Standardized attribute naming | `openlit.semcov` module |

Source: [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:17-21](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)
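As a concrete illustration of this wrapping pattern, here is a minimal stdlib-only sketch. The decorator name and the recorded span dict are illustrative only; the real instrumentors use `wrapt.wrap_function_wrapper` and end genuine OpenTelemetry spans.

```python
import functools
import time

def make_trace_wrapper(span_name):
    """Build a decorator that mimics the wrapt-style wrapper used by the
    instrumentors: time the wrapped call, then record span-like data."""
    def wrapper(wrapped):
        @functools.wraps(wrapped)
        def traced(*args, **kwargs):
            start = time.monotonic()
            try:
                return wrapped(*args, **kwargs)
            finally:
                # A real instrumentor would end an OpenTelemetry span here.
                traced.last_span = {
                    "name": span_name,
                    "duration_s": time.monotonic() - start,
                }
        return traced
    return wrapper

# "Instrument" a stand-in for an SDK call.
@make_trace_wrapper("gen_ai.chat.completions")
def fake_chat_completion(prompt):
    return {"content": prompt.upper()}

result = fake_chat_completion("hello")
```

Because the wrapper replaces the target function at import time, application code calls the SDK exactly as before and still emits telemetry.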

## Python SDK Instrumentation

### Instrumentor Base Class

All Python SDK instrumentors extend `BaseInstrumentor` and implement two required methods:

```python
class ClaudeAgentSDKInstrumentor(BaseInstrumentor):
    """OTel GenAI semantic convention compliant instrumentor for Claude Agent SDK."""

    def instrumentation_dependencies(self) -> Collection[str]:
        return _instruments  # e.g., ("claude-agent-sdk >= 0.1.0",)

    def _instrument(self, **kwargs):
        # Initialize tracer, config, and wrap functions
```

Source: [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:26-35](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)

### Initialization Parameters

When calling `openlit.init()`, the following parameters are passed to all instrumentors:

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `environment` | `str` | Deployment environment name | `"default"` |
| `application_name` | `str` | Application identifier | `"default"` |
| `pricing_info` | `Dict[str, ModelPricing]` | Custom model pricing | `{}` |
| `capture_message_content` | `bool` | Enable/disable content tracing | `True` |
| `disable_metrics` | `bool` | Disable metrics collection | `None` |
| `otlp_endpoint` | `str` | OTLP exporter endpoint | Configured endpoint |

Source: [sdk/python/src/openlit/_config.py:20-35](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/_config.py)

### OpenlitConfig Singleton

The `OpenlitConfig` class manages centralized configuration:

```python
class OpenlitConfig:
    """Singleton configuration class for OpenLIT."""
    
    _instance = None
    
    # Class-level attributes
    environment = "default"
    application_name = "default"
    pricing_info = {}
    metrics_dict = {}
    otlp_endpoint = None
    otlp_headers = None
    disable_batch = False
    capture_message_content = True
```

Source: [sdk/python/src/openlit/_config.py:18-42](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/_config.py)
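The `_instance` attribute above hints at the access pattern. A minimal sketch of how such a singleton typically behaves (illustrative class name; not the real `OpenlitConfig` implementation):

```python
class SingletonConfig:
    """Minimal singleton sketch mirroring the class-level attributes above."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.environment = "default"
            cls._instance.application_name = "default"
        return cls._instance

a = SingletonConfig()
b = SingletonConfig()
a.environment = "production"  # visible through every handle
```

The point of the pattern is that every instrumentor reads the same configuration object, so a value set once in `openlit.init()` is observed everywhere.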

## LlamaIndex Integration

### Operation Type Mapping

The LlamaIndex instrumentation uses a semantic convention-based operation mapping system:

```mermaid
graph LR
    A[Document Operations] --> B[RETRIEVE]
    A --> C[FRAMEWORK]
    D[Index Operations] --> C
    E[Query Operations] --> B
    F[Retriever Operations] --> B
```

### Supported Operations

| Operation | Semantic Convention | Category |
|-----------|---------------------|----------|
| `document_load` | `RETRIEVE` | Document Loading |
| `document_transform` | `FRAMEWORK` | Document Processing |
| `document_split` | `FRAMEWORK` | Document Processing |
| `index_construct` | `FRAMEWORK` | Index Management |
| `index_insert` | `FRAMEWORK` | Index Management |
| `query_engine_query` | `RETRIEVE` | Query Engine |
| `retriever_retrieve` | `RETRIEVE` | Retrieval |

Source: [sdk/python/src/openlit/instrumentation/llamaindex/utils.py:1-30](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/llamaindex/utils.py)
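The table above can be read as a simple lookup table. The sketch below transcribes it; the dict and helper are illustrative, not the SDK's internal structures:

```python
# Operation -> semantic-convention category, transcribed from the table above.
OPERATION_MAP = {
    "document_load": "RETRIEVE",
    "document_transform": "FRAMEWORK",
    "document_split": "FRAMEWORK",
    "index_construct": "FRAMEWORK",
    "index_insert": "FRAMEWORK",
    "query_engine_query": "RETRIEVE",
    "retriever_retrieve": "RETRIEVE",
}

def operation_type(operation, default="FRAMEWORK"):
    """Resolve an instrumented LlamaIndex operation to its span category."""
    return OPERATION_MAP.get(operation, default)
```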

## Helper Functions

### Building Tool Definitions

The `__helpers.py` module provides utilities for extracting tool definitions from chat requests:

```python
def build_tool_definitions(tools):
    """
    Extract tool/function definitions from a chat request's ``tools`` parameter.
    
    Supports both OpenAI-style schema and flat schema formats.
    """
```

Supported formats:

| Format | Structure |
|--------|-----------|
| OpenAI-style | `{"type": "function", "function": {...}}` |
| Flat (dict) | `{"name": ..., "description": ..., "parameters": ...}` |
| Flat (object) | Object with `name`, `description`, `input_schema` attributes |

Source: [sdk/python/src/openlit/__helpers.py:1-40](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/__helpers.py)
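A sketch of the normalization such a helper has to perform for the three formats above. The function name and the returned dict shape are assumptions for illustration, not the actual `build_tool_definitions` internals:

```python
def normalize_tool(tool):
    """Reduce one tool definition to a common {name, description, parameters}
    shape, covering the three formats in the table above."""
    if isinstance(tool, dict):
        if tool.get("type") == "function" and "function" in tool:
            fn = tool["function"]  # OpenAI-style schema
            return {"name": fn.get("name"),
                    "description": fn.get("description"),
                    "parameters": fn.get("parameters")}
        # Flat dict schema
        return {"name": tool.get("name"),
                "description": tool.get("description"),
                "parameters": tool.get("parameters")}
    # Flat object with name/description/input_schema attributes
    return {"name": getattr(tool, "name", None),
            "description": getattr(tool, "description", None),
            "parameters": getattr(tool, "input_schema", None)}

openai_style = {"type": "function",
                "function": {"name": "get_weather", "parameters": {}}}
flat = {"name": "search", "description": "d", "parameters": {"q": "string"}}
```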

### System Instructions Builder

Extracts and formats system instructions from various input formats:

```python
def build_system_instructions(instructions, **kwargs):
    """Builds system instructions from various input formats."""
```

## Guardrails Integration

OpenLIT includes a production-grade guardrails system:

### Available Guards

| Guard Class | Purpose |
|-------------|---------|
| `PII` | Detect and redact Personally Identifiable Information |
| `PromptInjection` | Detect prompt injection attacks |
| `SensitiveTopic` | Filter sensitive topics |
| `TopicRestriction` | Restrict to allowed topics |
| `Moderation` | Content moderation |
| `Schema` | Output schema validation |
| `Custom` | Custom guard implementation |

Source: [sdk/python/src/openlit/guard/__init__.py:1-30](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/__init__.py)
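To give a flavor of what a `Custom` guard might do, here is a self-contained sketch. The class name and the `detect`/`redact` method names are assumptions, not the OpenLIT guard interface:

```python
import re

class EmailRedactor:
    """Illustrative stand-in for a custom PII-style guard that
    detects and redacts email addresses."""
    PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def detect(self, text):
        """Return True if the text contains an email address."""
        return bool(self.PATTERN.search(text))

    def redact(self, text):
        """Replace every email address with a placeholder."""
        return self.PATTERN.sub("<redacted>", text)

guard = EmailRedactor()
```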

### Guard Architecture

```mermaid
graph TD
    A[User Input] --> B[Pipeline]
    B --> C[Guard 1: PII]
    C --> D[Guard 2: PromptInjection]
    D --> E[Guard N: Custom]
    E --> F[GuardResult]
    C -.->|Denied| G[GuardDeniedError]
    D -.->|Timeout| H[GuardTimeoutError]
```

### Usage Example

```python
import openlit

# Initialize with guards
openlit.init(guards=[openlit.PII(action="redact")])

# Or with direct imports
from openlit import PII, PromptInjection, Moderation

guards = [PII(), PromptInjection(), Moderation()]
openlit.init(guards=guards)
```

## TypeScript SDK Instrumentation

### Wrapper Pattern

The TypeScript SDK uses a similar wrapping pattern:

```typescript
// Wrapped in wrapper.ts for each integration
export function wrapOpenAI() {
  // Wrap OpenAI SDK methods
}
```

Source: [sdk/typescript/src/instrumentation/openai/wrapper.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/instrumentation/openai/wrapper.ts)

### Initialization

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

## Configuration Reference

### Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint URL | `http://127.0.0.1:4318` |
| `OTEL_EXPORTER_OTLP_HEADERS` | Authentication headers | `Authorization=Bearer token` |

### SDK Configuration Options

```python
import openlit

openlit.init(
    otlp_endpoint="http://127.0.0.1:4318",
    otlp_headers={"Authorization": "Bearer token"},
    environment="production",
    application_name="my-llm-app",
    pricing_info={
        "gpt-4": {"input_cost_per_token": 0.00003, "output_cost_per_token": 0.00006}
    },
    capture_message_content=True
)
```

## Best Practices

### 1. Instrument Before Usage

Always initialize OpenLIT before importing instrumented SDKs:

```python
# Correct order
import openlit
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

from openai import OpenAI  # Now automatically instrumented
```

### 2. Custom Pricing

Define custom pricing for accurate cost tracking:

```python
openlit.init(
    pricing_info={
        "custom-model": {
            "input_cost_per_token": 0.00001,
            "output_cost_per_token": 0.00002
        }
    }
)
```
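To see how `pricing_info` translates into a cost figure, a back-of-envelope calculation using the same shape as above (the helper itself is illustrative, not an SDK function):

```python
def estimate_cost(pricing_info, model, input_tokens, output_tokens):
    """Compute request cost from per-token prices; None for unknown models."""
    pricing = pricing_info.get(model)
    if pricing is None:
        return None  # unknown model: no cost attribution
    return (input_tokens * pricing["input_cost_per_token"]
            + output_tokens * pricing["output_cost_per_token"])

pricing_info = {"custom-model": {"input_cost_per_token": 0.00001,
                                 "output_cost_per_token": 0.00002}}
# 1000 input + 500 output tokens -> 0.01 + 0.01 = 0.02
cost = estimate_cost(pricing_info, "custom-model", 1000, 500)
```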

### 3. Selective Content Capture

Disable content capture for sensitive data:

```python
openlit.init(
    capture_message_content=False  # Won't trace message content
)
```

## See Also

- [OpenLIT Python SDK Documentation](https://github.com/openlit/openlit/tree/main/sdk/python)
- [OpenLIT TypeScript SDK Documentation](https://github.com/openlit/openlit/tree/main/sdk/typescript)
- [OpenLIT Go SDK Documentation](https://github.com/openlit/openlit/tree/main/sdk/go)
- [OpenTelemetry Semantic Conventions](https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/)

---

<a id='controller'></a>

## OpenLIT Controller

### 相关页面

相关主题：[GPU Collector](#gpu-collector)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [openlit-controller/cmd/controller/main.go](https://github.com/openlit/openlit/blob/main/openlit-controller/cmd/controller/main.go)
- [openlit-controller/internal/engine/engine.go](https://github.com/openlit/openlit/blob/main/openlit-controller/internal/engine/engine.go)
- [openlit-controller/internal/engine/lifecycle.go](https://github.com/openlit/openlit/blob/main/openlit-controller/internal/engine/lifecycle.go)
- [openlit-controller/internal/engine/python_sdk_runtime.go](https://github.com/openlit/openlit/blob/main/openlit-controller/internal/engine/python_sdk_runtime.go)
- [openlit-controller/internal/server/handlers.go](https://github.com/openlit/openlit/blob/main/openlit-controller/internal/server/handlers.go)
- [openlit-controller/internal/scanner/scanner.go](https://github.com/openlit/openlit/blob/main/openlit-controller/internal/scanner/scanner.go)
</details>

# OpenLIT Controller

The OpenLIT Controller is a standalone, lightweight binary agent designed to automatically instrument Python-based LLM applications with OpenLIT's observability SDK. It operates as a background service that runs alongside your application, providing seamless OpenTelemetry-native tracing and metrics collection without requiring code modifications.

## Overview

The Controller serves as an autonomous agent that:

- **Discovers** Python applications running in various environments (bare metal, containers, Kubernetes)
- **Injects** the OpenLIT Python SDK into target applications at runtime
- **Manages** the lifecycle of instrumentation (enable, disable, status monitoring)
- **Reports** service metadata back to the OpenLIT platform

Source: [src/client/src/lib/platform/controller/features/agent.ts:1-60](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)

## Architecture

```mermaid
graph TD
    A[OpenLIT Platform] -->|Manage & Monitor| B[OpenLIT Controller]
    B -->|Discover Services| C[Scanner Module]
    B -->|Instrument Apps| D[Engine Module]
    D -->|Python SDK Injection| E[Python Runtime]
    E -->|Traces & Metrics| F[OpenTelemetry Collector]
    
    G[Kubernetes Pod] -->|Contains| H[Python Application]
    H -->|Auto-instrumented by| D
    
    I[Linux Host] -->|Systemd Service| B
```

### Core Components

| Component | Location | Responsibility |
|-----------|----------|----------------|
| **cmd/controller** | `cmd/controller/main.go` | Entry point, configuration, signal handling |
| **Server** | `internal/server/handlers.go` | HTTP API for platform communication |
| **Engine** | `internal/engine/engine.go` | Orchestrates instrumentation operations |
| **Lifecycle** | `internal/engine/lifecycle.go` | Manages enable/disable transitions |
| **Python SDK Runtime** | `internal/engine/python_sdk_runtime.go` | Runtime injection of Python SDK |
| **Scanner** | `internal/scanner/scanner.go` | Discovers Python applications |

Source: [src/client/src/lib/platform/controller/features/agent.ts:1-25](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)

## Supported Environments

The Controller supports multiple deployment scenarios:

| Environment | Installation Method | Status |
|-------------|---------------------|--------|
| **Linux (systemd)** | Direct binary download + systemd service | ✅ Primary |
| **Docker** | Privileged container with PID host mode | ✅ Supported |
| **Kubernetes** | DaemonSet or sidecar pattern | ✅ Supported |

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:1-50](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/agents/no-controller.tsx)

## Installation

### Linux (systemd)

Download the latest binary and configure as a systemd service:

```bash
curl -fsSL https://github.com/openlit/openlit/releases/latest/download/openlit-controller-linux-amd64 \
  -o /usr/local/bin/openlit-controller
chmod +x /usr/local/bin/openlit-controller

# Create systemd service
cat > /etc/systemd/system/openlit-controller.service << 'EOF'
[Unit]
Description=OpenLIT Controller
After=network.target

[Service]
Environment="OPENLIT_URL=http://<openlit-host>:3000"
Environment="OTEL_EXPORTER_OTLP_ENDPOINT=http://<openlit-host>:4318"
Environment="OPENLIT_API_KEY=<your-api-key>"
ExecStart=/usr/local/bin/openlit-controller
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now openlit-controller
```

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:10-35](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/agents/no-controller.tsx)

### Docker

```bash
docker run -d --privileged --pid=host \
  -e OPENLIT_URL=http://openlit:3000 \
  -e OTEL_EXPORTER_OTLP_ENDPOINT=http://openlit:4318 \
  openlit-controller
```

## Configuration

The Controller is configured via environment variables:

| Environment Variable | Description | Required |
|---------------------|-------------|----------|
| `OPENLIT_URL` | URL of the OpenLIT platform | Yes |
| `OPENLIT_API_KEY` | API key for authentication | No |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for telemetry | Yes |

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:15-25](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/agents/no-controller.tsx)

## Agent Operations

The Controller exposes three primary operations:

### Enable Instrumentation

Activates OpenLIT SDK injection for target Python applications.

```json
{
  "operation": "enable",
  "serviceId": "string"
}
```

### Disable Instrumentation

Deactivates SDK injection and removes runtime hooks.

```json
{
  "operation": "disable",
  "serviceId": "string"
}
```

### Status Check

Retrieves current instrumentation state for a service.

```json
{
  "operation": "status",
  "serviceId": "string"
}
```

Source: [src/client/src/lib/platform/controller/features/agent.ts:25-45](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)
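The three payloads share one shape. A small sketch that builds and validates them on the client side (a convenience helper for illustration, not part of the controller's API):

```python
VALID_OPERATIONS = {"enable", "disable", "status"}

def build_agent_payload(operation, service_id):
    """Construct one of the three operation payloads shown above,
    rejecting anything outside enable/disable/status."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f'Unknown operation "{operation}"; '
                         'expected "enable", "disable", or "status".')
    return {"operation": operation, "serviceId": service_id}

payload = build_agent_payload("enable", "svc-123")
```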

## Service State Model

```mermaid
stateDiagram-v2
    [*] --> disabled: Initial State
    disabled --> enabled: enable operation
    enabled --> disabled: disable operation
    enabled --> manual: explicit override
    manual --> enabled: resume auto
    disabled --> manual: partial config
    manual --> disabled: full removal
```

### State Definitions

| State | Description |
|-------|-------------|
| `enabled` | SDK actively injecting traces |
| `disabled` | No instrumentation active |
| `manual` | User-controlled state (not auto-managed) |
| `automatable` | Service eligible for auto-instrumentation |

Source: [src/client/src/lib/platform/controller/features/agent.ts:15-30](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)
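The state diagram above can be transcribed into a transition table for quick reasoning (a sketch, not controller code):

```python
# (current state, event) -> next state, transcribed from the diagram above.
TRANSITIONS = {
    ("disabled", "enable operation"): "enabled",
    ("enabled", "disable operation"): "disabled",
    ("enabled", "explicit override"): "manual",
    ("manual", "resume auto"): "enabled",
    ("disabled", "partial config"): "manual",
    ("manual", "full removal"): "disabled",
}

def next_state(state, event):
    """Apply one transition; events not in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```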

## Python SDK Runtime Integration

The Controller's Python SDK Runtime module handles the actual SDK injection:

1. **Process Discovery**: Identifies Python processes running user applications
2. **Runtime Injection**: Injects OpenLIT SDK using Python's import hooks
3. **Configuration Propagation**: Sets OTLP endpoint and API keys via environment
4. **Health Monitoring**: Ensures instrumentation remains active

The runtime is specifically optimized for **Python-only** services:

```typescript
supported: service.language_runtime === "python"
```

Source: [src/client/src/lib/platform/controller/features/agent.ts:20](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)

## Kubernetes Integration

When running in Kubernetes, the Controller respects workload metadata:

| Attribute | Description |
|-----------|-------------|
| `k8s.workload.kind` | Workload type (Deployment, StatefulSet, etc.) |
| `service.service_name` | Name of the service |
| `service.namespace` | Kubernetes namespace |

### Naked Pod Handling

The Controller automatically detects and handles "naked pods" (pods without a workload controller):

```typescript
const isNakedPod = mode === "kubernetes" && (!workloadKind || workloadKind === "Pod");
```

Source: [src/client/src/lib/platform/controller/features/agent.ts:8-12](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)

## Validation

Operations are validated before execution:

```typescript
validatePayload(operation: string, _payload: Record<string, unknown>) {
    if (
        operation !== "enable" &&
        operation !== "disable" &&
        operation !== "status"
    ) {
        return `Unknown operation "${operation}" for feature "${FEATURE}". Expected "enable", "disable", or "status".`;
    }
    return null;
}
```

Source: [src/client/src/lib/platform/controller/features/agent.ts:28-40](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)

## Summary

The OpenLIT Controller is a critical component for zero-code instrumentation of Python LLM applications. It provides:

- **Automated Discovery**: Scans and identifies Python services automatically
- **Runtime Injection**: Injects observability SDK without application restarts
- **Multi-Platform Support**: Works on Linux, Docker, and Kubernetes
- **Platform Integration**: Connects to OpenLIT platform for centralized management
- **Lifecycle Management**: Full control over enable/disable operations

---

<a id='gpu-collector'></a>

## GPU Collector

### Related Pages

Related topics: [OpenLIT Controller](#controller), [System Architecture](#architecture)

<details>
<summary>Relevant Source Files</summary>

The following source files were used to generate this page:

- [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)
- [opentelemetry-gpu-collector/cmd/collector/main.go](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/cmd/collector/main.go)
- [opentelemetry-gpu-collector/internal/gpu/nvidia/nvidia.go](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/internal/gpu/nvidia/nvidia.go)
- [opentelemetry-gpu-collector/internal/gpu/amd/amd.go](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/internal/gpu/amd/amd.go)
- [opentelemetry-gpu-collector/internal/gpu/intel/intel.go](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/internal/gpu/intel/intel.go)
- [opentelemetry-gpu-collector/internal/ebpf/tracer.go](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/internal/ebpf/tracer.go)
- [opentelemetry-gpu-collector/internal/export/metrics.go](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/internal/export/metrics.go)
</details>

# GPU Collector

The **OpenTelemetry GPU Collector** (also referred to as `opentelemetry-gpu-collector`) is a specialized telemetry agent built and maintained by OpenLIT. It provides real-time GPU hardware telemetry collection for NVIDIA, AMD, and Intel GPUs, emitting metrics in compliance with the OpenTelemetry semantic conventions under the `hw.gpu.*` namespace.

## Overview

The GPU Collector serves as a standalone service that monitors GPU hardware metrics and exports them via the OTLP protocol to any OpenTelemetry-compatible backend, including the OpenLIT observability platform.

**Key Responsibilities:**

- Collect GPU hardware telemetry from NVIDIA GPUs via NVML (NVIDIA Management Library)
- Collect GPU hardware telemetry from AMD and Intel GPUs via `sysfs/hwmon` interfaces
- Perform eBPF-based CUDA kernel tracing for detailed operation insights
- Emit metrics following OpenTelemetry semantic conventions (`hw.gpu.*`)
- Export metrics over OTLP for integration with observability platforms

**License:** Apache-2.0

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

---

## Architecture

The GPU Collector follows a modular architecture with distinct internal components for GPU detection, metric collection, and telemetry export.

```mermaid
graph TD
    subgraph GPU Collector
        A[main.go] --> B[GPU Detection Layer]
        B --> C[NVIDIA Provider]
        B --> D[AMD Provider]
        B --> E[Intel Provider]
        C --> F[NVML Interface]
        D --> G[sysfs/hwmon]
        E --> G
        C --> H[Metrics Processor]
        D --> H
        E --> H
        F --> H
        G --> H
        H --> I[eBPF Tracer]
        H --> J[OTLP Exporter]
        I --> J
    end
    
    J --> K[OpenTelemetry Backend]
    K --> L[OpenLIT Dashboard]
```

### Core Components

| Component | Path | Purpose |
|-----------|------|---------|
| Entry Point | `cmd/collector/main.go` | Application initialization and configuration |
| NVIDIA Provider | `internal/gpu/nvidia/nvidia.go` | NVML-based telemetry collection for NVIDIA GPUs |
| AMD Provider | `internal/gpu/amd/amd.go` | sysfs/hwmon-based telemetry for AMD GPUs |
| Intel Provider | `internal/gpu/intel/intel.go` | sysfs/hwmon-based telemetry for Intel GPUs |
| eBPF Tracer | `internal/ebpf/tracer.go` | CUDA kernel tracing via eBPF |
| Metrics Exporter | `internal/export/metrics.go` | OTLP metric export logic |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

---

## Supported Hardware and Vendors

The GPU Collector supports GPU telemetry collection from three major hardware vendors.

### Vendor Support Matrix

| Vendor | Collection Method | Status | Features |
|--------|------------------|--------|----------|
| **NVIDIA** | NVML (NVIDIA Management Library) | Done | Power, energy, clock, utilization, errors |
| **AMD** | sysfs/hwmon | Done | Power, energy, clock, utilization |
| **Intel** | sysfs/hwmon | Done | Power, clock, utilization* |

*Intel support depends on driver (i915/Xe) and kernel version.

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

### Hardware Telemetry Features

| Feature | Status |
|---------|--------|
| NVIDIA GPU hardware telemetry (NVML) | Done |
| AMD GPU hardware telemetry (sysfs/hwmon) | Done |
| Intel GPU hardware telemetry (sysfs/hwmon) | Done |
| eBPF CUDA kernel tracing | Done |
| OTel semantic convention compliance (`hw.gpu.*`) | Done |
| Prometheus `/metrics` endpoint | Planned |
| ROCm HIP tracing (AMD eBPF) | Planned |
| Per-process GPU utilization (DRM fdinfo) | Planned |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

---

## Metrics Reference

All GPU metrics follow the OpenTelemetry semantic conventions with the `hw.gpu.*` prefix.

### Metric Definitions

| Metric Name | Type | Unit | Description | NVIDIA | AMD | Intel |
|-------------|------|------|-------------|--------|-----|-------|
| `hw.gpu.power.draw` | Gauge | W | Current power draw | Yes | Yes | Yes |
| `hw.gpu.power.limit` | Gauge | W | Power limit/cap | Yes | Yes | Yes |
| `hw.gpu.energy.consumed` | Counter | J | Cumulative energy consumed | Yes | Yes | Yes |
| `hw.gpu.clock.graphics` | Gauge | MHz | Graphics/SM clock frequency | Yes | Yes | —* |
| `hw.gpu.clock.memory` | Gauge | MHz | Memory clock frequency | Yes | Yes | — |
| `hw.errors` | Counter | {error} | ECC and PCIe errors via `error.type` + `hw.type=gpu` | Yes | — | — |

*Intel support depends on driver (i915/Xe) and kernel version.

### Utilization Metrics

| Metric | Extra Attribute | Values |
|--------|-----------------|--------|
| `hw.gpu.utilization` | `hw.gpu.task` | `general`, `encoder`, `decoder` |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)
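Since `hw.gpu.energy.consumed` is a cumulative counter in joules while `hw.gpu.power.draw` is an instantaneous gauge in watts, the two are related by integration over time. A sketch of that relationship using the trapezoidal rule (purely illustrative; real collectors read a hardware energy counter when the driver exposes one):

```python
def integrate_energy_joules(samples):
    """Approximate cumulative energy (J) from (timestamp_s, power_w)
    samples via the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total

# 10 s at a constant 300 W -> 3000 J
samples = [(0.0, 300.0), (5.0, 300.0), (10.0, 300.0)]
energy = integrate_energy_joules(samples)
```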

---

## Attributes

All GPU metrics include the following attributes for device identification and categorization.

### Common Attributes

| Attribute | Description | Example |
|-----------|-------------|---------|
| `hw.id` | Unique device identifier (required by spec) | `GPU-a1b2c3d4-...` |
| `hw.name` | Product name | `NVIDIA A100-SXM4-80GB` |
| `hw.vendor` | Vendor name | `nvidia`, `amd`, `intel` |
| `gpu.index` | Device index | `0`, `1` |
| `gpu.pci_address` | PCI bus address | `0000:01:00.0` |

### Error Attributes

| Attribute | Description |
|-----------|-------------|
| `error.type` | Type of hardware error |
| `hw.type` | Set to `gpu` for GPU-specific errors |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

---

## Deployment Options

The GPU Collector can be deployed using multiple methods based on infrastructure requirements.

### Docker

```bash
docker run -d \
    --name otel-gpu-collector \
    --restart always \
    --gpus all \
    -e OTEL_SERVICE_NAME=my-gpu-app \
    -e OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production \
    -e OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317" \
    ghcr.io/openlit/otel-gpu-collector:latest
```

### Docker Compose

```yaml
services:
  otel-gpu-collector:
    image: ghcr.io/openlit/otel-gpu-collector:latest
    environment:
      OTEL_SERVICE_NAME: my-app
      OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=production"
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - otel-collector
    restart: always
```

### Pre-built Binary

```sh
# Linux amd64
curl -L https://github.com/openlit/openlit/releases/latest/download/opentelemetry-gpu-collector-<version>-linux-amd64 \
    -o opentelemetry-gpu-collector
chmod +x opentelemetry-gpu-collector

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 ./opentelemetry-gpu-collector
```

### Build from Source

```sh
git clone https://github.com/openlit/openlit.git
cd openlit/opentelemetry-gpu-collector
make build
./opentelemetry-gpu-collector
```

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

---

## Configuration

The GPU Collector uses standard OpenTelemetry environment variables for configuration.

### Configuration Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | *(required)* | OTLP exporter endpoint |
| `OTEL_SERVICE_NAME` | — | Service name for telemetry |
| `OTEL_RESOURCE_ATTRIBUTES` | — | Additional resource attributes |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

---

## Data Flow

```mermaid
graph LR
    A[GPU Hardware] -->|NVML/sysfs| B[GPU Provider]
    B -->|Raw Metrics| C[Metrics Processor]
    D[eBPF Kernel Tracer] -->|Kernel Events| C
    C -->|Structured Metrics| E[OTLP Exporter]
    E -->|OTLP Protocol| F[OpenTelemetry Backend]
    F --> G[OpenLIT Dashboard]
```

### Collection Pipeline

1. **GPU Detection**: The collector detects available GPUs on the host system
2. **Vendor-specific Collection**: Each GPU type uses its native interface:
   - NVIDIA: NVML API calls
   - AMD/Intel: Reading from `/sys/class/hwmon/`
3. **Metric Processing**: Raw values are transformed into OpenTelemetry metric format
4. **eBPF Enrichment**: CUDA kernel tracing data enriches the telemetry
5. **OTLP Export**: Metrics are exported to the configured endpoint
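For the AMD/Intel path, hwmon exposes power readings as plain integers in microwatts (e.g. in files like `/sys/class/hwmon/hwmon*/power1_average`). A minimal sketch of the unit conversion a provider performs before handing values to the metrics processor (the function name is illustrative):

```python
def parse_hwmon_power_watts(raw):
    """Convert a raw hwmon power reading (microwatts, as a decimal
    string) to watts for the hw.gpu.power.draw gauge."""
    return int(raw.strip()) / 1_000_000

# e.g. the contents of power1_average for a card drawing ~42 W
reading = parse_hwmon_power_watts("42000000\n")
```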

---

## Integration with OpenLIT

The GPU Collector integrates seamlessly with the OpenLIT observability platform for GPU monitoring.

```mermaid
graph TD
    subgraph Collection Layer
        A[GPU Collector] -->|OTLP| gRPC[OTLP gRPC]
        A -->|OTLP| HTTP[OTLP HTTP]
    end
    
    subgraph OpenLIT Stack
        B[OpenLIT Backend] --> C[PostgreSQL]
        B --> D[ClickHouse]
        B --> E[Redis]
    end
    
    gRPC --> B
    HTTP --> B
    B --> F[OpenLIT Dashboard:3000]
```

### Prerequisites

1. Deploy the OpenLIT stack using Docker Compose:
   ```bash
   docker compose up -d
   ```

2. Configure the GPU Collector endpoint:
   ```bash
   OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 ./opentelemetry-gpu-collector
   ```

3. Access the OpenLIT Dashboard at `http://localhost:3000`

Source: [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)

---

## Project Structure

```
opentelemetry-gpu-collector/
├── cmd/
│   └── collector/
│       └── main.go              # Application entry point
├── internal/
│   ├── gpu/
│   │   ├── nvidia/
│   │   │   └── nvidia.go        # NVIDIA GPU provider (NVML)
│   │   ├── amd/
│   │   │   └── amd.go           # AMD GPU provider (sysfs)
│   │   └── intel/
│   │       └── intel.go         # Intel GPU provider (sysfs)
│   ├── ebpf/
│   │   └── tracer.go            # eBPF CUDA kernel tracer
│   └── export/
│       └── metrics.go           # OTLP metrics exporter
├── Dockerfile
├── Makefile
└── README.md
```

---

## See Also

- [OpenLIT Documentation](https://docs.openlit.io)
- [OpenLIT GitHub Repository](https://github.com/openlit/openlit)
- [OpenTelemetry Semantic Conventions - Hardware Metrics](https://opentelemetry.io/docs/specs/semconv/hardware-metrics/)

---

## Doramagic 踩坑日志

项目：openlit/openlit

摘要：发现 15 个潜在踩坑项，其中 0 个为 high/blocking；最高优先级：安装坑 - 来源证据：Integration: Governance and compliance signals for LLM observability。

## 1. Installation pitfall · Source evidence: Integration: Governance and compliance signals for LLM observability

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: "Integration: Governance and compliance signals for LLM observability".
- User impact: may raise the cost of first trials and production onboarding for new users.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_16e8a1979e4646f18ae6d36da1fd46fe | https://github.com/openlit/openlit/issues/1106 | unverified usage conditions surfaced by a github_issue source.

## 2. Installation pitfall · Source evidence: Proposal: gen_ai.agent.threat_detected span event helper for OTel-shaped detection observability

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: "Proposal: gen_ai.agent.threat_detected span event helper for OTel-shaped detection observability".
- User impact: may raise the cost of first trials and production onboarding for new users.
- Suggested check: the source issue is still open; the Pack Agent must re-verify whether it still affects the current version.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_9788255c9fb34a7eae64ba6413a52030 | https://github.com/openlit/openlit/issues/1186 | the source discussion mentions Python-related conditions; re-verify before installing or trialing.

## 3. Installation pitfall · Source evidence: [Bug]: Docker Image doesn't run on windows 64bit

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: "[Bug]: Docker Image doesn't run on windows 64bit".
- User impact: may raise the cost of first trials and production onboarding for new users.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_e25a08120daf4deb81b9193aeab1f929 | https://github.com/openlit/openlit/issues/786 | the source discussion mentions Docker-related conditions; re-verify before installing or trialing.

## 4. Installation pitfall · Source evidence: openlit-1.19.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified installation-related issue in this project: "openlit-1.19.0".
- User impact: may raise the cost of first trials and production onboarding for new users.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_0504e467960f4bbe919ff101c6a14d7b | https://github.com/openlit/openlit/releases/tag/openlit-1.19.0 | unverified usage conditions surfaced by a github_release source.

## 5. Configuration pitfall · Source evidence: controller-0.2.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue in this project: "controller-0.2.0".
- User impact: may affect upgrades, migrations, or version selection.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_addec19eec37420da207487d5a685eaa | https://github.com/openlit/openlit/releases/tag/controller-0.2.0 | unverified usage conditions surfaced by a github_release source.

## 6. Configuration pitfall · Source evidence: openlit-1.20.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified configuration-related issue in this project: "openlit-1.20.0".
- User impact: may affect upgrades, migrations, or version selection.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_217968c917e9426f9f8fbb4b50bebdb5 | https://github.com/openlit/openlit/releases/tag/openlit-1.20.0 | unverified usage conditions surfaced by a github_release source.

## 7. Capability pitfall · Capability judgment relies on an assumption

- Severity: medium
- Evidence strength: source_linked
- Finding: "README/documentation is current enough for a first validation pass."
- User impact: if the assumption fails, users do not get the promised capabilities.
- Suggested check: convert the assumption into a downstream validation checklist.
- Guardrail: assumptions must become verification items; do not state them as fact before verification results exist.
- Evidence: capability.assumptions | github_repo:747319327 | https://github.com/openlit/openlit | README/documentation is current enough for a first validation pass.

## 8. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed is not recorded.
- User impact: new, abandoned, and active projects get mixed together, lowering trust in recommendations.
- Suggested check: add GitHub signals for recent commits, releases, and issue/PR responsiveness.
- Guardrail: while maintenance activity is unknown, recommendation strength must not be marked high-trust.
- Evidence: evidence.maintainer_signals | github_repo:747319327 | https://github.com/openlit/openlit | last_activity_observed missing

## 9. Security/permissions pitfall · Downstream validation flagged a risk item

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: downstream has already requested re-review; this page must not soften it.
- Suggested check: enter the security/permissions governance review queue.
- Guardrail: while downstream risks exist, keep the review/recommendation downgraded.
- Evidence: downstream_validation.risk_items | github_repo:747319327 | https://github.com/openlit/openlit | no_demo; severity=medium

## 10. Security/permissions pitfall · Scoring risk exists

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: the risk affects whether the project is suitable for ordinary users to install.
- Suggested check: write the risk into the boundary card and confirm whether manual review is needed.
- Guardrail: scoring risks must enter the boundary card, not remain an internal score only.
- Evidence: risks.scoring_risks | github_repo:747319327 | https://github.com/openlit/openlit | no_demo; severity=medium

## 11. Security/permissions pitfall · Source evidence: Bug: OpenAI API key in operator example test-application is not using OPENAI_API_KEY env var

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: "Bug: OpenAI API key in operator example test-application is not using OPENAI_API_KEY env var".
- User impact: may affect authorization, key configuration, or security boundaries.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_bfba0945570d4cbbaead1257e8f70dfe | https://github.com/openlit/openlit/issues/1135 | the source discussion mentions Python-related conditions; re-verify before installing or trialing.

## 12. Security/permissions pitfall · Source evidence: openlit-1.19.1

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: "openlit-1.19.1".
- User impact: may raise the cost of first trials and production onboarding for new users.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_b5088506959947828f2d740f9297d5b5 | https://github.com/openlit/openlit/releases/tag/openlit-1.19.1 | unverified usage conditions surfaced by a github_release source.

## 13. Security/permissions pitfall · Source evidence: py-1.41.2

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence points to an unverified security/permissions-related issue in this project: "py-1.41.2".
- User impact: may affect authorization, key configuration, or security boundaries.
- Suggested check: the source suggests a fix, workaround, or version change may already exist; the manual must state the applicable versions.
- Guardrail: do not amplify this into a definitive conclusion detached from the source link; note the applicable versions and review status.
- Evidence: community_evidence:github | cevd_ff3f4dfa2dc04616be73482b2145ac5c | https://github.com/openlit/openlit/releases/tag/py-1.41.2 | the source discussion mentions Docker-related conditions; re-verify before installing or trialing.

## 14. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: users cannot tell whether anyone will maintain the project when they hit problems.
- Suggested check: sample recent issues/PRs to judge whether they go unanswered long-term.
- Guardrail: while issue/PR responsiveness is unknown, the maintenance risk must be flagged.
- Evidence: evidence.maintainer_signals | github_repo:747319327 | https://github.com/openlit/openlit | issue_or_pr_quality=unknown

## 15. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: install commands and documentation may lag behind the code, raising the odds that users hit pitfalls.
- Suggested check: confirm that the latest release/tag matches the install commands in the README.
- Guardrail: while the release cadence is unknown or stale, install instructions must note possible drift.
- Evidence: evidence.maintainer_signals | github_repo:747319327 | https://github.com/openlit/openlit | release_recency=unknown

<!-- canonical_name: openlit/openlit; human_manual_source: deepwiki_human_wiki -->
