# https://github.com/openlit/openlit Project Documentation

Generated: 2026-05-16 21:10:55 UTC

## Contents

- [Project Overview](#project-overview)
- [System Architecture](#system-architecture)
- [Python SDK](#python-sdk)
- [TypeScript SDK](#typescript-sdk)
- [Go SDK](#go-sdk)
- [Observability Features](#observability)
- [Evaluations](#evaluations)
- [Guardrails and Rule Engine](#guardrails)
- [OpenLIT Controller](#controller)
- [GPU Collector](#gpu-collector)

<a id='project-overview'></a>

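## Project Overview

### Related Pages

Related topics: [System Architecture](#system-architecture), [Python SDK](#python-sdk)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page: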

- [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)
- [src/client/src/components/(playground)/getting-started/tracing/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)
- [src/client/src/components/(playground)/openground/sdk-usage-dialog.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/openground/sdk-usage-dialog.tsx)
- [src/client/src/app/(playground)/evaluations/types/new/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/evaluations/types/new/page.tsx)
- [src/client/src/components/(playground)/agents/version-drawer.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/agents/version-drawer.tsx)
- [src/client/src/app/(playground)/context/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/context/page.tsx)
- [src/client/src/components/(auth)/auth-form.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(auth)/auth-form.tsx)
- [src/client/src/app/(playground)/pricing/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/pricing/page.tsx)
</details>

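# Project Overview

## 1. Introduction

OpenLIT is an OpenTelemetry-native **observability tool for GenAI and LLM applications**. It simplifies adding observability to LLM applications, so developers can collect, monitor, and analyze the traces and metrics their AI applications produce.

Source: [src/client/src/app/(playground)/getting-started/page.tsx:130-132]()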

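### 1.1 Core Positioning

OpenLIT's main goal is to provide end-to-end observability for GenAI and LLM applications. Building on the OpenTelemetry standard, it can:

- Automatically collect traces from AI applications
- Capture key performance metrics
- Provide visual monitoring dashboards
- Support SDKs in multiple languages (Python, TypeScript, and Go)

Source: [src/client/src/app/(playground)/getting-started/page.tsx:131-133]()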

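### 1.2 Technical Architecture

OpenLIT uses a microservice-style architecture. Its core components are:

| Component | Description | Technology |
|------|---------|--------|
| Frontend | Visual monitoring and configuration UI | React/Next.js |
| Backend service | Handles data storage and API requests | Node.js/Go |
| OpenTelemetry receiver | Receives and processes OTLP data | OpenTelemetry SDK |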
| Database | Stores trace and metric data | ClickHouse |

Source: [src/client/src/app/(playground)/getting-started/page.tsx:135-140]()

## 2. Quick Deployment

### 2.1 Docker Compose Deployment

OpenLIT can be deployed quickly with Docker Compose, which suits local development and test environments.

```bash
# Clone the repository
git clone git@github.com:openlit/openlit.git

# Start the services
cd openlit
docker compose up -d
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx:145-150]()

### 2.2 Accessing the Services

Once deployed, OpenLIT is reachable at:

- **Web UI**: http://127.0.0.1:3000
- **OTLP endpoint**: http://127.0.0.1:4318

Default login credentials:

| Field | Default value |
|------|--------|
| Email | user@openlit.io |
| Password | openlituser |

Source: [src/client/src/app/(playground)/getting-started/page.tsx:127-130]()

## 3. SDK Integration

OpenLIT provides SDKs in several languages so that observability can be added quickly across different technology stacks.

### 3.1 Python SDK

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```

Install with:

```bash
pip install openlit
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx:165-175]()

### 3.2 TypeScript SDK

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

Install with:

```bash
npm install openlit
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:85-95]()

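### 3.3 Environment Variable Configuration

Besides configuring it in code, the OTLP endpoint can also be set through an environment variable:

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx:178-180]()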
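The interplay between the code-level setting and the environment variable can be sketched in a few lines of Python. This is a minimal illustration, not the SDK's actual logic: `resolve_otlp_endpoint` and `DEFAULT_OTLP_ENDPOINT` are hypothetical names, and both the "explicit value wins" order and the fallback to the local default are assumptions made for the example.

```python
import os

# Assumption: fall back to the local Docker Compose endpoint used in this guide.
DEFAULT_OTLP_ENDPOINT = "http://127.0.0.1:4318"


def resolve_otlp_endpoint(explicit=None, env=None):
    """Return the explicit endpoint if given, else the env var, else the default."""
    env = os.environ if env is None else env
    return explicit or env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_OTLP_ENDPOINT


if __name__ == "__main__":
    # Prints whichever endpoint the current process environment resolves to.
    print(resolve_otlp_endpoint())
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.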

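## 4. Core Feature Modules

### 4.1 Tracing

The tracing module gives a complete view of the request path through an AI application. With tracing, developers can:

- View the full request call chain
- Analyze latency and performance at each step
- Identify potential performance bottlenecks
- Trace prompt and response content

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:1-10]()

### 4.2 Evaluations

The evaluations module lets users create and manage custom evaluation types for measuring the quality of AI application output:

| Field | Description |
|------|------|
| Name | Identifier of the evaluation type |
| Description | Purpose and intended use of the evaluation |
| Evaluation prompt | Prompt template used by the LLM judge |

Source: [src/client/src/app/(playground)/evaluations/types/new/page.tsx:1-25]()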

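### 4.3 Context Management

Context management stores and manages shared context data for AI applications, including:

- Context description
- Status (ACTIVE/INACTIVE)
- Creator and creation time
- Version control

Source: [src/client/src/app/(playground)/context/page.tsx:1-25]()

### 4.4 Agent Management

The agents module tracks and displays behavioral information about AI agents:

| Attribute | Description |
|------|------|
| first_seen | Time first seen |
| last_seen | Time of last activity |
| request_count | Number of requests |
| primary_model | Primary model used |

Source: [src/client/src/components/(playground)/agents/version-drawer.tsx:1-15]()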

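## 5. Authentication and Authorization

OpenLIT supports several authentication methods:

| Method | Description |
|---------|------|
| Google OAuth | Sign in with a Google account |
| GitHub OAuth | Sign in with a GitHub account |
| Email and password | Local account with password authentication |

Source: [src/client/src/components/(auth)/auth-form.tsx:1-20]()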

## 6. Data Flow

```mermaid
graph TD
    A[AI Application] -->|SDK Instrumentation| B[OpenLIT SDK]
    B -->|OTLP Protocol| C[OTLP Endpoint :4318]
    C -->|Traces & Metrics| D[OpenLIT Backend]
    D -->|Storage| E[(Database)]
    D -->|Query| F[Frontend Dashboard :3000]
    G[User] -->|Authentication| H[Auth Provider]
    H -->|Session| F
```

Source: [src/client/src/app/(playground)/getting-started/page.tsx:127-145]()

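## 7. Pricing Model

OpenLIT supports flexible billing modes, including automatic billing. The flow is:

1. The user configures a usage plan
2. The system monitors usage automatically
3. Billing information is updated in real time
4. Multiple payment methods are supported

Source: [src/client/src/app/(playground)/pricing/page.tsx:1-20]()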

## 8. SDK Usage Examples

### 8.1 Python with OpenAI

```python
import os

import openlit
from openai import OpenAI

# Initialize OpenLIT before creating the client so calls are traced
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is LLM Observability?"}]
)
```

### 8.2 TypeScript with OpenAI

```typescript
import OpenAI from 'openai';
import openlit from 'openlit';

openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" });

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

const chatCompletion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'What is LLM Observability?' }],
  model: 'gpt-3.5-turbo',
});
```

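Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:95-115]()

## 9. Project Structure

```
openlit/
├── sdk/
│   ├── python/          # Python SDK
│   ├── typescript/      # TypeScript SDK
│   └── go/              # Go SDK
├── src/
│   └── client/          # Frontend application
│       └── src/
│           ├── app/             # Next.js application pages
│           ├── components/      # React components
│           └── lib/             # Utility library
├── docker-compose.yml    # Docker Compose configuration
└── README.md             # Project README
```

## 10. Summary

As an open-source LLM observability platform, OpenLIT offers AI developers the following advantages:

- **OpenTelemetry-native**: follows the industry standard, so it integrates with existing monitoring stacks
- **Multi-language support**: Python, TypeScript, and Go SDKs cover the common AI development scenarios
- **Quick deployment**: a single Docker Compose command brings up the stack
- **Ready to use**: ships with monitoring dashboards and a visual UI

For more information, see the official documentation at https://docs.openlit.io.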

---

<a id='system-architecture'></a>

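## System Architecture

### Related Pages

Related topics: [Project Overview](#project-overview), [Python SDK](#python-sdk), [OpenLIT Controller](#controller)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page: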

- [src/client/src/components/(playground)/getting-started/tracing/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)
- [src/client/src/app/(playground)/agents/no-controller.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/agents/no-controller.tsx)
- [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)
- [src/client/src/components/(playground)/agents/observability-block.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/agents/observability-block.tsx)
- [src/client/src/app/(playground)/agents/controller/[instance_id]/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/agents/controller/[instance_id]/page.tsx)
- [src/client/README.md](https://github.com/openlit/openlit/blob/main/src/client/README.md)
</details>

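# System Architecture

## Overview

OpenLIT is an OpenTelemetry-native **observability platform for GenAI and LLM applications**. It collects, transports, and processes telemetry (traces and metrics) from LLM applications to give developers end-to-end monitoring of their AI applications. Source: [src/client/README.md:1]()

## Core Design Principles

The OpenLIT architecture follows these principles:

| Principle | Description |
|------|------|
| OpenTelemetry-native | Compatible with OpenTelemetry standard protocols and SDKs |
| Non-intrusive integration | Automatic instrumentation is enabled by initializing the SDK |
| Multi-language support | SDKs for Python and TypeScript, plus a Go SDK |
| Flexible deployment | Supports Linux, Docker, and Kubernetes deployments |

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:42-60]()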

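## System Components

### Overall Architecture Diagram

```mermaid
graph TD
    subgraph "LLM Application Layer"
        A[Python SDK]
        B[TypeScript SDK]
    end

    subgraph "Data Collection Layer"
        C[OpenTelemetry Collector]
    end

    subgraph "OpenLIT Platform Layer"
        D[Frontend]
        E[Backend Service]
        F[Database]
    end

    A --> C
    B --> C
    C --> E
    E --> F
    E --> D
```

### SDK Layer

OpenLIT provides SDKs for instrumenting applications. The Python and TypeScript initialization calls look like this.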

#### Python SDK

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:70-75]()

#### TypeScript SDK

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:48-54]()

### Data Flow

```mermaid
graph LR
    A[LLM Application] -->|OTLP Protocol| B[OTEL Collector]
    B -->|Traces| C[OpenLIT Backend]
    B -->|Metrics| C
    C -->|Store| D[(Database)]
    D -->|Query| E[Web UI]
```

## Deployment Architecture

OpenLIT supports three main deployment modes for different infrastructure environments.

### Linux Deployment

For running the monitoring agent directly on a Linux host. The service lifecycle is managed by systemd:

```bash
# Write the unit file (requires root)
cat <<EOF | sudo tee /etc/systemd/system/openlit-controller.service
[Unit]
Description=OpenLIT Controller

[Service]
ExecStart=/usr/local/bin/openlit-controller
Environment="OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318"
Restart=always

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd, then enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now openlit-controller
```

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:10-22]()

### Docker Container Deployment

For containerized environments, where the controller runs in an isolated container:

```bash
# Adjust both values to match your OpenLIT deployment
OPENLIT_URL="http://127.0.0.1:3000"
OTLP_ENDPOINT="http://127.0.0.1:4318"

docker run -d --privileged --pid=host \
  -e OPENLIT_URL="${OPENLIT_URL}" \
  -e OTEL_EXPORTER_OTLP_ENDPOINT="${OTLP_ENDPOINT}" \
  -e OPENLIT_PROC_ROOT="/host/proc" \
  -v /proc:/host/proc:ro \
  -v /sys/kernel/debug:/sys/kernel/debug:ro \
  -v /sys/fs/bpf:/sys/fs/bpf:rw \
  -v /var/run/docker.sock:/var/run/docker.sock \
  ghcr.io/openlit/controller:latest
```

Source: [src/client/src/app/(playground)/agents/no-controller.tsx:24-36]()
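In the command above, the OTLP endpoint is the OpenLIT URL with its trailing port swapped for 4318. A small Python sketch of that derivation follows; `derive_otlp_endpoint` is a hypothetical helper written for this page, not part of OpenLIT.

```python
import re


def derive_otlp_endpoint(openlit_url, otlp_port=4318):
    """Swap a trailing ':<port>' in the URL for the OTLP port.

    URLs without an explicit port are returned unchanged.
    """
    return re.sub(r":\d+$", f":{otlp_port}", openlit_url)


if __name__ == "__main__":
    print(derive_otlp_endpoint("http://127.0.0.1:3000"))
```

Because the pattern is anchored at the end of the string, a URL without an explicit port is left as it is and the endpoint must then be set by hand.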

### Kubernetes Deployment

For large-scale container orchestration environments, managed through a Helm chart:

```bash
helm repo add openlit https://openlit.github.io/helm
helm repo update
helm upgrade --install openlit openlit/openlit \
  --set openlit-controller.enabled=true
```

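Source: [src/client/src/app/(playground)/agents/no-controller.tsx:38-45]()

### Deployment Mode Comparison

| Deployment mode | Typical use | Complexity | Resource usage |
|----------|----------|--------|----------|
| Linux (systemd) | Physical or virtual machines | Low | Medium |
| Docker | Containerized environments | Medium | Medium |
| Kubernetes | Microservices / cloud-native | High | High |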

## Observability Data Model

### Resource Attributes

OpenLIT collects several resource attributes to identify and classify services:

| Attribute | Description | Example |
|--------|------|------|
| `node_name` | Node name | `prod-server-01` |
| `version` | Controller version | `v1.2.0` |
| `mode` | Runtime environment mode | `kubernetes` / `docker` / `linux` |
| `last_heartbeat` | Time of last heartbeat | `2024-01-15T10:30:00Z` |

Source: [src/client/src/app/(playground)/agents/controller/[instance_id]/page.tsx:5-8]()

### Statistics

The controller instance page shows these core metrics:

| Metric | Description |
|--------|------|
| `services_discovered` | Number of services discovered |
| `services_instrumented` | Number of services already instrumented |

Source: [src/client/src/app/(playground)/agents/controller/[instance_id]/page.tsx:11-12]()

### Tool Schema

An AI agent's tool definition has the following structure:

| Field | Type | Description |
|------|------|------|
| `description` | string | What the tool does |
| `schema` | JSON | JSON Schema of the tool's parameters |

Source: [src/client/src/components/(playground)/agents/tools-card.tsx:8-18]()

## Frontend Architecture

### Page Routing Structure

```mermaid
graph TD
    A[Playground] --> B[Getting Started]
    A --> C[Agents]
    A --> D[Context]
    
    C --> C1[Controller Instance]
    C --> C2[Tools]
    
    D --> D1[New Context]
    D --> D2[Context Detail]
```

### Technology Stack

The OpenLIT frontend is built on Next.js and uses these core UI components:

| Component | Purpose |
|------|------|
| `Tabs` | Switching between languages/modes |
| `Card` | Content blocks |
| `Accordion` | Collapsible panels |
| `CodeBlock` | Syntax-highlighted code |
| `Dialog` | Modal dialogs |

Source: [src/client/src/app/(playground)/getting-started/page.tsx:80-95]()

### Theme Support

The platform supports light and dark themes through Tailwind CSS dark-mode classes:

| Theme | Class prefix |
|------|----------|
| Light | default |
| Dark | `dark:` |

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:5-6]()

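## Onboarding Flow

### Quick Onboarding Steps

```mermaid
graph LR
    A[Install SDK] --> B[Initialize configuration]
    B --> C[Set OTLP endpoint]
    C --> D[Start application]
    D --> E[View monitoring data]
```

### Environment Variable Configuration

Besides initializing in code, the endpoint can also be configured through an environment variable:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:6-8]()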

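## Context Management

OpenLIT provides a context management feature that lets users create and manage monitoring contexts:

| Feature | Description |
|------|------|
| Create context | Supports a description and Markdown content |
| Edit context | Offers Write and Preview modes |
| Linked rules | Business rules can be bound to a context |

Source: [src/client/src/app/(playground)/context/[id]/page.tsx:45-55]()

## Version Management

Controller instances support version history tracking. Each version records:

| Attribute | Description |
|------|------|
| `first_seen` | Time first seen |
| `last_seen` | Time last active |
| `request_count` | Number of requests |
| `primary_model` | Primary LLM model used |

Source: [src/client/src/components/(playground)/agents/version-drawer.tsx:10-20]()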

## Security Configuration

### API Key Authentication

For production deployments, authentication can be done with an API key:

```bash
helm upgrade --install openlit openlit/openlit \
  --set openlit-controller.apiKey="${apiKey}"
```

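Source: [src/client/src/app/(playground)/agents/no-controller.tsx:42-45]()

## Default Credentials

The platform ships with default login credentials for first-time use:

| Field | Value |
|------|-----|
| URL | `http://127.0.0.1:3000` |
| Email | `user@openlit.io` |
| Password | `openlituser` |

Source: [src/client/src/app/(playground)/getting-started/page.tsx:45-48]()

## Summary

OpenLIT uses a microservice-style architecture and relies on OpenTelemetry standard protocols to integrate with a wide range of LLM frameworks. The platform covers the full path from SDK instrumentation to data visualization, and its flexible deployment options and supported runtime environments serve scenarios from development and testing through production.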

---

<a id='python-sdk'></a>

## Python SDK

### Related Pages

Related topics: [TypeScript SDK](#typescript-sdk), [Go SDK](#go-sdk), [Observability Features](#observability)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)
- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/claude_agent_sdk.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/claude_agent_sdk.py)
- [sdk/python/src/openlit/guard/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/__init__.py)
- [sdk/python/src/openlit/__helpers.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/__helpers.py)
- [sdk/python/src/openlit/instrumentation/agent_framework/utils.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/agent_framework/utils.py)
- [sdk/python/src/openlit/instrumentation/google_adk/utils.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/google_adk/utils.py)
</details>

# Python SDK

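The OpenLIT Python SDK is an OpenTelemetry-native observability tool for GenAI and LLM applications. By auto-instrumenting popular AI frameworks and model providers, it captures traces, metrics, and logs automatically, giving developers detailed visibility into their AI applications.

## Core Architecture

The OpenLIT Python SDK is organized around three main modules:

```mermaid
graph TD
    A[User application code] --> B[OpenLIT Python SDK]
    B --> C[Instrumentation]
    B --> D[Guardrails]
    B --> E[Helpers]

    C --> C1[Claude Agent SDK]
    C --> C2[Google ADK]
    C --> C3[Agent Framework]
    C --> C4[LangGraph]
    C --> C5[CrewAI]

    D --> D1[PII detection]
    D --> D2[Prompt injection detection]
    D --> D3[Sensitive topic detection]
    D --> D4[Content moderation]
    D --> D5[Topic restriction]

    E --> E1[Tool definition building]
    E --> E2[System instruction building]
    E --> E3[Custom attribute application]
```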

## Quick Start

### Installation

```bash
pip install openlit
```

### Initialization

Add the following two lines to your application code:

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```

### Using with OpenAI

```python
from openai import OpenAI
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI(api_key="YOUR_OPENAI_KEY")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is LLM Observability?",
        }
    ],
    model="gpt-3.5-turbo",
)
```

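## Instrumentation

The instrumentation module is the core of the SDK. It automatically captures AI framework calls and produces trace data that follows the OTel GenAI semantic conventions.

### Supported Frameworks

| Framework | Version requirement | Capability |
|------|----------|------|
| Claude Agent SDK | >= 0.1.0 | `invoke_agent` and `execute_tool` spans |
| Google ADK | - | Tool execution tracing |
| Agent Framework | - | Agent and workflow tracing |
| LangGraph | - | Graph execution tracing |
| CrewAI | - | Agent and task tracing |

### Claude Agent SDK Instrumentation

The Claude Agent SDK instrumentation wraps the `query()` method and `ClaudeSDKClient`, and produces two span types: `invoke_agent` and `execute_tool`.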

```python
from openlit.instrumentation.claude_agent_sdk import ClaudeAgentSDKInstrumentor

# Enable instrumentation
ClaudeAgentSDKInstrumentor().instrument()
```

The instrumentation creates tool spans through the SDK's hook system (`PreToolUse` / `PostToolUse` / `PostToolUseFailure`) and falls back to a message-stream-based mechanism in cases where hooks cannot be injected.

Source: [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:1-35]()

### Agent Framework Instrumentation

The Agent Framework instrumentation provides standardized span naming and operation-type mapping:

| Endpoint | Operation type | Span name format |
|------|----------|---------------|
| `agent_init` | agent | `create_agent {name}` |
| `agent_run` | agent | `invoke_agent {name}` |
| `tool_execute` | tools | `execute_tool {name}` |
| `workflow_run` | workflow | `invoke_workflow {name}` |

```python
# Span name generation logic (excerpt; only two endpoints are shown)
def generate_span_name(endpoint, instance, args=None, kwargs=None):
    operation_type = get_operation_type(endpoint)
    
    if endpoint == "agent_init":
        name = getattr(instance, "name", None) or getattr(instance, "id", None) or "agent"
        return f"create_agent {name}"
    
    if endpoint == "tool_execute":
        name = getattr(instance, "name", None) or type(instance).__name__
        return f"execute_tool {name}"
```

Source: [sdk/python/src/openlit/instrumentation/agent_framework/utils.py:1-70]()
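The endpoint table can be turned into a small, self-contained lookup that covers all four endpoints. The sketch below is an illustration only: `ENDPOINTS` and `span_name` are hypothetical names, and the single fallback chain (`name`, then `id`, then the class name) is a simplification of the per-endpoint fallbacks in the excerpt.

```python
from types import SimpleNamespace

# Endpoint -> (operation type, span-name prefix), as listed in the table.
ENDPOINTS = {
    "agent_init": ("agent", "create_agent"),
    "agent_run": ("agent", "invoke_agent"),
    "tool_execute": ("tools", "execute_tool"),
    "workflow_run": ("workflow", "invoke_workflow"),
}


def span_name(endpoint, instance):
    """Return (operation_type, span_name) for an instrumented object."""
    operation_type, prefix = ENDPOINTS[endpoint]
    # Assumption: one shared fallback chain for every endpoint.
    name = (
        getattr(instance, "name", None)
        or getattr(instance, "id", None)
        or type(instance).__name__
    )
    return operation_type, f"{prefix} {name}"


if __name__ == "__main__":
    print(span_name("agent_run", SimpleNamespace(name="planner")))
```

Keeping the mapping in one dictionary means a new endpoint only needs a new entry, and an unknown endpoint fails loudly with a `KeyError`.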

### Google ADK Instrumentation

The Google ADK instrumentation adds OTel GenAI semantic-convention attributes to tool calls:

```python
span.set_attribute(SemanticConvention.GEN_AI_OPERATION, 
                   SemanticConvention.GEN_AI_OPERATION_TYPE_TOOLS)
span.set_attribute(SemanticConvention.GEN_AI_PROVIDER_NAME,
                   SemanticConvention.GEN_AI_SYSTEM_GOOGLE_ADK)
```

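Source: [sdk/python/src/openlit/instrumentation/google_adk/utils.py:1-50]()

## Guardrails

OpenLIT provides production-grade guardrails for LLM applications, used to filter and validate input and output content.

### Available Guard Types

| Guard type | Description |
|----------|----------|
| `PII` | Detects and handles personally identifiable information |
| `PromptInjection` | Detects prompt injection attacks |
| `SensitiveTopic` | Detects sensitive-topic content |
| `TopicRestriction` | Restricts the range of allowed topics |
| `Moderation` | Content moderation |
| `Schema` | Output structure validation |
| `Custom` | Custom guard rules |

### Usage

Guards can be configured directly in `openlit.init()`: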

```python
import openlit

openlit.init(
    otlp_endpoint="http://127.0.0.1:4318",
    guards=[openlit.PII(action="redact")]
)
```

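Or import the guards and use them directly:

```python
from openlit import PII, PromptInjection, Moderation

# Standalone use
user_input = "Contact me at jane@example.com"  # sample input
pii_guard = PII(action="redact")
result = pii_guard.check(user_input)
```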

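### Core Classes

| Class | Description |
|------|------|
| `Guard` | Guard base class |
| `GuardAction` | Guard action enum |
| `GuardConfigError` | Configuration error exception |
| `GuardDeniedError` | Exception raised when a guard denies the request |
| `GuardPhase` | Execution phase enum |
| `GuardResult` | Guard result data class |
| `GuardTimeoutError` | Timeout exception |
| `PipelineResult` | Pipeline execution result |

Source: [sdk/python/src/openlit/guard/__init__.py:1-55]()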

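## Helpers

The `__helpers.py` module provides general-purpose helper functions for handling common data structures in AI requests.

### Building Tool Definitions

`build_tool_definitions()` extracts tool/function definitions from the `tools` parameter of a chat request:

```python
def build_tool_definitions(tools):
    """
    Supports two input styles:
    1. OpenAI style: {"type": "function", "function": {...}}
    2. Flat style: {"name": ..., "description": ..., "parameters": ...}
    """
```

Return value format: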

```python
{
    "type": "function",
    "name": str,
    "description": str,
    "parameters": dict
}
```
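To make the two input styles concrete, here is a minimal sketch of how such a normalization could work. It is not the SDK's implementation: `normalize_tool_definitions` is a hypothetical stand-in, and skipping entries without a name as well as the empty defaults are assumptions made for the example.

```python
def normalize_tool_definitions(tools):
    """Flatten OpenAI-style and flat tool specs into one common shape."""
    normalized = []
    for tool in tools or []:
        # OpenAI style nests the details under "function"; the flat style does not.
        spec = tool.get("function", tool)
        if not spec.get("name"):
            continue  # assumption: entries without a usable name are skipped
        normalized.append({
            "type": "function",
            "name": spec["name"],
            "description": spec.get("description", ""),
            "parameters": spec.get("parameters", {}),
        })
    return normalized


if __name__ == "__main__":
    openai_style = {
        "type": "function",
        "function": {"name": "get_weather", "description": "Look up weather"},
    }
    flat_style = {"name": "search", "description": "Web search", "parameters": {}}
    print(normalize_tool_definitions([openai_style, flat_style]))
```

Both inputs come out in the flat return format shown above, so downstream code only has to handle one shape.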

### Building System Instructions

`build_system_instructions()` extracts system instructions from a variety of input formats:

```python
instructions = [
    {"type": "text", "content": str(content)},
    {"type": "resource", "uri": str(uri), "content": str(content)}
]
```

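### Exception Handling

`handle_exception()` handles exceptions raised during instrumentation in one place, so that they do not affect the main application flow.

Source: [sdk/python/src/openlit/__helpers.py:1-100]()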

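## Configuration Options

### Initialization Parameters

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `otlp_endpoint` | str | From environment variable | OTLP receiver endpoint |
| `application_name` | str | "default" | Application name |
| `environment` | str | "default" | Environment name |
| `pricing_info` | dict | {} | Pricing information mapping |
| `capture_message_content` | bool | False | Whether to capture message content |
| `disable_metrics` | bool | False | Whether to disable metrics |
| `guards` | list | [] | List of guard configurations |

### Environment Variables

| Variable | Description |
|--------|------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP receiver endpoint |
| `OTEL_SERVICE_NAME` | Service name |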

## Trace Data Model

### Span Types and Attributes

OpenLIT uses the OTel GenAI semantic conventions to define span attributes:

| Attribute key | Value | Applies to |
|--------|-----|-----------|
| `gen_ai.operation.name` | `agent` / `tools` / `workflow` | Agent invocation |
| `gen_ai.operation.type` | `create` / `invoke` | Operation type |
| `gen_ai.system` | `openai` / `anthropic`, etc. | AI system |
| `gen_ai.tool.name` | Tool name | Tool call |
| `gen_ai.tool.type` | `function`, etc. | Tool type |
| `gen_ai.tool.call.arguments` | Call arguments | Tool call |
| `gen_ai.tool.call.id` | Call ID | Tool call |
| `gen_ai.response.id` | Response ID | Model response |
| `gen_ai.usage.input_tokens` | Prompt (input) token count | Request |
| `gen_ai.usage.output_tokens` | Completion (output) token count | Response |

### SpanKind Mapping

| Operation type | SpanKind |
|----------|----------|
| `agent` | `CLIENT` |
| `tools` | `INTERNAL` |
| `workflow` | `INTERNAL` |

## Workflows

### Auto-Instrumentation Flow

```mermaid
sequenceDiagram
    participant App as Application code
    participant Inst as Instrumentor
    participant SDK as AI SDK
    participant OTel as OpenTelemetry

    App->>Inst: instrument()
    Inst->>SDK: Wrap target functions
    App->>SDK: Call AI method
    SDK->>Inst: Trigger wrapper function
    Inst->>OTel: Create span
    Inst->>Inst: Extract model/tool information
    Inst->>OTel: Set semantic attributes
    SDK-->>App: Return result
    Inst->>OTel: End span
```
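The wrap-call-record pattern in this sequence can be sketched with the standard library alone. This is an illustration under stated assumptions, not OpenLIT's code: `instrument`, `recorded_spans`, and `fake_chat` are hypothetical names, and a plain list stands in for the OpenTelemetry exporter. The attribute keys `gen_ai.request.model` and `error.type` follow the OTel semantic conventions.

```python
import functools
import time

recorded_spans = []  # stand-in for an OpenTelemetry exporter


def instrument(span_name):
    """Wrap a function so that each call records a span-like dict."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            span = {"name": span_name, "attributes": {}, "status": "ok"}
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                # Extract request details once the call has succeeded.
                span["attributes"]["gen_ai.request.model"] = kwargs.get("model", "unknown")
                return result
            except Exception as exc:
                span["status"] = "error"
                span["attributes"]["error.type"] = type(exc).__name__
                raise  # never swallow the application's exception
            finally:
                # The span is ended and recorded whether the call succeeded or failed.
                span["duration_s"] = time.perf_counter() - start
                recorded_spans.append(span)
        return wrapper
    return decorator


@instrument("chat")
def fake_chat(prompt, model="demo-model"):
    """Placeholder for a real AI SDK call."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    fake_chat("What is LLM Observability?", model="demo-model")
    print(recorded_spans[-1]["name"], recorded_spans[-1]["status"])
```

Recording the span in `finally` mirrors the last step of the diagram: the span ends after the result is returned or the error propagates.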

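### Guard Check Flow

```mermaid
graph LR
    A[User input] --> B{PII detection}
    B -->|Pass| C{Prompt injection detection}
    B -->|PII found| D[Handle/deny]
    C -->|Pass| E{Content moderation}
    C -->|Injection detected| F[Deny]
    E -->|Pass| G[Send to LLM]
    E -->|Violation| H[Deny]
```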
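A sequential pipeline like the one in this flow can be sketched as follows. The functions here are hypothetical stand-ins, not the `openlit` guard classes: the e-mail regex and the phrase list are deliberately naive assumptions, and the moderation stage is omitted for brevity.

```python
import re

# Assumption: e-mail addresses are the only PII handled in this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Assumption: a tiny phrase list stands in for real injection detection.
INJECTION_HINTS = ("ignore previous instructions", "disregard the system prompt")


def check_pii(text):
    """Redact e-mail addresses and let the request continue."""
    return "allow", EMAIL.sub("[REDACTED]", text)


def check_injection(text):
    """Deny the request if a known injection phrase appears."""
    lowered = text.lower()
    if any(hint in lowered for hint in INJECTION_HINTS):
        return "deny", text
    return "allow", text


def run_guards(text, guards=(check_pii, check_injection)):
    """Run guards in order and stop at the first denial."""
    for guard in guards:
        action, text = guard(text)
        if action == "deny":
            return {"allowed": False, "text": text, "denied_by": guard.__name__}
    return {"allowed": True, "text": text, "denied_by": None}


if __name__ == "__main__":
    print(run_guards("Mail me at bob@example.com"))
```

Each guard receives the text as transformed by the previous one, so redaction happens before later checks see the input.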

## Comparison with the TypeScript SDK

| Feature | Python SDK | TypeScript SDK |
|------|------------|----------------|
| Install command | `pip install openlit` | `npm install openlit` |
| Initialization syntax | `openlit.init(otlp_endpoint="...")` | `openlit.init({ otlpEndpoint: "..." })` |
| Instrumentation approach | Automatic function wrapping | Automatic function wrapping |
| Guardrails | Fully supported | Fully supported |

## Next Steps

- Visit the [official OpenLIT documentation](https://docs.openlit.io) for more details
- See the [GitHub repository](https://github.com/openlit/openlit) for the latest updates
- Join the [Slack community](https://join.slack.com/t/openlit/shared_invite/zt-2etnfttwg-TjP_7BZXfYg84oAukY8QRQ) to take part in the discussion

---

<a id='typescript-sdk'></a>

## TypeScript SDK

### Related Pages

Related topics: [Python SDK](#python-sdk), [Go SDK](#go-sdk)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [src/client/src/components/(playground)/getting-started/tracing/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)
- [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)
- [src/client/src/components/(playground)/openground/sdk-usage-dialog.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/openground/sdk-usage-dialog.tsx)
- [src/client/src/components/(playground)/agents/observability-block.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/agents/observability-block.tsx)
- [src/client/src/components/(playground)/agents/tools-card.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/agents/tools-card.tsx)
</details>

# TypeScript SDK

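## Overview

The OpenLIT TypeScript SDK is a client library for adding observability to GenAI and LLM applications. Built on the OpenTelemetry standard, it automatically captures LLM calls, traces request paths, collects metrics, and sends them to the OpenLIT backend for visual analysis.

The SDK integrates into existing TypeScript/Node.js applications with a single initialization call and supports automatic integration with mainstream LLM providers such as OpenAI.

Source: [src/client/src/app/(playground)/getting-started/page.tsx:1-50]()

## Core Features

### Auto-Instrumentation

The SDK provides out-of-the-box auto-instrumentation that intercepts and traces LLM API calls. The main capabilities are:

| Capability | Description |
|------|------|
| Request capture | Automatically captures requests sent to the LLM |
| Response recording | Records the full response returned by the LLM |
| Token statistics | Counts input and output tokens |
| Latency tracking | Measures request processing time |
| Error capture | Records errors that occur during a request |

### Environment Variable Configuration

Besides configuration in code, the SDK can also be configured through environment variables: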

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:80-85]()

## Installation and Quick Start

### Installing the SDK

Install the OpenLIT SDK with npm:

```bash
npm install openlit
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:25-30]()

### Initialization

Initialize the SDK at the application entry point:

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:35-42]()

### OpenAI Integration Example

The SDK instruments OpenAI API calls once it has been initialized:

```typescript
import OpenAI from 'openai';
import openlit from 'openlit';

openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" });

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

const chatCompletion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'What is LLM Observability?' }],
  model: 'gpt-3.5-turbo',
});
```

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:55-70]()

## Configuration Parameters

### init() Parameters

| Parameter | Type | Required | Default | Description |
|------|------|------|--------|------|
| otlpEndpoint | string | No | `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable | OTLP receiver endpoint address |

### Environment Variables

| Variable | Description |
|--------|------|
| OTEL_EXPORTER_OTLP_ENDPOINT | Service endpoint for the OTLP exporter |
| OPENAI_API_KEY | OpenAI API key (when using OpenAI) |

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:45-50]()

## SDK Enabled Status Indicator

In the OpenLIT frontend, agents that are connected through the SDK display a dedicated status badge:

```typescript
<span className="inline-flex items-center gap-1.5 px-3 py-1.5 text-xs font-medium rounded-md border border-emerald-200 dark:border-emerald-900 text-emerald-700 dark:text-emerald-400 bg-emerald-50/60 dark:bg-emerald-900/20">
  <span className="w-1.5 h-1.5 rounded-full bg-emerald-500" />
  {getMessage().AGENTS_SDK_ENABLED_VIA}
</span>
```

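Source: [src/client/src/components/(playground)/agents/observability-block.tsx:30-38]()

### Status Badges

| Status | Style | Meaning |
|------|------|------|
| SDK enabled | Green border, green dot | Traced through the OpenLIT SDK |
| Static analysis | Green border, green dot | The SDK is integrated in the source code |
| Awaiting confirmation | Disabled button | Waiting for the SDK connection to be confirmed |

## Observability Data Capture

### Tool Definition Capture

The SDK can automatically capture the tool definitions and schemas that an agent uses: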

```typescript
{tool.description && (
  <p className="text-xs text-stone-600 dark:text-stone-300 whitespace-pre-wrap">
    {tool.description}
  </p>
)}
{hasSchema(tool.schema) ? (
  <div className="rounded-md bg-stone-100 dark:bg-stone-900 p-3 text-xs overflow-x-auto">
    <JSONViewer value={tool.schema} />
  </div>
) : (
  <div className="rounded-md border border-dashed border-stone-200 dark:border-stone-800 p-3 text-xs text-stone-500 dark:text-stone-400">
    {getMessage().AGENTS_DEFINITION_SCHEMA_NOT_CAPTURED}
  </div>
)}
```

Source: [src/client/src/components/(playground)/agents/tools-card.tsx:20-35]()

## Architecture Flow

```mermaid
graph TD
    A[TypeScript application] --> B[OpenLIT SDK]
    B --> C[Auto-instrumentation layer]
    C --> D[OpenAI API]
    D --> E[Response data]
    C --> F[OTLP Exporter]
    F --> G[OpenLIT Collector]
    G --> H[Data storage]
    G --> I[Frontend visualization]

    J[Environment variable configuration] --> B
    K[otlpEndpoint parameter] --> B

    style A fill:#e1f5fe
    style D fill:#fff3e0
    style G fill:#e8f5e9
    style I fill:#f3e5f5
```

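## Multi-Language SDK Support

OpenLIT provides SDKs in several languages; the TypeScript SDK is one of them:

| SDK | Install command | Initialization |
|-----|----------|------------|
| Python | `pip install openlit` | `openlit.init(otlp_endpoint="...")` |
| TypeScript | `npm install openlit` | `openlit.init({ otlpEndpoint: "..." })` |

Source: [src/client/src/components/(playground)/getting-started/tracing/index.tsx:10-45]()

## Use Cases

### Use Case 1: LLM Application Monitoring

When an LLM application runs in production, the SDK provides real-time monitoring of:

- API call frequency and response time
- Token consumption
- Error rate

### Use Case 2: Agent Behavior Analysis

Combined with OpenLIT's agent visualization, the SDK helps analyze:

- Tool call patterns
- Decision chain traces
- Context usage efficiency

### Use Case 3: Performance Optimization

Using the collected telemetry, teams can:

- Identify performance bottlenecks
- Improve prompt design
- Reduce API costs

## Advanced Configuration

### Asynchronous Initialization

The SDK can be initialized inside an asynchronous function: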

```typescript
import openlit from 'openlit';

async function initializeApp() {
  await openlit.init({
    otlpEndpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT
  });
}
```

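### Conditional Initialization

Tracing can be enabled selectively, for example only in production, by skipping the `init()` call elsewhere:

```typescript
import openlit from 'openlit';

// Skip initialization outside production so no telemetry is exported
if (process.env.NODE_ENV === 'production') {
  openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" });
}
```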

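## Notes

1. **Endpoint precedence**: the `otlpEndpoint` parameter in code takes precedence over the environment variable
2. **Network requirements**: make sure the application can reach the configured OTLP endpoint
3. **Performance impact**: the SDK is designed to be lightweight and to add minimal overhead to the application
4. **Data sensitivity**: automatically captured data includes text content that may contain sensitive information, so handle it in line with your compliance requirements

## Related Resources

- Official documentation: https://docs.openlit.io
- SDK repository: https://github.com/openlit/openlit/tree/main/sdk/typescript
- OpenTelemetry: https://opentelemetry.io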

---

<a id='go-sdk'></a>

## Go SDK

### Related Pages

Related topics: [Python SDK](#python-sdk), [TypeScript SDK](#typescript-sdk)

<details>
<summary>Relevant source files</summary>

The following source files were used to generate this page:

- [sdk/go/openlit.go](https://github.com/openlit/openlit/blob/main/sdk/go/openlit.go)
- [sdk/go/config.go](https://github.com/openlit/openlit/blob/main/sdk/go/config.go)
- [sdk/go/instrumentation/openai/instrumentor.go](https://github.com/openlit/openlit/blob/main/sdk/go/instrumentation/openai/instrumentor.go)
- [sdk/go/instrumentation/anthropic/instrumentor.go](https://github.com/openlit/openlit/blob/main/sdk/go/instrumentation/anthropic/instrumentor.go)
- [sdk/go/go.mod](https://github.com/openlit/openlit/blob/main/sdk/go/go.mod)
- [sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)
</details>

# Go SDK

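The OpenLIT Go SDK is an OpenTelemetry-native observability tool for GenAI and LLM applications written in Go. It provides automated tracing and metrics collection, so developers can export their application's telemetry to the OpenLIT platform for visualization and analysis.

## Core Architecture

```mermaid
graph TD
    A[Go application] --> B[OpenLIT SDK]
    B --> C[OpenAI Instrumentation]
    B --> D[Anthropic Instrumentation]
    C --> E[OpenTelemetry Collector]
    D --> E
    E --> F[OpenLIT Dashboard]

    G[Rule Engine] --> H[HTTP API call]
    H --> F
```

The OpenLIT Go SDK has a modular architecture. The core module handles SDK initialization and configuration management, while the instrumentation modules intercept and trace calls to specific AI SDKs. The Rule Engine module runs independently: it does not require `Init()` and communicates with the OpenLIT platform over HTTP.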

## Quick Start

### Requirements

- Go 1.21 or later
- A running OpenLIT backend

### Installing the SDK

```bash
go get github.com/openlit/openlit/sdk/go
```

### Initializing OpenLIT

Call the initialization function when the application starts:

```go
package main

import (
    "context"
    "log"

    // The import path ends in "go", so alias the package explicitly.
    openlit "github.com/openlit/openlit/sdk/go"
)

func main() {
    err := openlit.Init(openlit.Config{
        OtlpEndpoint:    "http://127.0.0.1:4318",
        Environment:     "production",
        ApplicationName: "my-go-app",
    })
    if err != nil {
        log.Fatalf("failed to initialize OpenLIT: %v", err)
    }
    // Flush pending telemetry before the program exits.
    defer openlit.Shutdown(context.Background())
}
```

Source: [sdk/go/README.md:Quick Start]()

## Configuration Options

The `Config` struct is the SDK's central configuration unit and supports several custom options:

| Option | Type | Description | Default |
|--------|------|------|--------|
| `OtlpEndpoint` | string | OTLP export endpoint address | `http://localhost:4318` |
| `Environment` | string | Runtime environment name | `"default"` |
| `ApplicationName` | string | Application name | `"default"` |
| `PricingInfo` | `map[string]ModelPricing` | Custom model pricing information | empty |
| `OtlpHeaders` | `map[string]string` | Custom OTLP export headers | empty |

### Custom Model Pricing

```go
config := openlit.Config{
    PricingInfo: map[string]openlit.ModelPricing{
        "gpt-4-custom": {
            InputCostPerToken:  0.00003,
            OutputCostPerToken: 0.00006,
        },
    },
}
```

Source: [sdk/go/README.md:Custom Pricing]()
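Per-token prices like these translate into a request cost by multiplying each token count by its rate and adding the two products. The Python sketch below illustrates the arithmetic only: the `ModelPricing` dataclass and `request_cost` are hypothetical and mirror the Go field names, and how the SDK itself computes cost is not shown in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Per-token prices, mirroring the fields of the Go config above."""
    input_cost_per_token: float
    output_cost_per_token: float


def request_cost(pricing, input_tokens, output_tokens):
    """Cost = input tokens x input rate + output tokens x output rate."""
    return (
        input_tokens * pricing.input_cost_per_token
        + output_tokens * pricing.output_cost_per_token
    )


if __name__ == "__main__":
    custom = ModelPricing(input_cost_per_token=0.00003, output_cost_per_token=0.00006)
    # 1000 input tokens and 500 output tokens at the rates above.
    print(f"{request_cost(custom, 1000, 500):.4f}")
```

With the rates from the example config, 1000 input tokens and 500 output tokens each contribute 0.03, for a total of about 0.06.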

### Custom Headers

```go
config := openlit.Config{
    OtlpHeaders: map[string]string{
        "Authorization": "Bearer token",
        "X-Custom-Header": "value",
    },
}
```

资料来源：[sdk/go/README.md:Custom Headers]()

## OpenAI 插桩

OpenAI 插桩模块提供了对 `sashabaranov/go-openai` 客户端的自动追踪支持。

### 使用方式

```go
import (
    "github.com/openlit/openlit/sdk/go/instrumentation/openai"
    openai_sdk "github.com/sashabaranov/go-openai"
)

// 创建并插桩 OpenAI 客户端
client := openai_sdk.NewClient("your-api-key")
instrumentedClient := openai.Instrument(client)

// 使用方式与普通客户端完全相同，自动产生追踪数据
resp, err := instrumentedClient.CreateChatCompletion(ctx, openai_sdk.ChatCompletionRequest{
    Model: openai_sdk.GPT4,
    Messages: []openai_sdk.ChatCompletionMessage{
        {
            Role:    openai_sdk.ChatMessageRoleUser,
            Content: "Hello!",
        },
    },
})
```

资料来源：[sdk/go/README.md:Instrument OpenAI]()

### 工作原理

```mermaid
sequenceDiagram
    participant App as 应用代码
    participant Inst as InstrumentedClient
    participant OpenAI as OpenAI API
    participant OTel as OpenTelemetry
    
    App->>Inst: CreateChatCompletion()
    Inst->>OpenAI: 调用 OpenAI API
    OpenAI-->>Inst: 返回响应
    Inst->>OTel: 创建 Span 和 Metrics
    Inst-->>App: 返回响应结果
```

插桩客户端内部自动拦截所有 API 调用，创建相应的 OpenTelemetry Span，并记录输入/输出 token 数量、成本等指标数据。

## Anthropic 插桩

Anthropic 插桩模块支持对 Anthropic Claude API 的追踪。

### 使用方式

```go
import (
    "github.com/openlit/openlit/sdk/go/instrumentation/anthropic"
)

// 创建并插桩 Anthropic 客户端
client := anthropic.NewClient("your-api-key")
instrumentedClient := anthropic.Instrument(client)
```

资料来源：[sdk/go/README.md:Instrument Anthropic]()

## 规则引擎

OpenLIT Go SDK 提供独立的规则引擎评估功能，允许开发者对追踪属性进行规则匹配，并获取关联的实体信息。

### 核心函数

```go
result := openlit.EvaluateRule(ruleConfig)
```

### 特性

- **独立运行**：无需调用 `openlit.Init()`，仅需 HTTP 连接即可使用
- **规则匹配**：根据追踪属性评估匹配规则
- **实体获取**：返回关联的上下文、提示词和评估配置

资料来源：[sdk/go/README.md:Rule Engine]()

## 与 OpenLIT Dashboard 集成

### 1. 启动 OpenLIT 堆栈

```bash
docker compose up -d
```

### 2. 配置 SDK 发送数据

```go
openlit.Init(openlit.Config{
    OtlpEndpoint: "http://localhost:4318",
})
```

### 3. 查看追踪数据

访问 http://localhost:3000 查看可视化追踪和指标数据。

资料来源：[sdk/go/README.md:Integration with OpenLIT Dashboard]()

## 示例项目

SDK 仓库包含完整的可运行示例：

| 示例路径 | 说明 |
|----------|------|
| `examples/openai/chat/` | OpenAI 聊天补全示例 |
| `examples/openai/streaming/` | OpenAI 流式响应示例 |
| `examples/anthropic/messages/` | Anthropic 消息 API 示例 |
| `examples/anthropic/streaming/` | Anthropic 流式响应示例 |

资料来源：[sdk/go/README.md:Examples]()

## 模块结构

```
sdk/go/
├── openlit.go              # 核心初始化和配置
├── config.go               # 配置结构体定义
├── go.mod                  # 模块依赖声明
├── instrumentation/
│   ├── openai/
│   │   └── instrumentor.go  # OpenAI 插桩实现
│   └── anthropic/
│       └── instrumentor.go  # Anthropic 插桩实现
└── examples/               # 示例代码
```

## 环境变量

除代码配置外，SDK 也支持通过环境变量进行配置：

| 环境变量 | 说明 |
|----------|------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP 导出端点 |

## 关闭 SDK

When the application exits, shut the SDK down gracefully so that all buffered data is flushed:

```go
defer openlit.Shutdown(context.Background())
```

---

<a id='observability'></a>

## 可观测性功能

### 相关页面

相关主题：[Python SDK](#python-sdk), [GPU Collector](#gpu-collector), [评估功能](#evaluations)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [src/client/src/app/(playground)/getting-started/page.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/getting-started/page.tsx)
- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)
- [sdk/python/src/openlit/instrumentation/llamaindex/utils.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/llamaindex/utils.py)
- [sdk/python/src/openlit/instrumentation/langgraph/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/langgraph/__init__.py)
- [sdk/python/src/openlit/instrumentation/openai/async_openai.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/openai/async_openai.py)
- [sdk/typescript/src/instrumentation/llamaindex/index.ts](https://github.com/openlit/openlit/blob/main/sdk/typescript/src/instrumentation/llamaindex/index.ts)
- [src/client/src/components/(playground)/getting-started/tracing/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)
- [src/client/src/components/(playground)/agents/observability-block.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/agents/observability-block.tsx)
</details>

# 可观测性功能

## 概述

OpenLIT 是一个基于 OpenTelemetry 原生的 GenAI 和 LLM 应用可观测性工具，旨在简化 LLM 应用与 OpenTelemetry 追踪和指标系统的集成过程。OpenLIT 通过自动插桩技术，为开发者提供开箱即用的可观测性能力，无需对现有代码进行大规模改造。资料来源：[src/client/src/app/(playground)/getting-started/page.tsx:50]()

### 核心设计理念

OpenLIT 的可观测性功能遵循以下设计原则：

1. **OpenTelemetry 原生支持**：完全兼容 OpenTelemetry 标准协议
2. **零侵入式集成**：通过自动插桩（Auto-Instrumentation）实现透明监控
3. **多框架支持**：覆盖主流 LLM 框架和应用框架
4. **语义约定合规**：遵循 GenAI 语义约定规范

---

## 架构设计

### 系统架构图

```mermaid
graph TD
    A[用户应用] --> B[OpenLIT SDK]
    B --> C[自动插桩层]
    C --> D[OpenTelemetry Collector]
    D --> E[追踪数据]
    D --> F[指标数据]
    D --> G[事件数据]
    
    H[Python SDK] --> C
    I[TypeScript SDK] --> C
    
    J[OpenAI] --> C
    K[Anthropic] --> C
    L[LlamaIndex] --> C
    M[LangGraph] --> C
    N[Claude Agent SDK] --> C
    
    E --> H1[追踪后端]
    F --> M1[指标后端]
    G --> E1[事件后端]
```

### 组件层次结构

| 层级 | 组件 | 说明 |
|------|------|------|
| 应用层 | 用户代码 | 集成 OpenLIT SDK 的业务应用 |
| SDK 层 | Python/TypeScript SDK | 提供初始化和配置接口 |
| 插桩层 | 自动插桩模块 | 拦截并增强框架调用 |
| 传输层 | OTLP 导出器 | 将遥测数据传输至后端 |
| 后端层 | OpenTelemetry Collector | 收集和处理遥测数据 |

---

## 核心功能模块

### 1. 追踪功能（Tracing）

OpenLIT 的追踪功能通过包装框架的核心方法来创建分布式追踪 span，记录 LLM 调用的完整生命周期。

#### 1.1 支持的框架和操作

| Framework | Operation types | Semantic-convention attribute |
|------|----------|----------|
| OpenAI | `chat`, `embedding` | `gen_ai.operation.name` |
| Anthropic | `chat` | `gen_ai.operation.name` |
| LlamaIndex | `query_engine_query`, `document_load`, `document_split`, `response_synthesize` | `gen_ai.operation.name` |
| LangGraph | `execution`, `construction`, `checkpointing` | `gen_ai.operation.name` |
| Claude Agent SDK | `invoke_agent`, `execute_tool` | `gen_ai.operation.name` |

#### 1.2 LlamaIndex 操作映射

OpenLIT 为 LlamaIndex 定义了精细的操作类型映射，采用简化的语义约定以提高处理效率：

```python
OPERATION_MAP = {
    # 文档加载与处理
    "document_load": SemanticConvention.GEN_AI_OPERATION_TYPE_RETRIEVE,
    "document_transform": SemanticConvention.GEN_AI_OPERATION_TYPE_FRAMEWORK,
    "document_split": SemanticConvention.GEN_AI_OPERATION_TYPE_FRAMEWORK,
    
    # 索引构建与管理
    "index_construct": SemanticConvention.GEN_AI_OPERATION_TYPE_FRAMEWORK,
    "index_insert": SemanticConvention.GEN_AI_OPERATION_TYPE_FRAMEWORK,
    
    # 查询引擎操作
    "query_engine_query": SemanticConvention.GEN_AI_OPERATION_TYPE_RETRIEVE,
    "query_engine_query_async": SemanticConvention.GEN_AI_OPERATION_TYPE_RETRIEVE,
    
    # 检索器操作
    "retriever_retrieve": SemanticConvention.GEN_AI_OPERATION_TYPE_RETRIEVE,
}
```

资料来源：[sdk/python/src/openlit/instrumentation/llamaindex/utils.py:14-36]()

#### 1.3 TypeScript SDK 的 LlamaIndex 插桩

TypeScript SDK 采用原型方法打补丁的方式实现 LlamaIndex 插桩：

```typescript
// 文档操作（框架 / 检索 span）
this._patchProto(m, ['SimpleDirectoryReader'], 'loadData',
  LlamaIndexWrapper._patchFrameworkMethod(tracer, 'document_load'));

// 文本分割
this._patchProto(m, ['SentenceSplitter', 'NodeParser'], 'getNodesFromDocuments',
  LlamaIndexWrapper._patchFrameworkMethod(tracer, 'document_split'));

// 检索操作
this._patchProto(m, ['BaseRetriever'], 'retrieve',
  LlamaIndexWrapper._patchFrameworkMethod(tracer, 'retriever_retrieve'));
```

资料来源：[sdk/typescript/src/instrumentation/llamaindex/index.ts:78-92]()
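
The same patch-the-method idea can be sketched in Python, the language used by most examples on this page. This is not the TypeScript SDK's code: `patch_method`, `FakeRetriever`, and the `recorded` list are hypothetical stand-ins, and a list of dictionaries takes the place of real OpenTelemetry spans.

```python
import functools
import time

recorded = []  # stand-in for exported spans


def patch_method(cls, method_name, operation):
    """Replace cls.method_name with a wrapper that records one entry per call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Runs on success and on error, like ending a span.
            recorded.append({
                "operation": operation,
                "target": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


class FakeRetriever:
    """Hypothetical class standing in for a framework retriever."""

    def retrieve(self, query):
        return [f"doc about {query}"]


patch_method(FakeRetriever, "retrieve", "retriever_retrieve")
docs = FakeRetriever().retrieve("otel")
```

Because the wrapper is installed on the class, every existing and future instance is traced without changes to calling code, which is what makes this style of instrumentation transparent to the application.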

### 2. 指标功能（Metrics）

OpenLIT 自动收集并记录关键性能指标，包括：

| 指标类型 | 说明 | 记录方式 |
|----------|------|----------|
| 请求计数 | LLM 调用总次数 | 计数器 |
| 令牌使用 | 输入/输出令牌数 | 计量器 |
| 响应延迟 | 请求到响应的耗时 | 直方图 |
| 错误率 | 失败请求的比例 | 计数器 |

#### 指标收集流程

```mermaid
graph LR
    A[LLM 调用] --> B[插桩包装器]
    B --> C{是否禁用指标?}
    C -->|否| D[记录完成指标]
    C -->|是| E[跳过指标记录]
    D --> F[record_completion_metrics]
    F --> G[更新指标后端]
    
    style C fill:#f9f,stroke:#333
    style D fill:#bbf,stroke:#333
```

### 3. 事件功能（Events）

OpenLIT 通过事件提供者（Event Provider）记录重要的应用事件，支持日志级别的事件追踪：

```python
event_provider = _logs.get_logger_provider().get_logger(__name__)
```

事件功能允许开发者在追踪 span 之外记录额外的上下文信息，增强问题的排查能力。

### 4. 异常处理

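The instrumentation layer catches exceptions raised by the wrapped call, records them on the span through `handle_exception`, and, unless metrics are disabled, emits a completion metric tagged with the error type: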

```python
except Exception as e:
    handle_exception(span, e)
    if not disable_metrics and metrics:
        record_completion_metrics(
            metrics,
            # ... 参数配置
            error_type=type(e).__name__ or "_OTHER",
        )
```

资料来源：[sdk/python/src/openlit/instrumentation/openai/async_openai.py:95-108]()
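
The metrics gate from the flowchart and the error path above can be combined in one small sketch. It uses only the standard library: `MetricsSink` and `instrumented` are hypothetical names rather than OpenLIT APIs, and re-raising the exception after recording it is an assumption about the desired behavior.

```python
import time


class MetricsSink:
    """Hypothetical in-memory stand-in for a metrics backend."""

    def __init__(self):
        self.records = []

    def record(self, duration_s, error_type=None):
        self.records.append({"duration_s": duration_s, "error_type": error_type})


def instrumented(func, metrics, disable_metrics=False):
    """Wrap func so each call records a metric unless metrics are disabled."""

    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            if not disable_metrics and metrics:
                metrics.record(
                    time.perf_counter() - start,
                    error_type=type(exc).__name__ or "_OTHER",
                )
            raise  # the caller still sees the original error
        if not disable_metrics and metrics:
            metrics.record(time.perf_counter() - start)
        return result

    return wrapper


def flaky_call(prompt):
    """Stand-in for an LLM call that fails on empty input."""
    if not prompt:
        raise ValueError("empty prompt")
    return prompt.upper()


sink = MetricsSink()
call = instrumented(flaky_call, sink)
call("hello")          # success path: one record without an error type
try:
    call("")           # error path: one record tagged with the exception name
except ValueError:
    pass
```

Keeping the gate (`disable_metrics`) in the wrapper rather than in the sink means a disabled configuration pays no recording cost at all.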

---

## SDK 使用指南

### Python SDK

#### 安装

```bash
pip install openlit
```

#### 初始化配置

```python
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")
```

或者通过环境变量配置：

```bash
export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318"
```

#### 完整使用示例

```python
from openai import OpenAI
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI(api_key="YOUR_OPENAI_KEY")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What is LLM Observability?",
        }
    ],
    model="gpt-3.5-turbo",
)
```

资料来源：[src/client/src/app/(playground)/getting-started/page.tsx:90-110]()

### TypeScript SDK

#### 安装

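Install the TypeScript SDK with npm or yarn, for example `npm install openlit` (the package name matches the `openlit` import used in the examples below).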

#### 初始化配置

```typescript
import openlit from 'openlit';

openlit.init({
  otlpEndpoint: "http://127.0.0.1:4318"
});
```

#### 完整使用示例

```typescript
import OpenAI from 'openai';
import openlit from 'openlit';

openlit.init({ otlpEndpoint: "http://127.0.0.1:4318" });

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

const chatCompletion = await client.chat.completions.create({
  messages: [{ role: 'user', content: 'What is LLM Observability?' }],
  model: 'gpt-3.5-turbo',
});
```

资料来源：[src/client/src/components/(playground)/getting-started/tracing/index.tsx:95-110]()

---

## 插桩模块架构

### 基类设计

OpenLIT 的所有插桩模块都继承自 `BaseInstrumentor` 基类，提供统一的接口规范：

```python
class ClaudeAgentSDKInstrumentor(BaseInstrumentor):
    """OTel GenAI semantic convention compliant instrumentor for Claude Agent SDK."""
    
    def instrumentation_dependencies(self) -> Collection[str]:
        return _instruments  # 声明依赖版本
    
    def _instrument(self, **kwargs):
        # 执行插桩逻辑
        tracer = trace.get_tracer(__name__)
        # ... 配置和包装逻辑
```

资料来源：[sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:26-38]()

### LangGraph 插桩分类

LangGraph 的插桩功能分为三大类：

| 操作类别 | 说明 | 包含操作 |
|----------|------|----------|
| 执行操作 | 图执行相关 | `invoke`、`ainvoke`、`stream` 等 |
| 构建操作 | 组件构建 | `add_node`、`add_edge` 等 |
| 检查点操作 | 状态保存 | `get`、`put` 等 |

```python
# 执行操作包装
self._wrap_execution_operations(
    EXECUTION_OPERATIONS,
    version,
    environment,
    application_name,
    tracer,
    pricing_info,
    capture_message_content,
    metrics,
    disable_metrics,
)

# 检查点操作包装
self._wrap_checkpoint_operations(
    version,
    environment,
    application_name,
    tracer,
    pricing_info,
    capture_message_content,
    metrics,
    disable_metrics,
)
```

资料来源：[sdk/python/src/openlit/instrumentation/langgraph/__init__.py:50-100]()

---

## 配置选项

### 初始化参数

| 参数 | 类型 | 默认值 | 说明 |
|------|------|--------|------|
| `otlp_endpoint` | string | - | OTLP 导出器端点 |
| `environment` | string | "default" | 运行环境标识 |
| `application_name` | string | "default" | 应用名称 |
| `pricing_info` | dict | {} | 定价信息映射 |
| `capture_message_content` | bool | False | 是否捕获消息内容 |
| `metrics` | bool | True | 是否启用指标收集 |
| `disable_metrics` | bool | False | 是否禁用指标 |

### 环境变量

| 变量名 | 说明 |
|--------|------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP 端点地址 |
| `OTEL_SERVICE_NAME` | 服务名称 |

---

## 前端可视化

### 可观测性状态显示

OpenLIT 前端提供了实时的可观测性状态展示功能：

```typescript
// SDK 启用状态标识
{isStatic ? (
    <span className="inline-flex items-center gap-1.5 px-3 py-1.5 text-xs font-medium rounded-md border border-emerald-200 dark:border-emerald-900 text-emerald-700 dark:text-emerald-400 bg-emerald-50/60 dark:bg-emerald-900/20">
        <span className="w-1.5 h-1.5 rounded-full bg-emerald-500" />
        {getMessage().AGENTS_SDK_ENABLED_VIA}
    </span>
) : pending ? (
    // 待处理状态
)}
```

资料来源：[src/client/src/components/(playground)/agents/observability-block.tsx:45-55]()

### 版本追踪

前端组件支持追踪不同版本的代理（Agent）使用情况：

| 显示字段 | 说明 |
|----------|------|
| 首次出现时间 | `first_seen` |
| 最后出现时间 | `last_seen` |
| 请求计数 | `request_count` |
| 主要模型 | `primary_model` |

资料来源：[src/client/src/components/(playground)/agents/version-drawer.tsx:80-95]()

---

## 默认凭证

首次部署 OpenLIT 后，访问前端界面需要使用以下默认登录凭证：

| 字段 | 值 |
|------|-----|
| Email | user@openlit.io |
| Password | openlituser |

资料来源：[src/client/src/app/(playground)/getting-started/page.tsx:60-65]()

---

## 技术栈总结

### 核心技术依赖

| 组件 | 用途 |
|------|------|
| OpenTelemetry | 遥测数据标准 |
| wrapt | 函数包装 |
| opentelemetry-instrumentation | 自动插桩框架 |

### 支持的框架版本

| 框架 | 最低版本要求 |
|------|--------------|
| Claude Agent SDK | >= 0.1.0 |
| OpenAI | - |
| LlamaIndex | - |
| LangGraph | - |

---

## 总结

OpenLIT 的可观测性功能通过标准化的 OpenTelemetry 集成，为 LLM 应用提供了全面的监控能力。其核心优势包括：

- **多框架支持**：统一覆盖主流 GenAI 和 LLM 框架
- **零侵入集成**：通过自动插桩实现快速部署
- **标准化输出**：遵循 OpenTelemetry 和 GenAI 语义约定
- **灵活配置**：支持多种初始化方式和配置参数
- **优雅降级**：异常情况下不影响应用运行

---

<a id='evaluations'></a>

## 评估功能

### 相关页面

相关主题：[Guardrails与规则引擎](#guardrails), [可观测性功能](#observability)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [sdk/python/src/openlit/guard/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/__init__.py)
- [src/client/src/utils/breadcrumbs.ts](https://github.com/openlit/openlit/blob/main/src/client/src/utils/breadcrumbs.ts)
- [src/client/src/components/(playground)/getting-started/tracing/index.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/components/(playground)/getting-started/tracing/index.tsx)
- [src/client/README.md](https://github.com/openlit/openlit/blob/main/src/client/README.md)
- [sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py)

</details>

# 评估功能

## 概述

OpenLIT 的评估功能（Evaluation）是平台核心能力之一，旨在为 LLM 应用提供可量化的质量评估机制。通过集成 OpenTelemetry 原生架构，OpenLIT 能够对 AI 应用的响应质量、成本效率和性能表现进行全面的监控与评估。

> **注意**：当前从源码中获取的评估功能相关文档较为有限，以下内容基于可观察到的系统架构和组件进行阐述。

## 评估设置页面

### 路由配置

评估功能在客户端应用中具有独立的路由入口。根据路由配置，评估相关页面包括：

| 路由路径 | 页面标题 | 面包屑导航 |
|---------|---------|-----------|
| `/evaluations/settings` | 评估设置（Evaluation Settings） | Settings → Evaluation Settings |

资料来源：[src/client/src/utils/breadcrumbs.ts:1-150]()

### 评估设置界面

评估设置页面允许用户配置评估相关的参数和行为。页面采用标准的设置界面布局，与平台其他设置页面保持一致的交互模式。

## 评估执行机制

### SDK 层面的评估支持

OpenLIT Python SDK 提供了多层次的支持机制，虽然直接的评估执行文件在当前源码上下文中未完全展示，但从系统架构可以看出评估功能与其他 SDK 组件的集成关系。

```mermaid
graph TB
    subgraph "Python SDK"
        A[openlit.init] --> B[Guard System]
        B --> C[PII Detection]
        B --> D[Prompt Injection Detection]
        B --> E[Moderation]
        B --> F[Topic Restriction]
    end
    
    subgraph "Evaluation Layer"
        G[Evaluation Settings] --> H[Run Evaluation]
        H --> I[Results Analysis]
    end
    
    C --> G
    D --> G
    E --> G
    F --> G
```

## Guard 系统与评估质量保障

### Guard 组件架构

OpenLIT 的 Guard 系统是评估功能的重要组成部分，提供生产级的安全保障机制。Guard 组件可以作为评估流程中的质量过滤器使用。

资料来源：[sdk/python/src/openlit/guard/__init__.py:1-47]()

### 核心 Guard 类

| 类名 | 功能描述 | 使用场景 |
|-----|---------|---------|
| `PII` | 个人身份信息检测与处理 | 隐私合规评估 |
| `PromptInjection` | 提示词注入攻击检测 | 安全性评估 |
| `SensitiveTopic` | 敏感话题识别 | 内容安全评估 |
| `TopicRestriction` | 话题范围限制 | 主题一致性评估 |
| `Moderation` | 内容审核 | 合规性评估 |
| `Schema` | 输出结构验证 | 格式正确性评估 |
| `Custom` | 自定义 Guard 逻辑 | 定制化评估规则 |

### Guard 基础类型

```python
# 基础类型定义
Guard              # Guard 基类
GuardAction        # Guard 执行动作
GuardPhase         # Guard 执行阶段
GuardResult        # Guard 执行结果
PipelineResult     # 管道执行结果
```

资料来源：[sdk/python/src/openlit/guard/__init__.py:11-24]()

### Guard 错误类型

| 错误类 | 说明 |
|-------|------|
| `GuardError` | 基础 Guard 错误 |
| `GuardDeniedError` | Guard 拒绝执行错误 |
| `GuardTimeoutError` | Guard 执行超时错误 |
| `GuardConfigError` | Guard 配置错误 |

资料来源：[sdk/python/src/openlit/guard/__init__.py:26-31]()

## 评估集成方式

### 初始化配置

在应用代码中集成评估功能的标准方式：

```python
import openlit

# 初始化 OpenLIT
openlit.init(
    otlp_endpoint="http://127.0.0.1:4318",
    # 评估相关配置
)
```

资料来源：[src/client/src/components/(playground)/getting-started/tracing/index.tsx:1-100]()

### Guard 与评估结合使用

```python
import openlit

# 初始化并启用 Guard
openlit.init(
    guards=[
        openlit.PII(action="redact"),
        openlit.Moderation(threshold=0.8)
    ]
)
```

## 技术架构

### OpenTelemetry 原生集成

OpenLIT 采用 OpenTelemetry 标准进行数据采集和传输，确保评估数据与追踪、指标数据的一致性。

```mermaid
graph LR
    A[Application] -->|Traces/Metrics| B[OpenLIT SDK]
    B -->|OTLP| C[OpenTelemetry Collector]
    C -->|Data| D[OpenLIT Backend]
    D -->|Display| E[UI Dashboard]
    
    F[Evaluation Engine] -->|Quality Scores| D
    G[Guard System] -->|Security Results| F
```

### 追踪与评估关联

评估结果与应用的追踪数据关联存储，用户可以在 OpenLIT 前端界面中同时查看：

- 追踪详情（Traces）
- 评估分数（Evaluation Scores）
- Guard 事件（Guard Events）
- 成本与延迟指标（Cost & Latency Metrics）

## 前端组件结构

### 评估相关路由

客户端应用使用 Next.js 的文件路由系统，评估功能页面位于：

```
src/client/src/app/(playground)/evaluations/
```

### 导航与面包屑

评估功能通过统一的导航系统与平台其他功能整合，用户可以从主导航直接访问评估设置。

资料来源：[src/client/src/utils/breadcrumbs.ts:60-75]()

## SDK instrumentation support

### Claude Agent SDK instrumentation

OpenLIT can instrument the Claude Agent SDK and automatically capture evaluation-relevant data while an agent runs:

资料来源：[sdk/python/src/openlit/instrumentation/claude_agent_sdk/__init__.py:1-50]()

支持的仪器化操作：

| 操作类型 | 追踪事件 | 说明 |
|---------|---------|------|
| `invoke_agent` | Agent 调用 | 追踪代理执行 |
| `execute_tool` | 工具执行 | 追踪工具使用 |
| `query` | 查询操作 | 追踪查询请求 |

## 使用限制与注意事项

### 当前源码限制

从当前获取的源码上下文来看，评估功能的具体实现细节（如评估算法、评分标准等）需要在以下方面获取更多源码：

1. 评估执行核心逻辑文件
2. 评估结果存储与查询 API
3. 前端评估配置界面组件
4. 评估报告生成模块

### 建议

- 访问 OpenLIT 官方文档获取完整的评估功能使用指南
- 查看 `sdk/python/src/openlit/evaluation/` 目录下的具体实现
- 参考前端 `src/client/src/lib/platform/evaluation/` 目录了解评估流程

## 相关资源

| 资源 | 链接 |
|------|------|
| OpenLIT 官方文档 | https://docs.openlit.io |
| GitHub 仓库 | https://github.com/openlit/openlit |
| Python SDK | https://github.com/openlit/openlit/tree/main/sdk/python |
| TypeScript SDK | https://github.com/openlit/openlit/tree/main/sdk/typescript |

---


<a id='guardrails'></a>

## Guardrails与规则引擎

### 相关页面

相关主题：[评估功能](#evaluations), [可观测性功能](#observability)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [sdk/python/src/openlit/guard/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/__init__.py)
- [sdk/python/src/openlit/guard/pii.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/pii.py)
- [sdk/python/src/openlit/guard/_base.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/_base.py)
- [sdk/python/src/openlit/guard/_pipeline.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/_pipeline.py)
- [sdk/python/src/openlit/guard/prompt_injection.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/prompt_injection.py)
- [sdk/python/src/openlit/guard/sensitive_topic.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/sensitive_topic.py)
</details>

# Guardrails与规则引擎

## 概述

OpenLIT Guardrails是面向LLM应用的生产级安全防护系统，提供开箱即用的防护栏（Guardrails）和灵活可扩展的规则引擎（Rule Engine）。该系统通过在LLM应用的关键阶段注入检测逻辑，自动识别并处理敏感信息泄露、提示词注入攻击、违禁内容等安全风险。

Guardrails系统基于OpenTelemetry语义约定设计，支持preflight（前置检查）和postflight（后置检查）两个执行阶段，可无缝集成到现有LLM应用架构中。资料来源：[sdk/python/src/openlit/guard/__init__.py:1-12]()

## 架构设计

### 核心组件

OpenLIT Guardrails系统由以下核心组件构成：

| 组件 | 文件路径 | 职责 |
|------|----------|------|
| Guard基类 | `guard/_base.py` | 定义所有Guard的通用接口和生命周期 |
| Pipeline管道 | `guard/_pipeline.py` | 编排多个Guard的执行顺序和结果聚合 |
| PII检测器 | `guard/pii.py` | 识别API密钥、个人身份信息等敏感数据 |
| 提示词注入检测器 | `guard/prompt_injection.py` | 识别恶意提示词注入攻击 |
| 敏感话题检测器 | `guard/sensitive_topic.py` | 检测涉及敏感话题的内容 |
| 主题限制器 | `guard/topic_restriction.py` | 限制对话内容的主题范围 |
| 内容审核器 | `guard/moderation.py` | 审核生成内容是否合规 |
| Schema验证器 | `guard/schema.py` | 验证输出是否符合预定义Schema |
| 自定义Guard | `guard/custom.py` | 支持用户自定义检测规则 |

资料来源：[sdk/python/src/openlit/guard/__init__.py:14-36]()

### 架构流程图

```mermaid
graph TD
    A[LLM请求入口] --> B[Preflight阶段]
    B --> C{Guard Pipeline}
    C --> D1[PII检测]
    C --> D2[提示词注入检测]
    C --> D3[敏感话题检测]
    C --> D4[自定义Guard]
    D1 --> E{检测结果}
    D2 --> E
    D3 --> E
    D4 --> E
    E -->|通过| F[调用LLM]
    E -->|拒绝| G[返回GuardDeniedError]
    E -->|警告| H[记录日志继续执行]
    F --> I[Postflight阶段]
    I --> J1[输出PII检测]
    I --> J2[内容审核]
    I --> J3[Schema验证]
    J1 --> K[返回结果]
    J2 --> K
    J3 --> K
```

## Guard基类设计

### 执行阶段

Guard系统定义了两个核心执行阶段：

| 阶段 | 说明 | 适用场景 |
|------|------|----------|
| `PREFLIGHT` | 在LLM调用前执行 | 输入内容检测、提示词保护 |
| `POSTFLIGHT` | 在LLM调用后执行 | 输出内容审核、数据脱敏 |

资料来源：[sdk/python/src/openlit/guard/_base.py]()
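
To make the two phases concrete, here is a minimal standard-library sketch of checks running before and after a model call. It does not use the OpenLIT guard classes: `guarded_call`, `Denied`, `block_injection`, `mask_digits`, and `fake_llm` are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable

# A check returns the (possibly rewritten) text, or raises to block the request.
Check = Callable[[str], str]


class Denied(Exception):
    """Raised by a check to block the request (hypothetical stand-in)."""


def block_injection(text: str) -> str:
    if "ignore previous instructions" in text.lower():
        raise Denied("possible prompt injection")
    return text


def mask_digits(text: str) -> str:
    return "".join("#" if ch.isdigit() else ch for ch in text)


def guarded_call(
    prompt: str,
    llm: Callable[[str], str],
    preflight: Iterable[Check] = (),
    postflight: Iterable[Check] = (),
) -> str:
    for check in preflight:    # PREFLIGHT: inspect the input
        prompt = check(prompt)
    answer = llm(prompt)       # the model call itself
    for check in postflight:   # POSTFLIGHT: inspect the output
        answer = check(answer)
    return answer


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"Order 12345 received: {prompt}"


reply = guarded_call(
    "status?", fake_llm, preflight=[block_injection], postflight=[mask_digits]
)
```

A denied preflight check raises before `llm` is ever invoked, so a blocked prompt never reaches the model.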

### 动作类型

每个Guard支持三种响应动作：

| 动作 | 行为 | 返回结果 |
|------|------|----------|
| `redact` | 自动脱敏/替换敏感内容 | 返回脱敏后的文本 |
| `deny` | 直接拒绝请求 | 抛出`GuardDeniedError` |
| `warn` | 仅记录事件不阻断 | 返回原始文本并附带警告信息 |

资料来源：[sdk/python/src/openlit/guard/_base.py]()

### 错误类型

| 错误类 | 触发条件 |
|--------|----------|
| `GuardError` | 通用Guard异常 |
| `GuardDeniedError` | 内容被拒绝时抛出 |
| `GuardTimeoutError` | 检测超时 |
| `GuardConfigError` | 配置错误 |

## PII检测器详解

### 功能概述

PII检测器是OpenLIT Guardrails最核心的组件之一，使用约25种高置信度正则表达式模式，在小于1毫秒的时间内完成本地检测。该检测器支持识别API密钥、PII个人信息和各类敏感凭证。

### 支持的检测类型

| 类别 | 检测项目 | 正则模式示例 |
|------|----------|--------------|
| API密钥 | OpenAI API Key | `sk-(?:proj-)?[A-Za-z0-9_-]{20,}` |
| API密钥 | Anthropic API Key | `sk-ant-[A-Za-z0-9_-]{20,}` |
| API密钥 | AWS Access Key | `AKIA[0-9A-Z]{16}` |
| API密钥 | GCP API Key | `AIza[0-9A-Za-z_-]{35}` |
| 令牌 | GitHub Token | `(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36,}` |
| 证书 | 客户端证书 | PEM格式证书检测 |
| 凭证 | 私钥检测 | RSA/DSA/EC私钥模式 |
| 凭证 | Azure凭据 | Azure服务主体格式 |

资料来源：[sdk/python/src/openlit/guard/pii.py:1-45]()

### 使用方式

```python
import openlit

# 基础使用 - 自动脱敏
openlit.init(guards=[openlit.PII(action="redact")])

# 自定义检测模式
openlit.init(guards=[
    openlit.PII(
        action="deny",
        custom_patterns={
            "custom-secret": r"secret-[A-Za-z0-9]{16}",
            "internal-id": r"INT-\d{8}"
        }
    )
])
```

### 脱敏机制

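When sensitive information is detected, the PII detector replaces the matched text with a standardized placeholder:

```
[REDACTED:<label>]
```

Here `<label>` is the label of the detection type (for example `openai-api-key` or `github-token`). Matches are replaced from the end of the text toward the beginning, so the character offsets of matches that have not been replaced yet stay valid.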

资料来源：[sdk/python/src/openlit/guard/pii.py:95-120]()
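
The back-to-front replacement strategy can be sketched as follows. This is an illustrative reimplementation and not the code in `pii.py`: the two patterns are copied from the table above, while the label names, the `redact` function, and the assumption that matches never overlap are mine.

```python
import re

# Patterns taken from the detection table; labels are illustrative.
PATTERNS = {
    "aws-access-key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github-token": re.compile(r"(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36,}"),
}


def redact(text: str) -> str:
    """Replace every match with [REDACTED:<label>], assuming no overlaps."""
    # Collect (start, end, label) for every match against the original text.
    spans = [
        (match.start(), match.end(), label)
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    # Work from the last match to the first: a replacement changes the length
    # of the string only to its right, so earlier offsets remain correct.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[REDACTED:{label}]" + text[end:]
    return text


sample = "key=AKIAABCDEFGHIJKLMNOP token=ghp_" + "a" * 36
clean = redact(sample)
```

Replacing front to back would require shifting every later offset by the length difference of each placeholder; reversing the order avoids that bookkeeping.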

## Pipeline管道编排

### 功能说明

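A Pipeline chains several Guards into one detection sequence. It takes a list of Guard instances, runs each Guard's check in declaration order, and aggregates the individual results into a single outcome.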

### 执行结果

```python
@dataclass
class PipelineResult:
    guard_results: List[GuardResult]  # 各Guard的检测结果
    final_action: GuardAction          # 最终采取的动作
    transformed_text: Optional[str]    # 转换后的文本
```

资料来源：[sdk/python/src/openlit/guard/_base.py]()

### 配置选项

| 参数 | 类型 | 默认值 | 说明 |
|------|------|--------|------|
| `guards` | List[Guard] | [] | Guard实例列表 |
| `timeout` | float | 5.0 | 检测超时时间（秒） |
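
Sequential execution and result aggregation can be sketched with plain dataclasses. The names below (`Action`, `Result`, `PipelineOutcome`, `run_pipeline`) are hypothetical and only loosely mirror the fields shown above; the severity ordering, the stop-on-deny rule, and feeding redacted text to later guards are assumptions, and the `timeout` option is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Action(Enum):
    # Values double as severity for aggregation (an assumption).
    ALLOW = 0
    WARN = 1
    REDACT = 2
    DENY = 3


@dataclass
class Result:
    guard_name: str
    action: Action = Action.ALLOW
    transformed_text: Optional[str] = None


@dataclass
class PipelineOutcome:
    guard_results: List[Result] = field(default_factory=list)
    final_action: Action = Action.ALLOW
    transformed_text: Optional[str] = None


def run_pipeline(guards: List[Callable[[str], Result]], text: str) -> PipelineOutcome:
    outcome = PipelineOutcome(transformed_text=text)
    for guard in guards:
        result = guard(outcome.transformed_text)
        outcome.guard_results.append(result)
        # Keep the most severe action seen so far.
        if result.action.value > outcome.final_action.value:
            outcome.final_action = result.action
        if result.action is Action.DENY:
            break  # later guards are skipped once the text is denied
        if result.action is Action.REDACT and result.transformed_text is not None:
            outcome.transformed_text = result.transformed_text
    return outcome


def mask_email(text: str) -> Result:
    if "@" in text:
        return Result("mask_email", Action.REDACT, text.replace("@", "[at]"))
    return Result("mask_email")


def deny_secrets(text: str) -> Result:
    if "password" in text.lower():
        return Result("deny_secrets", Action.DENY)
    return Result("deny_secrets")


outcome = run_pipeline([mask_email, deny_secrets], "mail a@b.c")
```

Because guards run in declaration order, placing a cheap `deny` guard first lets the pipeline stop early, while placing a `redact` guard first means later guards only ever see sanitized text.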

## 集成方式

### 方式一：初始化时配置

```python
import openlit

# 在应用初始化时配置Guardrails
openlit.init(
    guards=[
        openlit.PII(action="redact"),
        openlit.PromptInjection(action="deny"),
        openlit.Moderation(action="warn")
    ]
)
```

### 方式二：直接导入使用

```python
from openlit import PII
from openlit.guard import GuardAction

# Use a single Guard on its own; "deny" makes the check below meaningful
pii_guard = PII(action="deny")
result = pii_guard.evaluate("Please analyze this API key: sk-abc123...")

if result.action == GuardAction.DENY:
    print("Sensitive information detected; request denied")

资料来源：[sdk/python/src/openlit/guard/__init__.py:4-12]()

## 性能特性

| 特性 | 指标 | 说明 |
|------|------|------|
| 本地执行 | <1ms | 所有检测在本地完成，无外部依赖 |
| 内存占用 | 低 | 使用预编译正则表达式 |
| 并发支持 | 高 | 无状态设计，支持高并发场景 |
| 可扩展性 | 自定义模式 | 支持用户添加自定义正则表达式 |

## 最佳实践

### 输入保护策略

建议在`PREFLIGHT`阶段启用PII检测和提示词注入检测，防止敏感信息进入LLM处理流程：

```python
openlit.init(guards=[
    openlit.PII(action="redact"),           # 脱敏输入中的敏感信息
    openlit.PromptInjection(action="deny")  # 拒绝明显的注入攻击
])
```

### 输出审核策略

建议在`POSTFLIGHT`阶段启用内容审核和Schema验证，确保输出符合预期：

```python
openlit.init(guards=[
    openlit.Moderation(action="warn"),  # 审核输出内容
    openlit.Schema(action="deny")        # 验证输出格式
])
```

### 混合策略

生产环境推荐使用Pipeline组合多种Guard：

```python
openlit.init(guards=[
    openlit.PII(action="redact"),
    openlit.PromptInjection(action="deny"),
    openlit.SensitiveTopic(action="warn"),
    openlit.TopicRestriction(allowed_topics=["技术", "产品"], action="deny")
])
```

## 扩展开发

### 创建自定义Guard

```python
from openlit.guard._base import Guard, GuardAction, GuardPhase, GuardResult
import re

class MyCustomGuard(Guard):
    name = "my_custom_guard"
    phases = (GuardPhase.PREFLIGHT, GuardPhase.POSTFLIGHT)
    
    def __init__(self, pattern: str, action: str = "warn", **kwargs):
        super().__init__(action=action, **kwargs)
        self._pattern = re.compile(pattern)
    
    def evaluate(self, text: str) -> GuardResult:
        matches = self._pattern.findall(text)
        if matches:
            return GuardResult(
                guard_name=self.name,
                action=self._action,
                classification=f"matched_{len(matches)}_times"
            )
        return GuardResult(guard_name=self.name)
```

### 注册自定义Guard

```python
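import openlit
from openlit import Custom

# detect_custom_pattern is a user-supplied function (not part of OpenLIT)
# that inspects the text and reports whether it should be flagged.
openlit.init(guards=[
    Custom(
        name="my_detector",
        evaluate_fn=lambda text: detect_custom_pattern(text),
        action="deny"
    )
])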
```

## 总结

OpenLIT Guardrails系统提供了完整的企业级LLM安全防护解决方案，其设计遵循以下核心原则：

- **高性能**：本地化检测，毫秒级响应
- **低侵入**：通过`openlit.init()`一行代码即可启用
- **可扩展**：支持自定义Guard和检测模式
- **标准化**：符合OpenTelemetry GenAI语义约定

通过合理配置Guardrails与规则引擎，开发者可以在不影响用户体验的前提下，有效防止数据泄露、内容滥用和安全攻击等风险。

---

<a id='controller'></a>

## OpenLIT Controller

### 相关页面

相关主题：[GPU Collector](#gpu-collector), [Python SDK](#python-sdk)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [src/client/src/app/(playground)/agents/no-controller.tsx](https://github.com/openlit/openlit/blob/main/src/client/src/app/(playground)/agents/no-controller.tsx)
- [src/client/src/lib/platform/controller/features/agent.ts](https://github.com/openlit/openlit/blob/main/src/client/src/lib/platform/controller/features/agent.ts)
- [sdk/python/src/openlit/guard/__init__.py](https://github.com/openlit/openlit/blob/main/sdk/python/src/openlit/guard/__init__.py)
- [sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)
- [sdk/typescript/README.md](https://github.com/openlit/openlit/blob/main/sdk/typescript/README.md)
</details>

# OpenLIT Controller

## 概述

OpenLIT Controller 是 OpenLIT 平台的核心组件之一，负责在运行时自动检测和插桩 Python 应用程序。它作为一个独立的守护进程运行，能够自动发现用户环境中的 LLM 应用并进行零侵入式的可观测性数据采集，无需修改应用程序代码。

OpenLIT Controller 的主要职责包括：

- **自动检测**：扫描目标环境中的 Python 应用
- **运行时插桩**：在运行时动态修改 Python 代码，注入 OpenTelemetry 埋点
- **自动配置**：根据检测到的框架自动应用相应的配置
- **进程管理**：管理被插桩进程的完整生命周期

## 架构设计

### 组件关系

```mermaid
graph TD
    A[OpenLIT Controller] --> B[Scanner 扫描器]
    A --> C[Engine 引擎]
    A --> D[Server 服务器]
    
    B --> B1[Python 应用检测]
    B --> B2[框架识别]
    B --> B3[依赖分析]
    
    C --> C1[运行时插桩]
    C --> C2[进程管理]
    C --> C3[配置应用]
    
    D --> D1[HTTP API]
    D --> D2[状态查询]
    D --> D3[操作指令]
    
    E[用户环境] --> B
    E --> F[Python 应用]
    F --> C
    G[OpenLIT Dashboard] --> D
    D --> H[OTLP 端点]
```

### 技术栈

| 组件 | 技术选型 | 说明 |
|------|----------|------|
| 核心语言 | Go | 高性能、跨平台支持 |
| 检测引擎 | Python AST 分析 | 精确的代码结构理解 |
| 通信协议 | OTLP | OpenTelemetry 标准协议 |
| 部署方式 | Systemd / Docker | 灵活的部署选项 |

## 部署方式

OpenLIT Controller 支持多种部署方式，适应不同的运行环境需求。

### Linux 系统部署

在 Linux 环境中，Controller 可以作为 systemd 服务运行，提供稳定的长期运行能力。

```bash
# Run as root (or prefix the commands with sudo).
curl -fsSL https://github.com/openlit/openlit/releases/latest/download/openlit-controller-linux-amd64 \
  -o /usr/local/bin/openlit-controller
chmod +x /usr/local/bin/openlit-controller

# Create the systemd service. Replace the example values with your own
# OpenLIT URL, OTLP endpoint, and API key; omit the OPENLIT_API_KEY line
# if no API key is configured.
cat > /etc/systemd/system/openlit-controller.service << 'EOF'
[Unit]
Description=OpenLIT Controller
After=network.target

[Service]
Environment="OPENLIT_URL=http://127.0.0.1:3000"
Environment="OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318"
Environment="OPENLIT_API_KEY=sk-xxx"
ExecStart=/usr/local/bin/openlit-controller
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now openlit-controller
```

### Docker 容器部署

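In containerized environments, the Controller runs in privileged mode so that it can perform inter-process communication and system-level operations. Naming the container lets later commands such as `docker logs` refer to it; pass the environment variables described in the next section with `-e`.

```bash
docker run -d --privileged --pid=host \
  --name openlit-controller \
  openlit/openlit-controller:latest
```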

## 环境变量配置

Controller 通过环境变量接收配置信息，这些变量定义了与主平台的连接方式和认证凭证。

| 环境变量 | 说明 | 示例值 |
|----------|------|--------|
| `OPENLIT_URL` | OpenLIT 平台 URL | `http://127.0.0.1:3000` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP 导出端点 | `http://127.0.0.1:4318` |
| `OPENLIT_API_KEY` | API 认证密钥 | `sk-xxx` |

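The dashboard builds its install snippets from template strings and fills in these values for you; for example, the `OPENLIT_API_KEY` line is emitted only when an API key is set: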

```typescript
const systemdKey = apiKey
    ? `\nEnvironment="OPENLIT_API_KEY=${apiKey}"`
    : "";
```

资料来源：[src/client/src/app/(playground)/agents/no-controller.tsx:18-22]()
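
The same conditional templating can be sketched in Python. This is not the dashboard's code: `render_unit` and `otlp_endpoint_from` are hypothetical helpers, and deriving the OTLP endpoint by swapping the platform URL's trailing port for 4318 is an assumption about how the generated snippet is built.

```python
import re
from typing import Optional


def otlp_endpoint_from(url: str) -> str:
    """Swap a trailing :port for :4318; URLs without a port are left unchanged."""
    return re.sub(r":\d+$", ":4318", url)


def render_unit(openlit_url: str, api_key: Optional[str] = None) -> str:
    """Render a systemd unit for the Controller with concrete values."""
    env_lines = [
        f'Environment="OPENLIT_URL={openlit_url}"',
        f'Environment="OTEL_EXPORTER_OTLP_ENDPOINT={otlp_endpoint_from(openlit_url)}"',
    ]
    if api_key:  # emit the key line only when a key is configured
        env_lines.append(f'Environment="OPENLIT_API_KEY={api_key}"')
    return "\n".join([
        "[Unit]",
        "Description=OpenLIT Controller",
        "After=network.target",
        "",
        "[Service]",
        *env_lines,
        "ExecStart=/usr/local/bin/openlit-controller",
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


unit = render_unit("http://127.0.0.1:3000")
print(unit)
```

Rendering the file with real values before writing it avoids leaving unexpanded placeholders in `/etc/systemd/system/openlit-controller.service`.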

## 功能特性

### 自动框架检测

Controller 内置了对多种主流 LLM 框架的自动检测能力，能够识别以下框架：

| 框架类型 | 检测方式 | 支持语言 |
|----------|----------|----------|
| OpenAI SDK | 导入语句分析 | Python |
| Anthropic SDK | 导入语句分析 | Python |
| LangChain | 依赖检测 | Python |
| LlamaIndex | 依赖检测 | Python |
| LiteLLM | 导入语句分析 | Python |

### 运行时插桩机制

Controller 采用无代理（Agentless）方式的动态插桩技术：

```mermaid
sequenceDiagram
    participant Scanner as 扫描器
    participant Engine as 插桩引擎
    participant App as Python 应用
    participant OTLP as OTLP 端点

    Scanner->>App: 检测运行环境
    Scanner->>Engine: 报告检测结果
    Engine->>App: 注入埋点代码
    App->>OTLP: 发送追踪数据
    Engine->>App: 监控运行状态
```

### 进程生命周期管理

Controller 负责管理被插桩进程的完整生命周期，包括：

- **启动**：在检测到目标应用后自动启动插桩
- **监控**：持续监控进程状态和健康状况
- **重启**：在进程异常终止时自动重启
- **停止**：在收到停止指令后优雅终止

## API 接口

Controller 提供 HTTP API 用于外部控制和状态查询。

### 状态查询

```bash
curl http://localhost:8080/status
```

响应示例：

```json
{
  "status": "enabled",
  "mode": "kubernetes",
  "service": "my-llm-app",
  "namespace": "default",
  "transitioning": false
}
```

### 操作指令

| 操作 | 说明 | 参数 |
|------|------|------|
| `enable` | 启用自动插桩 | `serviceId` |
| `disable` | 禁用自动插桩 | `serviceId` |
| `status` | 查询当前状态 | `serviceId` |

## 与 Python SDK 的集成

OpenLIT Controller 与 Python SDK 形成互补关系，提供端到端的可观测性解决方案：

```python
# Python SDK - 手动初始化
import openlit

openlit.init(otlp_endpoint="http://127.0.0.1:4318")

# 与 Controller 配合时，Controller 自动处理
# 未初始化的 Python 应用
```

当 Python 应用未显式初始化 OpenLIT 时，Controller 可以自动注入初始化代码，实现零配置的可观测性。

资料来源：[sdk/python/src/openlit/guard/__init__.py:1-45]()

## 安全特性

### 认证机制

Controller 支持 API Key 认证，确保只有授权的客户端可以访问控制接口：

```bash
curl -H "Authorization: Bearer <API_KEY>" \
  http://localhost:8080/status
```
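
For scripted checks, the authenticated status query can be prepared with the Python standard library. This is a sketch under the assumptions of this page (port 8080, a `/status` path, Bearer authentication, and the response fields shown in the earlier example); `build_status_request` and `is_instrumented` are hypothetical helper names, and no network call is made until the request is explicitly sent.

```python
import json
import urllib.request


def build_status_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare GET <base_url>/status with a Bearer token (no network I/O here)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/status",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def is_instrumented(payload: str) -> bool:
    """True when the service is enabled and not in the middle of a transition."""
    status = json.loads(payload)
    return status.get("status") == "enabled" and not status.get("transitioning", False)


request = build_status_request("http://localhost:8080", "<API_KEY>")
sample = '{"status": "enabled", "mode": "kubernetes", "transitioning": false}'
ready = is_instrumented(sample)

# To send the request against a running Controller:
#   body = urllib.request.urlopen(request, timeout=5).read().decode()
#   ready = is_instrumented(body)
```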

### 网络隔离

Controller 支持通过环境变量配置 OTLP 端点，支持 TLS 加密传输，适用于生产环境部署：

```bash
Environment="OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.internal:4318"
```

## 故障排除

### 常见问题

| 问题 | 可能原因 | 解决方案 |
|------|----------|----------|
| Controller 无法启动 | 端口被占用 | 检查 8080 端口占用情况 |
| 无法连接 OTLP 端点 | 网络隔离 | 检查防火墙规则 |
| 应用未被检测 | 非 Python 应用 | 确认应用语言运行时 |
| 数据未上报 | 端点配置错误 | 验证 OTEL_EXPORTER_OTLP_ENDPOINT |

### 日志查看

```bash
# Systemd 日志
journalctl -u openlit-controller -f

# Docker 日志
docker logs -f openlit-controller
```

## 最佳实践

### 生产环境部署建议

1. **高可用配置**：使用 Kubernetes 部署时，配置多副本保障可用性
2. **资源限制**：为 Controller 设置适当的 CPU 和内存限制
3. **网络安全**：使用 TLS 加密 OTLP 通信，确保数据传输安全
4. **监控告警**：对 Controller 进程本身进行监控，及时发现异常

### 性能优化

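- The Controller itself uses very few resources and does not affect the performance of target applications
- Most instrumentation overhead comes from serializing telemetry; batching exports reduces it
- For better throughput, consider an OTLP endpoint that speaks gRPC (conventionally port 4317; port 4318, used in the examples on this page, is OTLP over HTTP)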

## 相关文档

- [OpenLIT Python SDK 文档](https://github.com/openlit/openlit/tree/main/sdk/python)
- [OpenLIT TypeScript SDK 文档](https://github.com/openlit/openlit/tree/main/sdk/typescript)
- [OpenLIT Go SDK 文档](https://github.com/openlit/openlit/tree/main/sdk/go)
- [OpenLIT 官方文档](https://docs.openlit.io/)

---

<a id='gpu-collector'></a>

## GPU Collector

### Related Pages

Related topics: [OpenLIT Controller](#controller), [Observability Features](#observability)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)
- [sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)
- [src/client/README.md](https://github.com/openlit/openlit/blob/main/src/client/README.md)
</details>

# GPU Collector

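GPU Collector is a core component of the OpenLIT project. It collects hardware telemetry from mainstream GPUs (NVIDIA, AMD, and Intel) and exports it in the standard OpenTelemetry format for monitoring and analysis. The component is built on the OpenTelemetry Collector architecture and supports CUDA kernel-level tracing through eBPF.

## Overview

GPU Collector aims to provide a unified GPU hardware monitoring solution for AI and high-performance computing workloads. It collects key metrics such as power draw, clock frequency, memory usage, and compute utilization in real time, and sends them to a backend monitoring system over OTLP (OpenTelemetry Protocol). Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

The main design goals of the component are:

- Provide cross-vendor GPU monitoring (NVIDIA, AMD, Intel)
- Follow the OpenTelemetry semantic conventions (`hw.gpu.*`) so that metric names are standardized
- Expose data through a Prometheus `/metrics` endpoint (planned)
- Use eBPF for low-overhead kernel-level tracing
- Support multiple deployment options, including Docker Compose and a standalone binary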

## Features

The following table lists the features GPU Collector has completed and those that are planned:

| Feature | Status |
|---------|------|
| NVIDIA GPU hardware telemetry (NVML) | Done |
| AMD GPU hardware telemetry (sysfs/hwmon) | Done |
| Intel GPU hardware telemetry (sysfs/hwmon) | Done |
| eBPF CUDA kernel tracing | Done |
| OTel semantic convention compliance (`hw.gpu.*`) | Done |
| Prometheus `/metrics` endpoint | Planned |
| ROCm HIP tracing (AMD eBPF) | Planned |
| Per-process GPU utilization (DRM fdinfo) | Planned |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

## Metrics

GPU Collector defines a set of GPU hardware metrics covering power draw, clock frequency, energy consumption, and errors.

### Core Metrics

| Metric | Type | Unit | Description | NVIDIA | AMD | Intel |
|---------|------|------|------|:------:|:---:|:-----:|
| `hw.gpu.power.draw` | Gauge | W | Current power draw | ✓ | ✓ | ✓ |
| `hw.gpu.power.limit` | Gauge | W | Power limit | ✓ | ✓ | ✓ |
| `hw.gpu.energy.consumed` | Counter | J | Cumulative energy consumption | ✓ | ✓ | ✓ |
| `hw.gpu.clock.graphics` | Gauge | MHz | Graphics/SM clock frequency | ✓ | ✓ | ✓* |
| `hw.gpu.clock.memory` | Gauge | MHz | Memory clock frequency | ✓ | ✓ | - |
| `hw.errors` | Counter | {error} | ECC and PCIe errors | ✓ | - | - |

> \* Intel support depends on the driver (i915/Xe) and the kernel version.

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)
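
Because `hw.gpu.energy.consumed` is a cumulative counter in joules, average power over an interval can be derived from two samples: energy delta divided by elapsed time gives watts. The sketch below shows that arithmetic. The function name and the sample values are illustrative, and treating a decreasing counter as a reset is an assumption.

```python
def average_power_watts(
    energy_start_j: float,
    energy_end_j: float,
    time_start_s: float,
    time_end_s: float,
) -> float:
    """Average power (W) between two samples of a cumulative energy counter (J)."""
    elapsed = time_end_s - time_start_s
    if elapsed <= 0:
        raise ValueError("end timestamp must be later than start timestamp")
    delta = energy_end_j - energy_start_j
    if delta < 0:
        # Assumption: a counter that goes backwards was reset (e.g. driver reload).
        raise ValueError("energy counter decreased; it was probably reset")
    return delta / elapsed


# 3000 J consumed over 10 s is an average of 300 W.
print(average_power_watts(12_000.0, 15_000.0, 0.0, 10.0))  # 300.0
```

Comparing this derived value against `hw.gpu.power.draw` is a quick sanity check that both metrics are being reported consistently.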

### Metric Attributes

All GPU metrics carry the following identifying attributes, which uniquely locate and distinguish devices:

| Attribute | Description | Example values |
|---------|------|--------|
| `hw.id` | Unique device identifier (required by the specification) | `GPU-a1b2c3d4-...` |
| `hw.name` | Product name | `NVIDIA A100-SXM4-80GB` |
| `hw.vendor` | Vendor name | `nvidia`, `amd`, `intel` |
| `gpu.index` | Device index | `0`, `1` |
| `gpu.pci_address` | PCI bus address | `0000:01:00.0` |

### Additional Attributes

Some metrics also include additional context attributes:

| Metric | Additional attribute | Allowed values |
|---------|---------|--------|
| `hw.gpu.utilization` | `hw.gpu.task` | `general`, `encoder`, `decoder` |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)
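
To make the attribute model concrete, the sketch below assembles the attribute set for one device and adds `hw.gpu.task` for a utilization data point. The attribute keys and allowed task values come from the tables above. The `GpuDevice` class, the helper name, and the sample device values (including the `GPU-example` identifier) are hypothetical.

```python
from dataclasses import dataclass

# Allowed values of hw.gpu.task, taken from the table above.
ALLOWED_TASKS = {"general", "encoder", "decoder"}


@dataclass(frozen=True)
class GpuDevice:
    hw_id: str
    name: str
    vendor: str
    index: int
    pci_address: str

    def attributes(self) -> dict:
        """Identifying attributes attached to every metric for this device."""
        return {
            "hw.id": self.hw_id,
            "hw.name": self.name,
            "hw.vendor": self.vendor,
            "gpu.index": self.index,
            "gpu.pci_address": self.pci_address,
        }


def utilization_attributes(device: GpuDevice, task: str) -> dict:
    """Attributes for a hw.gpu.utilization data point, including hw.gpu.task."""
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown hw.gpu.task value: {task!r}")
    return {**device.attributes(), "hw.gpu.task": task}


# Placeholder device; the identifier is made up for illustration.
gpu0 = GpuDevice("GPU-example", "NVIDIA A100-SXM4-80GB", "nvidia", 0, "0000:01:00.0")
print(utilization_attributes(gpu0, "encoder"))
```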

## System Architecture

GPU Collector follows a modular design and consists of the following core modules:

```mermaid
graph TD
    A[GPU Collector entry point] --> B[GPU data collection layer]
    A --> C[eBPF kernel tracing layer]
    B --> D[Metric processing and aggregation]
    C --> D
    D --> E[OpenTelemetry exporter]
    E --> F[OTLP Endpoint]
    E --> G["Prometheus Endpoint (planned)"]
    
    B --> B1[NVML NVIDIA]
    B --> B2[sysfs hwmon AMD]
    B --> B3[sysfs hwmon Intel]
    
    style A fill:#e1f5fe
    style E fill:#fff3e0
    style F fill:#e8f5e9
    style G fill:#f3e5f5
```

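### Collection Layer

The collection layer interacts with the underlying GPU drivers and hardware interfaces, using a vendor-specific method for each:

- **NVIDIA**: Queries GPU state directly through the NVML (NVIDIA Management Library) API
- **AMD**: Reads hardware sensor data from the sysfs/hwmon filesystem
- **Intel**: Reads hardware sensor data from the sysfs/hwmon filesystem (depends on the specific driver)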
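
To illustrate what reading from sysfs/hwmon involves, here is a minimal sketch, not the collector's actual implementation. It assumes the sensor is exposed as `power1_average`, which the Linux hwmon interface reports in microwatts. Which files exist varies by driver and kernel version, and the search pattern under `/sys/class/drm` is likewise an assumption.

```python
from pathlib import Path
from typing import Optional


def discover_hwmon_dirs(root: Path = Path("/sys/class/drm")) -> list:
    """Find hwmon directories under DRM card devices (empty list if none exist)."""
    if not root.is_dir():
        return []
    return sorted(root.glob("card*/device/hwmon/hwmon*"))


def read_hwmon_power_watts(hwmon_dir: Path) -> Optional[float]:
    """Read power1_average (microwatts) and convert to watts; None if unavailable."""
    sensor = hwmon_dir / "power1_average"
    try:
        microwatts = int(sensor.read_text().strip())
    except (OSError, ValueError):
        # The file is missing, unreadable, or not a number on this driver.
        return None
    return microwatts / 1_000_000


for hwmon in discover_hwmon_dirs():
    print(hwmon, read_hwmon_power_watts(hwmon))
```

Returning `None` instead of raising keeps a polling loop running when one device lacks a given sensor, which matches the per-vendor gaps in the metrics table.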

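### eBPF Tracing Layer

The eBPF (extended Berkeley Packet Filter) module provides CUDA kernel-level tracing. It captures execution information about compute tasks running on the GPU and supplies fine-grained performance analysis data.

### Export Layer

The export layer outputs the collected and processed metrics in standard formats. It currently supports:

- **OTLP**: Sends data to an OpenTelemetry Collector or backend service, configured through `OTEL_EXPORTER_OTLP_ENDPOINT`
- **Prometheus format** (planned): Exposes data on a `/metrics` endpoint for Prometheus to scrape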

## Deployment

GPU Collector offers several deployment options to suit different scenarios.

### Docker Image

Using the prebuilt Docker image is the simplest way to deploy:

```bash
docker run -d \
    --gpus all \
    --name otel-gpu-collector \
    -e OTEL_SERVICE_NAME=my-app \
    -e OTEL_RESOURCE_ATTRIBUTES="deployment.environment=production" \
    -e OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317" \
    ghcr.io/openlit/otel-gpu-collector:latest
```

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

### Docker Compose Integration

In an environment that already runs the OpenLIT stack, GPU Collector can be deployed with Docker Compose:

```yaml
services:
  otel-gpu-collector:
    image: ghcr.io/openlit/otel-gpu-collector:latest
    environment:
      OTEL_SERVICE_NAME: my-app
      OTEL_RESOURCE_ATTRIBUTES: "deployment.environment=production"
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - otel-collector
    restart: always
```

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

### Binary Deployment

For environments where Docker is not available, download a precompiled binary:

```sh
# Linux amd64
curl -L https://github.com/openlit/openlit/releases/latest/download/opentelemetry-gpu-collector-<version>-linux-amd64 \
    -o opentelemetry-gpu-collector
chmod +x opentelemetry-gpu-collector

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 ./opentelemetry-gpu-collector
```

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)

### Building from Source

Building from source requires a Go toolchain:

```sh
git clone https://github.com/openlit/openlit.git
cd openlit/opentelemetry-gpu-collector
make build
./opentelemetry-gpu-collector
```

## Configuration

GPU Collector is configured through standard OpenTelemetry environment variables. The available settings are:

| Environment variable | Default | Description |
|---------|-------|------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | (required) | OTLP exporter endpoint address |
| `OTEL_SERVICE_NAME` | - | Service name |
| `OTEL_RESOURCE_ATTRIBUTES` | - | Resource attributes as key-value pairs |

Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)
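
A deployment script can check these variables before launching the collector. The sketch below is one way to do that, under stated assumptions: `OTEL_RESOURCE_ATTRIBUTES` is parsed as comma-separated `key=value` pairs (the standard OpenTelemetry format, without the percent-decoding the specification also allows), and the function names are hypothetical. It does not reflect how the collector parses its own configuration.

```python
import os
from typing import Mapping, Optional


def parse_resource_attributes(raw: str) -> dict:
    """Parse 'k1=v1,k2=v2' into a dict; raises ValueError on malformed pairs."""
    attrs = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed resource attribute: {pair!r}")
        attrs[key.strip()] = value.strip()
    return attrs


def load_collector_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate the three variables from the table above and return them as a dict."""
    env = os.environ if env is None else env
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    if not endpoint:
        raise ValueError("OTEL_EXPORTER_OTLP_ENDPOINT is required")
    return {
        "endpoint": endpoint,
        "service_name": env.get("OTEL_SERVICE_NAME"),
        "resource_attributes": parse_resource_attributes(
            env.get("OTEL_RESOURCE_ATTRIBUTES", "")
        ),
    }


# Values mirror the Docker example earlier in this page.
example_env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4317",
    "OTEL_SERVICE_NAME": "my-app",
    "OTEL_RESOURCE_ATTRIBUTES": "deployment.environment=production",
}
print(load_collector_config(example_env))
```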

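## Integration with the SDKs

GPU Collector is typically used together with the multi-language SDKs that OpenLIT provides, forming a complete observability solution. The SDKs collect LLM call traces and metrics at the application level, while GPU Collector collects hardware metrics at the infrastructure level.

### Go SDK Configuration Example

The Go SDK supports custom pricing information and OTLP export configuration: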

```go
config := openlit.Config{
    OtlpEndpoint: "http://localhost:4318",
    OtlpHeaders: map[string]string{
        "Authorization": "Bearer token",
        "X-Custom-Header": "value",
    },
    PricingInfo: map[string]openlit.ModelPricing{
        "gpt-4-custom": {
            InputCostPerToken:  0.00003,
            OutputCostPerToken: 0.00006,
        },
    },
}

openlit.Init(config)
```

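Source: [sdk/go/README.md](https://github.com/openlit/openlit/blob/main/sdk/go/README.md)

### Integration Workflow

The complete integration workflow is:

1. Start the OpenLIT infrastructure stack (OpenTelemetry Collector, storage, and visualization components)
2. Deploy GPU Collector on the application server and configure its OTLP endpoint
3. Integrate the OpenLIT SDK into the application code and configure the same OTLP endpoint
4. View traces and metrics in the OpenLIT Dashboard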

```mermaid
graph LR
    A[Application code + SDK] -->|OTLP Traces| B[OpenTelemetry Collector]
    C[GPU Collector] -->|OTLP Metrics| B
    B --> D[OpenLIT Backend]
    D --> E[Dashboard]
```

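## Technical Dependencies

GPU Collector depends on the following components to run correctly:

| Component | Purpose | Platform support |
|-----|------|---------|
| NVML | NVIDIA GPU management interface | NVIDIA GPU |
| sysfs/hwmon | Linux hardware sensor interface | AMD, Intel GPU |
| eBPF | Kernel-level tracing | Linux 4.x+ |
| OpenTelemetry Collector | Receives metrics exported over OTLP | All platforms |

## License

OpenTelemetry GPU Collector is built and maintained by the OpenLIT team and is released under the Apache-2.0 open source license. Source: [opentelemetry-gpu-collector/README.md](https://github.com/openlit/openlit/blob/main/opentelemetry-gpu-collector/README.md)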

---


## Doramagic Pitfall Log

Project: openlit/openlit

Summary: 15 potential pitfalls were found, none of them high/blocking. Highest priority: installation pitfall, source evidence: Integration: Governance and compliance signals for LLM observability.

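## 1. Installation pitfall · Source evidence: Integration: Governance and compliance signals for LLM observability

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified installation-related issue in this project: Integration: Governance and compliance signals for LLM observability
- User impact: May raise the cost of trial and production adoption for new users.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_16e8a1979e4646f18ae6d36da1fd46fe | https://github.com/openlit/openlit/issues/1106 | Source type github_issue exposes usage conditions pending verification.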

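## 2. Installation pitfall · Source evidence: Proposal: gen_ai.agent.threat_detected span event helper for OTel-shaped detection observability

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified installation-related issue in this project: Proposal: gen_ai.agent.threat_detected span event helper for OTel-shaped detection observability
- User impact: May raise the cost of trial and production adoption for new users.
- Recommended check: The source issue is still open; the Pack Agent needs to re-check whether it still affects the current version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_9788255c9fb34a7eae64ba6413a52030 | https://github.com/openlit/openlit/issues/1186 | The source discussion mentions python-related conditions; re-check before installation or trial.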

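## 3. Installation pitfall · Source evidence: [Bug]: Docker Image doesn't run on windows 64bit

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified installation-related issue in this project: [Bug]: Docker Image doesn't run on windows 64bit
- User impact: May raise the cost of trial and production adoption for new users.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_e25a08120daf4deb81b9193aeab1f929 | https://github.com/openlit/openlit/issues/786 | The source discussion mentions docker-related conditions; re-check before installation or trial.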

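## 4. Installation pitfall · Source evidence: openlit-1.19.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified installation-related issue in this project: openlit-1.19.0
- User impact: May raise the cost of trial and production adoption for new users.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_0504e467960f4bbe919ff101c6a14d7b | https://github.com/openlit/openlit/releases/tag/openlit-1.19.0 | Source type github_release exposes usage conditions pending verification.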

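## 5. Configuration pitfall · Source evidence: controller-0.2.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified configuration-related issue in this project: controller-0.2.0
- User impact: May affect upgrades, migrations, or version selection.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_addec19eec37420da207487d5a685eaa | https://github.com/openlit/openlit/releases/tag/controller-0.2.0 | Source type github_release exposes usage conditions pending verification.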

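## 6. Configuration pitfall · Source evidence: openlit-1.20.0

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified configuration-related issue in this project: openlit-1.20.0
- User impact: May affect upgrades, migrations, or version selection.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_217968c917e9426f9f8fbb4b50bebdb5 | https://github.com/openlit/openlit/releases/tag/openlit-1.20.0 | Source type github_release exposes usage conditions pending verification.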

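## 7. Capability pitfall · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: If the assumption does not hold, users will not get the promised capability.
- Recommended check: Turn the assumption into a downstream verification checklist.
- Safeguard: Assumptions must become verification items; do not state them as fact before verification results exist.
- Evidence: capability.assumptions | github_repo:747319327 | https://github.com/openlit/openlit | README/documentation is current enough for a first validation pass.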

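## 8. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: `last_activity_observed` was not recorded.
- User impact: New, abandoned, and active projects get lumped together, which lowers confidence in recommendations.
- Recommended check: Add signals for recent GitHub commits, releases, and issue/PR responsiveness.
- Safeguard: While maintenance activity is unknown, the recommendation strength must not be marked as high trust.
- Evidence: evidence.maintainer_signals | github_repo:747319327 | https://github.com/openlit/openlit | last_activity_observed missing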

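## 9. Security/permissions pitfall · Downstream validation found a risk item

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: Downstream has already requested a review, so this must not be played down on the page.
- Recommended check: Send it to the security/permissions governance review queue.
- Safeguard: While a downstream risk exists, the review/recommendation must stay downgraded.
- Evidence: downstream_validation.risk_items | github_repo:747319327 | https://github.com/openlit/openlit | no_demo; severity=medium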

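## 10. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: The risk affects whether the project is suitable for ordinary users to install.
- Recommended check: Record the risk in the boundary card and confirm whether manual review is needed.
- Safeguard: Scoring risks must go into the boundary card and must not remain only an internal score.
- Evidence: risks.scoring_risks | github_repo:747319327 | https://github.com/openlit/openlit | no_demo; severity=medium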

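## 11. Security/permissions pitfall · Source evidence: Bug: OpenAI API key in operator example test-application is not using OPENAI_API_KEY env var

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified security/permissions-related issue in this project: Bug: OpenAI API key in operator example test-application is not using OPENAI_API_KEY env var
- User impact: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_bfba0945570d4cbbaead1257e8f70dfe | https://github.com/openlit/openlit/issues/1135 | The source discussion mentions python-related conditions; re-check before installation or trial.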

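## 12. Security/permissions pitfall · Source evidence: openlit-1.19.1

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified security/permissions-related issue in this project: openlit-1.19.1
- User impact: May raise the cost of trial and production adoption for new users.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_b5088506959947828f2d740f9297d5b5 | https://github.com/openlit/openlit/releases/tag/openlit-1.19.1 | Source type github_release exposes usage conditions pending verification.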

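## 13. Security/permissions pitfall · Source evidence: py-1.41.2

- Severity: medium
- Evidence strength: source_linked
- Finding: GitHub community evidence indicates an unverified security/permissions-related issue in this project: py-1.41.2
- User impact: May affect authorization, key configuration, or security boundaries.
- Recommended check: The source suggests a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Safeguard: Do not escalate this into a definitive conclusion detached from the source link; state the applicable version and review status.
- Evidence: community_evidence:github | cevd_ff3f4dfa2dc04616be73482b2145ac5c | https://github.com/openlit/openlit/releases/tag/py-1.41.2 | The source discussion mentions docker-related conditions; re-check before installation or trial.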

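## 14. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot tell whether anyone will maintain the project when they run into a problem.
- Recommended check: Sample recent issues/PRs to judge whether they go unhandled for long periods.
- Safeguard: While issue/PR responsiveness is unknown, a maintenance risk notice is required.
- Evidence: evidence.maintainer_signals | github_repo:747319327 | https://github.com/openlit/openlit | issue_or_pr_quality=unknown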

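## 15. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: Install commands and documentation may lag behind the code, raising the chance that users hit problems.
- Recommended check: Confirm that the latest release/tag matches the install commands in the README.
- Safeguard: When the release cadence is unknown or stale, installation instructions must note possible drift.
- Evidence: evidence.maintainer_signals | github_repo:747319327 | https://github.com/openlit/openlit | release_recency=unknown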

<!-- canonical_name: openlit/openlit; human_manual_source: deepwiki_human_wiki -->