# https://github.com/chroma-core/chroma Project Documentation

Generated: 2026-05-15 23:02:55 UTC

## Table of Contents

- [Chroma Overview](#chroma-overview)
- [Getting Started with Chroma](#getting-started)
- [System Architecture Overview](#architecture-overview)
- [Protocol Buffers & gRPC API](#protocol-buffers-api)
- [Python Client SDK](#python-client-sdk)
- [JavaScript/TypeScript Client SDKs](#javascript-client-sdk)
- [Rust Backend Services Architecture](#rust-services-architecture)
- [Go Coordinator & Distributed Systems](#go-coordinator)
- [Data Storage & Blockstore](#data-storage-blockstore)
- [Embedding Functions Integration](#embedding-functions)

<a id='chroma-overview'></a>

## Chroma Overview

### Related Pages

Related topics: [Getting Started with Chroma](#getting-started), [System Architecture Overview](#architecture-overview)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/chroma-core/chroma/blob/main/README.md)
- [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)
- [clients/new-js/packages/chromadb/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/README.md)
- [rust/types/src/metadata.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/metadata.rs)
- [rust/types/src/api_types.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/api_types.rs)
- [rust/types/src/execution/operator.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/execution/operator.rs)
- [examples/deployments/do-terraform/README.md](https://github.com/chroma-core/chroma/blob/main/examples/deployments/do-terraform/README.md)
</details>

# Chroma Overview

## Introduction

Chroma is an open-source data infrastructure platform designed specifically for AI applications. It provides the building blocks for storing, querying, and managing vector embeddings and their metadata, so developers can build AI applications with efficient similarity search. Source: [README.md:1]()

As an open-source project, Chroma can be self-hosted; it is also available as a hosted service, Chroma Cloud, which provides serverless vector, hybrid, and full-text search. The platform is designed to be fast, cost-effective, scalable, and easy to deploy. Source: [README.md:17-21]()

## Architecture Overview

Chroma follows a client-server architecture with multiple client libraries available for different programming environments. The system is built with Rust for core performance-critical components and provides idiomatic client libraries for Python and JavaScript/TypeScript.

```mermaid
graph TD
    A[Client Applications] --> B[Python Client / JS Client]
    B --> C[Chroma Server API]
    C --> D[Worker Nodes]
    D --> E[Blockstore<br/>Arrow Storage]
    D --> F[Compaction &<br/>Log Processing]
    E --> G[Persistent Storage]
    
    H[Chroma Cloud] -.->|Optional hosted| C
```

### Client Libraries

Chroma provides client libraries for Python and JavaScript/TypeScript, including a lightweight HTTP-only Python package:

| Client | Package | Description |
|--------|---------|-------------|
| Python | `chromadb` | Full-featured Python client library. Source: [clients/python/README.md:1]() |
| Python HTTP | `chromadb-client` | Lightweight HTTP-only client for connecting to a Chroma server. Source: [clients/python/README.md:12]() |
| JavaScript/TypeScript | `chromadb` (npm) | Full-featured JS client for Node.js and the browser. Source: [clients/new-js/packages/chromadb/README.md:1]() |

#### Python Client Installation

```bash
pip install chromadb  # Full client library
pip install chromadb-client  # HTTP client only
```

#### JavaScript Client Example

```javascript
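import { ChromaClient } from "chromadb";

// Connects to a Chroma server at the default local address.
const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });

// Add 20 records, one per call, each with a precomputed 5-dimensional embedding.
for (let i = 0; i < 20; i++) {
  await collection.add({
    ids: ["test-id-" + i.toString()],
    embeddings: [[1, 2, 3, 4, 5]],
    documents: ["test"],
  });
}

// Query by embedding. queryTexts could be used instead (embedded by the
// collection's embedding function); supply one or the other, not both.
const queryData = await collection.query({
  queryEmbeddings: [[1, 2, 3, 4, 5]],
  nResults: 5,
});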
```

Source: [clients/new-js/packages/chromadb/README.md:9-27]()

## Data Model

### Collection Structure

Collections in Chroma serve as the primary organizational unit for storing related documents and their associated embeddings. Each collection contains:

- **Documents**: The textual content to be embedded
- **Embeddings**: Vector representations of documents
- **Metadatas**: Key-value pairs for filtering and categorization
- **Unique Identifiers**: User-provided IDs for each record. Source: [clients/python/README.md:16-27]()

### Metadata Filtering

Chroma supports rich metadata filtering through operators that enable precise data retrieval:

```mermaid
graph LR
    A[Query Request] --> B[Metadata Filter]
    B --> C{Operator Type}
    C -->|Contains| D[String contains check]
    C -->|NotContains| E[String excludes check]
    C -->|Regex| F[Regular expression match]
    C -->|NotRegex| G[Regex exclusion]
```

**Supported Document Operators:**

| Operator | Description | Example |
|----------|-------------|---------|
| `Contains` | Document contains substring | `{"$contains": "keyword"}` |
| `NotContains` | Document excludes substring | `{"$not_contains": "spam"}` |
| `Regex` | Regular expression match | `{"$regex": "^prefix.*"}` |
| `NotRegex` | Exclude by regex pattern | `{"$not_regex": ".*suffix$"}` |

Source: [rust/types/src/metadata.rs:1-30]()
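The four operators can be sketched in plain Python to show their semantics. In the Python client, filters like these are passed as the `where_document` argument of `get` and `query`; the evaluator below is a hypothetical local re-implementation, not Chroma's Rust code, and its handling of missing documents and regex anchoring are assumptions.

```python
import re
from typing import Dict, Optional


def matches_where_document(document: Optional[str], where_document: Dict[str, str]) -> bool:
    """Evaluate a single-operator document filter such as {"$contains": "keyword"}.

    Sketch only: logical combinators ($and/$or) are not handled, regexes use
    search semantics (a match anywhere in the text), and a record without a
    document never matches.
    """
    if len(where_document) != 1:
        raise ValueError("expected exactly one operator")
    op, value = next(iter(where_document.items()))
    if document is None:
        return False
    if op == "$contains":
        return value in document
    if op == "$not_contains":
        return value not in document
    if op == "$regex":
        return re.search(value, document) is not None
    if op == "$not_regex":
        return re.search(value, document) is None
    raise ValueError(f"unsupported operator: {op}")


docs = {"a": "prefix and keyword", "b": "spam message", "c": "ends with suffix"}
hits = [doc_id for doc_id, text in docs.items()
        if matches_where_document(text, {"$contains": "keyword"})]
```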

### Search Keys

The query system supports specialized keys for accessing different aspects of stored data:

| Key | Description | Usage |
|-----|-------------|-------|
| `#document` | Full text content | `Key::Document` |
| `#embedding` | Vector embeddings | `Key::Embedding` |
| `#metadata` | Record metadata | `Key::Metadata` |
| `#score` | Similarity score | `Key::Score` |
| Custom fields | User-defined metadata | `Key::field("field_name")` |

Source: [rust/types/src/execution/operator.rs:1-80]()
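A minimal Python sketch of how such keys could be resolved against a stored record, assuming (hypothetically) that reserved keys start with `#` and any other key names a user metadata field, as the `Key::field("field_name")` row suggests. The record layout and the `resolve_key` helper are illustrative only.

```python
from typing import Any, Dict, Optional

# Reserved keys from the table above, mapped to fields of a hypothetical record dict.
RESERVED_KEYS = {
    "#document": "document",
    "#embedding": "embedding",
    "#metadata": "metadata",
    "#score": "score",
}


def resolve_key(record: Dict[str, Any], key: str) -> Optional[Any]:
    """Return the value a search key refers to, or None if it is absent."""
    if key in RESERVED_KEYS:
        return record.get(RESERVED_KEYS[key])
    # Any other key is treated as a user-defined metadata field.
    return (record.get("metadata") or {}).get(key)


record = {
    "document": "hello world",
    "embedding": [0.1, 0.2, 0.3],
    "metadata": {"source": "notion", "year": 2024},
    "score": 0.87,
}
```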

## Core Components

### Storage Layer

The blockstore is the underlying storage layer. It stores data in blocks serialized in the Apache Arrow columnar format, which supports efficient storage and retrieval across large datasets. Source: [rust/blockstore/src/arrow/root.rs:1]()

### Execution Operators

Chroma's query execution pipeline uses operators that transform and filter data through well-defined stages:

```mermaid
graph TD
    A[Query Request] --> B[Log Fetch Orchestrator]
    B --> C[KNN Filter]
    C --> D[Apply Logs Orchestrator]
    D --> E[Segment Writers]
    E --> F[Compact Collection]
```

**Key Orchestrators:**

| Component | Purpose |
|-----------|---------|
| `LogFetchOrchestrator` | Fetches and materializes log entries. Source: [rust/worker/src/execution/orchestration/log_fetch_orchestrator.rs:1]() |
| `KnnFilter` | Performs k-nearest-neighbor filtering. Source: [rust/worker/src/execution/orchestration/knn_filter.rs:1]() |
| `ApplyLogsOrchestrator` | Applies log entries to segment writers. Source: [rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1]() |

### Error Handling

The system reports failures with a consistent set of error codes:

| Error Code | Description |
|------------|-------------|
| `InvalidArgument` | Client-provided invalid parameters |
| `Internal` | System-level internal errors |
| `ResourceExhausted` | Resource limits reached (e.g., task abortion) |

Source: [rust/blockstore/src/arrow/block/types.rs:1-20]()

## Deployment Options

### Self-Hosting

Chroma can be deployed on-premises or in cloud environments using Docker, Kubernetes, or direct installation.

**Deployment Requirements:**

| Component | Specification |
|-----------|---------------|
| Storage | Persistent volume for vector data |
| Network | Port 8000 for API access |
| Auth | Optional token or basic authentication (v0.4.7+) |

Source: [examples/deployments/do-terraform/README.md:1-50]()

**Starting the Server:**

```bash
# Install via pip
pip install chromadb

# Run in client-server mode
chroma run --path /chroma_db_path
```

Source: [README.md:14-16]()

### Chroma Cloud

Chroma Cloud provides a fully managed hosted service with:

- Serverless vector search
- Hybrid search capabilities
- Full-text search integration
- Automatic scaling
- $5 free credits for new users

Source: [README.md:23-29]()

### Cloud Deployment (Terraform Example)

For DigitalOcean deployment:

```bash
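# Terraform reads input variables only from TF_VAR_-prefixed environment variables.
export TF_VAR_do_token="<DIGITALOCEAN_TOKEN>"  # replace with your DigitalOcean API token
export TF_VAR_ssh_public_key="./chroma-do.pub"
export TF_VAR_ssh_private_key="./chroma-do"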
export TF_VAR_chroma_release="0.4.12"
export TF_VAR_region="ams2"
export TF_VAR_public_access="true"
export TF_VAR_enable_auth="true"
export TF_VAR_auth_type="token"

terraform apply -auto-approve
```

Source: [examples/deployments/do-terraform/README.md:30-45]()

## CLI Tool

The Rust-based CLI provides command-line management capabilities:

```bash
chroma run --path <db_path>     # Run the server
chroma db create <db_name>      # Create database
chroma db list                  # List databases
chroma login                    # Authenticate with Chroma Cloud
chroma profile                  # Manage profiles
chroma install                  # Install updates
chroma update                   # Check for updates
```

Source: [rust/cli/src/lib.rs:1-30]()

## Embedding Integration

### Ollama Integration

The JavaScript client supports Ollama for local embedding generation:

**Configuration Options:**

| Option | Default | Description |
|--------|---------|-------------|
| `url` | `http://localhost:11434` | Ollama server URL |
| `model` | `chroma/all-minilm-l6-v2-f32` | Embedding model |

**Supported Models:**

| Model | Dimensions | Use Case |
|-------|------------|----------|
| `chroma/all-minilm-l6-v2-f32` | 384 | General purpose (default) |
| `nomic-embed-text` | 768 | Extended context |
| `mxbai-embed-large` | 1024 | High accuracy |
| `snowflake-arctic-embed` | Variable | Domain-specific |

Source: [clients/new-js/packages/ai-embeddings/ollama/README.md:1-40]()

## API Response Format

### Get Response Structure

A `get` call returns a `GetResponse`. `ids` is always present; the optional fields are populated only when requested through `include`:

```rust
pub struct GetResponse {
    pub ids: Vec<String>,
    pub embeddings: Option<Vec<Vec<f32>>>,      // Optional
    pub documents: Option<Vec<Option<String>>>, // Optional
    pub uris: Option<Vec<Option<String>>>,      // Optional
    pub metadatas: Option<Vec<Option<Metadata>>>, // Optional
    pub include: IncludeList,
}
```

Source: [rust/types/src/api_types.rs:1-30]()
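To make the optional-column behavior concrete, here is a small Python sketch that mirrors the struct above. The include names (`"documents"`, `"metadatas"`, and so on) follow the Python client's `include` argument; the record layout and the `build_get_response` helper are hypothetical, not Chroma's server code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Maps Python-client include names to keys of the hypothetical record dicts below.
INCLUDE_TO_FIELD = {
    "embeddings": "embedding",
    "documents": "document",
    "uris": "uri",
    "metadatas": "metadata",
}


@dataclass
class GetResponseSketch:
    """Python mirror of the Rust GetResponse fields shown above."""
    ids: List[str]
    embeddings: Optional[List[List[float]]] = None
    documents: Optional[List[Optional[str]]] = None
    uris: Optional[List[Optional[str]]] = None
    metadatas: Optional[List[Optional[Dict[str, Any]]]] = None
    include: List[str] = field(default_factory=list)


def build_get_response(records: List[Dict[str, Any]], include: List[str]) -> GetResponseSketch:
    """Fill only the columns named in `include`; other optional columns stay None."""
    columns = {
        name: ([r.get(key) for r in records] if name in include else None)
        for name, key in INCLUDE_TO_FIELD.items()
    }
    return GetResponseSketch(ids=[r["id"] for r in records], include=list(include), **columns)


records = [
    {"id": "doc1", "document": "This is document1", "metadata": {"source": "notion"}},
    {"id": "doc2", "document": "This is document2", "metadata": None},
]
resp = build_get_response(records, ["documents", "metadatas"])
```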

## License

Chroma is released under the Apache 2.0 license, which permits use in both commercial and open-source projects. Source: [README.md:10]()

## Community and Support

| Resource | Link |
|----------|------|
| Documentation | https://docs.trychroma.com/ |
| Discord | https://discord.gg/MMeYNTmh3x |
| Homepage | https://www.trychroma.com/ |

---

<a id='getting-started'></a>

## Getting Started with Chroma

### Related Pages

Related topics: [Chroma Overview](#chroma-overview), [Python Client SDK](#python-client-sdk)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)
- [clients/new-js/packages/chromadb/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/README.md)
- [clients/js/packages/chromadb-client/README.md](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-client/README.md)
- [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)
- [chromadb/utils/embedding_functions/schemas/README.md](https://github.com/chroma-core/chroma/blob/main/chromadb/utils/embedding_functions/schemas/README.md)
- [README.md](https://github.com/chroma-core/chroma/blob/main/README.md)
- [rust/chroma/README.md](https://github.com/chroma-core/chroma/blob/main/rust/chroma/README.md)
</details>

# Getting Started with Chroma

Chroma is an open-source data infrastructure for AI that provides vector, hybrid, and full-text search capabilities. It enables developers to build AI applications by storing embeddings, documents, and metadata with efficient querying mechanisms.

## Overview

Chroma serves as a vector database optimized for AI workloads. It allows you to:

- Store embeddings alongside documents and metadata
- Query using text or embedding vectors
- Filter results based on metadata
- Work with multiple programming languages including Python and JavaScript

## Installation

### Python Client

Install the Python client using pip:

```bash
pip install chromadb
```

For a lightweight HTTP-only client that connects to a Chroma server:

```bash
pip install chromadb-client
```

Source: [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)

### JavaScript/TypeScript Client

For the new JavaScript client:

```bash
npm install chromadb
```

For a lighter package with optional dependencies:

```bash
npm install chromadb-client
```

Source: [clients/new-js/packages/chromadb/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/README.md)

## Basic Setup and Configuration

### Python Client Setup

Connect to a Chroma server running locally:

```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
```

Source: [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)

### JavaScript Client Setup

```javascript
import { ChromaClient } from "chromadb";

const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });
```

Source: [clients/new-js/packages/chromadb/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/README.md)

### Running Chroma Server

To run Chroma in client-server mode:

```bash
chroma run --path /chroma_db_path
```

Source: [README.md](https://github.com/chroma-core/chroma/blob/main/README.md)

## Core Operations

### Creating a Collection

Collections are containers for your documents, embeddings, and metadata.

```python
collection = client.create_collection("all-my-documents")
```

### Adding Documents

Add documents with unique IDs (required) plus optional embeddings and metadata:

```python
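collection.add(
    documents=["This is document1", "This is document2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc2"],
    # Optional precomputed embeddings (illustrative values), one per document and
    # all the same length. Omit them to let the collection's embedding function embed the documents.
    embeddings=[[1.2, 2.1, 0.5], [1.2, 2.1, 0.7]],
)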
```

Source: [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)

### Querying Documents

Query the collection using text or embeddings:

```python
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2
)
```

```javascript
const queryData = await collection.query({
    queryEmbeddings: [[1, 2, 3, 4, 5]], // or queryTexts: ["test"]; supply one, not both
    nResults: 2,
});
```

Sources: [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md) and [clients/new-js/packages/chromadb/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/README.md)

## Embedding Functions

Chroma supports various embedding providers through configurable embedding functions.

### Configuration Schema

Embedding functions use JSON Schema validation to ensure cross-language compatibility:

```python
from chromadb.utils.embedding_functions.schemas import validate_config

config = {
    "api_key_env_var": "CHROMA_OPENAI_API_KEY",
    "model_name": "text-embedding-ada-002"
}
validate_config(config, "openai")
```

Each schema follows the JSON Schema Draft-07 specification and declares a version, title, description, properties, required fields, and an `additionalProperties` setting.

Source: [chromadb/utils/embedding_functions/schemas/README.md](https://github.com/chroma-core/chroma/blob/main/chromadb/utils/embedding_functions/schemas/README.md)
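To illustrate what these Draft-07 fields enforce, here is a deliberately small Python checker for three of them: `required`, per-property `type`, and `additionalProperties: false`. The schema dict is hypothetical and simplified, not the repository's actual `openai` schema; in practice a full JSON Schema validator should be used.

```python
from typing import Any, Dict, List

# Hypothetical, simplified schema in the Draft-07 shape described above.
OPENAI_SCHEMA_SKETCH = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "version": "1.0.0",
    "title": "openai",
    "description": "Illustrative schema for an OpenAI embedding function config",
    "type": "object",
    "properties": {
        "api_key_env_var": {"type": "string"},
        "model_name": {"type": "string"},
    },
    "required": ["api_key_env_var", "model_name"],
    "additionalProperties": False,
}

# JSON Schema type names mapped to Python types (note: bool is a subclass of int).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
               "boolean": bool, "object": dict, "array": list}


def check_config(config: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    for name in schema.get("required", []):
        if name not in config:
            errors.append(f"missing required property: {name}")
    props = schema.get("properties", {})
    for name, value in config.items():
        if name not in props:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected property: {name}")
            continue
        expected = props[name].get("type")
        if expected and not isinstance(value, _JSON_TYPES[expected]):
            errors.append(f"{name}: expected {expected}")
    return errors
```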

### Available Embedding Providers

| Provider | Package | API Key Environment Variable |
|----------|---------|------------------------------|
| OpenAI | `@chroma-core/openai` | `CHROMA_OPENAI_API_KEY` |
| Cohere | `@chroma-core/cohere` | `COHERE_API_KEY` |
| Jina | `@chroma-core/jina` | `JINA_API_KEY` |
| Google Gemini | `@chroma-core/google-gemini` | `GOOGLE_API_KEY` |
| Hugging Face | `@chroma-core/hugging-face` | `HF_API_KEY` |
| Ollama | `@chroma-core/ollama` | `OLLAMA_API_KEY` |
| Together AI | `@chroma-core/together-ai` | `TOGETHER_API_KEY` |
| Voyage AI | `@chroma-core/voyageai` | `VOYAGE_API_KEY` |
| xAI | `@chroma-core/xai` | `XAI_API_KEY` |

Source: [clients/new-js/packages/ai-embeddings/all/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/all/README.md)

### Using Embedding Functions

```typescript
import { ChromaClient } from 'chromadb';
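import { JinaEmbeddingFunction } from '@chroma-core/jina';

// Connects to a Chroma server at the default local address.
const client = new ChromaClient();

const embedder = new JinaEmbeddingFunction({
    apiKey: 'your-api-key', // placeholder: substitute your Jina API key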
    modelName: 'jina-embeddings-v2-base-en',
    task: 'retrieval.passage',
    dimensions: 768,
    lateChunking: false,
    truncate: true,
    normalized: true,
    embeddingType: 'float'
});

const collection = await client.createCollection({
    name: 'my-collection',
    embeddingFunction: embedder,
});
```

Source: [clients/new-js/packages/ai-embeddings/jina/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/jina/README.md)

### Common Utilities

The `@chroma-core/ai-embeddings-common` package provides shared utilities:

```typescript
import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';

// Convert camelCase to snake_case
const snakeCaseConfig = snakeCase({ modelName: 'text-embedding-3-small' });
// Result: { model_name: 'text-embedding-3-small' }

// Check environment
if (isBrowser()) {
    // Browser-specific logic
}
```

Source: [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)
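The camelCase-to-snake_case conversion is what lets a JavaScript config such as `{ modelName }` line up with the snake_case keys used in the shared schemas. A rough Python analogue is sketched below; it is not the package's implementation, it only converts top-level keys (whether `snakeCase` recurses into nested objects is not stated here), and consecutive capitals such as `URL` would be split letter by letter.

```python
import re
from typing import Any, Dict


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier such as 'modelName' to 'model_name'."""
    # Insert "_" before every uppercase letter except at the start, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_case_keys(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with top-level keys converted to snake_case."""
    return {camel_to_snake(key): value for key, value in config.items()}
```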

## JavaScript Client Packages

### chromadb vs chromadb-client

| Feature | `chromadb` | `chromadb-client` |
|---------|------------|-------------------|
| Package size | Larger | Smaller |
| Dependencies | Bundled | Optional peer dependencies |
| Use case | Quick setup | Production with specific providers |

The `chromadb-client` package is ideal for production environments where you only use specific embedding providers.

Source: [clients/js/packages/chromadb-client/README.md](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-client/README.md)

## Chroma Cloud

Chroma Cloud provides a hosted service for serverless vector, hybrid, and full-text search. To use Chroma Cloud:

1. Sign up at [trychroma.com](https://trychroma.com/signup)
2. Create a database
3. Get your API key from the dashboard

Configure environment variables for cloud access:

```bash
export CHROMA_API_KEY=your-api-key
export CHROMA_TENANT=your-tenant
export CHROMA_DATABASE=your-database
```

Sources: [README.md](https://github.com/chroma-core/chroma/blob/main/README.md) and [rust/chroma/README.md](https://github.com/chroma-core/chroma/blob/main/rust/chroma/README.md)

## Environment Variables

| Variable | Description |
|----------|-------------|
| `CHROMA_API_KEY` | API key for Chroma Cloud authentication |
| `CHROMA_TENANT` | Sets the tenant (auto-inferred with API key) |
| `CHROMA_DATABASE` | Sets the database (auto-inferred with scoped API key) |
| `[PROVIDER]_API_KEY` | Provider-specific API keys (e.g., `OPENAI_API_KEY`) |

To build a Rust client from these environment variables:

```rust
let client = ChromaHttpClient::from_env()?;
```

Source: [rust/chroma/README.md](https://github.com/chroma-core/chroma/blob/main/rust/chroma/README.md)
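A Python sketch of reading the same variables. The helper name and the returned dict are hypothetical; the sketch only illustrates that the API key is required while tenant and database may be left unset for the service to infer, as the table notes.

```python
import os
from typing import Dict, Mapping, Optional


def cloud_settings_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect Chroma Cloud settings from environment variables.

    CHROMA_API_KEY is required; CHROMA_TENANT and CHROMA_DATABASE are optional
    and returned as None when unset.
    """
    api_key = env.get("CHROMA_API_KEY")
    if not api_key:
        raise KeyError("CHROMA_API_KEY is not set")
    return {
        "api_key": api_key,
        "tenant": env.get("CHROMA_TENANT"),
        "database": env.get("CHROMA_DATABASE"),
    }
```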

## Complete Example Workflow

```mermaid
graph TD
    A[Install Chroma Client] --> B[Initialize Client]
    B --> C[Create Collection]
    C --> D[Add Documents with Embeddings]
    D --> E[Query Collection]
    E --> F[Get Results]
    
    G[Configure Embedding Function] --> D
    H[Add Metadata] --> D
    I[Set API Keys] --> B
```

## Quick Reference Commands

### Installation

```bash
# Python
pip install chromadb

# JavaScript
npm install chromadb

# Start server
chroma run --path /chroma_db_path
```

### Basic Operations

| Operation | Python | JavaScript |
|-----------|--------|------------|
| Create client | `client = chromadb.HttpClient()` | `new ChromaClient()` |
| Create collection | `client.create_collection(name)` | `client.createCollection({name})` |
| Add documents | `collection.add(...)` | `collection.add(...)` |
| Query | `collection.query(...)` | `collection.query(...)` |

## Additional Resources

- [Documentation](https://docs.trychroma.com/)
- [Community Discord](https://discord.gg/MMeYNTmh3x)
- [GitHub Repository](https://github.com/chroma-core/chroma)
- [Homepage](https://www.trychroma.com/)

---

<a id='architecture-overview'></a>

## System Architecture Overview

### Related Pages

Related topics: [Rust Backend Services Architecture](#rust-services-architecture), [Go Coordinator & Distributed Systems](#go-coordinator), [Protocol Buffers & gRPC API](#protocol-buffers-api)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [rust/frontend/src/server.rs](https://github.com/chroma-core/chroma/blob/main/rust/frontend/src/server.rs)
- [rust/worker/src/server.rs](https://github.com/chroma-core/chroma/blob/main/rust/worker/src/server.rs)
- [rust/sysdb/src/sysdb.rs](https://github.com/chroma-core/chroma/blob/main/rust/sysdb/src/sysdb.rs)
- [rust/types/src/lib.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/lib.rs)
- [docs/mintlify/reference/architecture/overview.mdx](https://github.com/chroma-core/chroma/blob/main/docs/mintlify/reference/architecture/overview.mdx)
</details>

# System Architecture Overview

## Introduction

Chroma is an open-source data infrastructure platform for AI applications, providing vector, hybrid, and full-text search. It is built as a distributed, scalable system whose components handle embedding storage, indexing, and query execution. As an open-source alternative to hosted vector database services, it lets developers deploy AI search infrastructure while keeping full control over their data.

The architecture follows a modular design pattern with distinct components for API serving, query processing, data storage, and system coordination. Each component is responsible for specific aspects of the data pipeline, from ingestion through indexing to query execution.

## High-Level Architecture

Chroma's architecture consists of three primary layers working in concert to provide vector search capabilities:

1. **Frontend Layer** - Handles API requests and response formatting
2. **Worker Layer** - Executes query operations and manages indexing
3. **System Database (SysDB) Layer** - Maintains metadata and system state

```mermaid
graph TD
    A[Client Application] --> B[Frontend Server]
    B --> C[Worker Servers]
    C --> D[SysDB]
    C --> E[Blockstore]
    E --> F[Arrow Files]
    D --> G[Collection Metadata]
    G --> H[Topology Information]
```

## Component Architecture

### Frontend Server

The frontend server component serves as the API gateway for Chroma, handling incoming HTTP/gRPC requests and translating them into internal operations. The frontend is responsible for request validation, authentication handling, and response serialization.

**Key Responsibilities:**

| Responsibility | Description |
|----------------|-------------|
| API Endpoint Handling | Exposes REST and gRPC endpoints for collection operations |
| Request Validation | Validates incoming query parameters and payload structures |
| Response Serialization | Converts internal data structures to API response formats |
| Error Mapping | Translates internal errors to appropriate HTTP status codes |

Source: [rust/frontend/src/server.rs:1-50]()

Frontend errors implement the `ChromaError` trait, which gives every error a consistent error code. These codes map to HTTP status codes as follows:

| Internal Error | HTTP Status Code |
|----------------|------------------|
| InvalidArgument | 400 Bad Request |
| NotFound | 404 Not Found |
| Internal | 500 Internal Server Error |
| Unavailable | 503 Service Unavailable |
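The table can be read as a simple lookup from error code to HTTP status. This Python sketch covers only the four codes listed; the exception class, the `handle` helper, and the fallback to 500 for unlisted codes are assumptions for illustration, not the frontend's actual code.

```python
from typing import Tuple

# Internal error codes from the table above, keyed by name.
HTTP_STATUS_BY_CODE = {
    "InvalidArgument": 400,
    "NotFound": 404,
    "Internal": 500,
    "Unavailable": 503,
}


def to_http_status(code: str) -> int:
    """Translate an internal error code name to an HTTP status.

    Codes not listed in the table fall back to 500 (an assumption).
    """
    return HTTP_STATUS_BY_CODE.get(code, 500)


class ChromaErrorSketch(Exception):
    """Hypothetical stand-in for an error type implementing the ChromaError trait."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


def handle(exc: ChromaErrorSketch) -> Tuple[int, str]:
    """Turn an error into an (HTTP status, message) pair for the response."""
    return to_http_status(exc.code), str(exc)
```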

### Worker Server

The worker server handles the core data operations including embedding storage, indexing, and query execution. Workers are the primary compute units in Chroma's architecture, responsible for processing search requests and maintaining index structures.

Source: [rust/worker/src/server.rs:1-60]()

**Worker Components:**

```mermaid
graph LR
    A[Query Request] --> B[Query Planner]
    B --> C[HNSW Index]
    B --> D[Spann Index]
    B --> E[Record Segment]
    B --> F[Metadata Segment]
    C --> G[Result Merger]
    D --> G
    E --> G
    F --> G
    G --> H[Response]
```

The worker server implements orchestration components for managing complex operations:

- **ApplyLogsOrchestrator** - Coordinates log application and compaction
- **WorkQueueClient** - Manages distributed task execution
- **Segment Writers** - Handles data persistence for different segment types

Source: [rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-80]()

### System Database (SysDB)

The SysDB component maintains all metadata about collections, segments, and system topology. It provides a centralized view of the system's state and enables coordination across multiple workers.

**SysDB Responsibilities:**

| Function | Description |
|----------|-------------|
| Collection Metadata | Stores collection configurations and schemas |
| Segment Registry | Tracks active segments and their locations |
| Topology Management | Manages provider-region mappings for distributed deployments |
| Transaction Coordination | Ensures consistency across distributed operations |

Source: [rust/sysdb/src/sysdb.rs:1-100]()

The SysDB uses a provider-region topology model that supports multi-cloud and multi-region deployments:

```rust
pub struct ProviderRegion<T> {
    name: RegionName,
    provider: String,      // e.g., "aws", "gcp"
    region: String,        // e.g., "us-east-1"
    config: T,             // Provider-specific configuration
}
```

Source: [rust/types/src/topology.rs:1-60]()

## Data Model Architecture

### Collection Schema

Collections in Chroma follow a flexible schema model that supports multiple index types and data fields.

```mermaid
graph TD
    A[Collection] --> B[Record Segment]
    A --> C[Metadata Segment]
    A --> D[Vector Index]
    A --> E[Sparse Vector Index]
    D --> F[HNSW Index]
    D --> G[Spann Index]
```

**Supported Index Types:**

| Index Type | Purpose | Key Configuration |
|------------|---------|-------------------|
| Vector Index | Dense embeddings | `Space` (Cosine, L2, Dot), HNSW params |
| Sparse Vector Index | BM25-style inverted index | StringInvertedIndexConfig |
| Spann Index | Memory-efficient approximate search | InternalSpannConfiguration |

Source: [rust/types/src/collection_schema.rs:1-150]()

### API Types

The API layer defines core types for query operations:

| Type | Purpose |
|------|---------|
| `Include` | Specifies which fields to return (distances, documents, embeddings, metadatas, uris) |
| `IncludeList` | Collection of Include values with convenience constructors |
| `WhereDocumentOperator` | Document filtering (Contains, NotContains, Regex, NotRegex) |

Source: [rust/types/src/api_types.rs:1-100]()

```rust
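/// Fields a `get` or `query` call can return.
pub enum Include {
    Distance,
    Document,
    Embedding,
    Metadata,
    Uri,
}

/// A list of `Include` values with convenience constructors.
/// (Simplified: the real type may carry extra derives and attributes.)
pub struct IncludeList(pub Vec<Include>);

impl IncludeList {
    /// Default for queries: documents, metadata, and distances.
    pub fn default_query() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance])
    }

    /// Every field, including embeddings and URIs.
    pub fn all() -> Self {
        Self(vec![
            Include::Document,
            Include::Metadata,
            Include::Distance,
            Include::Embedding,
            Include::Uri,
        ])
    }
}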
```

### Metadata Filtering

Chroma supports rich metadata filtering through the `MetadataExpression` and `MetadataComparison` types:

```mermaid
graph TD
    A[MetadataExpression] --> B[key: String]
    A --> C[comparison: MetadataComparison]
    C --> D[Primitive: Operator + Value]
    C --> E[Set: Operator + SetValue]
```

Source: [rust/types/src/metadata.rs:1-80]()
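A Python sketch of the primitive/set split, using the operator names the Python client accepts in its `where` argument (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte` for primitive comparisons; `$in`, `$nin` for set comparisons). The evaluator itself is hypothetical and simplified: it handles one key at a time, omits `$and`/`$or`, and assumes a record missing the key never matches.

```python
import operator
from typing import Any, Callable, Dict

# Primitive comparisons: one operator applied to a single value.
PRIMITIVE_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}
# Set comparisons: membership of the stored value in a list of values.
SET_OPS: Dict[str, Callable[[Any, list], bool]] = {
    "$in": lambda actual, values: actual in values,
    "$nin": lambda actual, values: actual not in values,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Evaluate a single {key: comparison} filter against one record's metadata."""
    if len(where) != 1:
        raise ValueError("expected exactly one key")
    key, comparison = next(iter(where.items()))
    if not isinstance(comparison, dict):
        comparison = {"$eq": comparison}  # shorthand {"key": value} means $eq
    op, value = next(iter(comparison.items()))
    if key not in metadata:
        return False  # assumption: missing keys never match
    if op in SET_OPS:
        return SET_OPS[op](metadata[key], value)
    if op in PRIMITIVE_OPS:
        return PRIMITIVE_OPS[op](metadata[key], value)
    raise ValueError(f"unsupported operator: {op}")
```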

## Blockstore Architecture

The blockstore provides persistent storage for indexed data using Apache Arrow format for efficient serialization and querying.

### Arrow Block Structure

```mermaid
graph LR
    A[Write Operation] --> B[Block Delta]
    B --> C[Commit to Block]
    C --> D[Arrow IPC Format]
    D --> E[Disk Storage]
    E --> F[BlockfileReader]
```

**Block Types:**

| Block Type | Description |
|------------|-------------|
| `OrderedBlockDelta` | Sequential writes with ordering guarantees |
| `UnorderedBlockDelta` | High-throughput writes without ordering |
| `DirectoryBlock` | Sparse posting directory entries |

Source: [rust/blockstore/src/arrow/block/types.rs:1-100]()

Arrow layout verification checks serialized blocks for structural problems and reports them with the following error variants:

```rust
pub enum ArrowLayoutVerificationError {
    BufferLengthNotAligned,
    NoRecordBatches,
    MultipleRecordBatches,
    InvalidMessageType,
    RecordBatchDecodeError,
}
```

### Sparse Posting Blocks

Sparse vectors use a specialized block format for efficient storage:

```
body = [ max_offset: u32 LE, max_weight: f32 LE ] × num_entries
```

The `DirectoryBlock` stores per-posting-block metadata for term pruning:

- `max_offset`: Largest document offset in the posting block
- `max_weight`: Largest weight in the posting block

Source: [rust/types/src/sparse_posting_block.rs:1-60]()
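The directory body layout can be exercised with Python's `struct` module: `"<If"` packs a little-endian `u32` followed by an `f32`, 8 bytes per entry with no padding. Only the body is modeled, not any header the real block may carry. The pruning helper is an assumption about how `max_weight` might be used: a block-max style upper bound where a block is skipped when query weight times `max_weight` cannot beat the current score threshold (valid for non-negative weights).

```python
import struct
from typing import List, Tuple

# One directory entry: u32 max_offset followed by f32 max_weight, little-endian.
# "<" disables padding, so each entry is exactly 8 bytes.
ENTRY = struct.Struct("<If")


def encode_directory(entries: List[Tuple[int, float]]) -> bytes:
    """Pack (max_offset, max_weight) pairs into the body layout described above."""
    return b"".join(ENTRY.pack(offset, weight) for offset, weight in entries)


def decode_directory(body: bytes) -> List[Tuple[int, float]]:
    """Unpack a body back into (max_offset, max_weight) pairs."""
    if len(body) % ENTRY.size:
        raise ValueError("body length is not a multiple of the 8-byte entry size")
    return [ENTRY.unpack_from(body, pos) for pos in range(0, len(body), ENTRY.size)]


def blocks_to_scan(entries: List[Tuple[int, float]], query_weight: float,
                   threshold: float) -> List[int]:
    """Indices of posting blocks whose best possible score
    (query_weight * max_weight) exceeds the threshold."""
    return [i for i, (_, max_weight) in enumerate(entries)
            if query_weight * max_weight > threshold]


entries = [(99, 0.5), (250, 2.0), (400, 0.25)]
body = encode_directory(entries)
```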

## Spann Index Architecture

Spann is Chroma's memory-efficient approximate nearest neighbor index that combines HNSW with posting lists.

```mermaid
graph TD
    A[SpannIndexWriter] --> B[HNSW Index]
    A --> C[Posting Lists]
    A --> D[Versions Map]
    A --> E[MaxHeadID Blockfile]
    B --> F[Reader with adaptive search]
```

**SpannIndexReader Structure:**

| Component | Type | Purpose |
|-----------|------|---------|
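| posting_lists | `BlockfileReader<u32, SpannPostingList>` | Posting lists of the vectors assigned to each head (cluster) |
| hnsw_index | `HnswIndexRef` | Graph index used to find the heads nearest to a query |
| versions_map | `BlockfileReader<u32, u32>` | Current version of each record, used to skip stale posting entries |
| dimensionality | `usize` | Vector dimension |
| adaptive_search_nprobe | `bool` | Whether the number of probed heads (nprobe) adapts per query |

Source: [rust/index/src/spann/types.rs:1-80]()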
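A brute-force Python sketch of the SPANN search pattern, under stated simplifications: heads (cluster centroids) are scanned linearly where Chroma would query its HNSW index, distances are Euclidean, and the version check is an assumed reading of `versions_map` (posting entries whose version differs from the record's current version are skipped). All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Posting entry: (record id, version when written, vector)
Posting = Tuple[str, int, Vector]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spann_search(query: Vector, heads: Dict[int, Vector], postings: Dict[int, List[Posting]],
                 versions: Dict[str, int], k: int, nprobe: int) -> List[Tuple[str, float]]:
    # 1. Choose the nprobe heads closest to the query (brute force here).
    probed = sorted(heads, key=lambda h: l2(query, heads[h]))[:nprobe]
    best: Dict[str, float] = {}
    # 2. Scan the posting lists of the probed heads.
    for head in probed:
        for record_id, version, vector in postings.get(head, []):
            if versions.get(record_id) != version:
                continue  # stale copy left behind by an update
            dist = l2(query, vector)
            # A record may appear under several heads; keep its best distance.
            if dist < best.get(record_id, math.inf):
                best[record_id] = dist
    # 3. Return the k closest records.
    return sorted(best.items(), key=lambda item: item[1])[:k]


heads = {0: [0.0, 0.0], 1: [10.0, 10.0]}
postings = {
    0: [("a", 1, [0.0, 1.0]), ("b", 1, [1.0, 0.0]), ("c", 1, [0.5, 0.5])],
    1: [("d", 1, [10.0, 9.0]), ("c", 2, [9.0, 9.0])],
}
versions = {"a": 1, "b": 1, "c": 2, "d": 1}
```

Raising `nprobe` trades latency for recall: more posting lists are scanned, so fewer true neighbors are missed.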

## Indexing Pipeline

The indexing pipeline handles document ingestion through the following stages:

```mermaid
graph LR
    A[Add Records] --> B[ApplyLogsOrchestrator]
    B --> C[Record Segment Writer]
    B --> D[Metadata Segment Writer]
    B --> E[Vector Index Writer]
    C --> F[Flush to Blockstore]
    D --> F
    E --> F
    F --> G[Collection Update]
```

**Error Handling:**

The orchestrator implements comprehensive error tracking:

| Error Type | Error Code | Tracing |
|------------|------------|---------|
| ApplyLog | Internal | Yes |
| Channel | Internal | Yes |
| Commit | Internal | Yes |
| HnswSegment | Internal | Yes |
| MetadataSegment | Internal | Yes |
| Seal | Internal | Yes |
| InvariantViolation | - | Always |

Source: [rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-100]()

## Query Execution Flow

### Query Request Processing

```mermaid
graph TD
    A[Query Request] --> B[Parse Query]
    B --> C[Load Segments]
    C --> D[Parallel Segment Queries]
    D --> E{HNSW Search}
    D --> F{Spann Search}
    D --> G{Record Scan}
    E --> H[Merge Results]
    F --> H
    G --> H
    H --> I[Apply Filters]
    I --> J[Return Results]
```
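The "Merge Results" step can be illustrated with a k-way merge in Python. This is a sketch under assumptions (each segment returns `(distance, id)` pairs sorted by ascending distance, and a record found by several segments keeps its smallest distance), not Chroma's actual merge code. Because filters are applied after the merge in the flow above, a post-filtered result can contain fewer than `k` records.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (distance, record id)


def merge_top_k(per_segment: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge per-segment hit lists, each sorted by ascending distance,
    keeping the k closest distinct record ids."""
    if k <= 0:
        return []
    seen = set()
    merged: List[Hit] = []
    # heapq.merge lazily yields hits from all lists in ascending order.
    for dist, record_id in heapq.merge(*per_segment):
        if record_id in seen:
            continue  # same record from another segment, at an equal or larger distance
        seen.add(record_id)
        merged.append((dist, record_id))
        if len(merged) == k:
            break
    return merged


hnsw_hits = [(0.1, "a"), (0.4, "c")]
spann_hits = [(0.2, "b"), (0.4, "c")]
scan_hits = [(0.3, "d")]
```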

### Work Queue Integration

Distributed query execution uses a work queue system for task coordination:

```mermaid
graph TD
    A[Coordinator] --> B[WorkQueueClient]
    B --> C[gRPC Channel]
    C --> D[Worker Pool]
    D --> E[Task Execution]
    E --> F[Result Collection]
```

**Error Code Mapping:**

| gRPC Code | Chroma Error Code |
|-----------|-------------------|
| Unavailable | Unavailable |
| DeadlineExceeded | DeadlineExceeded |
| ResourceExhausted | ResourceExhausted |
| NotFound | NotFound |
| InvalidArgument | InvalidArgument |

Source: [rust/worker/src/work_queue/work_queue_client.rs:1-80]()

## Deployment Topology

Chroma supports flexible deployment configurations through its topology model:

```mermaid
graph TD
    A[Topology] --> B[TopologyName]
    A --> C[Vec<RegionName>]
    A --> D[Config T]
    C --> E[ProviderRegion]
    E --> F[Provider]
    E --> G[Region]
```

The topology system enables:

- Multi-cloud deployments (AWS, GCP, Azure)
- Region-specific configurations
- Custom provider extensions
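The relationships in the diagram can be modelled with generic dataclasses. Field names mirror the diagram above; the plain-string types for `RegionName` and `TopologyName`, the hypothetical `resolve` helper, and the sample region identifiers are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")


@dataclass
class ProviderRegion(Generic[T]):
    name: str      # RegionName
    provider: str  # cloud provider, e.g. "aws" or "gcp"
    region: str    # provider-specific region identifier
    config: T      # deployment-specific configuration


@dataclass
class Topology(Generic[T]):
    name: str           # TopologyName
    regions: List[str]  # RegionName values making up the topology
    config: T


def resolve(topology, provider_regions):
    """Look up the ProviderRegion for each region name in a topology."""
    by_name = {pr.name: pr for pr in provider_regions}
    missing = [r for r in topology.regions if r not in by_name]
    if missing:
        raise KeyError(f"unknown regions: {missing}")
    return [by_name[r] for r in topology.regions]
```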

## Summary

Chroma's architecture provides a scalable foundation for AI-powered search with several key design principles:

1. **Separation of Concerns** - Frontend, worker, and SysDB components handle distinct responsibilities
2. **Arrow-Based Storage** - Efficient columnar storage for analytical queries
3. **Flexible Indexing** - Support for HNSW, Spann, and sparse vector indexes
4. **Distributed Coordination** - Work queues and topology management for multi-node deployments
5. **Comprehensive Error Handling** - Consistent error codes and tracing across all components

The modular architecture allows Chroma to scale from single-node development deployments to distributed production clusters serving AI applications at scale.

---

<a id='protocol-buffers-api'></a>

## Protocol Buffers & gRPC API

### Related Pages

Related topics: [System Architecture Overview](#architecture-overview), [Rust Backend Services Architecture](#rust-services-architecture)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [rust/types/src/record.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/record.rs)
- [rust/types/src/metadata.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/metadata.rs)
- [rust/types/src/collection_schema.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/collection_schema.rs)
- [rust/types/src/topology.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/topology.rs)
- [clients/js/packages/chromadb-core/src/generated/api.ts](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-core/src/generated/api.ts)
- [go/README.md](https://github.com/chroma-core/chroma/blob/main/go/README.md)
- [rust/blockstore/src/arrow/root.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/root.rs)
</details>

# Protocol Buffers & gRPC API

Chroma uses Protocol Buffers (protobuf) as the core serialization format for inter-service communication and data persistence. The IDL (Interface Definition Language) files in the `idl/` directory define the service APIs, data structures, and message types that power Chroma's distributed architecture.

## Architecture Overview

Chroma employs a client-server architecture where Protocol Buffers serve as the contract between components. The protobuf definitions are centralized in the `idl/` directory and used to generate code for multiple language runtimes including Python, JavaScript, Go, and Rust.

```mermaid
graph TD
    subgraph "Client Layer"
        JS[JavaScript Client]
        PY[Python Client]
        GO[Go Client]
    end
    
    subgraph "IDL Definitions"
        PROTO[Protocol Buffer Definitions]
    end
    
    subgraph "Server Layer"
        API[API Server]
        COORD[Coordinator Service]
        QUERY[Query Executor]
    end
    
    JS -->|Generated TS Bindings| PROTO
    PY -->|Generated Python Stub| PROTO
    GO -->|Generated Go Code| PROTO
    API -->|gRPC/prost| PROTO
    COORD -->|gRPC/prost| PROTO
    QUERY -->|gRPC/prost| PROTO
```

## Proto Definitions Structure

### Core Service Definitions

The main protobuf definitions are organized in `idl/chromadb/proto/`:

| Proto File | Purpose | Key Messages |
|------------|---------|--------------|
| `chroma.proto` | Core data types and collection operations | Collection, Database, OperationRecord |
| `coordinator.proto` | Coordinator service for cluster management | Tenant, Database, Segment operations |
| `query_executor.proto` | Query execution service interface | Query requests and responses |

### Data Type Coverage

The protobuf definitions cover all core data types used throughout Chroma:

| Data Type | Usage |
|-----------|-------|
| `Vector` | Embedding vectors with scalar encoding |
| `OperationRecord` | CRUD operations for records |
| `LogRecord` | Write-ahead log entries with offsets |
| `Metadata` | Key-value metadata for filtering |
| `Collection` | Collection configuration and schema |
| `Cmek` | Customer-managed encryption keys |

## Rust Type Conversions

Chroma's Rust backend uses protobuf-generated types and converts them to idiomatic Rust types through `TryFrom` implementations. This pattern ensures type safety and clean separation between the wire format and internal representations.

### Record Conversions

The `rust/types/src/record.rs` file contains conversion logic between protobuf and Rust types:

```mermaid
graph LR
    A[chroma_proto::LogRecord] -->|TryFrom| B[LogRecord Rust]
    A2[chroma_proto::Vector] -->|TryFrom| B2[(Vec<f32>, ScalarEncoding)]
```

**OperationRecord Conversion** (Source: [rust/types/src/record.rs:recordinfo]())

The `OperationRecord` conversion extracts metadata and document fields from protobuf representations:

```rust
// Metadata is extracted from proto, with document potentially in metadata
let (metadata, document) = match operation_record_proto.metadata {
    Some(proto_metadata) => match UpdateMetadata::try_from(proto_metadata) {
        Ok(mut metadata) => {
            let document = metadata.remove(CHROMA_DOCUMENT_KEY);
            match document {
                Some(UpdateMetadataValue::Str(document)) => {
                    (Some(metadata), Some(document))
                }
                _ => (Some(metadata), None),
            }
        }
        Err(e) => return Err(RecordConversionError::...),
    },
    None => (None, None),
};
```
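The same extraction logic can be written as a small Python function, shown only to make the control flow explicit. It assumes `CHROMA_DOCUMENT_KEY` is the string `"chroma:document"` and skips the `UpdateMetadata::try_from` validation step; `split_document` is a hypothetical name.

```python
CHROMA_DOCUMENT_KEY = "chroma:document"  # assumed value of the Rust constant


def split_document(proto_metadata):
    """Pull the document out of a metadata map; returns (metadata, document)."""
    if proto_metadata is None:
        return None, None  # matches the `None => (None, None)` arm
    metadata = dict(proto_metadata)  # copy so the caller's dict is untouched
    # The key is removed whatever its type, like `metadata.remove(...)`.
    document = metadata.pop(CHROMA_DOCUMENT_KEY, None)
    if isinstance(document, str):
        return metadata, document
    # Missing or non-string values yield no document (the `_ =>` arm).
    return metadata, None
```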

### Vector Type Conversions

Vectors are stored together with their scalar encoding, and the conversion returns both (Source: [rust/types/src/record.rs:vector]()):

```rust
impl TryFrom<chroma_proto::Vector> for (Vec<f32>, ScalarEncoding) {
    type Error = VectorConversionError;
    // Conversion implementation
}
```
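A sketch of what such a conversion has to do, assuming the protobuf `Vector` carries a dimension, a packed byte payload, and an encoding tag, and that `FLOAT32` values are stored little-endian. `decode_vector` is hypothetical and rejects every other encoding rather than guessing how the Rust code handles them.

```python
import struct


def decode_vector(dimension, data, encoding="FLOAT32"):
    """Decode a packed embedding into (values, encoding).

    Assumes little-endian float32 payloads; other encodings are rejected.
    """
    if encoding != "FLOAT32":
        raise ValueError(f"unsupported encoding in this sketch: {encoding}")
    if len(data) != dimension * 4:  # 4 bytes per float32 value
        raise ValueError(f"expected {dimension * 4} bytes, got {len(data)}")
    return list(struct.unpack(f"<{dimension}f", data)), encoding
```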

## Metadata Filtering Types

The metadata system supports rich filtering expressions defined in protobuf and converted to Rust types (Source: [rust/types/src/metadata.rs:metadata-types]()).

### Document Operators

```mermaid
graph TD
    DOC_OPS[WhereDocumentOperator] --> Contains
    DOC_OPS --> NotContains
    DOC_OPS --> Regex
    DOC_OPS --> NotRegex
```

| Operator | Description |
|----------|-------------|
| `Contains` | Document contains substring |
| `NotContains` | Document does not contain substring |
| `Regex` | Document matches regex pattern |
| `NotRegex` | Document does not match regex pattern |
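The four operators can be modelled as follows. The enum mirrors the table; treating `Regex` as an unanchored search (`re.search`) is an assumption about Chroma's semantics, and `matches_document` is a hypothetical helper.

```python
import re
from enum import Enum


class WhereDocumentOperator(Enum):
    CONTAINS = "contains"
    NOT_CONTAINS = "not_contains"
    REGEX = "regex"
    NOT_REGEX = "not_regex"


def matches_document(document, operator, pattern):
    """Evaluate one document filter against a document string."""
    if operator is WhereDocumentOperator.CONTAINS:
        return pattern in document
    if operator is WhereDocumentOperator.NOT_CONTAINS:
        return pattern not in document
    # Regex operators: match anywhere in the document (unanchored search).
    found = re.search(pattern, document) is not None
    return found if operator is WhereDocumentOperator.REGEX else not found
```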

### Metadata Expression Structure

```rust
pub struct MetadataExpression {
    pub key: String,
    pub comparison: MetadataComparison,
}
```

Metadata comparisons support both primitive types (strings, integers, floats, booleans) and set operations.
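A Python model of a `MetadataExpression` with primitive and set comparisons. Only the `key` plus `comparison` shape comes from the Rust struct above; the operator names and the rule that a missing key never matches are assumptions made for the sketch.

```python
import operator
from dataclasses import dataclass
from typing import List, Union

Primitive = Union[str, int, float, bool]

_PRIMITIVE_OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
}


@dataclass
class PrimitiveComparison:
    op: str  # one of the keys of _PRIMITIVE_OPS
    value: Primitive


@dataclass
class SetComparison:
    op: str  # "in" or "nin"
    values: List[Primitive]


@dataclass
class MetadataExpression:
    key: str
    comparison: Union[PrimitiveComparison, SetComparison]

    def evaluate(self, metadata):
        """Return True if the metadata dict satisfies this expression."""
        if self.key not in metadata:
            return False  # assumption: a missing key never matches
        actual = metadata[self.key]
        if isinstance(self.comparison, SetComparison):
            is_member = actual in self.comparison.values
            return is_member if self.comparison.op == "in" else not is_member
        return _PRIMITIVE_OPS[self.comparison.op](actual, self.comparison.value)
```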

## Collection Schema Definitions

Schema definitions in `rust/types/src/collection_schema.rs` define how collections are configured for indexing (Source: [rust/types/src/collection_schema.rs:schema-struct]()).

### Schema Builder Pattern

The `Schema` struct provides a fluent builder API for index configuration:

```mermaid
graph TD
    SCHEMA[Schema::default] --> CREATE_INDEX[.create_index]
    CREATE_INDEX --> VALIDATE[Validate Index Config]
    VALIDATE -->|Valid| RETURN[Return Self]
    VALIDATE -->|Invalid| ERROR[SchemaBuilderError]
```

**Index Creation Example** (Source: [rust/types/src/collection_schema.rs:create-index-example]())

```rust
let schema = Schema::default()
    .create_index(None, VectorIndexConfig {
        space: Some(Space::Cosine),
        embedding_function: None,
        source_key: None,
        hnsw: None,
        spann: None,
    }.into())?
    .create_index(Some("category"), StringInvertedIndexConfig {}.into())?;
```

### Supported Index Types

| Index Type | Configuration | Applies To |
|------------|---------------|------------|
| `VectorIndexConfig` | HNSW or SPANN parameters, distance space (Cosine/L2/IP), embedding function | `#embedding` key only |
| `StringInvertedIndexConfig` | String indexing | Custom string keys |
| `FtsIndexConfig` | Full-text search | Document key |
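The builder's chain-or-fail behaviour can be sketched in a few lines. This toy `Schema` is hypothetical: it replaces the real config structs with strings and enforces only the rule from the table that vector indexes apply to the `#embedding` key. Treating a `None` key as that default is an assumption.

```python
class SchemaBuilderError(ValueError):
    """Raised when an index configuration is not allowed for a key."""


EMBEDDING_KEY = "#embedding"


class Schema:
    """Toy fluent builder: create_index returns self so calls can be chained."""

    def __init__(self):
        self.indexes = {}  # key (None = default) -> list of index type names

    def create_index(self, key, index_type):
        # Vector indexes may only target the default / #embedding key.
        if index_type == "vector" and key not in (None, EMBEDDING_KEY):
            raise SchemaBuilderError(f"vector index not allowed on key {key!r}")
        self.indexes.setdefault(key, []).append(index_type)
        return self
```

Raising on the first invalid call plays the same role as the `?` operator in the Rust example: the chain stops and the error reaches the caller.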

## CMEK (Customer-Managed Encryption Keys)

Chroma supports customer-managed encryption keys through the `Cmek` type defined in protobuf (Source: [rust/types/src/collection_schema.rs:cmek]()).

### CMEK Provider Configuration

| Provider | Validation Pattern | Resource Format |
|----------|-------------------|-----------------|
| GCP | `CMEK_GCP_RE` regex | GCP resource identifier |

```rust
impl Cmek {
    pub fn gcp(resource: String) -> Self;
    pub fn validate_pattern(&self) -> bool;
}
```
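A sketch of what `validate_pattern` might check for GCP. The real `CMEK_GCP_RE` is not reproduced on this page, so the regex below assumes the standard Cloud KMS key name format `projects/{project}/locations/{location}/keyRings/{ring}/cryptoKeys/{key}`; `validate_gcp_cmek` is a hypothetical name.

```python
import re

# Assumed pattern: a Cloud KMS crypto key resource name. The actual
# CMEK_GCP_RE in collection_schema.rs may be stricter or looser.
GCP_KMS_KEY_RE = re.compile(
    r"projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+"
)


def validate_gcp_cmek(resource):
    """Return True if the whole string looks like a Cloud KMS key name."""
    return GCP_KMS_KEY_RE.fullmatch(resource) is not None
```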

## Topology and Region Management

For multi-region deployments, Chroma uses topology definitions (Source: [rust/types/src/topology.rs:topology]()).

### Provider Region Structure

```mermaid
classDiagram
    class ProviderRegion {
        +name: RegionName
        +provider: String
        +region: String
        +config: T
    }
    
    class Topology {
        +name: TopologyName
        +regions: Vec~RegionName~
        +config: T
    }
```

| Component | Description |
|-----------|-------------|
| `ProviderRegion` | Single cloud provider region configuration |
| `Topology` | Collection of regions forming a deployment topology |

## Code Generation Pipeline

### Build Process

Protobuf definitions are compiled to target languages using `protoc` and language-specific plugins (Source: [go/README.md:protobuf-setup]()).

```mermaid
graph LR
    A[.proto files] --> B[protoc compiler]
    B -->|Python| C[Python stubs]
    B -->|Go| D[Go gRPC code]
    B -->|JS/TS| E[TypeScript definitions]
    B -->|Rust| F[Rust + prost]
```

### Required Tools

| Tool | Purpose |
|------|---------|
| `protoc` | Protocol Buffer compiler |
| `protoc-gen-go` | Go code generation |
| `protoc-gen-go-grpc` | Go gRPC service generation |

### Generated API Patterns

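The generated TypeScript API in `clients/js/packages/chromadb-core/src/generated/api.ts` follows the fetch-based pattern of OpenAPI-generated REST clients: each method builds its request arguments, calls `fetch`, and dispatches on the response status and content type (Source: [clients/js/packages/chromadb-core/src/generated/api.ts:fetch-pattern]()).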

```typescript
const localVarFetchArgs = ApiApiFetchParamCreator(configuration).version(options);
return (fetch: FetchAPI = defaultFetch, basePath: string = BASE_PATH) => {
    return fetch(
        basePath + localVarFetchArgs.url,
        localVarFetchArgs.options,
    ).then((response) => {
        // Derive the MIME type from Content-Type, dropping parameters such as "; charset=utf-8"
        const contentType = response.headers.get("Content-Type");
        const mimeType = contentType ? contentType.replace(/;.*/, "") : undefined;
        // Handle response by content type and status
        if (response.status === 200) {
            if (mimeType === "application/json") {
                return response.json();
            }
        }
        // Error handling for 401, 404, 409, 500
    });
};
```

### Error Code Mapping

Error types are mapped from Rust/Arrow errors to Chroma error codes (Source: [rust/blockstore/src/arrow/root.rs:error-mapping]()):

| Arrow Error Type | Chroma Error Code |
|-----------------|-------------------|
| `IOError` | `Internal` |
| `ArrowError` | `Internal` |
| `LayoutVerificationError` | `Internal` |
| `FromBytesError` variants | `InvalidArgument` / `Internal` |

## Message Format Details

### Arrow Block Serialization

Chroma's blockstore persists its data, including the root, in the Arrow IPC file format for efficient columnar storage; `root.rs` reads it back as follows (Source: [rust/blockstore/src/arrow/root.rs:arrow-reader]()):

```rust
let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
let record_batch = match arrow_reader {
    Ok(mut reader) => match reader.next() {
        Some(Ok(batch)) => batch,
        Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
        None => return Err(FromBytesError::NoDataError),
    },
    Err(e) => return Err(FromBytesError::ArrowError(e)),
};
```

### IPC Footer Structure

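An Arrow IPC file is framed as follows:
- the `ARROW1` magic string (6 bytes), padded to an 8-byte boundary
- the schema, dictionaries, and record batches in the streaming format
- the footer, a FlatBuffers message recording the schema and block locations
- the footer length as a little-endian 32-bit integer (4 bytes)
- the `ARROW1` magic string again (6 bytes)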
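In the Apache Arrow IPC file format, the footer is followed by a 4-byte little-endian length and a trailing `ARROW1` magic string, and the file also begins with `ARROW1` plus padding to 8 bytes. This sketch checks that framing and extracts the footer bytes without decoding the FlatBuffers message inside; `read_footer` is a hypothetical helper, not part of Chroma or the `arrow` crate.

```python
import struct

ARROW_MAGIC = b"ARROW1"


def read_footer(data):
    """Return the raw footer bytes of an Arrow IPC file after checking its framing."""
    if len(data) < 2 * len(ARROW_MAGIC) + 4:
        raise ValueError("too short to be an Arrow IPC file")
    if not data.startswith(ARROW_MAGIC) or not data.endswith(ARROW_MAGIC):
        raise ValueError("missing ARROW1 magic")
    # The int32 footer length sits just before the trailing 6-byte magic.
    (footer_len,) = struct.unpack("<i", data[-10:-6])
    footer_end = len(data) - 10
    footer_start = footer_end - footer_len
    # The leading magic plus padding occupies the first 8 bytes.
    if footer_len < 0 or footer_start < 8:
        raise ValueError("invalid footer length")
    return data[footer_start:footer_end]
```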

## See Also

- [Rust Types Module](rust/types/src/) - Internal Rust type definitions
- [Block Store Architecture](rust/blockstore/) - Data persistence with Arrow
- [Client SDKs](clients/) - Multi-language client implementations
- [Go Server Implementation](go/) - Server-side gRPC implementation

---

<a id='python-client-sdk'></a>

## Python Client SDK

### Related Pages

Related topics: [Getting Started with Chroma](#getting-started), [JavaScript/TypeScript Client SDKs](#javascript-client-sdk), [Embedding Functions Integration](#embedding-functions)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [chromadb/api/client.py](https://github.com/chroma-core/chroma/blob/main/chromadb/api/client.py)
- [chromadb/api/async_client.py](https://github.com/chroma-core/chroma/blob/main/chromadb/api/async_client.py)
- [chromadb/api/models/Collection.py](https://github.com/chroma-core/chroma/blob/main/chromadb/api/models/Collection.py)
- [chromadb/api/types.py](https://github.com/chroma-core/chroma/blob/main/chromadb/api/types.py)
- [clients/python/pyproject.toml](https://github.com/chroma-core/chroma/blob/main/clients/python/pyproject.toml)
</details>

# Python Client SDK

The Chroma Python Client SDK is the official Python library for interacting with Chroma, an open-source vector database designed for AI applications. This SDK provides a complete interface for managing collections, storing embeddings, and performing similarity searches across vector data.

## Overview

Chroma positions itself as the open-source data infrastructure for AI, offering developers a streamlined way to incorporate vector search capabilities into their applications. The Python Client SDK serves as the primary client library for Python developers, enabling seamless integration with Chroma's vector database capabilities.

The SDK supports two primary modes of operation: **embedded mode**, where the database runs locally within the same process, and **client-server mode**, where the Python client communicates with a remote Chroma server via HTTP. This flexibility allows developers to choose the deployment architecture that best fits their application requirements, whether they need a lightweight local setup for development and testing or a scalable server-based deployment for production environments.

For Python-specific installations, developers can choose between the full `chromadb` package, which bundles the local database engine and the default embedding function, and the `chromadb-client` package, a lightweight HTTP-only client that connects to a running Chroma server. Both are installed with pip.

The SDK is designed with developer productivity in mind, providing intuitive APIs for common operations like adding documents, querying collections, and managing metadata. It handles the complexity of embedding generation and vector storage behind a clean, Pythonic interface, allowing developers to focus on building their AI applications rather than managing low-level database operations.

## Architecture

The Python Client SDK follows a layered architecture that separates concerns between the client interface, API communication, and data models. Understanding this architecture helps developers effectively use the SDK and troubleshoot any issues that may arise during development.

```mermaid
graph TD
    A[Application Code] --> B[Client / AsyncClient]
    B --> C[Collection API]
    B --> D[Embedding Functions]
    C --> E[REST API Layer]
    D --> F[External Embedding Providers]
    E --> G[Chroma Server]
    E --> H[Embedded Mode]
    G --> I[Persistent Storage]
    H --> I
```

### Client Layer

The client layer is the entry point for all SDK operations. Chroma provides two client implementations: the synchronous `Client` class (returned by `chromadb.Client()`, `chromadb.PersistentClient()`, and `chromadb.HttpClient()`) for traditional Python applications, and the `AsyncClient` class (created with `await chromadb.AsyncHttpClient()`) for applications built on async/await.

The synchronous client is suitable for most use cases, providing blocking API calls that execute immediately and return results. This approach is familiar to developers coming from traditional Python backgrounds and works well in scripts, batch processing jobs, and web applications that don't require high concurrency.

The asynchronous client, on the other hand, is designed for applications that need to handle many concurrent operations efficiently, such as web servers built on frameworks like FastAPI or Starlette. By using Python's asyncio library, the async client can perform multiple network operations concurrently, improving throughput in I/O-bound scenarios.

Both clients share a similar interface, with the async client simply wrapping the underlying HTTP calls with async/await syntax. This consistency makes it easy to switch between synchronous and asynchronous code as requirements evolve.

### Collection Management

Collections serve as the primary organizational unit in Chroma, analogous to tables in traditional relational databases or buckets in object storage. Each collection contains a set of vectors along with their associated metadata, documents, and unique identifiers.

The SDK provides a comprehensive collection API that supports creating new collections, retrieving existing ones, listing all collections in the database, and deleting collections when they're no longer needed. Collections can be configured with specific settings at creation time, including the embedding function to use for auto-embedding documents and the name of the collection for identification purposes.

Collections maintain a schema-like structure through their use of metadata. While Chroma is schemaless in the traditional sense, the metadata associated with vectors allows developers to impose structure on their data for filtering and organization purposes.

### Data Model

The data model in Chroma revolves around four core concepts: vectors, documents, metadata, and IDs. Each record in a collection consists of these four components, providing a flexible yet structured way to store and retrieve information.

Vectors are the mathematical representations of data in embedding space. They can be provided directly by the application or generated automatically using embedding functions. The SDK accepts vectors as lists of floating-point numbers, making it compatible with output from virtually any embedding model.

Documents are the original text or content that was transformed into vectors. Storing documents alongside their vectors enables applications to retrieve the original content during query operations without needing to maintain a separate document store.

Metadata provides contextual information about each record. Examples include the source of the document, timestamps, user IDs, or any other application-specific attributes. Metadata can be used for filtering during queries, allowing applications to narrow search results based on specific criteria.

IDs uniquely identify each record within a collection and must be strings, which leaves applications free to choose how they name and reference their data. Common patterns include UUIDs, meaningful identifiers derived from the document content, or sequential numbers converted to strings.
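The four components map naturally onto one record type. This hypothetical `to_records` helper zips the column-style lists accepted by methods like `add` into records and rejects mismatched lengths, non-string IDs, and duplicate IDs; it illustrates the data model and is not the SDK's own validation code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

MetadataValue = Union[str, int, float, bool]


@dataclass
class Record:
    id: str
    embedding: Optional[List[float]] = None
    document: Optional[str] = None
    metadata: Optional[Dict[str, MetadataValue]] = None


def to_records(ids, embeddings=None, documents=None, metadatas=None):
    """Zip column-style lists (as passed to collection.add) into Record objects."""
    if not all(isinstance(record_id, str) for record_id in ids):
        raise TypeError("ids must be strings")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    columns = {"embeddings": embeddings, "documents": documents, "metadatas": metadatas}
    for name, column in columns.items():
        if column is not None and len(column) != len(ids):
            raise ValueError(f"{name} has {len(column)} items, expected {len(ids)}")
    return [
        Record(
            id=record_id,
            embedding=embeddings[i] if embeddings is not None else None,
            document=documents[i] if documents is not None else None,
            metadata=metadatas[i] if metadatas is not None else None,
        )
        for i, record_id in enumerate(ids)
    ]
```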

## Installation and Setup

Installing the Chroma Python Client SDK is straightforward using pip, Python's package manager. The SDK is available in two variants to accommodate different use cases and deployment scenarios.

```bash
pip install chromadb
```

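This command installs the full Chroma package, which includes the embedded database, the server, and the default embedding function. Wrappers for external embedding providers ship with the package, but each provider's own Python library (for example, `openai`) must be installed separately. This variant is recommended for most users who want a complete, self-contained installation.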

```bash
pip install chromadb-client
```

This command installs only the HTTP client library, which is useful for scenarios where the Chroma server runs separately or where a minimal dependency footprint is required. This variant connects to Chroma servers via HTTP and doesn't include embedding provider libraries.

## Client Initialization

Initializing the Chroma client depends on the deployment mode and desired configuration. The SDK provides flexible initialization options to accommodate different environments.

### Embedded Mode

In embedded mode, Chroma runs entirely within your Python process, storing data locally. This is ideal for development, testing, and small-scale deployments where a separate server isn't required.

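```python
import chromadb

# In-memory client: data lives only as long as the process
client = chromadb.Client()

# Persistent client: data is stored in the given directory
persistent_client = chromadb.PersistentClient(path="./chroma_data")
```

`chromadb.Client()` keeps all data in memory, which suits tests, notebooks, and quick experiments. To keep data across process restarts, use `chromadb.PersistentClient(path=...)`, which creates and manages a local database directory at the given path.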

### Client-Server Mode

In client-server mode, your Python application connects to a Chroma server running separately, either locally or on a remote machine. This architecture supports larger-scale deployments and enables sharing data across multiple client applications.

```python
import chromadb

client = chromadb.HttpClient(
    host="localhost",
    port=8000
)
```

The HTTP client communicates with the server using REST API calls, handling serialization, network transport, and error handling transparently. This mode requires a Chroma server to be running and accessible at the specified host and port.

### Configuration Options

`chromadb.HttpClient` accepts the following initialization options. Authentication tokens and other per-request values can be supplied through `headers`.

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `host` | string | `"localhost"` | Server hostname or IP address |
| `port` | integer | `8000` | Server port number |
| `ssl` | boolean | `False` | Enable SSL/TLS encryption |
| `headers` | dict | `None` | Custom HTTP headers sent with every request |
| `tenant` | string | `"default_tenant"` | Tenant identifier for multi-tenant setups |
| `database` | string | `"default_database"` | Database within the tenant |

## Collection Operations

Collections are the central organizing structure in Chroma, grouping related vectors, documents, and metadata together. The SDK provides a comprehensive API for creating, managing, and interacting with collections.

### Creating a Collection

Collections are created using the client's `create_collection` method, which accepts a name and optional configuration parameters.

```python
collection = client.create_collection(
    name="my-documents",
    metadata={"description": "Document collection for RAG"},
    get_or_create=True
)
```

The `get_or_create` parameter is particularly useful in production applications, as it prevents errors if a collection with the same name already exists. When set to `True`, the method returns the existing collection if one exists or creates a new one if it doesn't.

### Adding Data

Data is added to collections using the `add` method, which accepts vectors, documents, metadata, and unique identifiers. All parameters must be provided as lists of equal length, with each index representing a single record.

```python
collection.add(
    documents=["This is the first document", "This is the second document"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc-1", "doc-2"],
    embeddings=[[1.2, 2.1, 3.5], [1.1, 2.0, 3.4]]
)
```

The SDK supports automatic embedding generation when embedding functions are configured for the collection. In this case, documents can be provided without explicit embeddings, and the SDK will generate the vector representations automatically.

### Querying Data

Querying is performed using the `query` method, which accepts query text or query vectors and returns the most similar results based on vector similarity.

```python
results = collection.query(
    query_texts=["search terms here"],
    n_results=2,
    where={"source": "notion"},
    include=["documents", "metadatas", "distances"]
)
```

The `where` parameter enables filtering results based on metadata conditions, allowing applications to narrow search results to specific subsets of data. The `include` parameter controls which data components are returned, helping optimize bandwidth and processing for applications that don't need all available information.

Query results include the matched document IDs, the documents themselves, associated metadata, and distance scores indicating how similar each result is to the query. Lower distance scores indicate higher similarity, with zero representing an exact match.

### Updating and Deleting Data

The SDK supports updating existing records and deleting unwanted data from collections. These operations are essential for maintaining data accuracy and managing collection lifecycle.

```python
collection.update(
    ids=["doc-1"],
    documents=["Updated document content"],
    metadatas=[{"source": "notion", "updated": True}]
)

collection.delete(
    ids=["doc-2"],
    where={"source": "google-docs"}
)
```

Update operations modify the records identified by `ids`, overwriting only the fields that are supplied and leaving the rest unchanged. `delete` accepts `ids`, a `where` filter, or both; when both are given, only records that satisfy both conditions are removed.

## Querying and Filtering

Chroma provides powerful querying and filtering capabilities that enable precise retrieval of relevant results. Understanding these capabilities is essential for building effective vector search applications.

### Vector Similarity Search

The core query operation performs vector similarity search, finding the most similar records to a given query vector or text. The SDK handles text queries by first embedding them using the collection's configured embedding function.

Results are ranked by similarity, with the most similar results appearing first. The `n_results` parameter controls how many results are returned, allowing applications to balance result completeness with performance considerations.

### Metadata Filtering

Metadata filtering narrows search results based on document attributes stored alongside vectors. This is particularly useful for applications that need to search within specific subsets of data, such as documents from a particular source or within a date range.

```python
results = collection.query(
    query_texts=["search terms"],
    where={
        "$and": [
            {"source": "notion"},
            {"category": {"$in": ["technical", "documentation"]}},
        ]
    },
)
```

The filter syntax supports equality (`$eq`, `$ne`), numeric comparisons (`$gt`, `$gte`, `$lt`, `$lte`), and set membership (`$in`, `$nin`); a bare value such as `{"source": "notion"}` is shorthand for `$eq`. Each `where` dict holds exactly one condition, so multiple conditions must be combined with the logical operators `$and` or `$or`, as in the example above.
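To make the filter semantics concrete, here is a client-side model of a `where` evaluator. Chroma evaluates filters on the server; `matches_where` is hypothetical, and its treatment of records that lack the filtered key is an assumption.

```python
import operator

# Simplified operator table for a Chroma-style `where` filter.
_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda actual, allowed: actual in allowed,
    "$nin": lambda actual, allowed: actual not in allowed,
}


def matches_where(metadata, where):
    """Return True if `metadata` satisfies a `where` filter (client-side model)."""
    if len(where) != 1:
        raise ValueError("a where dict holds one condition; combine with $and/$or")
    [(key, condition)] = where.items()
    if key == "$and":
        return all(matches_where(metadata, sub) for sub in condition)
    if key == "$or":
        return any(matches_where(metadata, sub) for sub in condition)
    if key not in metadata:
        return False  # assumption: records lacking the key never match
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # {"source": "notion"} is shorthand for $eq
    [(op, expected)] = condition.items()
    return _COMPARATORS[op](metadata[key], expected)
```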

### Result Inclusion

The `include` parameter controls which data components are included in query results. This allows applications to optimize their queries by requesting only the data they need.

| Include Option | Description |
|---------------|-------------|
| `embeddings` | Include the full vector for each result |
| `documents` | Include the original document text |
| `metadatas` | Include the associated metadata |
| `distances` | Include similarity distance scores |

If `include` is omitted, `query` returns documents, metadatas, and distances; IDs are always returned, and embeddings are excluded by default because they are large. Request only the components you need to minimize bandwidth usage and processing overhead.

## Embedding Functions

Embedding functions transform text into vector representations that capture semantic meaning. Chroma supports multiple embedding providers, allowing applications to choose the approach that best fits their requirements.

### Built-in Embeddings

For simple use cases, Chroma includes a default embedding function (the `all-MiniLM-L6-v2` Sentence Transformers model, run locally with ONNX Runtime) that works without additional configuration. It is convenient for development and testing, but a larger or domain-specific model may give better results in production.

### External Providers

For production applications requiring higher quality embeddings, Chroma supports integration with external embedding services. These services provide state-of-the-art embedding models that can significantly improve search quality.

Supported providers include OpenAI's embedding models, which offer excellent quality for English text, and various open-source alternatives. Each provider has its own configuration requirements, typically involving API keys and model selection parameters.

Configuration is typically done at the collection level, allowing different collections to use different embedding functions if needed. This flexibility supports applications that work with multiple data types or require different embedding strategies for different use cases.

### Custom Embedding Functions

For specialized use cases, applications can implement a custom embedding function by subclassing `chromadb.EmbeddingFunction` and implementing `__call__(self, input)`. This allows integration with any embedding model or service that can be accessed from Python.

Custom functions receive a list of texts and return a corresponding list of vectors. They can implement any logic needed, including batching, caching, and error handling, giving applications full control over the embedding process.
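A minimal custom embedding function, kept dependency-free so it runs on its own. The hashed bag-of-words scheme is a toy stand-in for a real model, and `HashingEmbeddingFunction` is a hypothetical name; the only part taken from Chroma's interface is the `__call__(self, input)` shape that returns one vector per input text.

```python
import hashlib
import math


class HashingEmbeddingFunction:
    """Toy embedding function: hashed bag-of-words, L2-normalised."""

    def __init__(self, dimensions=8):
        self.dimensions = dimensions

    def __call__(self, input):
        # Chroma passes the texts as `input` and expects one vector per text.
        return [self._embed(text) for text in input]

    def _embed(self, text):
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # Hash each token to a bucket and count occurrences.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vector[int.from_bytes(digest[:4], "little") % self.dimensions] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        # Empty texts stay as the zero vector instead of dividing by zero.
        return [v / norm for v in vector] if norm else vector
```

It could then be passed as `embedding_function=` to `create_collection`; depending on the Chroma version, the class may need to subclass `chromadb.EmbeddingFunction` before it is accepted.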

## Error Handling

The SDK provides comprehensive error handling to help applications gracefully manage failure scenarios. Understanding the error types and how to handle them is important for building robust applications.

### Connection Errors

Connection errors occur when the client cannot establish communication with the Chroma server. These errors can result from network issues, server unavailability, or incorrect server configuration.

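```python
import chromadb

try:
    client = chromadb.HttpClient(host="localhost", port=8000)
    client.heartbeat()  # cheap round trip that fails if the server is unreachable
except Exception as exc:  # the concrete exception type differs between client versions
    print(f"Unable to connect to Chroma server: {exc}")
```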

Connection failures are often transient (a restarting server or a brief network outage), so applications should retry with backoff and show a clear error message if the server remains unreachable.

### Collection Not Found

Operations on non-existent collections raise specific errors that can be caught and handled appropriately.

```python
from chromadb.errors import NotFoundError

try:
    collection = client.get_collection("non-existent")
except (NotFoundError, ValueError):  # some older releases raised ValueError instead
    print("Collection does not exist")
```

The `get_or_create` parameter available during collection creation provides an alternative to explicit error handling when the existence of a collection is uncertain.

### Invalid Arguments

Invalid argument errors indicate problems with the data or parameters provided to SDK methods. These errors typically result from bugs in application code or invalid user input.

Examples include malformed IDs, vectors of incorrect dimensions, mismatched list lengths, and invalid filter expressions. The error messages provide guidance on what parameter is problematic, making debugging straightforward.

## Best Practices

Following best practices ensures optimal performance, reliability, and maintainability when using the Python Client SDK in production applications.

### Connection Management

Applications should create a single client instance and reuse it across the application rather than creating new clients for each operation. The client manages connection pooling and state internally, and creating multiple instances can lead to resource waste and inconsistent state.

```python
client = chromadb.HttpClient(host="localhost", port=8000)

def get_collection():
    return client.get_collection("my-documents")
```

For applications that require clean-up, the client should be properly closed when the application terminates, ensuring any pending operations complete and resources are released.

### Batch Operations

When adding or querying large numbers of records, sending them in batches reduces per-request network overhead and lets the server process records more efficiently. The SDK handles batching internally for some common operations, but the server still limits how many records a single request may contain, so applications should keep very large ingests within that limit.
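A minimal sketch of client-side chunking, written in Rust for illustration: `make_batches` is a hypothetical helper, and `max_batch_size` is an assumed input that in practice would come from the server's configured limit. Chunking IDs and embeddings with the same size keeps them aligned:

```rust
/// Splits parallel ID and embedding lists into aligned batches of at most
/// `max_batch_size` records each. Hypothetical helper; not part of a Chroma SDK.
pub fn make_batches<'a>(
    ids: &'a [String],
    embeddings: &'a [Vec<f32>],
    max_batch_size: usize,
) -> Vec<(&'a [String], &'a [Vec<f32>])> {
    assert!(max_batch_size > 0, "batch size must be positive");
    assert_eq!(ids.len(), embeddings.len(), "ids and embeddings must align");
    // Both lists are cut at the same boundaries, so batch i of ids matches batch i of embeddings.
    ids.chunks(max_batch_size)
        .zip(embeddings.chunks(max_batch_size))
        .collect()
}
```

Each `(ids, embeddings)` pair would then be sent as a separate `add` request.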

### Error Recovery

Production applications should implement comprehensive error handling that distinguishes between recoverable errors (like temporary network issues) and non-recoverable errors (like invalid input). Recoverable errors can be handled with retry logic, while non-recoverable errors should surface appropriate feedback to users.
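The distinction between recoverable and non-recoverable errors maps naturally onto a retry loop that only retries transient failures. The sketch below (in Rust, for illustration) assumes a hypothetical `ErrorKind` classification and a doubling backoff; neither is part of the Chroma SDK:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical classification of SDK errors.
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorKind {
    Transient(String), // e.g. connection refused or timeout: worth retrying
    Permanent(String), // e.g. invalid argument or not found: retrying cannot help
}

/// Runs `op` up to `max_attempts` times, doubling the delay after each
/// transient failure. Permanent errors are returned immediately.
pub fn retry_with_backoff<T>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, ErrorKind>,
) -> Result<T, ErrorKind> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(ErrorKind::Transient(msg)) if attempt < max_attempts => {
                eprintln!("attempt {attempt} failed ({msg}); retrying in {delay:?}");
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            // Permanent errors, or a transient error on the last attempt.
            Err(e) => return Err(e),
        }
    }
}
```

Capping the number of attempts keeps a persistent outage from blocking the caller indefinitely; production code would typically also add jitter to the delay.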

## Related Documentation

For further information on using Chroma's Python Client SDK, the following resources provide additional context and examples.

The official Chroma documentation at trychroma.com provides comprehensive guides on getting started, deployment options, and advanced usage patterns. The documentation includes tutorials, API reference material, and example applications that demonstrate real-world usage.

The GitHub repository at github.com/chroma-core/chroma contains the complete source code for Chroma, including the Python Client SDK. Developers interested in understanding implementation details or contributing to the project can explore the codebase directly.

The Chroma Discord community provides a forum for asking questions, sharing experiences, and connecting with other developers using Chroma. The community is an excellent resource for troubleshooting issues and discovering best practices from experienced users.

---

<a id='javascript-client-sdk'></a>

## JavaScript/TypeScript Client SDKs

### Related Pages

Related topics: [Python Client SDK](#python-client-sdk), [Getting Started with Chroma](#getting-started)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [clients/js/packages/chromadb-core/src/ChromaClient.ts](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-core/src/ChromaClient.ts)
- [clients/new-js/packages/chromadb/src/chroma-client.ts](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/src/chroma-client.ts)
- [clients/new-js/packages/chromadb/src/api/sdk.gen.ts](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/src/api/sdk.gen.ts)
- [clients/js/packages/chromadb-core/src/Collection.ts](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-core/src/Collection.ts)
- [clients/js/packages/chromadb/package.json](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb/package.json)
- [clients/js/packages/chromadb-client/package.json](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-client/package.json)
- [clients/new-js/packages/chromadb/package.json](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/package.json)
- [clients/new-js/packages/ai-embeddings/all/package.json](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/all/package.json)
- [clients/js/examples/node/README.md](https://github.com/chroma-core/chroma/blob/main/clients/js/examples/node/README.md)

</details>

# JavaScript/TypeScript Client SDKs

Chroma provides comprehensive JavaScript and TypeScript client libraries for interacting with Chroma servers from browser and Node.js environments. The SDKs offer both low-level HTTP API access and high-level abstractions for collections, embedding functions, and query operations.

## Architecture Overview

Chroma maintains two generations of JavaScript clients to support different use cases and ecosystem requirements.

```mermaid
graph TD
    A[Chroma Server] <--> B[HTTP API];
    B <--> C[Legacy JS Client v2.4.7];
    B <--> D[new-js Client v3.4.5];
    C --> E[chromadb<br/>Bundled];
    C --> F[chromadb-client<br/>Peer Dependencies];
    D --> G[ChromaClient];
    D --> H[Embedding Functions<br/>via @chroma-core/*];
```

### Client Package Versions

| Package | Version | Type | Description |
|---------|---------|------|-------------|
| `chromadb` (legacy) | 2.4.7 | npm | Bundled package with all embedding libraries included |
| `chromadb-client` (legacy) | 2.4.7 | npm | Client package requiring peer dependencies |
| `chromadb` (new-js) | 3.4.5 | npm | Modern client with modular architecture |
| `@internal/chromadb-core` | 2.4.7 | workspace | Shared core functionality |

Source: [clients/js/packages/chromadb/package.json:3](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb/package.json)  
Source: [clients/new-js/packages/chromadb/package.json:3](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/package.json)

## Package Structure

### Legacy Client (v2.x)

The legacy client provides two distribution options:

```mermaid
graph LR
    A[chromadb] --> B[chromadb-core<br/>+ All Embeddings];
    C[chromadb-client] --> D[chromadb-core<br/>+ Peer Dependencies];
    B --> E[@google/generative-ai];
    B --> F[@xenova/transformers];
    B --> G[cohere-ai];
    D --> E;
    D --> F;
    D --> G;
```

| Package | Use Case | Embedding Libraries |
|---------|----------|---------------------|
| `chromadb` | Simple projects wanting everything included | Bundled with all providers |
| `chromadb-client` | Projects needing specific embedding libraries | Peer dependencies required |

Source: [clients/js/packages/chromadb-client/package.json:1-55](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-client/package.json)

### New-JS Client (v3.x)

The new JavaScript client uses a modular workspace architecture with the following structure:

```
clients/new-js/
├── packages/
│   ├── chromadb/                    # Core client package
│   │   └── src/
│   │       ├── chroma-client.ts     # Main client implementation
│   │       └── api/
│   │           └── sdk.gen.ts       # Generated API client
│   └── ai-embeddings/
│       ├── common/                  # Shared utilities
│       ├── all/                     # Aggregated providers
│       ├── chroma-bm25/             # BM25 sparse embeddings
│       ├── cohere/                  # Cohere provider
│       ├── google-gemini/           # Google Gemini provider
│       ├── huggingface-server/      # HuggingFace server
│       ├── jina/                    # Jina AI provider
│       ├── together-ai/             # Together AI provider
│       └── voyageai/                # Voyage AI provider
```

Source: [clients/new-js/packages/ai-embeddings/all/package.json:1-45](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/all/package.json)

## Module Exports Configuration

Both client generations support modern JavaScript module resolution with ESM and CommonJS exports.

### Export Structure

```mermaid
graph TD
    A[Package Entry] --> B{Import Type};
    B -->|ESM import| C[.mjs / .d.ts];
    B -->|CommonJS require| D[.cjs / .d.cts];
    C --> E[dist/*.mjs];
    D --> F[dist/cjs/*.cjs];
```

| Export Condition | Entry Point | Type Definitions |
|-------------------|-------------|------------------|
| ESM `import` | `dist/chromadb.mjs` | `dist/chromadb.d.ts` |
| CommonJS `require` | `dist/cjs/chromadb.cjs` | `dist/cjs/chromadb.d.cts` |

Source: [clients/js/packages/chromadb/package.json:12-25](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb/package.json)  
Source: [clients/new-js/packages/chromadb/package.json:12-25](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/package.json)

## Client Initialization

### Basic Connection

```typescript
import { ChromaClient } from "chromadb";

// Initialize the client
const chroma = new ChromaClient({ 
  path: "http://localhost:8000" 
});
```

Source: [clients/js/packages/chromadb-client/README.md:15-20](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-client/README.md)

### With Embedding Function

```typescript
import { ChromaClient } from 'chromadb';
import { TogetherAIEmbeddingFunction } from '@chroma-core/together-ai';

const embedder = new TogetherAIEmbeddingFunction({
  apiKey: 'your-api-key',
  modelName: 'togethercomputer/m2-bert-80M-8k-retrieval',
});

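const client = new ChromaClient({
  path: 'http://localhost:8000',
});

// Attach the embedding function so documents and query texts are embedded automatically
const collection = await client.getOrCreateCollection({
  name: 'my-collection',
  embeddingFunction: embedder,
});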
```

Source: [clients/new-js/packages/ai-embeddings/together-ai/README.md:1-35](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/together-ai/README.md)

## Collection Operations

Collections are the primary data structure for storing and querying embeddings.

### Create Collection

```typescript
const collection = await chroma.createCollection({
  name: "my-collection",
  embeddingFunction: embedder,  // Optional
  metadata: {                    // Optional
    description: "My document collection"
  }
});
```

### Add Documents

```typescript
await collection.add({
  ids: ["id1", "id2"],
  embeddings: [                  // Optional if embedding function provided
    [1.1, 2.3, 3.2],
    [4.5, 6.9, 4.4],
  ],
  metadatas: [{ source: "doc1" }, { source: "doc2" }],
  documents: ["Document 1 content", "Document 2 content"],
});
```

### Query Collection

```typescript
const results = await collection.query({
  // Provide either queryEmbeddings or queryTexts, not both
  queryEmbeddings: [[1.1, 2.3, 3.2]],       // One embedding per query
  // queryTexts: ["Sample query"],           // Alternative: text queries, embedded by the collection's embedding function
  nResults: 2,                               // Number of results per query
  where: { source: "doc1" },                 // Optional metadata filter
  include: ["documents", "metadatas", "distances"]
```

Source: [clients/js/packages/chromadb-client/README.md:25-50](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-client/README.md)

## Embedding Function Providers

The new-js client provides first-class support for multiple embedding providers through the `@chroma-core/*` packages.

### Available Providers

| Provider Package | Model Examples | API Key Required |
|------------------|----------------|--------------|
| `@chroma-core/together-ai` | `togethercomputer/m2-bert-80M-8k-retrieval` | Yes |
| `@chroma-core/voyageai` | `voyage-2` | Yes |
| `@chroma-core/google-gemini` | `text-embedding-004` | Yes |
| `@chroma-core/jina` | `jina-embeddings-v2-base-en` | Yes |
| `@chroma-core/cohere` | Various Cohere models | Yes |
| `@chroma-core/chroma-bm25` | N/A (local algorithm) | No |
| `@chroma-core/all` | All providers bundled | Varies |

Source: [clients/new-js/packages/ai-embeddings/together-ai/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/together-ai/README.md)  
Source: [clients/new-js/packages/ai-embeddings/voyageai/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/voyageai/README.md)

### Configuration Options

Embedding functions share a common set of configuration options. In the illustrative block below, `SomeEmbeddingFunction` is a placeholder class name and the provider-specific options are Jina's:

```typescript
const embedder = new SomeEmbeddingFunction({
  apiKey: 'your-api-key',           // Optional: if omitted, the key is read from the environment
  apiKeyEnvVar: 'PROVIDER_API_KEY', // Optional: name of the environment variable holding the key
  modelName: 'provider-model-name', // Provider-specific model
  // Provider-specific options (Jina examples)
  task: 'retrieval.passage',
  dimensions: 768,
  truncate: true,
  normalized: true,
});
```

### Environment Variable Configuration

| Provider | Environment Variable |
|----------|---------------------|
| Together AI | `TOGETHER_API_KEY` |
| Voyage AI | `VOYAGE_API_KEY` |
| Google Gemini | `GEMINI_API_KEY` |
| Jina | `JINA_API_KEY` |

Source: [clients/new-js/packages/ai-embeddings/jina/README.md:1-45](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/jina/README.md)
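The option comments above suggest a simple precedence for API keys: an explicit `apiKey` wins; otherwise the key is read from the environment variable named by `apiKeyEnvVar`, falling back to the provider's default variable from the table. The following Rust sketch of that lookup treats the precedence as an assumption; `resolve_api_key` is hypothetical, and the environment is passed in as a closure so the logic can be tested without changing process state:

```rust
/// Resolves an API key: an explicit value takes precedence; otherwise the
/// variable named `env_var` (or `default_env_var` if none is given) is read
/// through `lookup`. Hypothetical helper mirroring the options above.
pub fn resolve_api_key(
    explicit: Option<&str>,
    env_var: Option<&str>,
    default_env_var: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    if let Some(key) = explicit {
        return Ok(key.to_string());
    }
    let name = env_var.unwrap_or(default_env_var);
    lookup(name)
        .filter(|v| !v.is_empty()) // treat an empty variable as missing
        .ok_or_else(|| format!("missing API key: set `apiKey` or the {name} environment variable"))
}
```

In real use the lookup would be `|name| std::env::var(name).ok()`.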

## Rust Native Bindings

For performance-critical applications, Chroma provides pre-built Rust native bindings for Node.js.

### Supported Platforms

| Package Name | OS | Architecture | LibC |
|--------------|-----|--------------|------|
| `chromadb-js-bindings-darwin-x64` | macOS (Intel) | x64 | N/A |
| `chromadb-js-bindings-darwin-arm64` | macOS (Apple Silicon) | arm64 | N/A |
| `chromadb-js-bindings-linux-x64-gnu` | Linux | x64 | glibc |
| `chromadb-js-bindings-linux-arm64-gnu` | Linux | arm64 | glibc |

Bindings version (all platforms): **1.3.4**  
Minimum Node.js version: **>= 10**

Source: [rust/js_bindings/npm/darwin-x64/package.json:1-18](https://github.com/chroma-core/chroma/blob/main/rust/js_bindings/npm/darwin-x64/package.json)  
Source: [rust/js_bindings/npm/linux-x64-gnu/package.json:1-18](https://github.com/chroma-core/chroma/blob/main/rust/js_bindings/npm/linux-x64-gnu/package.json)

## Build and Development

### Build Scripts

| Command | Description |
|---------|-------------|
| `pnpm build` | Build all packages |
| `pnpm build:core` | Build only `@internal/chromadb-core` |
| `pnpm build:packages` | Build all packages except core |
| `pnpm watch` | Watch mode for development |
| `pnpm test` | Run all tests |
| `pnpm test:functional` | Run functional tests (excluding auth) |

### New-JS Client Build Configuration

```json
{
  "scripts": {
    "build": "tsup",
    "watch": "tsup --watch",
    "typecheck": "tsc --noEmit"
  }
}
```

Build tooling uses **tsup** for efficient bundling with TypeScript support.

Source: [clients/new-js/packages/ai-embeddings/common/package.json:18-25](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/package.json)  
Source: [clients/js/package.json:22-30](https://github.com/chroma-core/chroma/blob/main/clients/js/package.json)

## Choosing a Client Package

```mermaid
graph TD
    A[Start] --> B{Do you need all embedding providers?};
    B -->|Yes, convenience| C[chromadb v2.4.7<br/>or @chroma-core/all + chromadb v3.4.5];
    B -->|No, want to minimize bundle| D{Do you have embedding requirements?};
    D -->|Yes, specific providers| E[chromadb-client v2.4.7<br/>with peer dependencies];
    D -->|No, just vector storage| F[chromadb-client v2.4.7<br/>or chromadb v3.4.5];
    C --> G[Include all embedding libraries];
    E --> H[Only install needed providers];
    F --> I[No embedding function needed];
```

### Decision Matrix

| Requirement | Recommended Package |
|-------------|--------------------|
| Simple setup, all features | `chromadb` (bundled) |
| Minimal bundle size | `chromadb-client` with peer deps |
| Modern architecture | `chromadb` (new-js v3.4.5) |
| BM25 sparse embeddings | `@chroma-core/chroma-bm25` |
| Cloud/Remote providers | `@chroma-core/*` packages |

Source: [clients/js/examples/node/README.md:1-45](https://github.com/chroma-core/chroma/blob/main/clients/js/examples/node/README.md)

## TypeScript Support

All JavaScript client packages include full TypeScript type definitions:

```json
{
  "types": "dist/chromadb.d.ts",
  "exports": {
    ".": {
      "import": {
        "types": "./dist/chromadb.d.ts"
      },
      "require": {
        "types": "./dist/cjs/chromadb.d.cts"
      }
    }
  }
}
```

The legacy client's package manifest specifies TypeScript **^5.0.4** (any 5.x release from 5.0.4), and the new-js packages specify **^5.3.3**.

Source: [clients/js/packages/chromadb/package.json:8](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb/package.json)  
Source: [clients/new-js/packages/ai-embeddings/common/package.json:30](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/package.json)

## Dependencies

### Core Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| `isomorphic-fetch` | ^3.0.0 | HTTP client for browser/Node.js |
| `ajv` | ^8.12.0 / ^8.17.1 | JSON schema validation |
| `cliui` | ^8.0.1 | CLI utilities |

### Node.js Compatibility

| Package Generation | Minimum Node.js |
|--------------------|-----------------|
| Legacy (v2.x) | >= 14.17.0 |
| New-JS (v3.x) | >= 20 |
| Rust Bindings | >= 10 |

Source: [clients/js/packages/chromadb-client/package.json:50-55](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb-client/package.json)  
Source: [clients/new-js/packages/ai-embeddings/common/package.json:35-38](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/package.json)

---

<a id='rust-services-architecture'></a>

## Rust Backend Services Architecture

### Related Pages

Related topics: [System Architecture Overview](#architecture-overview), [Data Storage & Blockstore](#data-storage-blockstore)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [rust/blockstore/src/arrow/root.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/root.rs)
- [rust/blockstore/src/arrow/block/types.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/block/types.rs)
- [rust/blockstore/src/arrow/provider.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/provider.rs)
- [rust/types/src/execution/operator.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/execution/operator.rs)
- [rust/types/src/api_types.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/api_types.rs)
- [rust/types/src/topology.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/topology.rs)
- [rust/types/src/collection_schema.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/collection_schema.rs)
- [rust/types/src/sparse_posting_block.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/sparse_posting_block.rs)
- [rust/index/src/spann/types.rs](https://github.com/chroma-core/chroma/blob/main/rust/index/src/spann/types.rs)
- [rust/worker/src/work_queue/work_queue_client.rs](https://github.com/chroma-core/chroma/blob/main/rust/worker/src/work_queue/work_queue_client.rs)
- [rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs](https://github.com/chroma-core/chroma/blob/main/rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs)
- [rust/worker/src/execution/orchestration/knn_filter.rs](https://github.com/chroma-core/chroma/blob/main/rust/worker/src/execution/orchestration/knn_filter.rs)
</details>

# Rust Backend Services Architecture

## Overview

The Chroma Rust backend provides the high-performance, scalable services at the core of Chroma's vector database. It follows a distributed design in which several specialized services cooperate to handle embedding storage, indexing, and similarity search.

### Design Goals

| Goal | Description |
|------|-------------|
| High Performance | Arrow-based columnar storage for efficient data access |
| Scalability | Multi-cloud, multi-region deployment support |
| Reliability | Comprehensive error handling with typed error codes |
| Flexibility | Multiple index types (HNSW, Spann, Inverted) |
| Consistency | Ordered and unordered mutation ordering options |

### Core Service Components

```mermaid
graph TD
    subgraph "Rust Backend Services"
        W[Worker Service]
        BS[Blockstore Service]
        SYS[Sysdb Service]
        LOG[Log Service]
    end
    
    W --> BS
    W --> SYS
    W --> LOG
```

## Blockstore Architecture

The blockstore is the core storage layer in Chroma's Rust backend, providing persistent storage for vector embeddings and associated metadata using Arrow columnar format.

### Arrow-Based Storage

Chroma uses Apache Arrow as its primary storage format, which provides:

- **Columnar Layout**: Efficient analytic queries by column
- **Zero-Copy Reads**: Memory-mapped access patterns
- **Cross-Language Interop**: Standardized binary format
- **Compression Support**: Built-in encoding/decoding

Source: [rust/blockstore/src/arrow/root.rs:1-40]()

### Blockfile Structure

```mermaid
graph TD
    subgraph "Blockfile Components"
        BF[Blockfile]
        BR[Block Reader]
        BW[Block Writer]
        RM[Root Manager]
        BM[Block Manager]
    end
    
    BF --> BR
    BF --> BW
    BW --> RM
    BR --> BM
```

#### Root Management

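Each blockfile has a root that records its block IDs and associated metadata. `root.rs` provides helpers for reading a serialized root, such as this function, which extracts every block ID: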

```rust
pub(super) fn get_all_block_ids_from_bytes(
    bytes: &[u8],
    id: Uuid,
) -> Result<Vec<Uuid>, FromBytesError>
```

Key responsibilities:
- Reading Arrow IPC files
- Extracting block metadata and IDs
- Version validation and verification

Source: [rust/blockstore/src/arrow/root.rs:28-50]()

#### Block Layout Verification

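Block layout verification checks that a serialized block has the Arrow IPC layout Chroma expects, which protects against corrupted or malformed data. Violations are reported as `ArrowLayoutVerificationError` variants: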

```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
    #[error("Buffer length is not 64 byte aligned")]
    BufferLengthNotAligned,
    #[error("No record batches in footer")]
    NoRecordBatches,
    #[error("More than one record batch in IPC file")]
    MultipleRecordBatches,
    #[error("Invalid message type")]
    InvalidMessageType,
    // ... further variants (e.g. RecordBatchDecodeError) omitted
}
```

Source: [rust/blockstore/src/arrow/block/types.rs:1-30]()

| Error Type | Error Code | Severity |
|------------|------------|----------|
| `BufferLengthNotAligned` | Internal | High |
| `NoRecordBatches` | Internal | High |
| `MultipleRecordBatches` | Internal | Medium |
| `InvalidMessageType` | Internal | High |
| `RecordBatchDecodeError` | Internal | High |
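`BufferLengthNotAligned` reflects the Arrow format's recommendation that buffers be padded to multiples of 64 bytes. Below is a std-only sketch of such a check, plus the padding computation a writer would use; `verify_buffer_lengths`, `padded_len`, and `LayoutError` are hypothetical simplifications, not Chroma's verifier:

```rust
/// Arrow recommends padding buffers to a multiple of 64 bytes.
pub const ARROW_ALIGNMENT: usize = 64;

#[derive(Debug, PartialEq)]
pub enum LayoutError {
    BufferLengthNotAligned { index: usize, len: usize },
}

/// Checks that every buffer length is a multiple of `ARROW_ALIGNMENT`,
/// reporting the first offending buffer.
pub fn verify_buffer_lengths(lengths: &[usize]) -> Result<(), LayoutError> {
    for (index, &len) in lengths.iter().enumerate() {
        if len % ARROW_ALIGNMENT != 0 {
            return Err(LayoutError::BufferLengthNotAligned { index, len });
        }
    }
    Ok(())
}

/// Rounds `len` up to the next multiple of `ARROW_ALIGNMENT` (the padding a writer would add).
pub fn padded_len(len: usize) -> usize {
    (len + ARROW_ALIGNMENT - 1) / ARROW_ALIGNMENT * ARROW_ALIGNMENT
}
```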

### Blockfile Writer Types

Chroma supports two mutation ordering strategies:

| Ordering Type | Description | Use Case |
|--------------|-------------|----------|
| `Ordered` | Sequential writes with guaranteed order | Consistent state |
| `Unordered` | Parallel writes for throughput | High-volume ingestion |

Source: [rust/blockstore/src/arrow/provider.rs:1-50]()

```rust
match options.mutation_ordering {
    BlockfileWriterMutationOrdering::Ordered => {
        let file = ArrowOrderedBlockfileWriter::from_root(...);
        Ok(BlockfileWriter::ArrowOrderedBlockfileWriter(file))
    }
    BlockfileWriterMutationOrdering::Unordered => {
        let file = ArrowUnorderedBlockfileWriter::from_root(...);
        Ok(BlockfileWriter::ArrowUnorderedBlockfileWriter(file))
    }
}
```

### Forking and Versioning

Blockfiles support forking for snapshot isolation:

```rust
let new_root = self
    .root_manager
    .fork::<K>(
        &fork_from,
        new_id,
        &options.prefix_path,
        self.block_manager.default_max_block_size_bytes(),
    )
    .await
```

Source: [rust/blockstore/src/arrow/provider.rs:1-30]()
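Forking lets a new blockfile start from an existing root without copying data. The sketch below models this as copy-on-write over block IDs: the fork shares every block of its parent, and only blocks it rewrites receive new IDs, so the parent's snapshot is never modified. `Root`, `fork`, and `rewrite_block` are hypothetical simplifications (with `u64` IDs in place of UUIDs), not the `root_manager` API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Source of fresh block IDs (stand-in for generating new UUIDs).
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

fn fresh_id() -> u64 {
    NEXT_ID.fetch_add(1, Ordering::Relaxed)
}

/// Simplified blockfile root: its own ID plus the IDs of its blocks.
#[derive(Clone, Debug)]
pub struct Root {
    pub id: u64,
    pub block_ids: Vec<u64>,
}

impl Root {
    pub fn new(num_blocks: usize) -> Self {
        Root { id: fresh_id(), block_ids: (0..num_blocks).map(|_| fresh_id()).collect() }
    }

    /// Forks this root: a new root ID that references the same blocks (no data is copied).
    pub fn fork(&self) -> Self {
        Root { id: fresh_id(), block_ids: self.block_ids.clone() }
    }

    /// Copy-on-write: rewriting a block in the fork gives it a new ID,
    /// leaving the parent's block (and the parent root) untouched.
    pub fn rewrite_block(&mut self, position: usize) {
        self.block_ids[position] = fresh_id();
    }
}
```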

## Type System

### Query Result Types

The execution layer uses a rich type system for search results:

```rust
#[derive(Clone, Debug, Default)]
pub struct SearchPayloadResult {
    pub records: Vec<SearchRecord>,
}
```

Source: [rust/types/src/execution/operator.rs:1-20]()

#### Search Results Structure

```mermaid
graph LR
    SR[SearchResult] --> SPR[SearchPayloadResult]
    SPR --> SR_vec["Vec#lt;SearchRecord#gt;"]
    SR --> PLB[pulled_log_bytes]
```

| Field | Type | Description |
|-------|------|-------------|
| `results` | `Vec<SearchPayloadResult>` | Per-query search results |
| `pulled_log_bytes` | `u64` | Total log bytes fetched for metrics |

### Include Enum

The `Include` enum controls which fields are returned in query results:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub enum Include {
    #[serde(rename = "distances")]
    Distance,
    #[serde(rename = "documents")]
    Document,
    #[serde(rename = "embeddings")]
    Embedding,
    #[serde(rename = "metadatas")]
    Metadata,
    #[serde(rename = "uris")]
    Uri,
}
```

Source: [rust/types/src/api_types.rs:1-30]()

| Include Value | Returned Field | In `default_query()` | In `default_get()` |
|---------------|----------------|----------------------|--------------------|
| `distances` | Distance scores | ✓ | ✗ |
| `documents` | Text content | ✓ | ✓ |
| `embeddings` | Vector data | ✗ | ✗ |
| `metadatas` | Metadata objects | ✓ | ✓ |
| `uris` | Resource URIs | ✗ | ✗ |

#### IncludeList Helper Methods

```rust
impl IncludeList {
    pub fn empty() -> Self { Self(Vec::new()) }
    
    pub fn default_query() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance])
    }
    
    pub fn default_get() -> Self {
        Self(vec![Include::Document, Include::Metadata])
    }
    
    pub fn all() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance, 
                  Include::Embedding, Include::Uri])
    }
}
```

Source: [rust/types/src/api_types.rs:1-60]()
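On the wire, `include` arrives as a list of strings matching the serde names above. Chroma performs this mapping through serde; the std-only sketch below spells it out by hand and adds an assumed rule that an omitted field falls back to `default_query()`. `parse_include` and `parse_one` are hypothetical helpers:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Include { Distance, Document, Embedding, Metadata, Uri }

#[derive(Debug, PartialEq)]
pub struct IncludeList(pub Vec<Include>);

impl IncludeList {
    pub fn default_query() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance])
    }
}

/// Maps the wire names used by `#[serde(rename = ...)]` onto the enum.
fn parse_one(name: &str) -> Result<Include, String> {
    match name {
        "distances" => Ok(Include::Distance),
        "documents" => Ok(Include::Document),
        "embeddings" => Ok(Include::Embedding),
        "metadatas" => Ok(Include::Metadata),
        "uris" => Ok(Include::Uri),
        other => Err(format!("unknown include value: {other}")),
    }
}

/// `None` (field omitted) falls back to the query default; any unknown name rejects the request.
pub fn parse_include(raw: Option<&[&str]>) -> Result<IncludeList, String> {
    match raw {
        None => Ok(IncludeList::default_query()),
        Some(names) => names
            .iter()
            .map(|n| parse_one(n))
            .collect::<Result<Vec<_>, _>>()
            .map(IncludeList),
    }
}
```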

### Key Filter System

The `Key` enum represents filterable fields in metadata queries:

```rust
pub enum Key {
    Document,
    Embedding,
    Metadata,
    Score,
    MetadataField(String),
}
```

Source: [rust/types/src/operator.rs:1-30]()

| Key | Purpose | Example |
|-----|---------|---------|
| `#document` | Document content | `Key::Document` |
| `#embedding` | Vector data | `Key::Embedding` |
| `#metadata` | All metadata | `Key::Metadata` |
| `#score` | Similarity score | `Key::Score` |
| `field_name` | Custom metadata | `Key::MetadataField("status")` |

#### Key Factory Methods

```rust
impl Key {
    /// Creates a Key for a custom metadata field
    pub fn field(name: impl Into<String>) -> Self {
        Key::MetadataField(name.into())
    }
    
    /// Creates an equality filter: `field == value`
    pub fn eq(self, value: impl Into<MetadataValue>) -> ComparisonValue { ... }
}
```
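Factory methods like these let filters be written fluently, e.g. `Key::field("status").eq("active")`. The self-contained sketch below models that builder and evaluates the resulting equality filter against one record's metadata. `MetadataValue`, `EqFilter`, and `matches` are simplified, hypothetical stand-ins (the real `eq` returns a `ComparisonValue`), and only metadata fields are handled:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub enum MetadataValue {
    Str(String),
    Int(i64),
    Bool(bool),
}

impl From<&str> for MetadataValue {
    fn from(s: &str) -> Self { MetadataValue::Str(s.to_string()) }
}
impl From<i64> for MetadataValue {
    fn from(i: i64) -> Self { MetadataValue::Int(i) }
}
impl From<bool> for MetadataValue {
    fn from(b: bool) -> Self { MetadataValue::Bool(b) }
}

#[derive(Clone, Debug)]
pub enum Key {
    Document,
    MetadataField(String),
}

/// Hypothetical equality filter produced by `Key::eq`.
#[derive(Clone, Debug)]
pub struct EqFilter {
    pub key: Key,
    pub value: MetadataValue,
}

impl Key {
    pub fn field(name: impl Into<String>) -> Self {
        Key::MetadataField(name.into())
    }

    pub fn eq(self, value: impl Into<MetadataValue>) -> EqFilter {
        EqFilter { key: self, value: value.into() }
    }
}

impl EqFilter {
    /// Evaluates the filter against one record's metadata; a missing field never matches.
    pub fn matches(&self, metadata: &HashMap<String, MetadataValue>) -> bool {
        match &self.key {
            Key::MetadataField(name) => metadata.get(name) == Some(&self.value),
            _ => false, // document and other keys are out of scope for this sketch
        }
    }
}
```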

## Index Architecture

### Spann Index

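SPANN is Chroma's clustered index for dense vectors: an HNSW index over cluster centroids (heads) is paired with a posting list of member vectors for each centroid, so a query only scans the lists of the clusters nearest to it: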

```rust
#[derive(Clone, Debug)]
pub struct SpannIndexReader<'me> {
    pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
    pub hnsw_index: HnswIndexRef,
    pub versions_map: BlockfileReader<'me, u32, u32>,
    pub dimensionality: usize,
    pub adaptive_search_nprobe: bool,
    pub params: InternalSpannConfiguration,
}
```

Source: [rust/index/src/spann/types.rs:1-30]()

#### Spann Index Structure

```mermaid
graph TD
    subgraph "Spann Index"
        SPI[SpannIndexReader]
        HNSW[HNSW Index]
        PL[Posting Lists]
        VM[Versions Map]
    end
    
    SPI --> HNSW
    SPI --> PL
    SPI --> VM
```

| Component | Type | Purpose |
|-----------|------|---------|
| `hnsw_index` | `HnswIndexRef` | HNSW index over cluster centroids, used to find the clusters nearest a query |
| `posting_lists` | `BlockfileReader<u32, SpannPostingList>` | Per-centroid posting lists holding the member vectors |
| `versions_map` | `BlockfileReader<u32, u32>` | Current version of each point, used to skip stale posting entries |
| `adaptive_search_nprobe` | `bool` | Enables adaptive selection of the number of clusters probed (`nprobe`) |
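Conceptually, a SPANN query first finds the `nprobe` centroids closest to the query (via the HNSW index over centroids), then scans only those centroids' posting lists, skipping entries whose version is older than the point's current version. Below is a simplified sketch under stated assumptions: brute-force centroid search stands in for HNSW, squared L2 is the distance, and `Posting` and `spann_search` are hypothetical:

```rust
use std::collections::HashMap;

/// One posting-list entry: point ID, version at time of insertion, vector.
pub struct Posting {
    pub id: u32,
    pub version: u32,
    pub vector: Vec<f32>,
}

fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns up to `k` (id, distance) pairs, nearest first.
pub fn spann_search(
    query: &[f32],
    centroids: &[Vec<f32>],                       // stand-in for the HNSW index over heads
    posting_lists: &HashMap<usize, Vec<Posting>>, // head index -> postings
    versions: &HashMap<u32, u32>,                 // point ID -> current version
    nprobe: usize,
    k: usize,
) -> Vec<(u32, f32)> {
    // 1. Pick the nprobe closest centroids.
    let mut heads: Vec<(usize, f32)> = centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, l2_squared(query, c)))
        .collect();
    heads.sort_by(|a, b| a.1.total_cmp(&b.1));
    heads.truncate(nprobe);

    // 2. Scan their posting lists, skipping stale versions; a point replicated
    //    in several lists is kept once.
    let mut best: HashMap<u32, f32> = HashMap::new();
    for (head, _) in heads {
        for p in posting_lists.get(&head).into_iter().flatten() {
            if versions.get(&p.id) != Some(&p.version) {
                continue; // stale entry: the point was updated or reassigned
            }
            let d = l2_squared(query, &p.vector);
            best.entry(p.id).and_modify(|e| *e = e.min(d)).or_insert(d);
        }
    }

    // 3. Keep the k nearest.
    let mut results: Vec<(u32, f32)> = best.into_iter().collect();
    results.sort_by(|a, b| a.1.total_cmp(&b.1));
    results.truncate(k);
    results
}
```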

### Sparse Posting Block

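The sparse vector index is an inverted index whose posting lists are stored in blocks. A `DirectoryBlock` wraps a `SparsePostingBlock` that records, for each posting block, its largest document offset and its maximum weight; this metadata lets query evaluation skip blocks that cannot affect the result: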

```rust
#[derive(Debug, Clone)]
pub struct DirectoryBlock(SparsePostingBlock);

impl DirectoryBlock {
    pub fn new(max_offsets: &[u32], max_weights: &[f32]) 
        -> Result<Self, SparsePostingBlockError>
}
```

Source: [rust/types/src/sparse_posting_block.rs:1-40]()

| Field | Type | Description |
|-------|------|-------------|
| `max_offset` | `u32` | Largest doc offset in posting block |
| `max_weight` | `f32` | Maximum weight for term pruning |
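These per-block maxima support block-max pruning: for a query term with weight `q`, no document in a block can contribute more than `q * max_weight`, so a block whose bound cannot beat the current top-k threshold can be skipped, and `max_offset` lets a cursor jump past whole blocks when seeking a document. Below is a minimal sketch of both checks over a directory of (max_offset, max_weight) pairs; `BlockMeta`, `blocks_to_scan`, and `seek_block` are hypothetical, and the single-term setting is a simplification:

```rust
/// Directory entry for one posting block (simplified stand-in for DirectoryBlock).
#[derive(Clone, Copy, Debug)]
pub struct BlockMeta {
    pub max_offset: u32,
    pub max_weight: f32,
}

/// Indices of blocks whose score upper bound exceeds `threshold`.
pub fn blocks_to_scan(directory: &[BlockMeta], query_weight: f32, threshold: f32) -> Vec<usize> {
    directory
        .iter()
        .enumerate()
        .filter(|(_, b)| query_weight * b.max_weight > threshold)
        .map(|(i, _)| i)
        .collect()
}

/// First block that may contain `doc_offset`, assuming blocks are sorted by offset.
pub fn seek_block(directory: &[BlockMeta], doc_offset: u32) -> Option<usize> {
    let i = directory.partition_point(|b| b.max_offset < doc_offset);
    (i < directory.len()).then_some(i)
}
```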

## Schema and Index Configuration

### Collection Schema

Indexes are declared on a `Schema` through a builder-style method that takes an optional key and an index configuration:

```rust
impl Schema {
    pub fn create_index(
        mut self,
        key: Option<&str>,
        config: IndexConfig,
    ) -> Result<Self, SchemaBuilderError> { /* ... */ }
}
```

Source: [rust/types/src/collection_schema.rs:1-50]()

| Index Type | Key | Description |
|------------|-----|-------------|
| `VectorIndexConfig` | `None` | Global vector index (HNSW/Spann) |
| `StringInvertedIndexConfig` | `Some(field)` | Inverted index on a string metadata field (used for filtering) |
| `SparseVectorIndexConfig` | `Some(field)` | Sparse vector index |

### Index Configuration

```rust
pub struct VectorIndexConfig {
    pub space: Option<Space>,
    pub embedding_function: Option<EmbeddingFunctionId>,
    pub source_key: Option<Key>,
    pub hnsw: Option<HnswConfig>,
    pub spann: Option<SpannConfig>,
}
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `space` | `Option<Space>` | `None` | Distance space (e.g. cosine, L2) |
| `embedding_function` | `Option<EmbeddingFunctionId>` | `None` | Embedding function ID |
| `source_key` | `Option<Key>` | `None` | Field whose content is embedded (e.g. `#document`) |
| `hnsw` | `Option<HnswConfig>` | `None` | HNSW parameters |
| `spann` | `Option<SpannConfig>` | `None` | SPANN parameters |

## Worker Service Architecture

### Work Queue Client

The work queue client manages distributed task execution:

```rust
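pub enum WorkQueueClientError {
    // Failed to establish the gRPC connection (transport-level error)
    ConnectionError(#[from] tonic::transport::Error),
    // The server returned an error status for a request
    RequestError(#[from] tonic::Status),
}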
```

Source: [rust/worker/src/work_queue/work_queue_client.rs:1-20]()

#### Error Code Mapping

| gRPC Code | Chroma Error Code |
|-----------|-------------------|
| `Unavailable` | `Unavailable` |
| `DeadlineExceeded` | `DeadlineExceeded` |
| `ResourceExhausted` | `ResourceExhausted` |
| `InvalidArgument` | `InvalidArgument` |
| `NotFound` | `NotFound` |
| `PermissionDenied` | `PermissionDenied` |

### Apply Logs Orchestrator

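The apply logs orchestrator applies fetched log records to a collection's segments as part of compaction and reports the outcome, including the flushed segments and the collection's size after compaction: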

```rust
#[derive(Debug)]
pub struct ApplyLogsOrchestratorResponse {
    pub job_id: JobId,
    pub total_records_post_compaction: u64,
    pub flush_results: Vec<SegmentFlushInfo>,
    pub collection_logical_size_bytes: u64,
}
```

Source: [rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-50]()

### KNN Filter Architecture

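The KNN filter orchestrator runs the first stage of a vector query: it fetches the collection's log records that have not yet been compacted and evaluates the metadata filter, producing the inputs for the subsequent KNN search: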

```mermaid
graph TD
    subgraph "KNN Query Pipeline"
        Q[Query Request]
        F[Filter Logs]
        K[KNN Search]
        R[Results]
    end
    
    Q --> F
    F --> K
    K --> R
```

#### KNN Error Handling

```rust
pub enum KnnError {
    QuantizedSpannCenterSearch(QuantizedSpannError),
    QuantizedSpannLoadCenter(QuantizedSpannError),
    InvalidDistanceFunction,
    Aborted,
    InvalidSchema(#[from] SchemaError),
}
```

Source: [rust/worker/src/execution/orchestration/knn_filter.rs:1-40]()

| Error Type | Error Code | Traced |
|-----------|------------|--------|
| `QuantizedSpannCenterSearch` | From inner | ✓ |
| `InvalidDistanceFunction` | `InvalidArgument` | ✗ |
| `Aborted` | `ResourceExhausted` | ✗ |
| `Result(_)` | `Internal` | ✓ |

### KNN Filter Output

```rust
#[derive(Clone, Debug)]
pub struct KnnFilterOutput {
    pub logs: FetchLogOutput,
    pub fetch_log_bytes: u64,
    pub filter_output: FilterOutput,
    pub dimension: usize,
    pub distance_function: DistanceFunction,
}
```
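Once the filter stage has produced the matching records, the KNN stage ranks them with the collection's distance function. Below is a brute-force sketch of that second stage. The `DistanceFunction` variants follow common definitions (squared L2, cosine distance `1 - cos`, inner-product distance `1 - dot`), an assumption about rather than a copy of Chroma's implementation, and `knn_over_filtered` is hypothetical:

```rust
#[derive(Clone, Copy, Debug)]
pub enum DistanceFunction {
    Euclidean,
    Cosine,
    InnerProduct,
}

impl DistanceFunction {
    pub fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        match self {
            // Squared L2 distance
            DistanceFunction::Euclidean => a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>(),
            DistanceFunction::Cosine => {
                let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
                let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
                1.0 - dot / (na * nb)
            }
            DistanceFunction::InnerProduct => 1.0 - dot,
        }
    }
}

/// Brute-force top-k over records that passed the filter stage.
/// `candidates` are (offset_id, embedding) pairs; candidates whose dimension
/// differs from the query's are skipped.
pub fn knn_over_filtered(
    query: &[f32],
    candidates: &[(u32, Vec<f32>)],
    distance: DistanceFunction,
    k: usize,
) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = candidates
        .iter()
        .filter(|(_, e)| e.len() == query.len())
        .map(|(id, e)| (*id, distance.distance(query, e)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```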

## Multi-Cloud Topology

Chroma supports multi-cloud and multi-region deployments:

```rust
pub struct ProviderRegion<T: Clone + Debug> {
    pub name: RegionName,
    pub provider: String,
    pub region: String,
    pub config: T,
}
```

Source: [rust/types/src/topology.rs:1-30]()

### Topology Structure

```mermaid
graph TD
    subgraph "Multi-Cloud Topology"
        Config[Configuration]
        Topologies["Vec#lt;Topology#gt;"]
        Regions["Vec#lt;ProviderRegion#gt;"]
        Preferred[Preferred Region]
    end
    
    Config --> Topologies
    Config --> Regions
    Config --> Preferred
```

### Configuration Schema

```rust
struct RawMultiCloudMultiRegionConfiguration<R, T> {
    preferred: RegionName,
    regions: Vec<ProviderRegion<R>>,
    topologies: Vec<Topology<T>>,
}
```

| Field | Type | Description |
|-------|------|-------------|
| `preferred` | `RegionName` | Default region for operations |
| `regions` | `Vec<ProviderRegion>` | Available cloud regions |
| `topologies` | `Vec<Topology>` | Topology configurations |
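A natural check on such a configuration is that `preferred` names one of the declared regions and that region names are unique. Below is a simplified sketch with `RegionName` reduced to `String` and the topology list omitted. The validation rules, the `MultiRegionConfig` type, and the region names in the usage are assumptions for illustration, not taken from `topology.rs`:

```rust
use std::collections::HashSet;

#[derive(Clone, Debug)]
pub struct ProviderRegion {
    pub name: String,
    pub provider: String,
    pub region: String,
}

#[derive(Debug)]
pub struct MultiRegionConfig {
    pub preferred: String,
    pub regions: Vec<ProviderRegion>,
}

impl MultiRegionConfig {
    /// Validates the configuration and returns the preferred region.
    pub fn preferred_region(&self) -> Result<&ProviderRegion, String> {
        // Region names must be unique so that `preferred` is unambiguous.
        let mut seen = HashSet::new();
        for r in &self.regions {
            if !seen.insert(r.name.as_str()) {
                return Err(format!("duplicate region name: {}", r.name));
            }
        }
        // The preferred region must be one of the declared regions.
        self.regions
            .iter()
            .find(|r| r.name == self.preferred)
            .ok_or_else(|| format!("preferred region {} is not declared", self.preferred))
    }
}
```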

## Error Handling Framework

### Chroma Error Traits

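Error types across the backend implement the `ChromaError` trait, which maps each error to an `ErrorCodes` value and controls whether it is traced: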

```rust
pub trait ChromaError: std::error::Error {
    fn code(&self) -> ErrorCodes;
    fn should_trace_error(&self) -> bool;
}
```

### Error Code Registry

| Code | Category | Description |
|------|----------|-------------|
| `InvalidArgument` | Client | Malformed request |
| `NotFound` | Client | Resource missing |
| `AlreadyExists` | Client | Duplicate resource |
| `PermissionDenied` | Security | Access denied |
| `ResourceExhausted` | Rate | Quota exceeded |
| `Internal` | Server | System error |
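Below is a minimal, std-only sketch of implementing the trait for a new error type. The trait and a subset of `ErrorCodes` are re-declared locally so the example compiles on its own; `LookupError` is hypothetical, and the choice not to trace client-side errors follows the pattern in the KNN error table above:

```rust
use std::fmt;

/// Local stand-in for a subset of Chroma's error codes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ErrorCodes {
    InvalidArgument,
    NotFound,
    Internal,
}

/// Local re-declaration of the trait shown above.
pub trait ChromaError: std::error::Error {
    fn code(&self) -> ErrorCodes;
    fn should_trace_error(&self) -> bool;
}

/// Hypothetical error type for a collection lookup.
#[derive(Debug)]
pub enum LookupError {
    MissingCollection(String),
    Storage(String),
}

impl fmt::Display for LookupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LookupError::MissingCollection(name) => write!(f, "collection {name} not found"),
            LookupError::Storage(msg) => write!(f, "storage failure: {msg}"),
        }
    }
}

impl std::error::Error for LookupError {}

impl ChromaError for LookupError {
    fn code(&self) -> ErrorCodes {
        match self {
            LookupError::MissingCollection(_) => ErrorCodes::NotFound,
            LookupError::Storage(_) => ErrorCodes::Internal,
        }
    }

    // Client errors are expected; only server-side failures are worth tracing.
    fn should_trace_error(&self) -> bool {
        self.code() == ErrorCodes::Internal
    }
}
```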

## CLI Integration

The Rust CLI provides management commands:

```rust
pub enum Command {
    Browse(BrowseArgs),
    Copy(CopyArgs),
    Db(DbSubcommand),
    Docs,
    Install(InstallArgs),
    Login(LoginArgs),
    Profile(ProfileSubcommand),
    Run(RunArgs),
    Support,
    Update,
    Vacuum(VacuumArgs),
}
```

Source: [rust/cli/src/lib.rs:1-30](https://github.com/chroma-core/chroma/blob/main/rust/cli/src/lib.rs)

### Available Commands

| Command | Description |
|---------|-------------|
| `browse` | Browse a collection in an interactive terminal view |
| `copy` | Copy collections between a local Chroma instance and Chroma Cloud |
| `db` | Database operations |
| `docs` | Open documentation |
| `install` | Install Chroma |
| `login` | Authenticate user |
| `profile` | Manage Chroma Cloud profiles (stored credentials) |
| `run` | Start Chroma server |
| `support` | Open support resources |
| `update` | Update installation |
| `vacuum` | Compact storage |

## See Also

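- [Blockstore Provider Configuration](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/provider.rs)
- [Query API Types](https://github.com/chroma-core/chroma/blob/main/rust/types/src/api_types.rs)
- [Index Implementations](https://github.com/chroma-core/chroma/blob/main/rust/index/src/spann/types.rs)
- [Worker Execution](https://github.com/chroma-core/chroma/tree/main/rust/worker/src/execution/orchestration)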

---

<a id='go-coordinator'></a>

## Go Coordinator & Distributed Systems

### Related Pages

Related topics: [System Architecture Overview](#architecture-overview)

This page has not been generated. The Go source files needed to document the coordinator were not in the retrieved repository context, which contained only Rust implementation files.

Rust components related to distributed operation that were available:
- `rust/worker/src/execution/orchestration/`: orchestrator implementations (ApplyLogsOrchestrator, LogFetchOrchestrator, RegisterOrchestrator)
- `rust/worker/src/compactor/scheduler.rs`: compaction scheduler
- `rust/types/src/topology.rs`: topology and region management
- `rust/blockstore/src/arrow/`: Arrow block storage
- `rust/worker/src/work_queue/work_queue_client.rs`: work queue client

Go sources required to write this page:
- `go/pkg/sysdb/coordinator/coordinator.go`
- `go/pkg/memberlist_manager/memberlist_manager.go`
- `go/pkg/leader/election.go`
- `go/cmd/coordinator/main.go`

---

<a id='data-storage-blockstore'></a>

## Data Storage & Blockstore

### Related Pages

Related topics: [Rust Backend Services Architecture](#rust-services-architecture), [Embedding Functions Integration](#embedding-functions)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [rust/blockstore/src/arrow/blockfile.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/blockfile.rs)
- [rust/blockstore/src/arrow/provider.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/provider.rs)
- [rust/blockstore/src/types/reader.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/types/reader.rs)
- [rust/blockstore/src/types/writer.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/types/writer.rs)
- [rust/blockstore/src/provider.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/provider.rs)
- [rust/blockstore/src/arrow/block/types.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/block/types.rs)
- [rust/blockstore/src/arrow/root.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/root.rs)
- [rust/blockstore/src/memory/provider.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/memory/provider.rs)
- [rust/blockstore/src/arrow/ordered_blockfile_writer.rs](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/ordered_blockfile_writer.rs)
</details>

# Data Storage & Blockstore

## Overview

The Chroma blockstore is the core storage subsystem responsible for persisting vector embeddings, metadata, and related data structures. It provides a unified abstraction layer over different storage backends (in-memory and Arrow-based) while maintaining performance characteristics suitable for high-throughput vector database operations.

The blockstore system is architected around the concept of **blockfiles** — persistent, columnar storage structures that organize data by prefix-based partitioning and support efficient key-value operations.
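Here is a minimal in-memory sketch of prefix + key addressing. It is a toy with assumptions (hypothetical `ToyBlockfile` type, `u32` keys, `BTreeMap` storage), not Chroma's blockfile. It shows each value addressed by a `(prefix, key)` pair, and how all keys under one prefix can be read back in key order.

```rust
use std::collections::BTreeMap;

/// Toy blockfile keyed by (prefix, key); illustrates addressing only.
pub struct ToyBlockfile<V> {
    entries: BTreeMap<(String, u32), V>,
}

impl<V> ToyBlockfile<V> {
    pub fn new() -> Self {
        Self { entries: BTreeMap::new() }
    }
}

impl<V: Clone> ToyBlockfile<V> {
    pub fn set(&mut self, prefix: &str, key: u32, value: V) {
        self.entries.insert((prefix.to_string(), key), value);
    }

    pub fn get(&self, prefix: &str, key: u32) -> Option<&V> {
        self.entries.get(&(prefix.to_string(), key))
    }

    pub fn delete(&mut self, prefix: &str, key: u32) -> Option<V> {
        self.entries.remove(&(prefix.to_string(), key))
    }

    /// All (key, value) pairs stored under one prefix, in ascending key order.
    pub fn get_by_prefix(&self, prefix: &str) -> Vec<(u32, V)> {
        self.entries
            .range((prefix.to_string(), u32::MIN)..=(prefix.to_string(), u32::MAX))
            .map(|((_, k), v)| (*k, v.clone()))
            .collect()
    }
}
```

Ordering by `(prefix, key)` keeps each prefix contiguous, so a prefix scan is a single range read.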

## Architecture

```mermaid
graph TD
    subgraph "Public API Layer"
        BP[BlockfileProvider]
        BR[BlockfileReader]
        BW[BlockfileWriter]
        BF[BlockfileFlusher]
    end

    subgraph "Implementation Layer"
        ABP[ArrowBlockfileProvider]
        MBP[MemoryBlockfileProvider]
        ABF[ArrowUnorderedBlockfileWriter]
        ABO[ArrowOrderedBlockfileWriter]
    end

    subgraph "Storage Layer"
        BM[BlockManager]
        RM[RootManager]
        ST[Storage]
    end

    subgraph "Arrow Format"
        R[Root]
        SB[Sparse Index]
        B[Blocks]
    end

    BP --> ABP
    BP --> MBP
    BR --> ABP
    BR --> MBP
    BW --> ABF
    BW --> ABO

    ABP --> BM
    ABP --> RM
    ABF --> BM
    ABF --> RM
    ABO --> BM
    ABO --> RM
    BM --> ST
    RM --> ST

    RM --> R
    R --> SB
    R --> B
```

## Core Components

### BlockfileProvider

The `BlockfileProvider` is the main entry point for creating readers and writers. It abstracts the underlying storage implementation and provides factory methods for blockfile operations.

**Variants:**

| Provider Type | Description | Use Case |
|---------------|-------------|----------|
| `HashMapBlockfileProvider` | In-memory blockfile storage | Testing, ephemeral data |
| `ArrowBlockfileProvider` | Persistent Arrow-based storage | Production workloads |

**API Methods:**

```rust
pub fn storage(&self) -> Option<Arc<Storage>> {
    match self {
        BlockfileProvider::ArrowBlockfileProvider(provider) => Some(provider.storage().clone()),
        BlockfileProvider::HashMapBlockfileProvider(_) => None,
    }
}

pub fn new_memory() -> Self {
    BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}
```

Source: [rust/blockstore/src/provider.rs:1-30](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/provider.rs)

### BlockfileReader

The `BlockfileReader` type provides read access to stored data. It is generic over key and value types that implement the `ReadKey` and `ReadValue` traits.

**Key and value trait bounds:**

```rust
pub trait ReadKey<'a>:
    Key
    + Into<KeyWrapper>
    + TryFrom<&'a KeyWrapper, Error = InvalidKeyConversion>
    + ArrowReadableKey<'a>
    + Sync
    + 'a
{}

pub trait ReadValue<'a>: Value + Readable<'a> + ArrowReadableValue<'a> + Sync + 'a {}
```

Source: [rust/blockstore/src/provider.rs:40-55](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/provider.rs)

### BlockfileWriter

The `BlockfileWriter` type provides write access to blockfiles and supports both ordered and unordered mutation patterns.

**Core Operations:**

| Method | Signature | Description |
|--------|-----------|-------------|
| `set` | `set(prefix, key, value)` | Insert or update a key-value pair |
| `delete` | `delete(prefix, key)` | Remove a key-value pair |
| `commit` | `commit()` | Finalize and persist the writer |

```rust
pub async fn set<
    K: Key + Into<KeyWrapper> + ArrowWriteableKey,
    V: Value + Writeable + ArrowWriteableValue,
>(
    &self,
    prefix: &str,
    key: K,
    value: V,
) -> Result<(), Box<dyn ChromaError>>
```

Source: [rust/blockstore/src/types/writer.rs:50-75](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/types/writer.rs)
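The architecture diagram lists a `BlockfileFlusher` alongside the writer. Below is a minimal synchronous sketch of a write, commit, flush lifecycle. It assumes, as that diagram suggests, that committing produces a flusher which performs the actual persistence. All types here (`ToyWriter`, `ToyFlusher`, `ToyStorage`) are hypothetical, and Chroma's real methods are `async` and generic over key and value types.

```rust
use std::collections::BTreeMap;

/// Stand-in for durable storage: blockfile id -> committed contents.
pub type ToyStorage = BTreeMap<u64, BTreeMap<(String, u32), String>>;

/// Hypothetical writer that buffers mutations until `commit`.
pub struct ToyWriter {
    id: u64,
    pending: BTreeMap<(String, u32), Option<String>>, // `None` marks a delete
}

/// Hypothetical flusher returned by `commit`; only `flush` touches storage.
pub struct ToyFlusher {
    id: u64,
    contents: BTreeMap<(String, u32), String>,
}

impl ToyWriter {
    pub fn new(id: u64) -> Self {
        Self { id, pending: BTreeMap::new() }
    }

    pub fn set(&mut self, prefix: &str, key: u32, value: &str) {
        self.pending.insert((prefix.to_string(), key), Some(value.to_string()));
    }

    pub fn delete(&mut self, prefix: &str, key: u32) {
        self.pending.insert((prefix.to_string(), key), None);
    }

    /// Consumes the writer, so no mutations are possible after commit.
    pub fn commit(self) -> ToyFlusher {
        let contents = self
            .pending
            .into_iter()
            .filter_map(|(k, v)| v.map(|v| (k, v))) // drop keys whose last mutation was a delete
            .collect();
        ToyFlusher { id: self.id, contents }
    }
}

impl ToyFlusher {
    pub fn flush(self, storage: &mut ToyStorage) {
        storage.insert(self.id, self.contents);
    }
}
```

Splitting commit from flush lets the caller finish all in-memory work first and decide separately when to pay the storage I/O cost.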

## Arrow Blockfile Implementation

The Arrow-based blockfile is the primary production storage implementation; it stores blocks in the Arrow IPC format for efficient columnar access.

### Blockfile Structure

```mermaid
graph TD
    R[Root File<br/>Root Writer] --> SB[Sparse Index<br/>Block Key Mapping]
    R --> BH[Block Header<br/>Metadata]
    
    SB --> B1[Block 1<br/>Arrow IPC]
    SB --> B2[Block 2<br/>Arrow IPC]
    SB --> BN[Block N<br/>Arrow IPC]
    
    B1 --> P1["Prefix: vec_1"]
    B1 --> P2["Prefix: vec_2"]
```
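Because the sparse index maps each block to the first key it holds, a point lookup only needs one block: the one whose start key is the greatest key not exceeding the target. Below is a minimal sketch of that lookup. It assumes plain `u32` keys and string block IDs; Chroma's sparse index is keyed by prefix and key, and its types differ.

```rust
use std::collections::BTreeMap;

/// Toy sparse index: first key stored in each block -> that block's id.
pub struct ToySparseIndex {
    start_keys: BTreeMap<u32, &'static str>,
}

impl ToySparseIndex {
    pub fn new(blocks: &[(u32, &'static str)]) -> Self {
        Self { start_keys: blocks.iter().copied().collect() }
    }

    /// The only block that could contain `key`: greatest start key <= `key`.
    pub fn block_for(&self, key: u32) -> Option<&'static str> {
        self.start_keys.range(..=key).next_back().map(|(_, id)| *id)
    }
}
```

The root stays small because it holds one entry per block, not per key. Only the selected block has to be fetched from storage.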

### ArrowBlockfileProvider

The `ArrowBlockfileProvider` manages the lifecycle of Arrow IPC blockfiles. Each blockfile is organized as a root file that holds a sparse index over its data blocks.

**Key Features:**

- **Fork Support**: Create new blockfiles from existing ones via forking
- **CMEK Support**: Optional Customer-Managed Encryption Keys
- **Block Size Management**: Configurable maximum block sizes

```rust
pub async fn write<K: Key + ArrowWriteableKey, V: ArrowWriteableValue>(
    &self,
    options: BlockfileWriterOptions,
) -> Result<BlockfileWriter, Box<CreateError>>
```

Source: [rust/blockstore/src/arrow/provider.rs:1-50](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/provider.rs)

### Writer Types

#### ArrowUnorderedBlockfileWriter

Provides high-performance unordered writes optimized for bulk insertion scenarios.

```rust
impl ArrowUnorderedBlockfileWriter {
    pub(super) fn new<K: ArrowWriteableKey, V: ArrowWriteableValue>(
        id: Uuid,
        prefix_path: &str,
        block_manager: BlockManager,
        root_manager: RootManager,
        max_block_size_bytes: usize,
        cmek: Option<Cmek>,
    ) -> Self
}
```

Source: [rust/blockstore/src/arrow/blockfile.rs:50-80](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/blockfile.rs)

#### ArrowOrderedBlockfileWriter

Expects mutations to arrive in ascending key order, so it can fill blocks sequentially.

Source: [rust/blockstore/src/arrow/ordered_blockfile_writer.rs:1-50](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/ordered_blockfile_writer.rs)

### BlockManager and RootManager

| Component | Responsibility |
|-----------|----------------|
| `BlockManager` | Manages individual data blocks, handles block creation and commitment |
| `RootManager` | Manages root files containing sparse indices and metadata |

```rust
// Forking a new root from an existing one (excerpt; `?` stands in for the
// surrounding error handling, which is not shown here)
let new_root = self
    .root_manager
    .fork::<K>(
        &fork_from,
        new_id,
        &options.prefix_path,
        self.block_manager.default_max_block_size_bytes(),
    )
    .await?;
```

Source: [rust/blockstore/src/arrow/provider.rs:45-70](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/provider.rs)
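Forking is what lets a new blockfile version avoid a full copy. Below is a minimal copy-on-write sketch. It assumes blocks are immutable once written, so a forked root can share them through `Arc` and only replaced blocks diverge. `ToyRoot` and `replace_block` are hypothetical names, not Chroma's `RootManager` API.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Immutable block contents, shared between roots (toy representation).
pub type Block = Arc<Vec<(u32, String)>>;

/// Toy root: a sparse index from start key to a shared block.
pub struct ToyRoot {
    pub id: u64,
    pub blocks: BTreeMap<u32, Block>,
}

impl ToyRoot {
    /// Forking copies only the index; every block is shared via `Arc`.
    pub fn fork(&self, new_id: u64) -> ToyRoot {
        ToyRoot { id: new_id, blocks: self.blocks.clone() }
    }

    /// Swapping in a new block affects only this root, never its parent.
    pub fn replace_block(&mut self, start_key: u32, entries: Vec<(u32, String)>) {
        self.blocks.insert(start_key, Arc::new(entries));
    }
}
```

A fork costs one index copy plus reference-count increments, independent of how much data the blocks hold.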

## Error Handling

### Error Types

| Error Type | Description | Error Code |
|------------|-------------|------------|
| `BlockNotFound` | Requested block does not exist | Internal |
| `BlockFetchError` | Failed to retrieve block from storage | Internal |
| `MigrationError` | Blockfile migration failed | Internal |
| `IOError` | Storage I/O operation failed | Internal |
| `ArrowError` | Arrow IPC parsing/encoding error | Internal |
| `NoRecordBatches` | Invalid Arrow file structure | Internal |

```rust
#[derive(Error, Debug)]
pub enum ArrowBlockfileError {
    #[error("Block not found")]
    BlockNotFound,
    #[error("Could not fetch block")]
    BlockFetchError(#[from] GetError),
    #[error("Could not migrate blockfile to new version")]
    MigrationError(#[from] MigrationError),
}
```

Source: [rust/blockstore/src/arrow/blockfile.rs:25-40](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/blockfile.rs)

### Layout Verification

The system validates Arrow file layouts to ensure data integrity:

```rust
#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
    #[error("Buffer length is not 64 byte aligned")]
    BufferLengthNotAligned,
    #[error("No record batches in footer")]
    NoRecordBatches,
    #[error("More than one record batch in IPC file")]
    MultipleRecordBatches,
    #[error("Invalid message type")]
    InvalidMessageType,
}
```

Source: [rust/blockstore/src/arrow/block/types.rs:40-60](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/block/types.rs)
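Below is a minimal sketch of how those checks could be expressed. It assumes the buffer lengths and record-batch count have already been read from the IPC file; the hypothetical `verify_layout` does no Arrow parsing itself. The 64-byte rule matches the error message above. The Arrow columnar format recommends padding buffers to a multiple of 8 or 64 bytes.

```rust
/// Mirrors the variants shown above (without the `thiserror` derive).
#[derive(Debug, PartialEq, Eq)]
pub enum ArrowLayoutVerificationError {
    BufferLengthNotAligned,
    NoRecordBatches,
    MultipleRecordBatches,
}

/// Hypothetical check over already-parsed layout facts.
pub fn verify_layout(
    buffer_lengths: &[usize],
    record_batch_count: usize,
) -> Result<(), ArrowLayoutVerificationError> {
    // A block file is expected to contain exactly one record batch.
    match record_batch_count {
        0 => return Err(ArrowLayoutVerificationError::NoRecordBatches),
        1 => {}
        _ => return Err(ArrowLayoutVerificationError::MultipleRecordBatches),
    }
    // Every buffer length must be a multiple of 64 bytes.
    if buffer_lengths.iter().any(|len| len % 64 != 0) {
        return Err(ArrowLayoutVerificationError::BufferLengthNotAligned);
    }
    Ok(())
}
```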

## Storage Operations

### Write Flow

```mermaid
sequenceDiagram
    participant Client
    participant Provider as BlockfileProvider
    participant Writer as BlockfileWriter
    participant BM as BlockManager
    participant RM as RootManager
    participant Storage

    Client->>Provider: write(options)
    Provider->>Writer: create_writer()
    Provider->>RM: create/fork_root()
    Client->>Writer: set(prefix, key, value)
    Writer->>BM: create_block()
    loop Until flush
        Writer->>Writer: accumulate_data()
    end
    Client->>Writer: commit()
    Writer->>BM: commit_block()
    Writer->>RM: update_root()
    RM->>Storage: persist()
    BM->>Storage: persist()
```

### Read Flow

```mermaid
sequenceDiagram
    participant Client
    participant Reader as BlockfileReader
    participant RM as RootManager
    participant BM as BlockManager
    participant Storage

    Client->>Reader: get(prefix, key)
    Reader->>RM: get_block_ids()
    RM->>Reader: block_id_list
    loop For each block
        Reader->>BM: get_block(id)
        BM->>Storage: read()
        Storage->>Reader: block_data
    end
    Reader->>Reader: search_blocks()
    Reader->>Client: value
```

## Configuration Options

### BlockfileWriterOptions

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `prefix_path` | `String` | Required | Path prefix for storage |
| `max_block_size_bytes` | `usize` | Provider default | Maximum size per block |
| `mutation_ordering` | `BlockfileWriterMutationOrdering` | `Ordered` | Write ordering mode |
| `fork_from` | `Option<Uuid>` | `None` | Source blockfile ID for forking |
| `cmek` | `Option<Cmek>` | `None` | Customer-managed encryption key |

```rust
let mut bf_options = BlockfileWriterOptions::new(prefix_path.to_string())
    .max_block_size_bytes(pl_block_size);
bf_options = bf_options.unordered_mutations();
if let Some(cmek) = cmek {
    bf_options = bf_options.with_cmek(cmek);
}
```

Source: [rust/blockstore/src/arrow/provider.rs:90-110](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/provider.rs)
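`max_block_size_bytes` bounds how large each block may grow. Below is a minimal sketch of size-bounded packing over entries already in key order. It uses a hypothetical `pack_into_blocks` helper that estimates an entry's size as key bytes plus value bytes. Chroma's actual size accounting is not shown here and likely differs.

```rust
/// Splits key-ordered entries into blocks whose estimated size stays within
/// `max_block_size_bytes`; an entry larger than the limit gets a block of its own.
pub fn pack_into_blocks(
    entries: &[(String, String)],
    max_block_size_bytes: usize,
) -> Vec<Vec<(String, String)>> {
    let mut blocks = Vec::new();
    let mut current: Vec<(String, String)> = Vec::new();
    let mut current_size = 0;
    for (key, value) in entries {
        let size = key.len() + value.len();
        // Close the current block if adding this entry would overflow it.
        if !current.is_empty() && current_size + size > max_block_size_bytes {
            blocks.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current.push((key.clone(), value.clone()));
        current_size += size;
    }
    if !current.is_empty() {
        blocks.push(current);
    }
    blocks
}
```

Because the input is sorted, each finished block covers a contiguous key range, which is exactly what a sparse index of start keys needs.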

## Memory Blockfile

For testing and ephemeral use cases, Chroma provides an in-memory blockfile implementation:

```rust
pub fn new_memory() -> Self {
    BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}
```

**Limitations:**
- No persistence
- No fork support
- Limited to unordered mutations

```rust
if options.fork_from.is_some() {
    unimplemented!();
}
```

Source: [rust/blockstore/src/memory/provider.rs:40-55](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/memory/provider.rs)

## Block Reading

### RootReader

The `RootReader` is responsible for reading block metadata and identifying which blocks contain specific data:

```rust
impl RootReader {
    pub(super) fn get_all_block_ids_from_bytes(
        bytes: &[u8],
        id: Uuid,
    ) -> Result<Vec<Uuid>, FromBytesError> {
        let mut cursor = std::io::Cursor::new(bytes);
        let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
        
        let record_batch = match arrow_reader {
            Ok(mut reader) => match reader.next() {
                Some(Ok(batch)) => batch,
                Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
                None => return Err(FromBytesError::NoDataError),
            },
            Err(e) => return Err(FromBytesError::ArrowError(e)),
        };
        
        let (version, read_id) = Self::version_and_id_from_record_batch(&record_batch, id)?;
        if read_id != id {
            return Err(FromBytesError::IdMismatch);
        }
        
        Self::block_ids_from_record_batch(&record_batch, version)
    }
}
```

Source: [rust/blockstore/src/arrow/root.rs:20-55](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/arrow/root.rs)

## Related Components

### SpannIndex Integration

The blockstore also backs the SPANN index, an approximate nearest neighbor index that groups vectors into clusters. SPANN stores its posting lists and version map in blockfiles:

| Component | Purpose |
|-----------|---------|
| `SpannIndexReader` | Reads posting lists and HNSW indices |
| `SpannIndexWriter` | Creates and manages posting list writers |
| `SpannPostingList` | Stores document IDs and embeddings |

```rust
pub struct SpannIndexReader<'me> {
    pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
    pub hnsw_index: HnswIndexRef,
    pub versions_map: BlockfileReader<'me, u32, u32>,
    pub dimensionality: usize,
}
```

Source: [rust/index/src/spann/types.rs:30-45](https://github.com/chroma-core/chroma/blob/main/rust/index/src/spann/types.rs)
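The struct above pairs an HNSW index with posting lists keyed by `u32`. Below is a minimal sketch of a SPANN-style query under several simplifying assumptions:
- centroids are scanned by brute force instead of through the HNSW index;
- posting lists hold `(id, vector)` pairs;
- distances are squared L2;
- `n_probe` (how many clusters to search) is a hypothetical parameter name.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Toy SPANN-style search over `centroids[i]` and its `posting_lists[i]`.
pub fn spann_search(
    query: &[f32],
    centroids: &[Vec<f32>],
    posting_lists: &[Vec<(u32, Vec<f32>)>],
    n_probe: usize,
    k: usize,
) -> Vec<(u32, f32)> {
    // 1. Choose the `n_probe` centroids closest to the query.
    let mut heads: Vec<usize> = (0..centroids.len()).collect();
    heads.sort_by(|&a, &b| {
        l2_squared(query, &centroids[a]).total_cmp(&l2_squared(query, &centroids[b]))
    });
    heads.truncate(n_probe);

    // 2. Scan only those posting lists and keep the k nearest vectors.
    let mut candidates: Vec<(u32, f32)> = heads
        .iter()
        .flat_map(|&h| posting_lists[h].iter())
        .map(|(id, v)| (*id, l2_squared(query, v)))
        .collect();
    candidates.sort_by(|a, b| a.1.total_cmp(&b.1));
    candidates.truncate(k);
    candidates
}
```

Only the probed posting lists are read, which is why they can live in the blockstore rather than in memory.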

## Summary

The Chroma blockstore provides a robust, extensible storage layer built on Arrow IPC format. Key architectural decisions include:

1. **Separation of concerns**: BlockManager handles data blocks while RootManager manages metadata and sparse indices
2. **Dual writer support**: Ordered and unordered writers for different access patterns
3. **Forking capability**: Efficient creation of derived blockfiles without full copies
4. **Error classification**: Clear mapping from internal errors to error codes for API responses
5. **Type-safe abstractions**: Generic key-value traits enabling flexible data modeling

---

<a id='embedding-functions'></a>

## Embedding Functions Integration

### Related Pages

Related topics: [Python Client SDK](#python-client-sdk), [Data Storage & Blockstore](#data-storage-blockstore)

<details>
<summary>Related source files</summary>

The following source files were used to generate this page:

- [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)
- [clients/new-js/packages/ai-embeddings/ollama/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/ollama/README.md)
- [rust/types/src/api_types.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/api_types.rs)
- [clients/new-js/packages/ai-embeddings/all/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/all/README.md)
- [clients/new-js/packages/chromadb/src/embedding-function.ts](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/src/embedding-function.ts)
- [clients/new-js/packages/chromadb/src/collection-configuration.ts](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/src/collection-configuration.ts)
- [clients/new-js/packages/ai-embeddings/morph/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/morph/README.md)
- [clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/README.md)
</details>

# Embedding Functions Integration

## Overview

Embedding Functions in Chroma provide a standardized interface for converting text into vector embeddings. Chroma supports multiple embedding providers through a plugin architecture that allows developers to use custom embedding functions or leverage hosted services like OpenAI, Cohere, Ollama, and others.

The embedding function system serves as the bridge between raw text data and the vector representation used for similarity search. Each embedding function implements a consistent interface that handles API communication, request formatting, and response parsing for its respective provider.

Source: [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)
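The "consistent interface" boils down to one contract: a batch of texts in, one fixed-length vector per text out. The JavaScript packages express this in TypeScript. Below is a minimal Rust rendering of the same contract. The `EmbeddingFunction` trait is hypothetical, and `ByteBucketEmbedding` is a toy, not a real model.

```rust
/// Hypothetical embedding-function contract.
pub trait EmbeddingFunction {
    fn name(&self) -> &'static str;
    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String>;
}

/// Toy embedder: counts bytes into `dims` buckets, then L2-normalizes.
pub struct ByteBucketEmbedding {
    pub dims: usize,
}

impl EmbeddingFunction for ByteBucketEmbedding {
    fn name(&self) -> &'static str {
        "byte-bucket"
    }

    fn embed(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        if self.dims == 0 {
            return Err("dims must be positive".to_string());
        }
        Ok(texts
            .iter()
            .map(|text| {
                let mut v = vec![0.0f32; self.dims];
                for b in text.bytes() {
                    v[b as usize % self.dims] += 1.0;
                }
                // Normalize to unit length; an empty text stays all zeros.
                let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
                if norm > 0.0 {
                    v.iter_mut().for_each(|x| *x /= norm);
                }
                v
            })
            .collect())
    }
}
```

A hosted provider would implement the same trait, with `embed` making the API call and parsing the response.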

## Architecture

### High-Level Architecture

```mermaid
graph TD
    A[Client Application] --> B[Chroma Collection]
    B --> C[Embedding Function]
    C --> D[Embedding Provider API]
    D --> E[Vector Embeddings]
    E --> B
    
    F["@chroma-core/openai"] --> C
    G["@chroma-core/ollama"] --> C
    H["@chroma-core/cohere"] --> C
    I["@chroma-core/morph"] --> C
    J["@chroma-core/all"] --> C
```

### Embedding Function Package Structure

Chroma organizes embedding functions into separate packages under the `@chroma-core` namespace. Each package focuses on a specific provider while sharing common utilities.

| Package | Provider | Environment Support |
|---------|----------|---------------------|
| `@chroma-core/ai-embeddings-common` | Shared utilities | Node.js + Browser |
| `@chroma-core/openai` | OpenAI | Node.js + Browser |
| `@chroma-core/ollama` | Ollama (local) | Node.js + Browser |
| `@chroma-core/cohere` | Cohere | Node.js + Browser |
| `@chroma-core/jina` | Jina AI | Node.js + Browser |
| `@chroma-core/morph` | Morph | Node.js |
| `@chroma-core/all` | All providers | Node.js + Browser |

Source: [clients/new-js/packages/ai-embeddings/all/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/all/README.md)

## Core Components

### Common Utilities Package

The `@chroma-core/ai-embeddings-common` package provides shared functionality used by all embedding function implementations:

```typescript
import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';
```

**Key Features:**

| Feature | Purpose |
|---------|---------|
| `validateConfigSchema` | Validates embedding function configurations using JSON schemas |
| `snakeCase` | Converts camelCase JavaScript objects to snake_case for API compatibility |
| `isBrowser` | Detects browser vs Node.js runtime environment |

Source: [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)

### Dynamic Loading Mechanism

The embedding function system supports dynamic loading of packages based on configuration:

```typescript
const fullPackageName = `@chroma-core/${packageName}`;
await import(fullPackageName);
embeddingFunction = knownEmbeddingFunctions.get(packageName);
```

The system maintains mappings for known embedding function names and handles package resolution automatically when a collection is configured with a specific embedding provider.

Source: [clients/new-js/packages/chromadb/src/embedding-function.ts](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/src/embedding-function.ts)
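Rust has no counterpart to the dynamic `import()` above, but the lookup half of the mechanism (configuration name to constructor) can be sketched with a static registry. `Registry`, `Factory`, and the `zeros-384` entry are hypothetical, for illustration only.

```rust
use std::collections::HashMap;

/// Minimal trait for this sketch (not Chroma's actual interface).
pub trait EmbeddingFunction {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

/// Toy function that returns a fixed-size zero vector per text.
struct Zeros(usize);

impl EmbeddingFunction for Zeros {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        texts.iter().map(|_| vec![0.0; self.0]).collect()
    }
}

fn make_zeros() -> Box<dyn EmbeddingFunction> {
    Box::new(Zeros(384))
}

type Factory = fn() -> Box<dyn EmbeddingFunction>;

/// Hypothetical registry: name stored in a collection's configuration -> constructor.
pub struct Registry {
    known: HashMap<&'static str, Factory>,
}

impl Registry {
    pub fn new() -> Self {
        let mut known: HashMap<&'static str, Factory> = HashMap::new();
        known.insert("zeros-384", make_zeros);
        Self { known }
    }

    /// Looks up a constructor by name, like `knownEmbeddingFunctions.get(...)`.
    pub fn resolve(&self, name: &str) -> Result<Box<dyn EmbeddingFunction>, String> {
        self.known
            .get(name)
            .map(|factory| factory())
            .ok_or_else(|| format!("unknown embedding function '{name}'"))
    }
}
```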

### Configuration Schema

Embedding functions support structured configuration with schema validation. Configuration options vary by provider but typically include:

| Parameter | Description | Provider Support |
|-----------|-------------|------------------|
| `apiKey` | API key for authentication | OpenAI, Cohere, Jina, Gemini |
| `modelName` | Specific model identifier | All providers |
| `apiBase` | Custom API endpoint URL | Ollama, Morph, Gemini |
| `encodingFormat` | Output format (float/base64) | OpenAI, Morph |

Source: [clients/new-js/packages/ai-embeddings/morph/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/morph/README.md)

## Provider Implementations

### OpenAI Embeddings

The OpenAI embedding function supports the OpenAI API for generating text embeddings:

```typescript
import { OpenAIEmbeddingFunction } from '@chroma-core/openai';

const openAIEF = new OpenAIEmbeddingFunction({
  apiKey: 'your-api-key',
  modelName: 'text-embedding-3-small'
});
```

### Ollama (Local Embeddings)

Ollama enables local embedding generation without external API calls:

```bash
# Install Ollama from ollama.ai
# Start the server
ollama serve
# Pull an embedding model
ollama pull chroma/all-minilm-l6-v2-f32
```

**Supported Models:**

| Model | Dimensions |
|-------|------------|
| `chroma/all-minilm-l6-v2-f32` (default) | 384 |
| `nomic-embed-text` | 768 |
| `mxbai-embed-large` | 1024 |
| `snowflake-arctic-embed` | Variable |

Source: [clients/new-js/packages/ai-embeddings/ollama/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/ollama/README.md)

### Morph Embeddings

Morph provides embeddings optimized for code-related content:

```typescript
const morphEmbedding = new MorphEmbeddingFunction({
  api_key: 'your-morph-api-key',
  model_name: 'morph-embedding-v2',
  api_base: 'https://api.morphllm.com/v1',
  encoding_format: 'float'
});
```

Source: [clients/new-js/packages/ai-embeddings/morph/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/morph/README.md)

### Chroma Cloud Qwen

Hosted embedding service using Qwen models:

```typescript
const qwenEmbedding = new QwenEmbeddingFunction({
  model: 'Qwen/Qwen3-Embedding-0.6B',
  task: 'document' // or 'query'
});
```

Configuration includes:
- `model`: The Qwen model to use
- `task`: Task type (document or query embedding)
- `instruction_dict`: Custom instructions for specific tasks
- `apiKeyEnvVar`: Environment variable for API key (default: `CHROMA_API_KEY`)

Source: [clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/README.md)

## Collection Integration

### Embedding Function in Collections

When creating a collection, pass the embedding function in the `embeddingFunction` option:

```typescript
const collection = await chroma.createCollection({
  name: "my-collection",
  embeddingFunction: openAIEF  // Specify embedding function
});
```

### Space Configuration

Embedding functions can define supported distance spaces and default configurations:

```typescript
if (overallEf && overallEf.defaultSpace && overallEf.supportedSpaces) {
  if (configuration?.hnsw === undefined && configuration?.spann === undefined) {
    configuration.hnsw = { space: overallEf.defaultSpace() };
  }
}
```

The system checks that the configured space is supported by the embedding function and warns on a mismatch. An illustrative warning (exact wording may differ) looks like this:

```
Space 'ip' is not supported by embedding function 'my_ef'.
Supported spaces: cosine, l2
```

Source: [clients/new-js/packages/chromadb/src/collection-configuration.ts](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/src/collection-configuration.ts)
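Below is a minimal Rust sketch of both pieces. It assumes Chroma's three space names (`l2`, `ip`, `cosine`) and the distance definitions from Chroma's documentation: squared L2, one minus inner product, and one minus cosine similarity. `resolve_space` is a hypothetical port of the TypeScript branch above plus the mismatch warning, not the client's code.

```rust
/// The three distance spaces.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Space {
    L2,
    Ip,
    Cosine,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

pub fn distance(space: Space, a: &[f32], b: &[f32]) -> f32 {
    match space {
        // Squared Euclidean distance.
        Space::L2 => a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>(),
        // One minus the inner product.
        Space::Ip => 1.0 - dot(a, b),
        // One minus cosine similarity.
        Space::Cosine => 1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt()),
    }
}

/// Uses the configured space if present (warning when the embedding function
/// does not list it); otherwise falls back to the function's default space.
pub fn resolve_space(
    configured: Option<Space>,
    ef_default: Space,
    ef_supported: &[Space],
) -> (Space, Option<String>) {
    match configured {
        None => (ef_default, None),
        Some(space) if ef_supported.contains(&space) => (space, None),
        Some(space) => (
            space,
            Some(format!("Space '{space:?}' is not supported by this embedding function")),
        ),
    }
}
```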

## Query Response Structure

### Include Parameter

Queries support specifying which data to include in results through the `Include` parameter:

```rust
pub enum Include {
    Distance,
    Document,
    Embedding,
    Metadata,
    Uri,
}
```

**Default Inclusion Behavior:**

| Operation | Default Includes |
|-----------|------------------|
| Query | Document, Metadata, Distance |
| Get | Document, Metadata |

**Include List Methods:**

| Method | Returns |
|--------|---------|
| `IncludeList::empty()` | No includes |
| `IncludeList::default_query()` | Document, Metadata, Distance |
| `IncludeList::default_get()` | Document, Metadata |
| `IncludeList::all()` | All five include types |

Source: [rust/types/src/api_types.rs](https://github.com/chroma-core/chroma/blob/main/rust/types/src/api_types.rs)
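Below is a minimal sketch that reconstructs `IncludeList` from the table above, assuming it is a thin wrapper over a `Vec<Include>`; the real type may be represented differently. One plausible reason `Distance` is absent from the get defaults is that a get has no query vector to measure distance against.

```rust
/// Mirrors the `Include` enum shown above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Include {
    Distance,
    Document,
    Embedding,
    Metadata,
    Uri,
}

/// Hypothetical reconstruction of `IncludeList` from the table above.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct IncludeList(pub Vec<Include>);

impl IncludeList {
    pub fn empty() -> Self {
        IncludeList(Vec::new())
    }

    pub fn default_query() -> Self {
        IncludeList(vec![Include::Document, Include::Metadata, Include::Distance])
    }

    pub fn default_get() -> Self {
        IncludeList(vec![Include::Document, Include::Metadata])
    }

    pub fn all() -> Self {
        IncludeList(vec![
            Include::Document,
            Include::Metadata,
            Include::Distance,
            Include::Embedding,
            Include::Uri,
        ])
    }

    pub fn contains(&self, include: Include) -> bool {
        self.0.contains(&include)
    }
}
```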

## Usage Patterns

### Basic Usage with JavaScript Client

```javascript
import { ChromaClient } from "chromadb";
import { OpenAIEmbeddingFunction } from "@chroma-core/openai";

const chroma = new ChromaClient();
const embeddingFunction = new OpenAIEmbeddingFunction({
  apiKey: process.env.OPENAI_API_KEY
});

const collection = await chroma.createCollection({
  name: "documents",
  embeddingFunction: embeddingFunction
});

await collection.add({
  ids: ["doc-1", "doc-2"],
  documents: ["Document content here", "Another document"],
  metadatas: [{ source: "notion" }, { source: "google-docs" }]
});

const results = await collection.query({
  queryTexts: ["Search query"],
  nResults: 2
});
```

### Python Client Usage

```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.create_collection("documents")

collection.add(
    documents=["Document 1", "Document 2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc2"],
    embeddings=[[1.2, 2.1, ...], [1.2, 2.1, ...]]
)

results = collection.query(
    query_texts=["Query document"],
    n_results=2
)
```

Source: [clients/new-js/packages/chromadb/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/README.md)

## Environment Detection

Embedding functions automatically detect the runtime environment to select the appropriate HTTP client:

```typescript
import { isBrowser } from '@chroma-core/ai-embeddings-common';

if (isBrowser()) {
  // Use browser-compatible fetch
} else {
  // Use Node.js HTTP client
}
```

This enables packages like Ollama to work seamlessly in both browser and Node.js environments:

> This package works in both Node.js and browser environments, automatically detecting the runtime and using the appropriate Ollama client.

Source: [clients/new-js/packages/ai-embeddings/ollama/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/ollama/README.md)

## Type Safety

The embedding function system provides TypeScript types and interfaces for:

- Configuration validation
- Response parsing
- Error handling
- Provider-specific options

```typescript
export const getSparseEmbeddingFunction = async (
  client: ChromaClient,
  efConfig?: EmbeddingFunctionConfiguration
) => {
  // Returns SparseEmbeddingFunction instance or undefined
};
```

Source: [clients/new-js/packages/chromadb/src/embedding-function.ts](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/chromadb/src/embedding-function.ts)

## Summary

Embedding Functions Integration in Chroma provides a unified, extensible system for text vectorization. Key aspects include:

1. **Provider Abstraction**: Standardized interface across multiple embedding providers
2. **Dynamic Loading**: Packages loaded on-demand based on collection configuration
3. **Schema Validation**: JSON schema-based configuration validation
4. **Cross-Platform**: Support for both Node.js and browser environments
5. **Flexible Configuration**: Provider-specific options with sensible defaults
6. **Space Support**: Distance metric configuration aligned with embedding provider capabilities

The plugin architecture allows Chroma to integrate new embedding providers while maintaining API consistency across the SDK.

---

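## Doramagic Pitfall Log

Project: chroma-core/chroma

Summary: 6 potential pitfalls found, 0 of them high/blocking. Highest priority: capability pitfall - capability assessment relies on assumptions.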

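## 1. Capability pitfall · Capability assessment relies on assumptions

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- Impact on users: if the assumption does not hold, users will not get the promised capability.
- Suggested check: turn the assumption into a downstream verification checklist.
- Safeguard: assumptions must become verification items and cannot be stated as fact until verified.
- Evidence: capability.assumptions | github_repo:546206616 | https://github.com/chroma-core/chroma | README/documentation is current enough for a first validation pass.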

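## 2. Maintenance pitfall · Maintenance activity unknown

- Severity: medium
- Evidence strength: source_linked
- Finding: last_activity_observed was not recorded.
- Impact on users: new, abandoned, and active projects get lumped together, lowering confidence in the recommendation.
- Suggested check: add signals for recent GitHub commits, releases, and issue/PR responses.
- Safeguard: while maintenance activity is unknown, the recommendation cannot be marked high-trust.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | last_activity_observed missing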

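## 3. Security/permissions pitfall · Downstream validation found risk items

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: downstream has already requested a review, so this must not be downplayed on the page.
- Suggested check: send to the security/permissions governance review queue.
- Safeguard: while downstream risks exist, the review/recommendation must stay downgraded.
- Evidence: downstream_validation.risk_items | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium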

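## 4. Security/permissions pitfall · Scoring risk present

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- Impact on users: the risk affects whether the project is suitable for ordinary users to install.
- Suggested check: record the risk on the boundary card and confirm whether manual review is needed.
- Safeguard: scoring risks must appear on the boundary card, not only as an internal score.
- Evidence: risks.scoring_risks | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium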

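## 5. Maintenance pitfall · Issue/PR response quality unknown

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- Impact on users: users cannot tell whether anyone will respond when they hit a problem.
- Suggested check: sample recent issues/PRs to see whether they go unaddressed for long periods.
- Safeguard: while issue/PR responsiveness is unknown, a maintenance risk must be flagged.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | issue_or_pr_quality=unknown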

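## 6. Maintenance pitfall · Release cadence unclear

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- Impact on users: install commands and docs may lag behind the code, raising the chance that users hit problems.
- Suggested check: confirm that the latest release/tag matches the README install commands.
- Safeguard: when release cadence is unknown or stale, install instructions must be marked as possibly out of date.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | release_recency=unknown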

<!-- canonical_name: chroma-core/chroma; human_manual_source: deepwiki_human_wiki -->
