Doramagic Project Pack · Human Manual
Chroma Overview
Related topics: Getting Started with Chroma, System Architecture Overview
Introduction
Chroma is an open-source data infrastructure platform designed specifically for AI applications. It provides the foundational building blocks for storing, querying, and managing vector embeddings along with associated metadata, enabling developers to build AI-powered applications with efficient similarity search capabilities. Sources: README.md:1
As an open-source solution, Chroma offers flexibility for self-hosting while also providing a cloud-hosted option called Chroma Cloud, which delivers serverless vector, hybrid, and full-text search capabilities. The platform is designed to be fast, cost-effective, scalable, and straightforward to deploy. Sources: README.md:17-21
Architecture Overview
Chroma follows a client-server architecture with multiple client libraries available for different programming environments. The system is built with Rust for core performance-critical components and provides idiomatic client libraries for Python and JavaScript/TypeScript.
graph TD
A[Client Applications] --> B[Python Client / JS Client]
B --> C[Chroma Server API]
C --> D[Worker Nodes]
D --> E[Blockstore<br/>Arrow Storage]
D --> F[Compaction &<br/>Log Processing]
E --> G[Persistent Storage]
H[Chroma Cloud] -.->|Optional hosted| C
Client Libraries
Chroma provides client libraries for Python and JavaScript/TypeScript, distributed as three packages:
| Client | Package | Description |
|---|---|---|
| Python | chromadb | Full-featured Python client library Sources: clients/python/README.md:1 |
| Python HTTP | chromadb-client | Lightweight HTTP-only client for server connections Sources: clients/python/README.md:12 |
| JavaScript/TypeScript | chromadb (npm) | Full-featured JS client for Node.js and browser Sources: clients/new-js/packages/chromadb/README.md:1 |
#### Python Client Installation
pip install chromadb # Full client library
pip install chromadb-client # HTTP client only
#### JavaScript Client Example
import { ChromaClient } from "chromadb";
const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });
for (let i = 0; i < 20; i++) {
await collection.add({
ids: ["test-id-" + i.toString()],
embeddings: [[1, 2, 3, 4, 5]],
documents: ["test"],
});
}
const queryData = await collection.query({
queryEmbeddings: [[1, 2, 3, 4, 5]],
queryTexts: ["test"],
});
Sources: clients/new-js/packages/chromadb/README.md:9-27
Data Model
Collection Structure
Collections in Chroma serve as the primary organizational unit for storing related documents and their associated embeddings. Each collection contains:
- Documents: The textual content to be embedded
- Embeddings: Vector representations of documents
- Metadatas: Key-value pairs for filtering and categorization
- Unique Identifiers: User-provided IDs for each record Sources: clients/python/README.md:16-27
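The record structure can be sketched as a small data type (illustrative only, not Chroma's internal representation):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Record:
    """One collection entry: a user-provided id plus optional payload fields."""
    id: str                                  # unique identifier
    document: Optional[str] = None           # textual content to be embedded
    embedding: Optional[List[float]] = None  # vector representation
    metadata: Dict[str, Any] = field(default_factory=dict)  # filterable key-value pairs
```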
Metadata Filtering
Chroma supports rich metadata filtering through operators that enable precise data retrieval:
graph LR
A[Query Request] --> B[Metadata Filter]
B --> C{Operator Type}
C -->|Contains| D[String contains check]
C -->|NotContains| E[String excludes check]
C -->|Regex| F[Regular expression match]
C -->|NotRegex| G[Regex exclusion]
Supported Document Operators:
| Operator | Description | Example |
|---|---|---|
| Contains | Document contains substring | {"$contains": "keyword"} |
| NotContains | Document excludes substring | {"$not_contains": "spam"} |
| Regex | Regular expression match | {"$regex": "^prefix.*"} |
| NotRegex | Exclude by regex pattern | {"$not_regex": ".*suffix$"} |
Sources: rust/types/src/metadata.rs:1-30
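To make the operator semantics concrete, here is a minimal pure-Python evaluator (a sketch of the documented behavior, not Chroma's implementation):

```python
import re

def matches_where_document(document: str, clause: dict) -> bool:
    """Evaluate a single document filter clause against one document."""
    op, arg = next(iter(clause.items()))
    if op == "$contains":
        return arg in document
    if op == "$not_contains":
        return arg not in document
    if op == "$regex":
        return re.search(arg, document) is not None
    if op == "$not_regex":
        return re.search(arg, document) is None
    raise ValueError(f"unsupported operator: {op}")
```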
Search Keys
The query system supports specialized keys for accessing different aspects of stored data:
| Key | Description | Usage |
|---|---|---|
| #document | Full text content | Key::Document |
| #embedding | Vector embeddings | Key::Embedding |
| #metadata | Record metadata | Key::Metadata |
| #score | Similarity score | Key::Score |
| Custom fields | User-defined metadata | Key::field("field_name") |
Sources: rust/types/src/execution/operator.rs:1-80
Core Components
Storage Layer
The blockstore provides the underlying storage mechanism using Arrow format for efficient columnar data storage and retrieval. This enables high-performance queries across large datasets. Sources: rust/blockstore/src/arrow/root.rs:1
Execution Operators
Chroma's query execution pipeline uses operators that transform and filter data through well-defined stages:
graph TD
A[Query Request] --> B[Log Fetch Orchestrator]
B --> C[KNN Filter]
C --> D[Apply Logs Orchestrator]
D --> E[Segment Writers]
E --> F[Compact Collection]
Key Orchestrators:
| Component | Purpose |
|---|---|
| LogFetchOrchestrator | Fetches and materializes log entries Sources: rust/worker/src/execution/orchestration/log_fetch_orchestrator.rs:1 |
| KnnFilter | Performs k-nearest neighbor filtering Sources: rust/worker/src/execution/orchestration/knn_filter.rs:1 |
| ApplyLogsOrchestrator | Applies log entries to segment writers Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1 |
Error Handling
The system uses a consistent error code hierarchy for reliable error management:
| Error Code | Description |
|---|---|
| InvalidArgument | Client-provided invalid parameters |
| Internal | System-level internal errors |
| ResourceExhausted | Resource limits reached (e.g., task abortion) |
Sources: rust/blockstore/src/arrow/block/types.rs:1-20
Deployment Options
Self-Hosting
Chroma can be deployed on-premises or in cloud environments using Docker, Kubernetes, or direct installation.
Deployment Requirements:
| Component | Specification |
|---|---|
| Storage | Persistent volume for vector data |
| Network | Port 8000 for API access |
| Auth | Optional token or basic authentication (v0.4.7+) |
Sources: examples/deployments/do-terraform/README.md:1-50
Starting the Server:
# Install via pip
pip install chromadb
# Run in client-server mode
chroma run --path /chroma_db_path
Sources: README.md:14-16
Chroma Cloud
Chroma Cloud provides a fully managed hosted service with:
- Serverless vector search
- Hybrid search capabilities
- Full-text search integration
- Automatic scaling
- $5 free credits for new users
Sources: README.md:23-29
Cloud Deployment (Terraform Example)
For DigitalOcean deployment:
export TF_VAR_do_token=<DIGITALOCEAN_TOKEN>
export TF_ssh_public_key="./chroma-do.pub"
export TF_ssh_private_key="./chroma-do"
export TF_VAR_chroma_release="0.4.12"
export TF_VAR_region="ams2"
export TF_VAR_public_access="true"
export TF_VAR_enable_auth="true"
export TF_VAR_auth_type="token"
terraform apply -auto-approve
Sources: examples/deployments/do-terraform/README.md:30-45
CLI Tool
The Rust-based CLI provides command-line management capabilities:
chroma run --path <db_path> # Run the server
chroma db create <db_name> # Create database
chroma db list # List databases
chroma login # Authenticate with Chroma Cloud
chroma profile # Manage profiles
chroma install # Install updates
chroma update # Check for updates
Sources: rust/cli/src/lib.rs:1-30
Embedding Integration
Ollama Integration
The JavaScript client supports Ollama for local embedding generation:
Configuration Options:
| Option | Default | Description |
|---|---|---|
| url | http://localhost:11434 | Ollama server URL |
| model | chroma/all-minilm-l6-v2-f32 | Embedding model |
Supported Models:
| Model | Dimensions | Use Case |
|---|---|---|
| chroma/all-minilm-l6-v2-f32 | 384 | General purpose (default) |
| nomic-embed-text | 768 | Extended context |
| mxbai-embed-large | 1024 | High accuracy |
| snowflake-arctic-embed | Variable | Domain-specific |
Sources: clients/new-js/packages/ai-embeddings/ollama/README.md:1-40
API Response Format
Get Response Structure
Query results are returned with flexible inclusion options:
pub struct GetResponse {
pub ids: Vec<String>,
pub embeddings: Option<Vec<Vec<f32>>>, // Optional
pub documents: Option<Vec<Option<String>>>, // Optional
pub uris: Option<Vec<Option<String>>>, // Optional
pub metadatas: Option<Vec<Option<Metadata>>>, // Optional
pub include: IncludeList,
}
Sources: rust/types/src/api_types.rs:1-30
License
Chroma is released under the Apache 2.0 license, making it suitable for both commercial and open-source projects. Sources: README.md:10
Community and Support
| Resource | Link |
|---|---|
| Documentation | https://docs.trychroma.com/ |
| Discord | https://discord.gg/MMeYNTmh3x |
| Homepage | https://www.trychroma.com/ |
Sources: clients/new-js/packages/chromadb/README.md:9-27
Getting Started with Chroma
Related topics: Chroma Overview, Python Client SDK
Chroma is an open-source data infrastructure for AI that provides vector, hybrid, and full-text search capabilities. It enables developers to build AI applications by storing embeddings, documents, and metadata with efficient querying mechanisms.
Overview
Chroma serves as a vector database optimized for AI workloads. It allows you to:
- Store embeddings alongside documents and metadata
- Query using text or embedding vectors
- Filter results based on metadata
- Work with multiple programming languages including Python and JavaScript
Installation
Python Client
Install the Python client using pip:
pip install chromadb
For a lightweight HTTP-only client that connects to a Chroma server:
pip install chromadb-client
Sources: clients/python/README.md
JavaScript/TypeScript Client
For the new JavaScript client:
npm install chromadb
For a lighter package with optional dependencies:
npm install chromadb-client
Sources: clients/new-js/packages/chromadb/README.md
Basic Setup and Configuration
Python Client Setup
Connect to a Chroma server running locally:
import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)
Sources: clients/python/README.md
JavaScript Client Setup
import { ChromaClient } from "chromadb";
const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });
Sources: clients/new-js/packages/chromadb/README.md
Running Chroma Server
To run Chroma in client-server mode:
chroma run --path /chroma_db_path
Sources: README.md
Core Operations
Creating a Collection
Collections are containers for your documents, embeddings, and metadata.
collection = client.create_collection("all-my-documents")
Adding Documents
Add documents with optional embeddings, metadata, and unique IDs:
collection.add(
documents=["This is document1", "This is document2"],
metadatas=[{"source": "notion"}, {"source": "google-docs"}],
ids=["doc1", "doc2"],
embeddings=[[1.2, 2.1, ...], [1.2, 2.1, ...]]
)
Sources: clients/python/README.md
Querying Documents
Query the collection using text or embeddings:
results = collection.query(
query_texts=["This is a query document"],
n_results=2
)
const queryData = await collection.query({
queryEmbeddings: [[1, 2, 3, 4, 5]],
queryTexts: ["test"],
});
Sources: clients/python/README.md and clients/new-js/packages/chromadb/README.md
Embedding Functions
Chroma supports various embedding providers through configurable embedding functions.
Configuration Schema
Embedding functions use JSON Schema validation to ensure cross-language compatibility:
from chromadb.utils.embedding_functions.schemas import validate_config
config = {
"api_key_env_var": "CHROMA_OPENAI_API_KEY",
"model_name": "text-embedding-ada-002"
}
validate_config(config, "openai")
Each schema follows the JSON Schema Draft-07 specification and includes version, title, description, properties, required fields, and additionalProperties settings.
Sources: chromadb/utils/embedding_functions/schemas/README.md
Available Embedding Providers
| Provider | Package | API Key Environment Variable |
|---|---|---|
| OpenAI | @chroma-core/openai | CHROMA_OPENAI_API_KEY |
| Cohere | @chroma-core/cohere | COHERE_API_KEY |
| Jina | @chroma-core/jina | JINA_API_KEY |
| Google Gemini | @chroma-core/google-gemini | GOOGLE_API_KEY |
| Hugging Face | @chroma-core/hugging-face | HF_API_KEY |
| Ollama | @chroma-core/ollama | OLLAMA_API_KEY |
| Together AI | @chroma-core/together-ai | TOGETHER_API_KEY |
| Voyage AI | @chroma-core/voyageai | VOYAGE_API_KEY |
| xAI | @chroma-core/xai | XAI_API_KEY |
Sources: clients/new-js/packages/ai-embeddings/all/README.md
Using Embedding Functions
import { ChromaClient } from 'chromadb';
import { JinaEmbeddingFunction } from '@chroma-core/jina';
const embedder = new JinaEmbeddingFunction({
apiKey: 'your-api-key',
modelName: 'jina-embeddings-v2-base-en',
task: 'retrieval.passage',
dimensions: 768,
lateChunking: false,
truncate: true,
normalized: true,
embeddingType: 'float'
});
const collection = await client.createCollection({
name: 'my-collection',
embeddingFunction: embedder,
});
Sources: clients/new-js/packages/ai-embeddings/jina/README.md
Common Utilities
The @chroma-core/ai-embeddings-common package provides shared utilities:
import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';
// Convert camelCase to snake_case
const snakeCaseConfig = snakeCase({ modelName: 'text-embedding-3-small' });
// Result: { model_name: 'text-embedding-3-small' }
// Check environment
if (isBrowser()) {
// Browser-specific logic
}
Sources: clients/new-js/packages/ai-embeddings/common/README.md
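A Python rendering of the key-conversion helper, for readers outside the JS ecosystem (illustrative; the real helper lives in @chroma-core/ai-embeddings-common):

```python
import re

def snake_case_keys(config: dict) -> dict:
    """Convert camelCase keys to snake_case, mirroring the JS snakeCase helper."""
    def convert(key: str) -> str:
        # Insert an underscore before every interior uppercase letter, then lowercase.
        return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()
    return {convert(k): v for k, v in config.items()}
```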
JavaScript Client Packages
chromadb vs chromadb-client
| Feature | chromadb | chromadb-client |
|---|---|---|
| Package size | Larger | Smaller |
| Dependencies | Bundled | Optional peer dependencies |
| Use case | Quick setup | Production with specific providers |
The chromadb-client package is ideal for production environments where you only use specific embedding providers.
Sources: clients/js/packages/chromadb-client/README.md
Chroma Cloud
Chroma Cloud provides a hosted service for serverless vector, hybrid, and full-text search. To use Chroma Cloud:
- Sign up at trychroma.com
- Create a database
- Get your API key from the dashboard
Configure environment variables for cloud access:
export CHROMA_API_KEY=your-api-key
export CHROMA_TENANT=your-tenant
export CHROMA_DATABASE=your-database
Sources: README.md and rust/chroma/README.md
Environment Variables
| Variable | Description |
|---|---|
| CHROMA_API_KEY | API key for Chroma Cloud authentication |
| CHROMA_TENANT | Sets the tenant (auto-inferred with API key) |
| CHROMA_DATABASE | Sets the database (auto-inferred with scoped API key) |
| [PROVIDER]_API_KEY | Provider-specific API keys (e.g., OPENAI_API_KEY) |
For local development, you can use:
let client = ChromaHttpClient::from_env()?;
Sources: rust/chroma/README.md
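Read in Python, the same environment-driven configuration might look like this (a sketch using the variable names from the table above; the helper function itself is hypothetical):

```python
import os

def cloud_settings() -> dict:
    """Collect Chroma Cloud connection settings from the environment (sketch)."""
    api_key = os.environ.get("CHROMA_API_KEY")
    if api_key is None:
        raise RuntimeError("CHROMA_API_KEY must be set for Chroma Cloud access")
    return {
        "api_key": api_key,
        # Tenant and database can be inferred from a scoped API key,
        # so both are optional here.
        "tenant": os.environ.get("CHROMA_TENANT"),
        "database": os.environ.get("CHROMA_DATABASE"),
    }
```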
Complete Example Workflow
graph TD
A[Install Chroma Client] --> B[Initialize Client]
B --> C[Create Collection]
C --> D[Add Documents with Embeddings]
D --> E[Query Collection]
E --> F[Get Results]
G[Configure Embedding Function] --> D
H[Add Metadata] --> D
I[Set API Keys] --> B
Quick Reference Commands
Installation
# Python
pip install chromadb
# JavaScript
npm install chromadb
# Start server
chroma run --path /chroma_db_path
Basic Operations
| Operation | Python | JavaScript |
|---|---|---|
| Create client | client = chromadb.HttpClient() | new ChromaClient() |
| Create collection | client.create_collection(name) | client.createCollection({name}) |
| Add documents | collection.add(...) | collection.add(...) |
| Query | collection.query(...) | collection.query(...) |
Additional Resources
Sources: [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)
System Architecture Overview
Related topics: Rust Backend Services Architecture, Go Coordinator & Distributed Systems, Protocol Buffers & gRPC API
Introduction
Chroma is an open-source data infrastructure platform designed for AI applications, providing vector, hybrid, and full-text search capabilities. The system is built as a distributed, scalable architecture that handles embedding storage, indexing, and query execution across multiple components. Chroma positions itself as the open-source alternative to hosted vector database services, enabling developers to deploy sophisticated AI search infrastructure while maintaining full control over their data.
The architecture follows a modular design pattern with distinct components for API serving, query processing, data storage, and system coordination. Each component is responsible for specific aspects of the data pipeline, from ingestion through indexing to query execution.
High-Level Architecture
Chroma's architecture consists of three primary layers working in concert to provide vector search capabilities:
- Frontend Layer - Handles API requests and response formatting
- Worker Layer - Executes query operations and manages indexing
- System Database (SysDB) Layer - Maintains metadata and system state
graph TD
A[Client Application] --> B[Frontend Server]
B --> C[Worker Servers]
C --> D[SysDB]
C --> E[Blockstore]
E --> F[Arrow Files]
D --> G[Collection Metadata]
G --> H[Topology Information]
Component Architecture
Frontend Server
The frontend server component serves as the API gateway for Chroma, handling incoming HTTP/gRPC requests and translating them into internal operations. The frontend is responsible for request validation, authentication handling, and response serialization.
Key Responsibilities:
| Responsibility | Description |
|---|---|
| API Endpoint Handling | Exposes REST and gRPC endpoints for collection operations |
| Request Validation | Validates incoming query parameters and payload structures |
| Response Serialization | Converts internal data structures to API response formats |
| Error Mapping | Translates internal errors to appropriate HTTP status codes |
Sources: rust/frontend/src/server.rs:1-50
The frontend server implements the ChromaError trait for consistent error handling across the system. Error codes are mapped as follows:
| Internal Error | HTTP Status Code |
|---|---|
| InvalidArgument | 400 Bad Request |
| NotFound | 404 Not Found |
| Internal | 500 Internal Server Error |
| Unavailable | 503 Service Unavailable |
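The mapping in the table can be expressed as a small lookup (a sketch; defaulting unknown codes to 500 is an assumption here, not confirmed frontend behavior):

```python
ERROR_TO_HTTP = {
    "InvalidArgument": 400,  # Bad Request
    "NotFound": 404,         # Not Found
    "Internal": 500,         # Internal Server Error
    "Unavailable": 503,      # Service Unavailable
}

def http_status(error_code: str) -> int:
    """Map an internal error code to an HTTP status, defaulting to 500."""
    return ERROR_TO_HTTP.get(error_code, 500)
```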
Worker Server
The worker server handles the core data operations including embedding storage, indexing, and query execution. Workers are the primary compute units in Chroma's architecture, responsible for processing search requests and maintaining index structures.
Sources: rust/worker/src/server.rs:1-60
Worker Components:
graph LR
A[Query Request] --> B[Query Planner]
B --> C[HNSW Index]
B --> D[Spann Index]
B --> E[Record Segment]
B --> F[Metadata Segment]
C --> G[Result Merger]
D --> G
E --> G
F --> G
G --> H[Response]
The worker server implements orchestration components for managing complex operations:
- ApplyLogsOrchestrator - Coordinates log application and compaction
- WorkQueueClient - Manages distributed task execution
- Segment Writers - Handles data persistence for different segment types
Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-80
System Database (SysDB)
The SysDB component maintains all metadata about collections, segments, and system topology. It provides a centralized view of the system's state and enables coordination across multiple workers.
SysDB Responsibilities:
| Function | Description |
|---|---|
| Collection Metadata | Stores collection configurations and schemas |
| Segment Registry | Tracks active segments and their locations |
| Topology Management | Manages provider-region mappings for distributed deployments |
| Transaction Coordination | Ensures consistency across distributed operations |
Sources: rust/sysdb/src/sysdb.rs:1-100
The SysDB uses a provider-region topology model that supports multi-cloud and multi-region deployments:
pub struct ProviderRegion<T> {
name: RegionName,
provider: String, // e.g., "aws", "gcp"
region: String, // e.g., "us-east-1"
config: T, // Provider-specific configuration
}
Sources: rust/types/src/topology.rs:1-60
Data Model Architecture
Collection Schema
Collections in Chroma follow a flexible schema model that supports multiple index types and data fields.
graph TD
A[Collection] --> B[Record Segment]
A --> C[Metadata Segment]
A --> D[Vector Index]
A --> E[Sparse Vector Index]
D --> F[HNSW Index]
D --> G[Spann Index]
Supported Index Types:
| Index Type | Purpose | Key Configuration |
|---|---|---|
| Vector Index | Dense embeddings | Space (Cosine, L2, Dot), HNSW params |
| Sparse Vector Index | BM25-style inverted index | StringInvertedIndexConfig |
| Spann Index | Memory-efficient approximate search | InternalSpannConfiguration |
Sources: rust/types/src/collection_schema.rs:1-150
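To make the distance spaces concrete, here is a plain-Python sketch using the usual conventions for these spaces (cosine distance as 1 − cosine similarity, l2 as squared Euclidean, inner product as 1 − dot product; treat the exact conventions as an assumption here):

```python
import math

def distance(a, b, space="cosine"):
    """Compute the distance between two vectors under a named space."""
    dot = sum(x * y for x, y in zip(a, b))
    if space == "l2":
        # Squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))
    if space == "ip":
        # Inner-product distance
        return 1.0 - dot
    # cosine: 1 - (a . b) / (|a| |b|)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```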
API Types
The API layer defines core types for query operations:
| Type | Purpose |
|---|---|
| Include | Specifies which fields to return (distances, documents, embeddings, metadatas, uris) |
| IncludeList | Collection of Include values with convenience constructors |
| WhereDocumentOperator | Document filtering (Contains, NotContains, Regex, NotRegex) |
Sources: rust/types/src/api_types.rs:1-100
pub enum Include {
Distance,
Document,
Embedding,
Metadata,
Uri,
}
impl IncludeList {
pub fn default_query() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance])
}
pub fn all() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance, Include::Embedding, Include::Uri])
}
}
Metadata Filtering
Chroma supports rich metadata filtering through the MetadataExpression and MetadataComparison types:
graph TD
A[MetadataExpression] --> B[key: String]
A --> C[comparison: MetadataComparison]
C --> D[Primitive: Operator + Value]
C --> E[Set: Operator + SetValue]
Sources: rust/types/src/metadata.rs:1-80
Blockstore Architecture
The blockstore provides persistent storage for indexed data using Apache Arrow format for efficient serialization and querying.
Arrow Block Structure
graph LR
A[Write Operation] --> B[Block Delta]
B --> C[Commit to Block]
C --> D[Arrow IPC Format]
D --> E[Disk Storage]
E --> F[BlockfileReader]
Block Types:
| Block Type | Description |
|---|---|
| OrderedBlockDelta | Sequential writes with ordering guarantees |
| UnorderedBlockDelta | High-throughput writes without ordering |
| DirectoryBlock | Sparse posting directory entries |
Sources: rust/blockstore/src/arrow/block/types.rs:1-100
The Arrow layout verification ensures data integrity:
pub enum ArrowLayoutVerificationError {
BufferLengthNotAligned,
NoRecordBatches,
MultipleRecordBatches,
InvalidMessageType,
RecordBatchDecodeError,
}
Sparse Posting Blocks
Sparse vectors use a specialized block format for efficient storage:
body = [ max_offset: u32 LE, max_weight: f32 LE ] × num_entries
The DirectoryBlock stores per-posting-block metadata for term pruning:
- max_offset: Largest document offset in the posting block
- max_weight: Largest weight in the posting block
Sources: rust/types/src/sparse_posting_block.rs:1-60
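The body layout can be packed and unpacked with Python's struct module (an illustrative sketch of the wire layout, not Chroma's code):

```python
import struct

ENTRY = struct.Struct("<If")  # u32 max_offset, f32 max_weight, little-endian

def pack_directory(entries):
    """Serialize (max_offset, max_weight) pairs into the 8-byte-per-entry body."""
    return b"".join(ENTRY.pack(off, w) for off, w in entries)

def unpack_directory(body):
    """Recover the (max_offset, max_weight) pairs from a packed body."""
    return [ENTRY.unpack_from(body, i) for i in range(0, len(body), ENTRY.size)]
```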
Spann Index Architecture
Spann is Chroma's memory-efficient approximate nearest neighbor index that combines HNSW with posting lists.
graph TD
A[SpannIndexWriter] --> B[HNSW Index]
A --> C[Posting Lists]
A --> D[Versions Map]
A --> E[MaxHeadID Blockfile]
B --> F[Reader with adaptive search]
SpannIndexReader Structure:
| Component | Type | Purpose |
|---|---|---|
| posting_lists | BlockfileReader<u32, SpannPostingList> | Term postings |
| hnsw_index | HnswIndexRef | Graph-based search |
| versions_map | BlockfileReader<u32, u32> | Version tracking |
| dimensionality | usize | Vector dimension |
| adaptive_search_nprobe | bool | Adaptive parameter |
Sources: rust/index/src/spann/types.rs:1-80
Indexing Pipeline
The indexing pipeline handles document ingestion through the following stages:
graph LR
A[Add Records] --> B[ApplyLogsOrchestrator]
B --> C[Record Segment Writer]
B --> D[Metadata Segment Writer]
B --> E[Vector Index Writer]
C --> F[Flush to Blockstore]
D --> F
E --> F
F --> G[Collection Update]
Error Handling:
The orchestrator implements comprehensive error tracking:
| Error Type | Error Code | Tracing |
|---|---|---|
| ApplyLog | Internal | Yes |
| Channel | Internal | Yes |
| Commit | Internal | Yes |
| HnswSegment | Internal | Yes |
| MetadataSegment | Internal | Yes |
| Seal | Internal | Yes |
| InvariantViolation | - | Always |
Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-100
Query Execution Flow
Query Request Processing
graph TD
A[Query Request] --> B[Parse Query]
B --> C[Load Segments]
C --> D[Parallel Segment Queries]
D --> E{HNSW Search}
D --> F{Spann Search}
D --> G{Record Scan}
E --> H[Merge Results]
F --> H
G --> H
H --> I[Apply Filters]
I --> J[Return Results]
Work Queue Integration
Distributed query execution uses a work queue system for task coordination:
graph TD
A[Coordinator] --> B[WorkQueueClient]
B --> C[gRPC Channel]
C --> D[Worker Pool]
D --> E[Task Execution]
E --> F[Result Collection]
Error Code Mapping:
| gRPC Code | Chroma Error Code |
|---|---|
| Unavailable | Unavailable |
| DeadlineExceeded | DeadlineExceeded |
| ResourceExhausted | ResourceExhausted |
| NotFound | NotFound |
| InvalidArgument | InvalidArgument |
Sources: rust/worker/src/work_queue/work_queue_client.rs:1-80
Deployment Topology
Chroma supports flexible deployment configurations through its topology model:
graph TD
A[Topology] --> B[TopologyName]
A --> C[Vec<RegionName>]
A --> D[Config T]
C --> E[ProviderRegion]
E --> F[Provider]
E --> G[Region]
The topology system enables:
- Multi-cloud deployments (AWS, GCP, Azure)
- Region-specific configurations
- Custom provider extensions
Summary
Chroma's architecture provides a scalable foundation for AI-powered search with several key design principles:
- Separation of Concerns - Frontend, worker, and SysDB components handle distinct responsibilities
- Arrow-Based Storage - Efficient columnar storage for analytical queries
- Flexible Indexing - Support for HNSW, Spann, and sparse vector indexes
- Distributed Coordination - Work queues and topology management for multi-node deployments
- Comprehensive Error Handling - Consistent error codes and tracing across all components
The modular architecture allows Chroma to scale from single-node development deployments to distributed production clusters serving AI applications at scale.
Sources: rust/frontend/src/server.rs:1-50
Protocol Buffers & gRPC API
Related topics: System Architecture Overview, Rust Backend Services Architecture
Chroma uses Protocol Buffers (protobuf) as the core serialization format for inter-service communication and data persistence. The IDL (Interface Definition Language) files in the idl/ directory define the service APIs, data structures, and message types that power Chroma's distributed architecture.
Architecture Overview
Chroma employs a client-server architecture where Protocol Buffers serve as the contract between components. The protobuf definitions are centralized in the idl/ directory and used to generate code for multiple language runtimes including Python, JavaScript, Go, and Rust.
graph TD
subgraph "Client Layer"
JS[JavaScript Client]
PY[Python Client]
GO[Go Client]
end
subgraph "IDL Definitions"
PROTO[Protocol Buffer Definitions]
end
subgraph "Server Layer"
API[API Server]
COORD[Coordinator Service]
QUERY[Query Executor]
end
JS -->|Generated TS Bindings| PROTO
PY -->|Generated Python Stub| PROTO
GO -->|Generated Go Code| PROTO
API -->|gRPC/prost| PROTO
COORD -->|gRPC/prost| PROTO
QUERY -->|gRPC/prost| PROTO
Proto Definitions Structure
Core Service Definitions
The main protobuf definitions are organized in idl/chromadb/proto/:
| Proto File | Purpose | Key Messages |
|---|---|---|
| chroma.proto | Core data types and collection operations | Collection, Database, OperationRecord |
| coordinator.proto | Coordinator service for cluster management | Tenant, Database, Segment operations |
| query_executor.proto | Query execution service interface | Query requests and responses |
Data Type Coverage
The protobuf definitions cover all core data types used throughout Chroma:
| Data Type | Usage |
|---|---|
| Vector | Embedding vectors with scalar encoding |
| OperationRecord | CRUD operations for records |
| LogRecord | Write-ahead log entries with offsets |
| Metadata | Key-value metadata for filtering |
| Collection | Collection configuration and schema |
| Cmek | Customer-managed encryption keys |
Rust Type Conversions
Chroma's Rust backend uses protobuf-generated types and converts them to idiomatic Rust types through TryFrom implementations. This pattern ensures type safety and clean separation between the wire format and internal representations.
Record Conversions
The rust/types/src/record.rs file contains conversion logic between protobuf and Rust types:
graph LR
A[chroma_proto::LogRecord] -->|TryFrom| B[LogRecord Rust]
A2[chroma_proto::Vector] -->|TryFrom| B2[(Vec<f32>, ScalarEncoding)]
OperationRecord Conversion (Sources: rust/types/src/record.rs:recordinfo)
The OperationRecord conversion extracts metadata and document fields from protobuf representations:
// Metadata is extracted from proto, with document potentially in metadata
let (metadata, document) = match operation_record_proto.metadata {
Some(proto_metadata) => match UpdateMetadata::try_from(proto_metadata) {
Ok(mut metadata) => {
let document = metadata.remove(CHROMA_DOCUMENT_KEY);
match document {
Some(UpdateMetadataValue::Str(document)) => {
(Some(metadata), Some(document))
}
_ => (Some(metadata), None),
}
}
Err(e) => return Err(RecordConversionError::...),
},
None => (None, None),
};
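In Python terms, the extraction looks roughly like this (a sketch; the placeholder value for CHROMA_DOCUMENT_KEY is an assumption, the real constant lives in the Rust types crate):

```python
# Placeholder for the reserved key; the actual constant name comes from the
# Rust code above, but its value here is an assumption for illustration.
CHROMA_DOCUMENT_KEY = "chroma:document"

def split_metadata(proto_metadata):
    """Pop the reserved document entry out of a metadata mapping."""
    if proto_metadata is None:
        return None, None
    metadata = dict(proto_metadata)
    document = metadata.pop(CHROMA_DOCUMENT_KEY, None)
    if not isinstance(document, str):
        document = None  # only string values count as a document
    return metadata, document
```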
Vector Type Conversions
Vectors are stored with their encoding information (Sources: rust/types/src/record.rs:vector)
impl TryFrom<chroma_proto::Vector> for (Vec<f32>, ScalarEncoding) {
type Error = VectorConversionError;
// Conversion implementation
}
Metadata Filtering Types
The metadata system supports rich filtering expressions defined in protobuf and converted to Rust types (Sources: rust/types/src/metadata.rs:metadata-types)
Document Operators
graph TD
DOC_OPS[WhereDocumentOperator] --> Contains
DOC_OPS --> NotContains
DOC_OPS --> Regex
DOC_OPS --> NotRegex
| Operator | Description |
|---|---|
| Contains | Document contains substring |
| NotContains | Document does not contain substring |
| Regex | Document matches regex pattern |
| NotRegex | Document does not match regex pattern |
Metadata Expression Structure
pub struct MetadataExpression {
pub key: String,
pub comparison: MetadataComparison,
}
Metadata comparisons support both primitive types (strings, integers, floats, booleans) and set operations.
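A minimal sketch of how such comparisons evaluate, using Chroma's $-prefixed operator names (illustrative, not the server implementation):

```python
def matches_metadata(value, comparison: dict) -> bool:
    """Evaluate one metadata comparison clause against a stored value."""
    op, arg = next(iter(comparison.items()))
    ops = {
        # Primitive comparisons
        "$eq": lambda v, a: v == a,
        "$ne": lambda v, a: v != a,
        "$gt": lambda v, a: v > a,
        "$gte": lambda v, a: v >= a,
        "$lt": lambda v, a: v < a,
        "$lte": lambda v, a: v <= a,
        # Set comparisons
        "$in": lambda v, a: v in a,
        "$nin": lambda v, a: v not in a,
    }
    return ops[op](value, arg)
```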
Collection Schema Definitions
Schema definitions in rust/types/src/collection_schema.rs define how collections are configured for indexing (Sources: rust/types/src/collection_schema.rs:schema-struct)
Schema Builder Pattern
The Schema struct provides a fluent builder API for index configuration:
graph TD
SCHEMA[Schema::default] --> CREATE_INDEX[.create_index]
CREATE_INDEX --> VALIDATE[Validate Index Config]
VALIDATE -->|Valid| RETURN[Return Self]
VALIDATE -->|Invalid| ERROR[SchemaBuilderError]
Index Creation Example (Sources: rust/types/src/collection_schema.rs:create-index-example)
let schema = Schema::default()
.create_index(None, VectorIndexConfig {
space: Some(Space::Cosine),
embedding_function: None,
source_key: None,
hnsw: None,
spann: None,
}.into())?
.create_index(Some("category"), StringInvertedIndexConfig {}.into())?;
Supported Index Types
| Index Type | Configuration | Applies To |
|---|---|---|
| VectorIndexConfig | HNSW, Space (Cosine/L2/IP), embedding function | #embedding key only |
| StringInvertedIndexConfig | String indexing | Custom string keys |
| FtsIndexConfig | Full-text search | Document key |
CMEK (Customer-Managed Encryption Keys)
Chroma supports customer-managed encryption keys through the Cmek type defined in protobuf (Sources: rust/types/src/collection_schema.rs:cmek)
CMEK Provider Configuration
| Provider | Validation Pattern | Resource Format |
|---|---|---|
| GCP | CMEK_GCP_RE regex | GCP resource identifier |
impl Cmek {
pub fn gcp(resource: String) -> Self;
pub fn validate_pattern(&self) -> bool;
}
Topology and Region Management
For multi-region deployments, Chroma uses topology definitions (Sources: rust/types/src/topology.rs:topology)
Provider Region Structure
classDiagram
class ProviderRegion {
+name: RegionName
+provider: String
+region: String
+config: T
}
class Topology {
+name: TopologyName
+regions: Vec~RegionName~
+config: T
}
| Component | Description |
|---|---|
| ProviderRegion | Single cloud provider region configuration |
| Topology | Collection of regions forming a deployment topology |
Code Generation Pipeline
Build Process
Protobuf definitions are compiled to target languages using protoc and language-specific plugins (Sources: go/README.md:protobuf-setup)
graph LR
A[.proto files] --> B[protoc compiler]
B -->|Python| C[Python stubs]
B -->|Go| D[Go gRPC code]
B -->|JS/TS| E[TypeScript definitions]
B -->|Rust| F[Rust + prost]
Required Tools
| Tool | Purpose |
|---|---|
| protoc | Protocol Buffer compiler |
| protoc-gen-go | Go code generation |
| protoc-gen-go-grpc | Go gRPC service generation |
Generated API Patterns
The generated TypeScript API in clients/js/packages/chromadb-core/src/generated/api.ts follows standard gRPC-web patterns (Sources: clients/js/packages/chromadb-core/src/generated/api.ts:fetch-pattern)
const localVarFetchArgs = ApiApiFetchParamCreator(configuration).version(options);
return (fetch: FetchAPI = defaultFetch, basePath: string = BASE_PATH) => {
return fetch(
basePath + localVarFetchArgs.url,
localVarFetchArgs.options,
).then((response) => {
// Handle response by content type and status
if (response.status === 200) {
if (mimeType === "application/json") {
return response.json();
}
}
// Error handling for 401, 404, 409, 500
});
};
Error Code Mapping
Error types are mapped from Rust/Arrow errors to Chroma error codes (Sources: rust/blockstore/src/arrow/root.rs:error-mapping)
| Arrow Error Type | Chroma Error Code |
|---|---|
| IOError | Internal |
| ArrowError | Internal |
| LayoutVerificationError | Internal |
| FromBytesError variants | InvalidArgument / Internal |
Message Format Details
Arrow Block Serialization
Binary data in protobuf messages uses Arrow IPC format for efficient columnar storage (Sources: rust/blockstore/src/arrow/root.rs:arrow-reader)
let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
let record_batch = match arrow_reader {
Ok(mut reader) => match reader.next() {
Some(Ok(batch)) => batch,
Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
None => return Err(FromBytesError::NoDataError),
},
Err(e) => return Err(FromBytesError::ArrowError(e)),
};
IPC Footer Structure
The Arrow footer format requires:
- ARROW_MAGIC header (6 bytes)
- Footer content
- Footer length (4 bytes)
- Footer checksum
See Also
- Rust Types Module - Internal Rust type definitions
- Block Store Architecture - Data persistence with Arrow
- Client SDKs - Multi-language client implementations
- Go Server Implementation - Server-side gRPC implementation
Source: https://github.com/chroma-core/chroma / Human Manual
Python Client SDK
Related topics: Getting Started with Chroma, JavaScript/TypeScript Client SDKs, Embedding Functions Integration
Python Client SDK
The Chroma Python Client SDK is the official Python library for interacting with Chroma, an open-source vector database designed for AI applications. This SDK provides a complete interface for managing collections, storing embeddings, and performing similarity searches across vector data.
Overview
Chroma positions itself as the open-source data infrastructure for AI, offering developers a streamlined way to incorporate vector search capabilities into their applications. The Python Client SDK serves as the primary client library for Python developers, enabling seamless integration with Chroma's vector database capabilities.
The SDK supports two primary modes of operation: embedded mode, where the database runs locally within the same process, and client-server mode, where the Python client communicates with a remote Chroma server via HTTP. This flexibility allows developers to choose the deployment architecture that best fits their application requirements, whether they need a lightweight local setup for development and testing or a scalable server-based deployment for production environments.
For Python-specific installations, developers can choose between the full chromadb package, which includes all embedding libraries as dependencies, or the chromadb-client package, which is a lightweight HTTP-only client that connects to a running Chroma server. The installation is straightforward via pip, making it accessible for projects of all sizes.
The SDK is designed with developer productivity in mind, providing intuitive APIs for common operations like adding documents, querying collections, and managing metadata. It handles the complexity of embedding generation and vector storage behind a clean, Pythonic interface, allowing developers to focus on building their AI applications rather than managing low-level database operations.
Architecture
The Python Client SDK follows a layered architecture that separates concerns between the client interface, API communication, and data models. Understanding this architecture helps developers effectively use the SDK and troubleshoot any issues that may arise during development.
graph TD
A[Application Code] --> B[ChromaClient / AsyncChromaClient]
B --> C[Collection API]
B --> D[Embedding Functions]
C --> E[REST API Layer]
D --> F[External Embedding Providers]
E --> G[Chroma Server]
E --> H[Embedded Mode]
G --> I[Persistent Storage]
H --> I
Client Layer
The client layer forms the entry point for all SDK operations. Chroma provides two client implementations: the synchronous Client class for traditional Python applications and the AsyncClient class for asynchronous applications built with async/await patterns.
The synchronous client is suitable for most use cases, providing blocking API calls that execute immediately and return results. This approach is familiar to developers coming from traditional Python backgrounds and works well in scripts, batch processing jobs, and web applications that don't require high concurrency.
The asynchronous client, on the other hand, is designed for applications that need to handle many concurrent operations efficiently, such as web servers built on frameworks like FastAPI or Starlette. By using Python's asyncio library, the async client can perform multiple network operations concurrently, improving throughput in I/O-bound scenarios.
Both clients share a similar interface, with the async client simply wrapping the underlying HTTP calls with async/await syntax. This consistency makes it easy to switch between synchronous and asynchronous code as requirements evolve.
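The concurrency benefit can be sketched with plain asyncio. The fake_query coroutine below is a hypothetical stand-in for calls on Chroma's async client, used so the sketch has no external dependencies; with the real SDK you would await methods on the async client instead.

```python
import asyncio

async def fake_query(term: str) -> dict:
    """Stand-in for an awaitable Chroma query; simulates network latency."""
    await asyncio.sleep(0.01)
    return {"query": term, "ids": [f"{term}-doc-1"]}

async def main() -> list:
    # asyncio.gather runs the I/O-bound queries concurrently, which is
    # exactly where the async client outperforms the blocking one.
    terms = ["alpha", "beta", "gamma"]
    return await asyncio.gather(*(fake_query(t) for t in terms))

results = asyncio.run(main())
```

Because gather preserves argument order, results line up with the input terms even though the queries overlap in time.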
Collection Management
Collections serve as the primary organizational unit in Chroma, analogous to tables in traditional relational databases or buckets in object storage. Each collection contains a set of vectors along with their associated metadata, documents, and unique identifiers.
The SDK provides a comprehensive collection API that supports creating new collections, retrieving existing ones, listing all collections in the database, and deleting collections when they're no longer needed. Collections can be configured with specific settings at creation time, including the embedding function to use for auto-embedding documents and the name of the collection for identification purposes.
Collections maintain a schema-like structure through their use of metadata. While Chroma is schemaless in the traditional sense, the metadata associated with vectors allows developers to impose structure on their data for filtering and organization purposes.
Data Model
The data model in Chroma revolves around four core concepts: vectors, documents, metadata, and IDs. Each record in a collection consists of these four components, providing a flexible yet structured way to store and retrieve information.
Vectors are the mathematical representations of data in embedding space. They can be provided directly by the application or generated automatically using embedding functions. The SDK accepts vectors as lists of floating-point numbers, making it compatible with output from virtually any embedding model.
Documents are the original text or content that was transformed into vectors. Storing documents alongside their vectors enables applications to retrieve the original content during query operations without needing to maintain a separate document store.
Metadata provides contextual information about each record. Examples include the source of the document, timestamps, user IDs, or any other application-specific attributes. Metadata can be used for filtering during queries, allowing applications to narrow search results based on specific criteria.
IDs uniquely identify each record within a collection. The SDK accepts string identifiers, giving applications flexibility in how they choose to name and reference their data. Common patterns include using UUIDs, meaningful string identifiers derived from the document content, or sequential numbers.
Installation and Setup
Installing the Chroma Python Client SDK is straightforward using pip, Python's package manager. The SDK is available in two variants to accommodate different use cases and deployment scenarios.
pip install chromadb
This command installs the full Chroma package, which includes all core functionality plus built-in support for various embedding providers. This variant is recommended for most users who want a complete, self-contained installation.
pip install chromadb-client
This command installs only the HTTP client library, which is useful for scenarios where the Chroma server runs separately or where a minimal dependency footprint is required. This variant connects to Chroma servers via HTTP and doesn't include embedding provider libraries.
Client Initialization
Initializing the Chroma client depends on the deployment mode and desired configuration. The SDK provides flexible initialization options to accommodate different environments.
Embedded Mode
In embedded mode, Chroma runs entirely within your Python process, storing data locally. This is ideal for development, testing, and small-scale deployments where a separate server isn't required.
import chromadb
client = chromadb.Client()
The default chromadb.Client() is ephemeral: data is held in memory and discarded when the process exits, which is convenient for experiments and tests. For data that must survive process restarts, use chromadb.PersistentClient(path="./chroma"), which stores the database in a local directory while still running entirely in-process, without the complexity of a separate server.
Client-Server Mode
In client-server mode, your Python application connects to a Chroma server running separately, either locally or on a remote machine. This architecture supports larger-scale deployments and enables sharing data across multiple client applications.
import chromadb
client = chromadb.HttpClient(
host="localhost",
port=8000
)
The HTTP client communicates with the server using REST API calls, handling serialization, network transport, and error handling transparently. This mode requires a Chroma server to be running and accessible at the specified host and port.
Configuration Options
The client supports various configuration options to customize its behavior for specific use cases. These options can be provided during client initialization to control aspects like SSL/TLS settings, authentication, and connection pooling.
| Option | Type | Default | Description |
|---|---|---|---|
| host | string | "localhost" | Server hostname or IP address |
| port | integer | 8000 | Server port number |
| ssl | boolean | false | Enable SSL/TLS encryption |
| headers | dict | None | Custom HTTP headers for requests |
| tenant | string | None | Tenant identifier for multi-tenant setups |
| database | string | None | Database name for organized data storage |
Collection Operations
Collections are the central organizing structure in Chroma, grouping related vectors, documents, and metadata together. The SDK provides a comprehensive API for creating, managing, and interacting with collections.
Creating a Collection
Collections are created using the client's create_collection method, which accepts a name and optional configuration parameters.
collection = client.create_collection(
name="my-documents",
metadata={"description": "Document collection for RAG"},
get_or_create=True
)
The get_or_create parameter is particularly useful in production applications, as it prevents errors if a collection with the same name already exists. When set to True, the method returns the existing collection if one exists or creates a new one if it doesn't.
Adding Data
Data is added to collections using the add method, which accepts vectors, documents, metadata, and unique identifiers. All parameters must be provided as lists of equal length, with each index representing a single record.
collection.add(
documents=["This is the first document", "This is the second document"],
metadatas=[{"source": "notion"}, {"source": "google-docs"}],
ids=["doc-1", "doc-2"],
embeddings=[[1.2, 2.1, 3.5], [1.1, 2.0, 3.4]]
)
The SDK supports automatic embedding generation when embedding functions are configured for the collection. In this case, documents can be provided without explicit embeddings, and the SDK will generate the vector representations automatically.
Querying Data
Querying is performed using the query method, which accepts query text or query vectors and returns the most similar results based on vector similarity.
results = collection.query(
query_texts=["search terms here"],
n_results=2,
where={"source": "notion"},
include=["documents", "metadatas", "distances"]
)
The where parameter enables filtering results based on metadata conditions, allowing applications to narrow search results to specific subsets of data. The include parameter controls which data components are returned, helping optimize bandwidth and processing for applications that don't need all available information.
Query results include the matched document IDs, the documents themselves, associated metadata, and distance scores indicating how similar each result is to the query. Lower distance scores indicate higher similarity, with zero representing an exact match.
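For the cosine space, distance is one minus cosine similarity, so identical vectors score exactly zero. A minimal pure-Python sketch of that computation (the vector values are illustrative, not taken from the SDK):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

v = [1.2, 2.1, 3.5]
w = [1.1, 2.0, 3.4]
# cosine_distance(v, v) is (numerically) zero: an exact match.
# cosine_distance(v, w) is a small positive number: a near match.
```

Other spaces (L2, inner product) use different formulas, but the ordering convention is the same: smaller distances rank higher.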
Updating and Deleting Data
The SDK supports updating existing records and deleting unwanted data from collections. These operations are essential for maintaining data accuracy and managing collection lifecycle.
collection.update(
ids=["doc-1"],
documents=["Updated document content"],
metadatas=[{"source": "notion", "updated": True}]
)
collection.delete(
ids=["doc-2"],
where={"source": "google-docs"}
)
Update operations modify existing records identified by their IDs, replacing the specified fields while preserving unchanged data. Delete operations remove records matching the provided ID or metadata filters, with the ability to delete multiple records simultaneously.
Querying and Filtering
Chroma provides powerful querying and filtering capabilities that enable precise retrieval of relevant results. Understanding these capabilities is essential for building effective vector search applications.
Vector Similarity Search
The core query operation performs vector similarity search, finding the most similar records to a given query vector or text. The SDK handles text queries by first embedding them using the collection's configured embedding function.
Results are ranked by similarity, with the most similar results appearing first. The n_results parameter controls how many results are returned, allowing applications to balance result completeness with performance considerations.
Metadata Filtering
Metadata filtering narrows search results based on document attributes stored alongside vectors. This is particularly useful for applications that need to search within specific subsets of data, such as documents from a particular source or within a date range.
results = collection.query(
query_texts=["search terms"],
where={
"source": "notion",
"category": {"$in": ["technical", "documentation"]}
}
)
The filter syntax supports various operators including equality, inequality, comparison operators for numeric ranges, and set membership tests. Complex filter expressions can be constructed using logical operators to combine multiple conditions.
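As a sketch of that syntax, the filter below combines an equality test, a set-membership test, and a numeric range under a logical $and; the field names (including the word_count key) are hypothetical examples, not a fixed schema.

```python
# Chroma's documented filter operators include $eq, $ne, $gt, $gte, $lt,
# $lte, $in, $nin, and the logical combinators $and / $or.
where = {
    "$and": [
        {"source": {"$eq": "notion"}},
        {"category": {"$in": ["technical", "documentation"]}},
        {"word_count": {"$gte": 100}},  # hypothetical numeric metadata field
    ]
}
# The dict is passed as-is: collection.query(query_texts=[...], where=where)
```

Each clause targets one metadata key; nesting $and and $or allows arbitrarily complex boolean combinations.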
Result Inclusion
The include parameter controls which data components are included in query results. This allows applications to optimize their queries by requesting only the data they need.
| Include Option | Description |
|---|---|
| embeddings | Include the full vector for each result |
| documents | Include the original document text |
| metadatas | Include the associated metadata |
| distances | Include similarity distance scores |
By default, documents, metadatas, and distances are included in query results; embeddings must be requested explicitly. Applications should specify only the components they need to minimize bandwidth usage and processing overhead.
Embedding Functions
Embedding functions transform text into vector representations that capture semantic meaning. Chroma supports multiple embedding providers, allowing applications to choose the approach that best fits their requirements.
Built-in Embeddings
For simple use cases, Chroma includes a default embedding function that works out of the box without additional configuration. This function is suitable for development and testing but may not provide the best quality embeddings for production applications.
External Providers
For production applications requiring higher quality embeddings, Chroma supports integration with external embedding services. These services provide state-of-the-art embedding models that can significantly improve search quality.
Supported providers include OpenAI's embedding models, which offer excellent quality for English text, and various open-source alternatives. Each provider has its own configuration requirements, typically involving API keys and model selection parameters.
Configuration is typically done at the collection level, allowing different collections to use different embedding functions if needed. This flexibility supports applications that work with multiple data types or require different embedding strategies for different use cases.
Custom Embedding Functions
For specialized use cases, applications can implement custom embedding functions by conforming to the SDK's embedding function interface. This allows integration with any embedding model or service that can be accessed from Python.
Custom functions receive a list of texts and return a corresponding list of vectors. They can implement any logic needed, including batching, caching, and error handling, giving applications full control over the embedding process.
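A minimal sketch of such a function follows. To keep it dependency-free, a hash is used in place of a real model; the class name and hashing scheme are invented for illustration, and a production implementation would call an actual embedding model (in recent SDK versions, by subclassing chromadb.EmbeddingFunction).

```python
import hashlib

class HashedEmbeddingFunction:
    """Toy embedding function: maps a list of texts to a list of float
    vectors, matching the shape of the interface the SDK expects. The
    hash-based 'model' is a stand-in, not a meaningful embedding."""

    def __init__(self, dimensions: int = 8):
        self.dimensions = dimensions

    def __call__(self, input):
        vectors = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dimensions` bytes into floats in [0, 1).
            vectors.append([b / 256 for b in digest[: self.dimensions]])
        return vectors

embed = HashedEmbeddingFunction()
vectors = embed(["first document", "second document"])
```

The same call signature (texts in, vectors out) is what the collection invokes whenever documents are added or queried without explicit embeddings.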
Error Handling
The SDK provides comprehensive error handling to help applications gracefully manage failure scenarios. Understanding the error types and how to handle them is important for building robust applications.
Connection Errors
Connection errors occur when the client cannot establish communication with the Chroma server. These errors can result from network issues, server unavailability, or incorrect server configuration.
try:
    collection = client.get_collection("my-collection")
except Exception as exc:
    # Connection failures surface as transport-level exceptions; the exact
    # class depends on the chromadb version and HTTP stack in use.
    print(f"Unable to connect to Chroma server: {exc}")
Applications should implement appropriate retry logic and user-facing error messages when connection errors occur, as these situations typically require intervention beyond the application's control.
Collection Not Found
Operations on non-existent collections raise specific errors that can be caught and handled appropriately.
try:
    collection = client.get_collection("non-existent")
except chromadb.errors.NotFoundError:
    # Recent chromadb releases raise NotFoundError; older versions
    # raised ValueError for a missing collection.
    print("Collection does not exist")
The get_or_create parameter available during collection creation provides an alternative to explicit error handling when the existence of a collection is uncertain.
Invalid Arguments
Invalid argument errors indicate problems with the data or parameters provided to SDK methods. These errors typically result from bugs in application code or invalid user input.
Examples include malformed IDs, vectors of incorrect dimensions, mismatched list lengths, and invalid filter expressions. The error messages provide guidance on what parameter is problematic, making debugging straightforward.
Best Practices
Following best practices ensures optimal performance, reliability, and maintainability when using the Python Client SDK in production applications.
Connection Management
Applications should create a single client instance and reuse it across the application rather than creating new clients for each operation. The client manages connection pooling and state internally, and creating multiple instances can lead to resource waste and inconsistent state.
client = chromadb.HttpClient(host="localhost", port=8000)
def get_collection():
return client.get_collection("my-documents")
For applications that require clean-up, the client should be properly closed when the application terminates, ensuring any pending operations complete and resources are released.
Batch Operations
When adding or querying large numbers of records, batching operations improves performance by reducing network overhead and allowing the server to optimize processing. The SDK handles batching internally for the most common operations, but applications should be aware of batch size considerations.
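A simple chunking helper sketches the pattern; the batched helper and the batch size of 4 are illustrative, not part of the SDK.

```python
def batched(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

ids = [f"doc-{i}" for i in range(10)]
docs = [f"document {i}" for i in range(10)]

for id_batch, doc_batch in zip(batched(ids, 4), batched(docs, 4)):
    # With a real client this would be:
    # collection.add(ids=id_batch, documents=doc_batch)
    pass
```

Keeping batches in the hundreds-to-thousands range usually amortizes network overhead without producing oversized requests; the right number depends on record size and server limits.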
Error Recovery
Production applications should implement comprehensive error handling that distinguishes between recoverable errors (like temporary network issues) and non-recoverable errors (like invalid input). Recoverable errors can be handled with retry logic, while non-recoverable errors should surface appropriate feedback to users.
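One way to sketch that distinction is a retry wrapper with exponential backoff; the flaky_query stub and the choice of ConnectionError as the recoverable type are illustrative assumptions, not SDK behavior.

```python
import time

def with_retries(operation, max_attempts=3, base_delay=0.01):
    """Retry a callable on recoverable errors; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # back off before retrying

# Simulate a flaky call that fails twice before succeeding.
attempts = {"count": 0}

def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary network issue")
    return "ok"

result = with_retries(flaky_query)
```

Non-recoverable errors (invalid arguments, missing collections) should be allowed to propagate or be translated into user-facing messages rather than retried.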
Related Documentation
For further information on using Chroma's Python Client SDK, the following resources provide additional context and examples.
The official Chroma documentation at trychroma.com provides comprehensive guides on getting started, deployment options, and advanced usage patterns. The documentation includes tutorials, API reference material, and example applications that demonstrate real-world usage.
The GitHub repository at github.com/chroma-core/chroma contains the complete source code for Chroma, including the Python Client SDK. Developers interested in understanding implementation details or contributing to the project can explore the codebase directly.
The Chroma Discord community provides a forum for asking questions, sharing experiences, and connecting with other developers using Chroma. The community is an excellent resource for troubleshooting issues and discovering best practices from experienced users.
Source: https://github.com/chroma-core/chroma / Human Manual
JavaScript/TypeScript Client SDKs
Related topics: Python Client SDK, Getting Started with Chroma
JavaScript/TypeScript Client SDKs
Chroma provides comprehensive JavaScript and TypeScript client libraries for interacting with Chroma servers from browser and Node.js environments. The SDKs offer both low-level HTTP API access and high-level abstractions for collections, embedding functions, and query operations.
Architecture Overview
Chroma maintains two generations of JavaScript clients to support different use cases and ecosystem requirements.
graph TD
A[Chroma Server] <--> B[HTTP API];
B <--> C[Legacy JS Client v2.4.7];
B <--> D[new-js Client v3.4.5];
C --> E[chromadb<br/>Bundled];
C --> F[chromadb-client<br/>Peer Dependencies];
D --> G[ChromaClient];
D --> H[Embedding Functions<br/>via @chroma-core/*];
Client Package Versions
| Package | Version | Type | Description |
|---|---|---|---|
| chromadb (legacy) | 2.4.7 | npm | Bundled package with all embedding libraries included |
| chromadb-client (legacy) | 2.4.7 | npm | Client package requiring peer dependencies |
| chromadb (new-js) | 3.4.5 | npm | Modern client with modular architecture |
| @internal/chromadb-core | 2.4.7 | workspace | Shared core functionality |
Sources: clients/js/packages/chromadb/package.json:3 Sources: clients/new-js/packages/chromadb/package.json:3
Package Structure
Legacy Client (v2.x)
The legacy client provides two distribution options:
graph LR
A[chromadb] --> B[chromadb-core<br/>+ All Embeddings];
C[chromadb-client] --> D[chromadb-core<br/>+ Peer Dependencies];
B --> E[@google/generative-ai];
B --> F[@xenova/transformers];
B --> G[cohere-ai];
D --> E;
D --> F;
D --> G;
| Package | Use Case | Embedding Libraries |
|---|---|---|
| chromadb | Simple projects wanting everything included | Bundled with all providers |
| chromadb-client | Projects needing specific embedding libraries | Peer dependencies required |
Sources: clients/js/packages/chromadb-client/package.json:1-55
New-JS Client (v3.x)
The new JavaScript client uses a modular workspace architecture with the following structure:
clients/new-js/
├── packages/
│ ├── chromadb/ # Core client package
│ │ └── src/
│ │ ├── chroma-client.ts # Main client implementation
│ │ └── api/
│ │ └── sdk.gen.ts # Generated API client
│ └── ai-embeddings/
│ ├── common/ # Shared utilities
│ ├── all/ # Aggregated providers
│ ├── chroma-bm25/ # BM25 sparse embeddings
│ ├── cohere/ # Cohere provider
│ ├── google-gemini/ # Google Gemini provider
│ ├── huggingface-server/ # HuggingFace server
│ ├── jina/ # Jina AI provider
│ ├── together-ai/ # Together AI provider
│ └── voyageai/ # Voyage AI provider
Sources: clients/new-js/packages/ai-embeddings/all/package.json:1-45
Module Exports Configuration
Both client generations support modern JavaScript module resolution with ESM and CommonJS exports.
Export Structure
graph TD
A[Package Entry] --> B{Import Type};
B -->|ESM import| C[.mjs / .d.ts];
B -->|CommonJS require| D[.cjs / .d.cts];
C --> E[dist/*.mjs];
D --> F[dist/cjs/*.cjs];
| Export Condition | Entry Point | Type Definitions |
|---|---|---|
| ESM import | dist/chromadb.mjs | dist/chromadb.d.ts |
| CommonJS require | dist/cjs/chromadb.cjs | dist/cjs/chromadb.d.cts |
Sources: clients/js/packages/chromadb/package.json:12-25 Sources: clients/new-js/packages/chromadb/package.json:12-25
Client Initialization
Basic Connection
import { ChromaClient } from "chromadb";
// Initialize the client
const chroma = new ChromaClient({
path: "http://localhost:8000"
});
Sources: clients/js/packages/chromadb-client/README.md:15-20
With Embedding Function
import { ChromaClient } from 'chromadb';
import { TogetherAIEmbeddingFunction } from '@chroma-core/together-ai';
const embedder = new TogetherAIEmbeddingFunction({
apiKey: 'your-api-key',
modelName: 'togethercomputer/m2-bert-80M-8k-retrieval',
});
const client = new ChromaClient({
  path: 'http://localhost:8000',
});
// Pass the embedder as embeddingFunction when creating a collection.
Sources: clients/new-js/packages/ai-embeddings/together-ai/README.md:1-35
Collection Operations
Collections are the primary data structure for storing and querying embeddings.
Create Collection
const collection = await chroma.createCollection({
name: "my-collection",
embeddingFunction: embedder, // Optional
metadata: { // Optional
description: "My document collection"
}
});
Add Documents
await collection.add({
ids: ["id1", "id2"],
embeddings: [ // Optional if embedding function provided
[1.1, 2.3, 3.2],
[4.5, 6.9, 4.4],
],
metadatas: [{ source: "doc1" }, { source: "doc2" }],
documents: ["Document 1 content", "Document 2 content"],
});
Query Collection
const results = await collection.query({
  // Pass either queryEmbeddings (a list of query vectors) or queryTexts
  // (raw text embedded via the collection's embedding function), not both.
  queryEmbeddings: [[1.1, 2.3, 3.2]],
  nResults: 2, // Number of results
  where: { source: "doc1" }, // Optional metadata filter
  include: ["documents", "metadatas", "distances"]
});
Sources: clients/js/packages/chromadb-client/README.md:25-50
Embedding Function Providers
The new-js client provides first-class support for multiple embedding providers through the @chroma-core/* packages.
Available Providers
| Provider Package | Model Examples | API Required |
|---|---|---|
| @chroma-core/together-ai | togethercomputer/m2-bert-80M-8k-retrieval | Yes |
| @chroma-core/voyageai | voyage-2 | Yes |
| @chroma-core/google-gemini | text-embedding-004 | Yes |
| @chroma-core/jina | jina-embeddings-v2-base-en | Yes |
| @chroma-core/cohere | Various Cohere models | Yes |
| @chroma-core/chroma-bm25 | N/A (local algorithm) | No |
| @chroma-core/all | All providers bundled | Varies |
Sources: clients/new-js/packages/ai-embeddings/together-ai/README.md Sources: clients/new-js/packages/ai-embeddings/voyageai/README.md
Configuration Options
Each embedding function supports common configuration patterns:
const embedder = new SomeEmbeddingFunction({
apiKey: 'your-api-key', // Or set via environment variable
apiKeyEnvVar: 'PROVIDER_API_KEY', // Default env var name
modelName: 'provider-model-name', // Provider-specific model
// Provider-specific options
task: 'retrieval.passage', // Jina example
dimensions: 768, // Jina example
truncate: true, // Jina example
normalized: true, // Jina example
});
Environment Variable Configuration
| Provider | Environment Variable |
|---|---|
| Together AI | TOGETHER_API_KEY |
| Voyage AI | VOYAGE_API_KEY |
| Google Gemini | GEMINI_API_KEY |
| Jina | JINA_API_KEY |
Sources: clients/new-js/packages/ai-embeddings/jina/README.md:1-45
Rust Native Bindings
For performance-critical applications, Chroma provides pre-built Rust native bindings for Node.js.
Supported Platforms
| Package Name | OS | Architecture | LibC |
|---|---|---|---|
| chromadb-js-bindings-darwin-x64 | macOS (Intel) | x64 | N/A |
| chromadb-js-bindings-darwin-arm64 | macOS (Apple Silicon) | arm64 | N/A |
| chromadb-js-bindings-linux-x64-gnu | Linux | x64 | glibc |
| chromadb-js-bindings-linux-arm64-gnu | Linux | arm64 | glibc |
All bindings versions: 1.3.4
Minimum Node.js version: >= 10
Sources: rust/js_bindings/npm/darwin-x64/package.json:1-18 Sources: rust/js_bindings/npm/linux-x64-gnu/package.json:1-18
Build and Development
Build Scripts
| Command | Description |
|---|---|
pnpm build | Build all packages |
pnpm build:core | Build only @internal/chromadb-core |
pnpm build:packages | Build all packages except core |
pnpm watch | Watch mode for development |
pnpm test | Run all tests |
pnpm test:functional | Run functional tests (excluding auth) |
New-JS Client Build Configuration
{
"scripts": {
"build": "tsup",
"watch": "tsup --watch",
"typecheck": "tsc --noEmit"
}
}
Build tooling uses tsup for efficient bundling with TypeScript support.
Sources: clients/new-js/packages/ai-embeddings/common/package.json:18-25 Sources: clients/js/package.json:22-30
Choosing a Client Package
graph TD
A[Start] --> B{Do you need all embedding providers?};
B -->|Yes, convenience| C[chromadb v2.4.7<br/>or @chroma-core/all + chromadb v3.4.5];
B -->|No, want to minimize bundle| D{Do you have embedding requirements?};
D -->|Yes, specific providers| E[chromadb-client v2.4.7<br/>with peer dependencies];
D -->|No, just vector storage| F[chromadb-client v2.4.7<br/>or chromadb v3.4.5];
C --> G[Include all embedding libraries];
E --> H[Only install needed providers];
F --> I[No embedding function needed];
Decision Matrix
| Requirement | Recommended Package |
|---|---|
| Simple setup, all features | chromadb (bundled) |
| Minimal bundle size | chromadb-client with peer deps |
| Modern architecture | chromadb (new-js v3.4.5) |
| BM25 sparse embeddings | @chroma-core/chroma-bm25 |
| Cloud/Remote providers | @chroma-core/* packages |
Sources: clients/js/examples/node/README.md:1-45
TypeScript Support
All JavaScript client packages include full TypeScript type definitions:
{
"types": "dist/chromadb.d.ts",
"exports": {
".": {
"import": {
"types": "./dist/chromadb.d.ts"
},
"require": {
"types": "./dist/cjs/chromadb.d.cts"
}
}
}
}
The TypeScript minimum version requirement is ^5.0.4 for the legacy client and ^5.3.3 for new-js packages.
Sources: clients/js/packages/chromadb/package.json:8 Sources: clients/new-js/packages/ai-embeddings/common/package.json:30
Dependencies
Core Dependencies
| Package | Version | Purpose |
|---|---|---|
isomorphic-fetch | ^3.0.0 | HTTP client for browser/Node.js |
ajv | ^8.12.0 / ^8.17.1 | JSON schema validation |
cliui | ^8.0.1 | CLI utilities |
Node.js Compatibility
| Package Generation | Minimum Node.js |
|---|---|
| Legacy (v2.x) | >= 14.17.0 |
| New-JS (v3.x) | >= 20 |
| Rust Bindings | >= 10 |
Sources: clients/js/packages/chromadb-client/package.json:50-55 Sources: clients/new-js/packages/ai-embeddings/common/package.json:35-38
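A runtime guard for the minimums in the table above could look like the following. This is a hypothetical helper for illustration; published packages normally declare the same constraint via the package.json "engines" field instead:

```typescript
// Check a Node.js version string against a minimum major version,
// accepting both "v20.11.1" and "20.11.1" forms.
function meetsMinimumNode(version: string, minMajor: number): boolean {
  const major = parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}
// e.g. meetsMinimumNode(process.version, 20) for the new-js client
```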
Sources: [clients/js/packages/chromadb/package.json:3](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb/package.json)
Rust Backend Services Architecture
Related topics: System Architecture Overview, Data Storage & Blockstore
Rust Backend Services Architecture
Overview
The Chroma Rust backend provides a high-performance, scalable vector database service built entirely in Rust. The architecture follows a distributed systems design with multiple specialized services working together to handle embedding storage, indexing, and similarity search operations.
Design Goals
| Goal | Description |
|---|---|
| High Performance | Arrow-based columnar storage for efficient data access |
| Scalability | Multi-cloud, multi-region deployment support |
| Reliability | Comprehensive error handling with typed error codes |
| Flexibility | Multiple index types (HNSW, Spann, Inverted) |
| Consistency | Ordered and unordered mutation ordering options |
Core Service Components
graph TD
subgraph "Rust Backend Services"
W[Worker Service]
BS[Blockstore Service]
SYS[Sysdb Service]
LOG[Log Service]
end
W --> BS
W --> SYS
W --> LOG
Blockstore Architecture
The blockstore is the core storage layer in Chroma's Rust backend, providing persistent storage for vector embeddings and associated metadata using Arrow columnar format.
Arrow-Based Storage
Chroma uses Apache Arrow as its primary storage format, which provides:
- Columnar Layout: Efficient analytic queries by column
- Zero-Copy Reads: Memory-mapped access patterns
- Cross-Language Interop: Standardized binary format
- Compression Support: Built-in encoding/decoding
Sources: rust/blockstore/src/arrow/root.rs:1-40
Blockfile Structure
graph TD
subgraph "Blockfile Components"
BF[Blockfile]
BR[Block Reader]
BW[Block Writer]
RM[Root Manager]
BM[Block Manager]
end
BF --> BR
BF --> BW
BW --> RM
BR --> BM
#### Root Management
The Root component manages the root directory structure and file operations:
pub(super) fn get_all_block_ids_from_bytes(
bytes: &[u8],
id: Uuid,
) -> Result<Vec<Uuid>, FromBytesError>
Key responsibilities:
- Reading Arrow IPC files
- Extracting block metadata and IDs
- Version validation and verification
Sources: rust/blockstore/src/arrow/root.rs:28-50
#### Block Layout Verification
The block layout verification ensures data integrity:
#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
#[error("Buffer length is not 64 byte aligned")]
BufferLengthNotAligned,
#[error("No record batches in footer")]
NoRecordBatches,
#[error("More than one record batch in IPC file")]
MultipleRecordBatches,
#[error("Invalid message type")]
InvalidMessageType,
}
Sources: rust/blockstore/src/arrow/block/types.rs:1-30
| Error Type | Error Code | Severity |
|---|---|---|
BufferLengthNotAligned | Internal | High |
NoRecordBatches | Internal | High |
MultipleRecordBatches | Internal | Medium |
InvalidMessageType | Internal | High |
RecordBatchDecodeError | Internal | High |
Blockfile Writer Types
Chroma supports two mutation ordering strategies:
| Ordering Type | Description | Use Case |
|---|---|---|
Ordered | Sequential writes with guaranteed order | Consistent state |
Unordered | Parallel writes for throughput | High-volume ingestion |
Sources: rust/blockstore/src/arrow/provider.rs:1-50
match options.mutation_ordering {
BlockfileWriterMutationOrdering::Ordered => {
let file = ArrowOrderedBlockfileWriter::from_root(...);
Ok(BlockfileWriter::ArrowOrderedBlockfileWriter(file))
}
BlockfileWriterMutationOrdering::Unordered => {
let file = ArrowUnorderedBlockfileWriter::from_root(...);
Ok(BlockfileWriter::ArrowUnorderedBlockfileWriter(file))
}
}
Forking and Versioning
Blockfiles support forking for snapshot isolation:
let new_root = self
.root_manager
.fork::<K>(
&fork_from,
new_id,
&options.prefix_path,
self.block_manager.default_max_block_size_bytes(),
)
.await
Sources: rust/blockstore/src/arrow/provider.rs:1-30
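Conceptually, a fork creates a new root that references the same immutable blocks rather than copying data up front. A toy model of that copy-on-write step (illustrative types only, not Chroma's actual Rust structures):

```typescript
// A root is small metadata pointing at immutable blocks; forking copies only
// the root, so the snapshot is cheap and the source stays isolated.
type Root = { id: string; blockIds: string[] };

function forkRoot(from: Root, newId: string): Root {
  // Block contents stay shared; only the list of references is duplicated.
  return { id: newId, blockIds: [...from.blockIds] };
}
```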
Type System
Query Result Types
The execution layer uses a rich type system for search results:
#[derive(Clone, Debug, Default)]
pub struct SearchPayloadResult {
pub records: Vec<SearchRecord>,
}
Sources: rust/types/src/execution/operator.rs:1-20
#### Search Results Structure
graph LR
SR[SearchResult] --> SPR[SearchPayloadResult]
SPR --> SR_vec[Vec<SearchRecord>]
SR --> PLB[pulled_log_bytes]
| Field | Type | Description |
|---|---|---|
results | Vec<SearchPayloadResult> | Per-query search results |
pulled_log_bytes | u64 | Total log bytes fetched for metrics |
Include Enum
The Include enum controls which fields are returned in query results:
pub enum Include {
#[serde(rename = "distances")]
Distance,
#[serde(rename = "documents")]
Document,
#[serde(rename = "embeddings")]
Embedding,
#[serde(rename = "metadatas")]
Metadata,
#[serde(rename = "uris")]
Uri,
}
Sources: rust/types/src/api_types.rs:1-30
| Include Value | Returned Field | Default Query |
|---|---|---|
distances | Distance scores | ✓ |
documents | Text content | ✓ |
embeddings | Vector data | ✗ |
metadatas | Metadata objects | ✓ |
uris | Resource URIs | ✗ |
#### IncludeList Helper Methods
impl IncludeList {
pub fn empty() -> Self { Self(Vec::new()) }
pub fn default_query() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance])
}
pub fn default_get() -> Self {
Self(vec![Include::Document, Include::Metadata])
}
pub fn all() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance,
Include::Embedding, Include::Uri])
}
}
Sources: rust/types/src/api_types.rs:1-60
Key Filter System
The Key enum represents filterable fields in metadata queries:
pub enum Key {
Document,
Embedding,
Metadata,
Score,
MetadataField(String),
}
Sources: rust/types/src/operator.rs:1-30
| Key | Purpose | Example |
|---|---|---|
#document | Document content | Key::Document |
#embedding | Vector data | Key::Embedding |
#metadata | All metadata | Key::Metadata |
#score | Similarity score | Key::Score |
field_name | Custom metadata | Key::MetadataField("status") |
#### Key Factory Methods
impl Key {
/// Creates a Key for a custom metadata field
pub fn field(name: impl Into<String>) -> Self {
Key::MetadataField(name.into())
}
/// Creates an equality filter: `field == value`
pub fn eq(self, value: impl Into<MetadataValue>) -> ComparisonValue { ... }
}
Index Architecture
Spann Index
Spann is Chroma's sparse vector index implementation combining HNSW with posting lists:
#[derive(Clone, Debug)]
pub struct SpannIndexReader<'me> {
pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
pub hnsw_index: HnswIndexRef,
pub versions_map: BlockfileReader<'me, u32, u32>,
pub dimensionality: usize,
pub adaptive_search_nprobe: bool,
pub params: InternalSpannConfiguration,
}
Sources: rust/index/src/spann/types.rs:1-30
#### Spann Index Structure
graph TD
subgraph "Spann Index"
SPI[SpannIndexReader]
HNSW[HNSW Index]
PL[Posting Lists]
VM[Versions Map]
end
SPI --> HNSW
SPI --> PL
SPI --> VM
| Component | Type | Purpose |
|---|---|---|
hnsw_index | HnswIndexRef | Approximate nearest neighbor search |
posting_lists | BlockfileReader<u32, SpannPostingList> | Document postings |
versions_map | BlockfileReader<u32, u32> | Document versioning |
adaptive_search_nprobe | bool | Adaptive parameter tuning |
Sparse Posting Block
The sparse posting block implements an inverted index structure:
#[derive(Debug, Clone)]
pub struct DirectoryBlock(SparsePostingBlock);
impl DirectoryBlock {
pub fn new(max_offsets: &[u32], max_weights: &[f32])
-> Result<Self, SparsePostingBlockError>
}
Sources: rust/types/src/sparse_posting_block.rs:1-40
| Field | Type | Description |
|---|---|---|
max_offset | u32 | Largest doc offset in posting block |
max_weight | f32 | Maximum weight for term pruning |
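The max_weight bound is what makes term pruning possible: during scoring, a posting block whose best possible weight cannot reach the current threshold can be skipped without decoding it (a WAND-style bound). A simplified model, not the actual Rust implementation:

```typescript
// Toy posting-block scan that uses the per-block upper bound to skip
// blocks that cannot contribute a hit at or above the threshold.
interface PostingBlock {
  maxOffset: number; // largest doc offset in the block
  maxWeight: number; // upper bound used for pruning
  weights: number[];
}

function scanBlocks(blocks: PostingBlock[], threshold: number): number[] {
  const hits: number[] = [];
  for (const block of blocks) {
    if (block.maxWeight < threshold) continue; // prune the whole block
    for (const w of block.weights) {
      if (w >= threshold) hits.push(w);
    }
  }
  return hits;
}
```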
Schema and Index Configuration
Collection Schema
The schema system supports multiple index types:
impl Schema {
pub fn create_index(
mut self,
key: Option<&str>,
config: IndexConfig,
) -> Result<Self, SchemaBuilderError>
}
Sources: rust/types/src/collection_schema.rs:1-50
| Index Type | Key | Description |
|---|---|---|
VectorIndexConfig | None | Global vector index (HNSW/Spann) |
StringInvertedIndexConfig | Some(field) | Field-specific FTS |
SparseVectorIndexConfig | Some(field) | Sparse vector index |
Index Configuration
pub struct VectorIndexConfig {
pub space: Option<Space>,
pub embedding_function: Option<EmbeddingFunctionId>,
pub source_key: Option<Key>,
pub hnsw: Option<HnswConfig>,
pub spann: Option<SpannConfig>,
}
| Parameter | Type | Default | Description |
|---|---|---|---|
space | Option<Space> | None | Vector space (Cosine, L2, etc.) |
embedding_function | Option<EFId> | None | Embedding function ID |
hnsw | Option<HnswConfig> | None | HNSW parameters |
spann | Option<SpannConfig> | None | Spann parameters |
Worker Service Architecture
Work Queue Client
The work queue client manages distributed task execution:
pub enum WorkQueueClientError {
ConnectionError(#[from] tonic::Status),
RequestError(#[from] tonic::Status),
}
Sources: rust/worker/src/work_queue/work_queue_client.rs:1-20
#### Error Code Mapping
| gRPC Code | Chroma Error Code |
|---|---|
Unavailable | Unavailable |
DeadlineExceeded | DeadlineExceeded |
ResourceExhausted | ResourceExhausted |
InvalidArgument | InvalidArgument |
NotFound | NotFound |
PermissionDenied | PermissionDenied |
Apply Logs Orchestrator
The apply logs orchestrator handles log-based data synchronization:
#[derive(Debug)]
pub struct ApplyLogsOrchestratorResponse {
pub job_id: JobId,
pub total_records_post_compaction: u64,
pub flush_results: Vec<SegmentFlushInfo>,
pub collection_logical_size_bytes: u64,
}
Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-50
KNN Filter Architecture
The KNN filter orchestrates vector similarity search:
graph TD
subgraph "KNN Query Pipeline"
Q[Query Request]
F[Filter Logs]
K[KNN Search]
R[Results]
end
Q --> F
F --> K
K --> R
#### KNN Error Handling
pub enum KnnError {
QuantizedSpannCenterSearch(QuantizedSpannError),
QuantizedSpannLoadCenter(QuantizedSpannError),
InvalidDistanceFunction,
Aborted,
InvalidSchema(#[from] SchemaError),
}
Sources: rust/worker/src/execution/orchestration/knn_filter.rs:1-40
| Error Type | Error Code | Traced |
|---|---|---|
QuantizedSpannCenterSearch | From inner | ✓ |
InvalidDistanceFunction | InvalidArgument | ✗ |
Aborted | ResourceExhausted | ✗ |
Result(_) | Internal | ✓ |
KNN Filter Output
#[derive(Clone, Debug)]
pub struct KnnFilterOutput {
pub logs: FetchLogOutput,
pub fetch_log_bytes: u64,
pub filter_output: FilterOutput,
pub dimension: usize,
pub distance_function: DistanceFunction,
}
Multi-Cloud Topology
Chroma supports multi-cloud and multi-region deployments:
pub struct ProviderRegion<T: Clone + Debug> {
pub name: RegionName,
pub provider: String,
pub region: String,
pub config: T,
}
Sources: rust/types/src/topology.rs:1-30
Topology Structure
graph TD
subgraph "Multi-Cloud Topology"
Config[Configuration]
Topologies[Vec<Topology>]
Regions[Vec<ProviderRegion>]
Preferred[Preferred Region]
end
Config --> Topologies
Config --> Regions
Config --> Preferred
Configuration Schema
struct RawMultiCloudMultiRegionConfiguration<R, T> {
preferred: RegionName,
regions: Vec<ProviderRegion<R>>,
topologies: Vec<Topology<T>>,
}
| Field | Type | Description |
|---|---|---|
preferred | RegionName | Default region for operations |
regions | Vec<ProviderRegion> | Available cloud regions |
topologies | Vec<Topology> | Topology configurations |
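A toy sketch of how the preferred field might resolve to a concrete region (field names mirror the table above; the first-region fallback is an assumption, not documented behavior):

```typescript
// Pick the preferred region from the configured list, falling back to the
// first available region when the preferred name is not found.
interface Region {
  name: string;
  provider: string;
  region: string;
}

function resolveRegion(regions: Region[], preferred: string): Region {
  return regions.find((r) => r.name === preferred) ?? regions[0];
}
```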
Error Handling Framework
Chroma Error Traits
All errors implement the ChromaError trait:
pub trait ChromaError: std::error::Error {
fn code(&self) -> ErrorCodes;
fn should_trace_error(&self) -> bool;
}
Error Code Registry
| Code | Category | Description |
|---|---|---|
InvalidArgument | Client | Malformed request |
NotFound | Client | Resource missing |
AlreadyExists | Client | Duplicate resource |
PermissionDenied | Security | Access denied |
ResourceExhausted | Rate | Quota exceeded |
Internal | Server | System error |
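The registry above, restated as a lookup (an illustrative restatement; the categorization of which errors are safe to surface is an assumption, not part of the ChromaError trait):

```typescript
// Error-code registry as a lookup table, mirroring the table above.
const errorCategory: Record<string, "Client" | "Security" | "Rate" | "Server"> = {
  InvalidArgument: "Client",
  NotFound: "Client",
  AlreadyExists: "Client",
  PermissionDenied: "Security",
  ResourceExhausted: "Rate",
  Internal: "Server",
};

// Client-category errors generally describe the caller's request, so they
// can be surfaced to callers unchanged.
function isClientError(code: string): boolean {
  return errorCategory[code] === "Client";
}
```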
CLI Integration
The Rust CLI provides management commands:
pub enum Command {
Browse(BrowseArgs),
Copy(CopyArgs),
Db(DbSubcommand),
Docs,
Install(InstallArgs),
Login(LoginArgs),
Profile(ProfileSubcommand),
Run(RunArgs),
Support,
Update,
Vacuum(VacuumArgs),
}
Sources: rust/cli/src/lib.rs:1-30
Available Commands
| Command | Description |
|---|---|
browse | Open web interface |
copy | Copy data between collections |
db | Database operations |
docs | Open documentation |
install | Install Chroma |
login | Authenticate user |
profile | Performance profiling |
run | Start Chroma server |
support | Open support resources |
update | Update installation |
vacuum | Compact storage |
See Also
Sources: rust/blockstore/src/arrow/root.rs:1-40
Go Coordinator & Distributed Systems
Related topics: System Architecture Overview
Data Storage & Blockstore
Overview
The Chroma blockstore is the core storage subsystem responsible for persisting vector embeddings, metadata, and related data structures. It provides a unified abstraction layer over different storage backends (in-memory and Arrow-based) while maintaining performance characteristics suitable for high-throughput vector database operations.
The blockstore system is architected around the concept of blockfiles — persistent, columnar storage structures that organize data by prefix-based partitioning and support efficient key-value operations.
Architecture
graph TD
subgraph "Public API Layer"
BP[BlockfileProvider]
BR[BlockfileReader]
BW[BlockfileWriter]
BF[BlockfileFlusher]
end
subgraph "Implementation Layer"
ABP[ArrowBlockfileProvider]
MBP[MemoryBlockfileProvider]
ABF[ArrowUnorderedBlockfileWriter]
ABO[ArrowOrderedBlockfileWriter]
end
subgraph "Storage Layer"
BM[BlockManager]
RM[RootManager]
ST[Storage]
end
subgraph "Arrow Format"
R[Root]
SB[Sparse Index]
B[Blocks]
end
BP --> ABP
BP --> MBP
BR --> ABP
BR --> MBP
BW --> ABF
BW --> ABO
ABP --> BM
ABP --> RM
ABF --> BM
ABF --> RM
ABO --> BM
ABO --> RM
BM --> ST
RM --> ST
RM --> R
R --> SB
R --> B
Core Components
BlockfileProvider
The BlockfileProvider is the main entry point for creating readers and writers. It abstracts the underlying storage implementation and provides factory methods for blockfile operations.
Variants:
| Provider Type | Description | Use Case |
|---|---|---|
HashMapBlockfileProvider | In-memory blockfile storage | Testing, ephemeral data |
ArrowBlockfileProvider | Persistent Arrow-based storage | Production workloads |
API Methods:
pub fn storage(&self) -> Option<Arc<Storage>> {
match self {
BlockfileProvider::ArrowBlockfileProvider(provider) => Some(provider.storage().clone()),
BlockfileProvider::HashMapBlockfileProvider(_) => None,
}
}
pub fn new_memory() -> Self {
BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}
Sources: rust/blockstore/src/provider.rs:1-30
BlockfileReader
The BlockfileReader trait provides read access to stored data. It supports generic key and value types that implement the ReadKey and ReadValue traits.
Trait Definition:
pub trait ReadKey<'a>:
Key
+ Into<KeyWrapper>
+ TryFrom<&'a KeyWrapper, Error = InvalidKeyConversion>
+ ArrowReadableKey<'a>
+ Sync
+ 'a
{}
pub trait ReadValue<'a>: Value + Readable<'a> + ArrowReadableValue<'a> + Sync + 'a {}
Sources: rust/blockstore/src/provider.rs:40-55
BlockfileWriter
The BlockfileWriter trait provides write access to blockfiles with support for ordered and unordered mutation patterns.
Core Operations:
| Method | Signature | Description |
|---|---|---|
set | set(prefix, key, value) | Insert or update a key-value pair |
delete | delete(prefix, key) | Remove a key-value pair |
commit | commit() | Finalize and persist the writer |
pub async fn set<
K: Key + Into<KeyWrapper> + ArrowWriteableKey,
V: Value + Writeable + ArrowWriteableValue,
>(
&self,
prefix: &str,
key: K,
value: V,
) -> Result<(), Box<dyn ChromaError>>
Sources: rust/blockstore/src/types/writer.rs:50-75
Arrow Blockfile Implementation
The Arrow-based blockfile is the primary production storage implementation, providing efficient columnar storage with Arrow IPC format.
Blockfile Structure
graph TD
R[Root File<br/>Root Writer] --> SB[Sparse Index<br/>Block Key Mapping]
R --> BH[Block Header<br/>Metadata]
SB --> B1[Block 1<br/>Arrow IPC]
SB --> B2[Block 2<br/>Arrow IPC]
SB --> BN[Block N<br/>Arrow IPC]
B1 --> P1[Prefix: "vec_1"]
B1 --> P2[Prefix: "vec_2"]
ArrowBlockfileProvider
The ArrowBlockfileProvider manages the lifecycle of blockfiles using Arrow IPC format with a root-sparse index architecture.
Key Features:
- Fork Support: Create new blockfiles from existing ones via forking
- CMEK Support: Optional Customer-Managed Encryption Keys
- Block Size Management: Configurable maximum block sizes
pub async fn write<K: Key + ArrowWriteableKey, V: ArrowWriteableValue>(
&self,
options: BlockfileWriterOptions,
) -> Result<BlockfileWriter, Box<CreateError>>
Sources: rust/blockstore/src/arrow/provider.rs:1-50
Writer Types
#### ArrowUnorderedBlockfileWriter
Provides high-performance unordered writes optimized for bulk insertion scenarios.
impl ArrowUnorderedBlockfileWriter {
pub(super) fn new<K: ArrowWriteableKey, V: ArrowWriteableValue>(
id: Uuid,
prefix_path: &str,
block_manager: BlockManager,
root_manager: RootManager,
max_block_size_bytes: usize,
cmek: Option<Cmek>,
) -> Self
}
Sources: rust/blockstore/src/arrow/blockfile.rs:50-80
#### ArrowOrderedBlockfileWriter
Maintains key ordering within blocks, optimized for range queries and ordered iteration.
Sources: rust/blockstore/src/arrow/ordered_blockfile_writer.rs:1-50
BlockManager and RootManager
| Component | Responsibility |
|---|---|
BlockManager | Manages individual data blocks, handles block creation and commitment |
RootManager | Manages root files containing sparse indices and metadata |
// Forking a new root from an existing one
let new_root = self
.root_manager
.fork::<K>(
&fork_from,
new_id,
&options.prefix_path,
self.block_manager.default_max_block_size_bytes(),
)
.await
Sources: rust/blockstore/src/arrow/provider.rs:45-70
Error Handling
Error Types
| Error Type | Description | Error Code |
|---|---|---|
BlockNotFound | Requested block does not exist | Internal |
BlockFetchError | Failed to retrieve block from storage | Internal |
MigrationError | Blockfile migration failed | Internal |
IOError | Storage I/O operation failed | Internal |
ArrowError | Arrow IPC parsing/encoding error | Internal |
NoRecordBatches | Invalid Arrow file structure | Internal |
#[derive(Error, Debug)]
pub enum ArrowBlockfileError {
#[error("Block not found")]
BlockNotFound,
#[error("Could not fetch block")]
BlockFetchError(#[from] GetError),
#[error("Could not migrate blockfile to new version")]
MigrationError(#[from] MigrationError),
}
Sources: rust/blockstore/src/arrow/blockfile.rs:25-40
Layout Verification
The system validates Arrow file layouts to ensure data integrity:
#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
#[error("Buffer length is not 64 byte aligned")]
BufferLengthNotAligned,
#[error("No record batches in footer")]
NoRecordBatches,
#[error("More than one record batch in IPC file")]
MultipleRecordBatches,
#[error("Invalid message type")]
InvalidMessageType,
}
Sources: rust/blockstore/src/arrow/block/types.rs:40-60
Storage Operations
Write Flow
sequenceDiagram
participant Client
participant Provider as BlockfileProvider
participant Writer as BlockfileWriter
participant BM as BlockManager
participant RM as RootManager
participant Storage
Client->>Provider: write(options)
Provider->>Writer: create_writer()
Provider->>RM: create/fork_root()
Client->>Writer: set(prefix, key, value)
Writer->>BM: create_block()
loop Until flush
Writer->>Writer: accumulate_data()
end
Client->>Writer: commit()
Writer->>BM: commit_block()
Writer->>RM: update_root()
RM->>Storage: persist()
BM->>Storage: persist()
Read Flow
sequenceDiagram
participant Client
participant Reader as BlockfileReader
participant RM as RootManager
participant BM as BlockManager
participant Storage
Client->>Reader: get(prefix, key)
Reader->>RM: get_block_ids()
RM->>Reader: block_id_list
loop For each block
Reader->>BM: get_block(id)
BM->>Storage: read()
Storage->>Reader: block_data
end
Reader->>Reader: search_blocks()
Reader->>Client: value
Configuration Options
BlockfileWriterOptions
| Option | Type | Default | Description |
|---|---|---|---|
prefix_path | String | Required | Path prefix for storage |
max_block_size_bytes | usize | Provider default | Maximum size per block |
mutation_ordering | BlockfileWriterMutationOrdering | Ordered | Write ordering mode |
fork_from | Option<Uuid> | None | Source blockfile ID for forking |
cmek | Option<Cmek> | None | Customer-managed encryption key |
let mut bf_options = BlockfileWriterOptions::new(prefix_path.to_string())
.max_block_size_bytes(pl_block_size);
bf_options = bf_options.unordered_mutations();
if let Some(cmek) = cmek {
bf_options = bf_options.with_cmek(cmek);
}
Sources: rust/blockstore/src/arrow/provider.rs:90-110
Memory Blockfile
For testing and ephemeral use cases, Chroma provides an in-memory blockfile implementation:
pub fn new_memory() -> Self {
BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}
Limitations:
- No persistence
- No fork support
- Limited to unordered mutations
if options.fork_from.is_some() {
unimplemented!();
}
Sources: rust/blockstore/src/memory/provider.rs:40-55
Block Reading
RootReader
The RootReader is responsible for reading block metadata and identifying which blocks contain specific data:
impl RootReader {
pub(super) fn get_all_block_ids_from_bytes(
bytes: &[u8],
id: Uuid,
) -> Result<Vec<Uuid>, FromBytesError> {
let mut cursor = std::io::Cursor::new(bytes);
let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
let record_batch = match arrow_reader {
Ok(mut reader) => match reader.next() {
Some(Ok(batch)) => batch,
Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
None => return Err(FromBytesError::NoDataError),
},
Err(e) => return Err(FromBytesError::ArrowError(e)),
};
let (version, read_id) = Self::version_and_id_from_record_batch(&record_batch, id)?;
if read_id != id {
return Err(FromBytesError::IdMismatch);
}
Self::block_ids_from_record_batch(&record_batch, version)
}
}
Sources: rust/blockstore/src/arrow/root.rs:20-55
Related Components
SpannIndex Integration
The blockstore is used by the Spann (Sparse + ANN) index for storing posting lists:
| Component | Purpose |
|---|---|
SpannIndexReader | Reads posting lists and HNSW indices |
SpannIndexWriter | Creates and manages posting list writers |
SpannPostingList | Stores document IDs and embeddings |
pub struct SpannIndexReader<'me> {
pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
pub hnsw_index: HnswIndexRef,
pub versions_map: BlockfileReader<'me, u32, u32>,
pub dimensionality: usize,
}
Sources: rust/index/src/spann/types.rs:30-45
Summary
The Chroma blockstore provides a robust, extensible storage layer built on Arrow IPC format. Key architectural decisions include:
- Separation of concerns: BlockManager handles data blocks while RootManager manages metadata and sparse indices
- Dual writer support: Ordered and unordered writers for different access patterns
- Forking capability: Efficient creation of derived blockfiles without full copies
- Error classification: Clear mapping from internal errors to error codes for API responses
- Type-safe abstractions: Generic key-value traits enabling flexible data modeling
Sources: [rust/blockstore/src/provider.rs:1-30](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/provider.rs)
Embedding Functions Integration
Related topics: Python Client SDK, Data Storage & Blockstore
Embedding Functions Integration
Overview
Embedding Functions in Chroma provide a standardized interface for converting text into vector embeddings. Chroma supports multiple embedding providers through a plugin architecture that allows developers to use custom embedding functions or leverage hosted services like OpenAI, Cohere, Ollama, and others.
The embedding function system serves as the bridge between raw text data and the vector representation used for similarity search. Each embedding function implements a consistent interface that handles API communication, request formatting, and response parsing for its respective provider.
Sources: clients/new-js/packages/ai-embeddings/common/README.md
Architecture
High-Level Architecture
graph TD
A[Client Application] --> B[Chroma Collection]
B --> C[Embedding Function]
C --> D[Embedding Provider API]
D --> E[Vector Embeddings]
E --> B
F[@chroma-core/openai] --> C
G[@chroma-core/ollama] --> C
H[@chroma-core/cohere] --> C
I[@chroma-core/morph] --> C
J[@chroma-core/all] --> C
Embedding Function Package Structure
Chroma organizes embedding functions into separate packages under the @chroma-core namespace. Each package focuses on a specific provider while sharing common utilities.
| Package | Provider | Environment Support |
|---|---|---|
@chroma-core/ai-embeddings-common | Shared utilities | Node.js + Browser |
@chroma-core/openai | OpenAI | Node.js + Browser |
@chroma-core/ollama | Ollama (local) | Node.js + Browser |
@chroma-core/cohere | Cohere | Node.js + Browser |
@chroma-core/jina | Jina AI | Node.js + Browser |
@chroma-core/morph | Morph | Node.js |
@chroma-core/all | All providers | Node.js + Browser |
Sources: clients/new-js/packages/ai-embeddings/all/README.md
Core Components
Common Utilities Package
The @chroma-core/ai-embeddings-common package provides shared functionality used by all embedding function implementations:
import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';
Key Features:
| Feature | Purpose |
|---|---|
validateConfigSchema | Validates embedding function configurations using JSON schemas |
snakeCase | Converts camelCase JavaScript objects to snake_case for API compatibility |
isBrowser | Detects browser vs Node.js runtime environment |
Sources: clients/new-js/packages/ai-embeddings/common/README.md
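A minimal sketch of the camelCase-to-snake_case conversion that the snakeCase utility performs for API compatibility (assumed behavior; the real implementation may handle more edge cases, such as nested objects):

```typescript
// Convert the top-level keys of an object from camelCase to snake_case,
// e.g. { apiKey: "x" } -> { api_key: "x" }.
function toSnakeCase(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    out[key.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase()] = value;
  }
  return out;
}
```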
Dynamic Loading Mechanism
The embedding function system supports dynamic loading of packages based on configuration:
const fullPackageName = `@chroma-core/${packageName}`;
await import(fullPackageName);
embeddingFunction = knownEmbeddingFunctions.get(packageName);
The system maintains mappings for known embedding function names and handles package resolution automatically when a collection is configured with a specific embedding provider.
Sources: clients/new-js/packages/chromadb/src/embedding-function.ts
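The dynamic-loading flow can be sketched as a registry plus an on-demand loader. The registry and loader below are toy stand-ins, not the actual chromadb internals; in the real client the dynamic import's side effect is what registers the provider:

```typescript
// Registry of embedding-function factories, keyed by provider name.
const knownEmbeddingFunctions = new Map<string, () => unknown>();

// Resolve a provider by name, loading its @chroma-core package on demand.
// `loadPackage` stands in for `(n) => import(n)` to keep the sketch testable.
async function loadEmbeddingFunction(
  packageName: string,
  loadPackage: (name: string) => Promise<void>,
): Promise<(() => unknown) | undefined> {
  if (!knownEmbeddingFunctions.has(packageName)) {
    // Importing the package registers its embedding function as a side effect.
    await loadPackage(`@chroma-core/${packageName}`);
  }
  return knownEmbeddingFunctions.get(packageName);
}
```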
Configuration Schema
Embedding functions support structured configuration with schema validation. Configuration options vary by provider but typically include:
| Parameter | Description | Provider Support |
|---|---|---|
apiKey | API key for authentication | OpenAI, Cohere, Jina, Gemini |
modelName | Specific model identifier | All providers |
apiBase | Custom API endpoint URL | Ollama, Morph, Gemini |
encodingFormat | Output format (float/base64) | OpenAI, Morph |
Sources: clients/new-js/packages/ai-embeddings/morph/README.md
Provider Implementations
OpenAI Embeddings
The OpenAI embedding function supports the OpenAI API for generating text embeddings:
import { OpenAIEmbeddingFunction } from '@chroma-core/openai';
const openAIEF = new OpenAIEmbeddingFunction({
apiKey: 'your-api-key',
modelName: 'text-embedding-3-small'
});
Ollama (Local Embeddings)
Ollama enables local embedding generation without external API calls:
# Install Ollama from ollama.ai
# Start the server
ollama serve
# Pull an embedding model
ollama pull chroma/all-minilm-l6-v2-f32
Supported Models:
| Model | Dimensions |
|---|---|
chroma/all-minilm-l6-v2-f32 (default) | 384 |
nomic-embed-text | 768 |
mxbai-embed-large | 1024 |
snowflake-arctic-embed | Variable |
Sources: clients/new-js/packages/ai-embeddings/ollama/README.md
Morph Embeddings
Morph provides embeddings optimized for code-related content:
```typescript
// Import path assumed from the @chroma-core/<provider> package convention.
import { MorphEmbeddingFunction } from '@chroma-core/morph';

const morphEmbedding = new MorphEmbeddingFunction({
  api_key: 'your-morph-api-key',
  model_name: 'morph-embedding-v2',
  api_base: 'https://api.morphllm.com/v1',
  encoding_format: 'float'
});
```
Sources: clients/new-js/packages/ai-embeddings/morph/README.md
Chroma Cloud Qwen
Hosted embedding service using Qwen models:
```typescript
// Import path assumed from the @chroma-core/<provider> package convention.
import { QwenEmbeddingFunction } from '@chroma-core/chroma-cloud-qwen';

const qwenEmbedding = new QwenEmbeddingFunction({
  model: 'Qwen/Qwen3-Embedding-0.6B',
  task: 'document' // or 'query'
});
```
Configuration includes:
- `model`: The Qwen model to use
- `task`: Task type (`document` or `query` embedding)
- `instruction_dict`: Custom instructions for specific tasks
- `apiKeyEnvVar`: Environment variable for API key (default: `CHROMA_API_KEY`)
Sources: clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/README.md
Collection Integration
Embedding Function in Collections
When creating a collection, the embedding function can be specified at multiple levels:
```typescript
const collection = await chroma.createCollection({
  name: "my-collection",
  embeddingFunction: openAIEF // Specify embedding function
});
```
Space Configuration
Embedding functions can define supported distance spaces and default configurations:
```typescript
if (overallEf && overallEf.defaultSpace && overallEf.supportedSpaces) {
  if (configuration?.hnsw === undefined && configuration?.spann === undefined) {
    configuration.hnsw = { space: overallEf.defaultSpace() };
  }
}
```
The system validates that configured spaces are supported by the embedding function and warns if mismatches occur:
```
Space 'cosine' is not supported by embedding function 'openai'.
Supported spaces: cosine, euclidean, dotproduct
```
Sources: clients/new-js/packages/chromadb/src/collection-configuration.ts
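The validation described above can be sketched as a pure function. The interface and names here are illustrative, not the client's actual types:

```typescript
// Sketch of the space-support check: a configured HNSW space outside the
// embedding function's supported set produces a warning message.
interface SpaceAwareEF {
  name: string;
  defaultSpace(): string;
  supportedSpaces(): string[];
}

function checkSpace(ef: SpaceAwareEF, configuredSpace?: string): string | null {
  // With no explicit space, fall back to the embedding function's default.
  const space = configuredSpace ?? ef.defaultSpace();
  if (!ef.supportedSpaces().includes(space)) {
    return (
      `Space '${space}' is not supported by embedding function '${ef.name}'. ` +
      `Supported spaces: ${ef.supportedSpaces().join(", ")}`
    );
  }
  return null; // configured space is valid
}
```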
Query Response Structure
Include Parameter
Queries support specifying which data to include in results through the Include parameter:
```rust
pub enum Include {
    Distance,
    Document,
    Embedding,
    Metadata,
    Uri,
}
```
Default Inclusion Behavior:
| Operation | Default Includes |
|---|---|
| Query | Document, Metadata, Distance |
| Get | Document, Metadata |
Include List Methods:
| Method | Returns |
|---|---|
| `IncludeList::empty()` | No includes |
| `IncludeList::default_query()` | Document, Metadata, Distance |
| `IncludeList::default_get()` | Document, Metadata |
| `IncludeList::all()` | All five include types |
Sources: rust/types/src/api_types.rs
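In the JS client these options surface as lowercase strings in a query's `include` array. The defaults from the tables above can be sketched in TypeScript (the constant names mirror the Rust methods; the string values are assumptions about the wire format):

```typescript
// Sketch of the IncludeList defaults from the tables above.
type Include = "distances" | "documents" | "embeddings" | "metadatas" | "uris";

const IncludeList = {
  empty: (): Include[] => [],
  defaultQuery: (): Include[] => ["documents", "metadatas", "distances"],
  defaultGet: (): Include[] => ["documents", "metadatas"],
  all: (): Include[] => ["distances", "documents", "embeddings", "metadatas", "uris"],
};
```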
Usage Patterns
Basic Usage with JavaScript Client
```typescript
import { ChromaClient } from "chromadb";
import { OpenAIEmbeddingFunction } from "@chroma-core/openai";

const chroma = new ChromaClient();
const embeddingFunction = new OpenAIEmbeddingFunction({
  apiKey: process.env.OPENAI_API_KEY
});

const collection = await chroma.createCollection({
  name: "documents",
  embeddingFunction: embeddingFunction
});

await collection.add({
  ids: ["doc-1", "doc-2"],
  documents: ["Document content here", "Another document"],
  metadatas: [{ source: "notion" }, { source: "google-docs" }]
});

const results = await collection.query({
  queryTexts: ["Search query"],
  nResults: 2
});
```
Python Client Usage
```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.create_collection("documents")

collection.add(
    documents=["Document 1", "Document 2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc2"],
    embeddings=[[1.2, 2.1, ...], [1.2, 2.1, ...]]
)

results = collection.query(
    query_texts=["Query document"],
    n_results=2
)
```
Sources: clients/new-js/packages/chromadb/README.md
Environment Detection
Embedding functions automatically detect the runtime environment to select the appropriate HTTP client:
```typescript
import { isBrowser } from '@chroma-core/ai-embeddings-common';

if (isBrowser()) {
  // Use browser-compatible fetch
} else {
  // Use Node.js HTTP client
}
```
This enables packages like Ollama to work seamlessly in both browser and Node.js environments:
> This package works in both Node.js and browser environments, automatically detecting the runtime and using the appropriate Ollama client.
Sources: clients/new-js/packages/ai-embeddings/ollama/README.md
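The detection itself typically amounts to a globals check. A minimal sketch, assuming the common approach (the real `isBrowser` helper lives in `@chroma-core/ai-embeddings-common`):

```typescript
// Runtime detection sketch: a browser exposes window and window.document,
// while Node.js does not. globalThis avoids a compile error when DOM
// types are unavailable.
function isBrowser(): boolean {
  const g = globalThis as any;
  return typeof g.window !== "undefined" && typeof g.window.document !== "undefined";
}
```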
Type Safety
The embedding function system provides TypeScript types and interfaces for:
- Configuration validation
- Response parsing
- Error handling
- Provider-specific options
```typescript
export const getSparseEmbeddingFunction = async (
  client: ChromaClient,
  efConfig?: EmbeddingFunctionConfiguration
) => {
  // Returns SparseEmbeddingFunction instance or undefined
};
```
Sources: clients/new-js/packages/chromadb/src/embedding-function.ts
Summary
Embedding Functions Integration in Chroma provides a unified, extensible system for text vectorization. Key aspects include:
- Provider Abstraction: Standardized interface across multiple embedding providers
- Dynamic Loading: Packages loaded on-demand based on collection configuration
- Schema Validation: JSON schema-based configuration validation
- Cross-Platform: Support for both Node.js and browser environments
- Flexible Configuration: Provider-specific options with sensible defaults
- Space Support: Distance metric configuration aligned with embedding provider capabilities
The plugin architecture allows Chroma to integrate new embedding providers while maintaining API consistency across the SDK.
Sources: [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)
Doramagic Pitfall Log
Doramagic extracted 6 source-linked risk signals. Review them before installing or handing real data to the project.
1. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:546206616 | https://github.com/chroma-core/chroma | README/documentation is current enough for a first validation pass.
2. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintainer activity is unknown according to the source signal; treat this as an open review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | last_activity_observed missing
3. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium
4. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium
5. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | issue_or_pr_quality=unknown
6. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using chroma with real data or production workflows.
- [[Bug]: metadata filter does not work over 20 millions chunk.](https://github.com/chroma-core/chroma/issues/4089) - github / github_issue
- [[Bug]: PersistentClient second-opener hangs ~16 minutes on shared persis](https://github.com/chroma-core/chroma/issues/7040) - github / github_issue
- [[Security] Unsafe pickle.load() in PersistentLocalHnswSegment enables ar](https://github.com/chroma-core/chroma/issues/6926) - github / github_issue
- query(where=...) raises 'Error finding id' after batched adds until WAL - github / github_issue
- 1.5.9 - github / github_release
- foundation-cli-v0.1.0-alpha.3 - github / github_release
- 1.5.8 - github / github_release
- 1.5.7 - github / github_release
- 1.5.6 - github / github_release
- 1.5.5 - github / github_release
- README/documentation is current enough for a first validation pass. - GitHub / issue
Source: Project Pack community evidence and pitfall evidence