Doramagic Project Pack · Human Manual

Chroma Overview

Related topics: Getting Started with Chroma, System Architecture Overview

Introduction

Chroma is an open-source data infrastructure platform designed specifically for AI applications. It provides the foundational building blocks for storing, querying, and managing vector embeddings along with associated metadata, enabling developers to build AI-powered applications with efficient similarity search capabilities. Sources: README.md:1

As an open-source solution, Chroma offers flexibility for self-hosting while also providing a cloud-hosted option called Chroma Cloud, which delivers serverless vector, hybrid, and full-text search capabilities. The platform is designed to be fast, cost-effective, scalable, and straightforward to deploy. Sources: README.md:17-21

Architecture Overview

Chroma follows a client-server architecture with multiple client libraries available for different programming environments. The system is built with Rust for core performance-critical components and provides idiomatic client libraries for Python and JavaScript/TypeScript.

graph TD
    A[Client Applications] --> B[Python Client / JS Client]
    B --> C[Chroma Server API]
    C --> D[Worker Nodes]
    D --> E[Blockstore<br/>Arrow Storage]
    D --> F[Compaction &<br/>Log Processing]
    E --> G[Persistent Storage]
    
    H[Chroma Cloud] -.->|Optional hosted| C

Client Libraries

Chroma provides two primary client libraries:

| Client | Package | Description |
|---|---|---|
| Python | chromadb | Full-featured Python client library |
| Python HTTP | chromadb-client | Lightweight HTTP-only client for server connections |
| JavaScript/TypeScript | chromadb (npm) | Full-featured JS client for Node.js and the browser |

Sources: clients/python/README.md:1-12, clients/new-js/packages/chromadb/README.md:1

#### Python Client Installation

pip install chromadb  # Full client library
pip install chromadb-client  # HTTP client only

#### JavaScript Client Example

import { ChromaClient } from "chromadb";

const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });

for (let i = 0; i < 20; i++) {
  await collection.add({
    ids: ["test-id-" + i.toString()],
    embeddings: [[1, 2, 3, 4, 5]],
    documents: ["test"],
  });
}

const queryData = await collection.query({
  queryEmbeddings: [[1, 2, 3, 4, 5]],
  queryTexts: ["test"],
});

Sources: clients/new-js/packages/chromadb/README.md:9-27

Data Model

Collection Structure

Collections in Chroma serve as the primary organizational unit for storing related documents and their associated embeddings. Each collection contains:

  • Documents: The textual content to be embedded
  • Embeddings: Vector representations of documents
  • Metadatas: Key-value pairs for filtering and categorization
  • Unique Identifiers: User-provided IDs for each record

Sources: clients/python/README.md:16-27
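The record shape above can be sketched as a plain Python structure (illustrative only; these are not Chroma's internal types):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    """One collection entry: a unique id plus optional document, embedding, and metadata."""
    id: str
    document: Optional[str] = None
    embedding: Optional[list[float]] = None
    metadata: dict = field(default_factory=dict)

# A collection is then, conceptually, a named map of records keyed by id.
collection: dict[str, Record] = {}
rec = Record(id="doc1", document="This is document1",
             embedding=[1.2, 2.1], metadata={"source": "notion"})
collection[rec.id] = rec
```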

Metadata Filtering

Chroma supports rich metadata filtering through operators that enable precise data retrieval:

graph LR
    A[Query Request] --> B[Metadata Filter]
    B --> C{Operator Type}
    C -->|Contains| D[String contains check]
    C -->|NotContains| E[String excludes check]
    C -->|Regex| F[Regular expression match]
    C -->|NotRegex| G[Regex exclusion]

Supported Document Operators:

| Operator | Description | Example |
|---|---|---|
| Contains | Document contains substring | {"$contains": "keyword"} |
| NotContains | Document excludes substring | {"$not_contains": "spam"} |
| Regex | Regular expression match | {"$regex": "^prefix.*"} |
| NotRegex | Exclude by regex pattern | {"$not_regex": ".*suffix$"} |

Sources: rust/types/src/metadata.rs:1-30
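As a rough illustration, the four document operators can be evaluated like this (a pure-Python sketch, not Chroma's actual filter engine):

```python
import re

def matches(document: str, where_document: dict) -> bool:
    """Evaluate a single-operator document filter such as {"$contains": "..."}.
    Illustrative re-implementation of the operator semantics described above."""
    op, arg = next(iter(where_document.items()))
    if op == "$contains":
        return arg in document
    if op == "$not_contains":
        return arg not in document
    if op == "$regex":
        return re.search(arg, document) is not None
    if op == "$not_regex":
        return re.search(arg, document) is None
    raise ValueError(f"unknown operator: {op}")

matches("prefix text", {"$regex": "^prefix.*"})   # True
matches("clean text", {"$not_contains": "spam"})  # True
```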

Search Keys

The query system supports specialized keys for accessing different aspects of stored data:

| Key | Description | Usage |
|---|---|---|
| #document | Full text content | Key::Document |
| #embedding | Vector embeddings | Key::Embedding |
| #metadata | Record metadata | Key::Metadata |
| #score | Similarity score | Key::Score |
| Custom fields | User-defined metadata | Key::field("field_name") |

Sources: rust/types/src/execution/operator.rs:1-80

Core Components

Storage Layer

The blockstore provides the underlying storage mechanism using Arrow format for efficient columnar data storage and retrieval. This enables high-performance queries across large datasets. Sources: rust/blockstore/src/arrow/root.rs:1

Execution Operators

Chroma's query execution pipeline uses operators that transform and filter data through well-defined stages:

graph TD
    A[Query Request] --> B[Log Fetch Orchestrator]
    B --> C[KNN Filter]
    C --> D[Apply Logs Orchestrator]
    D --> E[Segment Writers]
    E --> F[Compact Collection]

Key Orchestrators:

| Component | Purpose |
|---|---|
| LogFetchOrchestrator | Fetches and materializes log entries |
| KnnFilter | Performs k-nearest neighbor filtering |
| ApplyLogsOrchestrator | Applies log entries to segment writers |

Sources: rust/worker/src/execution/orchestration/log_fetch_orchestrator.rs:1, rust/worker/src/execution/orchestration/knn_filter.rs:1, rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1

Error Handling

The system uses a consistent error code hierarchy for reliable error management:

| Error Code | Description |
|---|---|
| InvalidArgument | Client-provided invalid parameters |
| Internal | System-level internal errors |
| ResourceExhausted | Resource limits reached (e.g., task abortion) |

Sources: rust/blockstore/src/arrow/block/types.rs:1-20

Deployment Options

Self-Hosting

Chroma can be deployed on-premises or in cloud environments using Docker, Kubernetes, or direct installation.

Deployment Requirements:

| Component | Specification |
|---|---|
| Storage | Persistent volume for vector data |
| Network | Port 8000 for API access |
| Auth | Optional token or basic authentication (v0.4.7+) |

Sources: examples/deployments/do-terraform/README.md:1-50

Starting the Server:

# Install via pip
pip install chromadb

# Run in client-server mode
chroma run --path /chroma_db_path

Sources: README.md:14-16

Chroma Cloud

Chroma Cloud provides a fully managed hosted service with:

  • Serverless vector search
  • Hybrid search capabilities
  • Full-text search integration
  • Automatic scaling
  • $5 free credits for new users

Sources: README.md:23-29

Cloud Deployment (Terraform Example)

For DigitalOcean deployment:

export TF_VAR_do_token=<DIGITALOCEAN_TOKEN>
export TF_ssh_public_key="./chroma-do.pub"
export TF_ssh_private_key="./chroma-do"
export TF_VAR_chroma_release="0.4.12"
export TF_VAR_region="ams2"
export TF_VAR_public_access="true"
export TF_VAR_enable_auth="true"
export TF_VAR_auth_type="token"

terraform apply -auto-approve

Sources: examples/deployments/do-terraform/README.md:30-45

CLI Tool

The Rust-based CLI provides command-line management capabilities:

chroma run --path <db_path>     # Run the server
chroma db create <db_name>      # Create database
chroma db list                  # List databases
chroma login                    # Authenticate with Chroma Cloud
chroma profile                  # Manage profiles
chroma install                  # Install updates
chroma update                   # Check for updates

Sources: rust/cli/src/lib.rs:1-30

Embedding Integration

Ollama Integration

The JavaScript client supports Ollama for local embedding generation:

Configuration Options:

| Option | Default | Description |
|---|---|---|
| url | http://localhost:11434 | Ollama server URL |
| model | chroma/all-minilm-l6-v2-f32 | Embedding model |

Supported Models:

| Model | Dimensions | Use Case |
|---|---|---|
| chroma/all-minilm-l6-v2-f32 | 384 | General purpose (default) |
| nomic-embed-text | 768 | Extended context |
| mxbai-embed-large | 1024 | High accuracy |
| snowflake-arctic-embed | Variable | Domain-specific |

Sources: clients/new-js/packages/ai-embeddings/ollama/README.md:1-40

API Response Format

Get Response Structure

Query results are returned with flexible inclusion options:

pub struct GetResponse {
    pub ids: Vec<String>,
    pub embeddings: Option<Vec<Vec<f32>>>,      // Optional
    pub documents: Option<Vec<Option<String>>>, // Optional
    pub uris: Option<Vec<Option<String>>>,      // Optional
    pub metadatas: Option<Vec<Option<Metadata>>>, // Optional
    pub include: IncludeList,
}

Sources: rust/types/src/api_types.rs:1-30
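The include-driven optionality can be mimicked in Python. The helper below is a hypothetical sketch of shaping such a response (field names follow the struct above; it is not Chroma's code):

```python
def build_get_response(records: list[dict], include: list[str]) -> dict:
    """Shape a get() response: ids are always returned, the other fields
    only when named in the include list."""
    resp = {"ids": [r["id"] for r in records]}
    for fld in ("embeddings", "documents", "uris", "metadatas"):
        if fld in include:
            # strip the trailing "s" to look up the singular per-record key
            resp[fld] = [r.get(fld[:-1]) for r in records]
    return resp

resp = build_get_response(
    [{"id": "doc1", "document": "hello", "metadata": {"k": "v"}}],
    include=["documents", "metadatas"],
)
# resp == {"ids": ["doc1"], "documents": ["hello"], "metadatas": [{"k": "v"}]}
```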

License

Chroma is released under the Apache 2.0 license, making it suitable for both commercial and open-source projects. Sources: README.md:10

Community and Support

| Resource | Link |
|---|---|
| Documentation | https://docs.trychroma.com/ |
| Discord | https://discord.gg/MMeYNTmh3x |
| Homepage | https://www.trychroma.com/ |

Sources: clients/new-js/packages/chromadb/README.md:9-27

Getting Started with Chroma

Related topics: Chroma Overview, Python Client SDK

Chroma is an open-source data infrastructure for AI that provides vector, hybrid, and full-text search capabilities. It enables developers to build AI applications by storing embeddings, documents, and metadata with efficient querying mechanisms.

Overview

Chroma serves as a vector database optimized for AI workloads. It allows you to:

  • Store embeddings alongside documents and metadata
  • Query using text or embedding vectors
  • Filter results based on metadata
  • Work with multiple programming languages including Python and JavaScript

Installation

Python Client

Install the Python client using pip:

pip install chromadb

For a lightweight HTTP-only client that connects to a Chroma server:

pip install chromadb-client

Sources: clients/python/README.md

JavaScript/TypeScript Client

For the new JavaScript client:

npm install chromadb

For a lighter package with optional dependencies:

npm install chromadb-client

Sources: clients/new-js/packages/chromadb/README.md

Basic Setup and Configuration

Python Client Setup

Connect to a Chroma server running locally:

import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)

Sources: clients/python/README.md

JavaScript Client Setup

import { ChromaClient } from "chromadb";

const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });

Sources: clients/new-js/packages/chromadb/README.md

Running Chroma Server

To run Chroma in client-server mode:

chroma run --path /chroma_db_path

Sources: README.md

Core Operations

Creating a Collection

Collections are containers for your documents, embeddings, and metadata.

collection = client.create_collection("all-my-documents")

Adding Documents

Add documents with optional embeddings, metadata, and unique IDs:

collection.add(
    documents=["This is document1", "This is document2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc2"],
    embeddings=[[1.2, 2.1, ...], [1.2, 2.1, ...]]
)

Sources: clients/python/README.md

Querying Documents

Query the collection using text or embeddings. In Python:

results = collection.query(
    query_texts=["This is a query document"],
    n_results=2
)

The equivalent JavaScript query:

const queryData = await collection.query({
    queryEmbeddings: [[1, 2, 3, 4, 5]],
    queryTexts: ["test"],
});

Sources: clients/python/README.md and clients/new-js/packages/chromadb/README.md

Embedding Functions

Chroma supports various embedding providers through configurable embedding functions.

Configuration Schema

Embedding functions use JSON Schema validation to ensure cross-language compatibility:

from chromadb.utils.embedding_functions.schemas import validate_config

config = {
    "api_key_env_var": "CHROMA_OPENAI_API_KEY",
    "model_name": "text-embedding-ada-002"
}
validate_config(config, "openai")

Each schema follows JSON Schema Draft-07 specification and includes version, title, description, properties, required fields, and additionalProperties settings.

Sources: chromadb/utils/embedding_functions/schemas/README.md
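The core of that validation can be sketched with a tiny stand-in (the schema shape below is assumed for illustration; the real implementation uses a full JSON Schema Draft-07 validator):

```python
def validate_config(config: dict, schema: dict) -> None:
    """Minimal sketch of config validation: check required keys and reject
    unknown keys when additionalProperties is false."""
    missing = [k for k in schema.get("required", []) if k not in config]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if not schema.get("additionalProperties", True):
        unknown = [k for k in config if k not in schema.get("properties", {})]
        if unknown:
            raise ValueError(f"unknown fields: {unknown}")

openai_schema = {  # hypothetical schema, shaped like those described above
    "required": ["api_key_env_var", "model_name"],
    "properties": {"api_key_env_var": {}, "model_name": {}},
    "additionalProperties": False,
}
validate_config({"api_key_env_var": "CHROMA_OPENAI_API_KEY",
                 "model_name": "text-embedding-ada-002"}, openai_schema)
```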

Available Embedding Providers

| Provider | Package | API Key Environment Variable |
|---|---|---|
| OpenAI | @chroma-core/openai | CHROMA_OPENAI_API_KEY |
| Cohere | @chroma-core/cohere | COHERE_API_KEY |
| Jina | @chroma-core/jina | JINA_API_KEY |
| Google Gemini | @chroma-core/google-gemini | GOOGLE_API_KEY |
| Hugging Face | @chroma-core/hugging-face | HF_API_KEY |
| Ollama | @chroma-core/ollama | OLLAMA_API_KEY |
| Together AI | @chroma-core/together-ai | TOGETHER_API_KEY |
| Voyage AI | @chroma-core/voyageai | VOYAGE_API_KEY |
| xAI | @chroma-core/xai | XAI_API_KEY |

Sources: clients/new-js/packages/ai-embeddings/all/README.md

Using Embedding Functions

import { ChromaClient } from 'chromadb';
import { JinaEmbeddingFunction } from '@chroma-core/jina';

const embedder = new JinaEmbeddingFunction({
    apiKey: 'your-api-key',
    modelName: 'jina-embeddings-v2-base-en',
    task: 'retrieval.passage',
    dimensions: 768,
    lateChunking: false,
    truncate: true,
    normalized: true,
    embeddingType: 'float'
});

const collection = await client.createCollection({
    name: 'my-collection',
    embeddingFunction: embedder,
});

Sources: clients/new-js/packages/ai-embeddings/jina/README.md

Common Utilities

The @chroma-core/ai-embeddings-common package provides shared utilities:

import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';

// Convert camelCase to snake_case
const snakeCaseConfig = snakeCase({ modelName: 'text-embedding-3-small' });
// Result: { model_name: 'text-embedding-3-small' }

// Check environment
if (isBrowser()) {
    // Browser-specific logic
}

Sources: clients/new-js/packages/ai-embeddings/common/README.md
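The camelCase-to-snake_case conversion shown above is easy to mirror in Python (an illustrative re-implementation, not part of Chroma's packages):

```python
import re

def snake_case_keys(obj: dict) -> dict:
    """Convert camelCase keys to snake_case, equivalent in spirit to the
    snakeCase helper described above."""
    return {re.sub(r"(?<!^)(?=[A-Z])", "_", k).lower(): v for k, v in obj.items()}

snake_case_keys({"modelName": "text-embedding-3-small"})
# → {"model_name": "text-embedding-3-small"}
```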

JavaScript Client Packages

chromadb vs chromadb-client

| Feature | chromadb | chromadb-client |
|---|---|---|
| Package size | Larger | Smaller |
| Dependencies | Bundled | Optional peer dependencies |
| Use case | Quick setup | Production with specific providers |

The chromadb-client package is ideal for production environments where you only use specific embedding providers.

Sources: clients/js/packages/chromadb-client/README.md

Chroma Cloud

Chroma Cloud provides a hosted service for serverless vector, hybrid, and full-text search. To use Chroma Cloud:

  1. Sign up at trychroma.com
  2. Create a database
  3. Get your API key from the dashboard

Configure environment variables for cloud access:

export CHROMA_API_KEY=your-api-key
export CHROMA_TENANT=your-tenant
export CHROMA_DATABASE=your-database

Sources: README.md and rust/chroma/README.md

Environment Variables

| Variable | Description |
|---|---|
| CHROMA_API_KEY | API key for Chroma Cloud authentication |
| CHROMA_TENANT | Sets the tenant (auto-inferred with API key) |
| CHROMA_DATABASE | Sets the database (auto-inferred with scoped API key) |
| [PROVIDER]_API_KEY | Provider-specific API keys (e.g., OPENAI_API_KEY) |

The Rust client can read this configuration directly from the environment:

let client = ChromaHttpClient::from_env()?;

Sources: rust/chroma/README.md
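The same from-environment pattern, sketched in Python (the variable names come from the table above; the config dict shape and helper are illustrative, not a Chroma API):

```python
import os

def client_config_from_env() -> dict:
    """Collect Chroma Cloud connection settings from the environment."""
    cfg = {"api_key": os.environ.get("CHROMA_API_KEY")}
    if cfg["api_key"] is None:
        raise RuntimeError("CHROMA_API_KEY is not set")
    for var, key in (("CHROMA_TENANT", "tenant"), ("CHROMA_DATABASE", "database")):
        if var in os.environ:  # optional: inferred from a scoped API key otherwise
            cfg[key] = os.environ[var]
    return cfg

os.environ["CHROMA_API_KEY"] = "demo-key"  # for illustration only
client_config_from_env()
```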

Complete Example Workflow

graph TD
    A[Install Chroma Client] --> B[Initialize Client]
    B --> C[Create Collection]
    C --> D[Add Documents with Embeddings]
    D --> E[Query Collection]
    E --> F[Get Results]
    
    G[Configure Embedding Function] --> D
    H[Add Metadata] --> D
    I[Set API Keys] --> B

Quick Reference Commands

Installation

# Python
pip install chromadb

# JavaScript
npm install chromadb

# Start server
chroma run --path /chroma_db_path

Basic Operations

| Operation | Python | JavaScript |
|---|---|---|
| Create client | client = chromadb.HttpClient() | new ChromaClient() |
| Create collection | client.create_collection(name) | client.createCollection({name}) |
| Add documents | collection.add(...) | collection.add(...) |
| Query | collection.query(...) | collection.query(...) |

Additional Resources

Sources: [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)

System Architecture Overview

Related topics: Rust Backend Services Architecture, Go Coordinator & Distributed Systems, Protocol Buffers & gRPC API

Introduction

Chroma is an open-source data infrastructure platform designed for AI applications, providing vector, hybrid, and full-text search capabilities. The system is built as a distributed, scalable architecture that handles embedding storage, indexing, and query execution across multiple components. Chroma positions itself as the open-source alternative to hosted vector database services, enabling developers to deploy sophisticated AI search infrastructure while maintaining full control over their data.

The architecture follows a modular design pattern with distinct components for API serving, query processing, data storage, and system coordination. Each component is responsible for specific aspects of the data pipeline, from ingestion through indexing to query execution.

High-Level Architecture

Chroma's architecture consists of three primary layers working in concert to provide vector search capabilities:

  1. Frontend Layer - Handles API requests and response formatting
  2. Worker Layer - Executes query operations and manages indexing
  3. System Database (SysDB) Layer - Maintains metadata and system state
graph TD
    A[Client Application] --> B[Frontend Server]
    B --> C[Worker Servers]
    C --> D[SysDB]
    C --> E[Blockstore]
    E --> F[Arrow Files]
    D --> G[Collection Metadata]
    G --> H[Topology Information]

Component Architecture

Frontend Server

The frontend server component serves as the API gateway for Chroma, handling incoming HTTP/gRPC requests and translating them into internal operations. The frontend is responsible for request validation, authentication handling, and response serialization.

Key Responsibilities:

| Responsibility | Description |
|---|---|
| API Endpoint Handling | Exposes REST and gRPC endpoints for collection operations |
| Request Validation | Validates incoming query parameters and payload structures |
| Response Serialization | Converts internal data structures to API response formats |
| Error Mapping | Translates internal errors to appropriate HTTP status codes |

Sources: rust/frontend/src/server.rs:1-50

The frontend server implements the ChromaError trait for consistent error handling across the system. Error codes are mapped as follows:

| Internal Error | HTTP Status Code |
|---|---|
| InvalidArgument | 400 Bad Request |
| NotFound | 404 Not Found |
| Internal | 500 Internal Server Error |
| Unavailable | 503 Service Unavailable |
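This mapping can be expressed as a simple lookup (a sketch, not the frontend's actual code; the 500 fallback for unmapped errors is an assumption):

```python
from http import HTTPStatus

# Internal error code → HTTP status, following the error-mapping table above.
ERROR_TO_HTTP = {
    "InvalidArgument": HTTPStatus.BAD_REQUEST,        # 400
    "NotFound": HTTPStatus.NOT_FOUND,                 # 404
    "Internal": HTTPStatus.INTERNAL_SERVER_ERROR,     # 500
    "Unavailable": HTTPStatus.SERVICE_UNAVAILABLE,    # 503
}

def to_http_status(code: str) -> int:
    """Fall back to 500 for unmapped internal errors (assumed behavior)."""
    return int(ERROR_TO_HTTP.get(code, HTTPStatus.INTERNAL_SERVER_ERROR))

to_http_status("NotFound")  # → 404
```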

Worker Server

The worker server handles the core data operations including embedding storage, indexing, and query execution. Workers are the primary compute units in Chroma's architecture, responsible for processing search requests and maintaining index structures.

Sources: rust/worker/src/server.rs:1-60

Worker Components:

graph LR
    A[Query Request] --> B[Query Planner]
    B --> C[HNSW Index]
    B --> D[Spann Index]
    B --> E[Record Segment]
    B --> F[Metadata Segment]
    C --> G[Result Merger]
    D --> G
    E --> G
    F --> G
    G --> H[Response]

The worker server implements orchestration components for managing complex operations:

  • ApplyLogsOrchestrator - Coordinates log application and compaction
  • WorkQueueClient - Manages distributed task execution
  • Segment Writers - Handles data persistence for different segment types

Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-80

System Database (SysDB)

The SysDB component maintains all metadata about collections, segments, and system topology. It provides a centralized view of the system's state and enables coordination across multiple workers.

SysDB Responsibilities:

| Function | Description |
|---|---|
| Collection Metadata | Stores collection configurations and schemas |
| Segment Registry | Tracks active segments and their locations |
| Topology Management | Manages provider-region mappings for distributed deployments |
| Transaction Coordination | Ensures consistency across distributed operations |

Sources: rust/sysdb/src/sysdb.rs:1-100

The SysDB uses a provider-region topology model that supports multi-cloud and multi-region deployments:

pub struct ProviderRegion<T> {
    name: RegionName,
    provider: String,      // e.g., "aws", "gcp"
    region: String,        // e.g., "us-east-1"
    config: T,             // Provider-specific configuration
}

Sources: rust/types/src/topology.rs:1-60

Data Model Architecture

Collection Schema

Collections in Chroma follow a flexible schema model that supports multiple index types and data fields.

graph TD
    A[Collection] --> B[Record Segment]
    A --> C[Metadata Segment]
    A --> D[Vector Index]
    A --> E[Sparse Vector Index]
    D --> F[HNSW Index]
    D --> G[Spann Index]

Supported Index Types:

| Index Type | Purpose | Key Configuration |
|---|---|---|
| Vector Index | Dense embeddings | Space (Cosine, L2, Dot), HNSW params |
| Sparse Vector Index | BM25-style inverted index | StringInvertedIndexConfig |
| Spann Index | Memory-efficient approximate search | InternalSpannConfiguration |
Sources: rust/types/src/collection_schema.rs:1-150

API Types

The API layer defines core types for query operations:

| Type | Purpose |
|---|---|
| Include | Specifies which fields to return (distances, documents, embeddings, metadatas, uris) |
| IncludeList | Collection of Include values with convenience constructors |
| WhereDocumentOperator | Document filtering (Contains, NotContains, Regex, NotRegex) |

Sources: rust/types/src/api_types.rs:1-100

pub enum Include {
    Distance,
    Document,
    Embedding,
    Metadata,
    Uri,
}

impl IncludeList {
    pub fn default_query() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance])
    }
    pub fn all() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance, Include::Embedding, Include::Uri])
    }
}

Metadata Filtering

Chroma supports rich metadata filtering through the MetadataExpression and MetadataComparison types:

graph TD
    A[MetadataExpression] --> B[key: String]
    A --> C[comparison: MetadataComparison]
    C --> D[Primitive: Operator + Value]
    C --> E[Set: Operator + SetValue]

Sources: rust/types/src/metadata.rs:1-80

Blockstore Architecture

The blockstore provides persistent storage for indexed data using Apache Arrow format for efficient serialization and querying.

Arrow Block Structure

graph LR
    A[Write Operation] --> B[Block Delta]
    B --> C[Commit to Block]
    C --> D[Arrow IPC Format]
    D --> E[Disk Storage]
    E --> F[BlockfileReader]

Block Types:

| Block Type | Description |
|---|---|
| OrderedBlockDelta | Sequential writes with ordering guarantees |
| UnorderedBlockDelta | High-throughput writes without ordering |
| DirectoryBlock | Sparse posting directory entries |

Sources: rust/blockstore/src/arrow/block/types.rs:1-100

The Arrow layout verification ensures data integrity:

pub enum ArrowLayoutVerificationError {
    BufferLengthNotAligned,
    NoRecordBatches,
    MultipleRecordBatches,
    InvalidMessageType,
    RecordBatchDecodeError,
}

Sparse Posting Blocks

Sparse vectors use a specialized block format for efficient storage:

body = [ max_offset: u32 LE, max_weight: f32 LE ] × num_entries

The DirectoryBlock stores per-posting-block metadata for term pruning:

  • max_offset: Largest document offset in the posting block
  • max_weight: Largest weight in the posting block

Sources: rust/types/src/sparse_posting_block.rs:1-60
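The entry layout above (little-endian u32 offset followed by f32 weight) can be packed and unpacked with Python's struct module; this is a sketch of reading and writing that byte layout, not Chroma's code:

```python
import struct

# Each directory entry: max_offset (u32, little-endian) then max_weight (f32, little-endian).
ENTRY = struct.Struct("<If")

def pack_entries(entries: list[tuple[int, float]]) -> bytes:
    """Serialize (max_offset, max_weight) pairs into the body layout above."""
    return b"".join(ENTRY.pack(off, w) for off, w in entries)

def unpack_entries(body: bytes) -> list[tuple[int, float]]:
    """Recover the (max_offset, max_weight) pairs from a packed body."""
    return [ENTRY.unpack_from(body, i) for i in range(0, len(body), ENTRY.size)]

body = pack_entries([(1024, 0.5), (4096, 0.25)])
assert ENTRY.size == 8  # 4 bytes u32 + 4 bytes f32 per entry
unpack_entries(body)  # → [(1024, 0.5), (4096, 0.25)]
```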

Spann Index Architecture

Spann is Chroma's memory-efficient approximate nearest neighbor index that combines HNSW with posting lists.

graph TD
    A[SpannIndexWriter] --> B[HNSW Index]
    A --> C[Posting Lists]
    A --> D[Versions Map]
    A --> E[MaxHeadID Blockfile]
    B --> F[Reader with adaptive search]

SpannIndexReader Structure:

| Component | Type | Purpose |
|---|---|---|
| posting_lists | BlockfileReader<u32, SpannPostingList> | Term postings |
| hnsw_index | HnswIndexRef | Graph-based search |
| versions_map | BlockfileReader<u32, u32> | Version tracking |
| dimensionality | usize | Vector dimension |
| adaptive_search_nprobe | bool | Adaptive parameter |

Sources: rust/index/src/spann/types.rs:1-80

Indexing Pipeline

The indexing pipeline handles document ingestion through the following stages:

graph LR
    A[Add Records] --> B[ApplyLogsOrchestrator]
    B --> C[Record Segment Writer]
    B --> D[Metadata Segment Writer]
    B --> E[Vector Index Writer]
    C --> F[Flush to Blockstore]
    D --> F
    E --> F
    F --> G[Collection Update]

Error Handling:

The orchestrator implements comprehensive error tracking:

| Error Type | Error Code | Tracing |
|---|---|---|
| ApplyLog | Internal | Yes |
| Channel | Internal | Yes |
| Commit | Internal | Yes |
| HnswSegment | Internal | Yes |
| MetadataSegment | Internal | Yes |
| Seal | Internal | Yes |
| InvariantViolation | - | Always |

Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-100

Query Execution Flow

Query Request Processing

graph TD
    A[Query Request] --> B[Parse Query]
    B --> C[Load Segments]
    C --> D[Parallel Segment Queries]
    D --> E{HNSW Search}
    D --> F{Spann Search}
    D --> G{Record Scan}
    E --> H[Merge Results]
    F --> H
    G --> H
    H --> I[Apply Filters]
    I --> J[Return Results]

Work Queue Integration

Distributed query execution uses a work queue system for task coordination:

graph TD
    A[Coordinator] --> B[WorkQueueClient]
    B --> C[gRPC Channel]
    C --> D[Worker Pool]
    D --> E[Task Execution]
    E --> F[Result Collection]

Error Code Mapping:

| gRPC Code | Chroma Error Code |
|---|---|
| Unavailable | Unavailable |
| DeadlineExceeded | DeadlineExceeded |
| ResourceExhausted | ResourceExhausted |
| NotFound | NotFound |
| InvalidArgument | InvalidArgument |

Sources: rust/worker/src/work_queue/work_queue_client.rs:1-80

Deployment Topology

Chroma supports flexible deployment configurations through its topology model:

graph TD
    A[Topology] --> B[TopologyName]
    A --> C[Vec<RegionName>]
    A --> D[Config T]
    C --> E[ProviderRegion]
    E --> F[Provider]
    E --> G[Region]

The topology system enables:

  • Multi-cloud deployments (AWS, GCP, Azure)
  • Region-specific configurations
  • Custom provider extensions

Summary

Chroma's architecture provides a scalable foundation for AI-powered search with several key design principles:

  1. Separation of Concerns - Frontend, worker, and SysDB components handle distinct responsibilities
  2. Arrow-Based Storage - Efficient columnar storage for analytical queries
  3. Flexible Indexing - Support for HNSW, Spann, and sparse vector indexes
  4. Distributed Coordination - Work queues and topology management for multi-node deployments
  5. Comprehensive Error Handling - Consistent error codes and tracing across all components

The modular architecture allows Chroma to scale from single-node development deployments to distributed production clusters serving AI applications at scale.

Sources: rust/frontend/src/server.rs:1-50

Protocol Buffers & gRPC API

Related topics: System Architecture Overview, Rust Backend Services Architecture

Chroma uses Protocol Buffers (protobuf) as the core serialization format for inter-service communication and data persistence. The IDL (Interface Definition Language) files in the idl/ directory define the service APIs, data structures, and message types that power Chroma's distributed architecture.

Architecture Overview

Chroma employs a client-server architecture where Protocol Buffers serve as the contract between components. The protobuf definitions are centralized in the idl/ directory and used to generate code for multiple language runtimes including Python, JavaScript, Go, and Rust.

graph TD
    subgraph "Client Layer"
        JS[JavaScript Client]
        PY[Python Client]
        GO[Go Client]
    end
    
    subgraph "IDL Definitions"
        PROTO[Protocol Buffer Definitions]
    end
    
    subgraph "Server Layer"
        API[API Server]
        COORD[Coordinator Service]
        QUERY[Query Executor]
    end
    
    JS -->|Generated TS Bindings| PROTO
    PY -->|Generated Python Stub| PROTO
    GO -->|Generated Go Code| PROTO
    API -->|gRPC/prost| PROTO
    COORD -->|gRPC/prost| PROTO
    QUERY -->|gRPC/prost| PROTO

Proto Definitions Structure

Core Service Definitions

The main protobuf definitions are organized in idl/chromadb/proto/:

| Proto File | Purpose | Key Messages |
|---|---|---|
| chroma.proto | Core data types and collection operations | Collection, Database, OperationRecord |
| coordinator.proto | Coordinator service for cluster management | Tenant, Database, Segment operations |
| query_executor.proto | Query execution service interface | Query requests and responses |

Data Type Coverage

The protobuf definitions cover all core data types used throughout Chroma:

| Data Type | Usage |
|---|---|
| Vector | Embedding vectors with scalar encoding |
| OperationRecord | CRUD operations for records |
| LogRecord | Write-ahead log entries with offsets |
| Metadata | Key-value metadata for filtering |
| Collection | Collection configuration and schema |
| Cmek | Customer-managed encryption keys |

Rust Type Conversions

Chroma's Rust backend uses protobuf-generated types and converts them to idiomatic Rust types through TryFrom implementations. This pattern ensures type safety and clean separation between the wire format and internal representations.

Record Conversions

The rust/types/src/record.rs file contains conversion logic between protobuf and Rust types:

graph LR
    A[chroma_proto::LogRecord] -->|TryFrom| B[LogRecord Rust]
    A2[chroma_proto::Vector] -->|TryFrom| B2[(Vec<f32>, ScalarEncoding)]

OperationRecord Conversion (Sources: rust/types/src/record.rs:recordinfo)

The OperationRecord conversion extracts metadata and document fields from protobuf representations:

// Metadata is extracted from proto, with document potentially in metadata
let (metadata, document) = match operation_record_proto.metadata {
    Some(proto_metadata) => match UpdateMetadata::try_from(proto_metadata) {
        Ok(mut metadata) => {
            let document = metadata.remove(CHROMA_DOCUMENT_KEY);
            match document {
                Some(UpdateMetadataValue::Str(document)) => {
                    (Some(metadata), Some(document))
                }
                _ => (Some(metadata), None),
            }
        }
        Err(e) => return Err(RecordConversionError::...),
    },
    None => (None, None),
};
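The same extract-and-split step, sketched in Python (the reserved key's value is an assumption for illustration; the Rust code above is the authoritative version):

```python
CHROMA_DOCUMENT_KEY = "chroma:document"  # assumed key name, for illustration

def split_document(proto_metadata):
    """Mirror the conversion above: pull the document out of the incoming
    metadata map and keep the remainder as metadata."""
    if proto_metadata is None:
        return None, None
    metadata = dict(proto_metadata)
    document = metadata.pop(CHROMA_DOCUMENT_KEY, None)
    if not isinstance(document, str):
        document = None  # non-string document values are dropped
    return metadata, document

split_document({"chroma:document": "hello", "source": "notion"})
# → ({"source": "notion"}, "hello")
```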

Vector Type Conversions

Vectors are stored with their encoding information (Sources: rust/types/src/record.rs:vector)

impl TryFrom<chroma_proto::Vector> for (Vec<f32>, ScalarEncoding) {
    type Error = VectorConversionError;
    // Conversion implementation
}

Metadata Filtering Types

The metadata system supports rich filtering expressions defined in protobuf and converted to Rust types (Sources: rust/types/src/metadata.rs:metadata-types)

Document Operators

graph TD
    DOC_OPS[WhereDocumentOperator] --> Contains
    DOC_OPS --> NotContains
    DOC_OPS --> Regex
    DOC_OPS --> NotRegex

Operator | Description
Contains | Document contains substring
NotContains | Document does not contain substring
Regex | Document matches regex pattern
NotRegex | Document does not match regex pattern
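
These four operators are straightforward to model; the sketch below evaluates them with Python's re module (an illustration of the semantics, not Chroma's implementation):

```python
import re

def matches_document(document, operator, operand):
    """Evaluate one of the four document operators against a document string."""
    if operator == "Contains":
        return operand in document
    if operator == "NotContains":
        return operand not in document
    if operator == "Regex":
        return re.search(operand, document) is not None
    if operator == "NotRegex":
        return re.search(operand, document) is None
    raise ValueError(f"unknown operator: {operator}")

print(matches_document("hello world", "Regex", r"wor\w+"))  # True
```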

Metadata Expression Structure

pub struct MetadataExpression {
    pub key: String,
    pub comparison: MetadataComparison,
}

Metadata comparisons support both primitive types (strings, integers, floats, booleans) and set operations.

Collection Schema Definitions

Schema definitions in rust/types/src/collection_schema.rs define how collections are configured for indexing (Sources: rust/types/src/collection_schema.rs:schema-struct)

Schema Builder Pattern

The Schema struct provides a fluent builder API for index configuration:

graph TD
    SCHEMA[Schema::default] --> CREATE_INDEX[.create_index]
    CREATE_INDEX --> VALIDATE[Validate Index Config]
    VALIDATE -->|Valid| RETURN[Return Self]
    VALIDATE -->|Invalid| ERROR[SchemaBuilderError]

Index Creation Example (Sources: rust/types/src/collection_schema.rs:create-index-example)

let schema = Schema::default()
    .create_index(None, VectorIndexConfig {
        space: Some(Space::Cosine),
        embedding_function: None,
        source_key: None,
        hnsw: None,
        spann: None,
    }.into())?
    .create_index(Some("category"), StringInvertedIndexConfig {}.into())?;

Supported Index Types

Index Type | Configuration | Applies To
VectorIndexConfig | HNSW, Space (Cosine/L2/IP), embedding function | #embedding key only
StringInvertedIndexConfig | String indexing | Custom string keys
FtsIndexConfig | Full-text search | Document key

CMEK (Customer-Managed Encryption Keys)

Chroma supports customer-managed encryption keys through the Cmek type defined in protobuf (Sources: rust/types/src/collection_schema.rs:cmek)

CMEK Provider Configuration

Provider | Validation Pattern | Resource Format
GCP | CMEK_GCP_RE regex | GCP resource identifier

impl Cmek {
    pub fn gcp(resource: String) -> Self;
    pub fn validate_pattern(&self) -> bool;
}

Topology and Region Management

For multi-region deployments, Chroma uses topology definitions (Sources: rust/types/src/topology.rs:topology)

Provider Region Structure

classDiagram
    class ProviderRegion {
        +name: RegionName
        +provider: String
        +region: String
        +config: T
    }
    
    class Topology {
        +name: TopologyName
        +regions: Vec~RegionName~
        +config: T
    }

Component | Description
ProviderRegion | Single cloud provider region configuration
Topology | Collection of regions forming a deployment topology

Code Generation Pipeline

Build Process

Protobuf definitions are compiled to target languages using protoc and language-specific plugins (Sources: go/README.md:protobuf-setup)

graph LR
    A[.proto files] --> B[protoc compiler]
    B -->|Python| C[Python stubs]
    B -->|Go| D[Go gRPC code]
    B -->|JS/TS| E[TypeScript definitions]
    B -->|Rust| F[Rust + prost]

Required Tools

Tool | Purpose
protoc | Protocol Buffer compiler
protoc-gen-go | Go code generation
protoc-gen-go-grpc | Go gRPC service generation

Generated API Patterns

The generated TypeScript API in clients/js/packages/chromadb-core/src/generated/api.ts follows standard gRPC-web patterns (Sources: clients/js/packages/chromadb-core/src/generated/api.ts:fetch-pattern)

const localVarFetchArgs = ApiApiFetchParamCreator(configuration).version(options);
return (fetch: FetchAPI = defaultFetch, basePath: string = BASE_PATH) => {
    return fetch(
        basePath + localVarFetchArgs.url,
        localVarFetchArgs.options,
    ).then((response) => {
        // Handle response by content type and status
        if (response.status === 200) {
            if (mimeType === "application/json") {
                return response.json();
            }
        }
        // Error handling for 401, 404, 409, 500
    });
};

Error Code Mapping

Error types are mapped from Rust/Arrow errors to Chroma error codes (Sources: rust/blockstore/src/arrow/root.rs:error-mapping)

Arrow Error Type | Chroma Error Code
IOError | Internal
ArrowError | Internal
LayoutVerificationError | Internal
FromBytesError variants | InvalidArgument / Internal

Message Format Details

Arrow Block Serialization

Binary data in protobuf messages uses Arrow IPC format for efficient columnar storage (Sources: rust/blockstore/src/arrow/root.rs:arrow-reader)

let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
let record_batch = match arrow_reader {
    Ok(mut reader) => match reader.next() {
        Some(Ok(batch)) => batch,
        Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
        None => return Err(FromBytesError::NoDataError),
    },
    Err(e) => return Err(FromBytesError::ArrowError(e)),
};

The Arrow footer format requires:

  • ARROW_MAGIC header (6 bytes)
  • Footer content
  • Footer length (4 bytes)
  • Footer checksum
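
The framing can be checked with a few lines of Python. This sketch validates only the standard Arrow IPC trailer (magic bytes and the footer-length field) on a synthetic buffer; Chroma's on-disk root format may differ in details such as the checksum noted above:

```python
import struct

ARROW_MAGIC = b"ARROW1"  # the 6-byte magic that opens and closes an Arrow IPC file

def read_footer_length(buf):
    """Check leading/trailing magic, then read the 4-byte little-endian footer
    length that sits immediately before the trailing magic (layout check only)."""
    if not buf.startswith(ARROW_MAGIC):
        raise ValueError("missing leading magic")
    if not buf.endswith(ARROW_MAGIC):
        raise ValueError("missing trailing magic")
    (footer_len,) = struct.unpack_from("<i", buf, len(buf) - len(ARROW_MAGIC) - 4)
    return footer_len

# Synthetic buffer illustrating the layout; not a real Arrow file
footer = b"FOOTERDATA"
fake = (ARROW_MAGIC + b"\x00\x00" + b"...body..." + footer
        + struct.pack("<i", len(footer)) + ARROW_MAGIC)
print(read_footer_length(fake))  # 10
```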

Source: https://github.com/chroma-core/chroma / Human Manual

Python Client SDK

Related topics: Getting Started with Chroma, JavaScript/TypeScript Client SDKs, Embedding Functions Integration

The Chroma Python Client SDK is the official Python library for interacting with Chroma, an open-source vector database designed for AI applications. This SDK provides a complete interface for managing collections, storing embeddings, and performing similarity searches across vector data.

Overview

Chroma positions itself as the open-source data infrastructure for AI, offering developers a streamlined way to incorporate vector search capabilities into their applications. The Python Client SDK serves as the primary client library for Python developers, enabling seamless integration with Chroma's vector database capabilities.

The SDK supports two primary modes of operation: embedded mode, where the database runs locally within the same process, and client-server mode, where the Python client communicates with a remote Chroma server via HTTP. This flexibility allows developers to choose the deployment architecture that best fits their application requirements, whether they need a lightweight local setup for development and testing or a scalable server-based deployment for production environments.

For Python-specific installations, developers can choose between the full chromadb package, which includes all embedding libraries as dependencies, or the chromadb-client package, which is a lightweight HTTP-only client that connects to a running Chroma server. The installation is straightforward via pip, making it accessible for projects of all sizes.

The SDK is designed with developer productivity in mind, providing intuitive APIs for common operations like adding documents, querying collections, and managing metadata. It handles the complexity of embedding generation and vector storage behind a clean, Pythonic interface, allowing developers to focus on building their AI applications rather than managing low-level database operations.

Architecture

The Python Client SDK follows a layered architecture that separates concerns between the client interface, API communication, and data models. Understanding this architecture helps developers effectively use the SDK and troubleshoot any issues that may arise during development.

graph TD
    A[Application Code] --> B[ChromaClient / AsyncChromaClient]
    B --> C[Collection API]
    B --> D[Embedding Functions]
    C --> E[REST API Layer]
    D --> F[External Embedding Providers]
    E --> G[Chroma Server]
    E --> H[Embedded Mode]
    G --> I[Persistent Storage]
    H --> I

Client Layer

The client layer forms the entry point for all SDK operations. Chroma provides two client implementations: the synchronous Client class for traditional Python applications and the AsyncClient class for asynchronous applications built with async/await patterns.

The synchronous client is suitable for most use cases, providing blocking API calls that execute immediately and return results. This approach is familiar to developers coming from traditional Python backgrounds and works well in scripts, batch processing jobs, and web applications that don't require high concurrency.

The asynchronous client, on the other hand, is designed for applications that need to handle many concurrent operations efficiently, such as web servers built on frameworks like FastAPI or Starlette. By using Python's asyncio library, the async client can perform multiple network operations concurrently, improving throughput in I/O-bound scenarios.

Both clients share a similar interface, with the async client simply wrapping the underlying HTTP calls with async/await syntax. This consistency makes it easy to switch between synchronous and asynchronous code as requirements evolve.
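
The concurrency benefit can be sketched with plain asyncio; `fake_query` below is a stand-in for an awaitable client call such as `collection.query`, not the real SDK:

```python
import asyncio

# Stand-in coroutine simulating an awaitable SDK call with network latency.
async def fake_query(q):
    await asyncio.sleep(0.01)  # simulated I/O wait
    return f"results for {q}"

async def main():
    # With an async client, independent queries can overlap their I/O
    # instead of executing one after another.
    return await asyncio.gather(*(fake_query(q) for q in ["a", "b", "c"]))

print(asyncio.run(main()))  # ['results for a', 'results for b', 'results for c']
```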

Collection Management

Collections serve as the primary organizational unit in Chroma, analogous to tables in traditional relational databases or buckets in object storage. Each collection contains a set of vectors along with their associated metadata, documents, and unique identifiers.

The SDK provides a comprehensive collection API that supports creating new collections, retrieving existing ones, listing all collections in the database, and deleting collections when they're no longer needed. Collections can be configured with specific settings at creation time, including the embedding function to use for auto-embedding documents and the name of the collection for identification purposes.

Collections maintain a schema-like structure through their use of metadata. While Chroma is schemaless in the traditional sense, the metadata associated with vectors allows developers to impose structure on their data for filtering and organization purposes.

Data Model

The data model in Chroma revolves around four core concepts: vectors, documents, metadata, and IDs. Each record in a collection consists of these four components, providing a flexible yet structured way to store and retrieve information.

Vectors are the mathematical representations of data in embedding space. They can be provided directly by the application or generated automatically using embedding functions. The SDK accepts vectors as lists of floating-point numbers, making it compatible with output from virtually any embedding model.

Documents are the original text or content that was transformed into vectors. Storing documents alongside their vectors enables applications to retrieve the original content during query operations without needing to maintain a separate document store.

Metadata provides contextual information about each record. Examples include the source of the document, timestamps, user IDs, or any other application-specific attributes. Metadata can be used for filtering during queries, allowing applications to narrow search results based on specific criteria.

IDs uniquely identify each record within a collection. The SDK accepts string identifiers, giving applications flexibility in how they choose to name and reference their data. Common patterns include using UUIDs, meaningful string identifiers derived from the document content, or sequential numbers.
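
The four components, and how distance ranking works over them, can be illustrated with a tiny in-memory sketch (plain Python for intuition, not the SDK):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Record:
    """One logical record: ID, vector, original document, and metadata."""
    id: str
    embedding: list
    document: str
    metadata: dict = field(default_factory=dict)

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

records = [
    Record("doc-1", [1.0, 0.0], "first document", {"source": "notion"}),
    Record("doc-2", [0.0, 1.0], "second document", {"source": "google-docs"}),
]

query = [1.0, 0.1]
ranked = sorted(records, key=lambda r: cosine_distance(query, r.embedding))
print(ranked[0].id)  # doc-1 points in nearly the same direction as the query
```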

Installation and Setup

Installing the Chroma Python Client SDK is straightforward using pip, Python's package manager. The SDK is available in two variants to accommodate different use cases and deployment scenarios.

pip install chromadb

This command installs the full Chroma package, which includes all core functionality plus built-in support for various embedding providers. This variant is recommended for most users who want a complete, self-contained installation.

pip install chromadb-client

This command installs only the HTTP client library, which is useful for scenarios where the Chroma server runs separately or where a minimal dependency footprint is required. This variant connects to Chroma servers via HTTP and doesn't include embedding provider libraries.

Client Initialization

Initializing the Chroma client depends on the deployment mode and desired configuration. The SDK provides flexible initialization options to accommodate different environments.

Embedded Mode

In embedded mode, Chroma runs entirely within your Python process, storing data locally. This is ideal for development, testing, and small-scale deployments where a separate server isn't required.

import chromadb

client = chromadb.Client()                           # in-memory, ephemeral
client = chromadb.PersistentClient(path="./chroma")  # persisted to a local directory

The default Client() runs entirely in memory and discards its data when the process exits. For storage that survives process restarts, use PersistentClient, which creates and manages a local database directory automatically. This makes embedded mode suitable for applications that need persistent storage without the complexity of a separate server process.

Client-Server Mode

In client-server mode, your Python application connects to a Chroma server running separately, either locally or on a remote machine. This architecture supports larger-scale deployments and enables sharing data across multiple client applications.

import chromadb

client = chromadb.HttpClient(
    host="localhost",
    port=8000
)

The HTTP client communicates with the server using REST API calls, handling serialization, network transport, and error handling transparently. This mode requires a Chroma server to be running and accessible at the specified host and port.

Configuration Options

The client supports various configuration options to customize its behavior for specific use cases. These options can be provided during client initialization to control aspects like SSL/TLS settings, authentication, and connection pooling.

Option | Type | Default | Description
host | string | "localhost" | Server hostname or IP address
port | integer | 8000 | Server port number
ssl | boolean | false | Enable SSL/TLS encryption
headers | dict | None | Custom HTTP headers for requests
tenant | string | None | Tenant identifier for multi-tenant setups
database | string | None | Database name for organized data storage

Collection Operations

Collections are the central organizing structure in Chroma, grouping related vectors, documents, and metadata together. The SDK provides a comprehensive API for creating, managing, and interacting with collections.

Creating a Collection

Collections are created using the client's create_collection method, which accepts a name and optional configuration parameters.

collection = client.create_collection(
    name="my-documents",
    metadata={"description": "Document collection for RAG"},
    get_or_create=True
)

The get_or_create parameter is particularly useful in production applications, as it prevents errors if a collection with the same name already exists. When set to True, the method returns the existing collection if one exists or creates a new one if it doesn't.

Adding Data

Data is added to collections using the add method, which accepts vectors, documents, metadata, and unique identifiers. All parameters must be provided as lists of equal length, with each index representing a single record.

collection.add(
    documents=["This is the first document", "This is the second document"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc-1", "doc-2"],
    embeddings=[[1.2, 2.1, 3.5], [1.1, 2.0, 3.4]]
)

The SDK supports automatic embedding generation when embedding functions are configured for the collection. In this case, documents can be provided without explicit embeddings, and the SDK will generate the vector representations automatically.

Querying Data

Querying is performed using the query method, which accepts query text or query vectors and returns the most similar results based on vector similarity.

results = collection.query(
    query_texts=["search terms here"],
    n_results=2,
    where={"source": "notion"},
    include=["documents", "metadatas", "distances"]
)

The where parameter enables filtering results based on metadata conditions, allowing applications to narrow search results to specific subsets of data. The include parameter controls which data components are returned, helping optimize bandwidth and processing for applications that don't need all available information.

Query results include the matched document IDs, the documents themselves, associated metadata, and distance scores indicating how similar each result is to the query. Lower distance scores indicate higher similarity, with zero representing an exact match.

Updating and Deleting Data

The SDK supports updating existing records and deleting unwanted data from collections. These operations are essential for maintaining data accuracy and managing collection lifecycle.

collection.update(
    ids=["doc-1"],
    documents=["Updated document content"],
    metadatas=[{"source": "notion", "updated": True}]
)

collection.delete(
    ids=["doc-2"],
    where={"source": "google-docs"}
)

Update operations modify existing records identified by their IDs, replacing the specified fields while preserving unchanged data. Delete operations remove records matching the provided ID or metadata filters, with the ability to delete multiple records simultaneously.

Querying and Filtering

Chroma provides powerful querying and filtering capabilities that enable precise retrieval of relevant results. Understanding these capabilities is essential for building effective vector search applications.

The core query operation performs vector similarity search, finding the most similar records to a given query vector or text. The SDK handles text queries by first embedding them using the collection's configured embedding function.

Results are ranked by similarity, with the most similar results appearing first. The n_results parameter controls how many results are returned, allowing applications to balance result completeness with performance considerations.

Metadata Filtering

Metadata filtering narrows search results based on document attributes stored alongside vectors. This is particularly useful for applications that need to search within specific subsets of data, such as documents from a particular source or within a date range.

results = collection.query(
    query_texts=["search terms"],
    where={
        "source": "notion",
        "category": {"$in": ["technical", "documentation"]}
    }
)

The filter syntax supports various operators including equality, inequality, comparison operators for numeric ranges, and set membership tests. Complex filter expressions can be constructed using logical operators to combine multiple conditions.
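
A simplified sketch of how such a filter can be evaluated against a single record's metadata (client-side illustration covering a subset of operators; the server implements the full set):

```python
def matches(metadata, where):
    """Evaluate a Chroma-style where filter against one record's metadata."""
    for key, cond in where.items():
        if key == "$and":
            if not all(matches(metadata, clause) for clause in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, clause) for clause in cond):
                return False
        elif isinstance(cond, dict):  # operator form, e.g. {"$in": [...]}
            value = metadata.get(key)
            for op, operand in cond.items():
                if op == "$in" and value not in operand:
                    return False
                elif op == "$nin" and value in operand:
                    return False
                elif op == "$gt" and not (value is not None and value > operand):
                    return False
                elif op == "$lt" and not (value is not None and value < operand):
                    return False
        elif metadata.get(key) != cond:  # bare value means equality
            return False
    return True

meta = {"source": "notion", "category": "technical", "year": 2024}
print(matches(meta, {"source": "notion",
                     "category": {"$in": ["technical", "documentation"]}}))  # True
```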

Result Inclusion

The include parameter controls which data components are included in query results. This allows applications to optimize their queries by requesting only the data they need.

Include Option | Description
embeddings | Include the full vector for each result
documents | Include the original document text
metadatas | Include the associated metadata
distances | Include similarity distance scores

By default, documents, metadatas, and distances are included in results. Applications should request only the components they need to minimize bandwidth usage and processing overhead; embeddings in particular are large and are excluded unless explicitly requested.

Embedding Functions

Embedding functions transform text into vector representations that capture semantic meaning. Chroma supports multiple embedding providers, allowing applications to choose the approach that best fits their requirements.

Built-in Embeddings

For simple use cases, Chroma includes a default embedding function that works out of the box without additional configuration. This function is suitable for development and testing but may not provide the best quality embeddings for production applications.

External Providers

For production applications requiring higher quality embeddings, Chroma supports integration with external embedding services. These services provide state-of-the-art embedding models that can significantly improve search quality.

Supported providers include OpenAI's embedding models, which offer excellent quality for English text, and various open-source alternatives. Each provider has its own configuration requirements, typically involving API keys and model selection parameters.

Configuration is typically done at the collection level, allowing different collections to use different embedding functions if needed. This flexibility supports applications that work with multiple data types or require different embedding strategies for different use cases.

Custom Embedding Functions

For specialized use cases, applications can implement custom embedding functions by conforming to the SDK's embedding function interface. This allows integration with any embedding model or service that can be accessed from Python.

Custom functions receive a list of texts and return a corresponding list of vectors. They can implement any logic needed, including batching, caching, and error handling, giving applications full control over the embedding process.
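
A minimal custom embedding function might look like the following; the hashing "model" is a deliberately toy stand-in, and the callable shape (a list of texts in, a list of float vectors out) is the part that matters:

```python
import hashlib

class HashEmbeddingFunction:
    """Toy deterministic 'embedding' built from a SHA-256 digest.
    A real implementation would call a model or API with the same shape."""

    def __init__(self, dim=8):
        self.dim = dim

    def __call__(self, input):
        vectors = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dim` digest bytes into floats in [0, 1]
            vectors.append([b / 255.0 for b in digest[: self.dim]])
        return vectors

ef = HashEmbeddingFunction(dim=4)
vecs = ef(["hello", "world"])
print(len(vecs), len(vecs[0]))  # 2 4
```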

Error Handling

The SDK provides comprehensive error handling to help applications gracefully manage failure scenarios. Understanding the error types and how to handle them is important for building robust applications.

Connection Errors

Connection errors occur when the client cannot establish communication with the Chroma server. These errors can result from network issues, server unavailability, or incorrect server configuration.

try:
    collection = client.get_collection("my-collection")
except chromadb.errors.ChromaError:
    # The exact exception class varies by SDK version; ChromaError is the common base
    print("Unable to connect to Chroma server")

Applications should implement appropriate retry logic and user-facing error messages when connection errors occur, as these situations typically require intervention beyond the application's control.

Collection Not Found

Operations on non-existent collections raise specific errors that can be caught and handled appropriately.

try:
    collection = client.get_collection("non-existent")
except chromadb.errors.NotFoundError:
    print("Collection does not exist")

The get_or_create parameter available during collection creation provides an alternative to explicit error handling when the existence of a collection is uncertain.

Invalid Arguments

Invalid argument errors indicate problems with the data or parameters provided to SDK methods. These errors typically result from bugs in application code or invalid user input.

Examples include malformed IDs, vectors of incorrect dimensions, mismatched list lengths, and invalid filter expressions. The error messages provide guidance on what parameter is problematic, making debugging straightforward.
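
Many of these errors can be caught before the network call with a small pre-flight check. The sketch below mirrors the kinds of checks the SDK performs; it is not the SDK's actual validation code:

```python
def validate_batch(ids, documents=None, metadatas=None, embeddings=None):
    """Raise ValueError for the common invalid-argument cases: duplicate IDs
    and mismatched list lengths."""
    n = len(ids)
    if len(set(ids)) != n:
        raise ValueError("ids must be unique")
    for name, values in (("documents", documents),
                         ("metadatas", metadatas),
                         ("embeddings", embeddings)):
        if values is not None and len(values) != n:
            raise ValueError(f"{name} has {len(values)} entries but ids has {n}")

validate_batch(["doc-1", "doc-2"], documents=["a", "b"])  # passes silently
try:
    validate_batch(["doc-1", "doc-2"], documents=["only one"])
except ValueError as e:
    print(e)  # documents has 1 entries but ids has 2
```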

Best Practices

Following best practices ensures optimal performance, reliability, and maintainability when using the Python Client SDK in production applications.

Connection Management

Applications should create a single client instance and reuse it across the application rather than creating new clients for each operation. The client manages connection pooling and state internally, and creating multiple instances can lead to resource waste and inconsistent state.

client = chromadb.HttpClient(host="localhost", port=8000)

def get_collection():
    return client.get_collection("my-documents")

For applications that require clean-up, the client should be properly closed when the application terminates, ensuring any pending operations complete and resources are released.

Batch Operations

When adding or querying large numbers of records, batching operations improves performance by reducing network overhead and allowing the server to optimize processing. Chroma enforces a maximum batch size per request, so applications ingesting large datasets should split their data into appropriately sized chunks rather than issuing a single oversized call.
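
A simple chunking helper illustrates the pattern for splitting large inserts:

```python
def batched(items, batch_size):
    """Yield successive fixed-size slices, e.g. for calling collection.add in chunks."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

ids = [f"doc-{i}" for i in range(7)]
print([len(chunk) for chunk in batched(ids, 3)])  # [3, 3, 1]
```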

Error Recovery

Production applications should implement comprehensive error handling that distinguishes between recoverable errors (like temporary network issues) and non-recoverable errors (like invalid input). Recoverable errors can be handled with retry logic, while non-recoverable errors should surface appropriate feedback to users.
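
A retry wrapper for recoverable errors might be sketched as follows; the exception types and backoff policy are illustrative, not prescribed by the SDK:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01,
                 retryable=(ConnectionError, TimeoutError)):
    """Call fn, retrying transient errors with exponential backoff.
    Non-retryable errors propagate immediately."""
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return "ok"

print(with_retries(flaky))  # ok
```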

For further information on using Chroma's Python Client SDK, the following resources provide additional context and examples.

The official Chroma documentation at trychroma.com provides comprehensive guides on getting started, deployment options, and advanced usage patterns. The documentation includes tutorials, API reference material, and example applications that demonstrate real-world usage.

The GitHub repository at github.com/chroma-core/chroma contains the complete source code for Chroma, including the Python Client SDK. Developers interested in understanding implementation details or contributing to the project can explore the codebase directly.

The Chroma Discord community provides a forum for asking questions, sharing experiences, and connecting with other developers using Chroma. The community is an excellent resource for troubleshooting issues and discovering best practices from experienced users.


JavaScript/TypeScript Client SDKs

Related topics: Python Client SDK, Getting Started with Chroma

Chroma provides comprehensive JavaScript and TypeScript client libraries for interacting with Chroma servers from browser and Node.js environments. The SDKs offer both low-level HTTP API access and high-level abstractions for collections, embedding functions, and query operations.

Architecture Overview

Chroma maintains two generations of JavaScript clients to support different use cases and ecosystem requirements.

graph TD
    A[Chroma Server] <--> B[HTTP API];
    B <--> C[Legacy JS Client v2.4.7];
    B <--> D[new-js Client v3.4.5];
    C --> E[chromadb<br/>Bundled];
    C --> F[chromadb-client<br/>Peer Dependencies];
    D --> G[ChromaClient];
    D --> H[Embedding Functions<br/>via @chroma-core/*];

Client Package Versions

Package | Version | Type | Description
chromadb (legacy) | 2.4.7 | npm | Bundled package with all embedding libraries included
chromadb-client (legacy) | 2.4.7 | npm | Client package requiring peer dependencies
chromadb (new-js) | 3.4.5 | npm | Modern client with modular architecture
@internal/chromadb-core | 2.4.7 | workspace | Shared core functionality
Sources: clients/js/packages/chromadb/package.json:3 Sources: clients/new-js/packages/chromadb/package.json:3

Package Structure

Legacy Client (v2.x)

The legacy client provides two distribution options:

graph LR
    A[chromadb] --> B[chromadb-core<br/>+ All Embeddings];
    C[chromadb-client] --> D[chromadb-core<br/>+ Peer Dependencies];
    B --> E[@google/generative-ai];
    B --> F[@xenova/transformers];
    B --> G[cohere-ai];
    D --> E;
    D --> F;
    D --> G;

Package | Use Case | Embedding Libraries
chromadb | Simple projects wanting everything included | Bundled with all providers
chromadb-client | Projects needing specific embedding libraries | Peer dependencies required

Sources: clients/js/packages/chromadb-client/package.json:1-55

New-JS Client (v3.x)

The new JavaScript client uses a modular workspace architecture with the following structure:

clients/new-js/
├── packages/
│   ├── chromadb/                    # Core client package
│   │   └── src/
│   │       ├── chroma-client.ts     # Main client implementation
│   │       └── api/
│   │           └── sdk.gen.ts       # Generated API client
│   └── ai-embeddings/
│       ├── common/                  # Shared utilities
│       ├── all/                     # Aggregated providers
│       ├── chroma-bm25/             # BM25 sparse embeddings
│       ├── cohere/                  # Cohere provider
│       ├── google-gemini/           # Google Gemini provider
│       ├── huggingface-server/      # HuggingFace server
│       ├── jina/                    # Jina AI provider
│       ├── together-ai/             # Together AI provider
│       └── voyageai/                # Voyage AI provider

Sources: clients/new-js/packages/ai-embeddings/all/package.json:1-45

Module Exports Configuration

Both client generations support modern JavaScript module resolution with ESM and CommonJS exports.

Export Structure

graph TD
    A[Package Entry] --> B{Import Type};
    B -->|ESM import| C[.mjs / .d.ts];
    B -->|CommonJS require| D[.cjs / .d.cts];
    C --> E[dist/*.mjs];
    D --> F[dist/cjs/*.cjs];

Export Condition | Entry Point | Type Definitions
ESM import | dist/chromadb.mjs | dist/chromadb.d.ts
CommonJS require | dist/cjs/chromadb.cjs | dist/cjs/chromadb.d.cts

Sources: clients/js/packages/chromadb/package.json:12-25 Sources: clients/new-js/packages/chromadb/package.json:12-25

Client Initialization

Basic Connection

import { ChromaClient } from "chromadb";

// Initialize the client
const chroma = new ChromaClient({ 
  path: "http://localhost:8000" 
});

Sources: clients/js/packages/chromadb-client/README.md:15-20

With Embedding Function

import { ChromaClient } from 'chromadb';
import { TogetherAIEmbeddingFunction } from '@chroma-core/together-ai';

const embedder = new TogetherAIEmbeddingFunction({
  apiKey: 'your-api-key',
  modelName: 'togethercomputer/m2-bert-80M-8k-retrieval',
});

const client = new ChromaClient({
  path: 'http://localhost:8000',
});

Sources: clients/new-js/packages/ai-embeddings/together-ai/README.md:1-35

Collection Operations

Collections are the primary data structure for storing and querying embeddings.

Create Collection

const collection = await chroma.createCollection({
  name: "my-collection",
  embeddingFunction: embedder,  // Optional
  metadata: {                    // Optional
    description: "My document collection"
  }
});

Add Documents

await collection.add({
  ids: ["id1", "id2"],
  embeddings: [                  // Optional if embedding function provided
    [1.1, 2.3, 3.2],
    [4.5, 6.9, 4.4],
  ],
  metadatas: [{ source: "doc1" }, { source: "doc2" }],
  documents: ["Document 1 content", "Document 2 content"],
});

Query Collection

const results = await collection.query({
  queryEmbeddings: [[1.1, 2.3, 3.2]],   // A list of query vectors...
  // ...or queryTexts: ["Sample query"] to embed the text with the collection's embedding function
  nResults: 2,                           // Number of results per query
  where: { source: "doc1" },             // Optional metadata filter
  include: ["documents", "metadatas", "distances"]
});

Sources: clients/js/packages/chromadb-client/README.md:25-50

Embedding Function Providers

The new-js client provides first-class support for multiple embedding providers through the @chroma-core/* packages.

Available Providers

Provider Package | Model Examples | API Key Required
@chroma-core/together-ai | togethercomputer/m2-bert-80M-8k-retrieval | Yes
@chroma-core/voyageai | voyage-2 | Yes
@chroma-core/google-gemini | text-embedding-004 | Yes
@chroma-core/jina | jina-embeddings-v2-base-en | Yes
@chroma-core/cohere | Various Cohere models | Yes
@chroma-core/chroma-bm25 | N/A (local algorithm) | No
@chroma-core/all | All providers bundled | Varies

Sources: clients/new-js/packages/ai-embeddings/together-ai/README.md Sources: clients/new-js/packages/ai-embeddings/voyageai/README.md

Configuration Options

Each embedding function supports common configuration patterns:

const embedder = new SomeEmbeddingFunction({
  apiKey: 'your-api-key',          // Or set via environment variable
  apiKeyEnvVar: 'PROVIDER_API_KEY', // Default env var name
  modelName: 'provider-model-name', // Provider-specific model
  // Provider-specific options
  task: 'retrieval.passage',       // Jina example
  dimensions: 768,                  // Jina example
  truncate: true,                   // Jina example
  normalized: true,                 // Jina example
});

Environment Variable Configuration

| Provider | Environment Variable |
|---|---|
| Together AI | `TOGETHER_API_KEY` |
| Voyage AI | `VOYAGE_API_KEY` |
| Google Gemini | `GEMINI_API_KEY` |
| Jina | `JINA_API_KEY` |

Sources: clients/new-js/packages/ai-embeddings/jina/README.md:1-45
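
A minimal sketch of the resolution order implied above: an explicit `apiKey` wins, otherwise the provider's environment variable is read. The helper name `resolveApiKey` is hypothetical, not an export of any `@chroma-core` package; it takes the environment as an argument so the logic is easy to test:

```typescript
// Hypothetical helper mirroring the apiKey / apiKeyEnvVar pattern above.
function resolveApiKey(
  opts: { apiKey?: string; apiKeyEnvVar: string },
  env: Record<string, string | undefined>,
): string {
  const key = opts.apiKey ?? env[opts.apiKeyEnvVar];
  if (!key) {
    throw new Error(`Missing API key: set ${opts.apiKeyEnvVar} or pass apiKey`);
  }
  return key;
}

// An explicit key takes precedence over the environment.
const fromOpts = resolveApiKey(
  { apiKey: "sk-explicit", apiKeyEnvVar: "JINA_API_KEY" },
  { JINA_API_KEY: "sk-env" },
);

// Otherwise the named environment variable is used.
const fromEnv = resolveApiKey(
  { apiKeyEnvVar: "JINA_API_KEY" },
  { JINA_API_KEY: "sk-env" },
);
```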

Rust Native Bindings

For performance-critical applications, Chroma provides pre-built Rust native bindings for Node.js.

Supported Platforms

| Package Name | OS | Architecture | LibC |
|---|---|---|---|
| `chromadb-js-bindings-darwin-x64` | macOS (Intel) | x64 | N/A |
| `chromadb-js-bindings-darwin-arm64` | macOS (Apple Silicon) | arm64 | N/A |
| `chromadb-js-bindings-linux-x64-gnu` | Linux | x64 | glibc |
| `chromadb-js-bindings-linux-arm64-gnu` | Linux | arm64 | glibc |

All binding packages are published at version 1.3.4 and require Node.js >= 10.

Sources: rust/js_bindings/npm/darwin-x64/package.json:1-18, rust/js_bindings/npm/linux-x64-gnu/package.json:1-18

Build and Development

Build Scripts

| Command | Description |
|---|---|
| `pnpm build` | Build all packages |
| `pnpm build:core` | Build only `@internal/chromadb-core` |
| `pnpm build:packages` | Build all packages except core |
| `pnpm watch` | Watch mode for development |
| `pnpm test` | Run all tests |
| `pnpm test:functional` | Run functional tests (excluding auth) |

New-JS Client Build Configuration

{
  "scripts": {
    "build": "tsup",
    "watch": "tsup --watch",
    "typecheck": "tsc --noEmit"
  }
}

Build tooling uses tsup for efficient bundling with TypeScript support.

Sources: clients/new-js/packages/ai-embeddings/common/package.json:18-25, clients/js/package.json:22-30

Choosing a Client Package

graph TD
    A[Start] --> B{Do you need all embedding providers?};
    B -->|Yes, convenience| C[chromadb v2.4.7<br/>or @chroma-core/all + chromadb v3.4.5];
    B -->|No, want to minimize bundle| D{Do you have embedding requirements?};
    D -->|Yes, specific providers| E[chromadb-client v2.4.7<br/>with peer dependencies];
    D -->|No, just vector storage| F[chromadb-client v2.4.7<br/>or chromadb v3.4.5];
    C --> G[Include all embedding libraries];
    E --> H[Only install needed providers];
    F --> I[No embedding function needed];

Decision Matrix

| Requirement | Recommended Package |
|---|---|
| Simple setup, all features | `chromadb` (bundled) |
| Minimal bundle size | `chromadb-client` with peer deps |
| Modern architecture | `chromadb` (new-js v3.4.5) |
| BM25 sparse embeddings | `@chroma-core/chroma-bm25` |
| Cloud/remote providers | `@chroma-core/*` packages |

Sources: clients/js/examples/node/README.md:1-45

TypeScript Support

All JavaScript client packages include full TypeScript type definitions:

{
  "types": "dist/chromadb.d.ts",
  "exports": {
    ".": {
      "import": {
        "types": "./dist/chromadb.d.ts"
      },
      "require": {
        "types": "./dist/cjs/chromadb.d.cts"
      }
    }
  }
}

The TypeScript minimum version requirement is ^5.0.4 for the legacy client and ^5.3.3 for new-js packages.

Sources: clients/js/packages/chromadb/package.json:8, clients/new-js/packages/ai-embeddings/common/package.json:30

Dependencies

Core Dependencies

| Package | Version | Purpose |
|---|---|---|
| `isomorphic-fetch` | ^3.0.0 | HTTP client for browser/Node.js |
| `ajv` | ^8.12.0 / ^8.17.1 | JSON schema validation |
| `cliui` | ^8.0.1 | CLI utilities |

Node.js Compatibility

| Package Generation | Minimum Node.js |
|---|---|
| Legacy (v2.x) | >= 14.17.0 |
| New-JS (v3.x) | >= 20 |
| Rust Bindings | >= 10 |

Sources: clients/js/packages/chromadb-client/package.json:50-55, clients/new-js/packages/ai-embeddings/common/package.json:35-38

Sources: [clients/js/packages/chromadb/package.json:3](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb/package.json)

Rust Backend Services Architecture

Related topics: System Architecture Overview, Data Storage & Blockstore


Overview

The Chroma Rust backend provides a high-performance, scalable vector database service built entirely in Rust. The architecture follows a distributed systems design with multiple specialized services working together to handle embedding storage, indexing, and similarity search operations.

Design Goals

| Goal | Description |
|---|---|
| High Performance | Arrow-based columnar storage for efficient data access |
| Scalability | Multi-cloud, multi-region deployment support |
| Reliability | Comprehensive error handling with typed error codes |
| Flexibility | Multiple index types (HNSW, Spann, Inverted) |
| Consistency | Ordered and unordered mutation ordering options |

Core Service Components

graph TD
    subgraph "Rust Backend Services"
        W[Worker Service]
        BS[Blockstore Service]
        SYS[Sysdb Service]
        LOG[Log Service]
    end
    
    W --> BS
    W --> SYS
    W --> LOG

Blockstore Architecture

The blockstore is the core storage layer in Chroma's Rust backend, providing persistent storage for vector embeddings and associated metadata using Arrow columnar format.

Arrow-Based Storage

Chroma uses Apache Arrow as its primary storage format, which provides:

  • Columnar Layout: Efficient analytic queries by column
  • Zero-Copy Reads: Memory-mapped access patterns
  • Cross-Language Interop: Standardized binary format
  • Compression Support: Built-in encoding/decoding

Sources: rust/blockstore/src/arrow/root.rs:1-40

Blockfile Structure

graph TD
    subgraph "Blockfile Components"
        BF[Blockfile]
        BR[Block Reader]
        BW[Block Writer]
        RM[Root Manager]
        BM[Block Manager]
    end
    
    BF --> BR
    BF --> BW
    BW --> RM
    BR --> BM

Root Management

The Root component manages the root directory structure and file operations:

pub(super) fn get_all_block_ids_from_bytes(
    bytes: &[u8],
    id: Uuid,
) -> Result<Vec<Uuid>, FromBytesError>

Key responsibilities:

  • Reading Arrow IPC files
  • Extracting block metadata and IDs
  • Version validation and verification

Sources: rust/blockstore/src/arrow/root.rs:28-50

Block Layout Verification

The block layout verification ensures data integrity:

#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
    #[error("Buffer length is not 64 byte aligned")]
    BufferLengthNotAligned,
    #[error("No record batches in footer")]
    NoRecordBatches,
    #[error("More than one record batch in IPC file")]
    MultipleRecordBatches,
    #[error("Invalid message type")]
    InvalidMessageType,
}

Sources: rust/blockstore/src/arrow/block/types.rs:1-30

| Error Type | Error Code | Severity |
|---|---|---|
| `BufferLengthNotAligned` | Internal | High |
| `NoRecordBatches` | Internal | High |
| `MultipleRecordBatches` | Internal | Medium |
| `InvalidMessageType` | Internal | High |
| `RecordBatchDecodeError` | Internal | High |

Blockfile Writer Types

Chroma supports two mutation ordering strategies:

| Ordering Type | Description | Use Case |
|---|---|---|
| `Ordered` | Sequential writes with guaranteed order | Consistent state |
| `Unordered` | Parallel writes for throughput | High-volume ingestion |

Sources: rust/blockstore/src/arrow/provider.rs:1-50

match options.mutation_ordering {
    BlockfileWriterMutationOrdering::Ordered => {
        let file = ArrowOrderedBlockfileWriter::from_root(...);
        Ok(BlockfileWriter::ArrowOrderedBlockfileWriter(file))
    }
    BlockfileWriterMutationOrdering::Unordered => {
        let file = ArrowUnorderedBlockfileWriter::from_root(...);
        Ok(BlockfileWriter::ArrowUnorderedBlockfileWriter(file))
    }
}

Forking and Versioning

Blockfiles support forking for snapshot isolation:

let new_root = self
    .root_manager
    .fork::<K>(
        &fork_from,
        new_id,
        &options.prefix_path,
        self.block_manager.default_max_block_size_bytes(),
    )
    .await

Sources: rust/blockstore/src/arrow/provider.rs:1-30

Type System

Query Result Types

The execution layer uses a rich type system for search results:

#[derive(Clone, Debug, Default)]
pub struct SearchPayloadResult {
    pub records: Vec<SearchRecord>,
}

Sources: rust/types/src/execution/operator.rs:1-20

Search Results Structure

graph LR
    SR[SearchResult] --> SPR[SearchPayloadResult]
    SPR --> SR_vec[Vec<SearchRecord>]
    SR --> PLB[pulled_log_bytes]

| Field | Type | Description |
|---|---|---|
| `results` | `Vec<SearchPayloadResult>` | Per-query search results |
| `pulled_log_bytes` | `u64` | Total log bytes fetched for metrics |

Include Enum

The Include enum controls which fields are returned in query results:

pub enum Include {
    #[serde(rename = "distances")]
    Distance,
    #[serde(rename = "documents")]
    Document,
    #[serde(rename = "embeddings")]
    Embedding,
    #[serde(rename = "metadatas")]
    Metadata,
    #[serde(rename = "uris")]
    Uri,
}

Sources: rust/types/src/api_types.rs:1-30

| Include Value | Returned Field | In Default Query |
|---|---|---|
| `distances` | Distance scores | Yes |
| `documents` | Text content | Yes |
| `embeddings` | Vector data | No |
| `metadatas` | Metadata objects | Yes |
| `uris` | Resource URIs | No |

IncludeList Helper Methods

impl IncludeList {
    pub fn empty() -> Self { Self(Vec::new()) }
    
    pub fn default_query() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance])
    }
    
    pub fn default_get() -> Self {
        Self(vec![Include::Document, Include::Metadata])
    }
    
    pub fn all() -> Self {
        Self(vec![Include::Document, Include::Metadata, Include::Distance, 
                  Include::Embedding, Include::Uri])
    }
}

Sources: rust/types/src/api_types.rs:1-60
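
These Rust defaults correspond to the string names used in the client-side `include` arrays. A small illustrative mirror (the constant names below are not exported by any client package):

```typescript
// The serde rename attributes above map each variant to a string name.
type IncludeName = "documents" | "metadatas" | "distances" | "embeddings" | "uris";

// Mirrors IncludeList::default_query() and IncludeList::default_get().
const defaultQueryInclude: IncludeName[] = ["documents", "metadatas", "distances"];
const defaultGetInclude: IncludeName[] = ["documents", "metadatas"];
```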

Key Filter System

The Key enum represents filterable fields in metadata queries:

pub enum Key {
    Document,
    Embedding,
    Metadata,
    Score,
    MetadataField(String),
}

Sources: rust/types/src/operator.rs:1-30

| Key | Purpose | Example |
|---|---|---|
| `#document` | Document content | `Key::Document` |
| `#embedding` | Vector data | `Key::Embedding` |
| `#metadata` | All metadata | `Key::Metadata` |
| `#score` | Similarity score | `Key::Score` |
| `field_name` | Custom metadata | `Key::MetadataField("status")` |

Key Factory Methods

impl Key {
    /// Creates a Key for a custom metadata field
    pub fn field(name: impl Into<String>) -> Self {
        Key::MetadataField(name.into())
    }
    
    /// Creates an equality filter: `field == value`
    pub fn eq(self, value: impl Into<MetadataValue>) -> ComparisonValue { ... }
}

Index Architecture

Spann Index

Spann is Chroma's sparse vector index implementation combining HNSW with posting lists:

#[derive(Clone, Debug)]
pub struct SpannIndexReader<'me> {
    pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
    pub hnsw_index: HnswIndexRef,
    pub versions_map: BlockfileReader<'me, u32, u32>,
    pub dimensionality: usize,
    pub adaptive_search_nprobe: bool,
    pub params: InternalSpannConfiguration,
}

Sources: rust/index/src/spann/types.rs:1-30

Spann Index Structure

graph TD
    subgraph "Spann Index"
        SPI[SpannIndexReader]
        HNSW[HNSW Index]
        PL[Posting Lists]
        VM[Versions Map]
    end
    
    SPI --> HNSW
    SPI --> PL
    SPI --> VM

| Component | Type | Purpose |
|---|---|---|
| `hnsw_index` | `HnswIndexRef` | Approximate nearest neighbor search |
| `posting_lists` | `BlockfileReader<u32, SpannPostingList>` | Document postings |
| `versions_map` | `BlockfileReader<u32, u32>` | Document versioning |
| `adaptive_search_nprobe` | `bool` | Adaptive parameter tuning |

Sparse Posting Block

The sparse posting block implements an inverted index structure:

#[derive(Debug, Clone)]
pub struct DirectoryBlock(SparsePostingBlock);

impl DirectoryBlock {
    pub fn new(max_offsets: &[u32], max_weights: &[f32]) 
        -> Result<Self, SparsePostingBlockError>
}

Sources: rust/types/src/sparse_posting_block.rs:1-40

| Field | Type | Description |
|---|---|---|
| `max_offset` | `u32` | Largest doc offset in posting block |
| `max_weight` | `f32` | Maximum weight for term pruning |

Schema and Index Configuration

Collection Schema

The schema system supports multiple index types:

pub struct Schema {
    pub fn create_index(
        mut self,
        key: Option<&str>,
        config: IndexConfig,
    ) -> Result<Self, SchemaBuilderError>
}

Sources: rust/types/src/collection_schema.rs:1-50

| Index Type | Key | Description |
|---|---|---|
| `VectorIndexConfig` | `None` | Global vector index (HNSW/Spann) |
| `StringInvertedIndexConfig` | `Some(field)` | Field-specific FTS |
| `SparseVectorIndexConfig` | `Some(field)` | Sparse vector index |

Index Configuration

pub struct VectorIndexConfig {
    pub space: Option<Space>,
    pub embedding_function: Option<EmbeddingFunctionId>,
    pub source_key: Option<Key>,
    pub hnsw: Option<HnswConfig>,
    pub spann: Option<SpannConfig>,
}

| Parameter | Type | Default | Description |
|---|---|---|---|
| `space` | `Option<Space>` | `None` | Vector space (Cosine, L2, etc.) |
| `embedding_function` | `Option<EFId>` | `None` | Embedding function ID |
| `hnsw` | `Option<HnswConfig>` | `None` | HNSW parameters |
| `spann` | `Option<SpannConfig>` | `None` | Spann parameters |

Worker Service Architecture

Work Queue Client

The work queue client manages distributed task execution:

pub enum WorkQueueClientError {
    ConnectionError(#[from] tonic::Status),
    RequestError(#[from] tonic::Status),
}

Sources: rust/worker/src/work_queue/work_queue_client.rs:1-20

Error Code Mapping

| gRPC Code | Chroma Error Code |
|---|---|
| Unavailable | Unavailable |
| DeadlineExceeded | DeadlineExceeded |
| ResourceExhausted | ResourceExhausted |
| InvalidArgument | InvalidArgument |
| NotFound | NotFound |
| PermissionDenied | PermissionDenied |

Apply Logs Orchestrator

The apply logs orchestrator handles log-based data synchronization:

#[derive(Debug)]
pub struct ApplyLogsOrchestratorResponse {
    pub job_id: JobId,
    pub total_records_post_compaction: u64,
    pub flush_results: Vec<SegmentFlushInfo>,
    pub collection_logical_size_bytes: u64,
}

Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-50

KNN Filter Architecture

The KNN filter orchestrates vector similarity search:

graph TD
    subgraph "KNN Query Pipeline"
        Q[Query Request]
        F[Filter Logs]
        K[KNN Search]
        R[Results]
    end
    
    Q --> F
    F --> K
    K --> R

KNN Error Handling

pub enum KnnError {
    QuantizedSpannCenterSearch(QuantizedSpannError),
    QuantizedSpannLoadCenter(QuantizedSpannError),
    InvalidDistanceFunction,
    Aborted,
    InvalidSchema(#[from] SchemaError),
}

Sources: rust/worker/src/execution/orchestration/knn_filter.rs:1-40

| Error Type | Error Code |
|---|---|
| `QuantizedSpannCenterSearch` | From inner |
| `InvalidDistanceFunction` | InvalidArgument |
| `Aborted` | ResourceExhausted |
| `Result(_)` | Internal |

KNN Filter Output

#[derive(Clone, Debug)]
pub struct KnnFilterOutput {
    pub logs: FetchLogOutput,
    pub fetch_log_bytes: u64,
    pub filter_output: FilterOutput,
    pub dimension: usize,
    pub distance_function: DistanceFunction,
}

Multi-Cloud Topology

Chroma supports multi-cloud and multi-region deployments:

pub struct ProviderRegion<T: Clone + Debug> {
    pub name: RegionName,
    pub provider: String,
    pub region: String,
    pub config: T,
}

Sources: rust/types/src/topology.rs:1-30

Topology Structure

graph TD
    subgraph "Multi-Cloud Topology"
        Config[Configuration]
        Topologies[Vec<Topology>]
        Regions[Vec<ProviderRegion>]
        Preferred[Preferred Region]
    end
    
    Config --> Topologies
    Config --> Regions
    Config --> Preferred

Configuration Schema

struct RawMultiCloudMultiRegionConfiguration<R, T> {
    preferred: RegionName,
    regions: Vec<ProviderRegion<R>>,
    topologies: Vec<Topology<T>>,
}

| Field | Type | Description |
|---|---|---|
| `preferred` | `RegionName` | Default region for operations |
| `regions` | `Vec<ProviderRegion>` | Available cloud regions |
| `topologies` | `Vec<Topology>` | Topology configurations |

Error Handling Framework

Chroma Error Traits

All errors implement the ChromaError trait:

pub trait ChromaError: std::error::Error {
    fn code(&self) -> ErrorCodes;
    fn should_trace_error(&self) -> bool;
}

Error Code Registry

| Code | Category | Description |
|---|---|---|
| `InvalidArgument` | Client | Malformed request |
| `NotFound` | Client | Resource missing |
| `AlreadyExists` | Client | Duplicate resource |
| `PermissionDenied` | Security | Access denied |
| `ResourceExhausted` | Rate | Quota exceeded |
| `Internal` | Server | System error |

CLI Integration

The Rust CLI provides management commands:

pub enum Command {
    Browse(BrowseArgs),
    Copy(CopyArgs),
    Db(DbSubcommand),
    Docs,
    Install(InstallArgs),
    Login(LoginArgs),
    Profile(ProfileSubcommand),
    Run(RunArgs),
    Support,
    Update,
    Vacuum(VacuumArgs),
}

Sources: rust/cli/src/lib.rs:1-30

Available Commands

| Command | Description |
|---|---|
| `browse` | Open web interface |
| `copy` | Copy data between collections |
| `db` | Database operations |
| `docs` | Open documentation |
| `install` | Install Chroma |
| `login` | Authenticate user |
| `profile` | Manage user profiles |
| `run` | Start Chroma server |
| `support` | Open support resources |
| `update` | Update installation |
| `vacuum` | Compact storage |

Sources: rust/blockstore/src/arrow/root.rs:1-40


Data Storage & Blockstore

Overview

The Chroma blockstore is the core storage subsystem responsible for persisting vector embeddings, metadata, and related data structures. It provides a unified abstraction layer over different storage backends (in-memory and Arrow-based) while maintaining performance characteristics suitable for high-throughput vector database operations.

The blockstore system is architected around the concept of blockfiles — persistent, columnar storage structures that organize data by prefix-based partitioning and support efficient key-value operations.

Architecture

graph TD
    subgraph "Public API Layer"
        BP[BlockfileProvider]
        BR[BlockfileReader]
        BW[BlockfileWriter]
        BF[BlockfileFlusher]
    end

    subgraph "Implementation Layer"
        ABP[ArrowBlockfileProvider]
        MBP[MemoryBlockfileProvider]
        ABF[ArrowUnorderedBlockfileWriter]
        ABO[ArrowOrderedBlockfileWriter]
    end

    subgraph "Storage Layer"
        BM[BlockManager]
        RM[RootManager]
        ST[Storage]
    end

    subgraph "Arrow Format"
        R[Root]
        SB[Sparse Index]
        B[Blocks]
    end

    BP --> ABP
    BP --> MBP
    BR --> ABP
    BR --> MBP
    BW --> ABF
    BW --> ABO

    ABP --> BM
    ABP --> RM
    ABF --> BM
    ABF --> RM
    ABO --> BM
    ABO --> RM
    BM --> ST
    RM --> ST

    RM --> R
    R --> SB
    R --> B

Core Components

BlockfileProvider

The BlockfileProvider is the main entry point for creating readers and writers. It abstracts the underlying storage implementation and provides factory methods for blockfile operations.

Variants:

| Provider Type | Description | Use Case |
|---|---|---|
| `HashMapBlockfileProvider` | In-memory blockfile storage | Testing, ephemeral data |
| `ArrowBlockfileProvider` | Persistent Arrow-based storage | Production workloads |

API Methods:

pub fn storage(&self) -> Option<Arc<Storage>> {
    match self {
        BlockfileProvider::ArrowBlockfileProvider(provider) => Some(provider.storage().clone()),
        BlockfileProvider::HashMapBlockfileProvider(_) => None,
    }
}

pub fn new_memory() -> Self {
    BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}

Sources: rust/blockstore/src/provider.rs:1-30

BlockfileReader

The BlockfileReader trait provides read access to stored data. It supports generic key and value types that implement the ReadKey and ReadValue traits.

Trait Definition:

pub trait ReadKey<'a>:
    Key
    + Into<KeyWrapper>
    + TryFrom<&'a KeyWrapper, Error = InvalidKeyConversion>
    + ArrowReadableKey<'a>
    + Sync
    + 'a
{}

pub trait ReadValue<'a>: Value + Readable<'a> + ArrowReadableValue<'a> + Sync + 'a {}

Sources: rust/blockstore/src/provider.rs:40-55

BlockfileWriter

The BlockfileWriter trait provides write access to blockfiles with support for ordered and unordered mutation patterns.

Core Operations:

| Method | Signature | Description |
|---|---|---|
| `set` | `set(prefix, key, value)` | Insert or update a key-value pair |
| `delete` | `delete(prefix, key)` | Remove a key-value pair |
| `commit` | `commit()` | Finalize and persist the writer |

pub async fn set<
    K: Key + Into<KeyWrapper> + ArrowWriteableKey,
    V: Value + Writeable + ArrowWriteableValue,
>(
    &self,
    prefix: &str,
    key: K,
    value: V,
) -> Result<(), Box<dyn ChromaError>>

Sources: rust/blockstore/src/types/writer.rs:50-75

Arrow Blockfile Implementation

The Arrow-based blockfile is the primary production storage implementation, providing efficient columnar storage with Arrow IPC format.

Blockfile Structure

graph TD
    R[Root File<br/>Root Writer] --> SB[Sparse Index<br/>Block Key Mapping]
    R --> BH[Block Header<br/>Metadata]
    
    SB --> B1[Block 1<br/>Arrow IPC]
    SB --> B2[Block 2<br/>Arrow IPC]
    SB --> BN[Block N<br/>Arrow IPC]
    
    B1 --> P1[Prefix: "vec_1"]
    B1 --> P2[Prefix: "vec_2"]
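
The sparse index is what keeps point lookups cheap: the root records only the smallest key of each block, so routing a key to its block is a binary search over block boundaries. A simplified, illustrative sketch of that routing step (the real implementation operates on Arrow data with UUID block IDs):

```typescript
// Each entry records the smallest key stored in a block.
interface SparseIndexEntry {
  startKey: string;
  blockId: number; // a UUID in the real blockstore
}

// Returns the ID of the block whose range covers `key`:
// the last entry with startKey <= key. `index` must be sorted by startKey.
function findBlock(index: SparseIndexEntry[], key: string): number | undefined {
  let lo = 0;
  let hi = index.length - 1;
  let found: number | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].startKey <= key) {
      found = index[mid].blockId; // candidate; look for a later block
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

const index: SparseIndexEntry[] = [
  { startKey: "a", blockId: 1 },
  { startKey: "m", blockId: 2 },
  { startKey: "t", blockId: 3 },
];
```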

ArrowBlockfileProvider

The ArrowBlockfileProvider manages the lifecycle of blockfiles using Arrow IPC format with a root-sparse index architecture.

Key Features:

  • Fork Support: Create new blockfiles from existing ones via forking
  • CMEK Support: Optional Customer-Managed Encryption Keys
  • Block Size Management: Configurable maximum block sizes

pub async fn write<K: Key + ArrowWriteableKey, V: ArrowWriteableValue>(
    &self,
    options: BlockfileWriterOptions,
) -> Result<BlockfileWriter, Box<CreateError>>

Sources: rust/blockstore/src/arrow/provider.rs:1-50

Writer Types

ArrowUnorderedBlockfileWriter

Provides high-performance unordered writes optimized for bulk insertion scenarios.

impl ArrowUnorderedBlockfileWriter {
    pub(super) fn new<K: ArrowWriteableKey, V: ArrowWriteableValue>(
        id: Uuid,
        prefix_path: &str,
        block_manager: BlockManager,
        root_manager: RootManager,
        max_block_size_bytes: usize,
        cmek: Option<Cmek>,
    ) -> Self
}

Sources: rust/blockstore/src/arrow/blockfile.rs:50-80

ArrowOrderedBlockfileWriter

Maintains key ordering within blocks, optimized for range queries and ordered iteration.

Sources: rust/blockstore/src/arrow/ordered_blockfile_writer.rs:1-50

BlockManager and RootManager

| Component | Responsibility |
|---|---|
| `BlockManager` | Manages individual data blocks, handles block creation and commitment |
| `RootManager` | Manages root files containing sparse indices and metadata |

// Forking a new root from an existing one
let new_root = self
    .root_manager
    .fork::<K>(
        &fork_from,
        new_id,
        &options.prefix_path,
        self.block_manager.default_max_block_size_bytes(),
    )
    .await

Sources: rust/blockstore/src/arrow/provider.rs:45-70

Error Handling

Error Types

| Error Type | Description | Error Code |
|---|---|---|
| `BlockNotFound` | Requested block does not exist | Internal |
| `BlockFetchError` | Failed to retrieve block from storage | Internal |
| `MigrationError` | Blockfile migration failed | Internal |
| `IOError` | Storage I/O operation failed | Internal |
| `ArrowError` | Arrow IPC parsing/encoding error | Internal |
| `NoRecordBatches` | Invalid Arrow file structure | Internal |

#[derive(Error, Debug)]
pub enum ArrowBlockfileError {
    #[error("Block not found")]
    BlockNotFound,
    #[error("Could not fetch block")]
    BlockFetchError(#[from] GetError),
    #[error("Could not migrate blockfile to new version")]
    MigrationError(#[from] MigrationError),
}

Sources: rust/blockstore/src/arrow/blockfile.rs:25-40

Layout Verification

The system validates Arrow file layouts to ensure data integrity:

#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
    #[error("Buffer length is not 64 byte aligned")]
    BufferLengthNotAligned,
    #[error("No record batches in footer")]
    NoRecordBatches,
    #[error("More than one record batch in IPC file")]
    MultipleRecordBatches,
    #[error("Invalid message type")]
    InvalidMessageType,
}

Sources: rust/blockstore/src/arrow/block/types.rs:40-60

Storage Operations

Write Flow

sequenceDiagram
    participant Client
    participant Provider as BlockfileProvider
    participant Writer as BlockfileWriter
    participant BM as BlockManager
    participant RM as RootManager
    participant Storage

    Client->>Provider: write(options)
    Provider->>Writer: create_writer()
    Provider->>RM: create/fork_root()
    Client->>Writer: set(prefix, key, value)
    Writer->>BM: create_block()
    loop Until flush
        Writer->>Writer: accumulate_data()
    end
    Client->>Writer: commit()
    Writer->>BM: commit_block()
    Writer->>RM: update_root()
    RM->>Storage: persist()
    BM->>Storage: persist()
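
A toy model of the accumulate-then-commit protocol in the diagram above: `set` only buffers mutations, and nothing reaches storage until `commit`. This is illustrative only; the real writer flushes blocks through `BlockManager` and updates the root through `RootManager`:

```typescript
// Illustrative stand-in for a blockfile writer's buffering behavior.
class ToyBlockfileWriter {
  private pending = new Map<string, string>();

  // Buffer a mutation under its (prefix, key) pair.
  set(prefix: string, key: string, value: string): void {
    this.pending.set(`${prefix}/${key}`, value);
  }

  // Freeze the buffered state, as commit() finalizes and persists the writer.
  commit(): Map<string, string> {
    return new Map(this.pending);
  }
}

const writer = new ToyBlockfileWriter();
writer.set("vec", "id1", "a");
writer.set("vec", "id2", "b");
const snapshot = writer.commit();
```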

Read Flow

sequenceDiagram
    participant Client
    participant Reader as BlockfileReader
    participant RM as RootManager
    participant BM as BlockManager
    participant Storage

    Client->>Reader: get(prefix, key)
    Reader->>RM: get_block_ids()
    RM->>Reader: block_id_list
    loop For each block
        Reader->>BM: get_block(id)
        BM->>Storage: read()
        Storage->>Reader: block_data
    end
    Reader->>Reader: search_blocks()
    Reader->>Client: value

Configuration Options

BlockfileWriterOptions

| Option | Type | Default | Description |
|---|---|---|---|
| `prefix_path` | `String` | Required | Path prefix for storage |
| `max_block_size_bytes` | `usize` | Provider default | Maximum size per block |
| `mutation_ordering` | `BlockfileWriterMutationOrdering` | `Ordered` | Write ordering mode |
| `fork_from` | `Option<Uuid>` | `None` | Source blockfile ID for forking |
| `cmek` | `Option<Cmek>` | `None` | Customer-managed encryption key |

let mut bf_options = BlockfileWriterOptions::new(prefix_path.to_string())
    .max_block_size_bytes(pl_block_size);
bf_options = bf_options.unordered_mutations();
if let Some(cmek) = cmek {
    bf_options = bf_options.with_cmek(cmek);
}

Sources: rust/blockstore/src/arrow/provider.rs:90-110

Memory Blockfile

For testing and ephemeral use cases, Chroma provides an in-memory blockfile implementation:

pub fn new_memory() -> Self {
    BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}

Limitations:

  • No persistence
  • No fork support
  • Limited to unordered mutations

if options.fork_from.is_some() {
    unimplemented!();
}

Sources: rust/blockstore/src/memory/provider.rs:40-55

Block Reading

RootReader

The RootReader is responsible for reading block metadata and identifying which blocks contain specific data:

impl RootReader {
    pub(super) fn get_all_block_ids_from_bytes(
        bytes: &[u8],
        id: Uuid,
    ) -> Result<Vec<Uuid>, FromBytesError> {
        let mut cursor = std::io::Cursor::new(bytes);
        let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
        
        let record_batch = match arrow_reader {
            Ok(mut reader) => match reader.next() {
                Some(Ok(batch)) => batch,
                Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
                None => return Err(FromBytesError::NoDataError),
            },
            Err(e) => return Err(FromBytesError::ArrowError(e)),
        };
        
        let (version, read_id) = Self::version_and_id_from_record_batch(&record_batch, id)?;
        if read_id != id {
            return Err(FromBytesError::IdMismatch);
        }
        
        Self::block_ids_from_record_batch(&record_batch, version)
    }
}

Sources: rust/blockstore/src/arrow/root.rs:20-55

SpannIndex Integration

The blockstore is used by the Spann (Sparse + ANN) index for storing posting lists:

| Component | Purpose |
|---|---|
| `SpannIndexReader` | Reads posting lists and HNSW indices |
| `SpannIndexWriter` | Creates and manages posting list writers |
| `SpannPostingList` | Stores document IDs and embeddings |

pub struct SpannIndexReader<'me> {
    pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
    pub hnsw_index: HnswIndexRef,
    pub versions_map: BlockfileReader<'me, u32, u32>,
    pub dimensionality: usize,
}

Sources: rust/index/src/spann/types.rs:30-45

Summary

The Chroma blockstore provides a robust, extensible storage layer built on Arrow IPC format. Key architectural decisions include:

  1. Separation of concerns: BlockManager handles data blocks while RootManager manages metadata and sparse indices
  2. Dual writer support: Ordered and unordered writers for different access patterns
  3. Forking capability: Efficient creation of derived blockfiles without full copies
  4. Error classification: Clear mapping from internal errors to error codes for API responses
  5. Type-safe abstractions: Generic key-value traits enabling flexible data modeling

Sources: [rust/blockstore/src/provider.rs:1-30](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/provider.rs)

Embedding Functions Integration

Related topics: Python Client SDK, Data Storage & Blockstore


Overview

Embedding Functions in Chroma provide a standardized interface for converting text into vector embeddings. Chroma supports multiple embedding providers through a plugin architecture that allows developers to use custom embedding functions or leverage hosted services like OpenAI, Cohere, Ollama, and others.

The embedding function system serves as the bridge between raw text data and the vector representation used for similarity search. Each embedding function implements a consistent interface that handles API communication, request formatting, and response parsing for its respective provider.

Sources: clients/new-js/packages/ai-embeddings/common/README.md

Architecture

High-Level Architecture

graph TD
    A[Client Application] --> B[Chroma Collection]
    B --> C[Embedding Function]
    C --> D[Embedding Provider API]
    D --> E[Vector Embeddings]
    E --> B
    
    F[@chroma-core/openai] --> C
    G[@chroma-core/ollama] --> C
    H[@chroma-core/cohere] --> C
    I[@chroma-core/morph] --> C
    J[@chroma-core/all] --> C

Embedding Function Package Structure

Chroma organizes embedding functions into separate packages under the @chroma-core namespace. Each package focuses on a specific provider while sharing common utilities.

| Package | Provider | Environment Support |
|---|---|---|
| `@chroma-core/ai-embeddings-common` | Shared utilities | Node.js + Browser |
| `@chroma-core/openai` | OpenAI | Node.js + Browser |
| `@chroma-core/ollama` | Ollama (local) | Node.js + Browser |
| `@chroma-core/cohere` | Cohere | Node.js + Browser |
| `@chroma-core/jina` | Jina AI | Node.js + Browser |
| `@chroma-core/morph` | Morph | Node.js |
| `@chroma-core/all` | All providers | Node.js + Browser |

Sources: clients/new-js/packages/ai-embeddings/all/README.md

Core Components

Common Utilities Package

The @chroma-core/ai-embeddings-common package provides shared functionality used by all embedding function implementations:

import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';

Key Features:

| Feature | Purpose |
|---------|---------|
| validateConfigSchema | Validates embedding function configurations using JSON schemas |
| snakeCase | Converts camelCase JavaScript objects to snake_case for API compatibility |
| isBrowser | Detects browser vs Node.js runtime environment |
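To make the key-conversion utility concrete, here is a self-contained sketch of what a camelCase-to-snake_case helper does; the actual implementation in @chroma-core/ai-embeddings-common may differ in edge cases (acronyms, nested objects).

```typescript
// Illustrative camelCase -> snake_case conversion for a flat config object.
// Not the package's exact code; nested objects and acronyms are ignored here.
function toSnakeCase(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    // Insert an underscore before each uppercase letter, then lowercase it.
    const snake = key.replace(/([A-Z])/g, (m) => `_${m.toLowerCase()}`);
    out[snake] = value;
  }
  return out;
}

// e.g. { apiKey: "k", modelName: "m" } -> { api_key: "k", model_name: "m" }
```

This kind of conversion is what lets JavaScript clients use idiomatic camelCase options while the server and provider APIs expect snake_case fields.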

Sources: clients/new-js/packages/ai-embeddings/common/README.md

Dynamic Loading Mechanism

The embedding function system supports dynamic loading of packages based on configuration:

const fullPackageName = `@chroma-core/${packageName}`;
await import(fullPackageName);
embeddingFunction = knownEmbeddingFunctions.get(packageName);

The system maintains mappings for known embedding function names and handles package resolution automatically when a collection is configured with a specific embedding provider.
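The registry pattern behind this can be sketched as follows. Names like `register` and the factory type are illustrative assumptions, not the real internals of embedding-function.ts: the key idea is that importing a provider package has the side effect of registering its embedding function under a known name.

```typescript
// Hypothetical registry-based loader; names are illustrative.
type EmbeddingFunctionFactory = (config: Record<string, unknown>) => unknown;

const knownEmbeddingFunctions = new Map<string, EmbeddingFunctionFactory>();

// Each provider package calls this on import to register itself.
function register(name: string, factory: EmbeddingFunctionFactory): void {
  knownEmbeddingFunctions.set(name, factory);
}

async function loadEmbeddingFunction(
  packageName: string
): Promise<EmbeddingFunctionFactory> {
  if (!knownEmbeddingFunctions.has(packageName)) {
    // Dynamic import triggers the package's registration side effect.
    await import(`@chroma-core/${packageName}`);
  }
  const factory = knownEmbeddingFunctions.get(packageName);
  if (!factory) throw new Error(`Unknown embedding function: ${packageName}`);
  return factory;
}
```

A design consequence of this pattern is that applications only pay the bundle and startup cost of the providers they actually configure.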

Sources: clients/new-js/packages/chromadb/src/embedding-function.ts

Configuration Schema

Embedding functions support structured configuration with schema validation. Configuration options vary by provider but typically include:

| Parameter | Description | Provider Support |
|-----------|-------------|------------------|
| apiKey | API key for authentication | OpenAI, Cohere, Jina, Gemini |
| modelName | Specific model identifier | All providers |
| apiBase | Custom API endpoint URL | Ollama, Morph, Gemini |
| encodingFormat | Output format (float/base64) | OpenAI, Morph |
Sources: clients/new-js/packages/ai-embeddings/morph/README.md

Provider Implementations

OpenAI Embeddings

The OpenAI embedding function calls the OpenAI API to generate text embeddings:

import { OpenAIEmbeddingFunction } from '@chroma-core/openai';

const openAIEF = new OpenAIEmbeddingFunction({
  apiKey: 'your-api-key',
  modelName: 'text-embedding-3-small'
});

Ollama (Local Embeddings)

Ollama enables local embedding generation without external API calls:

# Install Ollama from ollama.ai
# Start the server
ollama serve
# Pull an embedding model
ollama pull chroma/all-minilm-l6-v2-f32

Supported Models:

| Model | Dimensions |
|-------|------------|
| chroma/all-minilm-l6-v2-f32 (default) | 384 |
| nomic-embed-text | 768 |
| mxbai-embed-large | 1024 |
| snowflake-arctic-embed | Variable |
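For context on what the Ollama provider package wraps, a local embedding can be requested directly from Ollama's HTTP embeddings endpoint. This sketch assumes `ollama serve` is running on the default port 11434; in practice @chroma-core/ollama handles this call for you.

```typescript
// Direct call to a local Ollama server's /api/embeddings endpoint.
// Assumes Node.js 18+ (global fetch) and a running `ollama serve`.
async function ollamaEmbed(model: string, prompt: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt }),
  });
  if (!res.ok) throw new Error(`Ollama error: ${res.status}`);
  const data = (await res.json()) as { embedding: number[] };
  return data.embedding;
}
```

Because everything stays on localhost, no text ever leaves the machine, which is the main reason to prefer Ollama over hosted providers for sensitive data.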

Sources: clients/new-js/packages/ai-embeddings/ollama/README.md

Morph Embeddings

Morph provides embeddings optimized for code-related content:

const morphEmbedding = new MorphEmbeddingFunction({
  api_key: 'your-morph-api-key',
  model_name: 'morph-embedding-v2',
  api_base: 'https://api.morphllm.com/v1',
  encoding_format: 'float'
});

Sources: clients/new-js/packages/ai-embeddings/morph/README.md

Chroma Cloud Qwen

Hosted embedding service using Qwen models:

const qwenEmbedding = new QwenEmbeddingFunction({
  model: 'Qwen/Qwen3-Embedding-0.6B',
  task: 'document' // or 'query'
});

Configuration includes:

  • model: The Qwen model to use
  • task: Task type (document or query embedding)
  • instruction_dict: Custom instructions for specific tasks
  • apiKeyEnvVar: Environment variable for API key (default: CHROMA_API_KEY)

Sources: clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/README.md

Collection Integration

Embedding Function in Collections

When creating a collection, the embedding function can be specified at multiple levels:

const collection = await chroma.createCollection({
  name: "my-collection",
  embeddingFunction: openAIEF  // Specify embedding function
});

Space Configuration

Embedding functions can define supported distance spaces and default configurations:

if (overallEf && overallEf.defaultSpace && overallEf.supportedSpaces) {
  if (configuration?.hnsw === undefined && configuration?.spann === undefined) {
    configuration.hnsw = { space: overallEf.defaultSpace() };
  }
}

The system validates that configured spaces are supported by the embedding function and warns if mismatches occur:

Space 'hamming' is not supported by embedding function 'openai'.
Supported spaces: cosine, euclidean, dotproduct
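The resolution logic can be sketched as a small function that mirrors the kind of warning shown above: fall back to the embedding function's default space when none is configured, and warn on a mismatch. The function and parameter names here are assumptions for illustration.

```typescript
// Illustrative space resolution; not the exact collection-configuration code.
function resolveSpace(
  requested: string | undefined,
  supported: string[],
  defaultSpace: string
): string {
  // No explicit space configured: use the embedding function's default.
  if (requested === undefined) return defaultSpace;
  // Configured space not supported by the provider: warn but honor it.
  if (!supported.includes(requested)) {
    console.warn(
      `Space '${requested}' is not supported. Supported spaces: ${supported.join(", ")}`
    );
  }
  return requested;
}
```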

Sources: clients/new-js/packages/chromadb/src/collection-configuration.ts

Query Response Structure

Include Parameter

Queries support specifying which data to include in results through the Include parameter:

pub enum Include {
    Distance,
    Document,
    Embedding,
    Metadata,
    Uri,
}

Default Inclusion Behavior:

| Operation | Default Includes |
|-----------|------------------|
| Query | Document, Metadata, Distance |
| Get | Document, Metadata |

Include List Methods:

| Method | Returns |
|--------|---------|
| IncludeList::empty() | No includes |
| IncludeList::default_query() | Document, Metadata, Distance |
| IncludeList::default_get() | Document, Metadata |
| IncludeList::all() | All five include types |
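The Rust enum and its defaults translate directly into a client-side sketch. The TypeScript names below are assumptions chosen for illustration; only the five include variants and the two default sets come from the source.

```typescript
// Client-side mirror of the Include variants and their default sets.
type Include = "distance" | "document" | "embedding" | "metadata" | "uri";

const defaultQueryIncludes: Include[] = ["document", "metadata", "distance"];
const defaultGetIncludes: Include[] = ["document", "metadata"];
const allIncludes: Include[] = [
  "distance",
  "document",
  "embedding",
  "metadata",
  "uri",
];
```

Note that embeddings are excluded by default from both queries and gets; they are typically large, so clients must opt in to transferring them.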

Sources: rust/types/src/api_types.rs

Usage Patterns

Basic Usage with JavaScript Client

import { ChromaClient } from "chromadb";
import { OpenAIEmbeddingFunction } from "@chroma-core/openai";

const chroma = new ChromaClient();
const embeddingFunction = new OpenAIEmbeddingFunction({
  apiKey: process.env.OPENAI_API_KEY
});

const collection = await chroma.createCollection({
  name: "documents",
  embeddingFunction: embeddingFunction
});

await collection.add({
  ids: ["doc-1", "doc-2"],
  documents: ["Document content here", "Another document"],
  metadatas: [{ source: "notion" }, { source: "google-docs" }]
});

const results = await collection.query({
  queryTexts: ["Search query"],
  nResults: 2
});

Python Client Usage

import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.create_collection("documents")

collection.add(
    documents=["Document 1", "Document 2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc2"],
    embeddings=[[1.2, 2.1, ...], [1.2, 2.1, ...]]
)

results = collection.query(
    query_texts=["Query document"],
    n_results=2
)

Sources: clients/new-js/packages/chromadb/README.md

Environment Detection

Embedding functions automatically detect the runtime environment to select the appropriate HTTP client:

import { isBrowser } from '@chroma-core/ai-embeddings-common';

if (isBrowser()) {
  // Use browser-compatible fetch
} else {
  // Use Node.js HTTP client
}
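One minimal way such a detection check can be implemented is shown below; this is an illustration under the common "is there a window with a document" heuristic, not necessarily the package's exact code.

```typescript
// Browser detection via globalThis, so it type-checks without DOM typings.
// The real isBrowser helper may use a different heuristic.
function detectBrowser(): boolean {
  const g = globalThis as { window?: { document?: unknown } };
  return g.window !== undefined && g.window.document !== undefined;
}
```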

This enables packages like Ollama to work seamlessly in both browser and Node.js environments:

This package works in both Node.js and browser environments, automatically detecting the runtime and using the appropriate Ollama client.

Sources: clients/new-js/packages/ai-embeddings/ollama/README.md

Type Safety

The embedding function system provides TypeScript types and interfaces for:

  • Configuration validation
  • Response parsing
  • Error handling
  • Provider-specific options

For example, the SDK exposes typed async helpers such as:

export const getSparseEmbeddingFunction = async (
  client: ChromaClient,
  efConfig?: EmbeddingFunctionConfiguration
) => {
  // Returns a SparseEmbeddingFunction instance or undefined
};

Sources: clients/new-js/packages/chromadb/src/embedding-function.ts

Summary

Embedding Functions Integration in Chroma provides a unified, extensible system for text vectorization. Key aspects include:

  1. Provider Abstraction: Standardized interface across multiple embedding providers
  2. Dynamic Loading: Packages loaded on-demand based on collection configuration
  3. Schema Validation: JSON schema-based configuration validation
  4. Cross-Platform: Support for both Node.js and browser environments
  5. Flexible Configuration: Provider-specific options with sensible defaults
  6. Space Support: Distance metric configuration aligned with embedding provider capabilities

The plugin architecture allows Chroma to integrate new embedding providers while maintaining API consistency across the SDK.

Sources: [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)

Doramagic Pitfall Log

Doramagic extracted 6 source-linked risk signals. Review them before installing or handing real data to the project.

1. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:546206616 | https://github.com/chroma-core/chroma | README/documentation is current enough for a first validation pass.

2. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | last_activity_observed missing

3. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium

4. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium

5. Maintenance risk: issue_or_pr_quality=unknown

  • Severity: low
  • Finding: issue_or_pr_quality=unknown。
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | issue_or_pr_quality=unknown

6. Maintenance risk: release_recency=unknown

  • Severity: low
  • Finding: release_recency=unknown。
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | release_recency=unknown

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

Doramagic exposes 11 project-level community discussion links separately from official documentation. These external links are review inputs, not standalone proof that the project is production-ready: open the linked issues or discussions before using chroma with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence