Doramagic Project Pack · Human Manual
Chroma Overview
Related topics: Getting Started with Chroma, System Architecture Overview
Introduction
Chroma is an open-source data infrastructure platform designed specifically for AI applications. It provides the foundational building blocks for storing, querying, and managing vector embeddings along with associated metadata, enabling developers to build AI-powered applications with efficient similarity search capabilities. Sources: README.md:1
As an open-source solution, Chroma offers flexibility for self-hosting while also providing a cloud-hosted option called Chroma Cloud, which delivers serverless vector, hybrid, and full-text search capabilities. The platform is designed to be fast, cost-effective, scalable, and straightforward to deploy. Sources: README.md:17-21
Architecture Overview
Chroma follows a client-server architecture with multiple client libraries available for different programming environments. The system is built with Rust for core performance-critical components and provides idiomatic client libraries for Python and JavaScript/TypeScript.
graph TD
A[Client Applications] --> B[Python Client / JS Client]
B --> C[Chroma Server API]
C --> D[Worker Nodes]
D --> E[Blockstore<br/>Arrow Storage]
D --> F[Compaction &<br/>Log Processing]
E --> G[Persistent Storage]
H[Chroma Cloud] -.->|Optional hosted| C
Client Libraries
Chroma provides client libraries for Python and JavaScript/TypeScript, distributed as three packages:
| Client | Package | Description |
|---|---|---|
| Python | chromadb | Full-featured Python client library Sources: clients/python/README.md:1 |
| Python HTTP | chromadb-client | Lightweight HTTP-only client for server connections Sources: clients/python/README.md:12 |
| JavaScript/TypeScript | chromadb (npm) | Full-featured JS client for Node.js and browser Sources: clients/new-js/packages/chromadb/README.md:1 |
#### Python Client Installation
pip install chromadb # Full client library
pip install chromadb-client # HTTP client only
#### JavaScript Client Example
import { ChromaClient } from "chromadb";
const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });
for (let i = 0; i < 20; i++) {
await collection.add({
ids: ["test-id-" + i.toString()],
embeddings: [[1, 2, 3, 4, 5]],
documents: ["test"],
});
}
const queryData = await collection.query({
queryEmbeddings: [[1, 2, 3, 4, 5]],
queryTexts: ["test"],
});
Sources: clients/new-js/packages/chromadb/README.md:9-27
Data Model
Collection Structure
Collections in Chroma serve as the primary organizational unit for storing related documents and their associated embeddings. Each collection contains:
- Documents: The textual content to be embedded
- Embeddings: Vector representations of documents
- Metadatas: Key-value pairs for filtering and categorization
- Unique Identifiers: User-provided IDs for each record Sources: clients/python/README.md:16-27
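The record structure can be sketched as a small data type (illustrative only, not Chroma's internal representation):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Record:
    """One collection entry: a user-provided id plus optional payload fields."""
    id: str                                  # unique identifier
    document: Optional[str] = None           # textual content to be embedded
    embedding: Optional[List[float]] = None  # vector representation
    metadata: Dict[str, Any] = field(default_factory=dict)  # filterable key-value pairs
```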
Metadata Filtering
Chroma supports rich metadata filtering through operators that enable precise data retrieval:
graph LR
A[Query Request] --> B[Metadata Filter]
B --> C{Operator Type}
C -->|Contains| D[String contains check]
C -->|NotContains| E[String excludes check]
C -->|Regex| F[Regular expression match]
C -->|NotRegex| G[Regex exclusion]
Supported Document Operators:
| Operator | Description | Example |
|---|---|---|
| Contains | Document contains substring | {"$contains": "keyword"} |
| NotContains | Document excludes substring | {"$not_contains": "spam"} |
| Regex | Regular expression match | {"$regex": "^prefix.*"} |
| NotRegex | Exclude by regex pattern | {"$not_regex": ".*suffix$"} |
Sources: rust/types/src/metadata.rs:1-30
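To make the operator semantics concrete, here is a minimal pure-Python evaluator (a sketch of the documented behavior, not Chroma's implementation):

```python
import re

def matches_where_document(document: str, clause: dict) -> bool:
    """Evaluate a single document filter clause against one document."""
    op, arg = next(iter(clause.items()))
    if op == "$contains":
        return arg in document
    if op == "$not_contains":
        return arg not in document
    if op == "$regex":
        return re.search(arg, document) is not None
    if op == "$not_regex":
        return re.search(arg, document) is None
    raise ValueError(f"unsupported operator: {op}")
```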
Search Keys
The query system supports specialized keys for accessing different aspects of stored data:
| Key | Description | Usage |
|---|---|---|
| #document | Full text content | Key::Document |
| #embedding | Vector embeddings | Key::Embedding |
| #metadata | Record metadata | Key::Metadata |
| #score | Similarity score | Key::Score |
| Custom fields | User-defined metadata | Key::field("field_name") |
Sources: rust/types/src/execution/operator.rs:1-80
Core Components
Storage Layer
The blockstore provides the underlying storage mechanism using Arrow format for efficient columnar data storage and retrieval. This enables high-performance queries across large datasets. Sources: rust/blockstore/src/arrow/root.rs:1
Execution Operators
Chroma's query execution pipeline uses operators that transform and filter data through well-defined stages:
graph TD
A[Query Request] --> B[Log Fetch Orchestrator]
B --> C[KNN Filter]
C --> D[Apply Logs Orchestrator]
D --> E[Segment Writers]
E --> F[Compact Collection]
Key Orchestrators:
| Component | Purpose |
|---|---|
| LogFetchOrchestrator | Fetches and materializes log entries Sources: rust/worker/src/execution/orchestration/log_fetch_orchestrator.rs:1 |
| KnnFilter | Performs k-nearest neighbor filtering Sources: rust/worker/src/execution/orchestration/knn_filter.rs:1 |
| ApplyLogsOrchestrator | Applies log entries to segment writers Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1 |
Error Handling
The system uses a consistent error code hierarchy for reliable error management:
| Error Code | Description |
|---|---|
| InvalidArgument | Client-provided invalid parameters |
| Internal | System-level internal errors |
| ResourceExhausted | Resource limits reached (e.g., task abortion) |
Sources: rust/blockstore/src/arrow/block/types.rs:1-20
Deployment Options
Self-Hosting
Chroma can be deployed on-premises or in cloud environments using Docker, Kubernetes, or direct installation.
Deployment Requirements:
| Component | Specification |
|---|---|
| Storage | Persistent volume for vector data |
| Network | Port 8000 for API access |
| Auth | Optional token or basic authentication (v0.4.7+) |
Sources: examples/deployments/do-terraform/README.md:1-50
Starting the Server:
# Install via pip
pip install chromadb
# Run in client-server mode
chroma run --path /chroma_db_path
Sources: README.md:14-16
Chroma Cloud
Chroma Cloud provides a fully managed hosted service with:
- Serverless vector search
- Hybrid search capabilities
- Full-text search integration
- Automatic scaling
- $5 free credits for new users
Sources: README.md:23-29
Cloud Deployment (Terraform Example)
For DigitalOcean deployment:
export TF_VAR_do_token=<DIGITALOCEAN_TOKEN>
export TF_ssh_public_key="./chroma-do.pub"
export TF_ssh_private_key="./chroma-do"
export TF_VAR_chroma_release="0.4.12"
export TF_VAR_region="ams2"
export TF_VAR_public_access="true"
export TF_VAR_enable_auth="true"
export TF_VAR_auth_type="token"
terraform apply -auto-approve
Sources: examples/deployments/do-terraform/README.md:30-45
CLI Tool
The Rust-based CLI provides command-line management capabilities:
chroma run --path <db_path> # Run the server
chroma db create <db_name> # Create database
chroma db list # List databases
chroma login # Authenticate with Chroma Cloud
chroma profile # Manage profiles
chroma install # Install updates
chroma update # Check for updates
Sources: rust/cli/src/lib.rs:1-30
Embedding Integration
Ollama Integration
The JavaScript client supports Ollama for local embedding generation:
Configuration Options:
| Option | Default | Description |
|---|---|---|
| url | http://localhost:11434 | Ollama server URL |
| model | chroma/all-minilm-l6-v2-f32 | Embedding model |
Supported Models:
| Model | Dimensions | Use Case |
|---|---|---|
| chroma/all-minilm-l6-v2-f32 | 384 | General purpose (default) |
| nomic-embed-text | 768 | Extended context |
| mxbai-embed-large | 1024 | High accuracy |
| snowflake-arctic-embed | Variable | Domain-specific |
Sources: clients/new-js/packages/ai-embeddings/ollama/README.md:1-40
API Response Format
Get Response Structure
Query results are returned with flexible inclusion options:
pub struct GetResponse {
pub ids: Vec<String>,
pub embeddings: Option<Vec<Vec<f32>>>, // Optional
pub documents: Option<Vec<Option<String>>>, // Optional
pub uris: Option<Vec<Option<String>>>, // Optional
pub metadatas: Option<Vec<Option<Metadata>>>, // Optional
pub include: IncludeList,
}
Sources: rust/types/src/api_types.rs:1-30
License
Chroma is released under the Apache 2.0 license, making it suitable for both commercial and open-source projects. Sources: README.md:10
Community and Support
| Resource | Link |
|---|---|
| Documentation | https://docs.trychroma.com/ |
| Discord | https://discord.gg/MMeYNTmh3x |
| Homepage | https://www.trychroma.com/ |
Sources: clients/new-js/packages/chromadb/README.md:9-27
Getting Started with Chroma
Related topics: Chroma Overview, Python Client SDK
Chroma is an open-source data infrastructure for AI that provides vector, hybrid, and full-text search capabilities. It enables developers to build AI applications by storing embeddings, documents, and metadata with efficient querying mechanisms.
Overview
Chroma serves as a vector database optimized for AI workloads. It allows you to:
- Store embeddings alongside documents and metadata
- Query using text or embedding vectors
- Filter results based on metadata
- Work with multiple programming languages including Python and JavaScript
Installation
Python Client
Install the Python client using pip:
pip install chromadb
For a lightweight HTTP-only client that connects to a Chroma server:
pip install chromadb-client
Sources: clients/python/README.md
JavaScript/TypeScript Client
For the new JavaScript client:
npm install chromadb
For a lighter package with optional dependencies:
npm install chromadb-client
Sources: clients/new-js/packages/chromadb/README.md
Basic Setup and Configuration
Python Client Setup
Connect to a Chroma server running locally:
import chromadb
client = chromadb.HttpClient(host="localhost", port=8000)
Sources: clients/python/README.md
JavaScript Client Setup
import { ChromaClient } from "chromadb";
const chroma = new ChromaClient();
const collection = await chroma.createCollection({ name: "test-from-js" });
Sources: clients/new-js/packages/chromadb/README.md
Running Chroma Server
To run Chroma in client-server mode:
chroma run --path /chroma_db_path
Sources: README.md
Core Operations
Creating a Collection
Collections are containers for your documents, embeddings, and metadata.
collection = client.create_collection("all-my-documents")
Adding Documents
Add documents with optional embeddings, metadata, and unique IDs:
collection.add(
documents=["This is document1", "This is document2"],
metadatas=[{"source": "notion"}, {"source": "google-docs"}],
ids=["doc1", "doc2"],
embeddings=[[1.2, 2.1, ...], [1.2, 2.1, ...]]
)
Sources: clients/python/README.md
Querying Documents
Query the collection using text or embeddings:
results = collection.query(
query_texts=["This is a query document"],
n_results=2
)
const queryData = await collection.query({
queryEmbeddings: [[1, 2, 3, 4, 5]],
queryTexts: ["test"],
});
Sources: clients/python/README.md and clients/new-js/packages/chromadb/README.md
Embedding Functions
Chroma supports various embedding providers through configurable embedding functions.
Configuration Schema
Embedding functions use JSON Schema validation to ensure cross-language compatibility:
from chromadb.utils.embedding_functions.schemas import validate_config
config = {
"api_key_env_var": "CHROMA_OPENAI_API_KEY",
"model_name": "text-embedding-ada-002"
}
validate_config(config, "openai")
Each schema follows the JSON Schema Draft-07 specification and includes version, title, description, properties, required fields, and additionalProperties settings.
Sources: chromadb/utils/embedding_functions/schemas/README.md
Available Embedding Providers
| Provider | Package | API Key Environment Variable |
|---|---|---|
| OpenAI | @chroma-core/openai | CHROMA_OPENAI_API_KEY |
| Cohere | @chroma-core/cohere | COHERE_API_KEY |
| Jina | @chroma-core/jina | JINA_API_KEY |
| Google Gemini | @chroma-core/google-gemini | GOOGLE_API_KEY |
| Hugging Face | @chroma-core/hugging-face | HF_API_KEY |
| Ollama | @chroma-core/ollama | OLLAMA_API_KEY |
| Together AI | @chroma-core/together-ai | TOGETHER_API_KEY |
| Voyage AI | @chroma-core/voyageai | VOYAGE_API_KEY |
| xAI | @chroma-core/xai | XAI_API_KEY |
Sources: clients/new-js/packages/ai-embeddings/all/README.md
Using Embedding Functions
import { ChromaClient } from 'chromadb';
import { JinaEmbeddingFunction } from '@chroma-core/jina';
const embedder = new JinaEmbeddingFunction({
apiKey: 'your-api-key',
modelName: 'jina-embeddings-v2-base-en',
task: 'retrieval.passage',
dimensions: 768,
lateChunking: false,
truncate: true,
normalized: true,
embeddingType: 'float'
});
const collection = await client.createCollection({
name: 'my-collection',
embeddingFunction: embedder,
});
Sources: clients/new-js/packages/ai-embeddings/jina/README.md
Common Utilities
The @chroma-core/ai-embeddings-common package provides shared utilities:
import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';
// Convert camelCase to snake_case
const snakeCaseConfig = snakeCase({ modelName: 'text-embedding-3-small' });
// Result: { model_name: 'text-embedding-3-small' }
// Check environment
if (isBrowser()) {
// Browser-specific logic
}
Sources: clients/new-js/packages/ai-embeddings/common/README.md
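A Python rendering of the key-conversion helper, for readers outside the JS ecosystem (illustrative; the real helper lives in @chroma-core/ai-embeddings-common):

```python
import re

def snake_case_keys(config: dict) -> dict:
    """Convert camelCase keys to snake_case, mirroring the JS snakeCase helper."""
    def convert(key: str) -> str:
        # Insert an underscore before every interior uppercase letter, then lowercase.
        return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()
    return {convert(k): v for k, v in config.items()}
```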
JavaScript Client Packages
chromadb vs chromadb-client
| Feature | chromadb | chromadb-client |
|---|---|---|
| Package size | Larger | Smaller |
| Dependencies | Bundled | Optional peer dependencies |
| Use case | Quick setup | Production with specific providers |
The chromadb-client package is ideal for production environments where you only use specific embedding providers.
Sources: clients/js/packages/chromadb-client/README.md
Chroma Cloud
Chroma Cloud provides a hosted service for serverless vector, hybrid, and full-text search. To use Chroma Cloud:
- Sign up at trychroma.com
- Create a database
- Get your API key from the dashboard
Configure environment variables for cloud access:
export CHROMA_API_KEY=your-api-key
export CHROMA_TENANT=your-tenant
export CHROMA_DATABASE=your-database
Sources: README.md and rust/chroma/README.md
Environment Variables
| Variable | Description |
|---|---|
| CHROMA_API_KEY | API key for Chroma Cloud authentication |
| CHROMA_TENANT | Sets the tenant (auto-inferred with API key) |
| CHROMA_DATABASE | Sets the database (auto-inferred with scoped API key) |
| [PROVIDER]_API_KEY | Provider-specific API keys (e.g., OPENAI_API_KEY) |
For local development, you can use:
let client = ChromaHttpClient::from_env()?;
Sources: rust/chroma/README.md
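Read in Python, the same environment-driven configuration might look like this (a sketch using the variable names from the table above; the helper function itself is hypothetical):

```python
import os

def cloud_settings() -> dict:
    """Collect Chroma Cloud connection settings from the environment (sketch)."""
    api_key = os.environ.get("CHROMA_API_KEY")
    if api_key is None:
        raise RuntimeError("CHROMA_API_KEY must be set for Chroma Cloud access")
    return {
        "api_key": api_key,
        # Tenant and database can be inferred from a scoped API key,
        # so both are optional here.
        "tenant": os.environ.get("CHROMA_TENANT"),
        "database": os.environ.get("CHROMA_DATABASE"),
    }
```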
Complete Example Workflow
graph TD
A[Install Chroma Client] --> B[Initialize Client]
B --> C[Create Collection]
C --> D[Add Documents with Embeddings]
D --> E[Query Collection]
E --> F[Get Results]
G[Configure Embedding Function] --> D
H[Add Metadata] --> D
I[Set API Keys] --> B
Quick Reference Commands
Installation
# Python
pip install chromadb
# JavaScript
npm install chromadb
# Start server
chroma run --path /chroma_db_path
Basic Operations
| Operation | Python | JavaScript |
|---|---|---|
| Create client | client = chromadb.HttpClient() | new ChromaClient() |
| Create collection | client.create_collection(name) | client.createCollection({name}) |
| Add documents | collection.add(...) | collection.add(...) |
| Query | collection.query(...) | collection.query(...) |
Additional Resources
Sources: [clients/python/README.md](https://github.com/chroma-core/chroma/blob/main/clients/python/README.md)
System Architecture Overview
Related topics: Rust Backend Services Architecture, Go Coordinator & Distributed Systems, Protocol Buffers & gRPC API
Introduction
Chroma is an open-source data infrastructure platform designed for AI applications, providing vector, hybrid, and full-text search capabilities. The system is built as a distributed, scalable architecture that handles embedding storage, indexing, and query execution across multiple components. Chroma positions itself as the open-source alternative to hosted vector database services, enabling developers to deploy sophisticated AI search infrastructure while maintaining full control over their data.
The architecture follows a modular design pattern with distinct components for API serving, query processing, data storage, and system coordination. Each component is responsible for specific aspects of the data pipeline, from ingestion through indexing to query execution.
High-Level Architecture
Chroma's architecture consists of three primary layers working in concert to provide vector search capabilities:
- Frontend Layer - Handles API requests and response formatting
- Worker Layer - Executes query operations and manages indexing
- System Database (SysDB) Layer - Maintains metadata and system state
graph TD
A[Client Application] --> B[Frontend Server]
B --> C[Worker Servers]
C --> D[SysDB]
C --> E[Blockstore]
E --> F[Arrow Files]
D --> G[Collection Metadata]
G --> H[Topology Information]
Component Architecture
Frontend Server
The frontend server component serves as the API gateway for Chroma, handling incoming HTTP/gRPC requests and translating them into internal operations. The frontend is responsible for request validation, authentication handling, and response serialization.
Key Responsibilities:
| Responsibility | Description |
|---|---|
| API Endpoint Handling | Exposes REST and gRPC endpoints for collection operations |
| Request Validation | Validates incoming query parameters and payload structures |
| Response Serialization | Converts internal data structures to API response formats |
| Error Mapping | Translates internal errors to appropriate HTTP status codes |
Sources: rust/frontend/src/server.rs:1-50
The frontend server implements the ChromaError trait for consistent error handling across the system. Error codes are mapped as follows:
| Internal Error | HTTP Status Code |
|---|---|
| InvalidArgument | 400 Bad Request |
| NotFound | 404 Not Found |
| Internal | 500 Internal Server Error |
| Unavailable | 503 Service Unavailable |
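The mapping in the table can be expressed as a small lookup (a sketch; defaulting unknown codes to 500 is an assumption here, not confirmed frontend behavior):

```python
ERROR_TO_HTTP = {
    "InvalidArgument": 400,  # Bad Request
    "NotFound": 404,         # Not Found
    "Internal": 500,         # Internal Server Error
    "Unavailable": 503,      # Service Unavailable
}

def http_status(error_code: str) -> int:
    """Map an internal error code to an HTTP status, defaulting to 500."""
    return ERROR_TO_HTTP.get(error_code, 500)
```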
Worker Server
The worker server handles the core data operations including embedding storage, indexing, and query execution. Workers are the primary compute units in Chroma's architecture, responsible for processing search requests and maintaining index structures.
Sources: rust/worker/src/server.rs:1-60
Worker Components:
graph LR
A[Query Request] --> B[Query Planner]
B --> C[HNSW Index]
B --> D[Spann Index]
B --> E[Record Segment]
B --> F[Metadata Segment]
C --> G[Result Merger]
D --> G
E --> G
F --> G
G --> H[Response]
The worker server implements orchestration components for managing complex operations:
- ApplyLogsOrchestrator - Coordinates log application and compaction
- WorkQueueClient - Manages distributed task execution
- Segment Writers - Handles data persistence for different segment types
Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-80
System Database (SysDB)
The SysDB component maintains all metadata about collections, segments, and system topology. It provides a centralized view of the system's state and enables coordination across multiple workers.
SysDB Responsibilities:
| Function | Description |
|---|---|
| Collection Metadata | Stores collection configurations and schemas |
| Segment Registry | Tracks active segments and their locations |
| Topology Management | Manages provider-region mappings for distributed deployments |
| Transaction Coordination | Ensures consistency across distributed operations |
Sources: rust/sysdb/src/sysdb.rs:1-100
The SysDB uses a provider-region topology model that supports multi-cloud and multi-region deployments:
pub struct ProviderRegion<T> {
name: RegionName,
provider: String, // e.g., "aws", "gcp"
region: String, // e.g., "us-east-1"
config: T, // Provider-specific configuration
}
Sources: rust/types/src/topology.rs:1-60
Data Model Architecture
Collection Schema
Collections in Chroma follow a flexible schema model that supports multiple index types and data fields.
graph TD
A[Collection] --> B[Record Segment]
A --> C[Metadata Segment]
A --> D[Vector Index]
A --> E[Sparse Vector Index]
D --> F[HNSW Index]
D --> G[Spann Index]
Supported Index Types:
| Index Type | Purpose | Key Configuration |
|---|---|---|
| Vector Index | Dense embeddings | Space (Cosine, L2, Dot), HNSW params |
| Sparse Vector Index | BM25-style inverted index | StringInvertedIndexConfig |
| Spann Index | Memory-efficient approximate search | InternalSpannConfiguration |
Sources: rust/types/src/collection_schema.rs:1-150
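To make the distance spaces concrete, here is a plain-Python sketch using the usual conventions for these spaces (cosine distance as 1 − cosine similarity, l2 as squared Euclidean, inner product as 1 − dot product; treat the exact conventions as an assumption here):

```python
import math

def distance(a, b, space="cosine"):
    """Compute the distance between two vectors under a named space."""
    dot = sum(x * y for x, y in zip(a, b))
    if space == "l2":
        # Squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))
    if space == "ip":
        # Inner-product distance
        return 1.0 - dot
    # cosine: 1 - (a . b) / (|a| |b|)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```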
API Types
The API layer defines core types for query operations:
| Type | Purpose |
|---|---|
| Include | Specifies which fields to return (distances, documents, embeddings, metadatas, uris) |
| IncludeList | Collection of Include values with convenience constructors |
| WhereDocumentOperator | Document filtering (Contains, NotContains, Regex, NotRegex) |
Sources: rust/types/src/api_types.rs:1-100
pub enum Include {
Distance,
Document,
Embedding,
Metadata,
Uri,
}
impl IncludeList {
pub fn default_query() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance])
}
pub fn all() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance, Include::Embedding, Include::Uri])
}
}
Metadata Filtering
Chroma supports rich metadata filtering through the MetadataExpression and MetadataComparison types:
graph TD
A[MetadataExpression] --> B[key: String]
A --> C[comparison: MetadataComparison]
C --> D[Primitive: Operator + Value]
C --> E[Set: Operator + SetValue]
Sources: rust/types/src/metadata.rs:1-80
Blockstore Architecture
The blockstore provides persistent storage for indexed data using Apache Arrow format for efficient serialization and querying.
Arrow Block Structure
graph LR
A[Write Operation] --> B[Block Delta]
B --> C[Commit to Block]
C --> D[Arrow IPC Format]
D --> E[Disk Storage]
E --> F[BlockfileReader]
Block Types:
| Block Type | Description |
|---|---|
| OrderedBlockDelta | Sequential writes with ordering guarantees |
| UnorderedBlockDelta | High-throughput writes without ordering |
| DirectoryBlock | Sparse posting directory entries |
Sources: rust/blockstore/src/arrow/block/types.rs:1-100
The Arrow layout verification ensures data integrity:
pub enum ArrowLayoutVerificationError {
BufferLengthNotAligned,
NoRecordBatches,
MultipleRecordBatches,
InvalidMessageType,
RecordBatchDecodeError,
}
Sparse Posting Blocks
Sparse vectors use a specialized block format for efficient storage:
body = [ max_offset: u32 LE, max_weight: f32 LE ] × num_entries
The DirectoryBlock stores per-posting-block metadata for term pruning:
- max_offset: Largest document offset in the posting block
- max_weight: Largest weight in the posting block
Sources: rust/types/src/sparse_posting_block.rs:1-60
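The body layout can be packed and unpacked with Python's struct module (an illustrative sketch of the wire layout, not Chroma's code):

```python
import struct

ENTRY = struct.Struct("<If")  # u32 max_offset, f32 max_weight, little-endian

def pack_directory(entries):
    """Serialize (max_offset, max_weight) pairs into the 8-byte-per-entry body."""
    return b"".join(ENTRY.pack(off, w) for off, w in entries)

def unpack_directory(body):
    """Recover the (max_offset, max_weight) pairs from a packed body."""
    return [ENTRY.unpack_from(body, i) for i in range(0, len(body), ENTRY.size)]
```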
Spann Index Architecture
Spann is Chroma's memory-efficient approximate nearest neighbor index that combines HNSW with posting lists.
graph TD
A[SpannIndexWriter] --> B[HNSW Index]
A --> C[Posting Lists]
A --> D[Versions Map]
A --> E[MaxHeadID Blockfile]
B --> F[Reader with adaptive search]
SpannIndexReader Structure:
| Component | Type | Purpose |
|---|---|---|
| posting_lists | BlockfileReader<u32, SpannPostingList> | Term postings |
| hnsw_index | HnswIndexRef | Graph-based search |
| versions_map | BlockfileReader<u32, u32> | Version tracking |
| dimensionality | usize | Vector dimension |
| adaptive_search_nprobe | bool | Adaptive parameter |
Sources: rust/index/src/spann/types.rs:1-80
Indexing Pipeline
The indexing pipeline handles document ingestion through the following stages:
graph LR
A[Add Records] --> B[ApplyLogsOrchestrator]
B --> C[Record Segment Writer]
B --> D[Metadata Segment Writer]
B --> E[Vector Index Writer]
C --> F[Flush to Blockstore]
D --> F
E --> F
F --> G[Collection Update]
Error Handling:
The orchestrator implements comprehensive error tracking:
| Error Type | Error Code | Tracing |
|---|---|---|
| ApplyLog | Internal | Yes |
| Channel | Internal | Yes |
| Commit | Internal | Yes |
| HnswSegment | Internal | Yes |
| MetadataSegment | Internal | Yes |
| Seal | Internal | Yes |
| InvariantViolation | - | Always |
Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-100
Query Execution Flow
Query Request Processing
graph TD
A[Query Request] --> B[Parse Query]
B --> C[Load Segments]
C --> D[Parallel Segment Queries]
D --> E{HNSW Search}
D --> F{Spann Search}
D --> G{Record Scan}
E --> H[Merge Results]
F --> H
G --> H
H --> I[Apply Filters]
I --> J[Return Results]
Work Queue Integration
Distributed query execution uses a work queue system for task coordination:
graph TD
A[Coordinator] --> B[WorkQueueClient]
B --> C[gRPC Channel]
C --> D[Worker Pool]
D --> E[Task Execution]
E --> F[Result Collection]
Error Code Mapping:
| gRPC Code | Chroma Error Code |
|---|---|
| Unavailable | Unavailable |
| DeadlineExceeded | DeadlineExceeded |
| ResourceExhausted | ResourceExhausted |
| NotFound | NotFound |
| InvalidArgument | InvalidArgument |
Sources: rust/worker/src/work_queue/work_queue_client.rs:1-80
Deployment Topology
Chroma supports flexible deployment configurations through its topology model:
graph TD
A[Topology] --> B[TopologyName]
A --> C[Vec<RegionName>]
A --> D[Config T]
C --> E[ProviderRegion]
E --> F[Provider]
E --> G[Region]
The topology system enables:
- Multi-cloud deployments (AWS, GCP, Azure)
- Region-specific configurations
- Custom provider extensions
Summary
Chroma's architecture provides a scalable foundation for AI-powered search with several key design principles:
- Separation of Concerns - Frontend, worker, and SysDB components handle distinct responsibilities
- Arrow-Based Storage - Efficient columnar storage for analytical queries
- Flexible Indexing - Support for HNSW, Spann, and sparse vector indexes
- Distributed Coordination - Work queues and topology management for multi-node deployments
- Comprehensive Error Handling - Consistent error codes and tracing across all components
The modular architecture allows Chroma to scale from single-node development deployments to distributed production clusters serving AI applications at scale.
Sources: rust/frontend/src/server.rs:1-50
Protocol Buffers & gRPC API
Related topics: System Architecture Overview, Rust Backend Services Architecture
Chroma uses Protocol Buffers (protobuf) as the core serialization format for inter-service communication and data persistence. The IDL (Interface Definition Language) files in the idl/ directory define the service APIs, data structures, and message types that power Chroma's distributed architecture.
Architecture Overview
Chroma employs a client-server architecture where Protocol Buffers serve as the contract between components. The protobuf definitions are centralized in the idl/ directory and used to generate code for multiple language runtimes including Python, JavaScript, Go, and Rust.
graph TD
subgraph "Client Layer"
JS[JavaScript Client]
PY[Python Client]
GO[Go Client]
end
subgraph "IDL Definitions"
PROTO[Protocol Buffer Definitions]
end
subgraph "Server Layer"
API[API Server]
COORD[Coordinator Service]
QUERY[Query Executor]
end
JS -->|Generated TS Bindings| PROTO
PY -->|Generated Python Stub| PROTO
GO -->|Generated Go Code| PROTO
API -->|gRPC/prost| PROTO
COORD -->|gRPC/prost| PROTO
QUERY -->|gRPC/prost| PROTO
Proto Definitions Structure
Core Service Definitions
The main protobuf definitions are organized in idl/chromadb/proto/:
| Proto File | Purpose | Key Messages |
|---|---|---|
| chroma.proto | Core data types and collection operations | Collection, Database, OperationRecord |
| coordinator.proto | Coordinator service for cluster management | Tenant, Database, Segment operations |
| query_executor.proto | Query execution service interface | Query requests and responses |
Data Type Coverage
The protobuf definitions cover all core data types used throughout Chroma:
| Data Type | Usage |
|---|---|
| Vector | Embedding vectors with scalar encoding |
| OperationRecord | CRUD operations for records |
| LogRecord | Write-ahead log entries with offsets |
| Metadata | Key-value metadata for filtering |
| Collection | Collection configuration and schema |
| Cmek | Customer-managed encryption keys |
Rust Type Conversions
Chroma's Rust backend uses protobuf-generated types and converts them to idiomatic Rust types through TryFrom implementations. This pattern ensures type safety and clean separation between the wire format and internal representations.
Record Conversions
The rust/types/src/record.rs file contains conversion logic between protobuf and Rust types:
graph LR
A[chroma_proto::LogRecord] -->|TryFrom| B[LogRecord Rust]
A2[chroma_proto::Vector] -->|TryFrom| B2[(Vec<f32>, ScalarEncoding)]
OperationRecord Conversion (Sources: rust/types/src/record.rs:recordinfo)
The OperationRecord conversion extracts metadata and document fields from protobuf representations:
// Metadata is extracted from proto, with document potentially in metadata
let (metadata, document) = match operation_record_proto.metadata {
Some(proto_metadata) => match UpdateMetadata::try_from(proto_metadata) {
Ok(mut metadata) => {
let document = metadata.remove(CHROMA_DOCUMENT_KEY);
match document {
Some(UpdateMetadataValue::Str(document)) => {
(Some(metadata), Some(document))
}
_ => (Some(metadata), None),
}
}
Err(e) => return Err(RecordConversionError::...),
},
None => (None, None),
};
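In Python terms, the extraction looks roughly like this (a sketch; the placeholder value for CHROMA_DOCUMENT_KEY is an assumption, the real constant lives in the Rust types crate):

```python
# Placeholder for the reserved key; the actual constant name comes from the
# Rust code above, but its value here is an assumption for illustration.
CHROMA_DOCUMENT_KEY = "chroma:document"

def split_metadata(proto_metadata):
    """Pop the reserved document entry out of a metadata mapping."""
    if proto_metadata is None:
        return None, None
    metadata = dict(proto_metadata)
    document = metadata.pop(CHROMA_DOCUMENT_KEY, None)
    if not isinstance(document, str):
        document = None  # only string values count as a document
    return metadata, document
```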
Vector Type Conversions
Vectors are stored with their encoding information (Sources: rust/types/src/record.rs:vector)
impl TryFrom<chroma_proto::Vector> for (Vec<f32>, ScalarEncoding) {
type Error = VectorConversionError;
// Conversion implementation
}
Metadata Filtering Types
The metadata system supports rich filtering expressions defined in protobuf and converted to Rust types (Sources: rust/types/src/metadata.rs:metadata-types)
Document Operators
graph TD
DOC_OPS[WhereDocumentOperator] --> Contains
DOC_OPS --> NotContains
DOC_OPS --> Regex
DOC_OPS --> NotRegex
| Operator | Description |
|---|---|
| Contains | Document contains substring |
| NotContains | Document does not contain substring |
| Regex | Document matches regex pattern |
| NotRegex | Document does not match regex pattern |
Metadata Expression Structure
pub struct MetadataExpression {
pub key: String,
pub comparison: MetadataComparison,
}
Metadata comparisons support both primitive types (strings, integers, floats, booleans) and set operations.
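A minimal sketch of how such comparisons evaluate, using Chroma's $-prefixed operator names (illustrative, not the server implementation):

```python
def matches_metadata(value, comparison: dict) -> bool:
    """Evaluate one metadata comparison clause against a stored value."""
    op, arg = next(iter(comparison.items()))
    ops = {
        # Primitive comparisons
        "$eq": lambda v, a: v == a,
        "$ne": lambda v, a: v != a,
        "$gt": lambda v, a: v > a,
        "$gte": lambda v, a: v >= a,
        "$lt": lambda v, a: v < a,
        "$lte": lambda v, a: v <= a,
        # Set comparisons
        "$in": lambda v, a: v in a,
        "$nin": lambda v, a: v not in a,
    }
    return ops[op](value, arg)
```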
Collection Schema Definitions
Schema definitions in rust/types/src/collection_schema.rs define how collections are configured for indexing (Sources: rust/types/src/collection_schema.rs:schema-struct)
Schema Builder Pattern
The Schema struct provides a fluent builder API for index configuration:
graph TD
SCHEMA[Schema::default] --> CREATE_INDEX[.create_index]
CREATE_INDEX --> VALIDATE[Validate Index Config]
VALIDATE -->|Valid| RETURN[Return Self]
VALIDATE -->|Invalid| ERROR[SchemaBuilderError]
Index Creation Example (Sources: rust/types/src/collection_schema.rs:create-index-example)
let schema = Schema::default()
.create_index(None, VectorIndexConfig {
space: Some(Space::Cosine),
embedding_function: None,
source_key: None,
hnsw: None,
spann: None,
}.into())?
.create_index(Some("category"), StringInvertedIndexConfig {}.into())?;
Supported Index Types
| Index Type | Configuration | Applies To |
|---|---|---|
| VectorIndexConfig | HNSW, Space (Cosine/L2/IP), embedding function | #embedding key only |
| StringInvertedIndexConfig | String indexing | Custom string keys |
| FtsIndexConfig | Full-text search | Document key |
CMEK (Customer-Managed Encryption Keys)
Chroma supports customer-managed encryption keys through the Cmek type defined in protobuf (Sources: rust/types/src/collection_schema.rs:cmek)
CMEK Provider Configuration
| Provider | Validation Pattern | Resource Format |
|---|---|---|
| GCP | CMEK_GCP_RE regex | GCP resource identifier |
impl Cmek {
pub fn gcp(resource: String) -> Self;
pub fn validate_pattern(&self) -> bool;
}
Topology and Region Management
For multi-region deployments, Chroma uses topology definitions (Sources: rust/types/src/topology.rs:topology)
Provider Region Structure
classDiagram
class ProviderRegion {
+name: RegionName
+provider: String
+region: String
+config: T
}
class Topology {
+name: TopologyName
+regions: Vec~RegionName~
+config: T
}
| Component | Description |
|---|---|
| ProviderRegion | Single cloud provider region configuration |
| Topology | Collection of regions forming a deployment topology |
Code Generation Pipeline
Build Process
Protobuf definitions are compiled to target languages using protoc and language-specific plugins (Sources: go/README.md:protobuf-setup)
graph LR
A[.proto files] --> B[protoc compiler]
B -->|Python| C[Python stubs]
B -->|Go| D[Go gRPC code]
B -->|JS/TS| E[TypeScript definitions]
B -->|Rust| F[Rust + prost]
Required Tools
| Tool | Purpose |
|---|---|
| protoc | Protocol Buffer compiler |
| protoc-gen-go | Go code generation |
| protoc-gen-go-grpc | Go gRPC service generation |
Generated API Patterns
The generated TypeScript API in clients/js/packages/chromadb-core/src/generated/api.ts follows standard gRPC-web patterns (Sources: clients/js/packages/chromadb-core/src/generated/api.ts:fetch-pattern)
const localVarFetchArgs = ApiApiFetchParamCreator(configuration).version(options);
return (fetch: FetchAPI = defaultFetch, basePath: string = BASE_PATH) => {
return fetch(
basePath + localVarFetchArgs.url,
localVarFetchArgs.options,
).then((response) => {
// Handle response by content type and status
if (response.status === 200) {
if (mimeType === "application/json") {
return response.json();
}
}
// Error handling for 401, 404, 409, 500
});
};
Error Code Mapping
Error types are mapped from Rust/Arrow errors to Chroma error codes (Sources: rust/blockstore/src/arrow/root.rs:error-mapping)
| Arrow Error Type | Chroma Error Code |
|---|---|
| IOError | Internal |
| ArrowError | Internal |
| LayoutVerificationError | Internal |
| FromBytesError variants | InvalidArgument / Internal |
Message Format Details
Arrow Block Serialization
Binary data in protobuf messages uses Arrow IPC format for efficient columnar storage (Sources: rust/blockstore/src/arrow/root.rs:arrow-reader)
let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
let record_batch = match arrow_reader {
Ok(mut reader) => match reader.next() {
Some(Ok(batch)) => batch,
Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
None => return Err(FromBytesError::NoDataError),
},
Err(e) => return Err(FromBytesError::ArrowError(e)),
};
IPC Footer Structure
The Arrow footer format requires:
- ARROW_MAGIC header (6 bytes)
- Footer content
- Footer length (4 bytes)
- Footer checksum
See Also
- Rust Types Module - Internal Rust type definitions
- Block Store Architecture - Data persistence with Arrow
- Client SDKs - Multi-language client implementations
- Go Server Implementation - Server-side gRPC implementation
Source: https://github.com/chroma-core/chroma / Human Manual
Python Client SDK
Related topics: Getting Started with Chroma, JavaScript/TypeScript Client SDKs, Embedding Functions Integration
Python Client SDK
The Chroma Python Client SDK is the official Python library for interacting with Chroma, an open-source vector database designed for AI applications. This SDK provides a complete interface for managing collections, storing embeddings, and performing similarity searches across vector data.
Overview
Chroma positions itself as the open-source data infrastructure for AI, offering developers a streamlined way to incorporate vector search capabilities into their applications. The Python Client SDK serves as the primary client library for Python developers, enabling seamless integration with Chroma's vector database capabilities.
The SDK supports two primary modes of operation: embedded mode, where the database runs locally within the same process, and client-server mode, where the Python client communicates with a remote Chroma server via HTTP. This flexibility allows developers to choose the deployment architecture that best fits their application requirements, whether they need a lightweight local setup for development and testing or a scalable server-based deployment for production environments.
For Python-specific installations, developers can choose between the full chromadb package, which includes all embedding libraries as dependencies, or the chromadb-client package, which is a lightweight HTTP-only client that connects to a running Chroma server. The installation is straightforward via pip, making it accessible for projects of all sizes.
The SDK is designed with developer productivity in mind, providing intuitive APIs for common operations like adding documents, querying collections, and managing metadata. It handles the complexity of embedding generation and vector storage behind a clean, Pythonic interface, allowing developers to focus on building their AI applications rather than managing low-level database operations.
Architecture
The Python Client SDK follows a layered architecture that separates concerns between the client interface, API communication, and data models. Understanding this architecture helps developers effectively use the SDK and troubleshoot any issues that may arise during development.
graph TD
A[Application Code] --> B[ChromaClient / AsyncChromaClient]
B --> C[Collection API]
B --> D[Embedding Functions]
C --> E[REST API Layer]
D --> F[External Embedding Providers]
E --> G[Chroma Server]
E --> H[Embedded Mode]
G --> I[Persistent Storage]
H --> I
Client Layer
The client layer forms the entry point for all SDK operations. Chroma provides two client implementations: the synchronous Client class for traditional Python applications and the AsyncClient class for asynchronous applications built with async/await patterns.
The synchronous client is suitable for most use cases, providing blocking API calls that execute immediately and return results. This approach is familiar to developers coming from traditional Python backgrounds and works well in scripts, batch processing jobs, and web applications that don't require high concurrency.
The asynchronous client, on the other hand, is designed for applications that need to handle many concurrent operations efficiently, such as web servers built on frameworks like FastAPI or Starlette. By using Python's asyncio library, the async client can perform multiple network operations concurrently, improving throughput in I/O-bound scenarios.
Both clients share a similar interface, with the async client simply wrapping the underlying HTTP calls with async/await syntax. This consistency makes it easy to switch between synchronous and asynchronous code as requirements evolve.
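The concurrency benefit can be sketched with plain asyncio. The fake_query coroutine below is a hypothetical stand-in for calls on Chroma's async client, used so the sketch has no external dependencies; with the real SDK you would await methods on the async client instead.

```python
import asyncio

async def fake_query(term: str) -> dict:
    """Stand-in for an awaitable Chroma query; simulates network latency."""
    await asyncio.sleep(0.01)
    return {"query": term, "ids": [f"{term}-doc-1"]}

async def main() -> list:
    # asyncio.gather runs the I/O-bound queries concurrently, which is
    # exactly where the async client outperforms the blocking one.
    terms = ["alpha", "beta", "gamma"]
    return await asyncio.gather(*(fake_query(t) for t in terms))

results = asyncio.run(main())
```

Because gather preserves argument order, results line up with the input terms even though the queries overlap in time.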
Collection Management
Collections serve as the primary organizational unit in Chroma, analogous to tables in traditional relational databases or buckets in object storage. Each collection contains a set of vectors along with their associated metadata, documents, and unique identifiers.
The SDK provides a comprehensive collection API that supports creating new collections, retrieving existing ones, listing all collections in the database, and deleting collections when they're no longer needed. Collections can be configured with specific settings at creation time, including the embedding function to use for auto-embedding documents and the name of the collection for identification purposes.
Collections maintain a schema-like structure through their use of metadata. While Chroma is schemaless in the traditional sense, the metadata associated with vectors allows developers to impose structure on their data for filtering and organization purposes.
Data Model
The data model in Chroma revolves around four core concepts: vectors, documents, metadata, and IDs. Each record in a collection consists of these four components, providing a flexible yet structured way to store and retrieve information.
Vectors are the mathematical representations of data in embedding space. They can be provided directly by the application or generated automatically using embedding functions. The SDK accepts vectors as lists of floating-point numbers, making it compatible with output from virtually any embedding model.
Documents are the original text or content that was transformed into vectors. Storing documents alongside their vectors enables applications to retrieve the original content during query operations without needing to maintain a separate document store.
Metadata provides contextual information about each record. Examples include the source of the document, timestamps, user IDs, or any other application-specific attributes. Metadata can be used for filtering during queries, allowing applications to narrow search results based on specific criteria.
IDs uniquely identify each record within a collection. The SDK accepts string identifiers, giving applications flexibility in how they choose to name and reference their data. Common patterns include using UUIDs, meaningful string identifiers derived from the document content, or sequential numbers.
Installation and Setup
Installing the Chroma Python Client SDK is straightforward using pip, Python's package manager. The SDK is available in two variants to accommodate different use cases and deployment scenarios.
pip install chromadb
This command installs the full Chroma package, which includes all core functionality plus built-in support for various embedding providers. This variant is recommended for most users who want a complete, self-contained installation.
pip install chromadb-client
This command installs only the HTTP client library, which is useful for scenarios where the Chroma server runs separately or where a minimal dependency footprint is required. This variant connects to Chroma servers via HTTP and doesn't include embedding provider libraries.
Client Initialization
Initializing the Chroma client depends on the deployment mode and desired configuration. The SDK provides flexible initialization options to accommodate different environments.
Embedded Mode
In embedded mode, Chroma runs entirely within your Python process, storing data locally. This is ideal for development, testing, and small-scale deployments where a separate server isn't required.
import chromadb
client = chromadb.Client()
The default chromadb.Client() is ephemeral: data is held in memory and discarded when the process exits, which is convenient for experiments and tests. For data that must survive process restarts, use chromadb.PersistentClient(path="./chroma"), which stores the database in a local directory while still running entirely in-process, without the complexity of a separate server.
Client-Server Mode
In client-server mode, your Python application connects to a Chroma server running separately, either locally or on a remote machine. This architecture supports larger-scale deployments and enables sharing data across multiple client applications.
import chromadb
client = chromadb.HttpClient(
host="localhost",
port=8000
)
The HTTP client communicates with the server using REST API calls, handling serialization, network transport, and error handling transparently. This mode requires a Chroma server to be running and accessible at the specified host and port.
Configuration Options
The client supports various configuration options to customize its behavior for specific use cases. These options can be provided during client initialization to control aspects like SSL/TLS settings, authentication, and connection pooling.
| Option | Type | Default | Description |
|---|---|---|---|
| host | string | "localhost" | Server hostname or IP address |
| port | integer | 8000 | Server port number |
| ssl | boolean | false | Enable SSL/TLS encryption |
| headers | dict | None | Custom HTTP headers for requests |
| tenant | string | None | Tenant identifier for multi-tenant setups |
| database | string | None | Database name for organized data storage |
Collection Operations
Collections are the central organizing structure in Chroma, grouping related vectors, documents, and metadata together. The SDK provides a comprehensive API for creating, managing, and interacting with collections.
Creating a Collection
Collections are created using the client's create_collection method, which accepts a name and optional configuration parameters.
collection = client.create_collection(
name="my-documents",
metadata={"description": "Document collection for RAG"},
get_or_create=True
)
The get_or_create parameter is particularly useful in production applications, as it prevents errors if a collection with the same name already exists. When set to True, the method returns the existing collection if one exists or creates a new one if it doesn't.
Adding Data
Data is added to collections using the add method, which accepts vectors, documents, metadata, and unique identifiers. All parameters must be provided as lists of equal length, with each index representing a single record.
collection.add(
documents=["This is the first document", "This is the second document"],
metadatas=[{"source": "notion"}, {"source": "google-docs"}],
ids=["doc-1", "doc-2"],
embeddings=[[1.2, 2.1, 3.5], [1.1, 2.0, 3.4]]
)
The SDK supports automatic embedding generation when embedding functions are configured for the collection. In this case, documents can be provided without explicit embeddings, and the SDK will generate the vector representations automatically.
Querying Data
Querying is performed using the query method, which accepts query text or query vectors and returns the most similar results based on vector similarity.
results = collection.query(
query_texts=["search terms here"],
n_results=2,
where={"source": "notion"},
include=["documents", "metadatas", "distances"]
)
The where parameter enables filtering results based on metadata conditions, allowing applications to narrow search results to specific subsets of data. The include parameter controls which data components are returned, helping optimize bandwidth and processing for applications that don't need all available information.
Query results include the matched document IDs, the documents themselves, associated metadata, and distance scores indicating how similar each result is to the query. Lower distance scores indicate higher similarity, with zero representing an exact match.
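For the cosine space, distance is one minus cosine similarity, so identical vectors score exactly zero. A minimal pure-Python sketch of that computation (the vector values are illustrative, not taken from the SDK):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

v = [1.2, 2.1, 3.5]
w = [1.1, 2.0, 3.4]
# cosine_distance(v, v) is (numerically) zero: an exact match.
# cosine_distance(v, w) is a small positive number: a near match.
```

Other spaces (L2, inner product) use different formulas, but the ordering convention is the same: smaller distances rank higher.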
Updating and Deleting Data
The SDK supports updating existing records and deleting unwanted data from collections. These operations are essential for maintaining data accuracy and managing collection lifecycle.
collection.update(
ids=["doc-1"],
documents=["Updated document content"],
metadatas=[{"source": "notion", "updated": True}]
)
collection.delete(
ids=["doc-2"],
where={"source": "google-docs"}
)
Update operations modify existing records identified by their IDs, replacing the specified fields while preserving unchanged data. Delete operations remove records matching the provided ID or metadata filters, with the ability to delete multiple records simultaneously.
Querying and Filtering
Chroma provides powerful querying and filtering capabilities that enable precise retrieval of relevant results. Understanding these capabilities is essential for building effective vector search applications.
Vector Similarity Search
The core query operation performs vector similarity search, finding the most similar records to a given query vector or text. The SDK handles text queries by first embedding them using the collection's configured embedding function.
Results are ranked by similarity, with the most similar results appearing first. The n_results parameter controls how many results are returned, allowing applications to balance result completeness with performance considerations.
Metadata Filtering
Metadata filtering narrows search results based on document attributes stored alongside vectors. This is particularly useful for applications that need to search within specific subsets of data, such as documents from a particular source or within a date range.
results = collection.query(
query_texts=["search terms"],
where={
"source": "notion",
"category": {"$in": ["technical", "documentation"]}
}
)
The filter syntax supports various operators including equality, inequality, comparison operators for numeric ranges, and set membership tests. Complex filter expressions can be constructed using logical operators to combine multiple conditions.
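As a sketch of that syntax, the filter below combines an equality test, a set-membership test, and a numeric range under a logical $and; the field names (including the word_count key) are hypothetical examples, not a fixed schema.

```python
# Chroma's documented filter operators include $eq, $ne, $gt, $gte, $lt,
# $lte, $in, $nin, and the logical combinators $and / $or.
where = {
    "$and": [
        {"source": {"$eq": "notion"}},
        {"category": {"$in": ["technical", "documentation"]}},
        {"word_count": {"$gte": 100}},  # hypothetical numeric metadata field
    ]
}
# The dict is passed as-is: collection.query(query_texts=[...], where=where)
```

Each clause targets one metadata key; nesting $and and $or allows arbitrarily complex boolean combinations.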
Result Inclusion
The include parameter controls which data components are included in query results. This allows applications to optimize their queries by requesting only the data they need.
| Include Option | Description |
|---|---|
| embeddings | Include the full vector for each result |
| documents | Include the original document text |
| metadatas | Include the associated metadata |
| distances | Include similarity distance scores |
By default, documents, metadatas, and distances are included in query results; embeddings must be requested explicitly. Applications should specify only the components they need to minimize bandwidth usage and processing overhead.
Embedding Functions
Embedding functions transform text into vector representations that capture semantic meaning. Chroma supports multiple embedding providers, allowing applications to choose the approach that best fits their requirements.
Built-in Embeddings
For simple use cases, Chroma includes a default embedding function that works out of the box without additional configuration. This function is suitable for development and testing but may not provide the best quality embeddings for production applications.
External Providers
For production applications requiring higher quality embeddings, Chroma supports integration with external embedding services. These services provide state-of-the-art embedding models that can significantly improve search quality.
Supported providers include OpenAI's embedding models, which offer excellent quality for English text, and various open-source alternatives. Each provider has its own configuration requirements, typically involving API keys and model selection parameters.
Configuration is typically done at the collection level, allowing different collections to use different embedding functions if needed. This flexibility supports applications that work with multiple data types or require different embedding strategies for different use cases.
Custom Embedding Functions
For specialized use cases, applications can implement custom embedding functions by conforming to the SDK's embedding function interface. This allows integration with any embedding model or service that can be accessed from Python.
Custom functions receive a list of texts and return a corresponding list of vectors. They can implement any logic needed, including batching, caching, and error handling, giving applications full control over the embedding process.
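A minimal sketch of such a function follows. To keep it dependency-free, a hash is used in place of a real model; the class name and hashing scheme are invented for illustration, and a production implementation would call an actual embedding model (in recent SDK versions, by subclassing chromadb.EmbeddingFunction).

```python
import hashlib

class HashedEmbeddingFunction:
    """Toy embedding function: maps a list of texts to a list of float
    vectors, matching the shape of the interface the SDK expects. The
    hash-based 'model' is a stand-in, not a meaningful embedding."""

    def __init__(self, dimensions: int = 8):
        self.dimensions = dimensions

    def __call__(self, input):
        vectors = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Map the first `dimensions` bytes into floats in [0, 1).
            vectors.append([b / 256 for b in digest[: self.dimensions]])
        return vectors

embed = HashedEmbeddingFunction()
vectors = embed(["first document", "second document"])
```

The same call signature (texts in, vectors out) is what the collection invokes whenever documents are added or queried without explicit embeddings.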
Error Handling
The SDK provides comprehensive error handling to help applications gracefully manage failure scenarios. Understanding the error types and how to handle them is important for building robust applications.
Connection Errors
Connection errors occur when the client cannot establish communication with the Chroma server. These errors can result from network issues, server unavailability, or incorrect server configuration.
try:
    collection = client.get_collection("my-collection")
except Exception as exc:
    # Connection failures surface as transport-level exceptions; the exact
    # class depends on the chromadb version and HTTP stack in use.
    print(f"Unable to connect to Chroma server: {exc}")
Applications should implement appropriate retry logic and user-facing error messages when connection errors occur, as these situations typically require intervention beyond the application's control.
Collection Not Found
Operations on non-existent collections raise specific errors that can be caught and handled appropriately.
try:
    collection = client.get_collection("non-existent")
except chromadb.errors.NotFoundError:
    # Recent chromadb releases raise NotFoundError; older versions
    # raised ValueError for a missing collection.
    print("Collection does not exist")
The get_or_create parameter available during collection creation provides an alternative to explicit error handling when the existence of a collection is uncertain.
Invalid Arguments
Invalid argument errors indicate problems with the data or parameters provided to SDK methods. These errors typically result from bugs in application code or invalid user input.
Examples include malformed IDs, vectors of incorrect dimensions, mismatched list lengths, and invalid filter expressions. The error messages provide guidance on what parameter is problematic, making debugging straightforward.
Best Practices
Following best practices ensures optimal performance, reliability, and maintainability when using the Python Client SDK in production applications.
Connection Management
Applications should create a single client instance and reuse it across the application rather than creating new clients for each operation. The client manages connection pooling and state internally, and creating multiple instances can lead to resource waste and inconsistent state.
client = chromadb.HttpClient(host="localhost", port=8000)
def get_collection():
return client.get_collection("my-documents")
For applications that require clean-up, the client should be properly closed when the application terminates, ensuring any pending operations complete and resources are released.
Batch Operations
When adding or querying large numbers of records, batching operations improves performance by reducing network overhead and allowing the server to optimize processing. The SDK handles batching internally for the most common operations, but applications should be aware of batch size considerations.
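A simple chunking helper sketches the pattern; the batched helper and the batch size of 4 are illustrative, not part of the SDK.

```python
def batched(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

ids = [f"doc-{i}" for i in range(10)]
docs = [f"document {i}" for i in range(10)]

for id_batch, doc_batch in zip(batched(ids, 4), batched(docs, 4)):
    # With a real client this would be:
    # collection.add(ids=id_batch, documents=doc_batch)
    pass
```

Keeping batches in the hundreds-to-thousands range usually amortizes network overhead without producing oversized requests; the right number depends on record size and server limits.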
Error Recovery
Production applications should implement comprehensive error handling that distinguishes between recoverable errors (like temporary network issues) and non-recoverable errors (like invalid input). Recoverable errors can be handled with retry logic, while non-recoverable errors should surface appropriate feedback to users.
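One way to sketch that distinction is a retry wrapper with exponential backoff; the flaky_query stub and the choice of ConnectionError as the recoverable type are illustrative assumptions, not SDK behavior.

```python
import time

def with_retries(operation, max_attempts=3, base_delay=0.01):
    """Retry a callable on recoverable errors; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # back off before retrying

# Simulate a flaky call that fails twice before succeeding.
attempts = {"count": 0}

def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary network issue")
    return "ok"

result = with_retries(flaky_query)
```

Non-recoverable errors (invalid arguments, missing collections) should be allowed to propagate or be translated into user-facing messages rather than retried.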
Related Documentation
For further information on using Chroma's Python Client SDK, the following resources provide additional context and examples.
The official Chroma documentation at trychroma.com provides comprehensive guides on getting started, deployment options, and advanced usage patterns. The documentation includes tutorials, API reference material, and example applications that demonstrate real-world usage.
The GitHub repository at github.com/chroma-core/chroma contains the complete source code for Chroma, including the Python Client SDK. Developers interested in understanding implementation details or contributing to the project can explore the codebase directly.
The Chroma Discord community provides a forum for asking questions, sharing experiences, and connecting with other developers using Chroma. The community is an excellent resource for troubleshooting issues and discovering best practices from experienced users.
Source: https://github.com/chroma-core/chroma / Human Manual
JavaScript/TypeScript Client SDKs
Related topics: Python Client SDK, Getting Started with Chroma
JavaScript/TypeScript Client SDKs
Chroma provides comprehensive JavaScript and TypeScript client libraries for interacting with Chroma servers from browser and Node.js environments. The SDKs offer both low-level HTTP API access and high-level abstractions for collections, embedding functions, and query operations.
Architecture Overview
Chroma maintains two generations of JavaScript clients to support different use cases and ecosystem requirements.
graph TD
A[Chroma Server] <--> B[HTTP API];
B <--> C[Legacy JS Client v2.4.7];
B <--> D[new-js Client v3.4.5];
C --> E[chromadb<br/>Bundled];
C --> F[chromadb-client<br/>Peer Dependencies];
D --> G[ChromaClient];
D --> H[Embedding Functions<br/>via @chroma-core/*];
Client Package Versions
| Package | Version | Type | Description |
|---|---|---|---|
| chromadb (legacy) | 2.4.7 | npm | Bundled package with all embedding libraries included |
| chromadb-client (legacy) | 2.4.7 | npm | Client package requiring peer dependencies |
| chromadb (new-js) | 3.4.5 | npm | Modern client with modular architecture |
| @internal/chromadb-core | 2.4.7 | workspace | Shared core functionality |
Sources: clients/js/packages/chromadb/package.json:3 Sources: clients/new-js/packages/chromadb/package.json:3
Package Structure
Legacy Client (v2.x)
The legacy client provides two distribution options:
graph LR
A[chromadb] --> B[chromadb-core<br/>+ All Embeddings];
C[chromadb-client] --> D[chromadb-core<br/>+ Peer Dependencies];
B --> E[@google/generative-ai];
B --> F[@xenova/transformers];
B --> G[cohere-ai];
D --> E;
D --> F;
D --> G;
| Package | Use Case | Embedding Libraries |
|---|---|---|
| chromadb | Simple projects wanting everything included | Bundled with all providers |
| chromadb-client | Projects needing specific embedding libraries | Peer dependencies required |
Sources: clients/js/packages/chromadb-client/package.json:1-55
New-JS Client (v3.x)
The new JavaScript client uses a modular workspace architecture with the following structure:
clients/new-js/
├── packages/
│ ├── chromadb/ # Core client package
│ │ └── src/
│ │ ├── chroma-client.ts # Main client implementation
│ │ └── api/
│ │ └── sdk.gen.ts # Generated API client
│ └── ai-embeddings/
│ ├── common/ # Shared utilities
│ ├── all/ # Aggregated providers
│ ├── chroma-bm25/ # BM25 sparse embeddings
│ ├── cohere/ # Cohere provider
│ ├── google-gemini/ # Google Gemini provider
│ ├── huggingface-server/ # HuggingFace server
│ ├── jina/ # Jina AI provider
│ ├── together-ai/ # Together AI provider
│ └── voyageai/ # Voyage AI provider
Sources: clients/new-js/packages/ai-embeddings/all/package.json:1-45
Module Exports Configuration
Both client generations support modern JavaScript module resolution with ESM and CommonJS exports.
Export Structure
graph TD
A[Package Entry] --> B{Import Type};
B -->|ESM import| C[.mjs / .d.ts];
B -->|CommonJS require| D[.cjs / .d.cts];
C --> E[dist/*.mjs];
D --> F[dist/cjs/*.cjs];
| Export Condition | Entry Point | Type Definitions |
|---|---|---|
| ESM import | dist/chromadb.mjs | dist/chromadb.d.ts |
| CommonJS require | dist/cjs/chromadb.cjs | dist/cjs/chromadb.d.cts |
Sources: clients/js/packages/chromadb/package.json:12-25 Sources: clients/new-js/packages/chromadb/package.json:12-25
Client Initialization
Basic Connection
import { ChromaClient } from "chromadb";
// Initialize the client
const chroma = new ChromaClient({
path: "http://localhost:8000"
});
Sources: clients/js/packages/chromadb-client/README.md:15-20
With Embedding Function
import { ChromaClient } from 'chromadb';
import { TogetherAIEmbeddingFunction } from '@chroma-core/together-ai';
const embedder = new TogetherAIEmbeddingFunction({
apiKey: 'your-api-key',
modelName: 'togethercomputer/m2-bert-80M-8k-retrieval',
});
const client = new ChromaClient({
  path: 'http://localhost:8000',
});
// Pass the embedder as embeddingFunction when creating a collection.
Sources: clients/new-js/packages/ai-embeddings/together-ai/README.md:1-35
Collection Operations
Collections are the primary data structure for storing and querying embeddings.
Create Collection
const collection = await chroma.createCollection({
name: "my-collection",
embeddingFunction: embedder, // Optional
metadata: { // Optional
description: "My document collection"
}
});
Add Documents
await collection.add({
ids: ["id1", "id2"],
embeddings: [ // Optional if embedding function provided
[1.1, 2.3, 3.2],
[4.5, 6.9, 4.4],
],
metadatas: [{ source: "doc1" }, { source: "doc2" }],
documents: ["Document 1 content", "Document 2 content"],
});
Query Collection
const results = await collection.query({
  // Pass either queryEmbeddings (a list of query vectors) or queryTexts
  // (raw text embedded via the collection's embedding function), not both.
  queryEmbeddings: [[1.1, 2.3, 3.2]],
  nResults: 2, // Number of results
  where: { source: "doc1" }, // Optional metadata filter
  include: ["documents", "metadatas", "distances"]
});
Sources: clients/js/packages/chromadb-client/README.md:25-50
Embedding Function Providers
The new-js client provides first-class support for multiple embedding providers through the @chroma-core/* packages.
Available Providers
| Provider Package | Model Examples | API Required |
|---|---|---|
| @chroma-core/together-ai | togethercomputer/m2-bert-80M-8k-retrieval | Yes |
| @chroma-core/voyageai | voyage-2 | Yes |
| @chroma-core/google-gemini | text-embedding-004 | Yes |
| @chroma-core/jina | jina-embeddings-v2-base-en | Yes |
| @chroma-core/cohere | Various Cohere models | Yes |
| @chroma-core/chroma-bm25 | N/A (local algorithm) | No |
| @chroma-core/all | All providers bundled | Varies |
Sources: clients/new-js/packages/ai-embeddings/together-ai/README.md Sources: clients/new-js/packages/ai-embeddings/voyageai/README.md
Configuration Options
Each embedding function supports common configuration patterns:
const embedder = new SomeEmbeddingFunction({
apiKey: 'your-api-key', // Or set via environment variable
apiKeyEnvVar: 'PROVIDER_API_KEY', // Default env var name
modelName: 'provider-model-name', // Provider-specific model
// Provider-specific options
task: 'retrieval.passage', // Jina example
dimensions: 768, // Jina example
truncate: true, // Jina example
normalized: true, // Jina example
});
Environment Variable Configuration
| Provider | Environment Variable |
|---|---|
| Together AI | TOGETHER_API_KEY |
| Voyage AI | VOYAGE_API_KEY |
| Google Gemini | GEMINI_API_KEY |
| Jina | JINA_API_KEY |
Sources: clients/new-js/packages/ai-embeddings/jina/README.md:1-45
Rust Native Bindings
For performance-critical applications, Chroma provides pre-built Rust native bindings for Node.js.
Supported Platforms
| Package Name | OS | Architecture | LibC |
|---|---|---|---|
| chromadb-js-bindings-darwin-x64 | macOS (Intel) | x64 | N/A |
| chromadb-js-bindings-darwin-arm64 | macOS (Apple Silicon) | arm64 | N/A |
| chromadb-js-bindings-linux-x64-gnu | Linux | x64 | glibc |
| chromadb-js-bindings-linux-arm64-gnu | Linux | arm64 | glibc |
All bindings versions: 1.3.4
Minimum Node.js version: >= 10
Sources: rust/js_bindings/npm/darwin-x64/package.json:1-18 Sources: rust/js_bindings/npm/linux-x64-gnu/package.json:1-18
Build and Development
Build Scripts
| Command | Description |
|---|---|
pnpm build | Build all packages |
pnpm build:core | Build only @internal/chromadb-core |
pnpm build:packages | Build all packages except core |
pnpm watch | Watch mode for development |
pnpm test | Run all tests |
pnpm test:functional | Run functional tests (excluding auth) |
New-JS Client Build Configuration
{
"scripts": {
"build": "tsup",
"watch": "tsup --watch",
"typecheck": "tsc --noEmit"
}
}
Build tooling uses tsup for efficient bundling with TypeScript support.
Sources: clients/new-js/packages/ai-embeddings/common/package.json:18-25 Sources: clients/js/package.json:22-30
Choosing a Client Package
graph TD
A[Start] --> B{Do you need all embedding providers?};
B -->|Yes, convenience| C[chromadb v2.4.7<br/>or @chroma-core/all + chromadb v3.4.5];
B -->|No, want to minimize bundle| D{Do you have embedding requirements?};
D -->|Yes, specific providers| E[chromadb-client v2.4.7<br/>with peer dependencies];
D -->|No, just vector storage| F[chromadb-client v2.4.7<br/>or chromadb v3.4.5];
C --> G[Include all embedding libraries];
E --> H[Only install needed providers];
F --> I[No embedding function needed];
Decision Matrix
| Requirement | Recommended Package |
|---|---|
| Simple setup, all features | chromadb (bundled) |
| Minimal bundle size | chromadb-client with peer deps |
| Modern architecture | chromadb (new-js v3.4.5) |
| BM25 sparse embeddings | @chroma-core/chroma-bm25 |
| Cloud/Remote providers | @chroma-core/* packages |
Sources: clients/js/examples/node/README.md:1-45
TypeScript Support
All JavaScript client packages include full TypeScript type definitions:
{
"types": "dist/chromadb.d.ts",
"exports": {
".": {
"import": {
"types": "./dist/chromadb.d.ts"
},
"require": {
"types": "./dist/cjs/chromadb.d.cts"
}
}
}
}
The TypeScript minimum version requirement is ^5.0.4 for the legacy client and ^5.3.3 for new-js packages.
Sources: clients/js/packages/chromadb/package.json:8 Sources: clients/new-js/packages/ai-embeddings/common/package.json:30
Dependencies
Core Dependencies
| Package | Version | Purpose |
|---|---|---|
isomorphic-fetch | ^3.0.0 | HTTP client for browser/Node.js |
ajv | ^8.12.0 / ^8.17.1 | JSON schema validation |
cliui | ^8.0.1 | CLI utilities |
Node.js Compatibility
| Package Generation | Minimum Node.js |
|---|---|
| Legacy (v2.x) | >= 14.17.0 |
| New-JS (v3.x) | >= 20 |
| Rust Bindings | >= 10 |
Sources: clients/js/packages/chromadb-client/package.json:50-55 Sources: clients/new-js/packages/ai-embeddings/common/package.json:35-38
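A runtime guard for the minimums in the table above could look like the following. This is a hypothetical helper for illustration; published packages normally declare the same constraint via the package.json "engines" field instead:

```typescript
// Check a Node.js version string against a minimum major version,
// accepting both "v20.11.1" and "20.11.1" forms.
function meetsMinimumNode(version: string, minMajor: number): boolean {
  const major = parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}
// e.g. meetsMinimumNode(process.version, 20) for the new-js client
```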
Sources: [clients/js/packages/chromadb/package.json:3](https://github.com/chroma-core/chroma/blob/main/clients/js/packages/chromadb/package.json)
Rust Backend Services Architecture
Related topics: System Architecture Overview, Data Storage & Blockstore
Rust Backend Services Architecture
Overview
The Chroma Rust backend provides a high-performance, scalable vector database service built entirely in Rust. The architecture follows a distributed systems design with multiple specialized services working together to handle embedding storage, indexing, and similarity search operations.
Design Goals
| Goal | Description |
|---|---|
| High Performance | Arrow-based columnar storage for efficient data access |
| Scalability | Multi-cloud, multi-region deployment support |
| Reliability | Comprehensive error handling with typed error codes |
| Flexibility | Multiple index types (HNSW, Spann, Inverted) |
| Consistency | Ordered and unordered mutation ordering options |
Core Service Components
graph TD
subgraph "Rust Backend Services"
W[Worker Service]
BS[Blockstore Service]
SYS[Sysdb Service]
LOG[Log Service]
end
W --> BS
W --> SYS
W --> LOG
Blockstore Architecture
The blockstore is the core storage layer in Chroma's Rust backend, providing persistent storage for vector embeddings and associated metadata using Arrow columnar format.
Arrow-Based Storage
Chroma uses Apache Arrow as its primary storage format, which provides:
- Columnar Layout: Efficient analytic queries by column
- Zero-Copy Reads: Memory-mapped access patterns
- Cross-Language Interop: Standardized binary format
- Compression Support: Built-in encoding/decoding
Sources: rust/blockstore/src/arrow/root.rs:1-40
Blockfile Structure
graph TD
subgraph "Blockfile Components"
BF[Blockfile]
BR[Block Reader]
BW[Block Writer]
RM[Root Manager]
BM[Block Manager]
end
BF --> BR
BF --> BW
BW --> RM
BR --> BM
#### Root Management
The Root component manages the root directory structure and file operations:
pub(super) fn get_all_block_ids_from_bytes(
bytes: &[u8],
id: Uuid,
) -> Result<Vec<Uuid>, FromBytesError>
Key responsibilities:
- Reading Arrow IPC files
- Extracting block metadata and IDs
- Version validation and verification
Sources: rust/blockstore/src/arrow/root.rs:28-50
#### Block Layout Verification
The block layout verification ensures data integrity:
#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
#[error("Buffer length is not 64 byte aligned")]
BufferLengthNotAligned,
#[error("No record batches in footer")]
NoRecordBatches,
#[error("More than one record batch in IPC file")]
MultipleRecordBatches,
#[error("Invalid message type")]
InvalidMessageType,
}
Sources: rust/blockstore/src/arrow/block/types.rs:1-30
| Error Type | Error Code | Severity |
|---|---|---|
BufferLengthNotAligned | Internal | High |
NoRecordBatches | Internal | High |
MultipleRecordBatches | Internal | Medium |
InvalidMessageType | Internal | High |
RecordBatchDecodeError | Internal | High |
Blockfile Writer Types
Chroma supports two mutation ordering strategies:
| Ordering Type | Description | Use Case |
|---|---|---|
Ordered | Sequential writes with guaranteed order | Consistent state |
Unordered | Parallel writes for throughput | High-volume ingestion |
Sources: rust/blockstore/src/arrow/provider.rs:1-50
match options.mutation_ordering {
BlockfileWriterMutationOrdering::Ordered => {
let file = ArrowOrderedBlockfileWriter::from_root(...);
Ok(BlockfileWriter::ArrowOrderedBlockfileWriter(file))
}
BlockfileWriterMutationOrdering::Unordered => {
let file = ArrowUnorderedBlockfileWriter::from_root(...);
Ok(BlockfileWriter::ArrowUnorderedBlockfileWriter(file))
}
}
Forking and Versioning
Blockfiles support forking for snapshot isolation:
let new_root = self
.root_manager
.fork::<K>(
&fork_from,
new_id,
&options.prefix_path,
self.block_manager.default_max_block_size_bytes(),
)
.await
Sources: rust/blockstore/src/arrow/provider.rs:1-30
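Conceptually, a fork creates a new root that references the same immutable blocks rather than copying data up front. A toy model of that copy-on-write step (illustrative types only, not Chroma's actual Rust structures):

```typescript
// A root is small metadata pointing at immutable blocks; forking copies only
// the root, so the snapshot is cheap and the source stays isolated.
type Root = { id: string; blockIds: string[] };

function forkRoot(from: Root, newId: string): Root {
  // Block contents stay shared; only the list of references is duplicated.
  return { id: newId, blockIds: [...from.blockIds] };
}
```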
Type System
Query Result Types
The execution layer uses a rich type system for search results:
#[derive(Clone, Debug, Default)]
pub struct SearchPayloadResult {
pub records: Vec<SearchRecord>,
}
Sources: rust/types/src/execution/operator.rs:1-20
#### Search Results Structure
graph LR
SR[SearchResult] --> SPR[SearchPayloadResult]
SPR --> SR_vec[Vec<SearchRecord>]
SR --> PLB[pulled_log_bytes]
| Field | Type | Description |
|---|---|---|
results | Vec<SearchPayloadResult> | Per-query search results |
pulled_log_bytes | u64 | Total log bytes fetched for metrics |
Include Enum
The Include enum controls which fields are returned in query results:
pub enum Include {
#[serde(rename = "distances")]
Distance,
#[serde(rename = "documents")]
Document,
#[serde(rename = "embeddings")]
Embedding,
#[serde(rename = "metadatas")]
Metadata,
#[serde(rename = "uris")]
Uri,
}
Sources: rust/types/src/api_types.rs:1-30
| Include Value | Returned Field | Default Query |
|---|---|---|
distances | Distance scores | ✓ |
documents | Text content | ✓ |
embeddings | Vector data | ✗ |
metadatas | Metadata objects | ✓ |
uris | Resource URIs | ✗ |
#### IncludeList Helper Methods
impl IncludeList {
pub fn empty() -> Self { Self(Vec::new()) }
pub fn default_query() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance])
}
pub fn default_get() -> Self {
Self(vec![Include::Document, Include::Metadata])
}
pub fn all() -> Self {
Self(vec![Include::Document, Include::Metadata, Include::Distance,
Include::Embedding, Include::Uri])
}
}
Sources: rust/types/src/api_types.rs:1-60
Key Filter System
The Key enum represents filterable fields in metadata queries:
pub enum Key {
Document,
Embedding,
Metadata,
Score,
MetadataField(String),
}
Sources: rust/types/src/operator.rs:1-30
| Key | Purpose | Example |
|---|---|---|
#document | Document content | Key::Document |
#embedding | Vector data | Key::Embedding |
#metadata | All metadata | Key::Metadata |
#score | Similarity score | Key::Score |
field_name | Custom metadata | Key::MetadataField("status") |
#### Key Factory Methods
impl Key {
/// Creates a Key for a custom metadata field
pub fn field(name: impl Into<String>) -> Self {
Key::MetadataField(name.into())
}
/// Creates an equality filter: `field == value`
pub fn eq(self, value: impl Into<MetadataValue>) -> ComparisonValue { ... }
}
Index Architecture
Spann Index
Spann is Chroma's sparse vector index implementation combining HNSW with posting lists:
#[derive(Clone, Debug)]
pub struct SpannIndexReader<'me> {
pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
pub hnsw_index: HnswIndexRef,
pub versions_map: BlockfileReader<'me, u32, u32>,
pub dimensionality: usize,
pub adaptive_search_nprobe: bool,
pub params: InternalSpannConfiguration,
}
Sources: rust/index/src/spann/types.rs:1-30
#### Spann Index Structure
graph TD
subgraph "Spann Index"
SPI[SpannIndexReader]
HNSW[HNSW Index]
PL[Posting Lists]
VM[Versions Map]
end
SPI --> HNSW
SPI --> PL
SPI --> VM
| Component | Type | Purpose |
|---|---|---|
hnsw_index | HnswIndexRef | Approximate nearest neighbor search |
posting_lists | BlockfileReader<u32, SpannPostingList> | Document postings |
versions_map | BlockfileReader<u32, u32> | Document versioning |
adaptive_search_nprobe | bool | Adaptive parameter tuning |
Sparse Posting Block
The sparse posting block implements an inverted index structure:
#[derive(Debug, Clone)]
pub struct DirectoryBlock(SparsePostingBlock);
impl DirectoryBlock {
pub fn new(max_offsets: &[u32], max_weights: &[f32])
-> Result<Self, SparsePostingBlockError>
}
Sources: rust/types/src/sparse_posting_block.rs:1-40
| Field | Type | Description |
|---|---|---|
max_offset | u32 | Largest doc offset in posting block |
max_weight | f32 | Maximum weight for term pruning |
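The max_weight bound is what makes term pruning possible: during scoring, a posting block whose best possible weight cannot reach the current threshold can be skipped without decoding it (a WAND-style bound). A simplified model, not the actual Rust implementation:

```typescript
// Toy posting-block scan that uses the per-block upper bound to skip
// blocks that cannot contribute a hit at or above the threshold.
interface PostingBlock {
  maxOffset: number; // largest doc offset in the block
  maxWeight: number; // upper bound used for pruning
  weights: number[];
}

function scanBlocks(blocks: PostingBlock[], threshold: number): number[] {
  const hits: number[] = [];
  for (const block of blocks) {
    if (block.maxWeight < threshold) continue; // prune the whole block
    for (const w of block.weights) {
      if (w >= threshold) hits.push(w);
    }
  }
  return hits;
}
```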
Schema and Index Configuration
Collection Schema
The schema system supports multiple index types:
impl Schema {
pub fn create_index(
mut self,
key: Option<&str>,
config: IndexConfig,
) -> Result<Self, SchemaBuilderError>
}
Sources: rust/types/src/collection_schema.rs:1-50
| Index Type | Key | Description |
|---|---|---|
VectorIndexConfig | None | Global vector index (HNSW/Spann) |
StringInvertedIndexConfig | Some(field) | Field-specific FTS |
SparseVectorIndexConfig | Some(field) | Sparse vector index |
Index Configuration
pub struct VectorIndexConfig {
pub space: Option<Space>,
pub embedding_function: Option<EmbeddingFunctionId>,
pub source_key: Option<Key>,
pub hnsw: Option<HnswConfig>,
pub spann: Option<SpannConfig>,
}
| Parameter | Type | Default | Description |
|---|---|---|---|
space | Option<Space> | None | Vector space (Cosine, L2, etc.) |
embedding_function | Option<EFId> | None | Embedding function ID |
hnsw | Option<HnswConfig> | None | HNSW parameters |
spann | Option<SpannConfig> | None | Spann parameters |
Worker Service Architecture
Work Queue Client
The work queue client manages distributed task execution:
pub enum WorkQueueClientError {
ConnectionError(#[from] tonic::Status),
RequestError(#[from] tonic::Status),
}
Sources: rust/worker/src/work_queue/work_queue_client.rs:1-20
#### Error Code Mapping
| gRPC Code | Chroma Error Code |
|---|---|
Unavailable | Unavailable |
DeadlineExceeded | DeadlineExceeded |
ResourceExhausted | ResourceExhausted |
InvalidArgument | InvalidArgument |
NotFound | NotFound |
PermissionDenied | PermissionDenied |
Apply Logs Orchestrator
The apply logs orchestrator handles log-based data synchronization:
#[derive(Debug)]
pub struct ApplyLogsOrchestratorResponse {
pub job_id: JobId,
pub total_records_post_compaction: u64,
pub flush_results: Vec<SegmentFlushInfo>,
pub collection_logical_size_bytes: u64,
}
Sources: rust/worker/src/execution/orchestration/apply_logs_orchestrator.rs:1-50
KNN Filter Architecture
The KNN filter orchestrates vector similarity search:
graph TD
subgraph "KNN Query Pipeline"
Q[Query Request]
F[Filter Logs]
K[KNN Search]
R[Results]
end
Q --> F
F --> K
K --> R
#### KNN Error Handling
pub enum KnnError {
QuantizedSpannCenterSearch(QuantizedSpannError),
QuantizedSpannLoadCenter(QuantizedSpannError),
InvalidDistanceFunction,
Aborted,
InvalidSchema(#[from] SchemaError),
}
Sources: rust/worker/src/execution/orchestration/knn_filter.rs:1-40
| Error Type | Error Code | Traced |
|---|---|---|
QuantizedSpannCenterSearch | From inner | ✓ |
InvalidDistanceFunction | InvalidArgument | ✗ |
Aborted | ResourceExhausted | ✗ |
Result(_) | Internal | ✓ |
KNN Filter Output
#[derive(Clone, Debug)]
pub struct KnnFilterOutput {
pub logs: FetchLogOutput,
pub fetch_log_bytes: u64,
pub filter_output: FilterOutput,
pub dimension: usize,
pub distance_function: DistanceFunction,
}
Multi-Cloud Topology
Chroma supports multi-cloud and multi-region deployments:
pub struct ProviderRegion<T: Clone + Debug> {
pub name: RegionName,
pub provider: String,
pub region: String,
pub config: T,
}
Sources: rust/types/src/topology.rs:1-30
Topology Structure
graph TD
subgraph "Multi-Cloud Topology"
Config[Configuration]
Topologies[Vec<Topology>]
Regions[Vec<ProviderRegion>]
Preferred[Preferred Region]
end
Config --> Topologies
Config --> Regions
Config --> Preferred
Configuration Schema
struct RawMultiCloudMultiRegionConfiguration<R, T> {
preferred: RegionName,
regions: Vec<ProviderRegion<R>>,
topologies: Vec<Topology<T>>,
}
| Field | Type | Description |
|---|---|---|
preferred | RegionName | Default region for operations |
regions | Vec<ProviderRegion> | Available cloud regions |
topologies | Vec<Topology> | Topology configurations |
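A toy sketch of how the preferred field might resolve to a concrete region (field names mirror the table above; the first-region fallback is an assumption, not documented behavior):

```typescript
// Pick the preferred region from the configured list, falling back to the
// first available region when the preferred name is not found.
interface Region {
  name: string;
  provider: string;
  region: string;
}

function resolveRegion(regions: Region[], preferred: string): Region {
  return regions.find((r) => r.name === preferred) ?? regions[0];
}
```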
Error Handling Framework
Chroma Error Traits
All errors implement the ChromaError trait:
pub trait ChromaError: std::error::Error {
fn code(&self) -> ErrorCodes;
fn should_trace_error(&self) -> bool;
}
Error Code Registry
| Code | Category | Description |
|---|---|---|
InvalidArgument | Client | Malformed request |
NotFound | Client | Resource missing |
AlreadyExists | Client | Duplicate resource |
PermissionDenied | Security | Access denied |
ResourceExhausted | Rate | Quota exceeded |
Internal | Server | System error |
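The registry above, restated as a lookup (an illustrative restatement; the categorization of which errors are safe to surface is an assumption, not part of the ChromaError trait):

```typescript
// Error-code registry as a lookup table, mirroring the table above.
const errorCategory: Record<string, "Client" | "Security" | "Rate" | "Server"> = {
  InvalidArgument: "Client",
  NotFound: "Client",
  AlreadyExists: "Client",
  PermissionDenied: "Security",
  ResourceExhausted: "Rate",
  Internal: "Server",
};

// Client-category errors generally describe the caller's request, so they
// can be surfaced to callers unchanged.
function isClientError(code: string): boolean {
  return errorCategory[code] === "Client";
}
```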
CLI Integration
The Rust CLI provides management commands:
pub enum Command {
Browse(BrowseArgs),
Copy(CopyArgs),
Db(DbSubcommand),
Docs,
Install(InstallArgs),
Login(LoginArgs),
Profile(ProfileSubcommand),
Run(RunArgs),
Support,
Update,
Vacuum(VacuumArgs),
}
Sources: rust/cli/src/lib.rs:1-30
Available Commands
| Command | Description |
|---|---|
browse | Open web interface |
copy | Copy data between collections |
db | Database operations |
docs | Open documentation |
install | Install Chroma |
login | Authenticate user |
profile | Performance profiling |
run | Start Chroma server |
support | Open support resources |
update | Update installation |
vacuum | Compact storage |
See Also
Sources: rust/blockstore/src/arrow/root.rs:1-40
Go Coordinator & Distributed Systems
Related topics: System Architecture Overview
Data Storage & Blockstore
Overview
The Chroma blockstore is the core storage subsystem responsible for persisting vector embeddings, metadata, and related data structures. It provides a unified abstraction layer over different storage backends (in-memory and Arrow-based) while maintaining performance characteristics suitable for high-throughput vector database operations.
The blockstore system is architected around the concept of blockfiles — persistent, columnar storage structures that organize data by prefix-based partitioning and support efficient key-value operations.
Architecture
graph TD
subgraph "Public API Layer"
BP[BlockfileProvider]
BR[BlockfileReader]
BW[BlockfileWriter]
BF[BlockfileFlusher]
end
subgraph "Implementation Layer"
ABP[ArrowBlockfileProvider]
MBP[MemoryBlockfileProvider]
ABF[ArrowUnorderedBlockfileWriter]
ABO[ArrowOrderedBlockfileWriter]
end
subgraph "Storage Layer"
BM[BlockManager]
RM[RootManager]
ST[Storage]
end
subgraph "Arrow Format"
R[Root]
SB[Sparse Index]
B[Blocks]
end
BP --> ABP
BP --> MBP
BR --> ABP
BR --> MBP
BW --> ABF
BW --> ABO
ABP --> BM
ABP --> RM
ABF --> BM
ABF --> RM
ABO --> BM
ABO --> RM
BM --> ST
RM --> ST
RM --> R
R --> SB
R --> B
Core Components
BlockfileProvider
The BlockfileProvider is the main entry point for creating readers and writers. It abstracts the underlying storage implementation and provides factory methods for blockfile operations.
Variants:
| Provider Type | Description | Use Case |
|---|---|---|
HashMapBlockfileProvider | In-memory blockfile storage | Testing, ephemeral data |
ArrowBlockfileProvider | Persistent Arrow-based storage | Production workloads |
API Methods:
pub fn storage(&self) -> Option<Arc<Storage>> {
match self {
BlockfileProvider::ArrowBlockfileProvider(provider) => Some(provider.storage().clone()),
BlockfileProvider::HashMapBlockfileProvider(_) => None,
}
}
pub fn new_memory() -> Self {
BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}
Sources: rust/blockstore/src/provider.rs:1-30
BlockfileReader
The BlockfileReader trait provides read access to stored data. It supports generic key and value types that implement the ReadKey and ReadValue traits.
Trait Definition:
pub trait ReadKey<'a>:
Key
+ Into<KeyWrapper>
+ TryFrom<&'a KeyWrapper, Error = InvalidKeyConversion>
+ ArrowReadableKey<'a>
+ Sync
+ 'a
{}
pub trait ReadValue<'a>: Value + Readable<'a> + ArrowReadableValue<'a> + Sync + 'a {}
Sources: rust/blockstore/src/provider.rs:40-55
BlockfileWriter
The BlockfileWriter trait provides write access to blockfiles with support for ordered and unordered mutation patterns.
Core Operations:
| Method | Signature | Description |
|---|---|---|
set | set(prefix, key, value) | Insert or update a key-value pair |
delete | delete(prefix, key) | Remove a key-value pair |
commit | commit() | Finalize and persist the writer |
pub async fn set<
K: Key + Into<KeyWrapper> + ArrowWriteableKey,
V: Value + Writeable + ArrowWriteableValue,
>(
&self,
prefix: &str,
key: K,
value: V,
) -> Result<(), Box<dyn ChromaError>>
Sources: rust/blockstore/src/types/writer.rs:50-75
Arrow Blockfile Implementation
The Arrow-based blockfile is the primary production storage implementation, providing efficient columnar storage with Arrow IPC format.
Blockfile Structure
graph TD
R[Root File<br/>Root Writer] --> SB[Sparse Index<br/>Block Key Mapping]
R --> BH[Block Header<br/>Metadata]
SB --> B1[Block 1<br/>Arrow IPC]
SB --> B2[Block 2<br/>Arrow IPC]
SB --> BN[Block N<br/>Arrow IPC]
B1 --> P1[Prefix: "vec_1"]
B1 --> P2[Prefix: "vec_2"]
ArrowBlockfileProvider
The ArrowBlockfileProvider manages the lifecycle of blockfiles using Arrow IPC format with a root-sparse index architecture.
Key Features:
- Fork Support: Create new blockfiles from existing ones via forking
- CMEK Support: Optional Customer-Managed Encryption Keys
- Block Size Management: Configurable maximum block sizes
pub async fn write<K: Key + ArrowWriteableKey, V: ArrowWriteableValue>(
&self,
options: BlockfileWriterOptions,
) -> Result<BlockfileWriter, Box<CreateError>>
Sources: rust/blockstore/src/arrow/provider.rs:1-50
Writer Types
#### ArrowUnorderedBlockfileWriter
Provides high-performance unordered writes optimized for bulk insertion scenarios.
impl ArrowUnorderedBlockfileWriter {
pub(super) fn new<K: ArrowWriteableKey, V: ArrowWriteableValue>(
id: Uuid,
prefix_path: &str,
block_manager: BlockManager,
root_manager: RootManager,
max_block_size_bytes: usize,
cmek: Option<Cmek>,
) -> Self
}
Sources: rust/blockstore/src/arrow/blockfile.rs:50-80
#### ArrowOrderedBlockfileWriter
Maintains key ordering within blocks, optimized for range queries and ordered iteration.
Sources: rust/blockstore/src/arrow/ordered_blockfile_writer.rs:1-50
BlockManager and RootManager
| Component | Responsibility |
|---|---|
BlockManager | Manages individual data blocks, handles block creation and commitment |
RootManager | Manages root files containing sparse indices and metadata |
// Forking a new root from an existing one
let new_root = self
.root_manager
.fork::<K>(
&fork_from,
new_id,
&options.prefix_path,
self.block_manager.default_max_block_size_bytes(),
)
.await
Sources: rust/blockstore/src/arrow/provider.rs:45-70
Error Handling
Error Types
| Error Type | Description | Error Code |
|---|---|---|
BlockNotFound | Requested block does not exist | Internal |
BlockFetchError | Failed to retrieve block from storage | Internal |
MigrationError | Blockfile migration failed | Internal |
IOError | Storage I/O operation failed | Internal |
ArrowError | Arrow IPC parsing/encoding error | Internal |
NoRecordBatches | Invalid Arrow file structure | Internal |
#[derive(Error, Debug)]
pub enum ArrowBlockfileError {
#[error("Block not found")]
BlockNotFound,
#[error("Could not fetch block")]
BlockFetchError(#[from] GetError),
#[error("Could not migrate blockfile to new version")]
MigrationError(#[from] MigrationError),
}
Sources: rust/blockstore/src/arrow/blockfile.rs:25-40
Layout Verification
The system validates Arrow file layouts to ensure data integrity:
#[derive(Error, Debug)]
pub enum ArrowLayoutVerificationError {
#[error("Buffer length is not 64 byte aligned")]
BufferLengthNotAligned,
#[error("No record batches in footer")]
NoRecordBatches,
#[error("More than one record batch in IPC file")]
MultipleRecordBatches,
#[error("Invalid message type")]
InvalidMessageType,
}
Sources: rust/blockstore/src/arrow/block/types.rs:40-60
Storage Operations
Write Flow
sequenceDiagram
participant Client
participant Provider as BlockfileProvider
participant Writer as BlockfileWriter
participant BM as BlockManager
participant RM as RootManager
participant Storage
Client->>Provider: write(options)
Provider->>Writer: create_writer()
Provider->>RM: create/fork_root()
Client->>Writer: set(prefix, key, value)
Writer->>BM: create_block()
loop Until flush
Writer->>Writer: accumulate_data()
end
Client->>Writer: commit()
Writer->>BM: commit_block()
Writer->>RM: update_root()
RM->>Storage: persist()
BM->>Storage: persist()
Read Flow
sequenceDiagram
participant Client
participant Reader as BlockfileReader
participant RM as RootManager
participant BM as BlockManager
participant Storage
Client->>Reader: get(prefix, key)
Reader->>RM: get_block_ids()
RM->>Reader: block_id_list
loop For each block
Reader->>BM: get_block(id)
BM->>Storage: read()
Storage->>Reader: block_data
end
Reader->>Reader: search_blocks()
Reader->>Client: value
Configuration Options
BlockfileWriterOptions
| Option | Type | Default | Description |
|---|---|---|---|
prefix_path | String | Required | Path prefix for storage |
max_block_size_bytes | usize | Provider default | Maximum size per block |
mutation_ordering | BlockfileWriterMutationOrdering | Ordered | Write ordering mode |
fork_from | Option<Uuid> | None | Source blockfile ID for forking |
cmek | Option<Cmek> | None | Customer-managed encryption key |
let mut bf_options = BlockfileWriterOptions::new(prefix_path.to_string())
.max_block_size_bytes(pl_block_size);
bf_options = bf_options.unordered_mutations();
if let Some(cmek) = cmek {
bf_options = bf_options.with_cmek(cmek);
}
Sources: rust/blockstore/src/arrow/provider.rs:90-110
Memory Blockfile
For testing and ephemeral use cases, Chroma provides an in-memory blockfile implementation:
pub fn new_memory() -> Self {
BlockfileProvider::HashMapBlockfileProvider(MemoryBlockfileProvider::new())
}
Limitations:
- No persistence
- No fork support
- Limited to unordered mutations
if options.fork_from.is_some() {
unimplemented!();
}
Sources: rust/blockstore/src/memory/provider.rs:40-55
Block Reading
RootReader
The RootReader is responsible for reading block metadata and identifying which blocks contain specific data:
impl RootReader {
pub(super) fn get_all_block_ids_from_bytes(
bytes: &[u8],
id: Uuid,
) -> Result<Vec<Uuid>, FromBytesError> {
let mut cursor = std::io::Cursor::new(bytes);
let arrow_reader = arrow::ipc::reader::FileReader::try_new(&mut cursor, None);
let record_batch = match arrow_reader {
Ok(mut reader) => match reader.next() {
Some(Ok(batch)) => batch,
Some(Err(e)) => return Err(FromBytesError::ArrowError(e)),
None => return Err(FromBytesError::NoDataError),
},
Err(e) => return Err(FromBytesError::ArrowError(e)),
};
let (version, read_id) = Self::version_and_id_from_record_batch(&record_batch, id)?;
if read_id != id {
return Err(FromBytesError::IdMismatch);
}
Self::block_ids_from_record_batch(&record_batch, version)
}
}
Sources: rust/blockstore/src/arrow/root.rs:20-55
Related Components
SpannIndex Integration
The blockstore is used by the Spann (Sparse + ANN) index for storing posting lists:
| Component | Purpose |
|---|---|
SpannIndexReader | Reads posting lists and HNSW indices |
SpannIndexWriter | Creates and manages posting list writers |
SpannPostingList | Stores document IDs and embeddings |
pub struct SpannIndexReader<'me> {
pub posting_lists: BlockfileReader<'me, u32, SpannPostingList<'me>>,
pub hnsw_index: HnswIndexRef,
pub versions_map: BlockfileReader<'me, u32, u32>,
pub dimensionality: usize,
}
Sources: rust/index/src/spann/types.rs:30-45
Summary
The Chroma blockstore provides a robust, extensible storage layer built on Arrow IPC format. Key architectural decisions include:
- Separation of concerns: BlockManager handles data blocks while RootManager manages metadata and sparse indices
- Dual writer support: Ordered and unordered writers for different access patterns
- Forking capability: Efficient creation of derived blockfiles without full copies
- Error classification: Clear mapping from internal errors to error codes for API responses
- Type-safe abstractions: Generic key-value traits enabling flexible data modeling
Sources: [rust/blockstore/src/provider.rs:1-30](https://github.com/chroma-core/chroma/blob/main/rust/blockstore/src/provider.rs)
Embedding Functions Integration
Related topics: Python Client SDK, Data Storage & Blockstore
Embedding Functions Integration
Overview
Embedding Functions in Chroma provide a standardized interface for converting text into vector embeddings. Chroma supports multiple embedding providers through a plugin architecture that allows developers to use custom embedding functions or leverage hosted services like OpenAI, Cohere, Ollama, and others.
The embedding function system serves as the bridge between raw text data and the vector representation used for similarity search. Each embedding function implements a consistent interface that handles API communication, request formatting, and response parsing for its respective provider.
Sources: clients/new-js/packages/ai-embeddings/common/README.md
Architecture
High-Level Architecture
graph TD
A[Client Application] --> B[Chroma Collection]
B --> C[Embedding Function]
C --> D[Embedding Provider API]
D --> E[Vector Embeddings]
E --> B
F[@chroma-core/openai] --> C
G[@chroma-core/ollama] --> C
H[@chroma-core/cohere] --> C
I[@chroma-core/morph] --> C
J[@chroma-core/all] --> C
Embedding Function Package Structure
Chroma organizes embedding functions into separate packages under the @chroma-core namespace. Each package focuses on a specific provider while sharing common utilities.
| Package | Provider | Environment Support |
|---|---|---|
@chroma-core/ai-embeddings-common | Shared utilities | Node.js + Browser |
@chroma-core/openai | OpenAI | Node.js + Browser |
@chroma-core/ollama | Ollama (local) | Node.js + Browser |
@chroma-core/cohere | Cohere | Node.js + Browser |
@chroma-core/jina | Jina AI | Node.js + Browser |
@chroma-core/morph | Morph | Node.js |
@chroma-core/all | All providers | Node.js + Browser |
Sources: clients/new-js/packages/ai-embeddings/all/README.md
Core Components
Common Utilities Package
The @chroma-core/ai-embeddings-common package provides shared functionality used by all embedding function implementations:
import { validateConfigSchema, snakeCase, isBrowser } from '@chroma-core/ai-embeddings-common';
Key Features:
| Feature | Purpose |
|---|---|
validateConfigSchema | Validates embedding function configurations using JSON schemas |
snakeCase | Converts camelCase JavaScript objects to snake_case for API compatibility |
isBrowser | Detects browser vs Node.js runtime environment |
Sources: clients/new-js/packages/ai-embeddings/common/README.md
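A minimal sketch of the camelCase-to-snake_case conversion that the snakeCase utility performs for API compatibility (assumed behavior; the real implementation may handle more edge cases, such as nested objects):

```typescript
// Convert the top-level keys of an object from camelCase to snake_case,
// e.g. { apiKey: "x" } -> { api_key: "x" }.
function toSnakeCase(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    out[key.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase()] = value;
  }
  return out;
}
```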
Dynamic Loading Mechanism
The embedding function system supports dynamic loading of packages based on configuration:
const fullPackageName = `@chroma-core/${packageName}`;
await import(fullPackageName);
embeddingFunction = knownEmbeddingFunctions.get(packageName);
The system maintains mappings for known embedding function names and handles package resolution automatically when a collection is configured with a specific embedding provider.
Sources: clients/new-js/packages/chromadb/src/embedding-function.ts
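The dynamic-loading flow can be sketched as a registry plus an on-demand loader. The registry and loader below are toy stand-ins, not the actual chromadb internals; in the real client the dynamic import's side effect is what registers the provider:

```typescript
// Registry of embedding-function factories, keyed by provider name.
const knownEmbeddingFunctions = new Map<string, () => unknown>();

// Resolve a provider by name, loading its @chroma-core package on demand.
// `loadPackage` stands in for `(n) => import(n)` to keep the sketch testable.
async function loadEmbeddingFunction(
  packageName: string,
  loadPackage: (name: string) => Promise<void>,
): Promise<(() => unknown) | undefined> {
  if (!knownEmbeddingFunctions.has(packageName)) {
    // Importing the package registers its embedding function as a side effect.
    await loadPackage(`@chroma-core/${packageName}`);
  }
  return knownEmbeddingFunctions.get(packageName);
}
```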
Configuration Schema
Embedding functions support structured configuration with schema validation. Configuration options vary by provider but typically include:
| Parameter | Description | Provider Support |
|---|---|---|
apiKey | API key for authentication | OpenAI, Cohere, Jina, Gemini |
modelName | Specific model identifier | All providers |
apiBase | Custom API endpoint URL | Ollama, Morph, Gemini |
encodingFormat | Output format (float/base64) | OpenAI, Morph |
Sources: clients/new-js/packages/ai-embeddings/morph/README.md
Provider Implementations
OpenAI Embeddings
The OpenAI embedding function supports the OpenAI API for generating text embeddings:
import { OpenAIEmbeddingFunction } from '@chroma-core/openai';
const openAIEF = new OpenAIEmbeddingFunction({
apiKey: 'your-api-key',
modelName: 'text-embedding-3-small'
});
Ollama (Local Embeddings)
Ollama enables local embedding generation without external API calls:
# Install Ollama from ollama.ai
# Start the server
ollama serve
# Pull an embedding model
ollama pull chroma/all-minilm-l6-v2-f32
Supported Models:
| Model | Dimensions |
|---|---|
chroma/all-minilm-l6-v2-f32 (default) | 384 |
nomic-embed-text | 768 |
mxbai-embed-large | 1024 |
snowflake-arctic-embed | Variable |
Sources: clients/new-js/packages/ai-embeddings/ollama/README.md
Morph Embeddings
Morph provides embeddings optimized for code-related content:
```typescript
// Import path assumed from the @chroma-core/<provider> package convention.
import { MorphEmbeddingFunction } from '@chroma-core/morph';

const morphEmbedding = new MorphEmbeddingFunction({
  api_key: 'your-morph-api-key',
  model_name: 'morph-embedding-v2',
  api_base: 'https://api.morphllm.com/v1',
  encoding_format: 'float'
});
```
Sources: clients/new-js/packages/ai-embeddings/morph/README.md
Chroma Cloud Qwen
Hosted embedding service using Qwen models:
```typescript
// Import path assumed from the @chroma-core/<provider> package convention.
import { QwenEmbeddingFunction } from '@chroma-core/chroma-cloud-qwen';

const qwenEmbedding = new QwenEmbeddingFunction({
  model: 'Qwen/Qwen3-Embedding-0.6B',
  task: 'document' // or 'query'
});
```
Configuration includes:
- `model`: The Qwen model to use
- `task`: Task type (`document` or `query` embedding)
- `instruction_dict`: Custom instructions for specific tasks
- `apiKeyEnvVar`: Environment variable for API key (default: `CHROMA_API_KEY`)
Sources: clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/README.md
Collection Integration
Embedding Function in Collections
When creating a collection, the embedding function can be specified at multiple levels:
```typescript
const collection = await chroma.createCollection({
  name: "my-collection",
  embeddingFunction: openAIEF // Specify embedding function
});
```
Space Configuration
Embedding functions can define supported distance spaces and default configurations:
```typescript
if (overallEf && overallEf.defaultSpace && overallEf.supportedSpaces) {
  if (configuration?.hnsw === undefined && configuration?.spann === undefined) {
    configuration.hnsw = { space: overallEf.defaultSpace() };
  }
}
```
The system validates that configured spaces are supported by the embedding function and warns if mismatches occur:
```
Space 'cosine' is not supported by embedding function 'openai'.
Supported spaces: cosine, euclidean, dotproduct
```
Sources: clients/new-js/packages/chromadb/src/collection-configuration.ts
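The validation described above can be sketched as a pure function. The interface and names here are illustrative, not the client's actual types:

```typescript
// Sketch of the space-support check: a configured HNSW space outside the
// embedding function's supported set produces a warning message.
interface SpaceAwareEF {
  name: string;
  defaultSpace(): string;
  supportedSpaces(): string[];
}

function checkSpace(ef: SpaceAwareEF, configuredSpace?: string): string | null {
  // With no explicit space, fall back to the embedding function's default.
  const space = configuredSpace ?? ef.defaultSpace();
  if (!ef.supportedSpaces().includes(space)) {
    return (
      `Space '${space}' is not supported by embedding function '${ef.name}'. ` +
      `Supported spaces: ${ef.supportedSpaces().join(", ")}`
    );
  }
  return null; // configured space is valid
}
```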
Query Response Structure
Include Parameter
Queries support specifying which data to include in results through the Include parameter:
```rust
pub enum Include {
    Distance,
    Document,
    Embedding,
    Metadata,
    Uri,
}
```
Default Inclusion Behavior:
| Operation | Default Includes |
|---|---|
| Query | Document, Metadata, Distance |
| Get | Document, Metadata |
Include List Methods:
| Method | Returns |
|---|---|
| `IncludeList::empty()` | No includes |
| `IncludeList::default_query()` | Document, Metadata, Distance |
| `IncludeList::default_get()` | Document, Metadata |
| `IncludeList::all()` | All five include types |
Sources: rust/types/src/api_types.rs
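In the JS client these options surface as lowercase strings in a query's `include` array. The defaults from the tables above can be sketched in TypeScript (the constant names mirror the Rust methods; the string values are assumptions about the wire format):

```typescript
// Sketch of the IncludeList defaults from the tables above.
type Include = "distances" | "documents" | "embeddings" | "metadatas" | "uris";

const IncludeList = {
  empty: (): Include[] => [],
  defaultQuery: (): Include[] => ["documents", "metadatas", "distances"],
  defaultGet: (): Include[] => ["documents", "metadatas"],
  all: (): Include[] => ["distances", "documents", "embeddings", "metadatas", "uris"],
};
```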
Usage Patterns
Basic Usage with JavaScript Client
```typescript
import { ChromaClient } from "chromadb";
import { OpenAIEmbeddingFunction } from "@chroma-core/openai";

const chroma = new ChromaClient();
const embeddingFunction = new OpenAIEmbeddingFunction({
  apiKey: process.env.OPENAI_API_KEY
});

const collection = await chroma.createCollection({
  name: "documents",
  embeddingFunction: embeddingFunction
});

await collection.add({
  ids: ["doc-1", "doc-2"],
  documents: ["Document content here", "Another document"],
  metadatas: [{ source: "notion" }, { source: "google-docs" }]
});

const results = await collection.query({
  queryTexts: ["Search query"],
  nResults: 2
});
```
Python Client Usage
```python
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.create_collection("documents")

collection.add(
    documents=["Document 1", "Document 2"],
    metadatas=[{"source": "notion"}, {"source": "google-docs"}],
    ids=["doc1", "doc2"],
    embeddings=[[1.2, 2.1, ...], [1.2, 2.1, ...]]
)

results = collection.query(
    query_texts=["Query document"],
    n_results=2
)
```
Sources: clients/new-js/packages/chromadb/README.md
Environment Detection
Embedding functions automatically detect the runtime environment to select the appropriate HTTP client:
```typescript
import { isBrowser } from '@chroma-core/ai-embeddings-common';

if (isBrowser()) {
  // Use browser-compatible fetch
} else {
  // Use Node.js HTTP client
}
```
This enables packages like Ollama to work seamlessly in both browser and Node.js environments:
> This package works in both Node.js and browser environments, automatically detecting the runtime and using the appropriate Ollama client.
Sources: clients/new-js/packages/ai-embeddings/ollama/README.md
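The detection itself typically amounts to a globals check. A minimal sketch, assuming the common approach (the real `isBrowser` helper lives in `@chroma-core/ai-embeddings-common`):

```typescript
// Runtime detection sketch: a browser exposes window and window.document,
// while Node.js does not. globalThis avoids a compile error when DOM
// types are unavailable.
function isBrowser(): boolean {
  const g = globalThis as any;
  return typeof g.window !== "undefined" && typeof g.window.document !== "undefined";
}
```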
Type Safety
The embedding function system provides TypeScript types and interfaces for:
- Configuration validation
- Response parsing
- Error handling
- Provider-specific options
```typescript
export const getSparseEmbeddingFunction = async (
  client: ChromaClient,
  efConfig?: EmbeddingFunctionConfiguration
) => {
  // Returns SparseEmbeddingFunction instance or undefined
};
```
Sources: clients/new-js/packages/chromadb/src/embedding-function.ts
Summary
Embedding Functions Integration in Chroma provides a unified, extensible system for text vectorization. Key aspects include:
- Provider Abstraction: Standardized interface across multiple embedding providers
- Dynamic Loading: Packages loaded on-demand based on collection configuration
- Schema Validation: JSON schema-based configuration validation
- Cross-Platform: Support for both Node.js and browser environments
- Flexible Configuration: Provider-specific options with sensible defaults
- Space Support: Distance metric configuration aligned with embedding provider capabilities
The plugin architecture allows Chroma to integrate new embedding providers while maintaining API consistency across the SDK.
Sources: [clients/new-js/packages/ai-embeddings/common/README.md](https://github.com/chroma-core/chroma/blob/main/clients/new-js/packages/ai-embeddings/common/README.md)
Doramagic Pitfall Log
Doramagic extracted 6 source-linked risk signals. Review them before installing or handing real data to the project.
1. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:546206616 | https://github.com/chroma-core/chroma | README/documentation is current enough for a first validation pass.
2. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintainer activity is unknown according to the source signal; treat this as an open review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | last_activity_observed missing
3. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium
4. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:546206616 | https://github.com/chroma-core/chroma | no_demo; severity=medium
5. Maintenance risk: issue_or_pr_quality=unknown
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | issue_or_pr_quality=unknown
6. Maintenance risk: release_recency=unknown
- Severity: low
- Finding: release_recency=unknown.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:546206616 | https://github.com/chroma-core/chroma | release_recency=unknown
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using chroma with real data or production workflows.
- [[Bug]: metadata filter does not work over 20 millions chunk.](https://github.com/chroma-core/chroma/issues/4089) - github / github_issue
- [[Bug]: PersistentClient second-opener hangs ~16 minutes on shared persis](https://github.com/chroma-core/chroma/issues/7040) - github / github_issue
- [[Security] Unsafe pickle.load() in PersistentLocalHnswSegment enables ar](https://github.com/chroma-core/chroma/issues/6926) - github / github_issue
- query(where=...) raises 'Error finding id' after batched adds until WAL - github / github_issue
- 1.5.9 - github / github_release
- foundation-cli-v0.1.0-alpha.3 - github / github_release
- 1.5.8 - github / github_release
- 1.5.7 - github / github_release
- 1.5.6 - github / github_release
- 1.5.5 - github / github_release
- README/documentation is current enough for a first validation pass. - GitHub / issue
Source: Project Pack community evidence and pitfall evidence