Doramagic Project Pack · Human Manual
qdrant
Qdrant's architecture follows a modular design with clear separation between storage, indexing, querying, and distributed coordination layers.
Introduction to Qdrant
Related topics: System Architecture, REST and gRPC API
Qdrant is an open-source vector similarity search engine written in Rust, designed for high-performance nearest neighbor search in high-dimensional vector spaces. It serves as the core engine for AI applications requiring semantic search, recommendation systems, anomaly detection, and retrieval-augmented generation (RAG).
Architecture Overview
Qdrant's architecture follows a modular design with clear separation between storage, indexing, querying, and distributed coordination layers.
graph TD
subgraph "Client Layer"
REST[REST API]
gRPC[gRPC API]
end
subgraph "Core Engine"
API[lib/api]
COLLECTION[lib/collection]
end
subgraph "Storage Layer"
SHARD[lib/shard]
SEGMENT[lib/segment]
GRID[lib/gridstore]
end
subgraph "Common Utilities"
COMMON[lib/common]
TRIFIFO[lib/trififo]
end
REST --> API
gRPC --> API
API --> COLLECTION
COLLECTION --> SHARD
SHARD --> SEGMENT
SEGMENT --> GRID
COLLECTION --> COMMON
SHARD --> COMMON
High-Level Component Responsibilities
| Component | Purpose |
|---|---|
| lib/api | REST and gRPC API definitions, request validation, and schema |
| lib/collection | Collection management, shard coordination, and operations |
| lib/shard | Individual shard operations, WAL management, and segment holder |
| lib/segment | Vector indexing (HNSW), quantization, and segment data structures |
| lib/gridstore | Memory-mapped storage engine for persistent data |
| lib/common | Shared utilities: memory management, mmap, CPU detection, rate limiting |
Source: lib/segment/src/lib.rs:1-15
Deployment Modes
Qdrant supports two deployment modes to accommodate different use cases.
Qdrant Server
The standard client-server deployment where Qdrant runs as a standalone service. Clients communicate via REST or gRPC APIs over HTTP.
graph LR
Client1[Python Client]
Client2[Rust Client]
Client3[Java Client]
QdrantServer[Qdrant Server<br/>:6333 REST<br/>:6334 gRPC]
Storage[(./storage)]
Client1 --> QdrantServer
Client2 --> QdrantServer
Client3 --> QdrantServer
QdrantServer --> Storage
Source: README.md
Qdrant Edge
A lightweight, in-process vector search engine designed for embedded devices, autonomous systems, and mobile agents. Unlike the server mode, Edge runs inside the application process with local data storage.
from qdrant_edge import Distance, EdgeConfig, EdgeVectorParams, EdgeShard, Point, UpdateOperation
shard = EdgeShard.create("./shard", EdgeConfig(
vectors={"my-vector": EdgeVectorParams(size=4, distance=Distance.Cosine)}
))
shard.update(UpdateOperation.upsert_points([
Point(id=1, vector={"my-vector": [0.1, 0.2, 0.3, 0.4]}, payload={"color": "red"})
]))
The Edge variant is built from an amalgamation of core libraries, compiled as a single distributable package.
Source: lib/edge/publish/amalgamate.py
Core Data Structures
Points
The fundamental data unit in Qdrant is a Point, which consists of:
- ID: Unique identifier for the point
- Vector(s): One or more dense vectors associated with the point
- Payload: Optional key-value metadata for filtering and organization
classDiagram
class Point {
+id: PointId
+vectors: Vectors
+payload: Payload
}
class Vectors {
+vectors: Vec~Vector~
+named: HashMap~String, Vector~
}
class Payload {
+fields: HashMap~String, Value~
}
Point *-- Vectors
Point *-- Payload
Source: lib/segment/src/data_types/mod.rs
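As a rough illustration of the three parts of a point, the sketch below models one as a plain Python dataclass. The class and the `PointId` alias are hypothetical stand-ins written for this manual, not Qdrant's actual types; the field values mirror the Edge example earlier in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

# Assumption for illustration: an ID is either an integer or a string such as a UUID.
PointId = Union[int, str]


@dataclass
class Point:
    """Illustrative stand-in for a point; not Qdrant's real data type."""
    id: PointId
    vectors: Dict[str, List[float]]                        # named vectors
    payload: Dict[str, Any] = field(default_factory=dict)  # optional metadata


point = Point(
    id=1,
    vectors={"my-vector": [0.1, 0.2, 0.3, 0.4]},
    payload={"color": "red"},
)
print(point.id, sorted(point.vectors), point.payload["color"])
```

The payload defaults to an empty mapping, which reflects that it is optional while the ID and vectors are not.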
Segments
Segments are the fundamental storage unit within shards. They contain a portion of the collection's points and can be in different states (indexed, raw, or partially optimized).
| Segment Type | Description |
|---|---|
| Indexed | Full HNSW index built, optimized for search |
| Raw | No index, requires full scan for search |
| Indexing | Index build in progress |
| Mmap | Memory-mapped segment for memory-efficient access |
Source: lib/shard/src/lib.rs:1-30
Indexing and Search
HNSW Index
Qdrant uses Hierarchical Navigable Small World (HNSW) graphs as the primary index structure for approximate nearest neighbor (ANN) search.
Key HNSW parameters:
| Parameter | Description | Impact |
|---|---|---|
| m | Number of bi-directional links per node | Memory usage, recall |
| ef_construct | Search width during index build | Build time, recall |
| ef (hnsw_ef in search requests) | Search width during query | Search speed, recall |
| full_scan_threshold | Vector data size (in kilobytes) below which full scan is preferred over HNSW | Small dataset optimization |
Quantization
Qdrant supports multiple quantization strategies to reduce memory footprint and improve search speed:
| Method | Compression Ratio | Use Case |
|---|---|---|
| Scalar | Up to 4× | General purpose, good accuracy |
| Binary | 32× | High-dimensional vectors (>1024d) |
| Product Quantization (PQ) | Configurable | Large datasets, trade-off accuracy |
| TurboQuant | >32× | Aggressive compression (ICLR 2026) |
Community Note: TurboQuant is an emerging feature (see GitHub Issue #8670) that addresses limitations with existing quantization methods. Current quantization options don't provide an optimal path for aggressive compression without significant accuracy trade-offs. Source: Community Context - Issue #8524
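The compression ratios for scalar and binary quantization follow from simple per-component arithmetic. The sketch below assumes a float32 baseline, 8-bit scalar codes, and 1-bit binary codes; the 1536-dimension figure is a hypothetical embedding size chosen only for illustration, and per-vector metadata is ignored.

```python
def vector_bytes(dim: int, bits_per_component: float) -> float:
    """Storage for one vector, ignoring any per-vector metadata."""
    return dim * bits_per_component / 8


DIM = 1536  # hypothetical embedding size chosen for illustration

full = vector_bytes(DIM, 32)    # float32 baseline
scalar = vector_bytes(DIM, 8)   # scalar quantization to 8-bit codes
binary = vector_bytes(DIM, 1)   # binary quantization, one bit per component

print(full, scalar, binary)          # 6144.0 1536.0 192.0
print(full / scalar, full / binary)  # 4.0 32.0
```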
Configuration
Qdrant behavior is controlled via config.yaml. Key configuration sections include:
storage:
  storage_path: ./storage
  snapshots_path: ./snapshots
  on_disk_payload: true  # Keep payloads on disk to save RAM
telemetry:
  # Telemetry collection settings
Source: config/config.yaml
Storage Settings
| Setting | Type | Default | Description |
|---|---|---|---|
| storage_path | string | ./storage | Primary data directory |
| snapshots_path | string | ./snapshots | Snapshot storage location |
| on_disk_payload | boolean | true | Keep payloads on disk |
| temp_path | string | null | Temporary file storage |
Collection Operations
Collections are top-level organizational units that group related points. The collection module (lib/collection) handles:
- Collection creation and deletion
- Shard distribution and replication
- Operation routing and coordination
- Collection state management
graph TD
CreateCollection[Create Collection] --> DefineSchema[Define Schema<br/>Vector params, indexes]
DefineSchema --> DistributeShards[Distribute Shards]
DistributeShards --> InitializeWAL[Initialize WAL]
InitializeWAL --> Ready[Collection Ready]
Source: lib/collection/src/lib.rs:1-25
Query Operations
Qdrant provides multiple query types:
| Operation | Description |
|---|---|
| Search | Find nearest vectors by similarity |
| Recommend | Find similar to given points |
| Discover | Explore in direction from given points |
| Scroll | Iterate through points sequentially |
| Count | Count matching points |
| Facet | Group and count by field values |
| Filter | Apply payload-based filters |
Relevance Feedback
Introduced in v1.17.0, relevance feedback allows improving search results based on user interactions, enabling continuous learning from user behavior.
Source: Community Context - v1.17.0 Release
Client Libraries
Qdrant provides official and community client libraries:
| Language | Repository |
|---|---|
| Python | qdrant-client |
| Rust | rust-client (qdrant-client crate) |
| TypeScript/JS | qdrant-js |
| Java | java-client |
| .NET/C# | qdrant-dotnet |
| PHP | qdrant-php (community) |
Source: README.md
Known Issues and Limitations
Flaky Tests
The community has reported several flaky tests related to quantized HNSW search, primarily in lib/segment/tests/integration/hnsw_quantized_search_test.rs. These tests occasionally fail with score comparison assertions:
- hnsw_turbo_quantization_cosine_larger_bits2_test
- hnsw_turbo_quantization_cosine_larger_test
- hnsw_quantized_search_manhattan_test
- hnsw_quantized_search_euclid_test
These are tracked in Issues #8735, #8801, #8834, and #8835.
Feature Requests
Notable community requests include:
- Adding new vector fields after collection creation (Issue #1132): Currently, all vector fields must be defined at collection creation time.
- Delete vectors for deleted points (Issue #2550): Requests better handling of deleted vectors for optimizer and query planning.
- ColBERT/Late Interaction support (Issue #3684): Tracking multi-vector storage integration for late-interaction retrieval models.
Recent Improvements (v1.16 - v1.18)
| Version | Key Improvements |
|---|---|
| v1.18.1 | Refactored quantized multi-vector scorers for io_uring support, vector dimension validation before WAL write |
| v1.17.1 | Non-blocking Gridstore flushes, deferred point updates optimization |
| v1.17.0 | Relevance Feedback API, optimization progress reporting |
| v1.16.3 | Request timeout handling for telemetry and metrics |
| v1.16.2 | Critical WAL bug fix, user agent headers |
| v1.16.1 | 3× faster batch queries, RocksDB to Gridstore migration |
Source: Community Context - Release Notes
Contributing
All pull requests must target the dev branch. The master branch is reserved for releases only.
For detailed contribution guidelines, see CONTRIBUTING.md.
Source: https://github.com/qdrant/qdrant / Human Manual
System Architecture
Related topics: Introduction to Qdrant, Data Flow and Update Pipeline, REST and gRPC API
Qdrant is a vector similarity search engine designed for high-performance vector search in production environments. The system architecture follows a layered design that separates concerns between API handling, collection management, storage, and core indexing operations. This modular structure enables Qdrant to scale efficiently while supporting diverse deployment scenarios from embedded devices to distributed clusters.
Core Library Structure
The Qdrant system is built upon several foundational libraries that provide the essential functionality for vector search operations.
Segment Library (`lib/segment`)
The segment library is the core indexing and storage engine of Qdrant. It encapsulates all low-level operations related to vector storage, HNSW index construction, and query execution.
lib/segment/src/lib.rs
├── common # Shared utilities and common types
├── entry # Entry point abstractions
├── fixtures # Testing utilities (with `testing` feature)
├── id_tracker # Internal/external ID mapping
├── index # HNSW and other index implementations
├── payload_storage # Payload data storage
├── segment # Core segment implementation
├── segment_constructor # Segment building utilities
├── spaces # Vector space definitions (cosine, dot, euclidean, etc.)
├── telemetry # Performance monitoring
├── data_types # Structured data type definitions
├── json_path # JSON path parsing for payload queries
├── types # Core type definitions
├── utils # General utility functions
└── vector_storage # Vector storage implementations
The segment library manages the fundamental unit of data organization in Qdrant. Each segment contains a subset of points with their associated vectors and payloads, managed independently for parallel processing during searches.
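Because each segment is searched independently, the per-segment results have to be combined into one ranked list. The following sketch shows one way to do that merge with a heap. The `merge_top_k` helper and the `(score, point id)` tuple shape are assumptions made for this example, not Qdrant's internal API, and it assumes a higher score means a closer match.

```python
import heapq
from typing import Iterable, List, Tuple

ScoredPoint = Tuple[float, int]  # (score, point id); higher score = closer


def merge_top_k(per_segment: Iterable[List[ScoredPoint]], k: int) -> List[ScoredPoint]:
    """Combine each segment's local top-k into one global top-k."""
    return heapq.nlargest(k, (hit for hits in per_segment for hit in hits))


segment_a = [(0.91, 10), (0.80, 11)]
segment_b = [(0.95, 20), (0.40, 21)]
print(merge_top_k([segment_a, segment_b], k=3))
# [(0.95, 20), (0.91, 10), (0.8, 11)]
```

Each segment only needs to return its own best `k` hits, since no point outside a segment's local top-k can appear in the global top-k.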
Collection Library (`lib/collection`)
The collection library provides higher-level abstractions for managing groups of segments and coordinating distributed operations.
lib/collection/src/
├── config.rs # Collection configuration and state management
└── operations/
├── generalizer/mod.rs # Trait for removing vector details from structures
├── count.rs # Point counting operations
├── facet.rs # Faceted search operations
├── matrix.rs # Matrix-based operations
├── points.rs # Point manipulation
├── query.rs # Query execution
└── update_persisted.rs # Persistence operations
The Generalizer trait strips vectors from a structure and replaces payloads with their keys and length indications. The resulting generalized request is lightweight enough to transmit and cache.
GridStore (`lib/gridstore`)
GridStore is Qdrant's custom storage engine designed for high-throughput vector operations with optional compression.
lib/gridstore/src/
├── pages.rs # Page-based storage management
├── config.rs # Storage configuration
└── bitmask/mod.rs # Block allocation bitmask
GridStore uses a hierarchical storage model with configurable page, block, and region sizes. The system defaults are optimized for typical vector workloads:
| Parameter | Default | Description |
|---|---|---|
| Page Size | 32MB | Size of each storage page |
| Block Size | 128 bytes | Smallest allocatable unit |
| Region Size | 8192 blocks | Management unit within pages |
| Compression | LZ4 | Default compression algorithm |
Source: lib/gridstore/src/config.rs
The bitmask system tracks block allocation within pages, with one bit per block. This enables efficient free-space tracking and allocation operations.
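The defaults in the table imply some concrete sizes, and the one-bit-per-block idea can be shown with a toy allocator. The sketch below reads 32MB as 32 × 1024 × 1024 bytes, which is an assumption. The `BlockBitmask` class is a hypothetical first-fit allocator written for this example; it does not reproduce GridStore's region-level bookkeeping.

```python
PAGE_SIZE = 32 * 1024 * 1024   # 32 MB page (assumed to mean MiB)
BLOCK_SIZE = 128               # bytes per block
REGION_BLOCKS = 8192           # blocks per region

blocks_per_page = PAGE_SIZE // BLOCK_SIZE            # 262144
regions_per_page = blocks_per_page // REGION_BLOCKS  # 32
bitmask_bytes_per_page = blocks_per_page // 8        # 32768 (32 KiB)


class BlockBitmask:
    """Toy first-fit allocator: one bit per block, 1 = used."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def _get(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def _set(self, i: int, used: bool) -> None:
        if used:
            self.bits[i // 8] |= 1 << (i % 8)
        else:
            self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def allocate(self, value_len: int) -> int:
        """Reserve contiguous blocks for value_len bytes; return the first block."""
        need = max(1, -(-value_len // BLOCK_SIZE))  # ceiling division
        run = 0
        for i in range(self.num_blocks):
            run = 0 if self._get(i) else run + 1
            if run == need:
                start = i - need + 1
                for j in range(start, i + 1):
                    self._set(j, True)
                return start
        raise MemoryError("no free run of blocks")

    def free(self, start: int, value_len: int) -> None:
        """Mark the blocks of a previously allocated value as free again."""
        need = max(1, -(-value_len // BLOCK_SIZE))
        for j in range(start, start + need):
            self._set(j, False)


mask = BlockBitmask(64)
a = mask.allocate(300)   # 3 blocks, starts at block 0
b = mask.allocate(100)   # 1 block, starts at block 3
mask.free(a, 300)
c = mask.allocate(200)   # 2 blocks, reuses the freed space at block 0
print(blocks_per_page, regions_per_page, bitmask_bytes_per_page, a, b, c)
```

Under these assumptions a page holds 262,144 blocks in 32 regions, and its allocation bitmask occupies 32 KiB.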
Storage Architecture
Memory-Mapped I/O
Qdrant extensively uses memory-mapped files for vector and payload storage, enabling efficient zero-copy data access while leveraging OS page cache for I/O optimization.
graph TD
A[Memory-Mapped File] --> B[Page Cache]
B --> C[Disk Storage]
D[Search Query] --> E[Segment]
E --> F[Vector Storage]
F --> A
G[madvise MADV_POPULATE_READ] --> H[Readahead Pages]
H --> B
The system implements several optimization strategies for memory-mapped data:
- Populate Read (`MADV_POPULATE_READ` on Linux): Pre-populates the page cache with expected read data before query execution, reducing page fault latency.
- Readahead Control: Uses `will_need_multiple_pages()` to trigger coordinated prefetching across multi-page regions, avoiding per-page I/O operations.
- Sequential Access Hints: Applies `MADV_SEQUENTIAL` for bulk data loading operations.
Source: lib/common/common/src/mmap/advice.rs
Page-Based Storage Model
The GridStore implementation addresses each value through a pointer (page ID, block offset, length) into fixed-size pages, which are subdivided into regions and blocks:
graph TD
A[ValuePointer] --> B[Page ID]
A --> C[Block Offset]
A --> D[Length]
E[Pages Manager] --> F[Page 1]
E --> G[Page 2]
E --> H[Page N]
F --> I[Region 1]
F --> J[Region 2]
I --> K[Block 0..N]
Values spanning multiple pages are handled through range-based writes, where the system calculates page boundaries and offset ranges for each affected page. This enables efficient storage of variable-length vectors across page boundaries.
Source: lib/gridstore/src/pages.rs
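The page-boundary calculation for a multi-page value can be sketched as follows. The `split_across_pages` function is a hypothetical helper written for this manual, not the code in pages.rs. It assumes byte offsets, a starting offset smaller than the page size, and that a value continues at the first byte of the next page.

```python
from typing import List, Tuple

PAGE_SIZE = 32 * 1024 * 1024  # bytes per page


def split_across_pages(page_id: int, offset: int, length: int,
                       page_size: int = PAGE_SIZE) -> List[Tuple[int, int, int]]:
    """Return (page_id, start_in_page, end_in_page) for each page a value touches."""
    ranges = []
    remaining = length
    while remaining > 0:
        chunk = min(remaining, page_size - offset)
        ranges.append((page_id, offset, offset + chunk))
        remaining -= chunk
        page_id += 1   # continue on the next page
        offset = 0     # starting from its first byte
    return ranges


# A 100-byte value starting 40 bytes before the end of a tiny 256-byte page.
print(split_across_pages(page_id=7, offset=216, length=100, page_size=256))
# [(7, 216, 256), (8, 0, 60)]
```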
Segment Architecture
Segment Components
A segment is the fundamental unit of data organization in Qdrant. Each segment manages:
- ID Tracker: Maps between internal sequential IDs and external point IDs
- Vector Storage: Stores vector data with configurable quantization
- Payload Storage: Stores structured payload data with optional indexing
- Index Structures: HNSW graphs and payload indexes
graph LR
A[Segment] --> B[ID Tracker]
A --> C[Vector Storage]
A --> D[Payload Storage]
A --> E[Index Structures]
C --> F[Raw Vectors]
C --> G[Quantized Vectors]
D --> H[Payload Data]
D --> I[Field Indexes]
Segment Operations
The segment implementation provides core operations for search and data management:
// Key segment operations from segment_ops.rs
pub fn check_data_consistency(&self) -> OperationResult<()>
pub fn create_field_index(...) -> OperationResult<bool>
Data consistency checking verifies:
- Internal IDs without external ID mappings
- External IDs without internal mappings
- Internal IDs without version information
- Internal IDs without vector data
Source: lib/segment/src/segment/segment_ops.rs
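The four checks listed above can be sketched over plain dictionaries. This is a simplified model written for illustration: the function signature and the dictionary-based ID tracker are assumptions, and the real `check_data_consistency` in segment_ops.rs operates on the segment's own storage types.

```python
from typing import Dict, List, Set


def check_data_consistency(
    internal_to_external: Dict[int, str],
    external_to_internal: Dict[str, int],
    versions: Dict[int, int],
    vector_ids: Set[int],
) -> List[str]:
    """Collect a description of every inconsistency found."""
    problems = []
    internal_ids = set(internal_to_external) | set(versions) | set(vector_ids)
    for internal in sorted(internal_ids):
        if internal not in internal_to_external:
            problems.append(f"internal id {internal} has no external id")
        if internal not in versions:
            problems.append(f"internal id {internal} has no version")
        if internal not in vector_ids:
            problems.append(f"internal id {internal} has no vector")
    for external, internal in sorted(external_to_internal.items()):
        if internal_to_external.get(internal) != external:
            problems.append(f"external id {external!r} has no internal id")
    return problems


problems = check_data_consistency(
    internal_to_external={0: "a", 1: "b"},
    external_to_internal={"a": 0, "b": 1, "c": 2},
    versions={0: 5, 1: 6},
    vector_ids={0},
)
print(problems)
# ['internal id 1 has no vector', "external id 'c' has no internal id"]
```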
Data Types Module
The data_types module defines structured data types used throughout the segment layer:
lib/segment/src/data_types/
├── build_index_result.rs # Index construction results
├── collection_defaults.rs # Default configuration values
├── facets.rs # Faceted search data structures
├── groups.rs # Grouping operations
├── index.rs # Index-related types
├── manifest.rs # Serialization manifests
├── modifier.rs # Score modifiers
├── named_vectors.rs # Multi-vector support
├── order_by.rs # Ordering specifications
├── primitive.rs # Primitive type wrappers
├── query_context.rs # Query execution context
├── segment_record.rs # Record representations
├── tiny_map.rs # Compact map implementations
├── vector_name_config.rs # Named vector configuration
└── vectors.rs # Vector data structures
Source: lib/segment/src/data_types/mod.rs
API Layer Architecture
REST API
The REST API module provides HTTP-based access to Qdrant functionality:
lib/api/src/rest/
├── conversions.rs # gRPC to REST conversions
├── models.rs # REST API data models
├── schema.rs # OpenAPI schema definitions
└── validate.rs # Request validation
The REST layer handles:
- JSON serialization/deserialization
- Schema validation
- gRPC model conversion
- OpenAPI documentation generation
OpenAPI Specification
The OpenAPI specifications define the REST API contract using YAML templates with ytt (YAML Templating Tool):
- openapi/openapi-main.ytt.yaml: Primary API endpoints including search, query, and facets
- openapi/openapi-collections.ytt.yaml: Collection management operations
Key API capabilities exposed through REST:
| Endpoint Category | Operations |
|---|---|
| Collections | Create, update, delete, list collections |
| Points | Insert, update, delete, retrieve points |
| Search | Vector similarity search with filters |
| Query | Unified query interface combining all search modes |
| Facet | Payload value distribution counts |
| Aliases | Collection alias management |
Deployment Models
Full Server Deployment
The complete Qdrant server deployment includes:
- gRPC API: High-performance binary protocol for internal and client communications
- REST API: HTTP-based access for web interfaces and cross-platform clients
- Distributed Coordination: Shard management and consensus for multi-node deployments
- Optimizer: Background optimization and compaction processes
Embedded Deployment (Qdrant Edge)
Qdrant Edge provides an amalgamated, in-process vector search engine optimized for embedded devices and autonomous systems:
# Build process from amalgamate.py
AMALGAMATION / "Cargo.toml" # Unified package manifest
AMALGAMATION / "src/lib.rs" # Re-exports from edge module
The edge variant combines all necessary components into a single library with:
- No external service dependencies
- Minimal memory footprint
- Configurable feature selection
- Simplified deployment for edge computing scenarios
Source: lib/edge/publish/amalgamate.py
Data Flow Architecture
graph TD
subgraph "Ingestion Path"
A[REST/gRPC Request] --> B[API Layer]
B --> C[Collection Manager]
C --> D[Segment Constructor]
D --> E[Write-Ahead Log]
E --> F[Mutable Segment]
end
subgraph "Query Path"
G[Query Request] --> H[Query Planner]
H --> I[Segment Selector]
I --> J[Parallel Segment Search]
J --> K[Result Merger]
K --> L[Response]
end
subgraph "Optimization Path"
M[Optimizer] --> N[Segment Compaction]
N --> O[Immutable Segments]
O --> P[Index Merging]
end
F --> |flush| O
G --> |routing| C
Collection Configuration
Collections maintain configuration state including:
- Vector Configuration: Vector dimensions, distance metrics, storage options
- Optimizers: Background optimization settings
- Params: HNSW and quantization parameters
- Metadata: Application-specific information
Configuration is persisted using atomic file operations:
pub fn save(&self, path: &Path) -> CollectionResult<()>
pub fn load(path: &Path) -> CollectionResult<Self>
pub fn check(path: &Path) -> bool
Source: lib/collection/src/config.rs
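A common way to make a configuration save atomic is to write to a temporary file and rename it over the target. The sketch below shows that pattern with Python's standard library. It is an assumption about the general technique, not a description of how config.rs implements it, and the `config.json` file name and JSON encoding are chosen for illustration only.

```python
import json
import os
import tempfile
from pathlib import Path


def save_atomic(config: dict, path: Path) -> None:
    """Write JSON to a temp file in the same directory, then rename over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(config, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the bytes to disk before the rename
        os.replace(tmp_name, path)   # readers see either the old or the new file
    except BaseException:
        os.unlink(tmp_name)          # do not leave a partial temp file behind
        raise


def load(path: Path) -> dict:
    """Read a previously saved configuration."""
    return json.loads(path.read_text())


def check(path: Path) -> bool:
    """Report whether a configuration file exists at the given path."""
    return path.exists()


with tempfile.TemporaryDirectory() as directory:
    target = Path(directory) / "config.json"
    existed_before = check(target)
    save_atomic({"vectors": {"size": 4, "distance": "Cosine"}}, target)
    existed_after = check(target)
    roundtrip = load(target)

print(existed_before, existed_after, roundtrip)
```

Writing the temporary file into the same directory as the target keeps both on one filesystem, which is what allows the final rename to replace the file in a single step.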
Key Architectural Patterns
Module Organization
The codebase follows a consistent module structure pattern:
| Pattern | Purpose | Example |
|---|---|---|
| Feature Gates | Optional functionality | #[cfg(feature = "testing")] |
| Re-export Modules | Public API surface | pub use edge::* in lib.rs |
| Separation of Concerns | Layer isolation | API, Collection, Segment, Storage |
| Trait-based Abstractions | Polymorphism | Generalizer trait for data transformation |
Generalizer Pattern
The Generalizer trait enables efficient data transfer by stripping detailed vector information while preserving structural metadata:
pub trait Generalizer {
fn remove_details(&self) -> Self;
}
This pattern is used for:
- Caching query results with reduced memory footprint
- Transmitting metadata without full payloads
- Cross-shard communication optimization
Source: lib/collection/src/operations/generalizer/mod.rs
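To make the idea concrete, the sketch below generalizes a request by keeping only the vector's dimensionality and the payload's keys and length. The request shape and the free function are hypothetical. Unlike the Rust trait, which returns `Self`, this simplified version returns a plain summary dictionary.

```python
from typing import Any, Dict


def remove_details(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a lightweight summary: no vector values, no payload values."""
    payload = request.get("payload", {})
    return {
        "vector": {"dim": len(request["vector"])},  # keep only the dimensionality
        "payload": {"keys": sorted(payload), "len": len(payload)},
        "limit": request["limit"],
    }


request = {
    "vector": [0.1, 0.2, 0.3, 0.4],
    "payload": {"color": "red", "size": 42},
    "limit": 10,
}
print(remove_details(request))
# {'vector': {'dim': 4}, 'payload': {'keys': ['color', 'size'], 'len': 2}, 'limit': 10}
```

Two requests that differ only in their vector values or payload values produce the same summary, which is what makes the generalized form usable as a cache key or log entry.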
Related Documentation
For more information on related topics:
- Quantization: See TurboQuant tracking issue #8670 for aggressive compression options
- Multi-vector Support: Refer to ColBERT tracking issue #3684 for late interaction models
- Deployment: Consult the Qdrant Edge documentation for embedded deployment scenarios
HNSW Index Implementation
Related topics: Vector Storage, Quantization System
The Hierarchical Navigable Small World (HNSW) index is the primary vector similarity search algorithm in Qdrant. It provides fast approximate nearest neighbor (ANN) search with configurable accuracy/speed tradeoffs, supporting multiple distance metrics and quantization strategies.
Architecture Overview
The HNSW implementation in Qdrant is organized as a multi-layer graph. Every vector is present in the bottom layer, and a progressively smaller random subset of vectors also appears in each higher layer. The upper layers act like a sparse skip list enabling fast traversal, while the bottom layer contains all vectors connected in a dense small-world graph.
graph TD
subgraph UpperLayers["Upper Layers (L3, L2, L1)"]
L3_1["Node A"]
L3_2["Node B"]
L2_1["Node A"]
L2_2["Node B"]
L2_3["Node C"]
L1_1["Node A"]
L1_2["Node B"]
L1_3["Node C"]
L1_4["Node D"]
end
subgraph BottomLayer["Bottom Layer (L0)"]
BL_A["Node A"]
BL_B["Node B"]
BL_C["Node C"]
BL_D["Node D"]
BL_E["Node E"]
end
L3_1 --> L2_1
L3_2 --> L2_2
L2_1 --> L1_1
L2_2 --> L1_2
L2_3 --> L1_3
L2_2 --> L1_4
L1_1 --> BL_A
L1_2 --> BL_B
L1_3 --> BL_C
L1_4 --> BL_D
BL_C --> BL_D
BL_D --> BL_E
BL_B --> BL_C
Core Components
| Component | File | Purpose |
|---|---|---|
| HNSWIndex | hnsw.rs | Main entry point, coordinates build and search |
| GraphLayers | graph_layers.rs | In-memory graph representation |
| GraphLayersBuilder | graph_layers_builder.rs | Constructs the HNSW graph incrementally |
| SearchContext | search_context.rs | Handles search traversal and scoring |
| HnswGlobalConfig | config.rs | Configuration parameters |
| FilteredScorer | graph_layers.rs | Scores candidates during search |
Configuration Parameters
The HNSW index is configured via HnswGlobalConfig:
| Parameter | Type | Default | Description |
|---|---|---|---|
| m | usize | 16 | Maximum connections per layer |
| ef_construct | usize | 100 | Construction beam width |
| full_scan_threshold | usize | 10000 | Vector data size (in kilobytes) below which full scan is preferred over HNSW |
| on_disk | Option<bool> | None | Whether to store index on disk |
| index | HnswIndexConfig | - | Index-specific settings |
Source: lib/segment/src/index/hnsw_index/config.rs
HnswM Parameter
The M parameter controls the maximum number of connections in the graph:
pub enum HnswM {
M16, // Default, 16 connections
M32, // Higher accuracy, more memory
}
Source: lib/segment/src/index/hnsw_index/hnsw/build.rs
Graph Construction
The graph is built incrementally using a modified NSW algorithm. Each inserted vector is assigned a random level l where the probability of being at level l decreases exponentially.
flowchart TD
A[Insert Vector] --> B{Generate Random Level}
B --> C[Calculate max_level]
C --> D[Search Upper Layers<br/>ef = ef_construct]
D --> E[Find Entry Point]
E --> F[For each level l from max_level to 0]
F --> G[Search Layer l<br/>ef = ef_construct]
G --> H[Connect to nearest neighbors<br/>M connections max]
H --> I{Next level?}
I -->|Yes| F
I -->|No| J[Insert Complete]
Construction Algorithm
The build function in lib/segment/src/index/hnsw_index/hnsw/build.rs handles the complete construction process:
- Level Assignment: Each vector is assigned a maximum level drawn from an exponential distribution
- Upper Layer Traversal: Starting from the global entry point, descend through the layers above the vector's level, keeping the closest node found as the entry point for the next layer
- Greedy Search: At each layer from the vector's level down to 0, perform a greedy search and connect the vector to at most `M` nearest neighbors
- Heuristic Refinement: Optionally use heuristics to improve graph connectivity
Source: lib/segment/src/index/hnsw_index/hnsw/build.rs
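The level assignment step can be sketched with the formula from the original HNSW paper, where the level is `floor(-ln(u) / ln(M))` for a uniform random `u`. Whether build.rs uses exactly this normalization is an assumption. The `random_level` function and the fixed seed are illustrative only.

```python
import math
import random
from collections import Counter


def random_level(m: int, rng: random.Random) -> int:
    """Draw a level so that P(level >= l) = m ** (-l)."""
    level_factor = 1.0 / math.log(m)   # normalization used in the HNSW paper
    u = 1.0 - rng.random()             # uniform in (0, 1], avoids log(0)
    return int(-math.log(u) * level_factor)


rng = random.Random(42)
levels = Counter(random_level(16, rng) for _ in range(100_000))
for level in sorted(levels):
    print(level, levels[level])
```

With `M = 16`, roughly 15 out of 16 vectors stay on level 0 and each higher level holds about one sixteenth of the level below it, which is why the upper layers remain sparse.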
Benchmark Configuration
The default benchmark configuration for graph construction:
| Parameter | Value | Description |
|---|---|---|
NUM_VECTORS | 10000 | Number of vectors to index |
DIM | 32 | Vector dimensionality |
M | 16 | Maximum connections |
EF_CONSTRUCT | 64 | Construction beam width |
USE_HEURISTIC | true | Enable heuristic optimization |
Source: lib/segment/benches/hnsw_build_graph.rs
Search Algorithm
HNSW search descends from the top layer to the bottom layer. At each layer it runs a best-first search whose result set is bounded by the ef parameter.
sequenceDiagram
participant Query
participant SearchContext
participant GraphLayers
participant VectorStorage
Query->>SearchContext: search(query_vector, ef, filter)
SearchContext->>GraphLayers: get_entry_point()
GraphLayers-->>SearchContext: entry_point
loop For each level from top to bottom
SearchContext->>SearchContext: search_layer(entry_point, ef)
SearchContext->>VectorStorage: score_points(visited_set)
VectorStorage-->>SearchContext: distances
SearchContext->>SearchContext: update_candidates(distances)
end
SearchContext-->>Query: Top-k results
Search Parameters
| Parameter | Description |
|---|---|
| hnsw_ef | Search beam width (default: from config or 128) |
| exact | If true, perform brute force exact search |
| use_filters | Apply payload filters during search |
Source: lib/segment/src/index/hnsw_index/search_context.rs
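The per-layer step of the search can be sketched as a textbook best-first traversal with two heaps. This is a minimal illustration of the general HNSW technique, not the code in search_context.rs. The `search_layer` signature, the adjacency-list graph, and the distance callback are all assumptions, and payload filtering is left out.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def search_layer(
    graph: Dict[int, List[int]],
    distance: Callable[[int], float],
    entry_point: int,
    ef: int,
) -> List[Tuple[float, int]]:
    """Best-first search over one layer, keeping at most `ef` nearest results."""
    visited = {entry_point}
    d0 = distance(entry_point)
    candidates = [(d0, entry_point)]   # min-heap: closest unexplored node first
    results = [(-d0, entry_point)]     # max-heap via negation: worst result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef and dist > -results[0][0]:
            break                      # no remaining candidate can improve the results
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = distance(neighbor)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, neighbor))
                heapq.heappush(results, (-d, neighbor))
                if len(results) > ef:
                    heapq.heappop(results)   # evict the current worst result
    return sorted((-neg, node) for neg, node in results)


# Six points on a line, each linked to its immediate neighbors.
positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
graph = {
    i: [j for j in (i - 1, i + 1) if 0 <= j < len(positions)]
    for i in range(len(positions))
}
query = 3.2
nearest = search_layer(graph, lambda i: abs(positions[i] - query), entry_point=0, ef=2)
print([node for _, node in nearest])  # [3, 4]
```

A larger `ef` keeps more candidates alive and explores more of the graph, which is the source of the recall versus speed trade-off.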
GPU Acceleration
Qdrant supports GPU-accelerated HNSW indexing with NVIDIA and AMD GPUs:
#[cfg(feature = "gpu")]
use crate::index::hnsw_index::gpu::gpu_graph_builder::GPU_MAX_VISITED_FLAGS_FACTOR;
| Component | Purpose |
|---|---|
| GpuInsertContext | GPU-based vector insertion |
| gpu_graph_builder | GPU-accelerated graph construction |
| get_gpu_groups_count | Determines available GPU resources |
Source: lib/segment/src/index/hnsw_index/hnsw/build.rs
Quantization Integration
HNSW integrates with Qdrant's quantization subsystem to enable compressed vector storage while maintaining search capability:
- Scalar Quantization: 4× compression with minimal accuracy loss
- Product Quantization (PQ): High compression with codebook-based scoring
- Binary Quantization: Maximum compression for high-dimensional vectors
- TurboQuant: Aggressive compression for extreme memory reduction
Known Issues
Multiple flaky tests exist in the quantized search test suite, primarily in lib/segment/tests/integration/hnsw_quantized_search_test.rs. These tests verify that quantized search returns scores consistent with full-precision search:
| Test Name | Issue |
|---|---|
| hnsw_turbo_quantization_cosine_larger_bits2_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_turbo_quantization_cosine_larger_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_quantized_search_manhattan_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_quantized_search_euclid_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_turbo_quantization_dot_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_turbo_quantization_manhattan_test | Flaky: best_2.score >= best_1.score assertion |
These tests may fail intermittently when quantization introduces numerical precision differences that cause the score ordering to differ slightly from full-precision results.
Source: lib/segment/src/index/hnsw_index/graph_layers.rs
Vector Index Implementation
The VectorIndex trait provides the public interface for HNSW operations:
pub trait VectorIndex {
fn search(
&self,
vectors: &QueryContext,
top: usize,
filter: Option<&Filter>,
search_runtime: &SearchRuntimeConfig,
timeout: StopCondition,
) -> OperationResult<Vec<Vec<PointId>>>;
fn build_index(&mut self, args: VectorIndexBuildArgs) -> OperationResult<BuildIndexResult>;
}
Source: lib/segment/src/index/hnsw_index/hnsw/vector_index_impl.rs
Memory Management
HNSW indexes in Qdrant can be configured for different storage backends:
| Storage Type | Configuration | Use Case |
|---|---|---|
| In-Memory | Default | Maximum performance |
| Memory-Mapped | on_disk: true with mmap | Large indexes that exceed RAM |
| GridStore | New default (v1.16+) | Reduced tail latencies |
The GridStore backend provides non-blocking flushes to reduce search tail latencies, a feature introduced in v1.17.1.
Source: lib/segment/src/index/hnsw_index/hnsw.rs
Payload Filtering
HNSW search supports payload-based filtering through the Filter condition system:
pub trait PayloadIndex {
fn build_index(
&self,
field: PayloadKeyTypeRef,
payload_schema: &PayloadFieldSchema,
hw_counter: &HardwareCounterCell,
) -> OperationResult<BuildIndexResult>;
}
Payload indexes allow efficient filtering by indexing common payload fields (keywords, integers, geo, text, datetime) before or during HNSW search.
Source: lib/segment/src/index/payload_index_base.rs
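The benefit of a payload index is that the set of points matching a filter is known before any vectors are scored. The sketch below illustrates this with a toy inverted index over a keyword field, followed by an exact scan of only the matching points. The `KeywordIndex` class and the sample data are hypothetical, and real filtered HNSW search combines the filter with graph traversal rather than always scanning.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class KeywordIndex:
    """Toy inverted index: payload value -> ids of points carrying that value."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, point_id: int, value: str) -> None:
        self.postings[value].add(point_id)

    def matching(self, value: str) -> Set[int]:
        return set(self.postings.get(value, ()))


def dot(a: List[float], b: List[float]) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
colors = {1: "red", 2: "blue", 3: "red"}

index = KeywordIndex()
for point_id, color in colors.items():
    index.add(point_id, color)

query = [1.0, 0.0]
allowed = index.matching("red")   # only points 1 and 3 pass the filter
hits: List[Tuple[float, int]] = sorted(
    ((dot(query, vectors[pid]), pid) for pid in allowed), reverse=True
)
print(hits)  # [(1.0, 1), (0.0, 3)]
```

Point 2 is the second-closest vector overall but is never scored, because the index excludes it before the vector comparison starts.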
Performance Considerations
Build Performance
- CPU: Multi-threaded construction with configurable parallelism
- GPU: Optional GPU acceleration for large-scale indexing
- Memory: `GPU_MAX_VISITED_FLAGS_FACTOR` controls GPU memory allocation
Search Performance
- ef Parameter: Higher values = more accurate but slower
- Quantization: Enables larger datasets in memory at cost of precision
- Payload Filters: Can significantly reduce effective search space
Known Limitations
- Adding new vector fields after collection creation is not supported (Issue #1132)
- Deleted vectors are not marked as deleted in the index (Issue #2550), which can affect optimizer and query planner efficiency
Vector Storage
Related topics: HNSW Index Implementation, Quantization System, Storage Engine and Persistence
Vector Storage is a core subsystem within Qdrant's segment layer responsible for storing, managing, and querying vector embeddings. It provides the low-level infrastructure that enables efficient similarity search across dense, sparse, quantized, and multi-vector data types.
Overview
The Vector Storage module (lib/segment/src/vector_storage/) implements a layered architecture that separates storage concerns from query execution. This design enables Qdrant to support multiple vector representations while sharing common scoring logic.
graph TD
A[Vector Storage Module] --> B[Dense Vector Storage]
A --> C[Sparse Vector Storage]
A --> D[Quantized Vector Storage]
A --> E[Multi-Dense Vector Storage]
B --> F[Chunked Vector Storage]
B --> G[Volatile Chunked Vectors]
F --> H[Common Layer]
G --> H
D --> H
E --> H
H --> I[Raw Scorer]
H --> J[Query Scorer]
I --> K[Vector Storage Base]
J --> K
Module Exports (mod.rs)
| Module | Purpose |
|---|---|
| chunked_vectors | Fixed-size chunked vector storage |
| common | Shared utilities and constants |
| dense | Dense float vector storage implementation |
| memory_reporter | Memory usage tracking |
| multi_dense | Multi-vector (e.g., ColBERT) storage |
| prefill_deleted | Deleted vector tracking for prefetching |
| quantized | Quantized/compressed vector storage |
| query | Query construction and processing |
| query_scorer | Scorer implementations for queries |
| raw_scorer | Raw scoring without post-processing |
| read_only | Read-only vector storage variants |
| sparse | Sparse vector storage implementation |
| vector_storage_base | Core trait definitions |
| volatile_chunked_vectors | Ephemeral chunked vector storage |
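To make the idea behind chunked_vectors more concrete, the following is a minimal sketch of fixed-size chunked storage for dense vectors. It is not Qdrant's implementation: the type `ChunkedVectors`, the `vectors_per_chunk` parameter, and all method names are hypothetical and exist only for illustration.

```rust
/// Hypothetical, simplified chunked storage for dense f32 vectors.
/// Vectors are appended into fixed-capacity chunks, so growing the storage
/// allocates a new chunk instead of copying previously stored vector data.
pub struct ChunkedVectors {
    dim: usize,
    vectors_per_chunk: usize,
    chunks: Vec<Vec<f32>>,
    len: usize,
}

impl ChunkedVectors {
    pub fn new(dim: usize, vectors_per_chunk: usize) -> Self {
        assert!(dim > 0 && vectors_per_chunk > 0);
        Self { dim, vectors_per_chunk, chunks: Vec::new(), len: 0 }
    }

    /// Appends a vector and returns its offset (index) in the storage.
    pub fn push(&mut self, vector: &[f32]) -> usize {
        assert_eq!(vector.len(), self.dim, "dimension mismatch");
        if self.len % self.vectors_per_chunk == 0 {
            // The last chunk is full (or there is none yet): start a new one.
            self.chunks
                .push(Vec::with_capacity(self.dim * self.vectors_per_chunk));
        }
        self.chunks
            .last_mut()
            .expect("a chunk was just ensured")
            .extend_from_slice(vector);
        self.len += 1;
        self.len - 1
    }

    /// Returns the vector stored at `offset`, if any.
    pub fn get(&self, offset: usize) -> Option<&[f32]> {
        if offset >= self.len {
            return None;
        }
        let chunk = &self.chunks[offset / self.vectors_per_chunk];
        let start = (offset % self.vectors_per_chunk) * self.dim;
        Some(&chunk[start..start + self.dim])
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn chunk_count(&self) -> usize {
        self.chunks.len()
    }
}
```

An offset maps to a chunk by integer division and to a position inside the chunk by the remainder, which keeps lookups constant-time.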
Quantization System
Related topics: HNSW Index Implementation, Vector Storage
Overview
The Quantization System in Qdrant provides vector compression capabilities to reduce memory footprint and accelerate similarity search operations. It encodes high-dimensional floating-point vectors into compact binary or low-precision representations, enabling efficient storage and fast approximate nearest neighbor (ANN) queries on resource-constrained deployments.
Quantization is a core performance optimization mechanism that trades a small amount of recall accuracy for significant gains in memory usage and query throughput. The system supports multiple quantization strategies and integrates with Qdrant's HNSW index structure for accelerated retrieval.
Architecture
Core Components
The quantization system is implemented as a dedicated library module located in lib/quantization/. The architecture follows a trait-based design pattern that allows different quantization methods to share a common interface.
graph TD
A[Quantization Module] --> B[EncodedVectors Trait]
A --> C[Scalar Quantization]
A --> D[Product Quantization]
A --> E[Binary Quantization]
A --> F[TurboQuant]
B --> G[EncodedVectorsPQ]
B --> H[EncodedVectorsTurboQuant]
F --> I[Quantization Algorithm]
F --> J[Lookup Tables]
K[HNSW Index] --> L[Quantized Segment Search]
L --> B
Quantization Library Structure
lib/quantization/src/
├── lib.rs # Module root, traits, and public API
├── encoded_vectors.rs # Core trait for encoded vector representations
├── encoded_vectors_pq.rs # Product Quantization implementation
└── turboquant/
    ├── mod.rs # TurboQuant module
    └── quantization.rs # TurboQuant encoding algorithm
Quantization Types
Scalar Quantization
Scalar quantization converts each vector component from 32-bit float to a lower precision integer representation. This provides up to 4× compression (32-bit → 8-bit) while maintaining reasonable search quality.
| Compression | Bits per Component | Memory Reduction |
|---|---|---|
| Full Float | 32 bits | 1× |
| Int8 | 8 bits | 4× |
| Int4 | 4 bits | 8× |
| Int2 | 2 bits | 16× |
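The Int8 row can be illustrated with a minimal linear quantizer. This is a sketch under simplifying assumptions (a single global min/max range, no SIMD, no quantile clipping), not the code in lib/quantization; `ScalarQuantizer` and its methods are hypothetical names.

```rust
/// Hypothetical linear int8 quantizer: x is approximated as min + code * step.
#[derive(Debug, Clone, Copy)]
pub struct ScalarQuantizer {
    min: f32,
    step: f32,
}

impl ScalarQuantizer {
    /// Derives the value range from sample data (assumed to be non-empty).
    pub fn fit(data: &[f32]) -> Self {
        assert!(!data.is_empty(), "need at least one value to fit a range");
        let min = data.iter().copied().fold(f32::INFINITY, f32::min);
        let max = data.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        // Guard against a zero-width range so that `step` is never 0.
        let step = ((max - min) / 255.0).max(f32::EPSILON);
        Self { min, step }
    }

    /// Maps every f32 component (4 bytes) to one u8 code (1 byte).
    pub fn encode(&self, vector: &[f32]) -> Vec<u8> {
        vector
            .iter()
            .map(|&x| ((x - self.min) / self.step).round().clamp(0.0, 255.0) as u8)
            .collect()
    }

    /// Reconstructs approximate f32 values from the codes.
    pub fn decode(&self, codes: &[u8]) -> Vec<f32> {
        codes
            .iter()
            .map(|&c| self.min + f32::from(c) * self.step)
            .collect()
    }
}
```

Each component shrinks from four bytes to one, which is where the 4× figure comes from; the reconstruction error per component is at most half a step.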
Product Quantization (PQ)
Product Quantization divides vectors into subvectors and clusters each subspace independently, encoding each with a codebook index. This approach is particularly effective for high-dimensional vectors.
| Parameter | Description |
|---|---|
| Codebook Size | Number of centroids per subspace (typical: 256) |
| Subspace Count | Number of divisions of the original vector |
| Compression Ratio | Determined by codebook size and subspace count |
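As a worked example of how codebook size and subspace count determine the compression ratio, the sketch below computes the ratio for f32 input vectors. The function names are hypothetical, and the calculation deliberately ignores the memory taken by the codebooks themselves, which are shared across all vectors.

```rust
/// Bytes needed to store one code that indexes `codebook_size` centroids.
fn code_bytes(codebook_size: usize) -> usize {
    assert!(codebook_size >= 2);
    // ceil(log2(codebook_size)) bits, rounded up to whole bytes.
    let bits = usize::BITS - (codebook_size - 1).leading_zeros();
    ((bits + 7) / 8) as usize
}

/// Ratio between the raw f32 vector size and its PQ-encoded size.
pub fn pq_compression_ratio(dim: usize, subspaces: usize, codebook_size: usize) -> f64 {
    assert!(subspaces > 0 && dim % subspaces == 0, "dim must split evenly");
    let original = dim * std::mem::size_of::<f32>();
    let encoded = subspaces * code_bytes(codebook_size);
    original as f64 / encoded as f64
}
```

With 256 centroids each subvector is stored as a single byte, so a 1024-dimensional vector split into 64 subspaces shrinks from 4096 bytes to 64 bytes (64×), while 512 subspaces give 8×.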
Binary Quantization
Binary quantization converts vectors to binary strings (0/1), providing extreme compression. It works best with high-dimensional vectors (≥1024 dimensions) where the Hamming distance can approximate cosine similarity.
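A minimal sketch of the idea, assuming the simplest possible encoding (one sign bit per component) and hypothetical function names; Qdrant's actual encoding and scoring code may differ:

```rust
/// Packs the sign of each component into one bit (1 if the value is positive).
pub fn binarize(vector: &[f32]) -> Vec<u64> {
    let mut bits = vec![0u64; (vector.len() + 63) / 64];
    for (i, &x) in vector.iter().enumerate() {
        if x > 0.0 {
            bits[i / 64] |= 1u64 << (i % 64);
        }
    }
    bits
}

/// Number of differing bits between two equally sized codes.
pub fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "codes must have the same length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Each 32-bit float becomes a single bit, and the distance reduces to XOR plus a population count, which is why this method is both very compact and very fast. Fewer differing bits means the original vectors point in more similar directions.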
TurboQuant (ICLR 2026)
TurboQuant represents an advanced quantization approach designed for aggressive compression without significant quality degradation. The system implements novel encoding techniques that maintain search quality even at extreme compression ratios.
TurboQuant is currently under active development with ongoing improvements to multi-vector scorer support and io_uring integration. Source: lib/quantization/src/turboquant/mod.rs
Core API
EncodedVectors Trait
The EncodedVectors trait defines the interface for all quantized vector implementations:
pub trait EncodedVectors: VectorStorageEnum {
    fn storage_size_bytes(&self) -> usize;
    fn len(&self) -> usize;
    fn get_quantized_vector(&self, key: PointOffsetType) -> &QuantizedVector;
    fn from_offsets_and_typed_data(
        offsets: ByteStoredVec<usize>,
        data: ByteStorageType,
    ) -> Self;
}
Vector Storage Integration
Quantized vectors integrate with the segment's vector storage layer through the following hierarchy:
Source: lib/segment/src/data_types/mod.rs
graph LR
A[VectorStorage] --> B[VectorStorageEnum]
B --> C[PlainVectorStorage]
B --> D[QuantizedVectorStorage]
D --> E[EncodedVectorsPQ]
D --> F[EncodedVectorsTurboQuant]
Configuration
Quantization Parameters
Quantization is configured at the collection level through the QuantizationConfig structure:
| Parameter | Type | Default | Description |
|---|---|---|---|
| quantization | Enum | None | Quantization type selection |
| vector_storage | VectorParams | Per-vector | Storage configuration |
| hnsw | HnswConfigDiff | System default | Index parameters |
Search Configuration
During search operations, quantization behavior can be controlled:
| Parameter | Description |
|---|---|
| quantization | Search-time quantization settings |
| rescore | Enable/disable rescoring with full vectors |
| oversampling | Search more candidates for better recall |
Integration with HNSW
Quantized HNSW Search
The HNSW index can leverage quantized vectors for both the graph structure and the candidates themselves. This enables:
- Memory-Efficient Graph Navigation: The HNSW graph stores quantized entry points
- Fast Candidate Scoring: Distances computed against quantized representations
- Optional Rescoring: Full-precision rescoring of top candidates
Source: lib/segment/tests/integration/hnsw_quantized_search_test.rs
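The interaction between quantized scoring, oversampling, and optional rescoring can be sketched as follows. This is an illustration only: the `Candidate` struct, the precomputed scores, and `search_with_rescoring` are hypothetical, and a real search would obtain candidates from the HNSW graph rather than from a flat slice.

```rust
/// Approximate (quantized) and exact scores for one candidate point.
/// Both are precomputed in this sketch; higher means more similar.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub id: u64,
    pub quantized_score: f32,
    pub exact_score: f32,
}

pub fn search_with_rescoring(
    candidates: &[Candidate],
    limit: usize,
    oversampling: f32,
    rescore: bool,
) -> Vec<u64> {
    // Step 1: rank everything by the cheap quantized score.
    let mut ranked: Vec<&Candidate> = candidates.iter().collect();
    ranked.sort_by(|a, b| b.quantized_score.total_cmp(&a.quantized_score));

    // Step 2: keep more than `limit` candidates so that rescoring can
    // recover points the quantized score ranked slightly too low.
    let pool = ((limit as f32) * oversampling.max(1.0)).ceil() as usize;
    ranked.truncate(pool);

    // Step 3 (optional): re-rank the pool with full-precision scores.
    if rescore {
        ranked.sort_by(|a, b| b.exact_score.total_cmp(&a.exact_score));
    }
    ranked.truncate(limit);
    ranked.into_iter().map(|c| c.id).collect()
}
```

Rescoring can only reorder points that made it into the pool, so it helps most when combined with an oversampling factor above 1.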
Scoring with Quantized Vectors
The system implements specialized scorers for quantized multi-vector data, with recent improvements for io_uring support:
Source: lib/quantization/src/encoded_vectors_pq.rs
sequenceDiagram
participant Query as Query Vector
participant HNSW as HNSW Index
participant Quantized as Quantized Storage
participant Rescorer as Rescorer (Optional)
Query->>HNSW: Navigate graph
HNSW->>Quantized: Get candidates
Quantized-->>HNSW: Quantized distances
HNSW-->>Rescorer: Top-K candidates
alt Rescoring enabled
Rescorer->>Rescorer: Compute full-precision scores
Rescorer-->>Query: Final ranked results
else Direct return
Quantized-->>Query: Final ranked results
end
Performance Characteristics
Memory Savings
Quantization provides significant memory savings depending on the method:
| Method | Compression Ratio | Quality Retention |
|---|---|---|
| Scalar (Int8) | 4× | ~95-99% |
| Product Quantization | 8-64× | ~90-97% |
| Binary | 32× | ~85-95% (high-dim only) |
| TurboQuant | Variable | To be documented |
Query Latency
Quantized search typically reduces latency through:
- Reduced Memory Bandwidth: Smaller data to transfer from storage
- SIMD Optimization: Vectorized distance calculations
- Cache Efficiency: Better cache utilization with compressed data
Known Issues and Limitations
Flaky Tests
Several flaky tests have been reported in the HNSW quantized search test suite, particularly with TurboQuant:
- hnsw_turbo_quantization_cosine_larger_bits2_test (Issue #8835)
- hnsw_turbo_quantization_cosine_larger_test (Issue #8801)
- hnsw_turbo_quantization_dot_test (Issue #8906)
- hnsw_turbo_quantization_manhattan_test (Issue #8834)
- hnsw_quantized_search_manhattan_test (Issue #8806)
- hnsw_quantized_search_euclid_test (Issue #8735)
These tests occasionally fail with the assertion best_2.score >= best_1.score, indicating potential issues with score ordering in quantized search results. The tests are located at lib/segment/tests/integration/hnsw_quantized_search_test.rs:314.
Quality vs Compression Tradeoff
As noted in community discussions (Issue #8524), the current quantization options present tradeoffs:
- Scalar quantization: Solid but tops out at 4× compression
- Binary quantization: Falls apart below 1024 dimensions
- Product Quantization: Requires codebook training and may underperform at high compression
TurboQuant aims to address these limitations with a novel approach designed for aggressive compression without major quality degradation.
Future Development
TurboQuant Tracking
Issue #8670 tracks the TurboQuant implementation progress. Current development focuses on:
- Improving multi-vector scorer compatibility
- Enhanced io_uring support for async I/O
- Validation of quantization parameters
Design documentation is available in the internal TurboQuant Design Doc.
Best Practices
When to Use Quantization
- Memory-Constrained Environments: When dataset exceeds available RAM
- High-Dimensional Vectors: When vectors have >512 dimensions
- Latency-Critical Applications: When search latency is prioritized over exact recall
- Cold Storage Optimization: For archived or infrequently accessed data
Configuration Recommendations
- Start with Scalar Quantization for a balanced tradeoff
- Use Product Quantization for high-dimensional data requiring >8× compression
- Avoid Binary Quantization for vectors under 1024 dimensions
- Enable Rescoring when recall is critical
- Monitor Quality Metrics with representative queries
Sharding and Replication
Related topics: Consensus and Cluster Coordination, System Architecture
Qdrant implements a distributed architecture that combines horizontal sharding with replication to achieve scalability, fault tolerance, and high availability. This document describes the sharding and replication system as implemented in the collection layer.
Overview
Sharding distributes data across multiple physical shards, each responsible for a subset of points based on a hash ring. Replication creates redundant copies of each shard across different peers to ensure durability and read availability.
The sharding system in Qdrant operates at the collection level. Each collection can be divided into N shards, with each shard having R replicas distributed across the cluster. Source: lib/collection/src/shards/mod.rs:1-50
graph TB
subgraph "Qdrant Cluster"
subgraph "Collection"
subgraph "Shard 0"
RS0[Replica Set<br/>Peer A:Active<br/>Peer B:Recovery]
RS1[Replica Set<br/>Peer C:Active]
end
subgraph "Shard 1"
RS2[Replica Set<br/>Peer A:Active<br/>Peer C:Recovery]
end
end
end
Client([Client Request])
Client --> RS0
Core Components
Shard Types
Qdrant defines several shard types to handle different scenarios in distributed operations: Source: lib/collection/src/shards/mod.rs:1-50
| Shard Type | Purpose |
|---|---|
| LocalShard | Primary storage for data on a peer; handles read/write operations |
| RemoteShard | Proxy to a shard located on another peer |
| ProxyShard | Wrapper that delegates operations to underlying shards |
| QueueProxyShard | Proxy that queues operations for batch processing |
| ForwardProxyShard | Proxy that forwards write operations |
| DummyShard | Placeholder for shards not present on current peer |
Replica Set
The ReplicaSet manages multiple replicas of a single shard across different peers. It coordinates read/write distribution, replica health monitoring, and failover behavior.
Key responsibilities include:
- Tracking peer states for each replica
- Routing operations to appropriate replicas based on consistency requirements
- Managing replica state transitions
- Handling peer failures and recovery
Source: lib/collection/src/shards/replica_set/mod.rs
ShardHolder
The ShardHolder is the central coordinator for all shards within a collection. It maintains the mapping between shard IDs and their replica sets, handles shard operations, and provides the interface for collection-level operations. Source: lib/collection/src/shards/shard_holder/mod.rs:1-100
classDiagram
class ShardHolder {
+shards: HashMap~ShardId, ReplicaSet~
+hash_ring: HashRingRouter
+add_shard(shard_id, replica_set)
+remove_shard(shard_id)
+get_shard(shard_id)
+split_by_shard(operation)
}
class ReplicaSet {
+shard_id: ShardId
+peer_states: HashMap~PeerId, ReplicaState~
+this_peer_id: PeerId
+update_peer_state(peer, state)
+is_local(): bool
}
class Shard {
+shard_id: ShardId
+peer_id: PeerId
}
ShardHolder --> ReplicaSet
ReplicaSet --> Shard
Replica States
Each replica in a replica set has a state that determines its role and readiness. The state machine ensures proper initialization, recovery, and failover handling. Source: lib/collection/src/shards/replica_set/mod.rs
| State | Description |
|---|---|
| Active | Fully operational; accepts reads and writes |
| Initializing | Being created or recovered from snapshot |
| Dead | Peer is unreachable; replica unavailable |
| PartialSnapshot | Partial snapshot received; incomplete data |
| Recovery | Receiving updates to catch up |
| ListenerOnly | Receives updates but not eligible for writes |
| Resharding | Participating in resharding operation |
State Transitions
stateDiagram-v2
[*] --> Initializing
Initializing --> Recovery: Data transfer starts
Initializing --> Active: Immediate activation
Recovery --> Active: Sync complete
Recovery --> PartialSnapshot: Interrupted sync
Active --> Dead: Peer failure
Dead --> Recovery: Peer recovers
Active --> ListenerOnly: Demotion
ListenerOnly --> Active: Promotion
Active --> Resharding: Resharding begins
Resharding --> [*]: Resharding completes
Local Shard Initialization Handling
When a local shard is stuck in Initializing state on a single-node (non-distributed) deployment, the system automatically transitions it to Active state: Source: lib/collection/src/shards/shard_holder/mod.rs:40-60
// Change local shards stuck in Initializing state to Active
let not_distributed = !shared_storage_config.is_distributed;
let is_local = replica_set.this_peer_id() == local_peer_id && replica_set.is_local().await;
let is_initializing = replica_set.peer_state(local_peer_id) == Some(ReplicaState::Initializing);
if not_distributed && is_local && is_initializing {
    log::warn!(
        "Local shard {collection_id}:{} stuck in Initializing state, changing to Active",
        replica_set.shard_id,
    );
    replica_set.set_replica_state(local_peer_id, ReplicaState::Active).await?;
}
Shard Transfer Operations
Shard transfers move data between peers, supporting cluster rebalancing and node replacement. The transfer mechanism handles three recovery stages: Source: lib/collection/src/shards/transfer/mod.rs
| Recovery Stage | Description |
|---|---|
| Snapshot | Transfer via snapshot file (full copy) |
| WalDelta | Transfer via Write-Ahead Log delta |
| StreamRecords | Transfer via streaming records |
Transfer Workflow
sequenceDiagram
participant Coordinator
participant SourcePeer
participant TargetPeer
participant Consensus
Coordinator->>Consensus: Initiate transfer
Consensus-->>Coordinator: Transfer registered
Coordinator->>TargetPeer: Create shard (Initializing)
TargetPeer-->>Coordinator: Shard created
Coordinator->>SourcePeer: Start snapshot/stream
loop Transfer data
SourcePeer->>TargetPeer: Send records/snapshot
end
TargetPeer->>SourcePeer: Confirm sync complete
SourcePeer->>Consensus: Notify completion
Consensus->>TargetPeer: Set Active state
Coordinator->>SourcePeer: Set Dead state
Resharding
Resharding changes the number of shards in a collection, either increasing (scale up) or decreasing (scale down) the shard count. This operation redistributes data across the hash ring.
Resharding Operations
The resharding process uses a dedicated state machine: Source: lib/collection/src/operations/cluster_ops.rs
| Operation | Description |
|---|---|
| CreateShard | Create a new shard during scale-up |
| MoveShard | Move shard from one peer to another |
| MoveShardKey | Move all shards with specific key |
| ReplicateShardKey | Add replicas for a shard key |
| ReplicatePoints | Replicate points between shard keys |
| FinishResharding | Complete resharding operation |
| AbortResharding | Cancel resharding operation |
Resharding State
#[derive(Copy, Clone, Debug, Deserialize, Serialize)]
pub enum ReshardingStage {
    /// Scale up, add a new shard
    Up,
    /// Scale down, remove a shard
    Down,
}
Source: lib/collection/src/operations/cluster_ops.rs:1-50
Operation Distribution
Operations are distributed across shards based on the hash ring. The SplitByShard trait defines how each operation type is split: Source: lib/collection/src/operations/mod.rs:30-60
graph LR
Operation([Operation]) --> HashRing{Hash Ring Router}
HashRing -->|Point ID Hash| Shard0[Shard 0]
HashRing -->|Point ID Hash| Shard1[Shard 1]
HashRing -->|Point ID Hash| ShardN[Shard N]
Shard0 --> Result0[Result]
Shard1 --> Result1[Result]
ShardN --> ResultN[Result]
Result0 --> Merged[Merged Result]
Result1 --> Merged
ResultN --> Merged
SplitByShard Implementation
impl SplitByShard for CollectionUpdateOperations {
    fn split_by_shard(self, ring: &HashRingRouter) -> OperationToShard<Self> {
        match self {
            CollectionUpdateOperations::PointOperation(operation) => operation
                .split_by_shard(ring)
                .map(CollectionUpdateOperations::PointOperation),
            CollectionUpdateOperations::VectorOperation(operation) => operation
                .split_by_shard(ring)
                .map(CollectionUpdateOperations::VectorOperation),
            CollectionUpdateOperations::PayloadOperation(operation) => operation
                .split_by_shard(ring)
                .map(CollectionUpdateOperations::PayloadOperation),
        }
    }
}
Source: lib/collection/src/operations/mod.rs:30-60
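The effect of splitting an operation by shard can be illustrated with a much simpler stand-in. The sketch below replaces the hash ring with a plain hash-modulo rule, which is an assumption made for brevity and does not model HashRingRouter; `shard_for_point` and this free-standing `split_by_shard` are hypothetical helpers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

pub type ShardId = u32;
pub type PointId = u64;

/// Simplified stand-in for a hash ring: hash the point ID and reduce it
/// modulo the shard count. The same ID always maps to the same shard.
pub fn shard_for_point(point_id: PointId, shard_count: u32) -> ShardId {
    assert!(shard_count > 0, "a collection needs at least one shard");
    let mut hasher = DefaultHasher::new();
    point_id.hash(&mut hasher);
    (hasher.finish() % u64::from(shard_count)) as ShardId
}

/// Groups the point IDs of one operation into per-shard batches.
pub fn split_by_shard(
    point_ids: &[PointId],
    shard_count: u32,
) -> BTreeMap<ShardId, Vec<PointId>> {
    let mut batches: BTreeMap<ShardId, Vec<PointId>> = BTreeMap::new();
    for &id in point_ids {
        batches
            .entry(shard_for_point(id, shard_count))
            .or_default()
            .push(id);
    }
    batches
}
```

Each batch can then be sent to its shard independently and the per-shard results merged, as in the diagram. A real hash ring additionally limits how many points move when the shard count changes, which plain modulo hashing does not.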
Consistency Parameters
Qdrant supports configurable read and write consistency levels per request:
| Parameter | Description |
|---|---|
| write_consistency_factor | Number of replicas that must acknowledge writes (default: 1) |
| read_fan_out_factor | Number of replicas to query for reads |
| read_fan_out_delay_ms | Delay before reading from non-primary replicas |
These parameters are defined in the collection configuration and can be validated against cluster state: Source: lib/collection/src/config.rs
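As a rough illustration of what write_consistency_factor means, the sketch below decides whether a write counts as accepted. The function name is hypothetical, and capping the requirement at the number of active replicas is an assumption of this sketch rather than a statement about Qdrant's exact behavior.

```rust
/// Returns true if enough replicas acknowledged the write.
pub fn is_write_accepted(
    acks: usize,
    active_replicas: usize,
    write_consistency_factor: usize,
) -> bool {
    // Assumption: a factor larger than the number of active replicas can
    // never be met, so the requirement is capped at that number (minimum 1).
    let required = write_consistency_factor
        .max(1)
        .min(active_replicas.max(1));
    acks >= required
}
```

With three active replicas and a factor of 2, one acknowledgement is not enough but two are; with the default factor of 1, a single acknowledgement suffices.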
Consensus Operations
Distributed operations that affect cluster state are coordinated through Raft consensus: Source: lib/storage/src/content_manager/mod.rs
| Consensus Operation | Purpose |
|---|---|
| CollectionMeta | Collection-level metadata changes |
| AddPeer | Register new peer |
| RemovePeer | Remove peer from cluster |
| UpdatePeerMetadata | Update peer information |
| UpdateClusterMetadata | Update cluster-wide metadata |
| RequestSnapshot | Request state snapshot |
| ReportSnapshot | Report snapshot status |
Transfer Consensus Operations
impl ConsensusOperations {
    pub fn abort_transfer(
        collection_id: CollectionId,
        transfer: ShardTransfer,
        reason: &str,
    ) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Abort {
                    transfer: transfer.key(),
                    reason: reason.to_string(),
                },
            )
        ))
    }
    pub fn finish_transfer(
        collection_id: CollectionId,
        transfer: ShardTransfer,
    ) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Finish(transfer),
            )
        ))
    }
}
Source: lib/storage/src/content_manager/mod.rs:100-150
Shard Path Management
Shards are stored on disk with deterministic paths based on collection and shard identifiers:
/// Path to a shard directory
pub fn shard_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(shard_id.to_string())
}
/// Path to the flag file that marks a shard as still initializing
pub fn shard_initializing_flag_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(format!("shard_{shard_id}.initializing"))
}
Source: lib/collection/src/shards/mod.rs:40-55
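To show what these helpers produce, here is a self-contained copy of the two functions together with the imports they need. The `ShardId = u32` alias and the example collection path in the note below are assumptions made so the snippet compiles on its own.

```rust
use std::path::{Path, PathBuf};

/// Assumption for this sketch: shard IDs are plain unsigned integers.
pub type ShardId = u32;

/// Directory that holds the data of one shard.
pub fn shard_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(shard_id.to_string())
}

/// Flag file that sits next to the shard directories while a shard is
/// still being initialized.
pub fn shard_initializing_flag_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(format!("shard_{shard_id}.initializing"))
}
```

For a collection stored at `storage/collections/demo`, shard 3 would live in `storage/collections/demo/3`, and its flag file would be `storage/collections/demo/shard_3.initializing`.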
Shard Snapshots
Snapshots can be created for individual shards, supporting point-in-time recovery and migration:
- Snapshot creation captures all data and WAL state
- Snapshots can be restored to any peer with matching configuration
- Shard snapshots are listed and managed via REST API endpoints
- Recovery type is recorded in snapshot manifest (RecoveryType)
Source: lib/collection/src/shards/local_shard/mod.rs
Consensus and Cluster Coordination
Related topics: Sharding and Replication, Storage Engine and Persistence
Overview
Qdrant implements a distributed cluster architecture that enables horizontal scaling of vector search operations across multiple nodes. The cluster coordination system ensures data consistency, fault tolerance, and reliable state management through a consensus-based approach.
The consensus mechanism in Qdrant is built to handle:
- Collection management: Creating, updating, and deleting collections across the cluster
- Shard distribution: Distributing and migrating vector data shards between nodes
- Peer coordination: Managing node membership, peer metadata, and cluster topology
- State synchronization: Ensuring all nodes agree on the current cluster state
High-Level Architecture
Qdrant uses a Raft-based consensus protocol for cluster coordination. All state changes that affect the cluster (collection operations, shard transfers, peer updates) are communicated through the consensus layer.
graph TD
Client --> API[REST/gRPC API]
API --> CollectionMgr[Collection Manager]
CollectionMgr --> ConsensusLayer[Consensus Layer]
ConsensusLayer --> WAL[Consensus WAL]
ConsensusLayer --> RaftNode[Raft Node]
RaftNode <--> Peer1[Peer Node 1]
RaftNode <--> Peer2[Peer Node 2]
RaftNode <--> Peer3[Peer Node 3]
WAL --> PersistentState[Persistent State]
PersistentState --> ClusterMeta[Cluster Metadata]
Consensus Operations
All cluster-level operations that require coordination are represented as ConsensusOperations. These operations are logged to the Consensus WAL and replicated across the cluster before being applied.
Operation Types
The ConsensusOperations enum defines all operations that pass through consensus:
| Operation | Description | Parameters |
|---|---|---|
| CollectionMeta | Collection create/update/delete operations | Box<CollectionMetaOperations> |
| AddPeer | Register a new peer in the cluster | peer_id, uri |
| RemovePeer | Remove a peer from the cluster | peer_id |
| UpdatePeerMetadata | Update metadata for a peer | peer_id, PeerMetadata |
| UpdateClusterMetadata | Update cluster-wide metadata | key, value |
| RequestSnapshot | Request a consensus state snapshot | - |
| ReportSnapshot | Report snapshot status to peers | peer_id, SnapshotStatus |
Source: lib/storage/src/content_manager/mod.rs:1-50
Collection Meta Operations
Collection operations are wrapped in CollectionMetaOperations and include:
- CreateCollection: Initialize a new collection with specified vector parameters
- UpdateCollection: Modify collection configuration
- DeleteCollection: Remove a collection and its data
- TransferShard: Carry a ShardTransferOperations value that drives shard migration between nodes, including:
  - ShardTransferOperations::Abort: Cancel an ongoing shard transfer
  - ShardTransferOperations::Finish: Complete a shard transfer operation
#[derive(Debug, Deserialize, Serialize, PartialEq, Eq, Hash, Clone)]
pub enum ConsensusOperations {
    CollectionMeta(Box<CollectionMetaOperations>),
    AddPeer {
        peer_id: PeerId,
        uri: String,
    },
    RemovePeer(PeerId),
    UpdatePeerMetadata {
        peer_id: PeerId,
        metadata: PeerMetadata,
    },
    UpdateClusterMetadata {
        key: String,
        value: serde_json::Value,
    },
    RequestSnapshot,
    ReportSnapshot {
        peer_id: PeerId,
        status: SnapshotStatus,
    },
}
Source: lib/storage/src/content_manager/mod.rs:25-45
Peer Metadata
Each peer maintains metadata that describes its properties:
#[derive(Clone, Debug, Eq, PartialEq, Hash, Deserialize, Serialize, JsonSchema)]
pub struct PeerMetadata {
    /// Peer Qdrant version
    pub(crate) version: Version,
}
impl PeerMetadata {
    pub fn current() -> Self {
        Self {
            version: defaults::QDRANT_VERSION.clone(),
        }
    }
    /// Whether this metadata has a different version than our current Qdrant instance.
    pub fn is_different_version(&self) -> bool {
        self.version != *defaults::QDRANT_VERSION
    }
}
Source: lib/collection/src/operations/types.rs:1-40
Replica Set State Machine
Each shard in Qdrant is replicated across multiple nodes as part of a replica set. Replica sets implement a state machine that manages shard lifecycle and handles various scenarios like transfers, failures, and recovery.
Shard Roles
| Role | Description |
|---|---|
| Active | Fully operational replica, accepts read and write operations |
| Listener | Read-only replica used for scaling read operations |
| Dead | Replica that is unreachable or failed |
State Transitions
graph TD
Initializing -->|Report created| Active
Active -->|User Promote| Active
Active -->|Transfer Finished| Listener
Active -->|Update Failure| Dead
Active -->|Transfer Started| Partial
Partial -->|Transfer Finished| Listener
Partial -->|Transfer Started| Dead
Listener -->|Update Failure| Dead
Listener -->|Transfer| Partial
Dead -->|Transfer| Partial
The state machine handles:
- Initialization: New replicas start in Initializing state
- Activation: Replicas become Active after synchronization
- Demotion: Active shards can be demoted to Listener after transfers
- Failure Handling: Dead state marks unreachable replicas
- Recovery: Dead replicas can be recovered through transfers
Source: lib/collection/src/shards/replica_set/mod.rs:1-50
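The edges of the transition graph in this section can be written down as a small validity check. This sketch encodes only what the graph shows; `ReplicaRole` and `can_transition` are hypothetical names and are not taken from the Qdrant source.

```rust
/// States that appear in the transition graph of this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReplicaRole {
    Initializing,
    Active,
    Listener,
    Partial,
    Dead,
}

/// Returns true if the graph contains an edge from `from` to `to`.
pub fn can_transition(from: ReplicaRole, to: ReplicaRole) -> bool {
    use ReplicaRole::*;
    matches!(
        (from, to),
        (Initializing, Active)
            | (Active, Active | Listener | Dead | Partial)
            | (Partial, Listener | Dead)
            | (Listener, Dead | Partial)
            | (Dead, Partial)
    )
}
```

For example, a Dead replica cannot jump straight back to Active according to the graph: it first becomes Partial through a transfer.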
Read Consistency
Qdrant provides configurable read consistency levels to balance between consistency and availability. These settings control how many replicas must respond before returning results.
Consistency Types
| Type | Behavior |
|---|---|
| Majority | Send N/2+1 random requests, return points present on all responses |
| Quorum | Send requests to all nodes, return points present on majority |
| All | Send requests to all nodes, return only points present on all nodes |
#[derive(Debug, Deserialize, Serialize, JsonSchema, Copy, Clone, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum ReadConsistencyType {
    // Send N/2+1 random requests and return points that are present in all of the responses
    Majority,
    // Send requests to all nodes and return points that are present on a majority of nodes
    Quorum,
    // Send requests to all nodes and return points that are present on all nodes
    All,
}
Source: lib/collection/src/operations/consistency_params.rs:1-35
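The three consistency types can be compared with a small sketch that follows the descriptions in the table. It treats a replica response as a plain list of point IDs and ignores versions and payloads, which is a simplification; the enum is redeclared locally, and `replicas_to_query` and `resolve` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReadConsistencyType {
    Majority,
    Quorum,
    All,
}

/// How many replicas are contacted for a read.
pub fn replicas_to_query(replica_count: usize, consistency: ReadConsistencyType) -> usize {
    match consistency {
        ReadConsistencyType::Majority => replica_count / 2 + 1,
        ReadConsistencyType::Quorum | ReadConsistencyType::All => replica_count,
    }
}

/// Keeps the point IDs that appear in enough of the received responses.
pub fn resolve(responses: &[Vec<u64>], consistency: ReadConsistencyType) -> Vec<u64> {
    let required = match consistency {
        // Present on a majority of the responses.
        ReadConsistencyType::Quorum => responses.len() / 2 + 1,
        // Present on every response that was received.
        ReadConsistencyType::Majority | ReadConsistencyType::All => responses.len(),
    };
    let mut hits: BTreeMap<u64, usize> = BTreeMap::new();
    for response in responses {
        // Count each point at most once per replica.
        let mut ids = response.clone();
        ids.sort_unstable();
        ids.dedup();
        for id in ids {
            *hits.entry(id).or_insert(0) += 1;
        }
    }
    hits.into_iter()
        .filter(|&(_, count)| count >= required)
        .map(|(id, _)| id)
        .collect()
}
```

Majority contacts fewer replicas but requires agreement among all of them, while Quorum contacts every replica and tolerates a minority that is missing a point.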
Consistency Parameter Mapping
The gRPC protocol maps integer values to consistency types:
impl TryFrom<i32> for ReadConsistencyType {
    type Error = tonic::Status;
    fn try_from(consistency: i32) -> Result<Self, Self::Error> {
        let consistency = ReadConsistencyTypeGrpc::try_from(consistency).map_err(|_| {
            tonic::Status::invalid_argument(format!(
                "invalid read consistency type value {consistency}",
            ))
        })?;
        Ok(consistency.into())
    }
}
Cluster Telemetry
The cluster telemetry system provides visibility into the distributed state of the system.
Cluster Info API
The REST API exposes cluster information through the /collections/{collection_name}/cluster endpoint:
/collections/{collection_name}/cluster:
  get:
    tags:
      - Distributed
    summary: Collection cluster info
    description: Get cluster information for a collection
    operationId: collection_cluster_info
    parameters:
      - name: collection_name
        in: path
        description: Name of the collection to retrieve the cluster info for
        required: true
        schema:
          type: string
    responses:
      200:
        description: Successful response
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/CollectionClusterInfo"
Source: openapi/openapi-collections.ytt.yaml:1-80
Cluster Metadata
Cluster telemetry includes:
- Cluster metadata: Distributed cluster configuration and state
- Peer information: Peer IDs and connection states
- Resharding status: Whether resharding operations are enabled
.telemetry_level >= DetailsLevel::Level1)
.then(|| {
dispatcher
.consensus_state()
.map(|state| state.persistent.read().cluster_metadata.clone())
.filter(|metadata| !metadata.is_empty())
})
.flatten(),
resharding_enabled: Some(settings.cluster.resharding_enabled),
Source: src/common/telemetry_ops/cluster_telemetry.rs:1-30
Write-Ahead Logging for Consensus
Consensus operations are persisted to a dedicated WAL (Write-Ahead Log) to ensure durability and crash recovery. The Consensus WAL differs from the segment WAL used for point operations.
WAL Entry Serialization
Consensus operations are serialized using CBOR for efficiency:
impl TryFrom<&RaftEntry> for ConsensusOperations {
    type Error = serde_cbor::Error;
    fn try_from(entry: &RaftEntry) -> Result<Self, Self::Error> {
        serde_cbor::from_slice(entry.get_data())
    }
}
Operation Abort Mechanism
The consensus layer provides methods to safely abort ongoing operations:
impl ConsensusOperations {
    pub fn abort_transfer(
        collection_id: CollectionId,
        transfer: ShardTransfer,
        reason: &str,
    ) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Abort {
                    transfer: transfer.key(),
                    reason: reason.to_string(),
                },
            ),
        ))
    }
    pub fn finish_transfer(collection_id: CollectionId, transfer: ShardTransfer) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Finish(transfer),
            ),
        ))
    }
}
Source: lib/storage/src/content_manager/mod.rs:60-90
Snapshot Application
When applying snapshots from other peers, the system must notify pending consensus operations to ensure consistency:
# Bug Fix in v1.18.1
- https://github.com/qdrant/qdrant/pull/8990 - Notify pending consensus ops on snapshot apply
This fix ensures that when a snapshot is applied, any pending consensus operations are properly synchronized to maintain cluster state integrity.
Optimization Progress Tracking
Cluster operations include tracking optimization progress for collections:
/collections/{collection_name}/optimizations:
  get:
    tags:
      - Collections
    summary: Get optimization progress
    description: Get progress of ongoing and completed optimizations for a collection
    operationId: get_optimizations
    parameters:
      - name: collection_name
        in: path
        description: Name of the collection
        required: true
        schema:
          type: string
      - name: with
        in: query
        description: |-
          Comma-separated list of optional fields to include in the response.
          Possible values: queued, completed, idle_segments.
        required: false
        schema:
          type: string
      - name: completed_limit
        in: query
        description: Maximum number of completed optimizations to return.
        required: false
        schema:
          type: integer
          minimum: 0
          default: 16
    responses:
      200:
        description: Successful response
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/OptimizationsResponse"
Source: openapi/openapi-collections.ytt.yaml:80-150
Related Configuration
Key configuration parameters for cluster coordination:
| Parameter | Description | Default |
|---|---|---|
| cluster.resharding_enabled | Enable shard resharding operations | Platform-specific |
| Consensus WAL size | Maximum entries in consensus WAL | Platform-specific |
| Snapshot interval | Frequency of consensus snapshots | Platform-specific |
| Heartbeat timeout | Peer heartbeat detection interval | Platform-specific |
Source: https://github.com/qdrant/qdrant / Human Manual
REST and gRPC API
Related topics: System Architecture, Introduction to Qdrant
Overview
Qdrant provides a dual-layer API architecture that exposes vector search capabilities through both REST (HTTP/JSON) and gRPC (Protocol Buffers) interfaces. This dual approach offers flexibility for different client environments and performance requirements.
The REST API provides broad accessibility with JSON serialization, making it ideal for web clients, scripting, and debugging. The gRPC API offers lower latency and more efficient bandwidth usage for high-throughput production workloads.
Source: lib/api/src/lib.rs:1-7
pub mod conversions;
pub mod grpc;
pub mod rest;
pub const HTTP_HEADER_API_KEY: &str = "api-key";
API Architecture
Dual-Interface Design
The Qdrant API layer is organized around a unified internal representation, with conversion layers that transform between external formats and internal types.
graph TD
subgraph "Client Layer"
REST_CLIENT[REST Client<br/>JSON/HTTP]
GRPC_CLIENT[gRPC Client<br/>Protocol Buffers]
end
subgraph "API Layer"
REST_API[REST API Handler]
GRPC_API[gRPC Service Handler]
CONVERSIONS[Conversion Layer]
end
subgraph "Internal Layer"
INFERENCE[Inference Service]
OPERATIONS[Collection Operations]
SEGMENT[Segment Management]
end
REST_CLIENT -->|HTTP/JSON| REST_API
GRPC_CLIENT -->|gRPC/Protobuf| GRPC_API
REST_API --> CONVERSIONS
GRPC_API --> CONVERSIONS
CONVERSIONS --> INFERENCE
CONVERSIONS --> OPERATIONS
CONVERSIONS --> SEGMENT
Module Structure
The API implementation resides in lib/api/src/ with the following organization:
| Module | Purpose |
|---|---|
| `rest/` | REST API models, handlers, and JSON serialization |
| `grpc/` | gRPC service definitions and Protocol Buffer types |
| `conversions/` | Bidirectional conversion between REST/gRPC and internal types |
Source: lib/api/src/lib.rs:1-7
REST API
OpenAPI Specification
The REST API is defined using OpenAPI 3.0 specifications, generated from YAML templates in the openapi/ directory. These specifications provide comprehensive documentation and can be used to generate client SDKs.
Source: openapi/openapi.lib.yml:1-50
Core Endpoints
The REST API covers several functional areas:
#### Points Operations
Points are the fundamental data units in Qdrant, containing vectors and optional payloads.
| Endpoint | Method | Description |
|---|---|---|
| `/collections/{name}/points/query` | POST | Universal query endpoint |
| `/collections/{name}/points/query/batch` | POST | Batch query endpoint |
| `/collections/{name}/points/batch` | POST | Batch update operations |
| `/collections/{name}/points/payload/clear` | POST | Clear payload from points |
Source: openapi/openapi-points.ytt.yaml:1-100
#### Query Operations
The universal query endpoint provides access to all search capabilities including search, recommend, discover, and hybrid queries.
/collections/{collection_name}/points/query:
post:
tags:
- Search
summary: Query points
description: Universally query points. This endpoint covers all capabilities
of search, recommend, discover, filters. But also enables
hybrid and multi-stage queries.
Source: openapi/openapi-main.ytt.yaml:1-50
#### Facet Operations
Faceted search allows counting points by unique payload values:
| Parameter | Location | Description |
|---|---|---|
| `collection_name` | path | Name of the collection to facet in |
| `consistency` | query | Read consistency guarantees |
| `timeout` | query | Request timeout override (seconds) |
Source: openapi/openapi-main.ytt.yaml:50-100
Document and Image Support
The REST API supports structured inference objects through the Document and Image types:
impl From<rest::Document> for grpc::Document {
fn from(document: rest::Document) -> Self {
let rest::Document {
text,
model,
options,
} = document;
Self {
text,
model,
options: options
.map(DocumentOptions::into_options)
.map(dict_to_proto)
.unwrap_or_default(),
}
}
}
Source: lib/api/src/conversions/inference.rs:1-35
gRPC API
Service Architecture
The gRPC API is built on Protocol Buffers with service definitions that mirror REST functionality. The telemetry_wrapper.rs module provides thin wrappers around gRPC service traits that extract collection names and attach telemetry extensions.
use api::grpc::qdrant::points_server::Points;
use api::grpc::qdrant::shard_snapshots_server::ShardSnapshots;
use api::grpc::qdrant::snapshots_server::Snapshots;
Source: src/tonic/api/telemetry_wrapper.rs:1-20
Server Implementations
The gRPC API exposes the following services:
| Service | Purpose |
|---|---|
| `Points` | Point operations, search, query |
| `Snapshots` | Collection snapshot management |
| `ShardSnapshots` | Shard-level snapshot operations |
| `Raft` | Consensus communication |
Source: src/tonic/api/telemetry_wrapper.rs:1-50
Telemetry Integration
The telemetry wrapper pattern allows per-collection metrics collection without individual handlers needing to know about telemetry:
graph LR
A[gRPC Request] --> B[Telemetry Wrapper]
B --> C[Extract collection_name]
C --> D[Attach as Extension]
D --> E[Service Handler]
E --> F[Tower Layer Reads Extension]
F --> G[Record Metrics]
Source: src/tonic/api/telemetry_wrapper.rs:1-30
Conversion Layer
Conversion Architecture
The conversion layer handles bidirectional transformations between REST JSON types, gRPC Protocol Buffer types, and internal Rust types. This separation allows the internal representation to remain stable while external APIs evolve.
graph TD
subgraph "External Formats"
REST_JSON[REST JSON]
GRPC_PROTO[gRPC Protobuf]
end
subgraph "Conversion Layer"
REST_TO_INTERNAL[rest → Internal]
INTERNAL_TO_REST[Internal → rest]
GRPC_TO_INTERNAL[gRPC → Internal]
INTERNAL_TO_GRPC[Internal → gRPC]
end
subgraph "Internal Types"
INTERNAL[Internal Operations]
end
REST_JSON --> REST_TO_INTERNAL
GRPC_PROTO --> GRPC_TO_INTERNAL
REST_TO_INTERNAL --> INTERNAL
GRPC_TO_INTERNAL --> INTERNAL
INTERNAL --> INTERNAL_TO_REST
INTERNAL --> INTERNAL_TO_GRPC
INTERNAL_TO_REST --> REST_JSON
INTERNAL_TO_GRPC --> GRPC_PROTO
Query Request Conversions
Query requests undergo multiple conversion stages:
use collection::operations::universal_query::collection_query::{
CollectionPrefetch, CollectionQueryGroupsRequest, CollectionQueryRequest,
FeedbackInternal, FeedbackStrategy, Mmr, NearestWithMmr, Query,
VectorInputInternal, VectorQuery,
};
Source: src/common/inference/query_requests_grpc.rs:1-25
Batch Processing
The batch processing system accumulates inference objects across multiple requests:
pub struct BatchAccumGrpc {
pub(crate) objects: HashSet<InferenceData>,
}
impl BatchAccumGrpc {
pub fn new() -> Self {
Self {
objects: HashSet::new(),
}
}
pub fn add(&mut self, data: InferenceData) {
self.objects.insert(data);
}
pub fn extend(&mut self, other: BatchAccumGrpc) {
self.objects.extend(other.objects);
}
}
Source: src/common/inference/batch_processing_grpc.rs:1-55
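The practical effect of backing the accumulator with a `HashSet` is deduplication: an inference object that appears in several requests is stored once. The sketch below mirrors the shape of `BatchAccumGrpc` with a `String` standing in for `InferenceData` (an assumption, since the real type is not shown here); `BatchAccum` and `unique_after_merge` are hypothetical names.

```rust
use std::collections::HashSet;

/// Stand-in for `InferenceData`; the real type is richer (assumption).
type InferenceData = String;

struct BatchAccum {
    objects: HashSet<InferenceData>,
}

impl BatchAccum {
    fn new() -> Self {
        Self { objects: HashSet::new() }
    }
    fn add(&mut self, data: InferenceData) {
        self.objects.insert(data);
    }
    fn extend(&mut self, other: BatchAccum) {
        self.objects.extend(other.objects);
    }
}

/// Merges two per-request accumulators and returns how many unique
/// objects remain after the merge.
fn unique_after_merge(first: &[&str], second: &[&str]) -> usize {
    let mut a = BatchAccum::new();
    for s in first {
        a.add(s.to_string());
    }
    let mut b = BatchAccum::new();
    for s in second {
        b.add(s.to_string());
    }
    a.extend(b);
    a.objects.len()
}
```

Merging `["cat", "dog"]` with `["dog", "bird"]` leaves three unique objects, because `"dog"` is only kept once.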
Operation Conversions
The conversions.rs module handles complex operation conversions, such as the DiscoverRequest:
let api::grpc::qdrant::DiscoverPoints {
collection_name,
target,
context,
filter,
limit,
offset,
with_payload,
params,
using,
with_vectors,
lookup_from,
read_consistency,
timeout,
shard_key_selector,
} = value;
let target = target.map(RecommendExample::try_from).transpose()?;
let context = context
.into_iter()
.map(|pair| {
match (
pair.positive.map(|p| p.try_into()),
pair.negative.map(|n| n.try_into()),
) {
(Some(Ok(positive)), Some(Ok(negative))) => {
Ok(ContextExamplePair { positive, negative })
}
(Some(Err(e)), _) | (_, Some(Err(e))) => Err(e),
(None, _) | (_, None) => Err(Status::invalid_argument(
"Both positive and negative are required in a context pair",
)),
}
})
.try_collect()?;
Source: lib/collection/src/operations/conversions.rs:1-80
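The pair-handling `match` in the excerpt can be tried in isolation. The following sketch reproduces the same three arms with simplified stand-in types: `ContextPair`, `parse`, and `convert_pairs` are hypothetical, ids are plain integers, and treating id `0` as invalid is an arbitrary rule chosen only to trigger the error arm. It uses the standard `collect` into `Result` in place of `try_collect`.

```rust
/// Simplified stand-in for the internal pair type (hypothetical).
#[derive(Debug, PartialEq)]
struct ContextPair {
    positive: u64,
    negative: u64,
}

/// Demo conversion: id 0 is rejected purely for illustration.
fn parse(id: u64) -> Result<u64, String> {
    if id == 0 {
        Err("invalid id 0".to_string())
    } else {
        Ok(id)
    }
}

/// Converts optional raw ids into a validated list of pairs.
fn convert_pairs(raw: Vec<(Option<u64>, Option<u64>)>) -> Result<Vec<ContextPair>, String> {
    raw.into_iter()
        .map(|(pos, neg)| match (pos.map(parse), neg.map(parse)) {
            (Some(Ok(positive)), Some(Ok(negative))) => Ok(ContextPair { positive, negative }),
            // A failed conversion on either side takes precedence over a missing side.
            (Some(Err(e)), _) | (_, Some(Err(e))) => Err(e),
            (None, _) | (_, None) => {
                Err("Both positive and negative are required in a context pair".to_string())
            }
        })
        // Collecting into `Result` stops at the first error.
        .collect()
}
```

Note the arm order: a pair with one invalid side and one missing side reports the conversion error, not the "both are required" message.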
Request Flow
REST Request Path
- HTTP request arrives at Actix web handler
- Request body deserialized from JSON
- REST model converted to internal operation type
- Operation executed against collection/shard
- Internal result converted back to REST response
- Response serialized to JSON and returned
gRPC Request Path
- Protobuf message received by Tonic service
- Message validated using protobuf validation attributes
- gRPC model converted to internal operation type
- Operation executed with telemetry extension attached
- Internal result converted back to gRPC response
- Protobuf message serialized and returned
Configuration and Constraints
The API respects system constraints defined in StrictModeConfig:
| Parameter | Purpose |
|---|---|
| `max_query_limit` | Maximum number of results |
| `max_timeout` | Maximum request timeout |
| `search_max_hnsw_ef` | Maximum HNSW ef parameter |
| `search_allow_exact` | Allow exact match searches |
| `search_max_oversampling` | Maximum oversampling factor |
| `upsert_max_batchsize` | Maximum upsert batch size |
| `search_max_batchsize` | Maximum search batch size |
Source: lib/storage/src/content_manager/conversions.rs:1-60
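To make the role of these limits concrete, here is a minimal sketch of how a request could be checked against a subset of them. The struct layouts, field types, and the `check` function are assumptions for illustration and do not reflect the actual `StrictModeConfig` definition; `None` is taken to mean that no limit is configured.

```rust
/// Subset of the limits listed above; `None` means "no limit configured".
/// Field types are assumptions made for this sketch.
#[derive(Default)]
struct StrictLimits {
    max_query_limit: Option<usize>,
    max_timeout: Option<u64>,
    search_allow_exact: Option<bool>,
}

/// Hypothetical, heavily simplified search request.
struct SearchRequest {
    limit: usize,
    timeout_secs: Option<u64>,
    exact: bool,
}

fn check(req: &SearchRequest, limits: &StrictLimits) -> Result<(), String> {
    if let Some(max) = limits.max_query_limit {
        if req.limit > max {
            return Err(format!("limit {} exceeds max_query_limit {}", req.limit, max));
        }
    }
    if let (Some(max), Some(t)) = (limits.max_timeout, req.timeout_secs) {
        if t > max {
            return Err(format!("timeout {t}s exceeds max_timeout {max}s"));
        }
    }
    if limits.search_allow_exact == Some(false) && req.exact {
        return Err("exact search is not allowed".to_string());
    }
    Ok(())
}
```

With every field left at `None`, any request passes, which matches the idea that strict mode only constrains what has been explicitly configured.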
Authentication
The REST API uses an API key header for authentication:
pub const HTTP_HEADER_API_KEY: &str = "api-key";
Clients must include this header in all requests to authenticated endpoints.
Source: lib/api/src/lib.rs:7
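For readers who want to see the header on the wire, the sketch below renders a minimal HTTP/1.1 request that carries it. Only the header name comes from the source; `authenticated_get`, the host, the path, and the key value are placeholders chosen for this example.

```rust
/// Header name copied from `lib/api/src/lib.rs`.
pub const HTTP_HEADER_API_KEY: &str = "api-key";

/// Renders a minimal HTTP/1.1 GET request that carries the API key.
/// `authenticated_get` is a hypothetical helper for illustration only.
fn authenticated_get(host: &str, path: &str, api_key: &str) -> String {
    format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\n{}: {}\r\n\r\n",
        path, host, HTTP_HEADER_API_KEY, api_key
    )
}
```

In practice an HTTP client library would set the header; the point is only that the key travels as a plain request header named `api-key`.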
Build Process
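The API layer is generated during the build by build.rs. The excerpt below shows the part of that script that determines the current git commit ID so it can be passed to the compiler: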
// Fetch git commit ID and pass it to the compiler
let git_commit_id = option_env!("GIT_COMMIT_ID").map(String::from).or_else(|| {
match Command::new("git").args(["rev-parse", "HEAD"]).output() {
Ok(output) if output.status.success() => {
Some(str::from_utf8(&output.stdout).unwrap().trim().to_string())
}
_ => {
println!("cargo:warning=current git commit hash could not be determined");
None
}
}
});
Source: lib/api/build.rs:1-50
Key Files Reference
| Path | Purpose |
|---|---|
| `lib/api/src/lib.rs` | API module entry point |
| `lib/api/src/rest/mod.rs` | REST API implementation |
| `lib/api/src/grpc/mod.rs` | gRPC service definitions |
| `lib/api/src/conversions/*.rs` | Type conversion implementations |
| `src/actix/api/mod.rs` | Actix HTTP handlers |
| `src/tonic/api/mod.rs` | Tonic gRPC handlers |
| `openapi/*.ytt.yaml` | OpenAPI specifications |
Source: https://github.com/qdrant/qdrant / Human Manual
Storage Engine and Persistence
Related topics: Vector Storage
Qdrant implements a multi-layered storage architecture designed for high-throughput vector search operations with strong persistence guarantees. The storage system combines a custom key-value store called Gridstore, memory-mapped file handling, and Write-Ahead Logging to ensure data durability while maintaining low-latency access patterns.
Architecture Overview
The storage engine consists of three primary layers:
| Layer | Purpose | Location |
|---|---|---|
| Gridstore | Primary key-value storage for vectors, payloads, and indexes | lib/gridstore/ |
| Write-Ahead Log (WAL) | Transaction logging and crash recovery | lib/wal/ |
| Memory-Mapped Files | Efficient file I/O with OS page cache integration | lib/common/common/src/mmap/ |
graph TD
subgraph "Client Operations"
A[Upsert] --> B[Write-Ahead Log]
A --> C[Collection Config]
end
subgraph "Persistence Layer"
B --> D[Gridstore]
D --> E[Memory-Mapped Pages]
E --> F[Disk Files]
end
subgraph "Recovery"
G[Startup] --> H[Replay WAL]
H --> D
end
subgraph "Query Path"
I[Search Query] --> D
D --> J[Mmap Populate]
J --> K[Result Scoring]
end
Gridstore: Primary Storage Engine
Gridstore is Qdrant's custom-built key-value storage engine optimized for vector data and payloads. It replaces the previous RocksDB-based storage starting from v1.16.1, offering improved performance and simplified maintenance.
Storage Hierarchy
Gridstore organizes data using a three-level hierarchy:
| Level | Default Size | Description |
|---|---|---|
| Block | 128 bytes | Smallest allocatable unit |
| Region | 8192 blocks (1 MB) | Unit of free space tracking |
| Page | 32 MB | OS I/O unit, file on disk |
Source: lib/gridstore/src/config.rs:15-33
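The relationship between the three levels is simple arithmetic, worked through in the sketch below. It assumes that "32 MB" and "1 MB" mean binary megabytes (32 × 1024 × 1024 bytes), which is consistent with 8192 blocks of 128 bytes; the constant and function names are illustrative and not taken from the Gridstore source.

```rust
// Defaults from the table above (binary megabytes assumed).
const BLOCK_SIZE_BYTES: usize = 128;
const REGION_SIZE_BLOCKS: usize = 8192;
const PAGE_SIZE_BYTES: usize = 32 * 1024 * 1024;

/// Bytes covered by one region.
fn region_size_bytes() -> usize {
    BLOCK_SIZE_BYTES * REGION_SIZE_BLOCKS
}

/// Regions that fit into one page.
fn regions_per_page() -> usize {
    PAGE_SIZE_BYTES / region_size_bytes()
}

/// Blocks that fit into one page.
fn blocks_per_page() -> usize {
    PAGE_SIZE_BYTES / BLOCK_SIZE_BYTES
}

/// Number of blocks a value of `len` bytes occupies, rounded up.
fn blocks_for_value(len: usize) -> usize {
    (len + BLOCK_SIZE_BYTES - 1) / BLOCK_SIZE_BYTES
}
```

With these defaults a page holds 32 regions or 262,144 blocks, and a 300-byte value occupies three blocks, leaving 84 bytes of the last block unused.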
graph LR
subgraph "Page (32MB)"
subgraph "Region 0 (1MB)"
B1[Block 0]
B2[Block 1]
B3[...]
B4[Block 8191]
end
subgraph "Region 1 (1MB)"
B5[Block 0]
B6[...]
end
subgraph "..."
B7[...]
end
end
Configuration
Gridstore accepts configuration via StorageOptions:
| Parameter | Default | Description |
|---|---|---|
| `page_size_bytes` | 32 MB | Size of each page file |
| `block_size_bytes` | 128 bytes | Size of individual blocks |
| `region_size_blocks` | 8192 | Number of blocks per region |
| `compression` | LZ4 | Compression algorithm |
Source: lib/gridstore/src/config.rs:10-35
// Configuration validation from config.rs
fn try_from(options: StorageOptions) -> Result<Self, Self::Error> {
// ...
if block_size_bytes == 0 {
return Err("Block size must be greater than 0");
}
if region_size_blocks == 0 {
return Err("Region size must be greater than 0");
}
if page_size_bytes == 0 {
return Err("Page size must be greater than 0");
}
let region_size_bytes = block_size_bytes * region_size_blocks;
if page_size_bytes < region_size_bytes {
return Err("Page size must be greater than region size");
}
}
Free Space Management: Bitmask
Gridstore tracks free blocks using bitmasks stored in memory and persisted to disk.
Source: lib/gridstore/src/bitmask/mod.rs:25-30
The bitmask system works as follows:
| Component | Description |
|---|---|
| Page Bitmask | One bit per block (128 bytes), stored as StoredBitSlice |
| RegionGaps | Tracks which blocks are free within each region |
| BitmaskGaps | Manages all region gaps for a page |
// Bitmask length calculation
let bits = config.page_size_bytes / config.block_size_bytes; // blocks per page
let length = bits / u8::BITS as usize; // bytes needed for bitmask
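To give a feel for the numbers and for what a block bitmask is used for, the sketch below computes the bitmask size for one page and performs a naive first-fit search for a run of free blocks. The search is a generic illustration over a `bool` slice; it is not the `RegionGaps`/`BitmaskGaps` algorithm, and both function names are hypothetical.

```rust
/// Bytes needed for a page bitmask with one bit per block.
fn bitmask_len_bytes(page_size_bytes: usize, block_size_bytes: usize) -> usize {
    let bits = page_size_bytes / block_size_bytes; // blocks per page
    bits / u8::BITS as usize
}

/// First-fit search for `needed` consecutive free blocks.
/// `used[i] == true` means block `i` is occupied. Illustrative only.
fn find_free_run(used: &[bool], needed: usize) -> Option<usize> {
    if needed == 0 {
        return None;
    }
    let mut run_start = 0;
    let mut run_len = 0;
    for (i, &is_used) in used.iter().enumerate() {
        if is_used {
            run_len = 0; // an occupied block ends the current free run
        } else {
            if run_len == 0 {
                run_start = i;
            }
            run_len += 1;
            if run_len == needed {
                return Some(run_start);
            }
        }
    }
    None
}
```

For the default 32 MiB page and 128-byte blocks the bitmask takes 32,768 bytes, which is small enough to keep in memory per page. A linear scan like `find_free_run` is what per-region gap tracking is meant to avoid on large pages.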
Write Operations
Gridstore implements multi-page writes when values exceed single-page capacity:
Source: lib/gridstore/src/pages.rs:85-105
pub fn write_to_pages(
&mut self,
pointer: ValuePointer,
value: &[u8],
config: &StorageConfig,
) -> Result<()> {
let writes = Self::get_page_value_ranges(pointer, config)
.map(|(buf_offset, page, range)| {
let data = &value[buf_offset..buf_offset + range.length as usize];
(page as FileIndex, range.byte_offset, data)
});
// Execute writes to multiple pages if needed
S::write_multi(self.pages.as_mut_slice(), writes)?;
Ok(())
}
Values spanning multiple pages are handled by the ValuePointer struct, which tracks:
- `page_id`: Starting page
- `block_offset`: Starting block within page
- `length`: Total byte length of value
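Given those three fields, the per-page byte ranges of a value can be derived as sketched below. This is a guess at the kind of computation `get_page_value_ranges` performs, not its actual code: `page_ranges` is a hypothetical helper, the field types are assumed, and it presumes that a value continues at the start of the next consecutive page.

```rust
/// Simplified pointer: fields mirror the ones listed above (types assumed).
struct ValuePointer {
    page_id: u32,
    block_offset: u32,
    length: u32,
}

/// Splits a value into `(page, byte_offset_in_page, len)` chunks.
/// Hypothetical helper; assumes pages are consecutive and equally sized.
fn page_ranges(ptr: &ValuePointer, block_size: u64, page_size: u64) -> Vec<(u32, u64, u64)> {
    let mut ranges = Vec::new();
    let mut page = ptr.page_id;
    let mut offset = u64::from(ptr.block_offset) * block_size;
    assert!(offset < page_size, "block offset must lie inside the page");
    let mut remaining = u64::from(ptr.length);
    while remaining > 0 {
        // Take as much as still fits into the current page.
        let chunk = remaining.min(page_size - offset);
        ranges.push((page, offset, chunk));
        remaining -= chunk;
        page += 1;
        offset = 0; // continuation pages are written from their start
    }
    ranges
}
```

For example, with 128-byte blocks and a deliberately tiny 1024-byte page, a 300-byte value starting at block 7 of page 0 is split into 128 bytes at offset 896 of page 0 and 172 bytes at offset 0 of page 1.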
Non-Blocking Flushes
Gridstore implements a flusher mechanism that defers disk synchronization:
Source: lib/gridstore/src/pages.rs:107-117
pub fn flusher(&self) -> Flusher {
let mut flushers = Vec::with_capacity(self.pages.len());
for page in &self.pages {
flushers.push(page.flusher());
}
Box::new(move || {
for flusher in flushers {
flusher()?;
}
Ok(())
})
}
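The key idea in `flusher()` is that it returns a closure instead of flushing immediately, so the caller decides when (and on which thread) the expensive sync runs. The toy model below demonstrates that deferral with in-memory buffers in place of files. `Page`, `page_flusher`, the `dirty`/`synced` fields, and the snapshot-at-creation behaviour are all assumptions of this sketch, not a description of Gridstore internals.

```rust
use std::sync::{Arc, Mutex};

type Flusher = Box<dyn FnOnce() -> Result<(), String> + Send>;

/// Toy page: `dirty` is pending data, `synced` plays the role of the disk.
#[derive(Default)]
struct Page {
    dirty: Mutex<Vec<u8>>,
    synced: Mutex<Vec<u8>>,
}

fn page_flusher(page: &Arc<Page>) -> Flusher {
    let page = Arc::clone(page);
    // Snapshot what must be persisted now; later writes are not included.
    let pending = page.dirty.lock().unwrap().clone();
    Box::new(move || {
        *page.synced.lock().map_err(|e| e.to_string())? = pending;
        Ok(())
    })
}

/// Combines per-page flushers into one closure, as in the excerpt above.
fn flusher(pages: &[Arc<Page>]) -> Flusher {
    let flushers: Vec<Flusher> = pages.iter().map(page_flusher).collect();
    Box::new(move || {
        for f in flushers {
            f()?; // stop at the first page that fails to flush
        }
        Ok(())
    })
}
```

Nothing reaches `synced` until the returned closure is invoked, so writers and readers are not held up while the flusher is being created.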
This design was improved in v1.17.1 to make flushes non-blocking, reducing tail latencies during search operations.
Live Reload Support
Gridstore readers support live reload to access newly written data without reopening:
Source: lib/gridstore/src/gridstore/tests.rs:180-220
// Writer creates new pages
for i in first_batch..(first_batch + second_batch) {
storage.put_value(i, &payload, hw_counter_ref).unwrap();
}
storage.flusher()().unwrap();
// Reader detects new pages automatically
reader.live_reload().unwrap();
assert_eq!(reader.max_point_offset(), first_batch + second_batch);
Memory-Mapped File Handling
Qdrant uses memory-mapped files extensively for efficient I/O. The mmap/advice.rs module provides platform-specific optimizations:
Page Population
On Linux, Qdrant uses madvise(MADV_POPULATE_READ) to proactively populate the page cache before reads:
Source: lib/common/common/src/mmap/advice.rs:45-55
pub fn populate(&self) -> OperationResult<()> {
self.storage.populate()?;
Ok(())
}
Cache Management
The system provides explicit cache control:
| Method | System Call | Purpose |
|---|---|---|
| `populate()` | `madvise(MADV_POPULATE_READ)` | Pre-populate RAM cache |
| `clear_cache()` | `madvise(MADV_PAGEOUT)` | Evict pages from RAM |
Source: lib/common/common/src/mmap/advice.rs:55-65
Fallback for Non-Linux Systems
On older Linux kernels or non-Unix platforms, a fallback strategy reads every 512th byte of the mapping, which touches each page and so forces the operating system to load it:
Source: lib/common/common/src/mmap/advice.rs:60-75
fn populate_simple(slice: &[u8]) {
black_box(
slice
.iter()
.copied()
.map(Wrapping)
.step_by(512)
.sum::<Wrapping<u8>>(),
);
}
Readahead Optimization
For sequential reads within MADV_RANDOM regions, explicit readahead is triggered:
Source: lib/common/common/src/mmap/advice.rs:80-95
#[cfg(unix)]
pub fn will_need_multiple_pages(region: &[u8]) {
// madvise(MADV_WILLNEED) on region spanning multiple 4KiB pages
// Avoids multiple page faults during sequential access
}
Collection Configuration Persistence
Collection configuration is persisted as JSON using atomic file operations:
Source: lib/collection/src/config.rs:30-55
Save and Load Operations
pub fn save(&self, path: &Path) -> CollectionResult<()> {
let config_path = path.join(COLLECTION_CONFIG_FILE);
let af = AtomicFile::new(&config_path, AllowOverwrite);
let state_bytes = serde_json::to_vec(self).unwrap();
af.write(|f| f.write_all(&state_bytes)).map_err(|err| {
CollectionError::service_error(format!("Can't write {config_path:?}, error: {err}"))
})?;
Ok(())
}
pub fn load(path: &Path) -> CollectionResult<Self> {
let config_path = path.join(COLLECTION_CONFIG_FILE);
let mut contents = String::new();
let mut file = File::open(config_path)?;
file.read_to_string(&mut contents)?;
Ok(serde_json::from_str(&contents)?)
}
Atomic Write Pattern
Configuration updates use AtomicFile to ensure:
- New data is written to a temporary file
- Temporary file is atomically renamed over the old file
- No partial writes are visible to readers
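The three steps above can be sketched with the standard library alone. This is an illustration of the general write-then-rename pattern and not the `AtomicFile` implementation; `atomic_write` and the `.tmp` naming are assumptions made for the example.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `target` via a temporary sibling file and a rename.
/// Sketch of the general pattern, not the `AtomicFile` implementation.
fn atomic_write(target: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = target.with_extension("tmp"); // hypothetical temp-file name
    {
        let mut file = File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // make sure the data is on disk before renaming
    }
    // On POSIX systems, a rename within one filesystem replaces the target atomically.
    fs::rename(&tmp, target)
}
```

A reader that opens the target at any moment sees either the complete old contents or the complete new contents, because the only operation that touches the target path is the rename.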
Write-Ahead Logging
Write-Ahead Logging (WAL) ensures durability by logging operations before applying them to the main storage.
Key Properties
| Property | Implementation |
|---|---|
| Durability | Operations logged to WAL before acknowledgment |
| Crash Recovery | WAL replayed on startup to recover uncommitted operations |
| Consistency | Prevents data loss during power failures |
Source: README.md (WAL mentioned in features)
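Crash recovery by replay can be modelled in a few lines. The sketch below keeps the log as an in-memory slice and tracks the sequence number of the last applied entry; on recovery, only newer entries are re-applied. All names here (`Op`, `Storage`, `applied_up_to`, `recover`) are hypothetical, and the real WAL in `lib/wal/` is file-based and considerably more involved.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Op {
    Upsert { id: u64, value: String },
    Delete { id: u64 },
}

/// Toy storage: state plus the sequence number of the last applied entry.
#[derive(Default)]
struct Storage {
    points: BTreeMap<u64, String>,
    applied_up_to: Option<usize>,
}

impl Storage {
    fn apply(&mut self, seq: usize, op: &Op) {
        match op {
            Op::Upsert { id, value } => {
                self.points.insert(*id, value.clone());
            }
            Op::Delete { id } => {
                self.points.remove(id);
            }
        }
        self.applied_up_to = Some(seq);
    }

    /// Replays every WAL entry newer than the last applied one and
    /// returns how many entries were replayed.
    fn recover(&mut self, wal: &[Op]) -> usize {
        let start = self.applied_up_to.map_or(0, |s| s + 1);
        let mut replayed = 0;
        for (seq, op) in wal.iter().enumerate().skip(start) {
            self.apply(seq, op);
            replayed += 1;
        }
        replayed
    }
}
```

Running `recover` a second time replays nothing, which is the property that makes replay safe to repeat after an interrupted startup.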
Critical Bug Fix (v1.16.2)
A critical WAL bug was discovered in v1.16.1 that could break consensus or cause data inconsistency. This was fixed in PR #7674.
Snapshot and Backup
Qdrant supports collection and storage snapshots for backup and migration.
Snapshot Features
| Feature | Description |
|---|---|
| Collection Snapshots | Full point and payload data export |
| Storage Snapshots | Complete system state including WAL |
| Remote Storage | Snapshots can be stored in object storage |
Source: lib/collection/src/collection/snapshots.rs
Performance Characteristics
Recent Improvements
| Version | Improvement | Impact |
|---|---|---|
| v1.18.1 | Validate vector dimensions before WAL write | Faster async upserts |
| v1.17.1 | Non-blocking Gridstore flushes | Reduced tail latencies |
| v1.16.1 | Gridstore migration from RocksDB | 3x faster batch queries |
Async I/O with io_uring
Qdrant leverages Linux's io_uring interface for asynchronous disk operations, maximizing throughput on network-attached storage.
Source: README.md
Related Modules
The storage engine integrates with these core modules:
graph TD
SG[Segment] --> GS[Gridstore]
SG --> WAL[Write-Ahead Log]
WAL --> GS
GC[Collection] --> SG
GC --> CFG[Collection Config]
CFG -.->|Atomic Write| DISK[Disk Files]
GS -.->|Mmap| DISK
| Module | File | Role |
|---|---|---|
| Segment | lib/segment/src/lib.rs | Core indexing and search logic |
| Collection | lib/collection/ | Collection management |
| Gridstore | lib/gridstore/ | Key-value persistence |
| WAL | lib/wal/ | Transaction logging |
Known Issues
Flaky Quantized Search Tests
Several HNSW quantized search tests exhibit flakiness when scoring assertions fail:
- `hnsw_turbo_quantization_cosine_larger_bits2_test` (Issue #8835)
- `hnsw_turbo_quantization_cosine_larger_test` (Issue #8801)
- `hnsw_quantized_search_manhattan_test` (Issue #8806)
- `hnsw_quantized_search_euclid_test` (Issue #8735)
- `hnsw_turbo_quantization_dot_test` (Issue #8906)
- `hnsw_turbo_quantization_manhattan_test` (Issue #8834)
These tests fail with assertion failed: best_2.score >= best_1.score, indicating edge cases in quantization scoring. Related: Issue #8704 for discover precision tests.
Source: https://github.com/qdrant/qdrant / Human Manual
Data Flow and Update Pipeline
Related topics: System Architecture, Storage Engine and Persistence
This document describes how data flows through Qdrant from client requests to persisted storage, covering the update pipeline architecture, data structures, and key components involved in processing point operations.
Overview
Qdrant's update pipeline handles the lifecycle of vector data from insertion through persistence. The system uses a multi-layered approach combining Write-Ahead Logging (WAL), memory-mapped storage (Gridstore), and segment-based organization to ensure data durability while maintaining high throughput.
Source: lib/segment/src/lib.rs:1-18
Core Data Structures
Point Operations
The system supports multiple point operation types through the PointOperations enum:
pub enum PointOperations {
UpsertPoints(PointInsertOperationsInternal),
Upsert,
DeletePoints,
DeletePointsByFilter,
SetPayload,
OverwritePayload,
DeletePayload,
ClearPayload,
UpdateBatch,
}
Source: lib/shard/src/operations/point_ops.rs:1-50
Value Pointer System
The ValuePointer struct tracks where data is stored within the Gridstore pages:
pub struct ValuePointer {
pub page_id: PageId,
pub block_offset: BlockOffset,
pub length: u32,
}
Source: lib/gridstore/src/tracker.rs:1-20
The PointerUpdates structure manages pointer lifecycle during updates:
pub(crate) struct PointerUpdates {
current: Option<ValuePointer>,
to_free: Vec<ValuePointer>,
}
Source: lib/gridstore/src/tracker.rs:30-50
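One plausible reading of the two fields is that `current` holds the live location of a value while `to_free` collects superseded locations whose blocks can be reclaimed later. The sketch below models that lifecycle. The methods `set`, `unset`, and `drain_to_free` are invented for this illustration and may not match the methods in `tracker.rs`; the field types are likewise assumed.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct ValuePointer {
    page_id: u32,
    block_offset: u32,
    length: u32,
}

#[derive(Default)]
struct PointerUpdates {
    current: Option<ValuePointer>,
    to_free: Vec<ValuePointer>,
}

impl PointerUpdates {
    /// Records a new location; the previous one becomes reclaimable.
    fn set(&mut self, pointer: ValuePointer) {
        if let Some(old) = self.current.replace(pointer) {
            self.to_free.push(old);
        }
    }

    /// Marks the value as deleted.
    fn unset(&mut self) {
        if let Some(old) = self.current.take() {
            self.to_free.push(old);
        }
    }

    /// Hands out the pointers whose blocks can be released, leaving the list empty.
    fn drain_to_free(&mut self) -> Vec<ValuePointer> {
        std::mem::take(&mut self.to_free)
    }
}
```

Deferring the release until the drained list is processed means an old location stays intact while something may still be reading from it.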
Generalizer Trait
The Generalizer trait provides an interface for stripping vectors and payloads from a structure, producing a lightweight copy that keeps only the general shape of the request:
pub trait Generalizer {
fn remove_details(&self) -> Self;
}
Source: lib/collection/src/operations/generalizer/mod.rs:1-20
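A small implementation shows what "removing details" can look like in practice. The trait definition is repeated from the excerpt; `QueryShape` and its fields are hypothetical and exist only for this example.

```rust
trait Generalizer {
    fn remove_details(&self) -> Self;
}

/// Hypothetical request type used only for this illustration.
#[derive(Clone, Debug, PartialEq)]
struct QueryShape {
    collection: String,
    vector: Vec<f32>,
    limit: usize,
}

impl Generalizer for QueryShape {
    fn remove_details(&self) -> Self {
        QueryShape {
            collection: self.collection.clone(),
            vector: Vec::new(), // drop the heavy, request-specific data
            limit: self.limit,
        }
    }
}
```

Because the method takes `&self` and returns a new value, the original request is left untouched and can still be executed after its generalized copy has been produced.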
Write-Ahead Logging
Qdrant implements Write-Ahead Logging to ensure data persistence even during power outages. Before any update is applied to segments, it is first recorded in the WAL.
WAL Workflow
graph TD
A[Client Request] --> B[Parse PointOperations]
B --> C[Write to WAL]
C --> D[Update In-Memory Structures]
D --> E[Acknowledge to Client]
E --> F[Background Flush to Gridstore]
F --> G[Mark WAL Entries as Persisted]
WAL Bug Fix (v1.16.2)
Version v1.16.2 addressed a critical WAL bug that could break consensus or cause data inconsistency. This fix ensured that WAL entries are properly synchronized with segment updates.
Source: v1.16.2 Release Notes
Gridstore Architecture
Gridstore is Qdrant's storage layer that provides memory-mapped file access with non-blocking flush operations.
Page-Based Storage
Gridstore divides storage into fixed-size pages, with each value identified by a ValuePointer. When a value spans multiple pages, the system tracks the ranges across consecutive pages.
Source: lib/gridstore/src/pages.rs:1-30
Page Writing Mechanism
The write_to_pages method handles multi-page writes:
pub fn write_to_pages(
&mut self,
pointer: ValuePointer,
value: &[u8],
config: &StorageConfig,
) -> Result<()> {
let writes = Self::get_page_value_ranges(pointer, config)
.map(|(buf_offset, page, range)| {
let data = &value[buf_offset..buf_offset + range.length as usize];
(page as FileIndex, range.byte_offset, data)
});
S::write_multi(self.pages.as_mut_slice(), writes)?;
Ok(())
}
Source: lib/gridstore/src/pages.rs:50-75
Non-Blocking Flushes
Gridstore implements non-blocking flushes to reduce search tail latencies, introduced in v1.17.1. The flusher mechanism aggregates page flushers:
pub fn flusher(&self) -> Flusher {
let mut flushers = Vec::with_capacity(self.pages.len());
for page in &self.pages {
flushers.push(page.flusher());
}
Box::new(move || {
for flusher in flushers {
flusher()?;
}
Ok(())
})
}
Source: lib/gridstore/src/pages.rs:80-95
Update Pipeline Flow
graph TD
A[UpsertPoints Request] --> B[Vector Dimension Validation]
B --> C[Write to WAL]
C --> D[Update Tracker]
D --> E[Apply to Memory Structures]
E --> F[Acknowledge Client]
F --> G[Background Optimization]
G --> H[Gridstore Flush]
H --> I[Segment Compaction]
Async Upsert Validation (v1.18.1)
Vector dimensions are validated before WAL write for async upserts, ensuring early detection of invalid data.
Source: v1.18.1 Release Notes
Deferred Point Updates (v1.17.1)
With prevent_unoptimized=true, point updates can be deferred and efficiently applied during optimization phases.
Source: v1.17.1 Release Notes
Collection Configuration
Collection configuration is persisted to disk and includes metadata, schema, and optimization settings:
pub struct CollectionConfigInternal {
#[serde(default, skip_serializing_if = "Option::is_none")]
pub metadata: Option<Payload>,
}
Source: lib/collection/src/config.rs:1-30
Configuration Persistence
Configuration is saved atomically using AtomicFile:
pub fn save(&self, path: &Path) -> CollectionResult<()> {
let config_path = path.join(COLLECTION_CONFIG_FILE);
let af = AtomicFile::new(&config_path, AllowOverwrite);
let state_bytes = serde_json::to_vec(self).unwrap();
af.write(|f| f.write_all(&state_bytes)).map_err(|err| {
CollectionError::service_error(format!("Can't write {config_path:?}, error: {err}"))
})?;
Ok(())
}
Source: lib/collection/src/config.rs:30-55
Segment Operations
Data Consistency Checking
The check_data_consistency method validates segment integrity:
- Internal ID without external ID
- External ID without internal ID
- Internal ID without version
- Internal ID without vector
Source: lib/segment/src/segment/segment_ops.rs:1-50
Payload Index Management
Segment operations handle payload index creation and recreation when configuration changes:
for (key, schema) in schema_config {
match schema_applied.get(key) {
Some(existing_schema) if existing_schema == schema => continue,
Some(existing_schema) => log::warn!(
"Segment has incorrect payload index for {key}, recreating it now"
),
None => log::warn!(
"Segment is missing payload index for {key}, creating it now"
),
}
let created = self.create_field_index(...)?;
}
Source: lib/segment/src/segment/segment_ops.rs:50-80
Memory Management
Memory-Mapped File Advice
Qdrant uses madvise system calls for memory management on Unix platforms:
#[cfg(unix)]
pub fn will_need_multiple_pages(region: &[u8]) {
let Some(page_mask) = page_size().map(|s| s - 1) else { return };
// Trigger readahead for memory-mapped regions
}
Source: lib/common/common/src/mmap/advice.rs:1-30
Page Cache Population
On older Linux systems, page cache population falls back to reading one byte out of every 512, so that every page of the mapping is touched:
fn populate_simple(slice: &[u8]) {
black_box(
slice
.iter()
.copied()
.map(Wrapping)
.step_by(512)
.sum::<Wrapping<u8>>(),
);
}
Source: lib/common/common/src/mmap/advice.rs:40-55
API Layer
REST Schema Processing
The REST API layer uses untagged enums for flexible deserialization:
#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize, JsonSchema)]
#[serde(untagged, rename_all = "snake_case")]
pub enum DocumentOptions {
Common(HashMap<String, JsonValue>),
Bm25(Bm25Config),
}
Source: lib/api/src/rest/schema.rs:1-25
Point Operations Conversion
The conversion layer transforms between API representations:
pub fn try_points_selector_from_grpc(
value: api::grpc::qdrant::PointsSelector,
shard_key_selector: Option<api::grpc::qdrant::ShardKeySelector>,
) -> Result<PointsSelector, Status>
Source: lib/collection/src/operations/conversions.rs:1-50
Known Issues and Community Topics
Related Community Issues
| Issue | Topic | Status |
|---|---|---|
| #2550 | Delete vectors for deleted points | Open |
| #1132 | Adding new vector field after collection creation | Open |
Flaky Tests
Several HNSW quantization tests show intermittent failures related to score ordering:
- `hnsw_turbo_quantization_cosine_larger_bits2_test` (Issue #8835)
- `hnsw_turbo_quantization_cosine_larger_test` (Issue #8801)
- `hnsw_quantized_search_manhattan_test` (Issue #8806)
- `hnsw_quantized_search_euclid_test` (Issue #8735)
These tests fail with assertion failed: best_2.score >= best_1.score in the quantized search path.
Summary
The Qdrant update pipeline implements a robust data flow architecture:
- Request Reception: Point operations received via REST/gRPC API
- Validation: Vector dimensions and payload schema validation
- WAL Persistence: Atomic write to Write-Ahead Log
- Memory Update: In-memory structures updated immediately
- Client Acknowledgment: Fast response after WAL write
- Background Processing: Gridstore flush and segment optimization
- Consistency Verification: Periodic data consistency checks
This architecture ensures durability through WAL, performance through non-blocking operations, and consistency through validation and verification stages.
Source: https://github.com/qdrant/qdrant / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 8 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: identity.distribution | github_repo:268163609 | https://github.com/qdrant/qdrant
2. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:268163609 | https://github.com/qdrant/qdrant
3. Runtime risk: Runtime risk requires verification
- Severity: medium
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: packet_text.keyword_scan | github_repo:268163609 | https://github.com/qdrant/qdrant
4. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:268163609 | https://github.com/qdrant/qdrant
5. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | github_repo:268163609 | https://github.com/qdrant/qdrant
6. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | github_repo:268163609 | https://github.com/qdrant/qdrant
7. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:268163609 | https://github.com/qdrant/qdrant
8. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | github_repo:268163609 | https://github.com/qdrant/qdrant
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using qdrant with real data or production workflows.
- Flaky test `hnsw_quantized_search_test::hnsw_turbo_quantization_cosine_l - github / github_issue
- Flaky test `hnsw_quantized_search_test::hnsw_quantized_search_manhattan_ - github / github_issue
- Flaky test `hnsw_quantized_search_test::hnsw_quantized_search_euclid_tes - github / github_issue
- Flaky test `hnsw_quantized_search_test::hnsw_turbo_quantization_dot_test - github / github_issue
- Flaky test `hnsw_quantized_search_test::hnsw_turbo_quantization_manhatta - github / github_issue
- Flaky test `hnsw_discover_test::filtered_hnsw_discover_precision` - github / github_issue
- v1.18.1 - github / github_release
- v1.18.0 - github / github_release
- v1.17.1 - github / github_release
- Installation risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence