Doramagic Project Pack · Human Manual

qdrant

Qdrant's architecture follows a modular design with clear separation between storage, indexing, querying, and distributed coordination layers.

Introduction to Qdrant

Related topics: System Architecture, REST and gRPC API

Qdrant is an open-source vector similarity search engine written in Rust, designed for high-performance nearest neighbor search in high-dimensional vector spaces. It serves as the core engine for AI applications requiring semantic search, recommendation systems, anomaly detection, and retrieval-augmented generation (RAG).

Architecture Overview

Qdrant's architecture follows a modular design with clear separation between storage, indexing, querying, and distributed coordination layers.

graph TD
    subgraph "Client Layer"
        REST[REST API]
        gRPC[gRPC API]
    end
    
    subgraph "Core Engine"
        API[lib/api]
        COLLECTION[lib/collection]
    end
    
    subgraph "Storage Layer"
        SHARD[lib/shard]
        SEGMENT[lib/segment]
        GRID[lib/gridstore]
    end
    
    subgraph "Common Utilities"
        COMMON[lib/common]
        TRIFIFO[lib/trififo]
    end
    
    REST --> API
    gRPC --> API
    API --> COLLECTION
    COLLECTION --> SHARD
    SHARD --> SEGMENT
    SEGMENT --> GRID
    COLLECTION --> COMMON
    SHARD --> COMMON

High-Level Component Responsibilities

| Component | Purpose |
| --- | --- |
| lib/api | REST and gRPC API definitions, request validation, and schema |
| lib/collection | Collection management, shard coordination, and operations |
| lib/shard | Individual shard operations, WAL management, and segment holder |
| lib/segment | Vector indexing (HNSW), quantization, and segment data structures |
| lib/gridstore | Memory-mapped storage engine for persistent data |
| lib/common | Shared utilities: memory management, mmap, CPU detection, rate limiting |

Source: lib/segment/src/lib.rs:1-15

Deployment Modes

Qdrant supports two deployment modes to accommodate different use cases.

Qdrant Server

The standard client-server deployment, where Qdrant runs as a standalone service. Clients communicate with it through the REST API (port 6333) or the gRPC API (port 6334).

graph LR
    Client1[Python Client]
    Client2[Rust Client]
    Client3[Java Client]
    
    QdrantServer[Qdrant Server<br/>:6333 REST<br/>:6334 gRPC]
    
    Storage[(./storage)]
    
    Client1 --> QdrantServer
    Client2 --> QdrantServer
    Client3 --> QdrantServer
    QdrantServer --> Storage

Source: README.md

Qdrant Edge

A lightweight, in-process vector search engine designed for embedded devices, autonomous systems, and mobile agents. Unlike the server mode, Edge runs inside the application process with local data storage.

from qdrant_edge import Distance, EdgeConfig, EdgeVectorParams, EdgeShard, Point, UpdateOperation

shard = EdgeShard.create("./shard", EdgeConfig(
    vectors={"my-vector": EdgeVectorParams(size=4, distance=Distance.Cosine)}
))
shard.update(UpdateOperation.upsert_points([
    Point(id=1, vector={"my-vector": [0.1, 0.2, 0.3, 0.4]}, payload={"color": "red"})
]))

The Edge variant is built from an amalgamation of core libraries, compiled as a single distributable package.

Source: lib/edge/publish/amalgamate.py

Core Data Structures

Points

The fundamental data unit in Qdrant is a Point, which consists of:

  • ID: Unique identifier for the point
  • Vector(s): One or more vectors associated with the point (dense, sparse, or multi-vector), optionally addressed by name
  • Payload: Optional key-value metadata for filtering and organization

classDiagram
    class Point {
        +id: PointId
        +vectors: Vectors
        +payload: Payload
    }
    
    class Vectors {
        +vectors: Vec~Vector~
        +named: HashMap~String, Vector~
    }
    
    class Payload {
        +fields: HashMap~String, Value~
    }
    
    Point *-- Vectors
    Point *-- Payload

Source: lib/segment/src/data_types/mod.rs
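
The point structure described above can be mirrored in a few lines of Python. This is a minimal sketch for illustration only: the class and field names are hypothetical and are not the Rust types or any client API, and the ID type (integer or string) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

# Hypothetical mirror of the Point structure; not Qdrant's actual types.
PointId = Union[int, str]  # assumption: numeric or string (e.g. UUID) identifiers
Vector = List[float]


@dataclass
class Point:
    id: PointId
    vectors: Dict[str, Vector]  # named vectors, keyed by vector name
    payload: Dict[str, Any] = field(default_factory=dict)  # optional metadata

    def dims(self) -> Dict[str, int]:
        """Return the dimensionality of each named vector."""
        return {name: len(vec) for name, vec in self.vectors.items()}


point = Point(id=1, vectors={"my-vector": [0.1, 0.2, 0.3, 0.4]}, payload={"color": "red"})
print(point.dims())
```

Keeping vectors in a name-keyed mapping reflects the fact that a single point can carry several vectors, while the payload stays an independent, optional dictionary.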

Segments

Segments are the fundamental storage unit within shards. They contain a portion of the collection's points and can be in different states (indexed, raw, or partially optimized).

| Segment Type | Description |
| --- | --- |
| Indexed | Full HNSW index built, optimized for search |
| Raw | No index, requires full scan for search |
| Indexing | Index build in progress |
| Mmap | Memory-mapped segment for memory-efficient access |

Source: lib/shard/src/lib.rs:1-30

HNSW Index

Qdrant uses Hierarchical Navigable Small World (HNSW) graphs as the primary index structure for approximate nearest neighbor (ANN) search.

Key HNSW parameters:

| Parameter | Description | Impact |
| --- | --- | --- |
| m | Number of bi-directional links per node | Memory usage, recall |
| ef_construct | Search width during index build | Build time, recall |
| ef (hnsw_ef at query time) | Search width during query | Search speed, recall |
| full_scan_threshold | Size threshold (in kilobytes of vector data) below which brute-force search is preferred over HNSW | Small dataset optimization |

Quantization

Qdrant supports multiple quantization strategies to reduce memory footprint and improve search speed:

| Method | Compression Ratio | Use Case |
| --- | --- | --- |
| Scalar | Up to 4× | General purpose, good accuracy |
| Binary | 32× | High-dimensional vectors (>1024d) |
| Product Quantization (PQ) | Configurable | Large datasets, trade-off accuracy |
| TurboQuant | >32× | Aggressive compression (ICLR 2026) |

Community Note: TurboQuant is an emerging feature (see GitHub Issue #8670) that addresses limitations with existing quantization methods. Current quantization options don't provide an optimal path for aggressive compression without significant accuracy trade-offs.

Source: Community Context - Issue #8524
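
To make the "up to 4×" figure for scalar quantization concrete: a float32 component takes 4 bytes, and mapping it to a single unsigned byte takes 1. The sketch below is a generic min/max scalar quantizer written for illustration. The per-vector min/max calibration and the function names are assumptions and are not taken from Qdrant's implementation.

```python
from typing import List, Tuple


def scalar_quantize(vector: List[float]) -> Tuple[bytes, float, float]:
    """Map each component to one byte using the vector's own min/max range."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(min(255, round((x - lo) / scale)) for x in vector)
    return codes, lo, scale


def scalar_dequantize(codes: bytes, lo: float, scale: float) -> List[float]:
    """Approximate reconstruction of the original components."""
    return [lo + code * scale for code in codes]


vector = [0.0, 0.5, 1.0, -1.0]
codes, lo, scale = scalar_quantize(vector)
restored = scalar_dequantize(codes, lo, scale)

float32_bytes = 4 * len(vector)  # 4 bytes per float32 component
ratio = float32_bytes / len(codes)  # 1 byte per quantized component
print(ratio, restored)
```

The reconstruction error per component is bounded by half a quantization step, which is why this method tends to preserve ranking quality reasonably well.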

Configuration

Qdrant behavior is controlled via config.yaml. Key configuration sections include:

storage:
  storage_path: ./storage
  snapshots_path: ./snapshots
  on_disk_payload: true  # Keep payloads on disk to save RAM

telemetry:
  # Telemetry collection settings

Source: config/config.yaml

Storage Settings

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| storage_path | string | ./storage | Primary data directory |
| snapshots_path | string | ./snapshots | Snapshot storage location |
| on_disk_payload | boolean | true | Keep payloads on disk |
| temp_path | string | null | Temporary file storage |

Collection Operations

Collections are top-level organizational units that group related points. The collection module (lib/collection) handles:

  • Collection creation and deletion
  • Shard distribution and replication
  • Operation routing and coordination
  • Collection state management

graph TD
    CreateCollection[Create Collection] --> DefineSchema[Define Schema<br/>Vector params, indexes]
    DefineSchema --> DistributeShards[Distribute Shards]
    DistributeShards --> InitializeWAL[Initialize WAL]
    InitializeWAL --> Ready[Collection Ready]

Source: lib/collection/src/lib.rs:1-25

Query Operations

Qdrant provides multiple query types:

| Operation | Description |
| --- | --- |
| Search | Find nearest vectors by similarity |
| Recommend | Find similar to given points |
| Discover | Explore in direction from given points |
| Scroll | Iterate through points sequentially |
| Count | Count matching points |
| Facet | Group and count by field values |
| Filter | Apply payload-based filters |

Relevance Feedback

Introduced in v1.17.0, relevance feedback allows improving search results based on user interactions, enabling continuous learning from user behavior.

Source: Community Context - v1.17.0 Release

Client Libraries

Qdrant provides official and community client libraries:

| Language | Repository |
| --- | --- |
| Python | qdrant-client |
| Rust | Built-in (qdrant crate) |
| TypeScript/JS | qdrant-js |
| Java | java-client |
| .NET/C# | qdrant-dotnet |
| PHP | qdrant-php (community) |

Source: README.md

Known Issues and Limitations

Flaky Tests

The community has reported several flaky tests related to quantized HNSW search, primarily in lib/segment/tests/integration/hnsw_quantized_search_test.rs. These tests occasionally fail with score comparison assertions:

  • hnsw_turbo_quantization_cosine_larger_bits2_test
  • hnsw_turbo_quantization_cosine_larger_test
  • hnsw_quantized_search_manhattan_test
  • hnsw_quantized_search_euclid_test

These are tracked in Issues #8735, #8801, #8834, and #8835.

Feature Requests

Notable community requests include:

  1. Adding new vector fields after collection creation (Issue #1132): Currently, all vector fields must be defined at collection creation time.
  2. Delete vectors for deleted points (Issue #2550): Requests better handling of deleted vectors for optimizer and query planning.
  3. ColBERT/Late Interaction support (Issue #3684): Tracking multi-vector storage integration for late-interaction retrieval models.

Recent Improvements (v1.16 - v1.18)

| Version | Key Improvements |
| --- | --- |
| v1.18.1 | Refactored quantized multi-vector scorers for io_uring support, vector dimension validation before WAL write |
| v1.17.1 | Non-blocking Gridstore flushes, deferred point updates optimization |
| v1.17.0 | Relevance Feedback API, optimization progress reporting |
| v1.16.3 | Request timeout handling for telemetry and metrics |
| v1.16.2 | Critical WAL bug fix, user agent headers |
| v1.16.1 | 3× faster batch queries, RocksDB to Gridstore migration |

Source: Community Context - Release Notes

Contributing

All pull requests must target the dev branch. The master branch is reserved for releases only.

For detailed contribution guidelines, see CONTRIBUTING.md.

Source: https://github.com/qdrant/qdrant / Human Manual

System Architecture

Related topics: Introduction to Qdrant, Data Flow and Update Pipeline, REST and gRPC API

Qdrant is a vector similarity search engine designed for high-performance vector search in production environments. The system architecture follows a layered design that separates concerns between API handling, collection management, storage, and core indexing operations. This modular structure enables Qdrant to scale efficiently while supporting diverse deployment scenarios from embedded devices to distributed clusters.

Core Library Structure

The Qdrant system is built upon several foundational libraries that provide the essential functionality for vector search operations.

Segment Library (`lib/segment`)

The segment library is the core indexing and storage engine of Qdrant. It encapsulates all low-level operations related to vector storage, HNSW index construction, and query execution.

lib/segment/src/lib.rs
├── common          # Shared utilities and common types
├── entry           # Entry point abstractions
├── fixtures        # Testing utilities (with `testing` feature)
├── id_tracker      # Internal/external ID mapping
├── index           # HNSW and other index implementations
├── payload_storage # Payload data storage
├── segment         # Core segment implementation
├── segment_constructor # Segment building utilities
├── spaces          # Vector space definitions (cosine, dot, euclidean, etc.)
├── telemetry       # Performance monitoring
├── data_types      # Structured data type definitions
├── json_path       # JSON path parsing for payload queries
├── types           # Core type definitions
├── utils           # General utility functions
└── vector_storage  # Vector storage implementations

The segment library manages the fundamental unit of data organization in Qdrant. Each segment contains a subset of points with their associated vectors and payloads, managed independently for parallel processing during searches.

Collection Library (`lib/collection`)

The collection library provides higher-level abstractions for managing groups of segments and coordinating distributed operations.

lib/collection/src/
├── config.rs           # Collection configuration and state management
└── operations/
    ├── generalizer/mod.rs   # Trait for removing vector details from structures
    ├── count.rs            # Point counting operations
    ├── facet.rs            # Faceted search operations
    ├── matrix.rs           # Matrix-based operations
    ├── points.rs           # Point manipulation
    ├── query.rs            # Query execution
    └── update_persisted.rs # Persistence operations

The Generalizer trait provides an interface for removing vectors and payloads from structures, making them lightweight for transmission and caching. This abstraction is essential for generalizing requests by stripping vector-specific details and replacing payloads with keys and length indications.

GridStore (`lib/gridstore`)

GridStore is Qdrant's custom storage engine designed for high-throughput vector operations with optional compression.

lib/gridstore/src/
├── pages.rs        # Page-based storage management
├── config.rs       # Storage configuration
└── bitmask/mod.rs  # Block allocation bitmask

GridStore uses a hierarchical storage model with configurable page, block, and region sizes. The system defaults are optimized for typical vector workloads:

| Parameter | Default | Description |
| --- | --- | --- |
| Page Size | 32MB | Size of each storage page |
| Block Size | 128 bytes | Smallest allocatable unit |
| Region Size | 8192 blocks | Management unit within pages |
| Compression | LZ4 | Default compression algorithm |

Source: lib/gridstore/src/config.rs

The bitmask system tracks block allocation within pages, with one bit per block. This enables efficient free-space tracking and allocation operations.
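
The defaults above imply some simple arithmetic, and the one-bit-per-block idea can be sketched in a few lines. The following is an illustrative model only: `BlockBitmask` and its methods are hypothetical names, the first-fit strategy is an assumption, and "32MB" is read here as 32 MiB.

```python
PAGE_SIZE = 32 * 1024 * 1024  # assumption: "32MB" is read as 32 MiB
BLOCK_SIZE = 128  # bytes per block
REGION_BLOCKS = 8192  # blocks per region

BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE
REGIONS_PER_PAGE = BLOCKS_PER_PAGE // REGION_BLOCKS
BITMASK_BYTES_PER_PAGE = BLOCKS_PER_PAGE // 8  # one bit per block


class BlockBitmask:
    """Hypothetical free-space tracker: bit i is 1 when block i is in use."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def _mark(self, start: int, count: int, used: bool) -> None:
        for block in range(start, start + count):
            if used:
                self.bits[block // 8] |= 1 << (block % 8)
            else:
                self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate(self, value_len: int):
        """Reserve the first run of free blocks that fits value_len bytes."""
        needed = max(1, -(-value_len // BLOCK_SIZE))  # ceiling division
        run = 0
        for block in range(self.num_blocks):
            run = 0 if self.is_used(block) else run + 1
            if run == needed:
                start = block - needed + 1
                self._mark(start, needed, True)
                return start
        return None  # no contiguous run is large enough

    def free(self, start: int, value_len: int) -> None:
        """Release the blocks previously reserved for a value."""
        self._mark(start, max(1, -(-value_len // BLOCK_SIZE)), False)


mask = BlockBitmask(16)
first = mask.allocate(300)  # 300 bytes need 3 blocks of 128 bytes
print(BLOCKS_PER_PAGE, REGIONS_PER_PAGE, BITMASK_BYTES_PER_PAGE, first)
```

Under these assumptions a page holds 262144 blocks in 32 regions, and its bitmask costs 32 KiB, which is small enough to scan quickly when looking for free space.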

Storage Architecture

Memory-Mapped I/O

Qdrant extensively uses memory-mapped files for vector and payload storage, enabling efficient zero-copy data access while leveraging OS page cache for I/O optimization.

graph TD
    A[Memory-Mapped File] --> B[Page Cache]
    B --> C[Disk Storage]
    
    D[Search Query] --> E[Segment]
    E --> F[Vector Storage]
    F --> A
    
    G[madvise MADV_POPULATE_READ] --> H[Readahead Pages]
    H --> B

The system implements several optimization strategies for memory-mapped data:

  • Populate Read (MADV_POPULATE_READ on Linux): Pre-populates the page cache with expected read data before query execution, reducing page fault latency.
  • Readahead Control: Uses will_need_multiple_pages() to trigger coordinated prefetching across multi-page regions, avoiding per-page I/O operations.
  • Sequential Access Hints: Applies MADV_SEQUENTIAL for bulk data loading operations.

Source: lib/common/common/src/mmap/advice.rs
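
The general idea of reading vectors through a memory-mapped file with an access hint can be tried from Python's standard library. This is a sketch of the technique, not Qdrant's code: the flat float32 file layout and the `read_vector` helper are assumptions, and `MADV_WILLNEED` is used as a stand-in hint because it is only applied when the platform exposes it.

```python
import mmap
import os
import struct
import tempfile


def read_vector(path: str, index: int, dim: int) -> list:
    """Read the float32 vector at position `index` from a flat file via mmap."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Hint that the mapping will be read soon, when the platform supports it.
        advice = getattr(mmap, "MADV_WILLNEED", None)
        if advice is not None and hasattr(mm, "madvise"):
            try:
                mm.madvise(advice)
            except OSError:
                pass  # the hint is optional; reading still works without it
        # Decode directly from the mapping without an intermediate file read.
        return list(struct.unpack_from(f"<{dim}f", mm, index * dim * 4))


# Demo: two 2-dimensional vectors stored back to back.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "vectors.bin")
    with open(demo_path, "wb") as out:
        out.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
    second = read_vector(demo_path, index=1, dim=2)
print(second)
```

Because the hint is advisory, the code path is identical with or without it; only page-fault latency changes.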

Page-Based Storage Model

The GridStore implementation uses a sophisticated page-based storage architecture:

graph TD
    A[ValuePointer] --> B[Page ID]
    A --> C[Block Offset]
    A --> D[Length]
    
    E[Pages Manager] --> F[Page 1]
    E --> G[Page 2]
    E --> H[Page N]
    
    F --> I[Region 1]
    F --> J[Region 2]
    I --> K[Block 0..N]

Values spanning multiple pages are handled through range-based writes, where the system calculates page boundaries and offset ranges for each affected page. This enables efficient storage of variable-length vectors across page boundaries.

Source: lib/gridstore/src/pages.rs
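
The boundary calculation for a value that spans several pages can be sketched as follows. This is a simplified model: it addresses data by a single global byte offset, which is an assumption made for brevity, and `split_across_pages` is a hypothetical helper rather than a function from `pages.rs`.

```python
from typing import List, Tuple


def split_across_pages(start_offset: int, length: int, page_size: int) -> List[Tuple[int, int, int]]:
    """Return (page_id, offset_in_page, chunk_len) for every page a write touches."""
    ranges = []
    page_id, offset = divmod(start_offset, page_size)
    remaining = length
    while remaining > 0:
        chunk = min(remaining, page_size - offset)  # stop at the page boundary
        ranges.append((page_id, offset, chunk))
        remaining -= chunk
        page_id += 1
        offset = 0  # every following page is written from its start
    return ranges


# A 130-byte value starting 10 bytes before the end of a 100-byte page.
print(split_across_pages(start_offset=90, length=130, page_size=100))
```

Each tuple maps to one contiguous write, so a value of any length needs at most one write per affected page.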

Segment Architecture

Segment Components

A segment is the fundamental unit of data organization in Qdrant. Each segment manages:

  • ID Tracker: Maps between internal sequential IDs and external point IDs
  • Vector Storage: Stores vector data with configurable quantization
  • Payload Storage: Stores structured payload data with optional indexing
  • Index Structures: HNSW graphs and payload indexes

graph LR
    A[Segment] --> B[ID Tracker]
    A --> C[Vector Storage]
    A --> D[Payload Storage]
    A --> E[Index Structures]
    
    C --> F[Raw Vectors]
    C --> G[Quantized Vectors]
    
    D --> H[Payload Data]
    D --> I[Field Indexes]

Segment Operations

The segment implementation provides core operations for search and data management:

// Key segment operations from segment_ops.rs
pub fn check_data_consistency(&self) -> OperationResult<()>
pub fn create_field_index(...) -> OperationResult<bool>

Data consistency checking verifies:

  • Internal IDs without external ID mappings
  • External IDs without internal mappings
  • Internal IDs without version information
  • Internal IDs without vector data

Source: lib/segment/src/segment/segment_ops.rs
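
The four consistency conditions listed above can be expressed over plain dictionaries. This is an illustrative sketch only: the function mirrors the name `check_data_consistency`, but its arguments and the data layout are assumptions and do not reflect the Rust implementation.

```python
def check_data_consistency(int_to_ext, ext_to_int, versions, vectors):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    internal_ids = set(int_to_ext) | set(ext_to_int.values()) | set(versions) | set(vectors)
    for internal_id in sorted(internal_ids):
        if internal_id not in int_to_ext:
            problems.append(f"internal id {internal_id} has no external id")
        if internal_id not in versions:
            problems.append(f"internal id {internal_id} has no version")
        if internal_id not in vectors:
            problems.append(f"internal id {internal_id} has no vector")
    for internal_id, external_id in sorted(int_to_ext.items()):
        if external_id not in ext_to_int:
            problems.append(f"external id {external_id} has no internal id")
    return problems


broken = check_data_consistency(
    int_to_ext={0: 10, 1: 11},
    ext_to_int={10: 0},
    versions={0: 1},
    vectors={0: [0.1], 1: [0.2], 2: [0.3]},
)
print(broken)
```

Returning every problem found, rather than stopping at the first, makes it easier to see whether an inconsistency is isolated or systematic.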

Data Types Module

The data_types module defines structured data types used throughout the segment layer:

lib/segment/src/data_types/
├── build_index_result.rs    # Index construction results
├── collection_defaults.rs   # Default configuration values
├── facets.rs                # Faceted search data structures
├── groups.rs                # Grouping operations
├── index.rs                 # Index-related types
├── manifest.rs              # Serialization manifests
├── modifier.rs              # Score modifiers
├── named_vectors.rs         # Multi-vector support
├── order_by.rs              # Ordering specifications
├── primitive.rs             # Primitive type wrappers
├── query_context.rs         # Query execution context
├── segment_record.rs        # Record representations
├── tiny_map.rs              # Compact map implementations
├── vector_name_config.rs    # Named vector configuration
└── vectors.rs               # Vector data structures

Source: lib/segment/src/data_types/mod.rs

API Layer Architecture

REST API

The REST API module provides HTTP-based access to Qdrant functionality:

lib/api/src/rest/
├── conversions.rs   # gRPC to REST conversions
├── models.rs        # REST API data models
├── schema.rs        # OpenAPI schema definitions
└── validate.rs      # Request validation

The REST layer handles:

  • JSON serialization/deserialization
  • Schema validation
  • gRPC model conversion
  • OpenAPI documentation generation

OpenAPI Specification

The OpenAPI specifications define the REST API contract using YAML templates processed with ytt (YAML Templating Tool).

Key API capabilities exposed through REST:

| Endpoint Category | Operations |
| --- | --- |
| Collections | Create, update, delete, list collections |
| Points | Insert, update, delete, retrieve points |
| Search | Vector similarity search with filters |
| Query | Unified query interface combining all search modes |
| Facet | Payload value distribution counts |
| Aliases | Collection alias management |

Deployment Models

Full Server Deployment

The complete Qdrant server deployment includes:

  • gRPC API: High-performance binary protocol for internal and client communications
  • REST API: HTTP-based access for web interfaces and cross-platform clients
  • Distributed Coordination: Shard management and consensus for multi-node deployments
  • Optimizer: Background optimization and compaction processes

Embedded Deployment (Qdrant Edge)

Qdrant Edge provides an amalgamated, in-process vector search engine optimized for embedded devices and autonomous systems:

# Build process from amalgamate.py
AMALGAMATION / "Cargo.toml"  # Unified package manifest
AMALGAMATION / "src/lib.rs"   # Re-exports from edge module

The edge variant combines all necessary components into a single library with:

  • No external service dependencies
  • Minimal memory footprint
  • Configurable feature selection
  • Simplified deployment for edge computing scenarios

Source: lib/edge/publish/amalgamate.py

Data Flow Architecture

graph TD
    subgraph "Ingestion Path"
        A[REST/gRPC Request] --> B[API Layer]
        B --> C[Collection Manager]
        C --> D[Segment Constructor]
        D --> E[Write-Ahead Log]
        E --> F[Mutable Segment]
    end
    
    subgraph "Query Path"
        G[Query Request] --> H[Query Planner]
        H --> I[Segment Selector]
        I --> J[Parallel Segment Search]
        J --> K[Result Merger]
        K --> L[Response]
    end
    
    subgraph "Optimization Path"
        M[Optimizer] --> N[Segment Compaction]
        N --> O[Immutable Segments]
        O --> P[Index Merging]
    end
    
    F --> |flush| O
    G --> |routing| C
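
The query path in the diagram (parallel segment search followed by a result merger) can be sketched with the standard library. This is a toy model: segments are plain dictionaries, the dot product is assumed as the similarity score, and a thread pool stands in for whatever parallelism the engine actually uses.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_segment(segment, query, top):
    """Score every point in one segment and keep its local top results."""
    scored = (
        (sum(q * v for q, v in zip(query, vector)), point_id)
        for point_id, vector in segment.items()
    )
    return heapq.nlargest(top, scored)


def search_all(segments, query, top):
    """Search segments in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda seg: search_segment(seg, query, top), segments))
    return heapq.nlargest(top, (hit for part in partials for hit in part))


segments = [
    {1: [1.0, 0.0], 2: [0.0, 1.0]},
    {3: [0.9, 0.1], 4: [-1.0, 0.0]},
]
best = search_all(segments, query=[1.0, 0.0], top=2)
print([point_id for _, point_id in best])
```

Each segment only has to return its own top results, so the merger works on at most `top` hits per segment regardless of segment size.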

Collection Configuration

Collections maintain configuration state including:

  • Vector Configuration: Vector dimensions, distance metrics, storage options
  • Optimizers: Background optimization settings
  • Params: HNSW and quantization parameters
  • Metadata: Application-specific information

Configuration is persisted using atomic file operations:

pub fn save(&self, path: &Path) -> CollectionResult<()>
pub fn load(path: &Path) -> CollectionResult<Self>
pub fn check(path: &Path) -> bool

Source: lib/collection/src/config.rs
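
A common way to make a configuration save atomic is to write a temporary file and rename it over the target. The sketch below shows that pattern in Python. It is an assumption about the general technique, not a description of what `config.rs` does, and the JSON format and function names are illustrative.

```python
import json
import os
import tempfile


def save_config(config: dict, path: str) -> None:
    """Write to a temporary file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(config, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk first
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def check_config(path: str) -> bool:
    """Report whether a saved configuration exists at `path`."""
    return os.path.isfile(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.json")
    existed_before = check_config(config_path)
    save_config({"vectors": {"size": 4, "distance": "Cosine"}}, config_path)
    loaded = load_config(config_path)
print(existed_before, loaded)
```

Readers either see the old file or the complete new one, never a partially written configuration.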

Key Architectural Patterns

Module Organization

The codebase follows a consistent module structure pattern:

| Pattern | Purpose | Example |
| --- | --- | --- |
| Feature Gates | Optional functionality | #[cfg(feature = "testing")] |
| Re-export Modules | Public API surface | pub use edge::* in lib.rs |
| Separation of Concerns | Layer isolation | API, Collection, Segment, Storage |
| Trait-based Abstractions | Polymorphism | Generalizer trait for data transformation |

Generalizer Pattern

The Generalizer trait enables efficient data transfer by stripping detailed vector information while preserving structural metadata:

pub trait Generalizer {
    fn remove_details(&self) -> Self;
}

This pattern is used for:

  • Caching query results with reduced memory footprint
  • Transmitting metadata without full payloads
  • Cross-shard communication optimization

Source: lib/collection/src/operations/generalizer/mod.rs
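
The effect of generalizing a request can be illustrated on a plain dictionary: the vector is reduced to its dimensionality and the payload to its keys and length. This is a hypothetical Python analogue of the idea. The request shape and the summary fields are assumptions, not the output format of the Rust trait.

```python
def remove_details(request: dict) -> dict:
    """Return a lightweight copy with vectors and payloads replaced by summaries."""
    general = dict(request)  # shallow copy; the original request is left untouched
    if "vector" in general:
        general["vector"] = {"dim": len(general["vector"])}
    if "payload" in general:
        payload = general["payload"]
        general["payload"] = {"keys": sorted(payload), "len": len(payload)}
    return general


request = {
    "collection": "products",
    "vector": [0.1, 0.2, 0.3, 0.4],
    "payload": {"color": "red", "size": "L"},
    "limit": 10,
}
print(remove_details(request))
```

The generalized form keeps enough structure to group similar requests while dropping the bulky, request-specific data.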

HNSW Index Implementation

Related topics: Vector Storage, Quantization System

The Hierarchical Navigable Small World (HNSW) index is the primary vector similarity search algorithm in Qdrant. It provides fast approximate nearest neighbor (ANN) search with configurable accuracy/speed tradeoffs, supporting multiple distance metrics and quantization strategies.

Architecture Overview

The HNSW implementation in Qdrant is organized as a multi-layer graph. Each vector is assigned a maximum level and appears in every layer from that level down to layer 0. The upper layers are sparse and act like the express lanes of a skip list, enabling fast traversal, while the bottom layer contains all vectors connected in a dense small-world graph.

graph TD
    subgraph UpperLayers["Upper Layers (L3, L2, L1)"]
        L3_1["Node A"]
        L3_2["Node B"]
        L2_1["Node A"]
        L2_2["Node B"]
        L2_3["Node C"]
        L1_1["Node A"]
        L1_2["Node B"]
        L1_3["Node C"]
        L1_4["Node D"]
    end

    subgraph BottomLayer["Bottom Layer (L0)"]
        BL_A["Node A"]
        BL_B["Node B"]
        BL_C["Node C"]
        BL_D["Node D"]
        BL_E["Node E"]
    end

    L3_1 --> L2_1
    L3_2 --> L2_2
    L2_1 --> L1_1
    L2_2 --> L1_2
    L2_3 --> L1_3
    L2_2 --> L1_4
    L1_1 --> BL_A
    L1_2 --> BL_B
    L1_3 --> BL_C
    L1_4 --> BL_D
    BL_C --> BL_D
    BL_D --> BL_E
    BL_B --> BL_C

Core Components

| Component | File | Purpose |
| --- | --- | --- |
| HNSWIndex | hnsw.rs | Main entry point, coordinates build and search |
| GraphLayers | graph_layers.rs | In-memory graph representation |
| GraphLayersBuilder | graph_layers_builder.rs | Constructs the HNSW graph incrementally |
| SearchContext | search_context.rs | Handles search traversal and scoring |
| HnswGlobalConfig | config.rs | Configuration parameters |
| FilteredScorer | graph_layers.rs | Scores candidates during search |

Configuration Parameters

The HNSW index is configured via HnswGlobalConfig:

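| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| m | usize | 16 | Maximum connections per layer |
| ef_construct | usize | 100 | Construction beam width |
| full_scan_threshold | usize | 10000 | Size threshold (in kilobytes of vector data) below which brute-force search is preferred over HNSW |
| on_disk | bool | None | Whether to store index on disk |
| index | HnswIndexConfig | - | Index-specific settings |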

Source: lib/segment/src/index/hnsw_index/config.rs

HnswM Parameter

The M parameter controls the maximum number of connections in the graph:

pub enum HnswM {
    M16,  // Default, 16 connections
    M32,  // Higher accuracy, more memory
}

Source: lib/segment/src/index/hnsw_index/hnsw/build.rs

Graph Construction

The graph is built incrementally using a modified NSW algorithm. Each inserted vector is assigned a random level l where the probability of being at level l decreases exponentially.

flowchart TD
    A[Insert Vector] --> B{Generate Random Level}
    B --> C[Calculate max_level]
    C --> D[Search Upper Layers<br/>ef = ef_construct]
    D --> E[Find Entry Point]
    E --> F[For each level l from max_level to 0]
    F --> G[Search Layer l<br/>ef = ef_construct]
    G --> H[Connect to nearest neighbors<br/>M connections max]
    H --> I{Next level?}
    I -->|Yes| F
    I -->|No| J[Insert Complete]
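
The exponential level assignment can be sketched with the rule commonly used for HNSW, level = floor(-ln(U) * mL) with mL = 1 / ln(M) and U uniform in (0, 1]. This is the textbook formulation, assumed here for illustration; the exact normalization used in Qdrant's builder is not confirmed by this document.

```python
import math
import random
from collections import Counter


def random_level(m: int, rng: random.Random) -> int:
    """Draw a level whose probability decays exponentially with height."""
    level_multiplier = 1.0 / math.log(m)  # assumption: the usual mL = 1 / ln(M)
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    return int(-math.log(u) * level_multiplier)


rng = random.Random(0)
levels = Counter(random_level(16, rng) for _ in range(10_000))
print(sorted(levels.items()))
```

With M = 16, roughly 15 out of 16 vectors stay on the bottom layer only, and each higher layer holds about one sixteenth of the layer below it, which is what keeps the upper layers sparse.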

Construction Algorithm

The build function in lib/segment/src/index/hnsw_index/hnsw/build.rs handles the complete construction process:

  1. Level Assignment: Vectors are assigned to levels using an exponential distribution
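  2. Upper Layer Traversal: Starting from the entry point in the top layer, descend one layer at a time, greedily moving to the closest node at each level; that node becomes the entry point for the next layer down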
  3. Greedy Search: At each layer, perform greedy search connecting to M nearest neighbors
  4. Heuristic Refinement: Optionally use heuristics to improve graph connectivity

Source: lib/segment/src/index/hnsw_index/hnsw/build.rs

Benchmark Configuration

The default benchmark configuration for graph construction:

| Parameter | Value | Description |
| --- | --- | --- |
| NUM_VECTORS | 10000 | Number of vectors to index |
| DIM | 32 | Vector dimensionality |
| M | 16 | Maximum connections |
| EF_CONSTRUCT | 64 | Construction beam width |
| USE_HEURISTIC | true | Enable heuristic optimization |

Source: lib/segment/benches/hnsw_build_graph.rs

Search Algorithm

HNSW search traverses the graph from the top layer down to the bottom layer, using a best-first strategy in which the set of retained candidates is bounded by the search width ef.

sequenceDiagram
    participant Query
    participant SearchContext
    participant GraphLayers
    participant VectorStorage

    Query->>SearchContext: search(query_vector, ef, filter)
    SearchContext->>GraphLayers: get_entry_point()
    GraphLayers-->>SearchContext: entry_point

    loop For each level from top to bottom
        SearchContext->>SearchContext: search_layer(entry_point, ef)
        SearchContext->>VectorStorage: score_points(visited_set)
        VectorStorage-->>SearchContext: distances
        SearchContext->>SearchContext: update_candidates(distances)
    end

    SearchContext-->>Query: Top-k results

Search Parameters

| Parameter | Description |
| --- | --- |
| hnsw_ef | Search beam width (default: from config or 128) |
| exact | If true, perform brute force exact search |
| use_filters | Apply payload filters during search |

Source: lib/segment/src/index/hnsw_index/search_context.rs
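
The role of the beam width can be seen in a textbook best-first search over a single graph layer. This is a generic sketch of the HNSW layer search, not Qdrant's `SearchContext`: the adjacency-dictionary graph, squared Euclidean distance, and the function name are assumptions made for illustration.

```python
import heapq


def search_layer(graph, vectors, query, entry_point, ef):
    """Best-first search over one layer, keeping at most `ef` nearest results."""

    def dist(point_id):
        return sum((a - b) ** 2 for a, b in zip(vectors[point_id], query))

    visited = {entry_point}
    d0 = dist(entry_point)
    candidates = [(d0, entry_point)]  # min-heap: closest unexplored node first
    results = [(-d0, entry_point)]  # max-heap (negated): worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # the closest candidate is worse than the worst kept result
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            dn = dist(neighbor)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, neighbor))
                heapq.heappush(results, (-dn, neighbor))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-neg, point_id) for neg, point_id in results)


# Five points on a line, linked as a chain 0-1-2-3-4.
vectors = {i: [float(i)] for i in range(5)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nearest = search_layer(graph, vectors, query=[3.2], entry_point=0, ef=2)
print([point_id for _, point_id in nearest])
```

A larger `ef` keeps more candidates alive and so explores more of the graph, which is the source of the accuracy versus speed trade-off.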

GPU Acceleration

Qdrant supports GPU-accelerated HNSW indexing with NVIDIA and AMD GPUs:

#[cfg(feature = "gpu")]
use crate::index::hnsw_index::gpu::gpu_graph_builder::GPU_MAX_VISITED_FLAGS_FACTOR;

| Component | Purpose |
| --- | --- |
| GpuInsertContext | GPU-based vector insertion |
| gpu_graph_builder | GPU-accelerated graph construction |
| get_gpu_groups_count | Determines available GPU resources |

Source: lib/segment/src/index/hnsw_index/hnsw/build.rs

Quantization Integration

HNSW integrates with Qdrant's quantization subsystem to enable compressed vector storage while maintaining search capability:

  • Scalar Quantization: 4× compression with minimal accuracy loss
  • Product Quantization (PQ): High compression with codebook-based scoring
  • Binary Quantization: Maximum compression for high-dimensional vectors
  • TurboQuant: Aggressive compression for extreme memory reduction
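
Binary quantization is the easiest of these methods to show in code: each float32 component (32 bits) is reduced to a single sign bit, which is where a 32× ratio comes from. The sketch below is a generic illustration. The zero threshold, the bit packing into a Python integer, and the use of Hamming distance for comparison are assumptions rather than details of Qdrant's implementation.

```python
from typing import List


def binary_quantize(vector: List[float]) -> int:
    """Pack one bit per component: 1 for positive values, 0 otherwise."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the positions where two packed bit vectors differ."""
    return bin(a ^ b).count("1")


first = binary_quantize([0.5, -0.2, 0.1, -0.9])
second = binary_quantize([0.5, 0.3, -0.1, -0.9])
print(first, second, hamming_distance(first, second))
```

Comparing packed bits needs only XOR and a bit count, which is why this representation is fast as well as compact.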

Known Issues

Multiple flaky tests exist in the quantized search test suite, primarily in lib/segment/tests/integration/hnsw_quantized_search_test.rs. These tests verify that quantized search returns scores consistent with full-precision search:

| Test Name | Issue |
| --- | --- |
| hnsw_turbo_quantization_cosine_larger_bits2_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_turbo_quantization_cosine_larger_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_quantized_search_manhattan_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_quantized_search_euclid_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_turbo_quantization_dot_test | Flaky: best_2.score >= best_1.score assertion |
| hnsw_turbo_quantization_manhattan_test | Flaky: best_2.score >= best_1.score assertion |

These tests may fail intermittently when quantization introduces numerical precision differences that cause the score ordering to differ slightly from full-precision results.

Source: lib/segment/src/index/hnsw_index/graph_layers.rs

Vector Index Implementation

The VectorIndex trait provides the public interface for HNSW operations:

pub trait VectorIndex {
    fn search(
        &self,
        vectors: &QueryContext,
        top: usize,
        filter: Option<&Filter>,
        search_runtime: &SearchRuntimeConfig,
        timeout: StopCondition,
    ) -> OperationResult<Vec<Vec<PointId>>>;

    fn build_index(&mut self, args: VectorIndexBuildArgs) -> OperationResult<BuildIndexResult>;
}

Source: lib/segment/src/index/hnsw_index/hnsw/vector_index_impl.rs

Memory Management

HNSW indexes in Qdrant can be configured for different storage backends:

| Storage Type | Configuration | Use Case |
| --- | --- | --- |
| In-Memory | Default | Maximum performance |
| Memory-Mapped | on_disk: true with mmap | Large indexes that exceed RAM |
| GridStore | New default (v1.16+) | Reduced tail latencies |

The GridStore backend provides non-blocking flushes to reduce search tail latencies, a feature introduced in v1.17.1.

Source: lib/segment/src/index/hnsw_index/hnsw.rs

Payload Filtering

HNSW search supports payload-based filtering through the Filter condition system:

pub trait PayloadIndex {
    fn build_index(
        &self,
        field: PayloadKeyTypeRef,
        payload_schema: &PayloadFieldSchema,
        hw_counter: &HardwareCounterCell,
    ) -> OperationResult<BuildIndexResult>;
}

Payload indexes allow efficient filtering by indexing common payload fields (keywords, integers, geo, text, datetime) before or during HNSW search.

Source: lib/segment/src/index/payload_index_base.rs
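
How a payload index narrows the search space can be shown with a small inverted index over a keyword field. This is a conceptual sketch: `KeywordIndex` and `filtered_top` are hypothetical names, and filtering an already scored list is a simplification of filtering during graph traversal.

```python
from collections import defaultdict


class KeywordIndex:
    """Hypothetical inverted index: keyword value -> set of point IDs."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)

    def add(self, point_id: int, value: str) -> None:
        self.postings[value].add(point_id)

    def matching(self, value: str) -> set:
        return set(self.postings.get(value, ()))


def filtered_top(scored, allowed, top):
    """Keep only hits whose point ID passes the filter, best scores first."""
    hits = [(score, point_id) for score, point_id in scored if point_id in allowed]
    return sorted(hits, reverse=True)[:top]


index = KeywordIndex()
for point_id, color in [(1, "red"), (2, "blue"), (3, "red")]:
    index.add(point_id, color)

scored = [(0.9, 2), (0.8, 1), (0.7, 3)]
print(filtered_top(scored, index.matching("red"), top=1))
```

Because the index answers "which points match" with a set lookup, the filter check per candidate is cheap compared with reading and evaluating each payload.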

Performance Considerations

Build Performance

  • CPU: Multi-threaded construction with configurable parallelism
  • GPU: Optional GPU acceleration for large-scale indexing
  • Memory: GPU_MAX_VISITED_FLAGS_FACTOR controls GPU memory allocation

Search Performance

  • ef Parameter: Higher values = more accurate but slower
  • Quantization: Enables larger datasets in memory at cost of precision
  • Payload Filters: Can significantly reduce effective search space

Known Limitations

  • Adding new vector fields after collection creation is not supported (Issue #1132)
  • Deleted vectors are not marked as deleted in the index (Issue #2550), which can affect optimizer and query planner efficiency

Vector Storage

Related topics: HNSW Index Implementation, Quantization System, Storage Engine and Persistence

Vector Storage is a core subsystem within Qdrant's segment layer responsible for storing, managing, and querying vector embeddings. It provides the low-level infrastructure that enables efficient similarity search across dense, sparse, quantized, and multi-vector data types.

Overview

The Vector Storage module (lib/segment/src/vector_storage/) implements a layered architecture that separates storage concerns from query execution. This design enables Qdrant to support multiple vector representations while sharing common scoring logic.

graph TD
    A[Vector Storage Module] --> B[Dense Vector Storage]
    A --> C[Sparse Vector Storage]
    A --> D[Quantized Vector Storage]
    A --> E[Multi-Dense Vector Storage]
    
    B --> F[Chunked Vector Storage]
    B --> G[Volatile Chunked Vectors]
    
    F --> H[Common Layer]
    G --> H
    D --> H
    E --> H
    
    H --> I[Raw Scorer]
    H --> J[Query Scorer]
    I --> K[Vector Storage Base]
    J --> K

Module Exports (mod.rs)

| Module | Purpose |
|---|---|
| chunked_vectors | Fixed-size chunked vector storage |
| common | Shared utilities and constants |
| dense | Dense float vector storage implementation |
| memory_reporter | Memory usage tracking |
| multi_dense | Multi-vector (e.g., ColBERT) storage |
| prefill_deleted | Deleted vector tracking for prefetching |
| quantized | Quantized/compressed vector storage |
| query | Query construction and processing |
| query_scorer | Scorer implementations for queries |
| raw_scorer | Raw scoring without post-processing |
| read_only | Read-only vector storage variants |
| sparse | Sparse vector storage implementation |
| vector_storage_base | Core trait definitions |
| volatile_chunked_vectors | Ephemeral chunked vector storage |

Source: lib/segment/src/vector_storage/mod.rs:1-24

Quantization System

Related topics: HNSW Index Implementation, Vector Storage

Overview

The Quantization System in Qdrant provides vector compression capabilities to reduce memory footprint and accelerate similarity search operations. It encodes high-dimensional floating-point vectors into compact binary or low-precision representations, enabling efficient storage and fast approximate nearest neighbor (ANN) queries on resource-constrained deployments.

Quantization is a core performance optimization mechanism that trades a small amount of recall accuracy for significant gains in memory usage and query throughput. The system supports multiple quantization strategies and integrates with Qdrant's HNSW index structure for accelerated retrieval.

Architecture

Core Components

The quantization system is implemented as a dedicated library module located in lib/quantization/. The architecture follows a trait-based design pattern that allows different quantization methods to share a common interface.

graph TD
    A[Quantization Module] --> B[EncodedVectors Trait]
    A --> C[Scalar Quantization]
    A --> D[Product Quantization]
    A --> E[Binary Quantization]
    A --> F[TurboQuant]
    
    B --> G[EncodedVectorsPQ]
    B --> H[EncodedVectorsTurboQuant]
    
    F --> I[Quantization Algorithm]
    F --> J[Lookup Tables]
    
    K[HNSW Index] --> L[Quantized Segment Search]
    L --> B

Quantization Library Structure

lib/quantization/src/
├── lib.rs              # Module root, traits, and public API
├── encoded_vectors.rs  # Core trait for encoded vector representations
├── encoded_vectors_pq.rs    # Product Quantization implementation
└── turboquant/
    ├── mod.rs          # TurboQuant module
    └── quantization.rs  # TurboQuant encoding algorithm

Quantization Types

Scalar Quantization

Scalar quantization converts each vector component from a 32-bit float to a lower-precision integer. With 8-bit integers (32-bit → 8-bit) this yields 4× compression while maintaining reasonable search quality.

| Compression | Bits per Component | Memory Reduction |
|---|---|---|
| Full Float | 32 bits | 1× (baseline) |
| Int8 | 8 bits | 4× |
| Int4 | 4 bits | 8× |
| Int2 | 2 bits | 16× |
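
The 8-bit case can be illustrated with a minimal sketch that maps each component onto 256 levels between the vector's minimum and maximum. This is an assumption-laden illustration, not Qdrant's encoder: `ScalarParams`, `quantize_u8`, and `dequantize_u8` are hypothetical names, and a production encoder may choose the range differently.

```rust
/// Parameters of the affine mapping used to encode one vector.
struct ScalarParams {
    min: f32,
    /// Width of one quantization step.
    scale: f32,
}

/// Encode f32 components into u8 codes using a min/max affine mapping.
fn quantize_u8(vector: &[f32]) -> (Vec<u8>, ScalarParams) {
    let min = vector.iter().copied().fold(f32::INFINITY, f32::min);
    let max = vector.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a zero range (constant or empty vector).
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = vector
        .iter()
        .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
        .collect();
    (codes, ScalarParams { min, scale })
}

/// Reconstruct approximate f32 components from the codes.
fn dequantize_u8(codes: &[u8], params: &ScalarParams) -> Vec<f32> {
    codes.iter().map(|&c| params.min + c as f32 * params.scale).collect()
}
```

Each component now occupies one byte instead of four, and the reconstruction error per component is bounded by roughly half a quantization step.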

Product Quantization (PQ)

Product Quantization divides vectors into subvectors and clusters each subspace independently, encoding each with a codebook index. This approach is particularly effective for high-dimensional vectors.

| Parameter | Description |
|---|---|
| Codebook Size | Number of centroids per subspace (typical: 256) |
| Subspace Count | Number of divisions of the original vector |
| Compression Ratio | Determined by codebook size and subspace count |
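
The encoding step described above can be sketched as follows, assuming the codebooks have already been trained (training, typically k-means per subspace, is omitted). The function names are hypothetical and this is not the code in `encoded_vectors_pq.rs`.

```rust
/// Squared Euclidean distance between two equally sized slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encode `vector` with pre-trained codebooks.
/// `codebooks[s]` holds the centroids of subspace `s`.
/// Assumes non-empty codebooks, a dimension divisible by the subspace
/// count, and at most 256 centroids per subspace (one byte per code).
fn pq_encode(vector: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let sub_dim = vector.len() / codebooks.len();
    vector
        .chunks(sub_dim)
        .zip(codebooks)
        .map(|(sub, centroids)| {
            // Find the index of the nearest centroid in this subspace.
            let mut best = 0usize;
            let mut best_dist = f32::INFINITY;
            for (i, c) in centroids.iter().enumerate() {
                let d = squared_l2(sub, c);
                if d < best_dist {
                    best = i;
                    best_dist = d;
                }
            }
            best as u8
        })
        .collect()
}

/// Compression ratio of one-byte PQ codes versus raw f32 storage.
fn pq_compression_ratio(dim: usize, subspaces: usize) -> f32 {
    (dim * 4) as f32 / subspaces as f32
}
```

For example, a 128-dimensional float vector takes 512 bytes; encoded with 16 subspaces of 256 centroids each it takes 16 bytes, a 32× reduction (codebook storage is shared across all vectors and not counted here).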

Binary Quantization

Binary quantization reduces each vector component to a single bit (0 or 1), which gives 32× compression relative to 32-bit floats. It works best with high-dimensional vectors (≥1024 dimensions), where the Hamming distance between bit strings can approximate cosine similarity.
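
A minimal sketch of this idea, assuming a simple sign threshold (positive components map to 1, everything else to 0); the thresholding rule and the function names are assumptions for illustration, not Qdrant's exact encoder:

```rust
/// Pack one bit per component into 64-bit words (1 if the component is positive).
fn binarize(vector: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (vector.len() + 63) / 64];
    for (i, &x) in vector.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance: the number of bit positions in which two codes differ.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Because the distance reduces to XOR and popcount over machine words, comparing two 1024-dimensional vectors touches only 16 words instead of 1024 floats.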

TurboQuant (ICLR 2026)

TurboQuant represents an advanced quantization approach designed for aggressive compression without significant quality degradation. The system implements novel encoding techniques that maintain search quality even at extreme compression ratios.

TurboQuant is currently under active development with ongoing improvements to multi-vector scorer support and io_uring integration. Source: lib/quantization/src/turboquant/mod.rs

Core API

EncodedVectors Trait

The EncodedVectors trait defines the interface for all quantized vector implementations:

pub trait EncodedVectors: VectorStorageEnum {
    fn storage_size_bytes(&self) -> usize;
    fn len(&self) -> usize;
    fn get_quantized_vector(&self, key: PointOffsetType) -> &QuantizedVector;
    fn from_offsets_and_typed_data(
        offsets: ByteStoredVec<usize>,
        data: ByteStorageType,
    ) -> Self;
}

Vector Storage Integration

Quantized vectors integrate with the segment's vector storage layer through the following hierarchy:

Source: lib/segment/src/data_types/mod.rs

graph LR
    A[VectorStorage] --> B[VectorStorageEnum]
    B --> C[PlainVectorStorage]
    B --> D[QuantizedVectorStorage]
    D --> E[EncodedVectorsPQ]
    D --> F[EncodedVectorsTurboQuant]

Configuration

Quantization Parameters

Quantization is configured at the collection level through the QuantizationConfig structure:

| Parameter | Type | Default | Description |
|---|---|---|---|
| quantization | Enum | None | Quantization type selection |
| vector_storage | VectorParams | Per-vector | Storage configuration |
| hnsw | HnswConfigDiff | System default | Index parameters |

Search Configuration

During search operations, quantization behavior can be controlled:

| Parameter | Description |
|---|---|
| quantization | Search-time quantization settings |
| rescore | Enable/disable rescoring with full vectors |
| oversampling | Search more candidates for better recall |

Integration with HNSW

The HNSW index can leverage quantized vectors for both the graph structure and the candidates themselves. This enables:

  1. Memory-Efficient Graph Navigation: The HNSW graph stores quantized entry points
  2. Fast Candidate Scoring: Distances computed against quantized representations
  3. Optional Rescoring: Full-precision rescoring of top candidates

Source: lib/segment/tests/integration/hnsw_quantized_search_test.rs

Scoring with Quantized Vectors

The system implements specialized scorers for quantized multi-vector data, with recent improvements for io_uring support:

Source: lib/quantization/src/encoded_vectors_pq.rs

sequenceDiagram
    participant Query as Query Vector
    participant HNSW as HNSW Index
    participant Quantized as Quantized Storage
    participant Rescorer as Rescorer (Optional)
    
    Query->>HNSW: Navigate graph
    HNSW->>Quantized: Get candidates
    Quantized-->>HNSW: Quantized distances
    HNSW-->>Rescorer: Top-K candidates
    alt Rescoring enabled
        Rescorer->>Rescorer: Compute full-precision scores
        Rescorer-->>Query: Final ranked results
    else Direct return
        Quantized-->>Query: Final ranked results
    end
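
The flow in the diagram, combined with the `rescore` and `oversampling` settings, can be sketched as below. This is a simplified illustration under stated assumptions: the approximate scores are passed in rather than computed from quantized data, dot product stands in for the configured metric, and `search_with_rescore` is a hypothetical name.

```rust
/// Hypothetical point identifier type for this sketch.
type PointId = u32;

/// Dot-product similarity (higher is better).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Return the `limit` best points, optionally rescoring an oversampled pool.
/// `approx` holds (point, approximate score) pairs from the quantized search;
/// `originals[p]` is the full-precision vector of point `p`.
fn search_with_rescore(
    query: &[f32],
    mut approx: Vec<(PointId, f32)>,
    originals: &[Vec<f32>],
    limit: usize,
    oversampling: f32,
    rescore: bool,
) -> Vec<(PointId, f32)> {
    // Sort by approximate score, best first.
    approx.sort_by(|a, b| b.1.total_cmp(&a.1));
    if !rescore {
        approx.truncate(limit);
        return approx;
    }
    // Keep limit * oversampling candidates, then recompute exact scores.
    let pool = ((limit as f32) * oversampling).ceil() as usize;
    approx.truncate(pool.max(limit));
    let mut exact: Vec<(PointId, f32)> = approx
        .into_iter()
        .map(|(id, _)| (id, dot(query, &originals[id as usize])))
        .collect();
    exact.sort_by(|a, b| b.1.total_cmp(&a.1));
    exact.truncate(limit);
    exact
}
```

Oversampling matters because quantization error can swap the order of close candidates; fetching a larger pool gives the exact rescoring step a chance to restore the correct ranking.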

Performance Characteristics

Memory Savings

Quantization provides significant memory savings depending on the method:

| Method | Compression Ratio | Quality Retention |
|---|---|---|
| Scalar (Int8) | 4× | ~95-99% |
| Product Quantization | 8-64× | ~90-97% |
| Binary | 32× | ~85-95% (high-dim only) |
| TurboQuant | Variable | To be documented |
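
As a rough worked example of what these ratios mean in bytes, the helper below computes raw vector storage only; index structures, payloads, and metadata are ignored, and `storage_bytes` is a hypothetical name.

```rust
/// Bytes needed to store `count` vectors of `dim` components at `bits` bits each.
fn storage_bytes(count: u64, dim: u64, bits: u64) -> u64 {
    // Round each vector up to a whole number of bytes.
    count * ((dim * bits + 7) / 8)
}
```

For one million 768-dimensional vectors this gives about 3.07 GB at 32 bits per component, 768 MB at 8 bits, and 96 MB at 1 bit.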

Query Latency

Quantized search typically reduces latency through:

  • Reduced Memory Bandwidth: Smaller data to transfer from storage
  • SIMD Optimization: Vectorized distance calculations
  • Cache Efficiency: Better cache utilization with compressed data

Known Issues and Limitations

Flaky Tests

Several flaky tests have been reported in the HNSW quantized search test suite, particularly with TurboQuant:

  • hnsw_turbo_quantization_cosine_larger_bits2_test (Issue #8835)
  • hnsw_turbo_quantization_cosine_larger_test (Issue #8801)
  • hnsw_turbo_quantization_dot_test (Issue #8906)
  • hnsw_turbo_quantization_manhattan_test (Issue #8834)
  • hnsw_quantized_search_manhattan_test (Issue #8806)
  • hnsw_quantized_search_euclid_test (Issue #8735)

These tests occasionally fail with the assertion best_2.score >= best_1.score, indicating potential issues with score ordering in quantized search results. The tests are located at lib/segment/tests/integration/hnsw_quantized_search_test.rs:314.

Quality vs Compression Tradeoff

As noted in community discussions (Issue #8524), the current quantization options present tradeoffs:

  • Scalar quantization: Solid but tops out at 4× compression
  • Binary quantization: Falls apart below 1024 dimensions
  • Product Quantization: Requires codebook training and may underperform at high compression

TurboQuant aims to address these limitations with a novel approach designed for aggressive compression without major quality degradation.

Future Development

TurboQuant Tracking

Issue #8670 tracks the TurboQuant implementation progress. Current development focuses on:

  • Improving multi-vector scorer compatibility
  • Enhanced io_uring support for async I/O
  • Validation of quantization parameters

Design documentation is available in the internal TurboQuant Design Doc.

Best Practices

When to Use Quantization

  • Memory-Constrained Environments: When dataset exceeds available RAM
  • High-Dimensional Vectors: When vectors have >512 dimensions
  • Latency-Critical Applications: When search latency is prioritized over exact recall
  • Cold Storage Optimization: For archived or infrequently accessed data

Configuration Recommendations

  1. Start with Scalar Quantization for a balanced tradeoff
  2. Use Product Quantization for high-dimensional data requiring >8× compression
  3. Avoid Binary Quantization for vectors under 1024 dimensions
  4. Enable Rescoring when recall is critical
  5. Monitor Quality Metrics with representative queries

Sharding and Replication

Related topics: Consensus and Cluster Coordination, System Architecture

Qdrant implements a distributed architecture that combines horizontal sharding with replication to achieve scalability, fault tolerance, and high availability. This document describes the sharding and replication system as implemented in the collection layer.

Overview

Sharding distributes data across multiple physical shards, each responsible for a subset of points based on a hash ring. Replication creates redundant copies of each shard across different peers to ensure durability and read availability.

The sharding system in Qdrant operates at the collection level. Each collection can be divided into N shards, with each shard having R replicas distributed across the cluster. Source: lib/collection/src/shards/mod.rs:1-50

graph TB
    subgraph "Qdrant Cluster"
        subgraph "Collection"
            subgraph "Shard 0"
                RS0[Replica Set<br/>Peer A:Active<br/>Peer B:Recovery]
                RS1[Replica Set<br/>Peer C:Active]
            end
            subgraph "Shard 1"
                RS2[Replica Set<br/>Peer A:Active<br/>Peer C:Recovery]
            end
        end
    end
    Client([Client Request])
    Client --> RS0

Core Components

Shard Types

Qdrant defines several shard types to handle different scenarios in distributed operations: Source: lib/collection/src/shards/mod.rs:1-50

| Shard Type | Purpose |
|---|---|
| LocalShard | Primary storage for data on a peer; handles read/write operations |
| RemoteShard | Proxy to a shard located on another peer |
| ProxyShard | Wrapper that delegates operations to underlying shards |
| QueueProxyShard | Proxy that queues operations for batch processing |
| ForwardProxyShard | Proxy that forwards write operations |
| DummyShard | Placeholder for shards not present on current peer |

Replica Set

The ReplicaSet manages multiple replicas of a single shard across different peers. It coordinates read/write distribution, replica health monitoring, and failover behavior.

Key responsibilities include:

  • Tracking peer states for each replica
  • Routing operations to appropriate replicas based on consistency requirements
  • Managing replica state transitions
  • Handling peer failures and recovery

Source: lib/collection/src/shards/replica_set/mod.rs

ShardHolder

The ShardHolder is the central coordinator for all shards within a collection. It maintains the mapping between shard IDs and their replica sets, handles shard operations, and provides the interface for collection-level operations. Source: lib/collection/src/shards/shard_holder/mod.rs:1-100

classDiagram
    class ShardHolder {
        +shards: HashMap~ShardId, ReplicaSet~
        +hash_ring: HashRingRouter
        +add_shard(shard_id, replica_set)
        +remove_shard(shard_id)
        +get_shard(shard_id)
        +split_by_shard(operation)
    }
    class ReplicaSet {
        +shard_id: ShardId
        +peer_states: HashMap~PeerId, ReplicaState~
        +this_peer_id: PeerId
        +update_peer_state(peer, state)
        +is_local(): bool
    }
    class Shard {
        +shard_id: ShardId
        +peer_id: PeerId
    }
    ShardHolder --> ReplicaSet
    ReplicaSet --> Shard

Replica States

Each replica in a replica set has a state that determines its role and readiness. The state machine ensures proper initialization, recovery, and failover handling. Source: lib/collection/src/shards/replica_set/mod.rs

| State | Description |
|---|---|
| Active | Fully operational; accepts reads and writes |
| Initializing | Being created or recovered from snapshot |
| Dead | Peer is unreachable; replica unavailable |
| PartialSnapshot | Partial snapshot received; incomplete data |
| Recovery | Receiving updates to catch up |
| Listener | Receives updates but is not used for search |
| Resharding | Participating in resharding operation |

State Transitions

stateDiagram-v2
    [*] --> Initializing
    Initializing --> Recovery: Data transfer starts
    Initializing --> Active: Immediate activation
    Recovery --> Active: Sync complete
    Recovery --> PartialSnapshot: Interrupted sync
    Active --> Dead: Peer failure
    Dead --> Recovery: Peer recovers
    Active --> Listener: Demotion
    Listener --> Active: Promotion
    Active --> Resharding: Resharding begins
    Resharding --> [*]: Resharding completes
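
The edges of the diagram can be encoded as a small transition check. This sketch mirrors only the diagram in this document (with the listener state written as `Listener`); it is not the validation logic in `replica_set/mod.rs`, and `is_allowed` and `transition` are hypothetical helpers.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReplicaState {
    Initializing,
    Recovery,
    PartialSnapshot,
    Active,
    Dead,
    Listener,
    Resharding,
}

/// Whether the diagram contains an edge `from -> to`.
fn is_allowed(from: ReplicaState, to: ReplicaState) -> bool {
    use ReplicaState::*;
    matches!(
        (from, to),
        (Initializing, Recovery)
            | (Initializing, Active)
            | (Recovery, Active)
            | (Recovery, PartialSnapshot)
            | (Active, Dead)
            | (Dead, Recovery)
            | (Active, Listener)
            | (Listener, Active)
            | (Active, Resharding)
    )
}

/// Apply a transition, rejecting edges that are not in the diagram.
fn transition(from: ReplicaState, to: ReplicaState) -> Result<ReplicaState, String> {
    if is_allowed(from, to) {
        Ok(to)
    } else {
        Err(format!("transition {from:?} -> {to:?} is not in the diagram"))
    }
}
```

Under this encoding a dead replica cannot jump straight back to Active; it has to pass through Recovery first, which is the property the diagram expresses.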

Local Shard Initialization Handling

When a local shard is stuck in Initializing state on a single-node (non-distributed) deployment, the system automatically transitions it to Active state: Source: lib/collection/src/shards/shard_holder/mod.rs:40-60

// Change local shards stuck in Initializing state to Active
let not_distributed = !shared_storage_config.is_distributed;
let is_local = replica_set.this_peer_id() == local_peer_id && replica_set.is_local().await;
let is_initializing = replica_set.peer_state(local_peer_id) == Some(ReplicaState::Initializing);
if not_distributed && is_local && is_initializing {
    log::warn!(
        "Local shard {collection_id}:{} stuck in Initializing state, changing to Active",
        replica_set.shard_id,
    );
    replica_set.set_replica_state(local_peer_id, ReplicaState::Active).await?;
}

Shard Transfer Operations

Shard transfers move data between peers, supporting cluster rebalancing and node replacement. The transfer mechanism supports three transfer methods: Source: lib/collection/src/shards/transfer/mod.rs

| Transfer Method | Description |
|---|---|
| Snapshot | Transfer via snapshot file (full copy) |
| WalDelta | Transfer via Write-Ahead Log delta |
| StreamRecords | Transfer via streaming records |

Transfer Workflow

sequenceDiagram
    participant Coordinator
    participant SourcePeer
    participant TargetPeer
    participant Consensus
    
    Coordinator->>Consensus: Initiate transfer
    Consensus-->>Coordinator: Transfer registered
    Coordinator->>TargetPeer: Create shard (Initializing)
    TargetPeer-->>Coordinator: Shard created
    Coordinator->>SourcePeer: Start snapshot/stream
    loop Transfer data
        SourcePeer->>TargetPeer: Send records/snapshot
    end
    TargetPeer->>SourcePeer: Confirm sync complete
    SourcePeer->>Consensus: Notify completion
    Consensus->>TargetPeer: Set Active state
    Coordinator->>SourcePeer: Set Dead state

Resharding

Resharding changes the number of shards in a collection, either increasing (scale up) or decreasing (scale down) the shard count. This operation redistributes data across the hash ring.

Resharding Operations

The resharding process uses a dedicated state machine: Source: lib/collection/src/operations/cluster_ops.rs

| Operation | Description |
|---|---|
| CreateShard | Create a new shard during scale-up |
| MoveShard | Move shard from one peer to another |
| MoveShardKey | Move all shards with specific key |
| ReplicateShardKey | Add replicas for a shard key |
| ReplicatePoints | Replicate points between shard keys |
| FinishResharding | Complete resharding operation |
| AbortResharding | Cancel resharding operation |

Resharding State

#[derive(Copy, Clone, Debug, Deserialize, Serialize)]
pub enum ReshardingStage {
    /// Scale up, add a new shard
    Up,
    /// Scale down, remove a shard
    Down,
}

Source: lib/collection/src/operations/cluster_ops.rs:1-50

Operation Distribution

Operations are distributed across shards based on the hash ring. The SplitByShard trait defines how each operation type is split: Source: lib/collection/src/operations/mod.rs:30-60

graph LR
    Operation([Operation]) --> HashRing{Hash Ring Router}
    HashRing -->|Point ID Hash| Shard0[Shard 0]
    HashRing -->|Point ID Hash| Shard1[Shard 1]
    HashRing -->|Point ID Hash| ShardN[Shard N]
    
    Shard0 --> Result0[Result]
    Shard1 --> Result1[Result]
    ShardN --> ResultN[Result]
    
    Result0 --> Merged[Merged Result]
    Result1 --> Merged
    ResultN --> Merged

SplitByShard Implementation

impl SplitByShard for CollectionUpdateOperations {
    fn split_by_shard(self, ring: &HashRingRouter) -> OperationToShard<Self> {
        match self {
            CollectionUpdateOperations::PointOperation(operation) => operation
                .split_by_shard(ring)
                .map(CollectionUpdateOperations::PointOperation),
            CollectionUpdateOperations::VectorOperation(operation) => operation
                .split_by_shard(ring)
                .map(CollectionUpdateOperations::VectorOperation),
            CollectionUpdateOperations::PayloadOperation(operation) => operation
                .split_by_shard(ring)
                .map(CollectionUpdateOperations::PayloadOperation),
        }
    }
}

Source: lib/collection/src/operations/mod.rs:30-60
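
The routing step can be illustrated with a deliberately simplified sketch that hashes each point ID and takes the result modulo the shard count. This is an assumption for illustration only: Qdrant routes through a hash ring (`HashRingRouter`), and the free functions `shard_for` and `split_by_shard` below are hypothetical, not the trait shown above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical identifier types for this sketch.
type PointId = u64;
type ShardId = u32;

/// Simplified router: hash the point ID and take it modulo the shard count.
fn shard_for(point: PointId, shard_count: u32) -> ShardId {
    let mut hasher = DefaultHasher::new();
    point.hash(&mut hasher);
    (hasher.finish() % shard_count as u64) as ShardId
}

/// Split one batch of point IDs into per-shard batches.
fn split_by_shard(points: &[PointId], shard_count: u32) -> BTreeMap<ShardId, Vec<PointId>> {
    let mut batches: BTreeMap<ShardId, Vec<PointId>> = BTreeMap::new();
    for &p in points {
        batches.entry(shard_for(p, shard_count)).or_default().push(p);
    }
    batches
}
```

A plain modulo scheme remaps most points whenever the shard count changes, which is one reason a hash ring is preferable when shards are added or removed during resharding.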

Consistency Parameters

Qdrant exposes the following consistency-related parameters:

| Parameter | Description |
|---|---|
| write_consistency_factor | Number of replicas that must acknowledge writes (default: 1) |
| read_fan_out_factor | Number of replicas to query for reads |
| read_fan_out_delay_ms | Delay before reading from non-primary replicas |

These parameters are defined in the collection configuration and can be validated against cluster state: Source: lib/collection/src/config.rs

Consensus Operations

Distributed operations that affect cluster state are coordinated through Raft consensus: Source: lib/storage/src/content_manager/mod.rs

| Consensus Operation | Purpose |
|---|---|
| CollectionMeta | Collection-level metadata changes |
| AddPeer | Register new peer |
| RemovePeer | Remove peer from cluster |
| UpdatePeerMetadata | Update peer information |
| UpdateClusterMetadata | Update cluster-wide metadata |
| RequestSnapshot | Request state snapshot |
| ReportSnapshot | Report snapshot status |

Transfer Consensus Operations

impl ConsensusOperations {
    pub fn abort_transfer(
        collection_id: CollectionId,
        transfer: ShardTransfer,
        reason: &str,
    ) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Abort {
                    transfer: transfer.key(),
                    reason: reason.to_string(),
                },
            )
        ))
    }
    
    pub fn finish_transfer(
        collection_id: CollectionId,
        transfer: ShardTransfer,
    ) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Finish(transfer),
            )
        ))
    }
}

Source: lib/storage/src/content_manager/mod.rs:100-150

Shard Path Management

Shards are stored on disk with deterministic paths based on collection and shard identifiers:

/// Path to a shard directory
pub fn shard_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(shard_id.to_string())
}

/// Path to the flag file that marks a shard as still initializing
pub fn shard_initializing_flag_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(format!("shard_{shard_id}.initializing"))
}

Source: lib/collection/src/shards/mod.rs:40-55
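
A small self-contained version of these helpers shows the resulting layout. The `ShardId` alias and the example collection directory are assumptions made for this sketch.

```rust
use std::path::{Path, PathBuf};

/// Assumed to be an unsigned integer for this sketch.
type ShardId = u32;

/// Directory that holds the data of one shard.
fn shard_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(shard_id.to_string())
}

/// Flag file that marks a shard as still initializing.
fn shard_initializing_flag_path(collection_path: &Path, shard_id: ShardId) -> PathBuf {
    collection_path.join(format!("shard_{shard_id}.initializing"))
}
```

With a hypothetical collection directory `storage/collections/demo`, shard 3 resolves to `storage/collections/demo/3`, and its flag file sits next to it as `storage/collections/demo/shard_3.initializing`.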

Shard Snapshots

Snapshots can be created for individual shards, supporting point-in-time recovery and migration:

  • Snapshot creation captures all data and WAL state
  • Snapshots can be restored to any peer with matching configuration
  • Shard snapshots are listed and managed via REST API endpoints
  • Recovery type is recorded in snapshot manifest (RecoveryType)

Source: lib/collection/src/shards/local_shard/mod.rs

Consensus and Cluster Coordination

Related topics: Sharding and Replication, Storage Engine and Persistence

Overview

Qdrant implements a distributed cluster architecture that enables horizontal scaling of vector search operations across multiple nodes. The cluster coordination system ensures data consistency, fault tolerance, and reliable state management through a consensus-based approach.

The consensus mechanism in Qdrant is built to handle:

  • Collection management: Creating, updating, and deleting collections across the cluster
  • Shard distribution: Distributing and migrating vector data shards between nodes
  • Peer coordination: Managing node membership, peer metadata, and cluster topology
  • State synchronization: Ensuring all nodes agree on the current cluster state

High-Level Architecture

Qdrant uses a Raft-based consensus protocol for cluster coordination. All state changes that affect the cluster (collection operations, shard transfers, peer updates) are committed through the consensus layer before they are applied.

graph TD
    Client --> API[REST/gRPC API]
    API --> CollectionMgr[Collection Manager]
    CollectionMgr --> ConsensusLayer[Consensus Layer]
    ConsensusLayer --> WAL[Consensus WAL]
    ConsensusLayer --> RaftNode[Raft Node]
    RaftNode <--> Peer1[Peer Node 1]
    RaftNode <--> Peer2[Peer Node 2]
    RaftNode <--> Peer3[Peer Node 3]
    WAL --> PersistentState[Persistent State]
    PersistentState --> ClusterMeta[Cluster Metadata]

Consensus Operations

All cluster-level operations that require coordination are represented as ConsensusOperations. These operations are logged to the Consensus WAL and replicated across the cluster before being applied.

Operation Types

The ConsensusOperations enum defines all operations that pass through consensus:

| Operation | Description | Parameters |
|---|---|---|
| CollectionMeta | Collection create/update/delete operations | Box<CollectionMetaOperations> |
| AddPeer | Register a new peer in the cluster | peer_id, uri |
| RemovePeer | Remove a peer from the cluster | peer_id |
| UpdatePeerMetadata | Update metadata for a peer | peer_id, PeerMetadata |
| UpdateClusterMetadata | Update cluster-wide metadata | key, value |
| RequestSnapshot | Request a consensus state snapshot | - |
| ReportSnapshot | Report snapshot status to peers | peer_id, SnapshotStatus |

Source: lib/storage/src/content_manager/mod.rs:1-50

Collection Meta Operations

Collection operations are wrapped in CollectionMetaOperations and include:

  • CreateCollection: Initialize a new collection with specified vector parameters
  • UpdateCollection: Modify collection configuration
  • DeleteCollection: Remove a collection and its data
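  • TransferShard: Shard migration between nodes; the concrete action is carried as a ShardTransferOperations value
  • ShardTransferOperations::Abort: Cancel an ongoing shard transfer
  • ShardTransferOperations::Finish: Complete a shard transfer operation

These operations travel inside the CollectionMeta variant of the enclosing ConsensusOperations enum:

#[derive(Debug, Deserialize, Serialize, PartialEq, Eq, Hash, Clone)]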
pub enum ConsensusOperations {
    CollectionMeta(Box<CollectionMetaOperations>),
    AddPeer {
        peer_id: PeerId,
        uri: String,
    },
    RemovePeer(PeerId),
    UpdatePeerMetadata {
        peer_id: PeerId,
        metadata: PeerMetadata,
    },
    UpdateClusterMetadata {
        key: String,
        value: serde_json::Value,
    },
    RequestSnapshot,
    ReportSnapshot {
        peer_id: PeerId,
        status: SnapshotStatus,
    },
}

Source: lib/storage/src/content_manager/mod.rs:25-45

Peer Metadata

Each peer maintains metadata that describes its properties:

#[derive(Clone, Debug, Eq, PartialEq, Hash, Deserialize, Serialize, JsonSchema)]
pub struct PeerMetadata {
    /// Peer Qdrant version
    pub(crate) version: Version,
}

impl PeerMetadata {
    pub fn current() -> Self {
        Self {
            version: defaults::QDRANT_VERSION.clone(),
        }
    }

    /// Whether this metadata has a different version than our current Qdrant instance.
    pub fn is_different_version(&self) -> bool {
        self.version != *defaults::QDRANT_VERSION
    }
}

Source: lib/collection/src/operations/types.rs:1-40

Replica Set State Machine

Each shard in Qdrant is replicated across multiple nodes as part of a replica set. Replica sets implement a state machine that manages shard lifecycle and handles various scenarios like transfers, failures, and recovery.

Shard Roles

| Role | Description |
|---|---|
| Active | Fully operational replica, accepts read and write operations |
| Listener | Replica that receives updates but is not used for search |
| Dead | Replica that is unreachable or failed |

State Transitions

graph TD
    Initializing -->|Report created| Active
    Active -->|User Promote| Active
    Active -->|Transfer Finished| Listener
    Active -->|Update Failure| Dead
    Active -->|Transfer Started| Partial
    Partial -->|Transfer Finished| Listener
    Partial -->|Transfer Started| Dead
    Listener -->|Update Failure| Dead
    Listener -->|Transfer| Partial
    Dead -->|Transfer| Partial

The state machine handles:

  1. Initialization: New replicas start in Initializing state
  2. Activation: Replicas become Active after synchronization
  3. Demotion: Active shards can be demoted to Listener after transfers
  4. Failure Handling: Dead state marks unreachable replicas
  5. Recovery: Dead replicas can be recovered through transfers

Source: lib/collection/src/shards/replica_set/mod.rs:1-50

Read Consistency

Qdrant provides configurable read consistency levels to balance between consistency and availability. These settings control how many replicas must respond before returning results.

Consistency Types

| Type | Behavior |
|---|---|
| Majority | Send N/2+1 random requests, return points present on all responses |
| Quorum | Send requests to all nodes, return points present on majority |
| All | Send requests to all nodes, return only points present on all nodes |

The corresponding enum definition:

#[derive(Debug, Deserialize, Serialize, JsonSchema, Copy, Clone, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum ReadConsistencyType {
    // send N/2+1 random request and return points, which present on all of them
    Majority,
    // send requests to all nodes and return points which present on majority of nodes
    Quorum,
    // send requests to all nodes and return points which present on all nodes
    All,
}

Source: lib/collection/src/operations/consistency_params.rs:1-35
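
The three behaviors can be sketched as a resolution step over per-replica responses. This is an illustration under assumptions, not Qdrant's resolver: responses are modeled as sets of point IDs, the Majority case takes the first N/2+1 responses instead of a random choice so that the example is deterministic, and `resolve` and `present_in_at_least` are hypothetical helpers.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical point identifier type for this sketch.
type PointId = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ReadConsistencyType {
    Majority,
    Quorum,
    All,
}

/// Points that occur in at least `threshold` of the given responses.
fn present_in_at_least(responses: &[BTreeSet<PointId>], threshold: usize) -> BTreeSet<PointId> {
    let mut counts: BTreeMap<PointId, usize> = BTreeMap::new();
    for response in responses {
        for &p in response {
            *counts.entry(p).or_insert(0) += 1;
        }
    }
    counts.into_iter().filter(|&(_, n)| n >= threshold).map(|(p, _)| p).collect()
}

/// Resolve replica responses; `responses` has one entry per replica.
fn resolve(responses: &[BTreeSet<PointId>], mode: ReadConsistencyType) -> BTreeSet<PointId> {
    let n = responses.len();
    let majority = (n / 2 + 1).min(n);
    match mode {
        // Use N/2+1 responses and keep points present in every one of them.
        ReadConsistencyType::Majority => present_in_at_least(&responses[..majority], majority),
        // Use all responses and keep points present on a majority of them.
        ReadConsistencyType::Quorum => present_in_at_least(responses, majority),
        // Use all responses and keep points present on all of them.
        ReadConsistencyType::All => present_in_at_least(responses, n),
    }
}
```

With three replicas returning {1, 2, 3}, {1, 2}, and {1, 4}, the All mode yields only point 1, while Quorum also keeps point 2 because two of the three replicas report it.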

Consistency Parameter Mapping

The gRPC protocol maps integer values to consistency types:

impl TryFrom<i32> for ReadConsistencyType {
    type Error = tonic::Status;

    fn try_from(consistency: i32) -> Result<Self, Self::Error> {
        let consistency = ReadConsistencyTypeGrpc::try_from(consistency).map_err(|_| {
            tonic::Status::invalid_argument(format!(
                "invalid read consistency type value {consistency}",
            ))
        })?;

        Ok(consistency.into())
    }
}

Cluster Telemetry

The cluster telemetry system provides visibility into the distributed state of the system.

Cluster Info API

The REST API exposes cluster information through the /collections/{collection_name}/cluster endpoint:

/collections/{collection_name}/cluster:
    get:
      tags:
        - Distributed
      summary: Collection cluster info
      description: Get cluster information for a collection
      operationId: collection_cluster_info
      parameters:
        - name: collection_name
          in: path
          description: Name of the collection to retrieve the cluster info for
          required: true
          schema:
            type: string
      responses:
        200:
          description: Successful response
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/CollectionClusterInfo"

Source: openapi/openapi-collections.ytt.yaml:1-80

Cluster Metadata

Cluster telemetry includes:

  • Cluster metadata: Distributed cluster configuration and state
  • Peer information: Peer IDs and connection states
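  • Resharding status: Whether resharding operations are enabled

The excerpt below shows how these fields are populated (the start of the expression is cut off in the source):

.telemetry_level >= DetailsLevel::Level1)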
    .then(|| {
        dispatcher
            .consensus_state()
            .map(|state| state.persistent.read().cluster_metadata.clone())
            .filter(|metadata| !metadata.is_empty())
    })
    .flatten(),
resharding_enabled: Some(settings.cluster.resharding_enabled),

Source: src/common/telemetry_ops/cluster_telemetry.rs:1-30

Write-Ahead Logging for Consensus

Consensus operations are persisted to a dedicated WAL (Write-Ahead Log) to ensure durability and crash recovery. The Consensus WAL differs from the segment WAL used for point operations.

WAL Entry Serialization

Consensus operations are serialized using CBOR for efficiency:

impl TryFrom<&RaftEntry> for ConsensusOperations {
    type Error = serde_cbor::Error;

    fn try_from(entry: &RaftEntry) -> Result<Self, Self::Error> {
        serde_cbor::from_slice(entry.get_data())
    }
}

Operation Abort Mechanism

The consensus layer provides methods to safely abort ongoing operations:

impl ConsensusOperations {
    pub fn abort_transfer(
        collection_id: CollectionId,
        transfer: ShardTransfer,
        reason: &str,
    ) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Abort {
                    transfer: transfer.key(),
                    reason: reason.to_string(),
                },
            ),
        ))
    }

    pub fn finish_transfer(collection_id: CollectionId, transfer: ShardTransfer) -> Self {
        ConsensusOperations::CollectionMeta(Box::new(
            CollectionMetaOperations::TransferShard(
                collection_id,
                ShardTransferOperations::Finish(transfer),
            ),
        ))
    }
}

Source: lib/storage/src/content_manager/mod.rs:60-90

Snapshot Application

When applying snapshots from other peers, the system must notify pending consensus operations to ensure consistency:

Bug fix in v1.18.1: Notify pending consensus ops on snapshot apply (https://github.com/qdrant/qdrant/pull/8990)

This fix ensures that when a snapshot is applied, any pending consensus operations are properly synchronized to maintain cluster state integrity.

Optimization Progress Tracking

Cluster operations include tracking optimization progress for collections:

/collections/{collection_name}/optimizations:
    get:
      tags:
        - Collections
      summary: Get optimization progress
      description: Get progress of ongoing and completed optimizations for a collection
      operationId: get_optimizations
      parameters:
        - name: collection_name
          in: path
          description: Name of the collection
          required: true
          schema:
            type: string
        - name: with
          in: query
          description: |-
            Comma-separated list of optional fields to include in the response.
            Possible values: queued, completed, idle_segments.
          required: false
          schema:
            type: string
        - name: completed_limit
          in: query
          description: Maximum number of completed optimizations to return.
          required: false
          schema:
            type: integer
            minimum: 0
            default: 16
      responses:
        200:
          description: Successful response
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/OptimizationsResponse"

Source: openapi/openapi-collections.ytt.yaml:80-150
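
As an illustration of the query parameters above, the following sketch builds the request path for the optimizations endpoint. The helper name optimizations_path is hypothetical and not part of any Qdrant client, and the sketch assumes the collection name needs no percent-encoding.

```rust
/// Builds the request path for the optimizations endpoint.
/// `with` lists the optional response fields (queued, completed, idle_segments).
/// Hypothetical helper: the collection name is assumed to be URL-safe.
fn optimizations_path(collection: &str, with: &[&str], completed_limit: Option<u32>) -> String {
    let mut path = format!("/collections/{collection}/optimizations");
    let mut params: Vec<String> = Vec::new();

    // `with` is a single comma-separated query parameter.
    if !with.is_empty() {
        params.push(format!("with={}", with.join(",")));
    }
    // When omitted, the server-side default (16) applies.
    if let Some(limit) = completed_limit {
        params.push(format!("completed_limit={limit}"));
    }

    if !params.is_empty() {
        path.push('?');
        path.push_str(&params.join("&"));
    }
    path
}
```

Calling optimizations_path("docs", &["queued", "completed"], Some(5)) yields a path that can be appended to the server's base URL.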

Key configuration parameters for cluster coordination:

| Parameter | Description | Default |
| --- | --- | --- |
| cluster.resharding_enabled | Enable shard resharding operations | Platform-specific |
| Consensus WAL size | Maximum entries in consensus WAL | Platform-specific |
| Snapshot interval | Frequency of consensus snapshots | Platform-specific |
| Heartbeat timeout | Peer heartbeat detection interval | Platform-specific |

Source: https://github.com/qdrant/qdrant / Human Manual

REST and gRPC API

Related topics: System Architecture, Introduction to Qdrant

Overview

Qdrant provides a dual-layer API architecture that exposes vector search capabilities through both REST (HTTP/JSON) and gRPC (Protocol Buffers) interfaces. This dual approach offers flexibility for different client environments and performance requirements.

The REST API provides broad accessibility with JSON serialization, making it ideal for web clients, scripting, and debugging. The gRPC API offers lower latency and more efficient bandwidth usage for high-throughput production workloads.

Source: lib/api/src/lib.rs:1-7

pub mod conversions;
pub mod grpc;
pub mod rest;

pub const HTTP_HEADER_API_KEY: &str = "api-key";

API Architecture

Dual-Interface Design

The Qdrant API layer is organized around a unified internal representation, with conversion layers that transform between external formats and internal types.

graph TD
    subgraph "Client Layer"
        REST_CLIENT[REST Client<br/>JSON/HTTP]
        GRPC_CLIENT[gRPC Client<br/>Protocol Buffers]
    end
    
    subgraph "API Layer"
        REST_API[REST API Handler]
        GRPC_API[gRPC Service Handler]
        CONVERSIONS[Conversion Layer]
    end
    
    subgraph "Internal Layer"
        INFERENCE[Inference Service]
        OPERATIONS[Collection Operations]
        SEGMENT[Segment Management]
    end
    
    REST_CLIENT -->|HTTP/JSON| REST_API
    GRPC_CLIENT -->|gRPC/Protobuf| GRPC_API
    REST_API --> CONVERSIONS
    GRPC_API --> CONVERSIONS
    CONVERSIONS --> INFERENCE
    CONVERSIONS --> OPERATIONS
    CONVERSIONS --> SEGMENT

Module Structure

The API implementation resides in lib/api/src/ with the following organization:

| Module | Purpose |
| --- | --- |
| rest/ | REST API models, handlers, and JSON serialization |
| grpc/ | gRPC service definitions and Protocol Buffer types |
| conversions/ | Bidirectional conversion between REST/gRPC and internal types |

Source: lib/api/src/lib.rs:1-7

REST API

OpenAPI Specification

The REST API is defined using OpenAPI 3.0 specifications, generated from YAML templates in the openapi/ directory. These specifications provide comprehensive documentation and can be used to generate client SDKs.

Source: openapi/openapi.lib.yml:1-50

Core Endpoints

The REST API covers several functional areas:

#### Points Operations

Points are the fundamental data units in Qdrant, containing vectors and optional payloads.

| Endpoint | Method | Description |
| --- | --- | --- |
| /collections/{name}/points/query | POST | Universal query endpoint |
| /collections/{name}/points/query/batch | POST | Batch query endpoint |
| /collections/{name}/points/batch | POST | Batch update operations |
| /collections/{name}/points/payload/clear | POST | Clear payload from points |

Source: openapi/openapi-points.ytt.yaml:1-100

#### Query Operations

The universal query endpoint provides access to all search capabilities including search, recommend, discover, and hybrid queries.

/collections/{collection_name}/points/query:
  post:
    tags:
      - Search
    summary: Query points
    description: Universally query points. This endpoint covers all capabilities 
                 of search, recommend, discover, filters. But also enables 
                 hybrid and multi-stage queries.

Source: openapi/openapi-main.ytt.yaml:1-50

#### Facet Operations

Faceted search allows counting points by unique payload values:

| Parameter | Location | Description |
| --- | --- | --- |
| collection_name | path | Name of the collection to facet in |
| consistency | query | Read consistency guarantees |
| timeout | query | Request timeout override (seconds) |

Source: openapi/openapi-main.ytt.yaml:50-100

Document and Image Support

The REST API supports structured inference objects through the Document and Image types:

impl From<rest::Document> for grpc::Document {
    fn from(document: rest::Document) -> Self {
        let rest::Document {
            text,
            model,
            options,
        } = document;
        Self {
            text,
            model,
            options: options
                .map(DocumentOptions::into_options)
                .map(dict_to_proto)
                .unwrap_or_default(),
        }
    }
}

Source: lib/api/src/conversions/inference.rs:1-35

gRPC API

Service Architecture

The gRPC API is built on Protocol Buffers with service definitions that mirror REST functionality. The telemetry_wrapper.rs module provides thin wrappers around gRPC service traits that extract collection names and attach telemetry extensions.

use api::grpc::qdrant::points_server::Points;
use api::grpc::qdrant::shard_snapshots_server::ShardSnapshots;
use api::grpc::qdrant::snapshots_server::Snapshots;

Source: src/tonic/api/telemetry_wrapper.rs:1-20

Server Implementations

The gRPC API exposes the following services:

| Service | Purpose |
| --- | --- |
| Points | Point operations, search, query |
| Snapshots | Collection snapshot management |
| ShardSnapshots | Shard-level snapshot operations |
| Raft | Consensus communication |

Source: src/tonic/api/telemetry_wrapper.rs:1-50

Telemetry Integration

The telemetry wrapper pattern allows per-collection metrics collection without individual handlers needing to know about telemetry:

graph LR
    A[gRPC Request] --> B[Telemetry Wrapper]
    B --> C[Extract collection_name]
    C --> D[Attach as Extension]
    D --> E[Service Handler]
    E --> F[Tower Layer Reads Extension]
    F --> G[Record Metrics]

Source: src/tonic/api/telemetry_wrapper.rs:1-30

Conversion Layer

Conversion Architecture

The conversion layer handles bidirectional transformations between REST JSON types, gRPC Protocol Buffer types, and internal Rust types. This separation allows the internal representation to remain stable while external APIs evolve.

graph TD
    subgraph "External Formats"
        REST_JSON[REST JSON]
        GRPC_PROTO[gRPC Protobuf]
    end
    
    subgraph "Conversion Layer"
        REST_TO_INTERNAL[rest → Internal]
        INTERNAL_TO_REST[Internal → rest]
        GRPC_TO_INTERNAL[gRPC → Internal]
        INTERNAL_TO_GRPC[Internal → gRPC]
    end
    
    subgraph "Internal Types"
        INTERNAL[Internal Operations]
    end
    
    REST_JSON --> REST_TO_INTERNAL
    GRPC_PROTO --> GRPC_TO_INTERNAL
    REST_TO_INTERNAL --> INTERNAL
    GRPC_TO_INTERNAL --> INTERNAL
    INTERNAL --> INTERNAL_TO_REST
    INTERNAL --> INTERNAL_TO_GRPC
    INTERNAL_TO_REST --> REST_JSON
    INTERNAL_TO_GRPC --> GRPC_PROTO

Query Request Conversions

Query requests undergo multiple conversion stages:

use collection::operations::universal_query::collection_query::{
    CollectionPrefetch, CollectionQueryGroupsRequest, CollectionQueryRequest, 
    FeedbackInternal, FeedbackStrategy, Mmr, NearestWithMmr, Query, 
    VectorInputInternal, VectorQuery,
};

Source: src/common/inference/query_requests_grpc.rs:1-25

Batch Processing

The batch processing system accumulates inference objects across multiple requests:

pub struct BatchAccumGrpc {
    pub(crate) objects: HashSet<InferenceData>,
}

impl BatchAccumGrpc {
    pub fn new() -> Self {
        Self {
            objects: HashSet::new(),
        }
    }

    pub fn add(&mut self, data: InferenceData) {
        self.objects.insert(data);
    }

    pub fn extend(&mut self, other: BatchAccumGrpc) {
        self.objects.extend(other.objects);
    }
}

Source: src/common/inference/batch_processing_grpc.rs:1-55
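
Because the accumulator stores objects in a HashSet, identical inference inputs collapse into a single entry, so each distinct input only needs to be sent for inference once. The sketch below shows that behaviour with a simplified stand-in: the InferenceData fields and the doc helper are assumptions for illustration, not the real type.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for InferenceData; the real type carries more fields.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct InferenceData {
    model: String,
    text: String,
}

// Simplified accumulator mirroring the add/extend methods shown above.
#[derive(Default)]
struct BatchAccum {
    objects: HashSet<InferenceData>,
}

impl BatchAccum {
    fn add(&mut self, data: InferenceData) {
        // Inserting an equal value twice keeps a single entry.
        self.objects.insert(data);
    }

    fn extend(&mut self, other: BatchAccum) {
        self.objects.extend(other.objects);
    }

    fn len(&self) -> usize {
        self.objects.len()
    }
}

// Illustrative constructor with a made-up model name.
fn doc(text: &str) -> InferenceData {
    InferenceData {
        model: "example-model".to_string(),
        text: text.to_string(),
    }
}
```

Merging two accumulators that both contain the same document leaves one copy of it, which is the point of accumulating before running inference.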

Operation Conversions

The conversions.rs module handles complex operation conversions, such as turning a gRPC DiscoverPoints message into its internal form:

let api::grpc::qdrant::DiscoverPoints {
    collection_name,
    target,
    context,
    filter,
    limit,
    offset,
    with_payload,
    params,
    using,
    with_vectors,
    lookup_from,
    read_consistency,
    timeout,
    shard_key_selector,
} = value;

let target = target.map(RecommendExample::try_from).transpose()?;

let context = context
    .into_iter()
    .map(|pair| {
        match (
            pair.positive.map(|p| p.try_into()),
            pair.negative.map(|n| n.try_into()),
        ) {
            (Some(Ok(positive)), Some(Ok(negative))) => {
                Ok(ContextExamplePair { positive, negative })
            }
            (Some(Err(e)), _) | (_, Some(Err(e))) => Err(e),
            (None, _) | (_, None) => Err(Status::invalid_argument(
                "Both positive and negative are required in a context pair",
            )),
        }
    })
    .try_collect()?;

Source: lib/collection/src/operations/conversions.rs:1-80
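
The match on a pair of optional, fallible conversions is the core of the excerpt above. A minimal self-contained sketch of the same pattern follows, using plain strings and integers in place of the gRPC and internal types. RawPair, ContextPair, and parse_id are hypothetical names, and the standard collect into a Result stands in for try_collect.

```rust
#[derive(Debug, PartialEq)]
struct ContextPair {
    positive: u64,
    negative: u64,
}

// Hypothetical wire format: each side is optional and must parse as an ID.
struct RawPair {
    positive: Option<String>,
    negative: Option<String>,
}

fn parse_id(s: String) -> Result<u64, String> {
    s.parse::<u64>().map_err(|e| format!("invalid id {s:?}: {e}"))
}

fn convert_pairs(pairs: Vec<RawPair>) -> Result<Vec<ContextPair>, String> {
    pairs
        .into_iter()
        .map(|pair| {
            match (pair.positive.map(parse_id), pair.negative.map(parse_id)) {
                // Both sides present and valid.
                (Some(Ok(positive)), Some(Ok(negative))) => Ok(ContextPair { positive, negative }),
                // A conversion error on either side takes precedence.
                (Some(Err(e)), _) | (_, Some(Err(e))) => Err(e),
                // Otherwise at least one side is missing.
                (None, _) | (_, None) => {
                    Err("both positive and negative are required".to_string())
                }
            }
        })
        // Stops at the first error, like try_collect in the excerpt.
        .collect()
}
```

Note the arm order: a pair with one invalid side and one missing side reports the conversion error rather than the missing field.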

Request Flow

REST Request Path

  1. HTTP request arrives at Actix web handler
  2. Request body deserialized from JSON
  3. REST model converted to internal operation type
  4. Operation executed against collection/shard
  5. Internal result converted back to REST response
  6. Response serialized to JSON and returned

gRPC Request Path

  1. Protobuf message received by Tonic service
  2. Message validated using protobuf validation attributes
  3. gRPC model converted to internal operation type
  4. Operation executed with telemetry extension attached
  5. Internal result converted back to gRPC response
  6. Protobuf message serialized and returned

Configuration and Constraints

The API respects system constraints defined in StrictModeConfig:

| Parameter | Purpose |
| --- | --- |
| max_query_limit | Maximum number of results |
| max_timeout | Maximum request timeout |
| search_max_hnsw_ef | Maximum HNSW ef parameter |
| search_allow_exact | Allow exact (non-approximate) searches |
| search_max_oversampling | Maximum oversampling factor |
| upsert_max_batchsize | Maximum upsert batch size |
| search_max_batchsize | Maximum search batch size |

Source: lib/storage/src/content_manager/conversions.rs:1-60

Authentication

The REST API uses an API key header for authentication:

pub const HTTP_HEADER_API_KEY: &str = "api-key";

Clients must include this header in all requests to authenticated endpoints.

Source: lib/api/src/lib.rs:7

Build Process

Parts of the API layer are generated at build time by build.rs. The script also determines the current git commit ID so that it can be passed to the compiler:

// Fetch git commit ID and pass it to the compiler
let git_commit_id = option_env!("GIT_COMMIT_ID").map(String::from).or_else(|| {
    match Command::new("git").args(["rev-parse", "HEAD"]).output() {
        Ok(output) if output.status.success() => {
            Some(str::from_utf8(&output.stdout).unwrap().trim().to_string())
        }
        _ => {
            println!("cargo:warning=current git commit hash could not be determined");
            None
        }
    }
});

Source: lib/api/build.rs:1-50

Key Files Reference

| Path | Purpose |
| --- | --- |
| lib/api/src/lib.rs | API module entry point |
| lib/api/src/rest/mod.rs | REST API implementation |
| lib/api/src/grpc/mod.rs | gRPC service definitions |
| lib/api/src/conversions/*.rs | Type conversion implementations |
| src/actix/api/mod.rs | Actix HTTP handlers |
| src/tonic/api/mod.rs | Tonic gRPC handlers |
| openapi/*.ytt.yaml | OpenAPI specifications |

Storage Engine and Persistence

Related topics: Vector Storage

Qdrant implements a multi-layered storage architecture designed for high-throughput vector search operations with strong persistence guarantees. The storage system combines a custom key-value store called Gridstore, memory-mapped file handling, and Write-Ahead Logging to ensure data durability while maintaining low-latency access patterns.

Architecture Overview

The storage engine consists of three primary layers:

| Layer | Purpose | Location |
| --- | --- | --- |
| Gridstore | Primary key-value storage for vectors, payloads, and indexes | lib/gridstore/ |
| Write-Ahead Log (WAL) | Transaction logging and crash recovery | lib/wal/ |
| Memory-Mapped Files | Efficient file I/O with OS page cache integration | lib/common/common/src/mmap/ |

graph TD
    subgraph "Client Operations"
        A[Upsert] --> B[Write-Ahead Log]
        A --> C[Collection Config]
    end
    
    subgraph "Persistence Layer"
        B --> D[Gridstore]
        D --> E[Memory-Mapped Pages]
        E --> F[Disk Files]
    end
    
    subgraph "Recovery"
        G[Startup] --> H[Replay WAL]
        H --> D
    end
    
    subgraph "Query Path"
        I[Search Query] --> D
        D --> J[Mmap Populate]
        J --> K[Result Scoring]
    end

Gridstore: Primary Storage Engine

Gridstore is Qdrant's custom-built key-value storage engine optimized for vector data and payloads. It replaces the previous RocksDB-based storage starting from v1.16.1, offering improved performance and simplified maintenance.

Storage Hierarchy

Gridstore organizes data using a three-level hierarchy:

| Level | Default Size | Description |
| --- | --- | --- |
| Block | 128 bytes | Smallest allocatable unit |
| Region | 8192 blocks (1 MB) | Unit of free space tracking |
| Page | 32 MB | OS I/O unit, file on disk |

Source: lib/gridstore/src/config.rs:15-33

graph LR
    subgraph "Page (32MB)"
        subgraph "Region 0 (1MB)"
            B1[Block 0]
            B2[Block 1]
            B3[...]
            B4[Block 8191]
        end
        subgraph "Region 1 (1MB)"
            B5[Block 0]
            B6[...]
        end
        subgraph "..."
            B7[...]
        end
    end
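
The default sizes in the hierarchy can be checked with a little arithmetic. The sketch below derives the region size, blocks per page, and regions per page from the defaults listed above. It assumes that "32 MB" means 32 MiB, and the function names are illustrative rather than taken from Gridstore.

```rust
// Defaults from the hierarchy table; "32 MB" is assumed to mean 32 MiB.
const BLOCK_SIZE_BYTES: usize = 128;
const REGION_SIZE_BLOCKS: usize = 8192;
const PAGE_SIZE_BYTES: usize = 32 * 1024 * 1024;

/// Bytes covered by one region.
const fn region_size_bytes() -> usize {
    BLOCK_SIZE_BYTES * REGION_SIZE_BLOCKS
}

/// Number of blocks in one page.
const fn blocks_per_page() -> usize {
    PAGE_SIZE_BYTES / BLOCK_SIZE_BYTES
}

/// Number of regions in one page.
const fn regions_per_page() -> usize {
    PAGE_SIZE_BYTES / region_size_bytes()
}

/// Blocks needed to hold a value of `len` bytes, rounded up,
/// since a block is the smallest allocatable unit.
const fn blocks_for_value(len: usize) -> usize {
    (len + BLOCK_SIZE_BYTES - 1) / BLOCK_SIZE_BYTES
}
```

Under these assumptions a region is exactly 1 MiB, a page holds 32 regions, and a 300-byte value occupies three blocks, leaving 84 bytes of the last block unused.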

Configuration

Gridstore accepts configuration via StorageOptions:

| Parameter | Default | Description |
| --- | --- | --- |
| page_size_bytes | 32 MB | Size of each page file |
| block_size_bytes | 128 bytes | Size of individual blocks |
| region_size_blocks | 8192 | Number of blocks per region |
| compression | LZ4 | Compression algorithm |

Source: lib/gridstore/src/config.rs:10-35

// Configuration validation from config.rs
fn try_from(options: StorageOptions) -> Result<Self, Self::Error> {
    // ... 
    
    if block_size_bytes == 0 {
        return Err("Block size must be greater than 0");
    }
    
    if region_size_blocks == 0 {
        return Err("Region size must be greater than 0");
    }
    
    if page_size_bytes == 0 {
        return Err("Page size must be greater than 0");
    }
    
    let region_size_bytes = block_size_bytes * region_size_blocks;
    
    if page_size_bytes < region_size_bytes {
        return Err("Page size must be greater than region size");
    }

    // ... (construction of the validated config is elided in this excerpt)
}

Free Space Management: Bitmask

Gridstore tracks free blocks using bitmasks stored in memory and persisted to disk.

Source: lib/gridstore/src/bitmask/mod.rs:25-30

The bitmask system works as follows:

| Component | Description |
| --- | --- |
| Page Bitmask | One bit per block (128 bytes), stored as StoredBitSlice |
| RegionGaps | Tracks which blocks are free within each region |
| BitmaskGaps | Manages all region gaps for a page |

// Bitmask length calculation
let bits = config.page_size_bytes / config.block_size_bytes;  // blocks per page
let length = bits / u8::BITS as usize;  // bytes needed for bitmask
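
To make the bitmask idea concrete, here is a minimal sketch that computes the bitmask length and scans it for a run of free blocks. It is a plain linear scan under stated assumptions (a set bit means "used", bits are numbered from the least significant bit of each byte); the function names are hypothetical, and Gridstore's gap tracking exists precisely to avoid scanning like this.

```rust
/// Bytes needed to store one bit per block of a page.
fn bitmask_len_bytes(page_size_bytes: usize, block_size_bytes: usize) -> usize {
    (page_size_bytes / block_size_bytes) / u8::BITS as usize
}

/// Returns the index of the first block that starts a run of `needed` free blocks.
/// Assumptions: a set bit marks a used block, bit 0 is the least significant bit.
fn find_free_run(bitmask: &[u8], needed: usize) -> Option<usize> {
    if needed == 0 {
        return None;
    }
    let total_bits = bitmask.len() * 8;
    let mut run_start = 0;
    let mut run_len = 0;
    for bit in 0..total_bits {
        let used = (bitmask[bit / 8] & (1u8 << (bit % 8))) != 0;
        if used {
            // A used block breaks the current run.
            run_len = 0;
            run_start = bit + 1;
        } else {
            run_len += 1;
            if run_len == needed {
                return Some(run_start);
            }
        }
    }
    None
}
```

With the default sizes (and 32 MB read as 32 MiB), one page needs a 32768-byte bitmask. Runs may cross byte boundaries, as the scan works on bit indices rather than bytes.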

Write Operations

Gridstore implements multi-page writes when values exceed single-page capacity:

Source: lib/gridstore/src/pages.rs:85-105

pub fn write_to_pages(
    &mut self,
    pointer: ValuePointer,
    value: &[u8],
    config: &StorageConfig,
) -> Result<()> {
    let writes = Self::get_page_value_ranges(pointer, config)
        .map(|(buf_offset, page, range)| {
            let data = &value[buf_offset..buf_offset + range.length as usize];
            (page as FileIndex, range.byte_offset, data)
        });

    // Execute writes to multiple pages if needed
    S::write_multi(self.pages.as_mut_slice(), writes)?;
    Ok(())
}

Values spanning multiple pages are handled by the ValuePointer struct, which tracks:

  • page_id: Starting page
  • block_offset: Starting block within page
  • length: Total byte length of value
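
How a pointer maps to byte ranges on consecutive pages can be sketched as follows. This is an illustration of the idea, not the body of get_page_value_ranges: the tuple layout, the integer field types, and the assumption that continuation pages start at byte offset 0 are all choices made for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct ValuePointer {
    page_id: u32,
    block_offset: u32,
    length: u32,
}

/// Splits a value into per-page chunks.
/// Each tuple is (offset into the value buffer, page id, byte offset in page, chunk length).
fn page_ranges(
    ptr: ValuePointer,
    block_size: usize,
    page_size: usize,
) -> Vec<(usize, u32, usize, usize)> {
    let mut byte_offset = ptr.block_offset as usize * block_size;
    assert!(byte_offset < page_size, "block offset must lie inside the page");

    let mut ranges = Vec::new();
    let mut page = ptr.page_id;
    let mut buf_offset = 0usize;
    let mut remaining = ptr.length as usize;

    while remaining > 0 {
        // Take as much as fits in what is left of the current page.
        let chunk = remaining.min(page_size - byte_offset);
        ranges.push((buf_offset, page, byte_offset, chunk));
        buf_offset += chunk;
        remaining -= chunk;
        // Assumption: the value continues at the start of the next page.
        page += 1;
        byte_offset = 0;
    }
    ranges
}
```

With toy sizes of 4-byte blocks and 16-byte pages, a 10-byte value starting at block 3 of page 2 is split into 4 bytes at the end of page 2 and 6 bytes at the start of page 3.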

Non-Blocking Flushes

Gridstore implements a flusher mechanism that defers disk synchronization:

Source: lib/gridstore/src/pages.rs:107-117

pub fn flusher(&self) -> Flusher {
    let mut flushers = Vec::with_capacity(self.pages.len());
    for page in &self.pages {
        flushers.push(page.flusher());
    }
    Box::new(move || {
        for flusher in flushers {
            flusher()?;
        }
        Ok(())
    })
}

This design was improved in v1.17.1 to make flushes non-blocking, reducing tail latencies during search operations.
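
The value of returning a closure is that the caller decides when and where the flush runs. The following sketch reproduces the aggregation pattern with standard-library types only: Page here just counts flushes through a shared counter instead of syncing a file, and the Flusher alias is an assumed shape, not Gridstore's definition.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Assumed shape of a flusher: a one-shot closure that can move to another thread.
type Flusher = Box<dyn FnOnce() -> io::Result<()> + Send>;

// Stand-in page: counts flushes instead of syncing a memory-mapped file.
struct Page {
    flushed: Arc<AtomicUsize>,
}

impl Page {
    fn flusher(&self) -> Flusher {
        let flushed = Arc::clone(&self.flushed);
        Box::new(move || {
            flushed.fetch_add(1, Ordering::SeqCst);
            Ok(())
        })
    }
}

struct Pages {
    pages: Vec<Page>,
}

impl Pages {
    /// Captures one flusher per page now; nothing is flushed until the
    /// returned closure is called.
    fn flusher(&self) -> Flusher {
        let flushers: Vec<Flusher> = self.pages.iter().map(Page::flusher).collect();
        Box::new(move || {
            for flusher in flushers {
                flusher()?;
            }
            Ok(())
        })
    }
}
```

Creating the flusher performs no I/O, so it is cheap to do while holding a reference to the pages; the closure can then be handed to a background thread, for example with std::thread::spawn.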

Live Reload Support

Gridstore readers support live reload to access newly written data without reopening:

Source: lib/gridstore/src/gridstore/tests.rs:180-220

// Writer creates new pages
for i in first_batch..(first_batch + second_batch) {
    storage.put_value(i, &payload, hw_counter_ref).unwrap();
}
storage.flusher()().unwrap();

// Reader detects new pages automatically
reader.live_reload().unwrap();
assert_eq!(reader.max_point_offset(), first_batch + second_batch);

Memory-Mapped File Handling

Qdrant uses memory-mapped files extensively for efficient I/O. The mmap/advice.rs module provides platform-specific optimizations:

Page Population

On Linux, Qdrant uses madvise(MADV_POPULATE_READ) to proactively populate the page cache before reads:

Source: lib/common/common/src/mmap/advice.rs:45-55

pub fn populate(&self) -> OperationResult<()> {
    self.storage.populate()?;
    Ok(())
}

Cache Management

The system provides explicit cache control:

| Method | System Call | Purpose |
| --- | --- | --- |
| populate() | madvise(MADV_POPULATE_READ) | Pre-populate RAM cache |
| clear_cache() | madvise(MADV_PAGEOUT) | Evict pages from RAM |

Source: lib/common/common/src/mmap/advice.rs:55-65

Fallback for Non-Linux Systems

On older Linux kernels or non-Unix platforms, a fallback strategy reads every 512th byte:

Source: lib/common/common/src/mmap/advice.rs:60-75

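use std::hint::black_box;
use std::num::Wrapping;

// Reads one byte in every 512 so that each page of the mapping is faulted in;
// black_box keeps the compiler from optimizing the reads away.
fn populate_simple(slice: &[u8]) {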
    black_box(
        slice
            .iter()
            .copied()
            .map(Wrapping)
            .step_by(512)
            .sum::<Wrapping<u8>>(),
    );
}

Readahead Optimization

For sequential reads within MADV_RANDOM regions, explicit readahead is triggered:

Source: lib/common/common/src/mmap/advice.rs:80-95

#[cfg(unix)]
pub fn will_need_multiple_pages(region: &[u8]) {
    // madvise(MADV_WILLNEED) on region spanning multiple 4KiB pages
    // Avoids multiple page faults during sequential access
}
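
Whether a region spans more than one page is a matter of address arithmetic with a page mask. The sketch below shows that check in isolation. The function name is hypothetical, the page size is passed in explicitly rather than queried from the OS, and no madvise call is made.

```rust
/// Returns true when the byte range [addr, addr + len) touches more than one page.
/// `page_size` must be a power of two (4096 on most Linux systems).
fn spans_multiple_pages(addr: usize, len: usize, page_size: usize) -> bool {
    debug_assert!(page_size.is_power_of_two());
    if len == 0 {
        return false;
    }
    let page_mask = page_size - 1;
    // Clearing the low bits rounds an address down to its page start.
    let first_page = addr & !page_mask;
    let last_page = (addr + len - 1) & !page_mask;
    first_page != last_page
}
```

For a real slice, addr would be region.as_ptr() as usize and len would be region.len(). A range that fits in one page needs no extra hint, since a single page fault brings it in.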

Collection Configuration Persistence

Collection configuration is persisted as JSON using atomic file operations:

Source: lib/collection/src/config.rs:30-55

Save and Load Operations

pub fn save(&self, path: &Path) -> CollectionResult<()> {
    let config_path = path.join(COLLECTION_CONFIG_FILE);
    let af = AtomicFile::new(&config_path, AllowOverwrite);
    let state_bytes = serde_json::to_vec(self).unwrap();
    af.write(|f| f.write_all(&state_bytes)).map_err(|err| {
        CollectionError::service_error(format!("Can't write {config_path:?}, error: {err}"))
    })?;
    Ok(())
}

pub fn load(path: &Path) -> CollectionResult<Self> {
    let config_path = path.join(COLLECTION_CONFIG_FILE);
    let mut contents = String::new();
    let mut file = File::open(config_path)?;
    file.read_to_string(&mut contents)?;
    Ok(serde_json::from_str(&contents)?)
}

Atomic Write Pattern

Configuration updates use AtomicFile to ensure:

  1. New data is written to a temporary file
  2. Temporary file is atomically renamed over the old file
  3. No partial writes are visible to readers
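
The three steps above can be sketched with the standard library alone. This is not the AtomicFile implementation used by Qdrant: atomic_write and the ".tmp" naming are assumptions for illustration, and the rename is only atomic when the temporary file and the target live on the same filesystem.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `path` so that readers see either the old or the new
/// content, never a partially written file.
fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Step 1: write the new data to a temporary file next to the target.
    let tmp_path = path.with_extension("tmp");
    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(bytes)?;
        // Make sure the data is on disk before it becomes visible.
        tmp.sync_all()?;
    }
    // Step 2: rename over the old file, replacing it in one step.
    fs::rename(&tmp_path, path)
}
```

If the process crashes before the rename, the old configuration file is untouched and only a stray temporary file remains.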

Write-Ahead Logging

Write-Ahead Logging (WAL) ensures durability by logging operations before applying them to the main storage.

Key Properties

| Property | Implementation |
| --- | --- |
| Durability | Operations logged to WAL before acknowledgment |
| Crash Recovery | WAL replayed on startup to recover uncommitted operations |
| Consistency | Prevents data loss during power failures |

Source: README.md (WAL mentioned in features)
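
The log-then-apply discipline and replay on startup can be shown with a toy in-memory model. Everything in this sketch is a simplification: Op, Wal, and Store are hypothetical types, and the Vec stands in for a durable log file that would survive a crash.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Op {
    Upsert { id: u64, value: String },
    Delete { id: u64 },
}

// Stand-in for the durable log; a real WAL appends to files on disk.
#[derive(Default)]
struct Wal {
    entries: Vec<Op>,
}

// Stand-in for the main storage, assumed to be lost on a crash.
#[derive(Default)]
struct Store {
    points: HashMap<u64, String>,
}

impl Store {
    fn apply(&mut self, op: &Op) {
        match op {
            Op::Upsert { id, value } => {
                self.points.insert(*id, value.clone());
            }
            Op::Delete { id } => {
                self.points.remove(id);
            }
        }
    }
}

/// Logs the operation first, then applies it to the store.
fn write(wal: &mut Wal, store: &mut Store, op: Op) {
    wal.entries.push(op.clone());
    store.apply(&op);
}

/// Rebuilds the store by replaying every logged operation in order.
fn recover(wal: &Wal) -> Store {
    let mut store = Store::default();
    for op in &wal.entries {
        store.apply(op);
    }
    store
}
```

Because every operation reaches the log before the store, replaying the log in order reproduces the same final state even if the store is discarded.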

Critical Bug Fix (v1.16.2)

A critical WAL bug was discovered in v1.16.1 that could break consensus or cause data inconsistency. This was fixed in PR #7674.

Snapshot and Backup

Qdrant supports collection and storage snapshots for backup and migration.

Snapshot Features

| Feature | Description |
| --- | --- |
| Collection Snapshots | Full point and payload data export |
| Storage Snapshots | Complete system state including WAL |
| Remote Storage | Snapshots can be stored in object storage |

Source: lib/collection/src/collection/snapshots.rs

Performance Characteristics

Recent Improvements

| Version | Improvement | Impact |
| --- | --- | --- |
| v1.18.1 | Validate vector dimensions before WAL write | Faster async upserts |
| v1.17.1 | Non-blocking Gridstore flushes | Reduced tail latencies |
| v1.16.1 | Gridstore migration from RocksDB | 3x faster batch queries |

Async I/O with io_uring

Qdrant leverages Linux's io_uring interface for asynchronous disk operations, maximizing throughput on network-attached storage.

Source: README.md

The storage engine integrates with these core modules:

graph TD
    SG[Segment] --> GS[Gridstore]
    SG --> WAL[Write-Ahead Log]
    WAL --> GS
    GC[Collection] --> SG
    GC --> CFG[Collection Config]
    CFG -.->|Atomic Write| DISK[Disk Files]
    GS -.->|Mmap| DISK

| Module | File | Role |
| --- | --- | --- |
| Segment | lib/segment/src/lib.rs | Core indexing and search logic |
| Collection | lib/collection/ | Collection management |
| Gridstore | lib/gridstore/ | Key-value persistence |
| WAL | lib/wal/ | Transaction logging |

Known Issues

Flaky Quantized Search Tests

Several HNSW quantized search tests are flaky and fail intermittently on a score-ordering assertion:

  • hnsw_turbo_quantization_cosine_larger_bits2_test (Issue #8835)
  • hnsw_turbo_quantization_cosine_larger_test (Issue #8801)
  • hnsw_quantized_search_manhattan_test (Issue #8806)
  • hnsw_quantized_search_euclid_test (Issue #8735)
  • hnsw_turbo_quantization_dot_test (Issue #8906)
  • hnsw_turbo_quantization_manhattan_test (Issue #8834)

These tests fail with assertion failed: best_2.score >= best_1.score, indicating edge cases in quantization scoring. Related: Issue #8704 for discover precision tests.

Data Flow and Update Pipeline

Related topics: System Architecture, Storage Engine and Persistence

This document describes how data flows through Qdrant from client requests to persisted storage, covering the update pipeline architecture, data structures, and key components involved in processing point operations.

Overview

Qdrant's update pipeline handles the lifecycle of vector data from insertion through persistence. The system uses a multi-layered approach combining Write-Ahead Logging (WAL), memory-mapped storage (Gridstore), and segment-based organization to ensure data durability while maintaining high throughput.

Source: lib/segment/src/lib.rs:1-18

Core Data Structures

Point Operations

The system supports multiple point operation types. The listing below is a simplified view of the PointOperations enum, with most variant payloads omitted:

pub enum PointOperations {
    UpsertPoints(PointInsertOperationsInternal),
    Upsert,
    DeletePoints,
    DeletePointsByFilter,
    SetPayload,
    OverwritePayload,
    DeletePayload,
    ClearPayload,
    UpdateBatch,
}

Source: lib/shard/src/operations/point_ops.rs:1-50

Value Pointer System

The ValuePointer struct tracks where data is stored within the Gridstore pages:

pub struct ValuePointer {
    pub page_id: PageId,
    pub block_offset: BlockOffset,
    pub length: u32,
}

Source: lib/gridstore/src/tracker.rs:1-20

The PointerUpdates structure manages pointer lifecycle during updates:

pub(crate) struct PointerUpdates {
    current: Option<ValuePointer>,
    to_free: Vec<ValuePointer>,
}

Source: lib/gridstore/src/tracker.rs:30-50
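
One way to read the two fields: current holds the latest location of a value, and to_free collects superseded locations whose blocks can be released later. The sketch below illustrates that lifecycle. The set and unset methods and the integer field types are assumptions for this example, not the methods defined in tracker.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct ValuePointer {
    page_id: u32,
    block_offset: u32,
    length: u32,
}

#[derive(Debug, Default)]
struct PointerUpdates {
    current: Option<ValuePointer>,
    to_free: Vec<ValuePointer>,
}

impl PointerUpdates {
    /// Hypothetical: records a new location and schedules the old one for freeing.
    fn set(&mut self, pointer: ValuePointer) {
        if let Some(old) = self.current.replace(pointer) {
            self.to_free.push(old);
        }
    }

    /// Hypothetical: removes the value and schedules its location for freeing.
    fn unset(&mut self) {
        if let Some(old) = self.current.take() {
            self.to_free.push(old);
        }
    }
}
```

Deferring the release this way means old blocks stay readable until the update has been persisted, at which point everything in to_free can be marked free in the bitmask.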

Generalizer Trait

The Generalizer trait provides an interface for stripping vectors and payloads from a request structure, producing a lightweight copy that keeps only the general shape of the request:

pub trait Generalizer {
    fn remove_details(&self) -> Self;
}

Source: lib/collection/src/operations/generalizer/mod.rs:1-20
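
A minimal sketch of an implementation follows. SearchRequest and its fields are hypothetical, and which fields a real implementation keeps or drops is an assumption here; the point is only that two requests differing in their vector data generalize to the same value.

```rust
trait Generalizer {
    fn remove_details(&self) -> Self;
}

// Hypothetical request type used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
struct SearchRequest {
    vector: Vec<f32>,
    limit: usize,
    with_payload: bool,
}

impl Generalizer for SearchRequest {
    fn remove_details(&self) -> Self {
        SearchRequest {
            // Drop the bulky, request-specific data.
            vector: Vec::new(),
            // Keep the parameters that describe the kind of request.
            limit: self.limit,
            with_payload: self.with_payload,
        }
    }
}
```

Generalized copies are small and comparable, which makes them suitable for grouping or logging requests by shape without retaining user data.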

Write-Ahead Logging

Qdrant implements Write-Ahead Logging to ensure data persistence even during power outages. Before any update is applied to segments, it is first recorded in the WAL.

WAL Workflow

graph TD
    A[Client Request] --> B[Parse PointOperations]
    B --> C[Write to WAL]
    C --> D[Update In-Memory Structures]
    D --> E[Acknowledge to Client]
    E --> F[Background Flush to Gridstore]
    F --> G[Mark WAL Entries as Persisted]

WAL Bug Fix (v1.16.2)

Version v1.16.2 addressed a critical WAL bug that could break consensus or cause data inconsistency. This fix ensured that WAL entries are properly synchronized with segment updates.

Source: v1.16.2 Release Notes

Gridstore Architecture

Gridstore is Qdrant's storage layer that provides memory-mapped file access with non-blocking flush operations.

Page-Based Storage

Gridstore divides storage into fixed-size pages, with each value identified by a ValuePointer. When a value spans multiple pages, the system tracks the ranges across consecutive pages.

Source: lib/gridstore/src/pages.rs:1-30

Page Writing Mechanism

The write_to_pages method handles multi-page writes:

pub fn write_to_pages(
    &mut self,
    pointer: ValuePointer,
    value: &[u8],
    config: &StorageConfig,
) -> Result<()> {
    let writes = Self::get_page_value_ranges(pointer, config)
        .map(|(buf_offset, page, range)| {
            let data = &value[buf_offset..buf_offset + range.length as usize];
            (page as FileIndex, range.byte_offset, data)
        });
    S::write_multi(self.pages.as_mut_slice(), writes)?;
    Ok(())
}

Source: lib/gridstore/src/pages.rs:50-75

Non-Blocking Flushes

Gridstore implements non-blocking flushes to reduce search tail latencies, introduced in v1.17.1. The flusher mechanism aggregates page flushers:

pub fn flusher(&self) -> Flusher {
    let mut flushers = Vec::with_capacity(self.pages.len());
    for page in &self.pages {
        flushers.push(page.flusher());
    }
    Box::new(move || {
        for flusher in flushers {
            flusher()?;
        }
        Ok(())
    })
}

Source: lib/gridstore/src/pages.rs:80-95

Update Pipeline Flow

graph TD
    A[UpsertPoints Request] --> B[Vector Dimension Validation]
    B --> C[Write to WAL]
    C --> D[Update Tracker]
    D --> E[Apply to Memory Structures]
    E --> F[Acknowledge Client]
    F --> G[Background Optimization]
    G --> H[Gridstore Flush]
    H --> I[Segment Compaction]

Async Upsert Validation (v1.18.1)

Vector dimensions are validated before the WAL write for async upserts, so invalid data is rejected early instead of being logged.

Source: v1.18.1 Release Notes

Deferred Point Updates (v1.17.1)

With prevent_unoptimized=true, point updates can be deferred and applied efficiently during optimization phases.

Source: v1.17.1 Release Notes

Collection Configuration

Collection configuration is persisted to disk and includes metadata, schema, and optimization settings:

pub struct CollectionConfigInternal {
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub metadata: Option<Payload>,
}

Source: lib/collection/src/config.rs:1-30

Configuration Persistence

Configuration is saved atomically using AtomicFile:

pub fn save(&self, path: &Path) -> CollectionResult<()> {
    let config_path = path.join(COLLECTION_CONFIG_FILE);
    let af = AtomicFile::new(&config_path, AllowOverwrite);
    let state_bytes = serde_json::to_vec(self).unwrap();
    af.write(|f| f.write_all(&state_bytes)).map_err(|err| {
        CollectionError::service_error(format!("Can't write {config_path:?}, error: {err}"))
    })?;
    Ok(())
}

Source: lib/collection/src/config.rs:30-55

Segment Operations

Data Consistency Checking

The check_data_consistency method validates segment integrity by looking for the following inconsistencies:

  • Internal ID without external ID
  • External ID without internal ID
  • Internal ID without version
  • Internal ID without vector

Source: lib/segment/src/segment/segment_ops.rs:1-50

Payload Index Management

Segment operations handle payload index creation and recreation when configuration changes:

for (key, schema) in schema_config {
    match schema_applied.get(key) {
        Some(existing_schema) if existing_schema == schema => continue,
        Some(existing_schema) => log::warn!(
            "Segment has incorrect payload index for {key}, recreating it now"
        ),
        None => log::warn!(
            "Segment is missing payload index for {key}, creating it now"
        ),
    }
    let created = self.create_field_index(...)?;
}

Source: lib/segment/src/segment/segment_ops.rs:50-80
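
The three-way match in the excerpt (unchanged, mismatched, missing) can be isolated into a small, testable planning step. The sketch below assumes simplified stand-ins: `FieldSchema`, `IndexAction`, and `plan_index_actions` are hypothetical and do not exist in the Qdrant codebase.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldSchema {
    Keyword,
    Integer,
}

#[derive(Debug, PartialEq, Eq)]
enum IndexAction {
    Keep,     // applied schema already matches the configuration
    Recreate, // an index exists but with a different schema
    Create,   // no index exists for this key
}

/// Compare the configured schema with what is applied and decide per key.
fn plan_index_actions(
    configured: &BTreeMap<String, FieldSchema>,
    applied: &BTreeMap<String, FieldSchema>,
) -> Vec<(String, IndexAction)> {
    configured
        .iter()
        .map(|(key, schema)| {
            let action = match applied.get(key) {
                Some(existing) if existing == schema => IndexAction::Keep,
                Some(_) => IndexAction::Recreate,
                None => IndexAction::Create,
            };
            (key.clone(), action)
        })
        .collect()
}

fn main() {
    let mut configured = BTreeMap::new();
    configured.insert("city".to_string(), FieldSchema::Keyword);
    configured.insert("age".to_string(), FieldSchema::Integer);
    configured.insert("tag".to_string(), FieldSchema::Keyword);

    let mut applied = BTreeMap::new();
    applied.insert("city".to_string(), FieldSchema::Keyword);
    applied.insert("age".to_string(), FieldSchema::Keyword); // wrong schema

    for (key, action) in plan_index_actions(&configured, &applied) {
        println!("{key}: {action:?}");
    }
}
```

A BTreeMap is used here only so that the output order is deterministic.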

Memory Management

Memory-Mapped File Advice

Qdrant uses madvise system calls for memory management on Unix platforms:

#[cfg(unix)]
pub fn will_need_multiple_pages(region: &[u8]) {
    let Some(page_mask) = page_size().map(|s| s - 1) else { return };
    // Trigger readahead for memory-mapped regions
}

Source: lib/common/common/src/mmap/advice.rs:1-30
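
The excerpt computes page_mask as page_size - 1. A mask of this form is commonly used to round addresses to page boundaries, which matters because madvise expects a page-aligned start address. The sketch below shows that arithmetic only; `page_aligned_range` is a hypothetical helper, the page size of 4096 is an assumed example value, and this is not the body of will_need_multiple_pages.

```rust
/// Round an address range outward to page boundaries.
/// Returns (aligned start, aligned length). `page_size` must be a power of two.
fn page_aligned_range(start: usize, len: usize, page_size: usize) -> (usize, usize) {
    assert!(page_size.is_power_of_two());
    let page_mask = page_size - 1;
    let aligned_start = start & !page_mask; // round down
    let aligned_end = (start + len + page_mask) & !page_mask; // round up
    (aligned_start, aligned_end - aligned_start)
}

fn main() {
    // A 10-byte region that starts 4 bytes into the second 4 KiB page.
    let (start, len) = page_aligned_range(4100, 10, 4096);
    println!("aligned start = {start}, aligned length = {len}");
    // A 200-byte region that straddles the first page boundary.
    let (start, len) = page_aligned_range(4000, 200, 4096);
    println!("aligned start = {start}, aligned length = {len}");
}
```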

Page Cache Population

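On older Linux systems, page cache population falls back to touching the mapped memory directly. The function reads one byte every 512 bytes and sums the values, which forces the kernel to fault the underlying pages in:

use std::hint::black_box;
use std::num::Wrapping;

fn populate_simple(slice: &[u8]) {
    // black_box keeps the compiler from optimizing the reads away;
    // Wrapping makes the u8 sum overflow-safe.
    black_box(
        slice
            .iter()
            .copied()
            .map(Wrapping)
            .step_by(512)
            .sum::<Wrapping<u8>>(),
    );
}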

Source: lib/common/common/src/mmap/advice.rs:40-55

API Layer

REST Schema Processing

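The REST API layer uses untagged enums for flexible deserialization. With #[serde(untagged)], serde tries each variant in declaration order and keeps the first one that deserializes successfully: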

#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize, JsonSchema)]
#[serde(untagged, rename_all = "snake_case")]
pub enum DocumentOptions {
    Common(HashMap<String, JsonValue>),
    Bm25(Bm25Config),
}

Source: lib/api/src/rest/schema.rs:1-25

Point Operations Conversion

The conversion layer translates gRPC types into the internal operation types and reports failures as a gRPC Status. For example:

pub fn try_points_selector_from_grpc(
    value: api::grpc::qdrant::PointsSelector,
    shard_key_selector: Option<api::grpc::qdrant::ShardKeySelector>,
) -> Result<PointsSelector, Status>

Source: lib/collection/src/operations/conversions.rs:1-50
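
The shape of such a fallible conversion can be sketched with simplified types. The names below are assumptions made for the example: `WireSelector` stands in for a generated gRPC message, `InvalidArgument` stands in for a gRPC Status, and the shard key argument of the real function is left out.

```rust
use std::convert::TryFrom;

/// Hypothetical wire-level selector: every field is optional,
/// as is typical for generated message types.
struct WireSelector {
    ids: Option<Vec<u64>>,
    filter: Option<String>,
}

/// Hypothetical internal selector: exactly one way of selecting points.
#[derive(Debug, PartialEq)]
enum PointsSelector {
    Ids(Vec<u64>),
    Filter(String),
}

/// Stand-in for a gRPC `Status` with an invalid-argument code.
#[derive(Debug, PartialEq)]
struct InvalidArgument(String);

impl TryFrom<WireSelector> for PointsSelector {
    type Error = InvalidArgument;

    fn try_from(wire: WireSelector) -> Result<Self, Self::Error> {
        match (wire.ids, wire.filter) {
            (Some(ids), None) => Ok(PointsSelector::Ids(ids)),
            (None, Some(filter)) => Ok(PointsSelector::Filter(filter)),
            (None, None) => Err(InvalidArgument("selector is empty".into())),
            (Some(_), Some(_)) => {
                Err(InvalidArgument("selector sets both ids and filter".into()))
            }
        }
    }
}

fn main() {
    let by_ids = PointsSelector::try_from(WireSelector { ids: Some(vec![1, 2]), filter: None });
    println!("{by_ids:?}");
    let empty = PointsSelector::try_from(WireSelector { ids: None, filter: None });
    println!("{empty:?}");
}
```

Keeping the validation inside the conversion means the rest of the code only ever sees a well-formed internal selector.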

Known Issues and Community Topics

Issue    Topic                                               Status
#2550    Delete vectors for deleted points                   Open
#1132    Adding new vector field after collection creation   Open

Flaky Tests

Several HNSW quantization tests show intermittent failures related to score ordering:

  • hnsw_turbo_quantization_cosine_larger_bits2_test (Issue #8835)
  • hnsw_turbo_quantization_cosine_larger_test (Issue #8801)
  • hnsw_quantized_search_manhattan_test (Issue #8806)
  • hnsw_quantized_search_euclid_test (Issue #8735)

These tests fail with "assertion failed: best_2.score >= best_1.score" in the quantized search path.

Summary

The Qdrant update pipeline moves each write through the following stages:

  1. Request Reception: Point operations received via REST/gRPC API
  2. Validation: Vector dimensions and payload schema validation
  3. WAL Persistence: Atomic write to Write-Ahead Log
  4. Memory Update: In-memory structures updated immediately
  5. Client Acknowledgment: Fast response after WAL write
  6. Background Processing: Gridstore flush and segment optimization
  7. Consistency Verification: Periodic data consistency checks

This architecture ensures durability through WAL, performance through non-blocking operations, and consistency through validation and verification stages.

Source: https://github.com/qdrant/qdrant / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 8 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: identity.distribution | github_repo:268163609 | https://github.com/qdrant/qdrant

2. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | github_repo:268163609 | https://github.com/qdrant/qdrant

3. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: packet_text.keyword_scan | github_repo:268163609 | https://github.com/qdrant/qdrant

4. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:268163609 | https://github.com/qdrant/qdrant

5. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | github_repo:268163609 | https://github.com/qdrant/qdrant

6. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | github_repo:268163609 | https://github.com/qdrant/qdrant

7. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:268163609 | https://github.com/qdrant/qdrant

8. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | github_repo:268163609 | https://github.com/qdrant/qdrant

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 11 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using qdrant with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence