# https://github.com/RyanCodrai/turbovec Project Manual

Generated at: 2026-06-23 00:21:28 UTC

## Table of Contents

- [Architecture & TurboQuant Algorithm](#page-1)
- [Index API, File Formats & Filtered Search](#page-2)
- [Python Bindings & Framework Integrations](#page-3)
- [SIMD Search Kernels, Benchmarks & Multi-Language Bindings](#page-4)

<a id='page-1'></a>

## Architecture & TurboQuant Algorithm

### Related Pages

Related topics: [Index API, File Formats & Filtered Search](#page-2), [SIMD Search Kernels, Benchmarks & Multi-Language Bindings](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)
- [turbovec-python/README.md](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/README.md)
- [turbovec/src/lib.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/lib.rs)
- [turbovec/src/io.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/io.rs)
- [turbovec-python/python/turbovec/__init__.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/__init__.py)
- [turbovec-python/python/turbovec/langchain.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/langchain.py)
- [turbovec-python/python/turbovec/llama_index.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/llama_index.py)
- [turbovec-python/python/turbovec/haystack.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/haystack.py)
- [examples/downstream-smoke/src/main.rs](https://github.com/RyanCodrai/turbovec/blob/main/examples/downstream-smoke/src/main.rs)
- [CONTRIBUTING.md](https://github.com/RyanCodrai/turbovec/blob/main/CONTRIBUTING.md)
</details>

# Architecture & TurboQuant Algorithm

## Overview

turbovec is a Rust vector search library implementing Google Research's [TurboQuant algorithm](https://arxiv.org/abs/2504.19874) (ICLR 2026). It compresses float32 embeddings to 2–4 bits per dimension with near-optimal distortion while requiring **no separate training phase** — vectors are quantized online as they arrive. The README frames the headline result: "A 10 million document corpus takes 31 GB of RAM as float32. turbovec fits it in 4 GB — and searches it faster than FAISS" ([README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)).
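As a rough sanity check on those figures, the sketch below computes raw payload size for a corpus. The dimension of 768 is an assumption (the quoted sentence does not state it), chosen because it reproduces roughly 31 GB and 4 GB; per-vector scales and file headers are ignored.

```python
def storage_bytes(n_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Raw payload size in bytes, ignoring per-vector scales and headers."""
    return n_vectors * dim * bits_per_dim // 8


N = 10_000_000
DIM = 768  # assumed embedding dimension, not stated in the source

float32_gb = storage_bytes(N, DIM, 32) / 1e9
four_bit_gb = storage_bytes(N, DIM, 4) / 1e9
print(f"float32: {float32_gb:.2f} GB, 4-bit: {four_bit_gb:.2f} GB")
# float32: 30.72 GB, 4-bit: 3.84 GB
```

At 2 bits per dimension the same arithmetic halves the compressed figure again.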

Two public Rust types form the core surface:

- `TurboQuantIndex`: the bare compressed index (packed codes, per-vector scales, optional TQ+ per-coord calibration).
- `IdMapIndex`: a `TurboQuantIndex` plus a `slot_to_id: Vec<u64>` table that maps vector positions to stable, caller-supplied `u64` ids and supports O(1) deletion. The framework wrappers map their own string ids (e.g. document ids) onto these handles.

Both are re-exported at the Python layer as `from turbovec import IdMapIndex, TurboQuantIndex` ([turbovec-python/python/turbovec/__init__.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/__init__.py)).

## High-Level Architecture

The repository is organized into three concentric layers:

```mermaid
flowchart TB
    A[Language bindings<br/>Python · proposed Node · proposed Swift] --> B[Framework integrations<br/>LangChain · LlamaIndex · Haystack · Agno]
    B --> C[PyO3 / Maturin wrapper]
    C --> D[Rust core: TurboQuantIndex · IdMapIndex]
    D --> E[SIMD search kernels<br/>NEON ARM · AVX2/AVX-512BW x86]
    D --> F[Bit-packing layer]
    D --> G[On-disk format<br/>.tv · .tvim v3]
```

- **Rust core** lives in [`turbovec/src/`](https://github.com/RyanCodrai/turbovec/tree/main/turbovec/src). Search is SIMD-optimized for ARM (NEON) and x86. The downstream smoke test in [`examples/downstream-smoke/src/main.rs`](https://github.com/RyanCodrai/turbovec/blob/main/examples/downstream-smoke/src/main.rs) exercises `cargo add turbovec` end-to-end and checks that the build script propagates the link directives the crate needs, so downstream users do not hit a `cblas_sgemm` symbol error at link time.
- **Python wrapper** is compiled via Maturin/PyO3 (the `_turbovec` module) and exposed by [`turbovec-python/python/turbovec/__init__.py`](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/__init__.py).
- **Framework integrations** (`langchain.py`, `llama_index.py`, `haystack.py`) sit on top of the wrapper. Each mirrors the canonical in-tree reference store (`InMemoryVectorStore`, `SimpleVectorStore`, `InMemoryDocumentStore`) so it is a drop-in replacement ([CONTRIBUTING.md](https://github.com/RyanCodrai/turbovec/blob/main/CONTRIBUTING.md)).

Community issue #2 originally asked for "a standalone Rust crate (no Python dependency)"; the project is already structured that way — `turbovec` is a pure Rust crate and the Python layer is a thin binding built on top. Proposed extensions (#85 Node.js via `napi-rs`, #86 Swift/macOS via UniFFI) follow the same FFI-over-core pattern.

## The TurboQuant Pipeline

TurboQuant is a **data-oblivious** quantizer: every input vector is processed independently using only its own coordinates, so there is no clustering, no codebook, and no warm-up corpus. Community question #110 explicitly contrasts this with Product Quantization, which needs an empirical training pass over representative data.

The README describes the per-vector pipeline as five steps. The critical correction at step 5 comes from the RaBitQ paper (SIGMOD 2024), which bounds quantization error when the search uses cosine/IP distance on length-renormalized vectors:

1. **Random rotation.** Each vector is multiplied by a fixed Hadamard-style rotation to decorrelate coordinates — done once at ingest.
2. **Random sign flip.** A per-vector random sign is applied so the distribution is symmetric, again data-oblivious.
3. **Binning.** Each rotated coordinate is mapped to a bin index.
4. **Bit-packing.** Bin indices are packed into 2, 3, or 4 bits per dimension.
5. **Per-vector length renormalization.** Before search, both query and database vectors are renormalized so the inner-product search approximates cosine on the original vectors — this is the RaBitQ correction that prevents quantization error from dominating on long vectors.
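Steps 3 and 4 can be illustrated with a minimal sketch. It assumes uniform bins over `[-1, 1]` and a little-endian bit stream; both are choices made here for illustration, and the function names are hypothetical rather than turbovec's API. Rotation, sign flips, and renormalization are omitted.

```python
def quantize(coords, bit_width, lo=-1.0, hi=1.0):
    """Map each coordinate to a bin index in [0, 2**bit_width).

    Uniform bins over [lo, hi] are an assumption made for illustration.
    """
    levels = 1 << bit_width
    step = (hi - lo) / levels
    codes = []
    for x in coords:
        idx = int((x - lo) / step)
        codes.append(min(max(idx, 0), levels - 1))  # clamp out-of-range values
    return codes


def pack(codes, bit_width):
    """Pack bin indices into a little-endian bit stream (2, 3, or 4 bits each)."""
    acc = 0
    for i, code in enumerate(codes):
        acc |= code << (i * bit_width)
    n_bytes = (len(codes) * bit_width + 7) // 8
    return acc.to_bytes(n_bytes, "little")


def unpack(data, bit_width, n_codes):
    """Inverse of pack: recover n_codes bin indices from the bit stream."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(acc >> (i * bit_width)) & mask for i in range(n_codes)]


rotated = [-0.9, -0.1, 0.1, 0.9]  # stand-in for already-rotated coordinates
codes = quantize(rotated, bit_width=2)
packed = pack(codes, bit_width=2)
print(codes, packed.hex())  # [0, 1, 2, 3] e4
```

Four 2-bit codes fit in one byte here, which is where the per-dimension compression ratio comes from.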

The on-disk format is at **version 3** as of turbovec 0.6.x, which adds optional TQ+ per-coord calibration ([turbovec/src/io.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/io.rs)). The format layout is summarized below.

| File | Magic | Header | Payload | Trailing |
|------|-------|--------|---------|----------|
| `.tv` | `"TVPI"` (4 bytes) | version, bit_width, dim, n_vectors | packed codes + per-vector scales | TQ+ shift/scale (v3+) |
| `.tvim` | `"TVIM"` (4 bytes) | version + same core payload | `TurboQuantIndex` body | `slot_to_id: Vec<u64>` |

Version 2 (turbovec 0.4.4 .. 0.6.0) loads transparently with empty calibration: recall is unchanged, and TQ+ gains require re-encoding from source vectors. Version 1 (≤ 0.4.3) is intentionally refused with a rebuild hint. Rust-side consistency checks live in `TurboQuantIndex::from_parts`, which panics on a `scales.len()` mismatch or on a non-empty TQ+ length that doesn't equal `dim`; both cases are covered by `should_panic` tests ([turbovec/src/lib.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/lib.rs)).

## Data Flow: Ingest → Search → Persist

The end-to-end flow is identical across bindings:

1. **Construct.** `IdMapIndex(dim, bit_width)` creates the index. When `dim` is omitted the index is lazy, and the first `add` call materializes it ([turbovec-python/python/turbovec/langchain.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/langchain.py)).
2. **Add.** Float32 vectors are rotated, sign-flipped, binned, packed, and stored. Caller-supplied ids are mapped to internal `u64` handles via `slot_to_id`. Duplicates are rejected loudly: for example, the LlamaIndex wrapper raises `ValueError("duplicate node_id … in the input")` rather than silently orphaning handles ([turbovec-python/python/turbovec/llama_index.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/llama_index.py)).
3. **Prepare / search.** A query vector is quantized through the same pipeline and scored with SIMD-accelerated distance against the packed database.
4. **Persist.** Index bytes go to a binary `.tvim` file; framework-level metadata (text, ids, schema version) goes to a side-car JSON file. The side-car is plain JSON, never pickle, which eliminates deserialization-of-code risk ([turbovec-python/python/turbovec/llama_index.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/llama_index.py)).
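The construct and add steps can be sketched with a toy stand-in. `LazyStore` is hypothetical and stores raw vectors rather than quantized codes; it only illustrates two behaviours described above: the dimension is fixed by the first `add`, and duplicate ids raise instead of being silently accepted.

```python
class LazyStore:
    """Toy stand-in for a lazily-dimensioned index; not the turbovec API."""

    def __init__(self, dim=None):
        self.dim = dim  # None means "not yet committed"
        self.ids = []
        self.vectors = []

    def add(self, vectors, ids):
        if len(vectors) != len(ids):
            raise ValueError("vectors and ids must have the same length")
        if len(set(ids)) != len(ids) or any(i in self.ids for i in ids):
            raise ValueError("duplicate id in the input")
        if not vectors:
            return
        # Validate everything before mutating, so a failed add changes nothing.
        dim = self.dim if self.dim is not None else len(vectors[0])
        for vec in vectors:
            if len(vec) != dim:
                raise ValueError(f"expected dimension {dim}, got {len(vec)}")
        self.dim = dim
        self.ids.extend(ids)
        self.vectors.extend(vectors)


store = LazyStore()
store.add([[0.1, 0.2, 0.3]], ids=[1001])
print(store.dim)  # 3
```

Validating before mutating means a rejected batch leaves the store, including its still-unset dimension, exactly as it was.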

## Trade-offs and Known Limits

- **Lossy.** Full-precision embeddings are discarded after compression. The Haystack integration docstring explicitly warns: "The quantized index discards full-precision embeddings after compression — callers that rely on `Document.embedding` after retrieval will see `None`" ([turbovec-python/python/turbovec/haystack.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/haystack.py)).
- **No sparse search.** BM25 retrieval is not implemented in the Haystack wrapper; the docstring routes callers to a separate `InMemoryBM25Retriever`.
- **64-bit only.** The build refuses to compile on non-64-bit targets ([SECURITY.md](https://github.com/RyanCodrai/turbovec/blob/main/SECURITY.md)).
- **fsspec not supported.** `from_persist_path` raises `NotImplementedError` when a non-local filesystem is passed; only local paths are accepted today.

## See Also

- Building and benchmarking: see `README.md#building` and `README.md#running-benchmarks`.
- Reference paper: [TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate](https://arxiv.org/abs/2504.19874).
- RaBitQ background: [arXiv:2405.12497](https://arxiv.org/abs/2405.12497).
- Community proposals tracked upstream: #85 (Node.js), #86 (Swift/macOS), #2 (Rust-only usage).

---

<a id='page-2'></a>

## Index API, File Formats & Filtered Search

### Related Pages

Related topics: [Architecture & TurboQuant Algorithm](#page-1), [Python Bindings & Framework Integrations](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [turbovec/src/lib.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/lib.rs)
- [turbovec/src/io.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/io.rs)
- [turbovec-python/python/turbovec/langchain.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/langchain.py)
- [turbovec-python/python/turbovec/llama_index.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/llama_index.py)
- [turbovec-python/python/turbovec/haystack.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/haystack.py)
- [turbovec-python/python/turbovec/agno.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/agno.py)
- [turbovec-python/build.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/build.rs)
- [CONTRIBUTING.md](https://github.com/RyanCodrai/turbovec/blob/main/CONTRIBUTING.md)
- [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)
</details>

# Index API, File Formats & Filtered Search

## Overview

turbovec exposes a small public surface for working with its quantized vector index. The Rust core provides a positional `TurboQuantIndex` and an id-keyed `IdMapIndex`; the Python wrappers layer framework integrations (LangChain, LlamaIndex, Haystack, Agno) on top. A binary `.tvim` format persists the index, a sibling JSON side-car persists the original text and metadata, and a versioned schema gates forward-compatibility of those JSON payloads.

Source: [turbovec/src/lib.rs:1-100]()
Source: [turbovec/src/io.rs:1-50]()

## Index Types and Public API

The Rust crate exports two complementary index types. `TurboQuantIndex` is the positional index, organized by numeric handle. `IdMapIndex` wraps it and adds an id ↔ slot mapping (the `slot_to_id: Vec<u64>` table) so that stable external ids can serve as primary keys; the framework wrappers map application-level string identifiers onto these `u64` handles. Both support `add`, `remove`, `search`, `write`, and `load`.

The index is constructed lazily: the dimension is inferred from the first batch of vectors and is not required at construction time. The underlying engine stores `dim` as an `Option<usize>`, and the `from_parts` constructor asserts structural invariants (`packed_codes` length matches `n_vectors * dim * bit_width / 8`, and `tqplus_shift` and `tqplus_scale` are the same length) so future refactors stay safe by construction.
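The invariants named above can be written out as a small checker. This is a sketch in Python, not the Rust `from_parts` itself: the function name and argument order are hypothetical, it raises `ValueError` where the Rust code panics, and it assumes `n_vectors * dim * bit_width` is a multiple of 8.

```python
def check_parts(packed_codes, scales, tqplus_shift, tqplus_scale,
                n_vectors, dim, bit_width):
    """Raise ValueError if the parts cannot describe a consistent index."""
    expected = n_vectors * dim * bit_width // 8
    if len(packed_codes) != expected:
        raise ValueError(f"packed_codes: expected {expected} bytes, got {len(packed_codes)}")
    if len(scales) != n_vectors:
        raise ValueError(f"scales: expected {n_vectors} entries, got {len(scales)}")
    if len(tqplus_shift) != len(tqplus_scale):
        raise ValueError("tqplus_shift and tqplus_scale must have the same length")
    # Empty calibration is allowed; non-empty calibration is per coordinate.
    if tqplus_shift and len(tqplus_shift) != dim:
        raise ValueError(f"TQ+ calibration: expected {dim} entries, got {len(tqplus_shift)}")


# 4 vectors, 8 dimensions, 4 bits each: 16 bytes of codes, no calibration.
check_parts(bytes(16), [1.0] * 4, [], [], n_vectors=4, dim=8, bit_width=4)
```

Checking once at construction means later code can index into the buffers without re-validating lengths.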

Source: [turbovec/src/lib.rs:1-100]()
Source: [turbovec-python/python/turbovec/langchain.py:1-100]()

## File Format Layout and Versioning

turbovec uses two on-disk artifacts. The binary index is written as `.tvim` (in integration wrappers) or `.tv` (the standalone positional format). The header begins with a 4-byte magic `TV_MAGIC` followed by a single version byte. The loader transparently handles v2 (no TQ+ calibration) and v3 (with TQ+ calibration) files; a v1 file is detected by the absence of a magic header and produces a targeted error rather than the generic "wrong magic" message.
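A minimal sketch of that header check follows. Only the 4-byte magic and the single version byte come from the description above; the function names and error messages are hypothetical, and everything after the version byte is left unparsed.

```python
SUPPORTED_VERSIONS = {2, 3}


def read_format_version(blob: bytes, magic: bytes) -> int:
    """Validate the magic and version byte at the start of an index file."""
    if len(blob) < 5 or blob[:4] != magic:
        # A v1 file has no magic header, so it gets a targeted message.
        raise ValueError("missing magic header: possibly a v1 file, rebuild the index")
    version = blob[4]
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported format version {version}")
    return version


def may_carry_tqplus(version: int) -> bool:
    """TQ+ calibration can only be present from v3 onwards."""
    return version >= 3


print(read_format_version(b"TVIM\x03" + bytes(8), magic=b"TVIM"))  # 3
```

Failing on the magic before reading anything else keeps a truncated or foreign file from being misread as index data.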

The integration layers also write a JSON side-car:

| Wrapper | Index file | Side-car file | Schema version |
|---|---|---|---|
| LangChain | `index.tvim` | `docstore.json` | 1 (Source: [turbovec-python/python/turbovec/langchain.py]()) |
| LlamaIndex | `*.tvim` | `*.nodes.json` | `_NODES_SCHEMA_VERSION` (Source: [turbovec-python/python/turbovec/llama_index.py]()) |
| Haystack | `index.tvim` | `docstore.json` | 2, compat (1, 2) (Source: [turbovec-python/python/turbovec/haystack.py]()) |
| Agno | `_index.tvim` | `_docs.json` | internal (Source: [turbovec-python/python/turbovec/agno.py]()) |

Schema versions gate forward compatibility: loaders raise `ValueError` on unknown versions, and the JSON side-cars use lists of `[handle, data]` pairs to preserve type fidelity because JSON object keys must be strings. The side-cars are plain JSON rather than pickle, eliminating deserialization-of-code risk.
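The pair-list choice can be seen in a short sketch. The field names `schema_version` and `docs` and the version constant are hypothetical; the point is only that integer handles survive a round-trip as `[handle, data]` pairs but not as JSON object keys.

```python
import json

SCHEMA_VERSION = 1  # hypothetical value for this sketch


def dump_sidecar(docs):
    """Serialize {handle: data} as a list of [handle, data] pairs."""
    payload = {
        "schema_version": SCHEMA_VERSION,
        "docs": [[handle, data] for handle, data in docs.items()],
    }
    return json.dumps(payload)


def load_sidecar(text):
    payload = json.loads(text)
    if payload.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unknown schema version {payload.get('schema_version')!r}")
    return {handle: data for handle, data in payload["docs"]}


docs = {7: {"text": "hello"}, 42: {"text": "world"}}
print(load_sidecar(dump_sidecar(docs)) == docs)  # True
print(json.loads(json.dumps({7: "x"})))  # {'7': 'x'}
```

The second line of output shows the problem being avoided: a plain JSON object turns the integer key `7` into the string `"7"`.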

Source: [turbovec/src/io.rs:1-100]()
Source: [turbovec-python/python/turbovec/haystack.py:1-100]()

## Persistence and Lazy Construction

The same persistence contract is used across all four integration wrappers. Calling `persist` / `save_to_disk` / `dump` writes a binary `.tvim` index and a JSON side-car to the target location (the table above lists the file names per wrapper). `IdMapIndex.load` handles the `dim=0` (lazy-uncommitted) sentinel internally and reconstructs the index in the right state, so a store that never received any vectors still round-trips cleanly.

The `from_texts` classmethod mirrors `InMemoryVectorStore`'s ergonomics: a fresh store has no dimension, the first `add_texts` call fixes the dimension from the embedding batch, and `bit_width` defaults to 4. The Python extension build is configured to emit the platform-correct linker arguments (`-undefined dynamic_lookup` on macOS) so a bare `cargo build` works, not just `maturin develop` (issue #92).

Source: [turbovec-python/python/turbovec/langchain.py:1-150]()
Source: [turbovec-python/build.rs:1-20]()

## Filtered Search and Metadata Handling

Filtered search is implemented per-framework on top of the framework's reference contract. Each wrapper is compared structurally against the canonical in-tree reference store (LangChain → `InMemoryVectorStore`, LlamaIndex → `SimpleVectorStore`, Haystack → `InMemoryDocumentStore`); the bar for a drop-in replacement is matching the reference's surface and idioms.

For LlamaIndex, the wrapper keeps `ref_doc_id` and `metadata` at the top level for fast filter / doc-id lookup, and stores the framework's canonical `node_dict` so that `metadata_dict_to_node` can reconstruct a full `BaseNode` with `PREVIOUS` / `NEXT` / `PARENT` / `CHILD` relationships, `excluded_*_metadata_keys`, template fields, and `start/end_char_idx` preserved on retrieval. The previous narrow `{text, metadata, ref_doc_id}` schema silently lost all of these.

For Haystack, BM25 (sparse-text) retrieval is not implemented — the docs explicitly direct users to wire an `InMemoryBM25Retriever` against a separate store for keyword search alongside vector search. The quantized index discards full-precision embeddings after compression, so callers that rely on `Document.embedding` after retrieval will see `None`.

```mermaid
flowchart LR
    A[Raw vectors] --> B[TurboQuantIndex<br/>2-4 bit]
    B --> C[IdMapIndex<br/>str-id ↔ u64]
    C --> D[index.tvim<br/>binary, magic+version]
    C --> E[docstore.json<br/>side-car, schema-versioned]
    D --> F[Framework wrappers<br/>LangChain / LlamaIndex /<br/>Haystack / Agno]
    E --> F
    F --> G[Filtered search<br/>+ metadata filter]
```

Source: [CONTRIBUTING.md:1-50]()
Source: [turbovec-python/python/turbovec/llama_index.py:1-100]()
Source: [turbovec-python/python/turbovec/haystack.py:1-80]()

## Community Context

Several community proposals extend the public API surface: a Swift/macOS binding via UniFFI (issue #86) and a Node.js binding via napi-rs (issues #85 and #111) both target `TurboQuantIndex` and `IdMapIndex` directly. A long-running request for a standalone Rust crate with no Python dependency (issue #2) maps onto the public types in `turbovec/src/lib.rs`. The contributing workflow is invitation-only for PRs, with good issue framing valued over implementation pull requests.

Source: [CONTRIBUTING.md:1-50]()
Source: [README.md:1-50]()

## See Also

- Building and benchmarks: see the README sections referenced in [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)
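- Framework integration contracts: [turbovec-python/python/turbovec/langchain.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/langchain.py), [turbovec-python/python/turbovec/llama_index.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/llama_index.py), [turbovec-python/python/turbovec/haystack.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/haystack.py), [turbovec-python/python/turbovec/agno.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/agno.py)
- Core Rust types: [turbovec/src/lib.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/lib.rs), [turbovec/src/io.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/io.rs)
- Algorithm background: [TurboQuant paper (ICLR 2026)](https://arxiv.org/abs/2504.19874) and [RaBitQ (SIGMOD 2024)](https://arxiv.org/abs/2405.12497) referenced in [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)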

---

<a id='page-3'></a>

## Python Bindings & Framework Integrations

### Related Pages

Related topics: [Index API, File Formats & Filtered Search](#page-2), [SIMD Search Kernels, Benchmarks & Multi-Language Bindings](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [turbovec-python/src/lib.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/src/lib.rs)
- [turbovec-python/build.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/build.rs)
- [turbovec-python/python/turbovec/langchain.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/langchain.py)
- [turbovec-python/python/turbovec/llama_index.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/llama_index.py)
- [turbovec-python/python/turbovec/haystack.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/haystack.py)
- [turbovec-python/python/turbovec/agno.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/agno.py)
- [CONTRIBUTING.md](https://github.com/RyanCodrai/turbovec/blob/main/CONTRIBUTING.md)
- [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)
</details>

# Python Bindings & Framework Integrations

## Overview and Scope

turbovec ships its quantization core as a Rust crate that is exposed to Python through PyO3 and packaged with Maturin. The Python package is the primary distribution surface, and on top of the raw `TurboQuantIndex` / `IdMapIndex` bindings the project provides four framework integration modules: LangChain, LlamaIndex, Haystack, and Agno. Each integration is implemented as a thin Python wrapper that mirrors a canonical in-tree reference store from the target framework, so the quantized backend is a drop-in replacement rather than a competing API. Source: [CONTRIBUTING.md]() ("The wrappers should match the reference's surface and idioms — that's the bar for a drop-in replacement").

Community engagement has centred on the boundaries of this Python-first packaging. Issue #2 requests a standalone Rust crate decoupled from Python, while issues #85, #86, and #111 propose adding Node.js, Swift, and TypeScript bindings alongside the existing Python surface. These proposals inform the integration guidelines: bindings should be thin FFI shims over the same Rust index types, not re-implementations.

## Core Python Bindings (Rust ↔ Python)

The Rust core is exposed via `turbovec-python/src/lib.rs`, which defines the PyO3 classes `TurboQuantIndex` and `IdMapIndex`. `IdMapIndex` adds the O(1) deletion handle mapping that framework integrations require. The `search` method returns `(scores, ids)` as `(nq, effective_k)` arrays, with optional `allowlist` filtering. Source: [turbovec-python/src/lib.rs]() (the `search` method's docstring and signature).
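The return shape can be illustrated for a single query with a brute-force sketch. It is not the binding's implementation: it scores full-precision lists with a plain dot product, the function name is hypothetical, and it assumes `effective_k` means `min(k, number of allowed candidates)`.

```python
import heapq


def search_one(query, vectors, ids, k, allowlist=None):
    """Return (scores, ids) for the top-k candidates, best first."""
    candidates = [
        (sum(q * x for q, x in zip(query, vec)), vec_id)
        for vec, vec_id in zip(vectors, ids)
        if allowlist is None or vec_id in allowlist
    ]
    top = heapq.nlargest(k, candidates)  # fewer than k if the filter is tight
    return [score for score, _ in top], [vec_id for _, vec_id in top]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ids = [10, 20, 30]
print(search_one([1.0, 0.5], vectors, ids, k=2))
# ([1.5, 1.0], [30, 10])
print(search_one([1.0, 0.5], vectors, ids, k=2, allowlist={20}))
# ([0.5], [20])
```

With the allowlist reduced to one id, the result has a single column even though `k=2` was requested.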

`build.rs` calls `pyo3_build_config::add_extension_module_link_args()` so a bare `cargo build` on macOS produces a loadable extension module. Without it, the build fails with "symbol(s) not found for architecture arm64" (issue #92). Source: [turbovec-python/build.rs]().

The Python module surface is installed as `pip install turbovec[langchain]` (or `…[llama-index]`, `…[haystack]`, `…[agno]`) to pull the framework dependencies. Source: [turbovec-python/python/turbovec/langchain.py]() (the module docstring).

## Framework Integration Architecture

The four integrations follow an identical pattern: a Python class that holds a reference to a `TurboQuantIndex` or `IdMapIndex`, plus a sidecar dictionary keyed by a u64 handle that stores the original text, metadata, and any framework-specific payload. The index owns the vectors; the sidecar owns the document objects.

| Framework | File | Reference Store Mirrored |
|-----------|------|--------------------------|
| LangChain | `langchain.py` | `InMemoryVectorStore` |
| LlamaIndex | `llama_index.py` | `SimpleVectorStore` |
| Haystack | `haystack.py` | `InMemoryDocumentStore` |
| Agno | `agno.py` | `LanceDb` |

Key behavioural notes, each backed by source:

- **LangChain:** `TurboQuantVectorStore` is a `langchain_core.vectorstores.VectorStore` subclass. `max_marginal_relevance_search` raises `NotImplementedError` with a specific message because the quantized index discards full-precision vectors, making pairwise diversity computation impossible. Source: [turbovec-python/python/turbovec/langchain.py]() (`_MMR_MSG` constant).
- **LlamaIndex:** Nodes are stored with top-level `metadata`, `ref_doc_id`, and a `node_dict` (the canonical `node_to_metadata_dict` representation) so relationships (`PREVIOUS`/`NEXT`/`PARENT`/`CHILD`), `excluded_*_metadata_keys`, and `start/end_char_idx` survive a round-trip. Source: [turbovec-python/python/turbovec/llama_index.py]() (the `_nodes[nid] = {...}` assignment).
- **Haystack:** `get_metadata_fields_info` and `get_metadata_field_min_max` are implemented by scanning the sidecar, because the quantized index cannot answer them from vectors alone. Source: [turbovec-python/python/turbovec/haystack.py]().
- **Agno:** `upsert` captures the previous generation's handles, runs `insert`, and only then drops the old vectors, so a failed insert never destroys the data being replaced (issue #89). Source: [turbovec-python/python/turbovec/agno.py]() (the `upsert` method).
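The insert-then-delete ordering described for Agno can be sketched with a toy store. `ToyStore` and its methods are hypothetical and hold plain strings rather than vectors; the sketch only shows why a failed insert leaves the previous generation intact.

```python
class ToyStore:
    """Toy document store keyed by integer handle; not the Agno wrapper."""

    def __init__(self):
        self._next_handle = 0
        self.by_handle = {}  # handle -> (doc_id, text)

    def handles_for(self, doc_id):
        return [h for h, (d, _) in self.by_handle.items() if d == doc_id]

    def insert(self, doc_id, texts):
        # Validate up front so a failure never leaves a partial insert behind.
        if any(not text for text in texts):
            raise ValueError("empty text")
        for text in texts:
            self.by_handle[self._next_handle] = (doc_id, text)
            self._next_handle += 1

    def upsert(self, doc_id, texts):
        old = self.handles_for(doc_id)  # capture the previous generation
        self.insert(doc_id, texts)      # may raise; old data is still present
        for handle in old:              # only now drop what was replaced
            del self.by_handle[handle]

    def texts_for(self, doc_id):
        return [t for d, t in self.by_handle.values() if d == doc_id]


store = ToyStore()
store.insert("a", ["v1"])
store.upsert("a", ["v2"])
print(store.texts_for("a"))  # ['v2']
```

Deleting first and inserting second would be shorter, but a failure between the two steps would lose the document entirely.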

## Persistence, Duplicates, and Schema Versioning

Every integration persists through a plain JSON sidecar next to the binary `.tvim` index. There is no pickle, so `load`/`load_from_disk`/`from_persist_path` do not carry pickle's risk of executing code during deserialization. Each schema carries an explicit version number, and the loader rejects any version it does not recognise (e.g. `_DOCSTORE_SCHEMA_VERSION = 1`, `_NODES_SCHEMA_COMPAT`). Source: [turbovec-python/python/turbovec/langchain.py]() (loader block); [turbovec-python/python/turbovec/llama_index.py]() (same pattern); [turbovec-python/python/turbovec/haystack.py]() (`_DOCSTORE_SCHEMA_COMPAT`).

Duplicate handling is centralised in a `DuplicatePolicy` enum (e.g. `FAIL`, `NONE`, `OVERWRITE`) consumed by `from_texts` and `write_documents`. Source: [turbovec-python/python/turbovec/haystack.py]() (the `policy` parameter on `write_documents`).

## Community-Proposed Additional Bindings

Three open proposals extend the binding strategy beyond Python:

- **Node.js / TypeScript** (issues #85, #111): `napi-rs` addon over `TurboQuantIndex` and `IdMapIndex`, mirroring the Python surface.
- **Swift / macOS** (issue #86): UniFFI-based wrapper, motivated by limitations in Swift's `simd` module for the required SIMD kernels.
- **Standalone Rust** (issue #2): consume the SIMD search and bit-packing code from Rust applications without the Python toolchain. The core `turbovec` crate is already separate from the PyO3 build, and the downstream smoke test exercises the `cargo add turbovec` path.

Until any of these lands, the Python module remains the only supported language binding.

## See Also

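- Building and benchmarking instructions: [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)
- Contribution workflow and integration bar: [CONTRIBUTING.md](https://github.com/RyanCodrai/turbovec/blob/main/CONTRIBUTING.md)
- TurboQuant algorithm and RaBitQ correction: paper references in [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)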

---

<a id='page-4'></a>

## SIMD Search Kernels, Benchmarks & Multi-Language Bindings

### Related Pages

Related topics: [Architecture & TurboQuant Algorithm](#page-1), [Python Bindings & Framework Integrations](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [turbovec/src/search.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/search.rs)
- [turbovec/src/pack.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/src/pack.rs)
- [turbovec/examples/kernel_xtest.rs](https://github.com/RyanCodrai/turbovec/blob/main/turbovec/examples/kernel_xtest.rs)
- [examples/downstream-smoke/src/main.rs](https://github.com/RyanCodrai/turbovec/blob/main/examples/downstream-smoke/src/main.rs)
- [turbovec-python/python/turbovec/langchain.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/langchain.py)
- [turbovec-python/python/turbovec/llama_index.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/llama_index.py)
- [turbovec-python/python/turbovec/haystack.py](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/python/turbovec/haystack.py)
- [CONTRIBUTING.md](https://github.com/RyanCodrai/turbovec/blob/main/CONTRIBUTING.md)
- [README.md](https://github.com/RyanCodrai/turbovec/blob/main/README.md)
- [turbovec-python/README.md](https://github.com/RyanCodrai/turbovec/blob/main/turbovec-python/README.md)
</details>

# SIMD Search Kernels, Benchmarks & Multi-Language Bindings

## Overview

turbovec's competitive performance is delivered by a **SIMD-accelerated search pipeline** that runs the same algorithm on every supported architecture, paired with an architecture-specific layout chosen at repack time. Around the Rust core, the Python package adds a thin set of **framework integrations** (LangChain, LlamaIndex, Haystack), and proposed **FFI bindings** (Node.js via napi-rs, Swift/macOS via UniFFI) would expose the index to ecosystems beyond Python.

This page covers three coordinated concerns:

1. The SIMD kernels inside `turbovec/src/search.rs` and the layout work in `turbovec/src/pack.rs`.
2. The cross-architecture parity test and the downstream link smoke-test that guard the public API.
3. The current and proposed multi-language bindings, framed by community proposals #2, #85, #86, and #111.

## SIMD Search Kernels

The search module scores queries against quantized database vectors using **nibble-split lookup tables** dispatched to an architecture-specific kernel. According to `turbovec/src/search.rs`, the dispatch covers:

- **NEON on `aarch64`** — sequential code layout, with `score_4bit_block_neon` operating on nibble-masked bytes.
- **AVX-512BW on `x86_64`** when available, with an **AVX2 fallback** that uses FAISS-style `perm0`-interleaved layout.
- A **scalar fallback** for any other target.
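The lookup-table idea behind these kernels can be shown in scalar form. This is a sketch, not the code in `search.rs`: it assumes two 4-bit codes per byte with the even dimension in the low nibble, one 16-entry table per dimension, and integer-valued centroids chosen so the arithmetic is easy to follow.

```python
def build_luts(query, centroids):
    """One 16-entry table per dimension: luts[d][c] = query[d] * centroids[c]."""
    return [[q * c for c in centroids] for q in query]


def score_packed(packed, luts):
    """Score one packed vector by splitting each byte into two nibbles."""
    score = 0.0
    for byte_index, byte in enumerate(packed):
        low, high = byte & 0x0F, byte >> 4
        score += luts[2 * byte_index][low]       # even dimension (assumed layout)
        score += luts[2 * byte_index + 1][high]  # odd dimension
    return score


centroids = list(range(16))          # placeholder reconstruction values
query = [1.0, 2.0, 0.5, -1.0]
packed = bytes([0xF3, 0x80])         # codes 3, 15, 0, 8
print(score_packed(packed, build_luts(query, centroids)))  # 25.0
```

The tables are built once per query, so scoring a database vector costs one lookup and one add per dimension with no multiplications. A SIMD kernel performs many of these lookups per instruction.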

Kernel selection on x86 is performed at runtime via `is_x86_feature_detected!`. A test-only `FORCE_SCALAR_FALLBACK` flag (compiled under `cfg(test)`) forces the scalar path so that `score_query_into_heap` can be exercised even on hardware that would otherwise always pick a SIMD kernel.

A block-level early-exit optimisation is exposed for hybrid retrieval: the atomic counter `BLOCKS_SKIPPED_BY_MASK` (`turbovec/src/search.rs`) is incremented whenever a 32-vector block is short-circuited because no allowed slot falls within it. The companion functions `blocks_skipped_by_mask()` and `reset_blocks_skipped_by_mask()` are described as test-isolation hooks that production callers can also sample for telemetry.
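The skip decision can be sketched as follows. The 32-vector block size comes from the description above; the function name and the set-based allowlist are assumptions for illustration, and the real kernel works on a mask rather than a Python set.

```python
BLOCK_SIZE = 32


def scan_plan(n_vectors, allowed_slots):
    """Return (blocks to scan, number of blocks skipped) for an allowlist."""
    n_blocks = (n_vectors + BLOCK_SIZE - 1) // BLOCK_SIZE
    wanted = {slot // BLOCK_SIZE for slot in allowed_slots if 0 <= slot < n_vectors}
    to_scan = sorted(wanted)
    return to_scan, n_blocks - len(to_scan)


# 100 vectors form 4 blocks; slots 0 and 5 share block 0, slot 70 is in block 2.
print(scan_plan(100, {0, 5, 70}))  # ([0, 2], 2)
```

The skipped count plays the role of the counter mentioned above: a selective filter shows up as a large number of skipped blocks.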

### Layout: Why ARM and x86 Use Different Repacks

`turbovec/src/pack.rs` repacks bit-plane codes into a layout chosen by the architecture's SIMD needs:

- **x86**: FAISS-style `perm0`-interleaved for AVX2 cross-lane compatibility, with the fixed permutation `[0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]`.
- **ARM**: Sequential layout, suitable for NEON's natural byte ordering.

A scalar-fallback correctness fix is documented in `pack.rs` via `deinterleave_x86_code_byte`. The function exists because the scalar path — taken on pre-AVX2 x86 or VMs without AVX2 — decodes one sequential byte per vector. Without de-interleaving, it would read wrong bytes and return **silently-wrong top-k** results. The function reconstructs the high/low nibble of a code byte from the interleaved planes so the scalar kernel matches the SIMD kernel bit-for-bit (this is the regression cited in issue #106).
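Why a sequential reader must undo the interleave can be seen by applying the permutation quoted above to a 16-element block. This sketch treats the permutation as a gather (`stored[i] = original[PERM0[i]]`), which is an assumption; it works on list elements rather than nibble planes, and the helper names are hypothetical.

```python
PERM0 = [0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15]


def interleave(block):
    """Reorder a 16-element block into the interleaved storage order."""
    return [block[p] for p in PERM0]


def deinterleave(stored):
    """Invert interleave so elements come back in sequential order."""
    out = [None] * len(PERM0)
    for position, p in enumerate(PERM0):
        out[p] = stored[position]
    return out


sequential = list(range(100, 116))
stored = interleave(sequential)
print(stored[:4])                          # [100, 108, 101, 109]
print(deinterleave(stored) == sequential)  # True
```

Reading `stored` as if it were sequential would pair element 1 with what is really element 8, which is the kind of silent mismatch the de-interleave step prevents.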

## Benchmarking & Cross-Architecture Validation

The repository ships two complementary validation harnesses for the kernels and the public API.

**Cross-arch kernel parity (`turbovec/examples/kernel_xtest.rs`).** A deterministic smoke test that builds a `TurboQuantIndex` on a fixture parameterised by `<dim> <bits> <seed>`, then writes its results. The fixture is **fully deterministic from `seed`** so that ARM and x86 see exactly the same input — no dataset shuffling, no BLAS-backend noise — and the rotation is computed deterministically by turbovec from a fixed `ROTATION_SEED`. The example is run on each architecture and the output files are diffed: identical means the kernels agree, divergence means there's work to do.
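The seed-driven fixture idea can be sketched in a few lines. This is not the Rust example: the function names are hypothetical, and it relies on Python's seeded `random.Random` producing the same sequence on every platform. Hashing the packed values gives a short string that can be compared across machines in place of diffing full output files.

```python
import hashlib
import random
import struct


def make_fixture(n_vectors, dim, seed):
    """Generate the same pseudo-random vectors for a given seed on any machine."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_vectors)]


def fingerprint(vectors):
    """Hash the exact float bits so two runs can be compared by one string."""
    digest = hashlib.sha256()
    for vec in vectors:
        digest.update(struct.pack(f"<{len(vec)}d", *vec))
    return digest.hexdigest()


a = fingerprint(make_fixture(4, 8, seed=7))
b = fingerprint(make_fixture(4, 8, seed=7))
print(a == b)  # True
```

Because the input is identical by construction, any difference in downstream results can be attributed to the kernels rather than to the data.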

**Downstream link smoke test (`examples/downstream-smoke/src/main.rs`).** A standalone `cargo` crate that does `use turbovec::TurboQuantIndex;` and exercises the full public surface (construct, add, prepare, search, write, load). The example exists to verify the link-directive propagation from `turbovec`'s `build.rs` — a real failure mode where downstream `cargo add turbovec` users hit a `cblas_sgemm` symbol error at link time. If this binary links and runs end-to-end, downstream users will too.

The full benchmark suite (referenced in both `README.md` and `turbovec-python/README.md`) lives under `benchmarks/suite/` and is split by category: `speed_*arm*`, `speed_*x86*`, `recall_*`, and `compression.py`. Each script is self-contained and writes JSON to `benchmarks/results/`; `python3 benchmarks/create_diagrams.py` regenerates the published charts. Fixture datasets (OpenAI DBpedia d=1536, d=3072) are pulled with `python3 benchmarks/download_data.py`.

## Multi-Language Bindings

### Python (current)

`turbovec-python` exposes `TurboQuantIndex` and `IdMapIndex` through PyO3/Maturin. Framework adapters in the same wheel mirror the **canonical in-tree reference store** for each ecosystem — this is the bar called out in `CONTRIBUTING.md` for "drop-in replacement" wrappers:

| File | Reference mirrored |
| --- | --- |
| `turbovec-python/python/turbovec/langchain.py` | `langchain_core.vectorstores.InMemoryVectorStore` |
| `turbovec-python/python/turbovec/llama_index.py` | LlamaIndex `SimpleVectorStore`, with `BaseNode` round-trip via `node_to_metadata_dict` |
| `turbovec-python/python/turbovec/haystack.py` | Haystack 2.x `InMemoryDocumentStore` |

Each wrapper records deliberate deviations from the reference, for example the LangChain store's `add_documents` override that honours partial ids (the base class default drops the entire `ids` array on a single `None`).
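
The difference between the two id policies can be sketched without importing LangChain. The functions below are illustrative only: `drop_on_any_none` models the all-or-nothing behaviour the text attributes to the base class, `honour_partial_ids` models the override's intent, and neither name nor the UUID fallback is taken from `langchain.py`.

```python
import uuid
from typing import List, Optional, Sequence


def drop_on_any_none(ids: Sequence[Optional[str]]) -> List[str]:
    """All-or-nothing policy: a single None discards every caller-supplied id."""
    if any(i is None for i in ids):
        return [str(uuid.uuid4()) for _ in ids]
    return list(ids)


def honour_partial_ids(ids: Sequence[Optional[str]]) -> List[str]:
    """Partial policy: keep supplied ids, generate one only where it is missing."""
    return [i if i is not None else str(uuid.uuid4()) for i in ids]


supplied = ["doc-a", None, "doc-c"]
strict = drop_on_any_none(supplied)
partial = honour_partial_ids(supplied)
print(partial[0], partial[2])
```

Under the partial policy the caller can still address `doc-a` and `doc-c` by the ids they chose, which matters for later updates and deletes.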

### Standalone Rust (proposed, issue #2)

Issue #2 requests the Rust core be usable **without** the Python dependency. The current `cargo add turbovec` path is already exercised by `examples/downstream-smoke/src/main.rs`, which is the public-API smoke test for exactly this scenario.

### Node.js / TypeScript (proposed, issues #85 and #111)

Two community proposals (#85 and #111) describe a `turbovec-node` addon built on napi-rs, providing a thin wrapper over `TurboQuantIndex` and `IdMapIndex`. Both threads note that the repository is currently **invitation-only for PRs** (`CONTRIBUTING.md`) and ask for collaborator access to land the binding upstream.

### Swift / macOS (proposed, issue #86)

Issue #86 proposes `turbovec-swift` via Mozilla's UniFFI. The motivation given in the issue is that Swift's `simd` module does not provide the same primitive set the kernels need, so a UniFFI wrapper is preferred over a native Swift port.

## Community & Contribution Workflow

`CONTRIBUTING.md` describes an **issue-first, invitation-only** workflow: open or comment on an issue, discuss the design, then explicitly request contributor access before opening a PR. Only the maintainer merges to `main`. Integration contributions are expected to "structurally compare against the canonical in-tree reference store" — the same bar reflected in `langchain.py`, `llama_index.py`, and `haystack.py`. `Co-Authored-By:` trailers (e.g. for Claude-assisted commits) are explicitly fine to leave in place.

## See Also

- [Repository README](https://github.com/RyanCodrai/turbovec/blob/main/README.md) — high-level pitch, paper references, benchmark recipes
- [TurboQuant paper (arXiv 2504.19874)](https://arxiv.org/abs/2504.19874) — the algorithm implemented
- [RaBitQ paper (arXiv 2405.12497)](https://arxiv.org/abs/2405.12497) — per-vector length-renormalisation correction adapted in step 5
- [CONTRIBUTING.md](https://github.com/RyanCodrai/turbovec/blob/main/CONTRIBUTING.md) — workflow for proposing bindings #85, #86, #111

---

## Pitfall Log

Project: RyanCodrai/turbovec

Summary: Found 20 structured pitfall items, including 4 high/blocking items. Top priority: installation risk, which requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/85

## 2. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/95

## 3. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/65

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/70

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/86

## 6. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/107

## 7. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/106

## 8. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/111

## 9. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/RyanCodrai/turbovec

## 10. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/104

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/101

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/94

## 13. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/RyanCodrai/turbovec

## 14. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/RyanCodrai/turbovec

## 15. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/RyanCodrai/turbovec

## 16. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/105

## 17. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/110

## 18. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/RyanCodrai/turbovec/issues/92

## 19. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/RyanCodrai/turbovec

## 20. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/RyanCodrai/turbovec
