# hyperDB Project Manual

Repository: https://github.com/jdagdelen/hyperDB

Generated at: 2026-07-02 14:25:51 UTC

## Table of Contents

- [Introduction, Installation and Quick Start](#page-1)
- [HyperDB Core Architecture and API](#page-2)
- [Embeddings, Vector Math and Model Compatibility](#page-3)
- [Persistence, Deployment, Extensibility and Community Roadmap](#page-4)

<a id='page-1'></a>

## Introduction, Installation and Quick Start

### Related Pages

Related topics: [HyperDB Core Architecture and API](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)
- [setup.py](https://github.com/jdagdelen/hyperDB/blob/main/setup.py)
- [requirements.txt](https://github.com/jdagdelen/hyperDB/blob/main/requirements.txt)
- [demo/demo.py](https://github.com/jdagdelen/hyperDB/blob/main/demo/demo.py)
- [demo/pokemon.jsonl](https://github.com/jdagdelen/hyperDB/blob/main/demo/pokemon.jsonl)
- [hyperdb/__init__.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/__init__.py)
</details>

## Overview

HyperDB is a local vector database purpose-built for use with Large Language Model (LLM) agents. The project is published on PyPI under the distribution name `hyperdb-python` and exposes its public API through the top-level `hyperdb` package, with `HyperDB` re-exported from [hyperdb/__init__.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/__init__.py) for the canonical import path used throughout the documentation.

According to the [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md), HyperDB is positioned around three headline advantages: a simple interface compatible with all LLM agents, a highly optimized C++ backend vector store with hardware-accelerated operations via Intel MKL BLAS, and document indexing with advanced features such as ids and metadata. The README carries the tongue-in-cheek tagline "Not entirely a joke," so these claims are best read as the README's own framing and checked against the source before you depend on them.

The latest tagged release, **v0.1.2**, is also the first formally tagged version of the package. It incorporates a community-contributed improved save format from `@parasj`.

## Installation

### Standard install

HyperDB is distributed on PyPI and can be installed with `pip`:

```bash
pip install hyperdb-python
```

The package metadata, including version, author, and core dependency declarations, lives in [setup.py](https://github.com/jdagdelen/hyperDB/blob/main/setup.py), while the pinned runtime dependency list is enumerated in [requirements.txt](https://github.com/jdagdelen/hyperDB/blob/main/requirements.txt). Reviewers and integrators should consult these files to confirm the exact dependency versions in use.

### Optional: local embedding support

If you plan to embed documents using a local Hugging Face model instead of a remote embedding API, install the optional dependency as documented in the [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md#optional-dependency-for-locally-embedding):

```bash
pip install sentence-transformers
```

This dependency is optional so that users who rely only on remote embedders (for example, an OpenAI-based pipeline) do not pay the install cost of large PyTorch and transformer libraries. The choice of embedder also bears on issue #1, which asks about LLaMA embeddings support: LLaMA embeddings are not normalized to unit length, a known friction point for pipelines that assume cosine-similarity semantics.
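One way to handle the optional dependency at runtime is to check whether it is importable before choosing an embedder. The sketch below uses only the standard library; `has_local_embedder` is a hypothetical helper name, not part of HyperDB.

```python
import importlib.util


def has_local_embedder(module_name: str = "sentence_transformers") -> bool:
    """Return True if the optional local-embedding package is importable."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if has_local_embedder():
        print("sentence-transformers found: local embedding is available")
    else:
        print("sentence-transformers missing: use a remote embedder instead")
```

The check does not import the package itself, so it stays cheap even when the heavy dependencies are installed.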

## Quick Start

The fastest way to understand HyperDB end-to-end is to walk through the canonical example. The [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) and the accompanying [demo/demo.py](https://github.com/jdagdelen/hyperDB/blob/main/demo/demo.py) script demonstrate a complete load → index → save → query cycle against the 151 original Pokémon records in [demo/pokemon.jsonl](https://github.com/jdagdelen/hyperDB/blob/main/demo/pokemon.jsonl).

```python
import json
from hyperdb import HyperDB

# 1. Load documents from the JSONL file
documents = []
with open("demo/pokemon.jsonl", "r") as f:
    for line in f:
        documents.append(json.loads(line))

# 2. Instantiate HyperDB, indexing on a nested key
db = HyperDB(documents, key="info.description")

# 3. Persist the index to disk
db.save("demo/pokemon_hyperdb.pickle.gz")

# 4. Reload it later (e.g., in another process)
db.load("demo/pokemon_hyperdb.pickle.gz")

# 5. Query with natural language
results = db.query("Likes to sleep.", top_k=5)
```

### End-to-end workflow

The flow above corresponds to the following sequence of operations, which is useful to keep in mind when reasoning about deployment topologies:

```mermaid
flowchart LR
    A[JSONL documents<br/>demo/pokemon.jsonl] --> B[HyperDB documents<br/>list of dicts]
    B --> C[HyperDB instance<br/>HyperDB documents, key]
    C --> D[Persist<br/>db.save pickle.gz]
    D --> E[Reload<br/>db.load]
    E --> F[Query<br/>db.query text, top_k]
    F --> G[Top-k results<br/>with id and metadata]
```

### Key parameters and document shape

| Parameter / concept | Where used | Purpose |
|---|---|---|
| `documents` | `HyperDB(documents, key=...)` | List of dict-like objects to index |
| `key` | `HyperDB(documents, key=...)` | Dotted path to the field whose text is embedded (e.g., `"info.description"`) |
| `top_k` | `db.query(text, top_k=5)` | Number of nearest neighbors to return |
| ids, metadata | document dicts | Document features the README advertises for indexing |

Source: [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) and [hyperdb/__init__.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/__init__.py).
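To make the `documents` and `key` parameters concrete, the following sketch parses JSON Lines text and follows a dotted key through nested dictionaries. It illustrates the idea only and is not HyperDB's own code; `parse_jsonl` and `get_by_dotted_key` are hypothetical helper names, and the sample records are made up rather than copied from `demo/pokemon.jsonl`.

```python
import json
from typing import Any


def get_by_dotted_key(document: dict, key: str) -> Any:
    """Follow a dotted path such as 'info.description' through nested dicts."""
    value: Any = document
    for part in key.split("."):
        value = value[part]  # raises KeyError if the path is missing
    return value


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Illustrative records only.
    sample = (
        '{"name": "A", "info": {"description": "Sleeps all day."}}\n'
        '{"name": "B", "info": {"description": "Runs very fast."}}\n'
    )
    docs = parse_jsonl(sample)
    print([get_by_dotted_key(d, "info.description") for d in docs])
```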

### Expected query output

When queried with `"Likes to sleep."` and `top_k=5` against the Pokémon corpus, the README reports results such as Snorlax, Drowzee, Pinsir, Abra, and Venonat, with each record including the Pokédex ID, HP, Type, Weakness, and Description fields from the underlying JSONL record. This demonstrates that the returned payloads preserve the full document context rather than only the matched text.

## Common Failure Modes and Community Notes

Several recurring questions in the project's issue tracker are relevant when getting started:

- **Embedding normalization (Issue #1):** LLaMA embeddings are not unit-normalized. If your pipeline assumes normalized vectors, dot-product scores will be weighted by vector magnitude as well as direction. The README does not claim LLaMA support out of the box, so validate cosine versus dot-product semantics for your use case.
- **Deployment topology (Issue #2):** Community members have asked about Kubernetes support for cloud deployments. HyperDB is currently a local library that serializes to a single `pickle.gz` file via `db.save`; there is no first-class distributed or remote-server mode in the released code.
- **Enterprise features (Issue #13):** Cloud deployment, access management, and data compliance are not part of the current surface area; they would need to be built atop the local `HyperDB` primitive.
- **Naming (Issue #11):** A rebrand to "HyperBS" has been suggested. The package on PyPI remains `hyperdb-python` and the import remains `from hyperdb import HyperDB`.

## Next Steps

After completing this quick start, the natural next pages in the wiki cover:

- The `HyperDB` class API in detail (construction, save/load, query internals).
- Embedding configuration and how to plug in custom embedding backends.
- Persistence format details (the improved `pickle.gz` save format shipped in v0.1.2).

## See Also

- [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)
- [setup.py](https://github.com/jdagdelen/hyperDB/blob/main/setup.py)
- [requirements.txt](https://github.com/jdagdelen/hyperDB/blob/main/requirements.txt)
- [demo/demo.py](https://github.com/jdagdelen/hyperDB/blob/main/demo/demo.py)
- [demo/pokemon.jsonl](https://github.com/jdagdelen/hyperDB/blob/main/demo/pokemon.jsonl)
- [hyperdb/__init__.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/__init__.py)

---

<a id='page-2'></a>

## HyperDB Core Architecture and API

### Related Pages

Related topics: [Introduction, Installation and Quick Start](#page-1), [Embeddings, Vector Math and Model Compatibility](#page-3), [Persistence, Deployment, Extensibility and Community Roadmap](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [hyperdb/hyperdb.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/hyperdb.py)
- [hyperdb/__init__.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/__init__.py)
- [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)
- [setup.py](https://github.com/jdagdelen/hyperDB/blob/main/setup.py)
- [demo/pokemon.jsonl](https://github.com/jdagdelen/hyperDB/blob/main/demo/pokemon.jsonl)
- [demo/demo.py](https://github.com/jdagdelen/hyperDB/blob/main/demo/demo.py)
- [requirements.txt](https://github.com/jdagdelen/hyperDB/blob/main/requirements.txt)
</details>

## Overview

HyperDB is a hyper-fast, local vector database packaged for use with LLM agents. The package is presented as a thin Python façade over a C++-accelerated similarity backend (Intel MKL BLAS). It trades distributed-server complexity for a single in-process `HyperDB` object that can index documents, persist itself to disk, and answer nearest-neighbor queries in a single call. The `HyperDB` class is defined in [hyperdb/hyperdb.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/hyperdb.py) and imported in [hyperdb/__init__.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/__init__.py), so consumers write `from hyperdb import HyperDB`, matching the canonical snippet in the [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md).

The library's design priorities are visible directly in the top-level docstring and in the list of advertised advantages:

| Priority | How it shows up in the source |
| --- | --- |
| Single-file deployment | `HyperDB.save()` writes a single `.pickle.gz` artifact; `HyperDB.load()` reverses it. Source: [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) |
| Hardware-accelerated math | README cites "MKL BLAS" as the optimized backend. |
| Agent-friendly ergonomics | One class, three operations (`save`, `load`, `query`). |

## High-Level Architecture

At runtime, a HyperDB instance is composed of three cooperating pieces: the document store, the embedding matrix, and the configuration metadata. The embedding matrix is computed when documents are indexed in the constructor and is saved together with the documents by `db.save()`. The [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) describes the latest v0.1.2 release as incorporating an "improved save format", indicating that the on-disk layout has evolved.

```mermaid
flowchart LR
    A[JSONL Documents] --> B[HyperDB.__init__]
    B --> C[Document Store<br/>list of dicts]
    B --> D[Key Extractor<br/>key='info.description']
    D --> E[Embedding Backend<br/>OpenAI or sentence-transformers]
    E --> F[Embeddings Matrix]
    F --> G[MKL BLAS Nearest Neighbor]
    G --> H[Top-k Results]
    H --> I[Return original documents]
    J[db.save] --> K[.pickle.gz file]
    K --> L[db.load]
    L --> B
```

The flow above matches the constructor call `HyperDB(documents, key="info.description")` shown in the [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md). `documents` is the raw corpus (each row in [demo/pokemon.jsonl](https://github.com/jdagdelen/hyperDB/blob/main/demo/pokemon.jsonl) is a JSON object), and the `key` parameter points at the field whose string value should be embedded.
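The data flow in the diagram can be sketched in pure Python. This is a minimal illustration under stated assumptions, not HyperDB's implementation: `TinyVectorStore`, `toy_embed`, and `cosine` are hypothetical names, and the letter-count embedder stands in for a real embedding model.

```python
import math
from typing import Callable


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in embedder: counts of each letter a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Illustrative only: a document list plus one vector per document."""

    def __init__(self, documents: list[dict], key: str,
                 embed: Callable[[str], list[float]] = toy_embed):
        self.documents = documents
        self.key = key
        self.embed = embed
        # Vectors are computed once, when the store is built.
        self.vectors = [embed(self._text(doc)) for doc in documents]

    def _text(self, doc: dict) -> str:
        value = doc
        for part in self.key.split("."):  # follow the dotted key path
            value = value[part]
        return value

    def query(self, text: str, top_k: int = 5) -> list[dict]:
        q = self.embed(text)
        scored = sorted(
            zip(self.vectors, self.documents),
            key=lambda pair: cosine(q, pair[0]),
            reverse=True,
        )
        return [doc for _, doc in scored[:top_k]]  # original documents


if __name__ == "__main__":
    docs = [
        {"name": "A", "info": {"description": "zzz sleepy zzz"}},
        {"name": "B", "info": {"description": "quick runner"}},
    ]
    store = TinyVectorStore(docs, key="info.description")
    print(store.query("zzz", top_k=1))
```

The query path embeds the text with the same function used at build time and returns whole documents, which mirrors the last two nodes of the diagram.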

## Public API

The package deliberately keeps its public surface tiny. The following is re-exported through [hyperdb/__init__.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/__init__.py):

| Method / Attribute | Purpose | Source |
| --- | --- | --- |
| `HyperDB(documents, key, ...)` | Build an index from a list of dicts and the field name to embed. | [hyperdb/hyperdb.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/hyperdb.py) |
| `db.query(text, top_k=5)` | Return the `top_k` most similar source documents to `text`. | [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) |
| `db.save(path)` | Persist the index to a gzip-compressed pickle. | [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) |
| `db.load(path)` | Restore an index from a previously saved pickle. | [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) |

[demo/demo.py](https://github.com/jdagdelen/hyperDB/blob/main/demo/demo.py) is the canonical driver script that wires these four calls together end to end. It is the closest reference implementation a new user can copy.

## Embedding Backends and Persistence

HyperDB ships with optional embedding support and persistence:

- Installation via `pip install hyperdb-python`; package metadata is declared in [setup.py](https://github.com/jdagdelen/hyperDB/blob/main/setup.py).
- Optional Hugging Face / sentence-transformers support via `pip install sentence-transformers`, a recommended add-on documented in the [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md).
- Runtime dependencies (e.g., `numpy`) are declared in [requirements.txt](https://github.com/jdagdelen/hyperDB/blob/main/requirements.txt).

Because the v0.1.2 release "incorporates an improved save format" ([README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)), files written by earlier versions may not load under v0.1.2, and files written by v0.1.2 may not load under earlier versions. Treat `db.save()` output as opaque to other libraries and pin the HyperDB version per project.

## Known Limitations Reflected in Community Discussions

Several limitations are surfaced directly by users and shape how this API can be deployed today:

- **No server process.** HyperDB runs in-process and persists to a pickle on disk, so requests for "cloud deployment, access management, data compliance" (community issue #13) are out of scope for the core API; each consumer process must load its own copy of the saved index. The [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) presents the database as a local artifact.
- **No Kubernetes / distributed mode.** Issue #2 asks about K8s support. The current API does not include a server entry point or a shared-state layer, so one workable approach is to bake the `.pickle.gz` into a container image.
- **Embedding normalization is left to the caller.** Issue #1 reports that OpenAI embeddings are unit-normalized while LLaMA-style embeddings are not, so users who want cosine-similarity rankings may need to normalize vectors themselves, depending on the embedding backend they use.
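To see why normalization matters, the sketch below ranks two hand-made vectors against a query, first by raw dot product and then by cosine similarity. The vectors are illustrative values chosen for easy arithmetic, not output from any real embedding model.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Dot product divided by the product of the two vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


query = [1.0, 0.0]
aligned_small = [0.5, 0.0]    # same direction as the query, small magnitude
off_axis_large = [3.0, 4.0]   # different direction, large magnitude

# Raw dot product favors the large vector: 3.0 versus 0.5.
dot_scores = {"aligned_small": dot(query, aligned_small),
              "off_axis_large": dot(query, off_axis_large)}

# Cosine similarity favors the aligned vector: 1.0 versus 0.6.
cos_scores = {"aligned_small": cosine(query, aligned_small),
              "off_axis_large": cosine(query, off_axis_large)}

print(max(dot_scores, key=dot_scores.get))  # off_axis_large
print(max(cos_scores, key=cos_scores.get))  # aligned_small
```

The two metrics disagree on the top result here, which is the situation a caller needs to rule out when mixing normalized and non-normalized embeddings.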

## See Also

- [Project README](https://github.com/jdagdelen/hyperDB/blob/main/README.md) — installation, quickstart, and benchmarks.
- [Demo driver](https://github.com/jdagdelen/hyperDB/blob/main/demo/demo.py) — end-to-end Pokémon example.
- [Package configuration](https://github.com/jdagdelen/hyperDB/blob/main/setup.py) — declared dependencies and entry points.

---

<a id='page-3'></a>

## Embeddings, Vector Math and Model Compatibility

### Related Pages

Related topics: [HyperDB Core Architecture and API](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)
- [hyperdb/hyperdb.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/hyperdb.py)
- [hyperdb/galaxy_brain_math_shit.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/galaxy_brain_math_shit.py)
- [demo/pokemon.jsonl](https://github.com/jdagdelen/hyperDB/blob/main/demo/pokemon.jsonl)
- [setup.py](https://github.com/jdagdelen/hyperDB/blob/main/setup.py)
</details>

## Overview

HyperDB is positioned as a "hyper-fast local vector database for use with LLM Agents" with a "highly optimized C++ backend vector store with HW accelerated operations via MKL BLAS" (Source: [README.md:11](https://github.com/jdagdelen/hyperDB/blob/main/README.md)). The system pairs a lightweight Python interface with a compiled math kernel, enabling similarity search over document embeddings without spinning up a separate vector server.

The page focuses on three intertwined concerns:

1. How HyperDB represents documents as embedding vectors.
2. The vector math primitives it relies on for similarity scoring.
3. Which embedding models are compatible out of the box and how users can plug in alternatives.

## Vector Math Backend

HyperDB advertises a native C++ backend plus the MKL BLAS library for hardware-accelerated linear algebra (Source: [README.md:11](https://github.com/jdagdelen/hyperDB/blob/main/README.md)). In Python, the math surface lives in a module named `galaxy_brain_math_shit.py`, which provides the similarity and ranking routines used by `HyperDB.query` (Source: [hyperdb/galaxy_brain_math_shit.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/galaxy_brain_math_shit.py)).

The numerical contract is straightforward: cosine similarity between two embedding vectors is their dot product divided by the product of their norms. BLAS routines such as `cblas_sdot` are typically SIMD-vectorized, so a brute-force similarity sweep across thousands of vectors is usually fast on commodity CPUs, though actual latency depends on vector dimensionality and hardware.
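For reference, the standard definition of cosine similarity reads as follows. The symbols $u$, $v$, and $n$ (the shared dimensionality) are notation introduced here, not taken from the repository.

```latex
\cos(u, v)
  = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}
         {\sqrt{\sum_{i=1}^{n} u_i^{2}} \, \sqrt{\sum_{i=1}^{n} v_i^{2}}}
```

When both vectors have unit length, the denominator equals 1 and cosine similarity reduces to the plain dot product.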

```mermaid
flowchart LR
    A[Documents] --> B[Embedder]
    B --> C[Vector Store]
    C --> D[C++/MKL BLAS]
    D --> E[Ranked Results]
    Q[Query Text] --> B
```

A practical consequence is that HyperDB is largely agnostic to the embedding's exact dimensionality, as long as all stored vectors share the same length — the dot-product kernel has no architectural preference between 384, 768, 1536, or 4096 dimensions.

## Embedding Pipeline

The `HyperDB` class accepts a list of dictionaries and a `key` that selects which field to embed (Source: [README.md:33](https://github.com/jdagdelen/hyperDB/blob/main/README.md)). Internally the constructor instantiates an embedding helper, iterates the documents, computes one vector per item, and stores the result alongside the original record. Subsequent queries (`db.query("…", top_k=5)`) embed the query with the same helper and then delegate the ranking pass to the C++/MKL backend (Source: [README.md:53](https://github.com/jdagdelen/hyperDB/blob/main/README.md)).

Because embeddings and metadata live in the same in-memory object, persisting and restoring a database is a single pickle+gzip round-trip. The file format was improved in the first tagged release, v0.1.2 (Source: [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)).

## Model Compatibility

The repository ships with two embedder paths:

| Embedder | Install | Notes |
|----------|---------|-------|
| OpenAI API | Default in examples | Cloud-hosted; requires API key |
| Hugging Face (sentence-transformers) | `pip install sentence-transformers` | Fully local; optional dependency |

The OpenAI path is exercised in the README's Pokémon demo, which embeds `info.description` fields through the remote embedding endpoint (Source: [README.md:33](https://github.com/jdagdelen/hyperDB/blob/main/README.md)). Users who prefer an offline stack opt into the Hugging Face path explicitly by installing the extra dependency (Source: [README.md:25](https://github.com/jdagdelen/hyperDB/blob/main/README.md)).

### Choosing an embedder

Because HyperDB performs its own similarity math rather than delegating it to the embedder, any model that emits a 1-D float vector will work — including OpenAI `text-embedding-3-*`, `bge-*`, `e5-*`, MiniLM, and MPNet variants. The only requirements are:

- All stored vectors must share the same dimensionality.
- All stored vectors must come from the same model (mixed-model corpora distort cosine geometry).
- The query embedder must match the index embedder.
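The first requirement in the list can be checked mechanically before indexing. A minimal sketch; `check_same_dimension` is a hypothetical helper, not a HyperDB function.

```python
def check_same_dimension(vectors: list[list[float]]) -> int:
    """Return the shared dimensionality, or raise ValueError on a mismatch."""
    if not vectors:
        raise ValueError("no vectors to check")
    expected = len(vectors[0])
    for index, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {expected}"
            )
    return expected


if __name__ == "__main__":
    print(check_same_dimension([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))  # 3
```

The other two requirements cannot be verified from the vectors alone, so one option is to record the embedding model's name next to the saved index and compare it at query time.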

### Community discussion on LLaMA embeddings

A community thread (issue #1) asked whether HyperDB would natively support LLaMA-style embeddings, noting that "LLaMA embeddings are not normalized to unit length like the OpenAI embeddings, meaning you can both represent direction and magnitude". This highlights a subtle point: the README demo implicitly assumes normalized vectors because the demo embedder produces them, but the underlying MKL kernel performs an unconstrained dot product. Users supplying non-normalized embeddings from a LLaMA-family model should either normalize before insertion or expect results to be weighted by vector magnitude. The README does not document a switch for this normalization step, so callers that need it must wrap their embedder.
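One way to wrap an embedder so that it always returns unit-length vectors is sketched below. The names `normalized` and `fake_embedder` are hypothetical, and the fake embedder returns fixed illustrative values rather than calling any model. How such a wrapper would be handed to HyperDB is not covered by the sources cited on this page.

```python
import math
from typing import Callable

Embedder = Callable[[str], list[float]]


def normalized(embed: Embedder) -> Embedder:
    """Wrap an embedder so every returned vector has unit length."""
    def wrapper(text: str) -> list[float]:
        vector = embed(text)
        length = math.sqrt(sum(x * x for x in vector))
        if length == 0.0:
            return vector  # leave an all-zero vector unchanged
        return [x / length for x in vector]
    return wrapper


def fake_embedder(text: str) -> list[float]:
    """Hypothetical stand-in that returns a non-normalized vector."""
    return [3.0, 4.0]


if __name__ == "__main__":
    unit_embedder = normalized(fake_embedder)
    print(unit_embedder("any text"))  # [0.6, 0.8]
```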

## Operational Considerations

- **Locality**: All vector math runs in-process via MKL; no network round-trip occurs during a query, only during embedding (if using a remote model).
- **Portability**: A saved database is a single file that can be copied between machines. Loading it requires a compatible Python environment with HyperDB and its dependencies installed, since pickle files are tied to the code that wrote them.
- **Scale**: Because the backend is a single-node, in-process library, the project has open community threads around Kubernetes/cloud deployment (issue #2) and enterprise features such as access management and compliance tooling (issue #13). Until those land, HyperDB is best framed as an embedded library rather than a managed service.
- **Versioning**: The first tagged release (v0.1.2) shipped an improved save format, which matters for cross-team reproducibility of embedding-backed snapshots.

## See Also

- [Introduction, Installation and Quick Start](#page-1)
- [HyperDB Core Architecture and API](#page-2)
- [Persistence, Deployment, Extensibility and Community Roadmap](#page-4)

---

<a id='page-4'></a>

## Persistence, Deployment, Extensibility and Community Roadmap

### Related Pages

Related topics: [HyperDB Core Architecture and API](#page-2), [Embeddings, Vector Math and Model Compatibility](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)
- [setup.py](https://github.com/jdagdelen/hyperDB/blob/main/setup.py) *(implied by PyPI distribution mechanics described in README.md)*
- [requirements.txt](https://github.com/jdagdelen/hyperDB/blob/main/requirements.txt) *(implied by installation instructions in README.md)*
- [hyperdb/hyperdb.py](https://github.com/jdagdelen/hyperDB/blob/main/hyperdb/hyperdb.py) *(referenced as the main package import `from hyperdb import HyperDB`)*
- [demo/pokemon.jsonl](https://github.com/jdagdelen/hyperDB/blob/main/demo/pokemon.jsonl) *(referenced in the README usage example)*

> **Note:** Only `README.md` was directly available in the retrieval context used for this page. References to additional files are based on the project structure implied by `setup.py`-style packaging, the import statement in the README, and PyPI distribution metadata. Any claim that cannot be substantiated from `README.md` or the community context is explicitly flagged.
</details>

This page covers how hyperDB stores its state, how it is distributed and installed, how it is extended by users, and what the community has signaled as upcoming priorities. The information below is anchored to [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md) and supplemented with discussion points from the issue tracker.

## 1. Persistence Model

hyperDB persists state through a single-file serialized representation. The README demonstrates that an in-memory `HyperDB` instance can be written to disk and reloaded transparently:

```python
# Source: README.md (Usage → "Save the HyperDB instance to a file")
db.save("demo/pokemon_hyperdb.pickle.gz")
db.load("demo/pokemon_hyperdb.pickle.gz")
results = db.query("Likes to sleep.", top_k=5)
```

The `pickle.gz` extension reveals two deliberate choices: Python's `pickle` for object serialization, and `gzip` for on-disk compression. The save format was improved in v0.1.2 (the first tagged release) — community context notes that this release "incorporates @parasj's improved save format," which is the only persistence-related change currently documented.

Because persistence is a single-file `.pickle.gz`, deployment artifacts are trivially portable: a database can be shipped alongside the application binary, attached as a sidecar, or stored in object storage and rehydrated lazily. There is no separately documented server, daemon, or write-ahead log in the README; persistence is purely local at this stage.
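The pickle-plus-gzip combination described in this section can be reproduced with the standard library alone. The sketch below shows a generic gzip-compressed pickle round-trip; it does not reproduce HyperDB's actual on-disk layout, and the helper names and payload keys are illustrative assumptions.

```python
import gzip
import os
import pickle
import tempfile


def save_gzip_pickle(obj, path: str) -> None:
    """Serialize obj with pickle and compress it with gzip."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)


def load_gzip_pickle(path: str):
    """Inverse of save_gzip_pickle. Only load files you trust."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    payload = {"documents": [{"name": "A"}], "vectors": [[0.6, 0.8]]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.pickle.gz")
        save_gzip_pickle(payload, path)
        assert load_gzip_pickle(path) == payload
```

Unpickling can execute arbitrary code, so a `.pickle.gz` file pulled from object storage or a shared volume should come from a trusted source.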

### Persistence Flow

```mermaid
flowchart LR
    A[In-memory HyperDB] -->|db.save| B[.pickle.gz file]
    B -->|db.load| A
    B -->|ship / archive| C[(Object storage<br/>or sidecar)]
    C -->|download| B
```

## 2. Deployment and Packaging

hyperDB is distributed as a Python package on PyPI under the name `hyperdb-python` (Source: [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md)). The supported installation path is:

```bash
pip install hyperdb-python
```

There is one **optional dependency**, declared separately so that the base package remains lightweight for callers that already manage their own embedding model client:

```bash
pip install sentence-transformers
```

The optional split implies a deployment posture where `hyperDB` itself is a stateful vector store and the embedding model is pluggable infrastructure. There is **no documented Docker image, Helm chart, or Kubernetes manifest** in the README. Community issue **#2 ("K8 Support")** explicitly asks whether Kubernetes-style cloud deployments are on the roadmap; the absence of such artifacts in the README is consistent with that question being open rather than resolved.

| Concern | Current State (per README + issues) |
|---|---|
| Distribution channel | PyPI package (`hyperdb-python`) |
| Install mode | `pip install`, base + optional `sentence-transformers` |
| Container image | Not published (open: issue #2) |
| Cloud-managed offering | Not published (open: issue #13) |
| Access management / auth | Out of scope today (open: issue #13) |

## 3. Extensibility Surface

The README surfaces two extension points that consumers actually use:

1. **Document shape.** The constructor accepts arbitrary documents and a `key` argument that points into a nested field path, as in `HyperDB(documents, key="info.description")` from the [README.md](https://github.com/jdagdelen/hyperDB/blob/main/README.md). Callers can therefore store any JSON-serializable record rather than a fixed schema.
2. **Embedding backend.** The optional `sentence-transformers` integration lets users swap in Hugging Face models. A community thread (**#1 "🦙 LLAMA embeddings"**) notes that LLaMA embeddings are *not* normalized to unit length and therefore carry both direction and magnitude; integrating them would require either a normalization toggle or a distance metric other than the current default.

No plugin loader, embedding-protocol class, or adapter registry is described in the README. Today, extending hyperDB means installing `sentence-transformers` and configuring whichever model the user wants, rather than registering a new backend through a documented hook.

## 4. Community Roadmap Signals

Community discussion clusters around three themes that directly shape persistence, deployment, and extensibility:

- **Cloud-native deployment.** Issue **#2 (K8 Support)** asks about Kubernetes support for cloud rollouts. Combined with issue **#13 ("Enterprise features")**, which requests cloud deployment, access management, and data compliance, the community is signalling that single-host `pickle.gz` files are acceptable for prototyping but not for production multi-tenant use.
- **Embedding coverage.** Issue **#1 (LLAMA embeddings)** asks for non-normalized vectors to be supported, which is a feature request against the extensibility surface described above.
- **Project branding.** Issue **#11 ("Rebranding to HyperBS")** is a naming and narrative discussion rather than a code change; it does not affect persistence or deployment mechanics.

The latest tagged release is **v0.1.2**, which is documented in the community context as "the first tagged release" and which incorporates @parasj's improved save format — currently the only roadmap item that has actually shipped on the persistence track.

## See Also

- [README.md — Installation & Usage](https://github.com/jdagdelen/hyperDB/blob/main/README.md)
- GitHub issues: #2 (K8 Support), #13 (Enterprise features), #11 (Rebranding to HyperBS), #1 (LLAMA embeddings)

---


## Pitfall Log

Project: jdagdelen/hyperDB

Summary: Found 10 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/jdagdelen/hyperDB/issues/36

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/jdagdelen/hyperDB

## 3. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/jdagdelen/hyperDB/issues/30

## 4. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: packet_text.keyword_scan | https://github.com/jdagdelen/hyperDB

## 5. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/jdagdelen/hyperDB

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/jdagdelen/hyperDB

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/jdagdelen/hyperDB

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/jdagdelen/hyperDB/issues/35

## 9. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/jdagdelen/hyperDB

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/jdagdelen/hyperDB

