# https://github.com/superduper-io/superduper Project Manual

Generated at: 2026-06-23 17:00:48 UTC

## Table of Contents

- [Overview and Core Architecture](#page-overview)
- [Data Management and Database Backends](#page-databackends)
- [AI/ML Model Integration and Vector Search](#page-aiml-integration)
- [Security, Components, and Extensibility](#page-security-extensibility)

<a id='page-overview'></a>

## Overview and Core Architecture

### Related Pages

Related topics: [Data Management and Database Backends](#page-databackends), [AI/ML Model Integration and Vector Search](#page-aiml-integration), [Security, Components, and Extensibility](#page-security-extensibility)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/superduper-io/superduper/blob/main/README.md)
- [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md)
- [plugins/template/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md)
- [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md)
- [plugins/torch/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/torch/README.md)
- [plugins/openai/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/openai/README.md)
- [plugins/sentence_transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sentence_transformers/README.md)
- [plugins/vllm/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/vllm/README.md)
</details>

# Overview and Core Architecture

## Purpose and Scope

Superduper is an open-source framework for integrating AI directly with databases, including streaming inference, scalable model training, and vector search. The project ships as a small, dependency-light core package (`superduper-framework`) that delegates database connectivity and AI model integrations to independently installable plugins. Source: [README.md](https://github.com/superduper-io/superduper/blob/main/README.md).

The repository itself focuses on the framework core and the plugin ecosystem. Per [README.md](https://github.com/superduper-io/superduper/blob/main/README.md), installation is split into three layers:

1. The base framework: `pip install "superduper-framework>=0.7.0"` (quoted so the shell does not treat `>` as a redirect).
2. One or more databackend plugins (e.g. `superduper-mongodb`, `superduper-sql`, `superduper-snowflake`, `superduper-redis`).
3. Optional use-case plugins for specific model providers or model families.

This layered design lets users keep their runtime small and pull in only the integrations they need.
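
Because the install is split across several packages, it can help to check which layers are actually present in an environment. The following is a minimal sketch using only the standard library; the helper name `installed_superduper_packages` is hypothetical and not part of Superduper.

```python
from importlib import metadata


def installed_superduper_packages() -> dict[str, str]:
    """Return {distribution name: version} for installed superduper packages."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        # PyPI treats '-' and '_' as equivalent, so normalise before matching.
        if name.lower().replace("_", "-").startswith("superduper"):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    for name, version in sorted(installed_superduper_packages().items()):
        print(f"{name}=={version}")
```

An empty result means that neither the base framework nor any plugin is installed in the active environment.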

## Core Architecture and Plugin Model

The architecture is a thin framework surrounded by domain-specific plugins. According to [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md), "Superduper plugins are a collection of plugins that provide additional functionality to the Superduper framework," and contributors are guided to extend the system through a dedicated plugin and template workflow.

```mermaid
graph TD
    A[superduper-framework core] --> B[Databackend plugins]
    A --> C[Model / AI plugins]
    A --> D[Utility plugins]
    B --> B1[superduper-mongodb]
    B --> B2[superduper-sql]
    B --> B3[superduper-snowflake]
    B --> B4[superduper-redis]
    C --> C1[superduper_openai]
    C --> C2[superduper_anthropic]
    C --> C3[superduper_cohere]
    C --> C4[superduper_torch]
    C --> C5[superduper_transformers]
    C --> C6[superduper_vllm]
    C --> C7[superduper_llamacpp]
    C --> C8[superduper_sentence_transformers]
    C --> C9[superduper_jina]
    C --> C10[superduper_sklearn]
    D --> D2[superduper_pillow]
```

### Framework + Databackends

The base framework is database-agnostic. Users connect to a specific database by installing the matching databackend plugin and calling `superduper(...)` with a connection URI. The `superduper-sql` plugin, for example, uses the [ibis](https://ibis-project.org/) project to expose an ibis-compatible query API with additional support for complex data types and vector search. Source: [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md).

```python
from superduper import superduper
db = superduper('mysql://<mysql-uri>')
# or
db = superduper('postgres://<postgres-uri>')
```
Source: [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md).

### Framework + AI Models

AI integrations follow the same pattern. Each model provider ships as its own plugin that exposes predictor classes. For instance:

- `superduper_openai` exposes `OpenAIEmbedding`, `OpenAIChatCompletion`, `OpenAIImageCreation`, `OpenAIImageEdit`, `OpenAIAudioTranscription`, and `OpenAIAudioTranslation`. Source: [plugins/openai/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/openai/README.md).
- `superduper_torch` wraps arbitrary PyTorch modules via `TorchModel` and adds a `TorchTrainer` for training. Source: [plugins/torch/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/torch/README.md).
- `superduper_transformers` wraps Hugging Face `transformers` pipelines and provides an `LLM` class. Source: [plugins/transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/transformers/README.md).
- `superduper_vllm` and `superduper_llamacpp` provide self-hosted LLM serving via [vLLM](https://github.com/vllm-project/vllm) and [Llama.cpp](https://github.com/ggerganov/llama.cpp) respectively. Sources: [plugins/vllm/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/vllm/README.md), [plugins/llamacpp/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/llamacpp/README.md).
- `superduper_sentence_transformers` integrates [Sentence-Transformers](https://sbert.net) for self-hosted embeddings, with first-class support for the `vector` datatype. Source: [plugins/sentence_transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sentence_transformers/README.md).

Each plugin exposes the same `predict(...)` entry point, so swapping a hosted model for a self-hosted one (or vice versa) mostly means changing how the model is constructed, not the code that calls it.
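
The value of a shared `predict` entry point can be illustrated with a small sketch. The classes below are stand-ins written for this example, not Superduper classes, and the `Predictor` protocol is an assumption about the shape of the interface rather than its actual definition.

```python
from typing import Protocol


class Predictor(Protocol):
    """Anything with a `predict` method (hypothetical protocol for this sketch)."""

    def predict(self, x: str) -> list[float]: ...


class HostedEmbedder:
    """Stand-in for a hosted model; a real one would call a remote API."""

    def predict(self, x: str) -> list[float]:
        return [float(len(x)), 0.0]


class LocalEmbedder:
    """Stand-in for a self-hosted model."""

    def predict(self, x: str) -> list[float]:
        return [float(len(x)), 1.0]


def embed_all(model: Predictor, texts: list[str]) -> list[list[float]]:
    # Application code depends only on `predict`, not on the provider.
    return [model.predict(t) for t in texts]
```

Passing `HostedEmbedder()` or `LocalEmbedder()` to `embed_all` requires no change to `embed_all` itself, which is the property the uniform interface is meant to provide.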

## Plugin Conventions

Every plugin follows the same layout and documentation conventions, enforced by a README template and a generator script. Source: [plugins/template/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md). The template specifies that each plugin README must contain:

- A short description of what the plugin enables.
- An installation snippet (`pip install <plugin_name>`).
- An API section linking to the source tree and the generated API docs, plus a class table listing the exported predictors.
- An Examples section with runnable code blocks.

The auto-generated portion can be refreshed for a single plugin or for every plugin at once by running `python generate_readme.py` from the `plugins/` directory. Source: [plugins/template/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md).

## Installation Workflow

Per [README.md](https://github.com/superduper-io/superduper/blob/main/README.md), a working environment requires Python 3.10+ and at minimum the base framework plus one databackend plugin. The base framework alone is not sufficient to run any end-to-end workload. A typical install looks like:

```bash
pip install "superduper-framework>=0.7.0"
pip install "superduper-mongodb>=0.7.0"    # pick at least one databackend
pip install superduper-openai              # optional, model integrations
```

Beyond the packages themselves, the project maintains documentation and Slack, YouTube, and LinkedIn channels for community support, and accepts contributions in the form of bug reports, documentation, feature requests, and new tutorials. Source: [README.md](https://github.com/superduper-io/superduper/blob/main/README.md).

## See Also

- Plugins catalog: [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md)
- Plugin authoring guide referenced from [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md) via `CONTRIBUTING.md`
- Issue [#2395](https://github.com/superduper-io/superduper/issues/2395) ("Vector search backfill missing") for a known operational caveat when vector indexes are rebuilt across processes.
- Issue [#2292](https://github.com/superduper-io/superduper/issues/2292) ("Support connections to Elasticsearch databases") for the status of additional databackend requests.

---

<a id='page-databackends'></a>

## Data Management and Database Backends

### Related Pages

Related topics: [Overview and Core Architecture](#page-overview), [AI/ML Model Integration and Vector Search](#page-aiml-integration), [Security, Components, and Extensibility](#page-security-extensibility)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/superduper-io/superduper/blob/main/README.md)
- [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md)
- [plugins/mongodb/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/mongodb/README.md)
- [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md)
- [plugins/pillow/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/pillow/README.md)
- [plugins/template/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md)

</details>

# Data Management and Database Backends

## Overview

Superduper is a framework for integrating AI directly with databases, and data management is central to its design. Database connectivity is delivered through a plugin-based architecture that keeps the core framework lean while supporting multiple storage backends.

The project distributes database connectivity as independently installable plugins. As stated in [plugins/README.md:1-6](https://github.com/superduper-io/superduper/blob/main/plugins/README.md), "Superduper plugins are a collection of plugins that provide additional functionality to the Superduper framework." Each plugin follows a common structure with its own README, API surface, and examples. Plugins are installed from PyPI as `superduper-<name>` and imported in Python as `superduper_<name>` ([README.md:43-46](https://github.com/superduper-io/superduper/blob/main/README.md)).

## Supported Database Backends

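This page covers the two database backend families documented in the plugin READMEs listed above: MongoDB and SQL. Other databackend plugins named in the project README, such as `superduper-snowflake` and `superduper-redis`, are not detailed here.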

### MongoDB Backend

The `superduper_mongodb` plugin provides a high-level API for MongoDB, built on top of `pymongo`. According to [plugins/mongodb/README.md:5-10](https://github.com/superduper-io/superduper/blob/main/plugins/mongodb/README.md), the MongoDB query API works as in pymongo, with three exceptions:

- Inputs are wrapped in `Document`
- Additional support for vector-search is provided
- Queries are executed lazily

The plugin exposes `superduper_mongodb.data_backend.MongoDataBackend` as its primary data backend class.

### SQL Backend

The `superduper_sql` plugin delivers SQL connectivity through the [ibis](https://ibis-project.org/) project. Per [plugins/sql/README.md:5-9](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md), "Superduper supports SQL databases via the ibis project. With superduper, queries may be built which conform to the ibis API, with additional support for complex data-types and vector-searches." The exposed class is `superduper_sql.data_backend.SQLDataBackend`.

## Architecture

The data management layer separates the core Superduper engine from the storage layer through a uniform plugin pattern:

```mermaid
graph TB
    Core["Superduper Core Framework"]
    MongoP["superduper_mongodb"]
    SQLP["superduper_sql"]
    MongoData["MongoDataBackend"]
    SQLData["SQLDataBackend"]
    PyMongo["PyMongo"]
    Ibis["Ibis Project"]
    MongoDB[("MongoDB")]
    MySQL[("MySQL")]
    Postgres[("PostgreSQL")]

    Core --> MongoP
    Core --> SQLP
    MongoP --> MongoData
    SQLP --> SQLData
    MongoData --> PyMongo
    SQLData --> Ibis
    PyMongo --> MongoDB
    Ibis --> MySQL
    Ibis --> Postgres
```

Both backend classes conform to a shared `DataBackend` interface, allowing the core engine to remain backend-agnostic.

## Connection Patterns

Both backends share a consistent `superduper()` factory function for establishing connections, documented in the plugin READMEs.

### MongoDB Connections

From [plugins/mongodb/README.md:20-30](https://github.com/superduper-io/superduper/blob/main/plugins/mongodb/README.md):

```python
from superduper import superduper

# In-memory test backend
db = superduper('mongomock://test')

# Local MongoDB
db = superduper('mongodb://localhost:27017/documents')

# MongoDB Atlas
db = superduper('mongodb+srv://<username>:<password>@<cluster-url>/<database>')
```

### SQL Connections

From [plugins/sql/README.md:16-30](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md):

```python
from superduper import superduper

# MySQL
db = superduper('mysql://<mysql-uri>')

# PostgreSQL
db = superduper('postgres://<postgres-uri>')

# Other databases (via ibis)
db = superduper('<database-uri>')
```

The `mongomock://` scheme shown under MongoDB Connections is particularly useful for unit tests and CI workflows, since it does not require a running MongoDB server.
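
Since the URI scheme determines which databackend plugin must be installed, a small lookup can catch a missing plugin before connecting. This is a sketch only: the table is built from the schemes shown in this section, the names `SCHEME_TO_PLUGIN` and `plugin_for_uri` are hypothetical, and none of it reflects how Superduper itself resolves URIs.

```python
# Hypothetical lookup table built from the schemes shown above; it is not
# part of superduper and does not reflect how the framework resolves URIs.
SCHEME_TO_PLUGIN = {
    "mongomock": "superduper-mongodb",
    "mongodb": "superduper-mongodb",
    "mongodb+srv": "superduper-mongodb",
    "mysql": "superduper-sql",
    "postgres": "superduper-sql",
}


def plugin_for_uri(uri: str) -> str:
    """Return the pip package name expected to handle `uri`."""
    scheme, sep, _rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not a connection URI: {uri!r}")
    try:
        return SCHEME_TO_PLUGIN[scheme.lower()]
    except KeyError:
        raise ValueError(f"no known plugin for scheme {scheme!r}") from None
```

For example, `plugin_for_uri("mongomock://test")` returns `"superduper-mongodb"`, while an unrecognised scheme raises `ValueError`.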

## Complex Data Types and Schemas

Beyond standard tabular data, Superduper plugins extend the data model to handle complex media types. The `superduper_pillow` plugin demonstrates how custom datatypes integrate with the schema system. According to [plugins/pillow/README.md:17-32](https://github.com/superduper-io/superduper/blob/main/plugins/pillow/README.md), images can be stored in a database using the `pil_image` field type, and standard Superduper operations (`db.apply`, `insert`, `select`) operate transparently on these typed fields.

## Plugin Development

New data backends or integrations follow the template pattern documented in [plugins/template/README.md:1-25](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md). The template uses an auto-generated README section plus a custom section, with a `generate_readme.py` script that updates documentation by introspecting the plugin's classes:

```bash
python generate_readme.py plugins/<plugin_name>
```

## Known Issues and Limitations

The community has reported several issues relevant to data management:

- **Vector search backfill** — Issue [#2395](https://github.com/superduper-io/superduper/issues/2395) reports that when starting a new process and loading `vector_search`, the index is empty because backfill does not occur in single-threaded operation.
- **SQL parsing** — Issue [#2927](https://github.com/superduper-io/superduper/issues/2927) notes that `parse_query` does not parse SQL.
- **Deserialization risk** — Issue [#2936](https://github.com/superduper-io/superduper/issues/2936) describes unsafe `pickle.loads()` and `dill.loads()` deserialization of artifact-store data (CVSS 9.8).
- **Feature requests** — Issue [#2292](https://github.com/superduper-io/superduper/issues/2292) requests Elasticsearch support, which is not yet provided by an official plugin.

## See Also

- [Plugin Overview](https://github.com/superduper-io/superduper/blob/main/plugins/README.md)
- [MongoDB Plugin](https://github.com/superduper-io/superduper/blob/main/plugins/mongodb/README.md)
- [SQL Plugin](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md)
- [Pillow Plugin (Complex Data Types)](https://github.com/superduper-io/superduper/blob/main/plugins/pillow/README.md)
- [Plugin Development Template](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md)

---

<a id='page-aiml-integration'></a>

## AI/ML Model Integration and Vector Search

### Related Pages

Related topics: [Overview and Core Architecture](#page-overview), [Data Management and Database Backends](#page-databackends), [Security, Components, and Extensibility](#page-security-extensibility)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [superduper/components/model.py](https://github.com/superduper-io/superduper/blob/main/superduper/components/model.py)
- [superduper/components/listener.py](https://github.com/superduper-io/superduper/blob/main/superduper/components/listener.py)
- [superduper/components/vector_index.py](https://github.com/superduper-io/superduper/blob/main/superduper/components/vector_index.py)
- [superduper/components/llm/model.py](https://github.com/superduper-io/superduper/blob/main/superduper/components/llm/model.py)
- [superduper/components/llm/prompter.py](https://github.com/superduper-io/superduper/blob/main/superduper/components/llm/prompter.py)
- [superduper/components/training.py](https://github.com/superduper-io/superduper/blob/main/superduper/components/training.py)
- [plugins/openai/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/openai/README.md)
- [plugins/anthropic/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/anthropic/README.md)
- [plugins/sentence_transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sentence_transformers/README.md)
- [plugins/transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/transformers/README.md)
- [plugins/torch/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/torch/README.md)
- [plugins/sklearn/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sklearn/README.md)
- [plugins/vllm/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/vllm/README.md)
- [plugins/llamacpp/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/llamacpp/README.md)
- [plugins/jina/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/jina/README.md)
- [plugins/snowflake/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/snowflake/README.md)
- [plugins/pillow/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/pillow/README.md)
- [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md)
- [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md)
- [README.md](https://github.com/superduper-io/superduper/blob/main/README.md)
</details>

# AI/ML Model Integration and Vector Search

Superduper is a framework for integrating AI/ML models — including LLMs, embedding models, classical ML estimators, and custom PyTorch networks — directly with databases, and for performing vector search and retrieval-augmented workflows over that data. This page documents the core components that make this possible and the plugin ecosystem that extends the framework to specific model providers, data types, and vector backends.

## 1. Core Model Component

The base abstraction for any model integrated with Superduper is the `Model` component defined in `superduper/components/model.py`. Every concrete model — whether it wraps an OpenAI API call, a scikit-learn estimator, a Hugging Face pipeline, or a PyTorch `nn.Module` — is a subclass of this class. The `Model` class encapsulates inputs/outputs, pre/post-processing, training configuration, and the connection to a `Datalayer` (database) for storing outputs and artifacts.

The `Model.predict` and `Model.predict_batches` methods form the inference API. A typical pattern is to instantiate a model with an `identifier`, an `object` (the underlying model artifact), a `datatype` describing the output encoding, and optional `preprocess`/`postprocess` callables:

```python
import sentence_transformers
from superduper import vector  # assumed import path; not shown in the original snippet
from superduper_sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    identifier="embedding",
    object=sentence_transformers.SentenceTransformer("BAAI/bge-small-en"),
    # bge-small-en produces 384-dimensional embeddings
    datatype=vector(shape=(384,)),
    postprocess=lambda x: x.tolist(),
    predict_kwargs={"show_progress_bar": True},
)
model.predict("What is superduper")
```

Source: [plugins/sentence_transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sentence_transformers/README.md)

Training is supported uniformly via `superduper/components/training.py`, which provides trainer classes consumed by plugins such as `superduper_torch.training.TorchTrainer` and `superduper_sklearn.model.SklearnTrainer`. Source: [plugins/torch/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/torch/README.md)

## 2. Listeners: Connecting Models to Data

A `Listener` (defined in `superduper/components/listener.py`) wires a `Model` to a column of a database table. When a `Listener` is applied through the `Datalayer`'s `apply()` method, existing and incoming rows of the configured table are processed by the model, and the results are written back to the database as derived outputs. This is the foundation of "streaming inference" in Superduper: a single call connects a model to a supported databackend without a separate ETL step.

Listeners are commonly combined with vector indexes to populate and maintain an embedding store automatically. Each row that flows through the listener contributes a vector to the index; queries against the index can then retrieve similar rows. Community reports have surfaced edge cases where vector search backfill does not complete when a fresh process loads an existing `vector_search` index — see issue [#2395](https://github.com/superduper-io/superduper/issues/2395). The issue documents that within a single process the index is populated, but a reload can yield an empty index.
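
The listener idea (process rows that have no output yet, then write the result back next to the source data) can be sketched without any database. The function below is a conceptual toy over a list of dicts; `apply_listener` and the `_outputs__length` key are hypothetical names chosen for this example and do not mirror the `Listener` implementation.

```python
from typing import Callable


def apply_listener(
    rows: list[dict],
    model: Callable[[str], object],
    select_key: str,
    output_key: str,
) -> int:
    """Run `model` on rows that lack `output_key`; return how many were processed."""
    processed = 0
    for row in rows:
        if output_key in row:
            continue  # already computed, skip
        row[output_key] = model(row[select_key])
        processed += 1
    return processed


table = [{"txt": "hello"}, {"txt": "superduper"}]
first = apply_listener(table, len, "txt", "_outputs__length")   # backfill existing rows
table.append({"txt": "new row"})                                 # a later insert
second = apply_listener(table, len, "txt", "_outputs__length")  # only the new row is processed
```

The first call processes both existing rows and the second call processes only the newly inserted one, which is the incremental behavior a listener is meant to provide.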

## 3. Vector Indexes and Search

Vector search is provided by the `VectorIndex` component in `superduper/components/vector_index.py`. A `VectorIndex` is composed of one or more `Listener` objects (the encoders that produce vectors) and a `vector_search` backend. The framework supports vector search across the same databackends it integrates with generally: MongoDB, SQL databases (via the `superduper_sql` plugin), and Snowflake via the dedicated `superduper_snowflake.vector_search.SnowflakeVectorSearch` class. Source: [plugins/snowflake/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/snowflake/README.md)

The high-level data flow looks like this:

```mermaid
flowchart LR
    A[Source Table] -->|rows| B[Listener / Model]
    B -->|embeddings| C[VectorIndex]
    C --> D[(vector_search backend)]
    D -->|k-NN results| E[Query API]
    A -->|query| E
    E -->|hybrid output| F[Caller]
```
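
The k-NN step in the flow above can be illustrated with a brute-force cosine-similarity search. This is a minimal sketch for intuition only: the function names are hypothetical, and production vector-search backends typically use optimized or approximate indexes rather than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(index: dict[str, list[float]], query: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Return the k ids most similar to `query`, best first."""
    scored = [(row_id, cosine(vec, query)) for row_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = knn(index, [1.0, 0.1], k=2)  # ids ranked by similarity to the query
```

Here the query points almost along the first axis, so row `a` ranks first and row `c` second.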

SQL databackends are supported through the `superduper_sql` plugin, which builds on the Ibis project to expose a query API that conforms to the same `Datalayer` interface as the MongoDB backend. Source: [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md)

## 4. LLM, Embedding, and Framework Plugins

The `plugins/` directory contains the integration packages that extend the core `Model` to specific model ecosystems. A summary of the supported integrations:

| Plugin | Primary class(es) | Role |
|---|---|---|
| `superduper_openai` | `OpenAIChatCompletion`, `OpenAIEmbedding`, `OpenAIImageCreation`, `OpenAIImageEdit`, `OpenAIAudioTranscription`, `OpenAIAudioTranslation` | Hosted OpenAI inference for chat, embeddings, images, and audio. |
| `superduper_anthropic` | `Anthropic`, `AnthropicCompletions` | Hosted Claude inference. |
| `superduper_jina` | `JinaEmbedding`, `JinaAPIClient` | Hosted Jina embeddings. |
| `superduper_sentence_transformers` | `SentenceTransformer` | Self-hosted sentence embeddings via sbert. |
| `superduper_transformers` | `LLM`, `TextClassificationPipeline` | Hugging Face pipelines and LLM wrappers. |
| `superduper_vllm` | `VllmChat`, `VllmCompletion` | Self-hosted LLM serving with vLLM. |
| `superduper_llamacpp` | `LlamaCpp`, `LlamaCppEmbedding` | Self-hosted LLM serving via llama.cpp. |
| `superduper_torch` | `TorchModel`, `TorchTrainer` | Arbitrary `torch.nn.Module` integration and training. |
| `superduper_sklearn` | `Estimator`, `SklearnTrainer` | scikit-learn estimator integration and training. |
| `superduper_pillow` | `pil_image` field type | Image storage via Pillow. |
| `superduper_sql` | `SQLDataBackend` | SQL databackend (MySQL, Postgres, etc.). |
| `superduper_snowflake` | `SnowflakeVectorSearch` | Native vector search on Snowflake. |

Sources: [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md), [plugins/openai/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/openai/README.md), [plugins/anthropic/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/anthropic/README.md), [plugins/jina/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/jina/README.md), [plugins/transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/transformers/README.md), [plugins/vllm/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/vllm/README.md), [plugins/llamacpp/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/llamacpp/README.md), [plugins/pillow/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/pillow/README.md), [plugins/sklearn/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sklearn/README.md)

LLM-specific behavior — including prompt templating, chat formatting, and token-level handling — lives in `superduper/components/llm/prompter.py` and `superduper/components/llm/model.py`. The `Prompter` class lets users bind prompt templates with `{}`-style placeholders that are filled with upstream outputs, e.g. `OpenAIChatCompletion(model='gpt-3.5-turbo', prompt='Hello, {context}')`. Source: [plugins/openai/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/openai/README.md)
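
The placeholder mechanism can be approximated with Python's built-in `str.format`. This is a sketch under that assumption: the source does not show how the framework fills templates internally, and `fill_prompt` is a hypothetical helper.

```python
def fill_prompt(template: str, **values: str) -> str:
    """Fill `{name}` placeholders; raise if a placeholder has no value."""
    try:
        return template.format(**values)
    except KeyError as missing:
        raise ValueError(f"prompt is missing a value for {missing}") from None


prompt = fill_prompt("Hello, {context}", context="world")
```

Failing loudly on a missing placeholder avoids sending a half-filled prompt to a model.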

### Installation

The base framework and at least one databackend plugin are required; model plugins are installed on demand:

```bash
pip install "superduper-framework>=0.7.0"
pip install "superduper-mongodb>=0.7.0"   # or superduper-sql, superduper-snowflake, superduper-redis
pip install superduper-openai            # any combination of model plugins
```

Source: [README.md](https://github.com/superduper-io/superduper/blob/main/README.md)

## 5. Known Limitations and Community Notes

- **Vector search backfill across processes.** Reloading a previously built `vector_search` index in a new process can yield an empty index; this is tracked in [#2395](https://github.com/superduper-io/superduper/issues/2395). Workarounds typically involve re-applying the `VectorIndex` so listeners repopulate it.
- **Query parser behavior.** `parse_query` does not parse SQL — see [#2927](https://github.com/superduper-io/superduper/issues/2927). SQL querying is provided through the Ibis-based path in `superduper_sql`.
- **Plugin coverage.** The community has requested additional integrations, e.g. Elasticsearch ([#2292](https://github.com/superduper-io/superduper/issues/2292)) and Groq ([#2916](https://github.com/superduper-io/superduper/issues/2916)). Until a first-party plugin is merged, these providers are typically accessed by wrapping their HTTP APIs inside a custom `Model` subclass or by routing through the OpenAI-compatible client.
- **Security advisories.** Recent issues [#2935](https://github.com/superduper-io/superduper/issues/2935), [#2936](https://github.com/superduper-io/superduper/issues/2936), and [#2937](https://github.com/superduper-io/superduper/issues/2937) report code-injection and unsafe deserialization concerns in the query and artifact pipelines. Operators exposing Superduper on a network should treat the artifact store as a trust boundary and prefer signed model artifacts.

## See Also

- [README.md](https://github.com/superduper-io/superduper/blob/main/README.md) — installation and high-level overview.
- [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md) — plugin contribution guide.
- [Release 0.10.0](https://github.com/superduper-io/superduper/releases/tag/0.10.0) — most recent release notes.
- [docs.superduper.io](https://docs.superduper.io) — official documentation, including templates such as LLM finetuning and transfer learning.

---

<a id='page-security-extensibility'></a>

## Security, Components, and Extensibility

### Related Pages

Related topics: [Overview and Core Architecture](#page-overview), [Data Management and Database Backends](#page-databackends), [AI/ML Model Integration and Vector Search](#page-aiml-integration)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/superduper-io/superduper/blob/main/README.md)
- [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md)
- [plugins/template/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md)
- [plugins/mongodb/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/mongodb/README.md)
- [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md)
- [plugins/openai/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/openai/README.md)
- [plugins/anthropic/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/anthropic/README.md)
- [plugins/jina/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/jina/README.md)
- [plugins/sentence_transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sentence_transformers/README.md)
- [plugins/transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/transformers/README.md)
- [plugins/torch/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/torch/README.md)
- [plugins/sklearn/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sklearn/README.md)
- [plugins/llamacpp/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/llamacpp/README.md)
- [plugins/vllm/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/vllm/README.md)
- [plugins/pillow/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/pillow/README.md)
- [superduper/base/query.py](https://github.com/superduper-io/superduper/blob/main/superduper/base/query.py)
- [superduper/misc/serialization.py](https://github.com/superduper-io/superduper/blob/main/superduper/misc/serialization.py)
</details>

# Security, Components, and Extensibility

## Overview

Superduper is an open-source framework, distributed under the Apache 2.0 license, that integrates AI applications directly with databases and supports vector search and streaming inference. Source: [README.md](https://github.com/superduper-io/superduper/blob/main/README.md). The project ships a modular base package (`superduper-framework`) and a curated set of plugins under the `plugins/` directory. Source: [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md). This page documents the security surface that users and integrators should be aware of, the canonical plugin components, and the extension model used to add new databackends, models, or encoders.

## Security Considerations

### Query Parser and `eval()` Risk

The query subsystem parses user-supplied query strings into executable Python. Community security disclosures (#2935, #2937) report that `superduper/base/query.py` invokes `eval()` in five or more locations and, although the evaluation namespace is partially restricted, `__import__` remains reachable, allowing arbitrary code execution from crafted queries. Source: issue [#2937](https://github.com/superduper-io/superduper/issues/2937) and issue [#2935](https://github.com/superduper-io/superduper/issues/2935), referencing `superduper/base/query.py`. Operators should treat user-supplied query strings as untrusted input, avoid exposing the parser to adversarial data, and monitor upstream fixes that replace `eval()` with a constrained AST-based evaluator.
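
The general hazard can be reproduced in a few lines of plain Python. The sketch below is not the code in `superduper/base/query.py`; it only shows why a custom globals dict does not sandbox `eval()`, and how `ast.literal_eval` behaves on the same input. The payload string is a harmless example chosen for illustration.

```python
import ast

payload = "__import__('math').sqrt(16)"

# A globals dict without "__builtins__" does NOT sandbox eval():
# Python inserts the real builtins automatically, so __import__ works.
leaked = eval(payload, {"allowed_name": 1})

# ast.literal_eval only accepts literals (numbers, strings, tuples, lists,
# dicts, sets, booleans, None) and rejects calls and attribute access.
try:
    ast.literal_eval(payload)
    rejected = False
except ValueError:
    rejected = True

safe_value = ast.literal_eval("{'limit': 10}")
```

`ast.literal_eval` is only suitable when the input should be a plain literal; parsing a full query language safely needs a dedicated parser or an allow-listed AST walk.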

### Serialization and Deserialization Risks

The data pipeline persists model artifacts and intermediate state using `pickle`/`dill`. Community security disclosure #2936 flags that `pickle.loads()` and `dill.loads()` are applied to data from the artifact store without integrity validation, enabling remote code execution (CVSS 9.8, CWE-502). Source: issue [#2936](https://github.com/superduper-io/superduper/issues/2936), referencing `superduper/misc/serialization.py`. The same disclosure also cites `superduper/base/artifacts.py`, `superduper/base/encoding.py`, and `superduper/base/datatype.py` as adjacent trust boundaries. Issue #933 additionally warns that distributing environment `pickle` files in the repository is unsafe because untrusted pickles execute code during deserialization. Recommended mitigations include isolating artifact stores, signing artifacts, and avoiding distribution of `.pkl` files. Source: issue [#933](https://github.com/superduper-io/superduper/issues/933).
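The artifact-signing mitigation can be sketched with the standard library alone. The example below prepends an HMAC-SHA256 tag to a pickled payload and refuses to unpickle when the tag does not verify. The function names and the tag-plus-payload layout are hypothetical and are not part of Superduper. Key management is out of scope, and signing only helps if untrusted writers cannot obtain the key.

```python
import hashlib
import hmac
import pickle

TAG_LENGTH = 32  # size in bytes of an HMAC-SHA256 digest


def sign_artifact(payload: bytes, key: bytes) -> bytes:
    """Return the payload prefixed with its HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed_artifact(blob: bytes, key: bytes):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking tag bytes.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("artifact signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

A writer would store `sign_artifact(pickle.dumps(obj), key)` and a reader would call `load_signed_artifact(blob, key)`. Verification happens before `pickle.loads()` runs, so a tampered blob is never deserialized.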

### Reported Vector-Search and Logging Behavior

Issue #2395 reports that vector-search indexes populated in one process can be empty when reloaded in a new process because backfill logic does not run on cold load. Source: issue [#2395](https://github.com/superduper-io/superduper/issues/2395). Logging uses the framework logger in `superduper/base/logger.py`; verbose logs may surface sensitive query strings or partial payloads, so redaction at the logger layer is recommended in production deployments.
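As a rough sketch of logger-layer redaction, the filter below masks values that follow a few sensitive-looking keys. It assumes a standard-library `logging` logger. If the framework logger in `superduper/base/logger.py` wraps a different logging library, the same substitution would have to be applied through that library's own hook. The pattern and key names are illustrative and would need tuning for real payloads.

```python
import logging
import re

# Keys whose values should never reach log output (illustrative list).
SECRET_PATTERN = re.compile(r"(?i)\b(password|api_key|token)=\S+")


class RedactingFilter(logging.Filter):
    """Replace the value of sensitive key=value pairs with a placeholder."""

    def filter(self, record):
        # Render the final message first so secrets passed as args are covered.
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub(
            lambda m: m.group(1) + "=[REDACTED]", message
        )
        record.args = None  # message is already formatted
        return True  # keep the record, only its text is changed
```

The filter can be attached with `handler.addFilter(RedactingFilter())` on each handler that writes to a shared sink.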

## Core Components (Plugin Catalog)

Superduper is delivered as a base framework plus separately installable plugins. The base package is `superduper-framework` (Python 3.10+), and at least one databackend plugin is required. Source: [README.md](https://github.com/superduper-io/superduper/blob/main/README.md).

| Plugin (pip name) | Role | Source |
|---|---|---|
| `superduper-mongodb` | `MongoDataBackend`, pymongo-compatible API with vector-search | [plugins/mongodb/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/mongodb/README.md) |
| `superduper-sql` | SQL backends (MySQL, Postgres, etc.) via ibis | [plugins/sql/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sql/README.md) |
| `superduper-redis` | Redis databackend (introduced in 0.10.0) | Release notes 0.10.0 |
| `superduper-snowflake` | Snowflake databackend | [README.md](https://github.com/superduper-io/superduper/blob/main/README.md) |
| `superduper-openai` | Embeddings, chat, image, audio | [plugins/openai/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/openai/README.md) |
| `superduper_anthropic` | Claude predictors | [plugins/anthropic/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/anthropic/README.md) |
| `superduper_jina` | Jina embedding client | [plugins/jina/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/jina/README.md) |
| `superduper_sentence_transformers` | SBERT embedding models | [plugins/sentence_transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sentence_transformers/README.md) |
| `superduper_transformers` | HF pipelines and `LLM` wrapper | [plugins/transformers/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/transformers/README.md) |
| `superduper_torch` | `TorchModel`, `TorchTrainer` | [plugins/torch/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/torch/README.md) |
| `superduper_sklearn` | `Estimator`, `SklearnTrainer` | [plugins/sklearn/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/sklearn/README.md) |
| `superduper_llamacpp` | Self-hosted GGUF models | [plugins/llamacpp/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/llamacpp/README.md) |
| `superduper_vllm` | vLLM chat and completion | [plugins/vllm/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/vllm/README.md) |
| `superduper_pillow` | `pil_image` datatype | [plugins/pillow/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/pillow/README.md) |

## Extensibility via the Plugin Architecture

New functionality is added through independently published plugins. The contributor guide at `plugins/README.md` documents the contract: each plugin is a standalone Python package installed via `pip install superduper-<name>`, exposes one or more classes (models, trainers, datatypes, data backends), and ships an auto-generated README built from class docstrings. Source: [plugins/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/README.md).

The README of each plugin is generated from a Jinja-style template located at `plugins/template/README.md`. The template defines placeholders for `plugin_name`, `description`, an `API` section (with a classes table and links to source/API-docs), and an `Examples` section. Source: [plugins/template/README.md](https://github.com/superduper-io/superduper/blob/main/plugins/template/README.md). The script `generate_readme.py` regenerates a single plugin's README or all README files in the `plugins/` tree, ensuring documentation stays consistent with the exported API.
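The template-driven generation described above can be pictured with a small stand-in. The sketch below is not `generate_readme.py` and does not use Jinja. It substitutes `{{ name }}` placeholders with a regular expression, and the template text and plugin name are invented for illustration.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, values: dict) -> str:
    """Fill {{ name }} placeholders, failing loudly on a missing value."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing template value: {key}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template fragment and values, for illustration only.
template = "# {{ plugin_name }}\n\n{{ description }}\n\n## API\n"
readme = render_template(
    template,
    {"plugin_name": "superduper_example", "description": "Hypothetical plugin."},
)
```

Raising on a missing key, rather than leaving the placeholder in place, is one way to keep a generated README from silently drifting out of sync with the exported API.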

Community proposals demonstrate the extension pattern in practice. Issue #2916 proposes a Groq plugin: although Groq is callable through OpenAI's client, a dedicated integration would offer clearer ergonomics and inference speed. Source: issue [#2916](https://github.com/superduper-io/superduper/issues/2916). Issue #2292 requests an Elasticsearch databackend, which would follow the same databackend plugin contract used by `superduper-mongodb` and `superduper-sql`. Source: issue [#2292](https://github.com/superduper-io/superduper/issues/2292).

## Operational Guidance

When deploying Superduper, restrict who can submit query strings to the parser in `superduper/base/query.py` until the upstream `eval()` issue is mitigated, and isolate the artifact store read by `superduper/misc/serialization.py` from untrusted writers. Pin plugin versions (e.g., to the `0.10.0` release line) so that backend changes such as PostgreSQL compatibility fixes, Redis support, and CDC table corrections arrive through deliberate upgrades rather than implicitly. Source: release tag [0.10.0](https://github.com/superduper-io/superduper/releases/tag/0.10.0).
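One way to check that a deployment matches its pins is to compare installed distribution versions against an expected mapping at startup. The sketch below uses the standard `importlib.metadata` module. The helper name and the pinned values are illustrative assumptions, not a recommendation of specific versions.

```python
from importlib import metadata


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every violated pin."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


# Illustrative pins; substitute the versions your deployment has validated.
EXPECTED_PINS = {
    "superduper-framework": "0.10.0",
    "superduper-sql": "0.10.0",
}
```

Calling `find_pin_mismatches(EXPECTED_PINS)` returns an empty dict when every pin is satisfied. The `get_version` parameter exists so the check can be exercised without installing anything.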

## See Also

- [SuperduperDB Query Language and Vector Search](https://github.com/superduper-io/superduper): overview of the query layer affected by issue #2937.
- Plugin source index: [plugins/](https://github.com/superduper-io/superduper/tree/main/plugins).
- Security advisories indexed at: issues [#2935](https://github.com/superduper-io/superduper/issues/2935), [#2936](https://github.com/superduper-io/superduper/issues/2936), [#2937](https://github.com/superduper-io/superduper/issues/2937).

---


## Pitfall Log

Project: superduper-io/superduper

Summary: Found 11 structured pitfall items, including 1 high/blocking item. Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/superduper-io/superduper/issues/2916

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/superduper-io/superduper/issues/2395

## 3. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/superduper-io/superduper

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/superduper-io/superduper

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/superduper-io/superduper

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/superduper-io/superduper

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/superduper-io/superduper/issues/2937

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/superduper-io/superduper/issues/2936

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/superduper-io/superduper/issues/2933

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/superduper-io/superduper

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/superduper-io/superduper

<!-- canonical_name: superduper-io/superduper; human_manual_source: deepwiki_human_wiki -->