# https://github.com/postgresml/korvus Project Manual

Generated at: 2026-07-02 12:28:14 UTC

## Table of Contents

- [Overview and Getting Started](#page-1)
- [System Architecture and Core Components](#page-2)
- [Multi-Language SDK Bindings and Examples](#page-3)
- [RAG Operations, Configuration, and Common Failure Modes](#page-4)

<a id='page-1'></a>

## Overview and Getting Started

### Related Pages

Related topics: [System Architecture and Core Components](#page-2), [Multi-Language SDK Bindings and Examples](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/postgresml/korvus/blob/main/README.md)
- [CONTRIBUTING.md](https://github.com/postgresml/korvus/blob/main/CONTRIBUTING.md)
- [korvus/README.md](https://github.com/postgresml/korvus/blob/main/korvus/README.md)
- [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs)
- [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)
- [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs)
- [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs)
- [korvus/python/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/python/examples/README.md)
- [korvus/javascript/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md)
- [rust-bridge/README.md](https://github.com/postgresml/korvus/blob/main/rust-bridge/README.md)
- [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs)
- [rust-bridge/rust-bridge-traits/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-traits/src/lib.rs)
</details>


## Project Purpose and Scope

Korvus is an open-source SDK positioned as an alternative to managed stacks that combine OpenAI with Pinecone. Its stated goal is to let developers build end-to-end vector search applications directly on top of PostgreSQL using PgVector, without depending on proprietary APIs for embeddings or vector storage. The crate-level documentation describes it as a tool that "seamlessly manage[s] various database tables related to documents, text chunks, text splitters, LLM models, and embeddings" while leveraging PgVector for "fast and accurate queries." Source: [korvus/src/lib.rs:1-7](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs#L1-L7)

The repository is structured as a Rust core plus a separate `rust-bridge` workspace that compiles the same core into Python, JavaScript, and C bindings. This multi-language posture is reflected in the top-level README, which advertises the project as an "Open Source Alternative for Building End-to-End Vector Search Applications without OpenAI & Pinecone" and points users to official documentation hosted at `postgresml.org/docs/open-source/korvus/`. Source: [README.md:1-30](https://github.com/postgresml/korvus/blob/main/README.md#L1-L30), [korvus/README.md:1-3](https://github.com/postgresml/korvus/blob/main/korvus/README.md#L1-L3)

The project is maintained by PostgresML, and the repository's `CONTRIBUTING.md` invites community contributions through GitHub forks, branches, and pull requests, with questions directed to a Discord community. Source: [CONTRIBUTING.md:1-25](https://github.com/postgresml/korvus/blob/main/CONTRIBUTING.md#L1-L25)

## High-Level Architecture

Korvus is organized around four first-class Rust types that act as the primary API surface for every language binding:

- `Collection`: a logical grouping of documents and the pipelines that operate on them. It carries a name, an optional database URL, and the names of the `pipelines` and `documents` tables it owns. Source: [korvus/src/collection.rs:46-58](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs#L46-L58)
- `Pipeline`: a declarative transformation schema (a JSON object of field actions) that describes how raw documents become searchable chunks. Source: [korvus/src/pipeline.rs:13-21](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs#L13-L21)
- `Model` and `Splitter`: building blocks referenced by pipelines for embedding generation and text splitting. They are re-exported from the crate root alongside `Collection`, `Pipeline`, `Builtins`, `OpenSourceAI`, and `TransformerPipeline`. Source: [korvus/src/lib.rs:30-36](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs#L30-L36)

A `Collection` is backed by per-project metadata (`ProjectInfo`) that records the task the project performs. The supported `ProjectTask` variants are `Regression`, `Classification`, `QuestionAnswering`, `Summarization`, `Translation`, `TextClassification`, `TextGeneration`, `Text2text`, and `Embedding`. These tasks drive which model and pipeline configurations are valid for a given collection. Source: [korvus/src/collection.rs:19-43](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs#L19-L43)

```mermaid
flowchart LR
    User[Developer / SDK consumer] --> SDK[Language SDK<br/>Rust / Python / JS / C]
    SDK --> Coll[Collection]
    Coll --> Pipe[Pipeline<br/>JSON schema]
    Pipe --> Splitter
    Pipe --> Model[Model / Builtins / Transformer]
    Coll --> PG[(PostgreSQL + PgVector)]
    Model --> PG
    Splitter --> PG
```

## Language Bindings via rust-bridge

The multi-language support is generated rather than hand-written. The `rust-bridge` workspace provides proc-macros that translate "vanilla Rust" into PyO3 and Neon-compatible Rust, then re-export those bindings. The README describes the approach as a "Rust to Rust transpiler that uses crates like PyO3 to automatically write the necessary foreign function interfaces." Source: [rust-bridge/README.md:1-40](https://github.com/postgresml/korvus/blob/main/rust-bridge/README.md#L1-L40)

Two macros drive the translation:

- `#[derive(alias)]`: attached to a struct, it generates language-specific wrapper types (for example a `*Python` class for PyO3). Source: [rust-bridge/README.md:42-66](https://github.com/postgresml/korvus/blob/main/rust-bridge/README.md#L42-L66)
- `#[alias_methods(...)]`: attached to an `impl` block, it generates per-language method bindings and stub files for Python, JavaScript, and C. Source: [rust-bridge/rust-bridge-macros/src/lib.rs:1-35](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs#L1-L35)

The macro crate dispatches to language-specific generators (`python`, `javascript`, `c`) and consults an `AttributeArgs` parser that supports per-method `skip` directives, enabling fine-grained control of which surfaces are exposed in each target language. Source: [rust-bridge/rust-bridge-macros/src/common.rs:1-50](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/common.rs#L1-L50)

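The features exposed at the SDK root mirror the language bindings available. The languages module currently registers Python, JavaScript, and C under their respective Cargo features. Source: [korvus/src/languages/mod.rs:1-9](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs#L1-L9)

| Language   | Feature flag | Binding mechanism                    |
|------------|--------------|--------------------------------------|
| Rust       | (default)    | none (the core crate is used as-is)  |
| Python     | `python`     | PyO3 wrappers (generated)            |
| JavaScript | `javascript` | Neon wrappers (generated)            |
| C          | `c`          | C FFI wrappers (generated)           |

Community requests filed against the repository (issues [#6 "Add Golang"](https://github.com/postgresml/korvus/issues/6), [#9 "Add PHP"](https://github.com/postgresml/korvus/issues/9), and [#24 "Add .net support"](https://github.com/postgresml/korvus/issues/24)) indicate that expanding this matrix to Go, PHP, and .NET is a recurring area of interest, but those bindings are not present in the current source tree. Source: [korvus/src/languages/mod.rs:1-9](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs#L1-L9)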

## Getting Started and Example Workflows

Both Python and JavaScript example directories ship parallel tutorials that walk through the same conceptual pipeline: build a `Collection`, register a `Pipeline` (typically `semantic_search`), upsert documents, and run a vector recall or RAG query. Source: [korvus/python/examples/README.md:1-15](https://github.com/postgresml/korvus/blob/main/korvus/python/examples/README.md#L1-L15), [korvus/javascript/examples/README.md:1-9](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md#L1-L9)

A representative workflow, consistent across both language SDKs, looks like:

1. Construct a `Collection` that owns its `pipelines` and `documents` tables. Source: [korvus/src/collection.rs:46-58](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs#L46-L58)
2. Define a `Pipeline` schema mapping document fields to actions such as `semantic_search` or `transform`. Source: [korvus/src/pipeline.rs:13-30](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs#L13-L30)
3. Upsert documents, then invoke `Builtins.transform()` to run a Hugging Face model on the database, for example an extractive question-answering model that uses a vector recall result as `context`. Source: [korvus/python/examples/README.md:5-10](https://github.com/postgresml/korvus/blob/main/korvus/python/examples/README.md#L5-L10)
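
The steps above, up to the vector recall, can be sketched in Python as follows. This is a minimal sketch rather than the project's reference code: the splitter name, the shape of the `vector_search` query, and the helper names `build_schema`, `build_documents`, and `run_workflow` are assumptions made for illustration, and the `Builtins.transform()` call is left out. The database calls sit inside an async function that is only defined here, because running it needs the `korvus` package and a reachable database.

```python
def build_schema() -> dict:
    """Pipeline schema: keys are document fields, values are field actions."""
    return {
        "text": {
            # Assumed splitter name; check the official docs for valid values.
            "splitter": {"model": "recursive_character"},
            # Default embedding model named in the Python examples README.
            "semantic_search": {"model": "intfloat/e5-small-v2"},
        }
    }


def build_documents() -> list[dict]:
    """A couple of toy documents keyed by `id`."""
    return [
        {"id": "1", "text": "Korvus stores documents in PostgreSQL."},
        {"id": "2", "text": "PgVector indexes the embeddings."},
    ]


async def run_workflow(query: str) -> list:
    # Imported lazily so the helpers above work without the SDK installed.
    from korvus import Collection, Pipeline

    collection = Collection("korvus_demo")                 # step 1
    pipeline = Pipeline("v1", build_schema())              # step 2
    await collection.add_pipeline(pipeline)
    await collection.upsert_documents(build_documents())   # step 3
    # Assumed query shape for vector recall over the "text" field.
    return await collection.vector_search(
        {"query": {"fields": {"text": {"query": query}}}, "limit": 3},
        pipeline,
    )
```

With the SDK installed and a database URL configured, the function could be driven with `asyncio.run(run_workflow("Where are documents stored?"))`.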

Community issue #13 flags a concrete documentation gap that newcomers hit at step 3: when a Hugging Face model is referenced inside a pipeline, the configuration must include `"trust_remote_code": True`, otherwise the example scripts fail. The same issue notes that additional setup steps are undocumented. Source: [community issue #13](https://github.com/postgresml/korvus/issues/13)

For JavaScript users, a Webpack example demonstrates how to bundle the SDK for the browser, and a summarizing question-answering example chains vector recall with a summarization model. Source: [korvus/javascript/examples/README.md:5-9](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md#L5-L9)

## Operational Notes and Common Failure Modes

Korvus reuses a lazily built Tokio runtime and a lazily initialized cache of `PgPool` connections, and it expects a database URL to be available, either set on the `Collection` or read from the environment. Long-running or misbehaving pipelines hosted on Korvus Cloud have surfaced in community issue #23 as "worker error (os error 11)", which typically indicates that the Python worker process crashed mid-execution. Restarting the server or re-creating the project database is the documented workaround; deeper diagnostics require inspecting the worker logs on the PostgresML side. Source: [community issue #23](https://github.com/postgresml/korvus/issues/23), [korvus/src/lib.rs:1-30](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs#L1-L30)

## See Also

- [Rust Bridge internals](rust-bridge.md)
- [Collection API reference](collection-api.md)
- [Pipeline schema reference](pipeline-schema.md)
- Official documentation: <https://postgresml.org/docs/open-source/korvus/>

---

<a id='page-2'></a>

## System Architecture and Core Components

### Related Pages

Related topics: [Overview and Getting Started](#page-1), [RAG Operations, Configuration, and Common Failure Modes](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs)
- [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)
- [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs)
- [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs)
- [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs)
- [rust-bridge/README.md](https://github.com/postgresml/korvus/blob/main/rust-bridge/README.md)
- [korvus/javascript/package.json](https://github.com/postgresml/korvus/blob/main/korvus/javascript/package.json)
- [CONTRIBUTING.md](https://github.com/postgresml/korvus/blob/main/CONTRIBUTING.md)
- [README.md](https://github.com/postgresml/korvus/blob/main/README.md)
</details>


## Purpose and Scope

Korvus is described in its crate-level documentation as "an open source alternative for building end-to-end vector search applications without OpenAI and Pinecone" ([korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs)). The SDK enables developers to manage database tables related to documents, text chunks, splitters, LLM models, and embeddings, leveraging PostgreSQL with `PgVector` for indexing and querying.

The system is implemented as a Rust core library with bindings automatically generated for Python, JavaScript, and C. This architecture lets a single Rust codebase power several language SDKs while still exposing idiomatic APIs to each target language.

## High-Level Architecture

The following diagram captures the relationship between the Rust core, the language bindings, and the underlying PostgreSQL data store.

```mermaid
flowchart TB
    subgraph Client["Client SDKs"]
        PY["Python SDK"]
        JS["JavaScript SDK"]
        C["C SDK"]
    end

    subgraph Bridge["rust-bridge (proc macros)"]
        MAC["#[derive(alias)]<br/>#[alias_methods(...)]"]
    end

    subgraph Core["korvus Rust Core (lib.rs)"]
        COL["Collection"]
        PIPE["Pipeline"]
        MOD["Model"]
        SPL["Splitter"]
        BIN["Builtins / OpenSourceAI"]
    end

    subgraph Store["PostgreSQL + PgVector"]
        PG[("Documents, Pipelines,<br/>Embeddings tables")]
    end

    PY --> MAC
    JS --> MAC
    C --> MAC
    MAC --> Core
    Core --> PG
```

The crate root in [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs) wires every internal module together and re-exports the four primary public types: `Collection`, `Pipeline`, `Model`, and `Splitter`, plus the `Builtins`, `OpenSourceAI`, and `TransformerPipeline` helpers. Each language binding is gated behind a Cargo feature, which is why [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs) only pulls in `javascript`, `python`, or `c` submodules when the corresponding feature is enabled.

## The rust-bridge Translation Layer

A key piece of the architecture is the `rust-bridge` workspace, which is a custom Rust-to-Rust transpiler. As [rust-bridge/README.md](https://github.com/postgresml/korvus/blob/main/rust-bridge/README.md) explains, the goal is not to convert Rust into Python or JavaScript directly, but to use `PyO3` and `Neon` bindings behind procedural macros so that the same source struct or `impl` block can be re-exported into multiple languages.

The macros are defined in [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs) and include:

- `#[derive(alias)]` — placed on a struct, this emits parallel `Python`, `C`, and `JavaScript` wrapper types.
- `#[alias_methods(...)]` — placed on an `impl` block, this lists which methods should be exposed in each language binding.
- `#[derive(alias_manual)]` — escape hatch for hand-written bindings.

In the core crate you can see this in action: `Collection` is annotated with `#[cfg_attr(feature = "rust_bridge", derive(alias))]` and its methods are listed in `#[alias_methods(new, upsert_documents, get_documents, ...)]` ([korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)). The same pattern is used for `Pipeline` in [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs).

## Core Domain Model

The four public abstractions form a layered model: a `Collection` owns one or more `Pipeline`s, and each `Pipeline` composes `Model` and `Splitter` actions declared in a JSON schema.

| Component | Defined in | Responsibility |
|-----------|-----------|----------------|
| `Collection` | `korvus/src/collection.rs` | Holds documents, references the pipelines table, manages vector search events and CRUD on documents |
| `Pipeline` | `korvus/src/pipeline.rs` | Describes field-level transformations (splitting, embedding, full-text search) parsed from a JSON schema |
| `Model` | `korvus/src/model.rs` | Represents an embedding or transformer model referenced by a pipeline |
| `Splitter` | `korvus/src/splitter.rs` | Represents a text-chunking strategy applied before embedding |
| `Builtins`, `OpenSourceAI`, `TransformerPipeline` | re-exported from `korvus/src/lib.rs` | Convenience wrappers for hosted/open-source models and Hugging Face transformers |

`Pipeline` accepts a JSON schema whose values are deserialized into a `ValidFieldAction` and then converted into a `FieldAction` containing optional `SplitterAction`, `SemanticSearchAction`, and `FullTextSearchAction` ([korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs)). The schema parser rejects duplicate field keys, ensuring deterministic pipelines.

The `Collection` struct stores its name, an optional `database_url`, the table names for pipelines and documents, and a `CollectionDatabaseData` cache hydrated from PostgreSQL ([korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)). A `ProjectTask` enum (with variants such as `Embedding`, `QuestionAnswering`, `Summarization`, `Translation`, and others) is carried inside `ProjectInfo` so that pipelines know what task they are configured for.

## Runtime and Persistence

Inside `korvus/src/lib.rs` a lazy `RwLock<HashMap>` caches `PgPool` connections, and a lazily-built Tokio runtime keeps async database work executable from synchronous bindings (such as the C FFI). The runtime uses `Builder` and `Runtime` from `tokio` along with a `tracing` subscriber, and connections are created via `PgPoolOptions` with a configurable timeout. This is what allows the same compiled crate to be reused by Python, JavaScript, and C without each language having to manage its own event loop.
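
The caching and runtime pattern described above can be illustrated with a small Python analogue. This is a sketch of the general technique, not a port of the Rust code: `FakePool`, `get_or_create_pool`, `get_or_create_loop`, and `block_on` are hypothetical names, a plain `threading.Lock` stands in for the `RwLock`, and an `asyncio` event loop on a background thread stands in for the Tokio runtime.

```python
import asyncio
import threading

_pools: dict = {}
_pools_lock = threading.Lock()
_loop = None
_loop_lock = threading.Lock()


class FakePool:
    """Stand-in for a real connection pool (hypothetical)."""

    def __init__(self, url: str, timeout: float):
        self.url = url
        self.timeout = timeout


def get_or_create_pool(url: str, timeout: float = 30.0) -> FakePool:
    """Return the cached pool for `url`, creating it on first use."""
    with _pools_lock:
        pool = _pools.get(url)
        if pool is None:
            pool = FakePool(url, timeout)
            _pools[url] = pool
        return pool


def get_or_create_loop() -> asyncio.AbstractEventLoop:
    """Lazily start one background event loop shared by all callers."""
    global _loop
    with _loop_lock:
        if _loop is None:
            _loop = asyncio.new_event_loop()
            threading.Thread(target=_loop.run_forever, daemon=True).start()
        return _loop


def block_on(coro):
    """Run a coroutine from synchronous code, as an FFI entry point would."""
    future = asyncio.run_coroutine_threadsafe(coro, get_or_create_loop())
    return future.result()


async def fetch_one(url: str) -> str:
    pool = get_or_create_pool(url)
    await asyncio.sleep(0)  # placeholder for real async database work
    return f"connected via {pool.url}"
```

A synchronous caller would then write `block_on(fetch_one("postgres://localhost/demo"))` without managing an event loop itself, which is the property the shared runtime gives the language bindings.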

Schema management is handled by the `migrations` module, also exposed as `pub mod` from [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs), which keeps the document, chunk, and embedding tables in sync with the SDK version recorded on collection creation.

## Distribution and Community Surface

The published JavaScript package is versioned at `1.1.5` according to [korvus/javascript/package.json](https://github.com/postgresml/korvus/blob/main/korvus/javascript/package.json), and Python and JavaScript example suites ship under `korvus/python/examples/` and `korvus/javascript/examples/` respectively. Both README files at the package level describe Korvus as an "Open Source Alternative for Building End-to-End Vector Search Applications without OpenAI & Pinecone" ([korvus/README.md](https://github.com/postgresml/korvus/blob/main/korvus/README.md)).

Several open community issues relate directly to this architecture:

- Language coverage is a recurring request: issues [#6 (Go)](https://github.com/postgresml/korvus/issues/6), [#9 (PHP)](https://github.com/postgresml/korvus/issues/9), and [#24 (.NET)](https://github.com/postgresml/korvus/issues/24) ask for new bindings. Because of the `rust-bridge` design, a new binding requires adding a new submodule under `korvus/src/languages/` (currently `javascript`, `python`, `c` per [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs)) and a matching generator in `rust-bridge-macros`.
- Issue [#13](https://github.com/postgresml/korvus/issues/13) highlights a documentation gap around Hugging Face pipelines, where users must pass `"trust_remote_code": true`. This is a configuration of the underlying model loader exposed through the `Pipeline` schema, not a Korvus-level option, which is why clear documentation of model-side parameters is essential.
- Issue [#23](https://github.com/postgresml/korvus/issues/23) describes a Korvus Cloud worker crash with `os error 11`, pointing to the operational surface (worker process, database URL) rather than the SDK core.

## See Also

- [Pipeline Schema and Field Actions](Pipeline-Schema-and-Field-Actions.md)
- [rust-bridge Macros and Code Generation](rust-bridge-Macros-and-Code-Generation.md)
- [Multi-Language SDK Bindings](Multi-Language-SDK-Bindings.md)
- [Deployment and Operational Runbook](Deployment-and-Operational-Runbook.md)

---

<a id='page-3'></a>

## Multi-Language SDK Bindings and Examples

### Related Pages

Related topics: [Overview and Getting Started](#page-1), [System Architecture and Core Components](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs)
- [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)
- [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs)
- [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs)
- [korvus/javascript/package.json](https://github.com/postgresml/korvus/blob/main/korvus/javascript/package.json)
- [korvus/javascript/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md)
- [korvus/python/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/python/examples/README.md)
- [rust-bridge/README.md](https://github.com/postgresml/korvus/blob/main/rust-bridge/README.md)
- [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs)
- [rust-bridge/rust-bridge-macros/src/common.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/common.rs)
- [rust-bridge/rust-bridge-macros/src/python.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/python.rs)
- [rust-bridge/rust-bridge-macros/src/c.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/c.rs)
- [rust-bridge/rust-bridge-macros/src/types.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/types.rs)
- [CONTRIBUTING.md](https://github.com/postgresml/korvus/blob/main/CONTRIBUTING.md)
- [README.md](https://github.com/postgresml/korvus/blob/main/README.md)
</details>


## Overview

Korvus is positioned as an open-source alternative for building end-to-end vector search applications without depending on OpenAI and Pinecone, with a single core written in Rust and bindings generated for multiple high-level languages ([README.md](https://github.com/postgresml/korvus/blob/main/README.md)). The repository is organized as a monorepo containing the core Rust SDK at [`korvus/src/`](https://github.com/postgresml/korvus/tree/main/korvus/src), per-language packaging directories under `korvus/python/` and `korvus/javascript/`, and a separate `rust-bridge/` workspace used to generate the foreign function interface glue automatically ([korvus/src/lib.rs:9-44](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs)).

The `Collection`, `Pipeline`, `Model`, `Splitter`, and `TransformerPipeline` types declared in the core crate are re-exported through language-specific submodules gated by Cargo features. The `languages` module is the single dispatch point that enables Python, JavaScript, or C compilation: only the relevant submodule is included via `cfg` attributes ([korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs)).

## Rust-Bridge Code Generation

Korvus does not maintain separate hand-written bindings. Instead, the `rust-bridge` and `rust-bridge-macros` crates implement a Rust-to-Rust transpiler that emits PyO3, Neon, and C-compatible wrappers from a single annotated Rust source ([rust-bridge/README.md:1-39](https://github.com/postgresml/korvus/blob/main/rust-bridge/README.md)). The macro entry points are:

| Macro | Purpose |
|-------|---------|
| `#[derive(alias)]` | Generate a wrapper struct for a target type in every enabled language |
| `#[alias_methods(...)]` | Generate language-specific wrappers for selected methods on the type |
| `#[derive(alias_manual)]` | Opt out of automatic generation for hand-managed surfaces |

The procedural-macro driver fans the input out to per-language generators and concatenates the result ([rust-bridge/rust-bridge-macros/src/lib.rs:3-50](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs)). A `SupportedLanguage` enum (`C`, `Python`, `JavaScript`) tracks which languages the wrapper targets, and individual methods may be excluded per language via attribute arguments ([rust-bridge/rust-bridge-macros/src/common.rs:11-58](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/common.rs)). Type conversion is handled centrally in `types.rs`, mapping Rust primitives and `Vec`, `HashMap`, `Option`, and tuple types into language-native syntax ([rust-bridge/rust-bridge-macros/src/types.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/types.rs)).

```mermaid
flowchart LR
    A[Rust SDK Source<br/>Collection, Pipeline, Model] --> B["#[derive(alias)]<br/>#[alias_methods(...)]"]
    B --> C[rust-bridge-macros]
    C --> D[PyO3 wrappers<br/>Python]
    C --> E[Neon wrappers<br/>JavaScript]
    C --> F[C FFI wrappers]
    D --> G[korvus python package]
    E --> H[korvus javascript package]
    F --> I[C consumers]
```

The C generator, for example, produces an opaque pointer wrapper and `unsafe CustomInto` implementations that box and unbox the underlying Rust type for C callers ([rust-bridge/rust-bridge-macros/src/c.rs:10-58](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/c.rs)). The Python generator emits `pyo3::pyclass` wrappers that convert arguments and call through to the inner Rust methods ([rust-bridge/rust-bridge-macros/src/python.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/python.rs)). Both follow the same boxed-struct pattern: the language struct wraps the Rust struct, and FFI helpers translate ownership in and out.
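
The boxed-struct pattern can be illustrated with a loose Python analogy. Nothing below is output of the actual macros: `make_wrapper` is a hypothetical helper, the `Collection` class is a toy stand-in, and the `skip` mapping only mimics the idea of excluding a method for one target language.

```python
class Collection:
    """Toy stand-in for the core type (hypothetical, not the real API)."""

    def __init__(self, name: str):
        self.name = name
        self.documents = []

    def upsert_documents(self, docs):
        self.documents.extend(docs)
        return len(self.documents)

    def get_documents(self):
        return list(self.documents)


def make_wrapper(inner_cls, language: str, methods, skip=None):
    """Build a wrapper class that boxes `inner_cls` and forwards selected methods.

    `skip` maps a method name to the languages that should not expose it.
    """
    skip = skip or {}

    def __init__(self, *args, **kwargs):
        self._inner = inner_cls(*args, **kwargs)  # the boxed core object

    namespace = {"__init__": __init__}
    for method_name in methods:
        if language in skip.get(method_name, ()):
            continue  # excluded for this target language

        # Bind the current name as a default so each forwarder keeps its own.
        def forward(self, *args, _name=method_name, **kwargs):
            return getattr(self._inner, _name)(*args, **kwargs)

        namespace[method_name] = forward
    return type(f"{inner_cls.__name__}{language}", (), namespace)
```

For example, `make_wrapper(Collection, "Python", ["upsert_documents", "get_documents"])` yields a `CollectionPython` class whose methods delegate to the wrapped object, while passing `skip={"get_documents": {"C"}}` for the `"C"` target leaves that method off the generated class.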

## Supported Language Surfaces

The currently enabled language bindings are Python, JavaScript, and C ([korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs)). Each is gated by its own Cargo feature, so a build can be limited to a single language surface.

### Python

The Python package ships with a CLI (enabled under the `python` feature) and example scripts covering common retrieval patterns. Available examples documented in `korvus/python/examples/README.md` include:

- Semantic search, optionally using a custom embedding model instead of the default `intfloat/e5-small-v2` ([korvus/python/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/python/examples/README.md)).
- Extractive question answering that pipes `vector_recall` results as `context` into a Hugging Face question-answering model via `Builtins.transform()`.
- Table question answering using the `deepset/all-mpnet-base-v2-table` model and the OTT-QA dataset.
- Summarizing question answering, which retrieves documents and then summarizes them.

Community issue #13 ("Add documentation") explicitly notes that running the example scripts requires passing `"trust_remote_code": True` when a Hugging Face model is referenced inside a pipeline schema (for instance under a `semantic_search` block), because the model loader otherwise refuses to fetch custom code from the Hub.
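
As a hedged illustration, the helper below adds the flag to an existing schema dictionary. The placement under a `parameters` object inside `semantic_search` is an assumption, as are the helper name `with_trust_remote_code` and the placeholder model name; the exact location should be checked against the official documentation.

```python
import copy


def with_trust_remote_code(schema: dict, field: str) -> dict:
    """Return a copy of `schema` with trust_remote_code enabled for `field`.

    Assumption: the flag travels in the `parameters` object of the
    `semantic_search` action. Verify the exact location against the docs.
    """
    updated = copy.deepcopy(schema)
    action = updated[field]["semantic_search"]
    action.setdefault("parameters", {})["trust_remote_code"] = True
    return updated


# "some-org/custom-model" is a placeholder, not a real model recommendation.
base_schema = {"text": {"semantic_search": {"model": "some-org/custom-model"}}}
patched_schema = with_trust_remote_code(base_schema, "text")
```

The original dictionary is left untouched, so the same base schema can be reused for models that do not need the flag.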

### JavaScript

The JavaScript package is published as `korvus` version `1.1.5` and exposes `index.js` as the main entry, built via a `node build.js` script ([korvus/javascript/package.json](https://github.com/postgresml/korvus/blob/main/korvus/javascript/package.json)). The examples directory documents three end-to-end scripts: a Hugging Face question-answering example that uses `Builtins.transform()`, a summarizing question-answering example, and a Webpack integration example ([korvus/javascript/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md)).

### C

The C binding is the thinnest surface and is intended for systems-level consumers. It is generated through the same macro pipeline as Python and JavaScript but produces raw FFI wrappers ([rust-bridge/rust-bridge-macros/src/c.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/c.rs)).

## Shared Domain Types Across Bindings

Because all bindings are generated from a single annotated source, the type vocabulary is consistent across languages. The `ProjectTask` enum, for example, maps to the same string forms in every binding (`"regression"`, `"classification"`, `"question_answering"`, `"summarization"`, `"translation"`, `"text_classification"`, `"text_generation"`, `"text2text"`, `"embedding"`) and panics on unknown values ([korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)). The `Pipeline` type accepts a JSON schema whose keys name document fields and whose values describe the transformations to apply; the schema is validated once, at construction ([korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs)). Methods such as `new`, `upsert_documents`, `get_documents`, `search`, `vector_search`, `add_pipeline`, and `remove_pipeline` are exposed through the `alias_methods` macro so they appear identically in every language ([korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)).
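
The `ProjectTask` string mapping mentioned above can be mirrored in Python for illustration. This is a sketch, not the SDK's own type: the Python enum and the `parse_task` helper are hypothetical, and where the Rust code panics on an unknown value this version raises `ValueError` instead.

```python
from enum import Enum


class ProjectTask(Enum):
    REGRESSION = "regression"
    CLASSIFICATION = "classification"
    QUESTION_ANSWERING = "question_answering"
    SUMMARIZATION = "summarization"
    TRANSLATION = "translation"
    TEXT_CLASSIFICATION = "text_classification"
    TEXT_GENERATION = "text_generation"
    TEXT2TEXT = "text2text"
    EMBEDDING = "embedding"


def parse_task(value: str) -> ProjectTask:
    """Map a task string to its enum member, rejecting unknown values."""
    try:
        return ProjectTask(value)
    except ValueError:
        raise ValueError(f"unknown project task: {value!r}") from None
```

Here `parse_task("embedding")` returns `ProjectTask.EMBEDDING`, while an unrecognized string such as `"bogus"` is rejected.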

## Community-Requested Language Bindings

Several issues request additional languages that are not currently generated by the `rust-bridge` macros:

- **Golang** — requested in issue #6 ("Add Golang"). No Go submodule exists under `korvus/src/languages/`.
- **PHP** — requested in issue #9 ("Add PHP"), motivated by the Laravel installed base. No PHP binding is currently generated.
- **.NET** — requested in issue #24 ("Add .net support"). No .NET submodule exists today.

Each of these would require a new generator module in `rust-bridge/rust-bridge-macros/src/` (analogous to `python.rs`, `c.rs`) and a corresponding `languages/<lang>` submodule gated by a Cargo feature. Contributors are pointed at `CONTRIBUTING.md` for the general workflow (fork, branch, code, test, PR) ([CONTRIBUTING.md](https://github.com/postgresml/korvus/blob/main/CONTRIBUTING.md)).

## See Also

- [Project Architecture](project-architecture.md)
- [Pipeline Schema Reference](pipeline-schema.md)
- [Rust-Bridge Macros](rust-bridge-macros.md)
- [Troubleshooting](troubleshooting.md) — covers worker errors such as the `os error 11` reported in issue #23.

---

<a id='page-4'></a>

## RAG Operations, Configuration, and Common Failure Modes

### Related Pages

Related topics: [System Architecture and Core Components](#page-2), [Multi-Language SDK Bindings and Examples](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs)
- [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)
- [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs)
- [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs)
- [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs)
- [korvus/javascript/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md)
- [README.md](https://github.com/postgresml/korvus/blob/main/README.md)
</details>


## 1. Purpose and Scope

Korvus is described as "an open source alternative for building end-to-end vector search applications without OpenAI and Pinecone," where the SDK manages "database tables related to documents, text chunks, text splitters, LLM models, and embeddings," leveraging PgVector for retrieval. Source: [README.md](https://github.com/postgresml/korvus/blob/main/README.md) and [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs).

Within that scope, a *Retrieval-Augmented Generation (RAG) operation* is the orchestrated flow that:

1. Splits raw documents into chunks via a `Splitter`.
2. Embeds chunks through a transformer model (semantic search).
3. Optionally combines vector recall with full-text search (`FullTextSearchAction`).
4. Queries, and optionally reranks, the relevant chunks and feeds them into an LLM prompt (summarization, question answering, text generation, etc.).
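
The flow above can be walked through with a toy, in-memory Python sketch. It deliberately swaps in simpler techniques: a bag-of-words counter replaces the transformer embedding, a brute-force cosine scan replaces the HNSW index, full-text search and reranking are omitted, and the final step only builds the prompt string. All function names are hypothetical.

```python
import math
from collections import Counter


def split(text: str, chunk_words: int = 8) -> list[str]:
    """Step 1: cut a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def embed(text: str) -> Counter:
    """Step 2 (toy): bag-of-words vector; a real pipeline calls a transformer."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], limit: int = 2) -> list[str]:
    """Step 4a: rank chunks by similarity to the query (brute force)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:limit]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 4b: place the retrieved chunks in front of the question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"
```

In Korvus these stages run inside PostgreSQL and are configured through the pipeline schema rather than written by hand; the sketch only shows how the stages hand data to each other.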

The supported task types declared in the codebase (`QuestionAnswering`, `Summarization`, `Translation`, `TextGeneration`, `Text2text`, `Embedding`) map directly onto RAG use cases. Every RAG job is therefore staged against a `ProjectInfo` record that carries the `task` field. Source: [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs).

## 2. Pipeline Configuration

The configurable unit is `Pipeline::new(name, schema)`, where `schema` is a JSON object whose keys are field names and whose values are *field actions*. The schema is parsed by `json_to_schema`, which fails if the top-level value is not a JSON object or if it contains duplicate keys. Source: [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs).
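The two schema checks described here can be mirrored on the client before a schema is handed to the SDK. The sketch below is an assumption-laden stand-in, not the Rust implementation: `load_schema` and `reject_duplicates` are hypothetical helper names, and the duplicate check is applied to nested objects as well as the top level. It matters mainly when a schema is loaded from a JSON file or string, since a Python `dict` literal silently keeps only the last of any repeated keys.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated keys instead of keeping the last one."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in pipeline schema: {key!r}")
        seen[key] = value
    return seen


def load_schema(text: str) -> dict:
    """Parse schema text, requiring a top-level JSON object with unique keys."""
    parsed = json.loads(text, object_pairs_hook=reject_duplicates)
    if not isinstance(parsed, dict):
        raise ValueError("pipeline schema must be a JSON object")
    return parsed
```

For example, `load_schema('{"text": {}, "text": {}}')` and `load_schema('["text"]')` both raise `ValueError`, while `load_schema('{"text": {}}')` returns a plain dictionary.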

`FieldAction` is composed of three optional actions, listed in the table below, and is built by `FieldAction::try_from(ValidFieldAction)`. Source: [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs).

| Field Action Type | Inner Configuration | Purpose in RAG |
|---|---|---|
| `SplitterAction` | `model: Splitter` (built via `Splitter::new(v.model, v.parameters)`) | Chunks raw documents before embedding |
| `SemanticSearchAction` | `model: Model` (built via `Model::new(Some(v.model), v.source, v.parameters)`) plus `hnsw: HNSW` (defaults via `HNSW::default()` when omitted) | Embeds chunks and indexes them with HNSW for fast nearest-neighbor retrieval |
| `FullTextSearchAction` | passed through verbatim from the schema | Adds lexical (keyword) recall through PostgreSQL full-text search alongside vector recall |

Source: [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs).
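To make the table concrete, here is a sketch of a schema that enables all three actions on one field, together with a small local check for misspelled action keys. Several details are assumptions: the field name `text`, the model names, the `configuration` key under `full_text_search`, and the `unknown_actions` helper are illustrative, and the action key names follow the labels used in the diagram on this page. Whether Korvus rejects or ignores an unknown action key is not established here, so the check is purely a client-side convenience.

```python
# Action keys as labeled in this page's pipeline diagram.
ALLOWED_ACTIONS = {"splitter", "semantic_search", "full_text_search"}

schema = {
    "text": {  # hypothetical document field name
        "splitter": {"model": "recursive_character"},          # assumed splitter name
        "semantic_search": {"model": "intfloat/e5-small-v2"},  # placeholder model
        "full_text_search": {"configuration": "english"},      # assumed key and value
    }
}


def unknown_actions(schema: dict) -> dict:
    """Map each field to any action keys outside the three documented ones."""
    problems = {}
    for field, actions in schema.items():
        extra = set(actions) - ALLOWED_ACTIONS
        if extra:
            problems[field] = sorted(extra)
    return problems
```

Omitting `hnsw` from `semantic_search`, as above, leaves index parameters to `HNSW::default()`.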

```mermaid
flowchart LR
    A[Document] -->|upsert_documents| B[(Collection: documents table)]
    B -->|splitter action| C[Chunks]
    C -->|semantic_search action| D[Embeddings + HNSW index]
    C -->|full_text_search action| E[tsvector index]
    D --> F[vector_search / search]
    E --> F
    F --> G[LLM prompt<br/>summarization · QA · text_generation]
```

`Collection` ties pipelines and documents together, exposing `add_pipeline`, `remove_pipeline`, `enable_pipeline`, `disable_pipeline`, `vector_search`, `search`, and `add_search_event` as the operational surface. Source: [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs).

Library-level re-exports clarify which building blocks participate in RAG: `Builtins`, `OpenSourceAI`, `TransformerPipeline`, `Model`, `Splitter`, `Pipeline`, `Collection`. Source: [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs). `Builtins.transform()` is the JS/TS hook used to run models against the database, as shown in the question-answering example. Source: [korvus/javascript/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md).

## 3. Multi-Language Operation Surface

Korvus ships the same RAG surface to multiple languages via a custom `rust_bridge` that emits FFI wrappers from a single Rust source. The `alias` derive macro (declared with `#[proc_macro_derive(alias)]`) emits Python, JavaScript, and C alias structs around each annotated type, and the `alias_methods` attribute macro (declared with `#[proc_macro_attribute]`) emits matching method wrappers. Source: [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs). Language modules are gated behind Cargo features: `python`, `javascript`, `c`. Source: [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs). This is why the same `Pipeline` and `Collection` definitions (decorated with `#[cfg_attr(feature = "rust_bridge", derive(alias))]` and `alias_methods(...)`) expose the same interface in each SDK. Source: [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs) and [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs).

## 4. Common Failure Modes

The repository has no dedicated troubleshooting guide, but several failure modes are visible in the source and align with community reports:

- **Invalid pipeline schema.** `json_to_schema` returns an error when the top-level value is not a JSON object or contains duplicate keys, and `ValidFieldAction` deserialization fails on malformed actions. Source: [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs). Check the schema first when `Pipeline::new` rejects its input.
- **Unknown project task string.** Constructing a `ProjectTask` from a string that is not in the enumerated set panics: `panic!("Unknown project task: {}", s)`. Source: [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs). A typo in the task name, or a version skew between the SDK and previously stored data, surfaces here as a panic rather than a recoverable error.
- **Hugging Face model parameter gaps.** Community issue #13 reports that running the example scripts requires passing `"trust_remote_code": True` for Hugging Face models in a `semantic_search` action. Source: [korvus/javascript/examples/README.md](https://github.com/postgresml/korvus/blob/main/korvus/javascript/examples/README.md) and community context. Treat the `parameters` map on `Model::new` as the authoritative place to inject such flags. Source: [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs).
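- **Korvus Cloud worker errors (os error 11).** Community issue #23 reports pipelines stuck with worker errors even after recreating the server. On Linux, OS error 11 is `EAGAIN` ("Resource temporarily unavailable"), which generally means a resource such as a thread, process, or memory could not be allocated at that moment. Because the SDK relies on a long-lived async runtime created inside the library, container-level resource exhaustion is a plausible, though unconfirmed, trigger. Source: [korvus/src/lib.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/lib.rs).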
- **Missing language binding.** Community requests for Golang, PHP, and .NET support (#6, #9, #24) reflect that the FFI layer is *currently* limited to Python, JavaScript, and C, as defined by the language module gates. Source: [korvus/src/languages/mod.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/languages/mod.rs) and [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs). Adding a new binding requires extending `rust-bridge-macros`, not just adding a wrapper by hand.
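For the Hugging Face parameter gap in the list above, the flag can be injected through the `parameters` entry of a `semantic_search` action. The sketch below assumes the schema keys `semantic_search`, `model`, and `parameters` (mirroring the fields passed to `Model::new`); the `with_model_parameters` helper and the model name are hypothetical.

```python
import copy


def with_model_parameters(schema: dict, field: str, **parameters) -> dict:
    """Return a copy of `schema` with `parameters` merged into one field's semantic_search action."""
    updated = copy.deepcopy(schema)  # leave the caller's schema untouched
    action = updated[field]["semantic_search"]
    action["parameters"] = {**action.get("parameters", {}), **parameters}
    return updated


# Placeholder model name, used only for illustration.
base = {"text": {"semantic_search": {"model": "example-org/example-embedding-model"}}}
patched = with_model_parameters(base, "text", trust_remote_code=True)
```

In Hugging Face `transformers`, `trust_remote_code` allows code shipped in the model repository to execute, so it is best enabled only for models whose source has been reviewed.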

For long-lived reliability, the contributing guide recommends pairing new features with tests and with updates to the README and examples. Applying that loop to the example scripts is what keeps gaps such as the `trust_remote_code` one from recurring. Source: [CONTRIBUTING.md](https://github.com/postgresml/korvus/blob/main/CONTRIBUTING.md).

## See Also

- Korvus top-level overview: [README.md](https://github.com/postgresml/korvus/blob/main/README.md)
- Pipeline actions reference: [korvus/src/pipeline.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/pipeline.rs)
- Collection lifecycle methods: [korvus/src/collection.rs](https://github.com/postgresml/korvus/blob/main/korvus/src/collection.rs)
- rust-bridge macro reference: [rust-bridge/rust-bridge-macros/src/lib.rs](https://github.com/postgresml/korvus/blob/main/rust-bridge/rust-bridge-macros/src/lib.rs)

---


## Pitfall Log

Project: postgresml/korvus

Summary: Found 17 structured pitfall items, including 2 high-severity (blocking) items. Top priority: installation risk, which requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/9

## 2. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/13

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/22

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/17

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/10

## 6. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/23

## 7. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/18

## 8. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/postgresml/korvus

## 9. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: packet_text.keyword_scan | https://github.com/postgresml/korvus

## 10. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/postgresml/korvus

## 11. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/postgresml/korvus

## 12. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/postgresml/korvus

## 13. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/16

## 14. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/20

## 15. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/postgresml/korvus/issues/12

## 16. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/postgresml/korvus

## 17. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/postgresml/korvus

