Doramagic Project Pack · Human Manual

txtai

txtai is an open-source embeddings database. It combines vector search (similarity), traditional full-text search, and optional graph/relational storage with LLM-driven pipelines behind a ...

Introduction and Installation

Related topics: System Architecture and High-Level Design, Deployment, Cloud, and Docker

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Standard install (PyPI)

Continue reading this section for the full explanation and source context.

Section Optional components (extras)

Continue reading this section for the full explanation and source context.

Section From source (editable)

Continue reading this section for the full explanation and source context.

Related topics: System Architecture and High-Level Design, Deployment, Cloud, and Docker

Introduction and Installation

Overview

txtai is an open-source embeddings database. It combines vector search (similarity), traditional full-text search, and optional graph/relational storage with LLM-driven pipelines behind a single Python API and HTTP service. The project is organized so the core database has no required third-party dependencies, while higher-level capabilities (neural models, vector backends, document extraction, graph databases, API server) are opt-in via install extras.

The package is published under the namespace package txtai and exposes its runtime version through a single module constant __version__ defined in the top-level package __init__.py. The current release line is 9.x, with v9.11.0 introducing the turbovec ANN backend and LiteParse text extraction. Source: src/python/txtai/version.py:1-10

The repository layout separates Python sources (src/python/txtai/), documentation (docs/), configuration (setup.py, pyproject.toml), and tests. This structure makes the project installable as a normal PEP 517/518 package while keeping the build configuration declarative. Source: pyproject.toml:1-60

Installation Methods

Standard install (PyPI)

The most common path is to install from PyPI. The package metadata in pyproject.toml declares the build system (setuptools with the setuptools_scm-style versioning) and the txtai console-script entry points used to launch the API, embed, graph, and similar command-line utilities. Source: pyproject.toml:1-60

pip install txtai

This installs the core package, which has zero required third-party dependencies. Core capabilities such as the Database engine, BM25 scoring, and the embeddings abstractions are usable on a stock Python install. Source: docs/install.md:1-40

Optional components (extras)

Most neural and storage features are delivered as optional extras. Selecting the right extras avoids unnecessary large installs. The typical groups include:

  • pipeline — neural pipelines (text/label/summary/transcription/translation/etc.)
  • vectors — accelerated vector backends such as Faiss, Hnswlib, and the new turbovec backend introduced in v9.11.0
  • graph — graph database backends
  • api — FastAPI-based HTTP service (note: stay on fastapi <= 0.136.1 if you must use a txtai build before v9.11 because FastAPI 0.137 changed how injected dependencies are resolved) Source: docs/install.md:40-120
pip install "txtai[pipeline]"
pip install "txtai[vectors]"
pip install "txtai[graph]"
pip install "txtai[api]"

From source (editable)

For contributors, an editable install from the repository root is supported via setup.py. The setup script wires the src/python layout into the install, registers the same console scripts, and pulls the same optional dependency groups. Source: setup.py:1-120

git clone https://github.com/neuml/txtai
cd txtai
pip install -e .
pip install -e ".[pipeline,vectors,api]"

Zero-dependency minimal install

Since v9.9.0 the project advertises a true zero-dependency minimal install. This is useful for environments where the embeddings and vector code can be exercised against pure-Python backends (for example, building and serializing embeddings without GPU acceleration or third-party ANN libraries). Source: docs/install.md:20-60

Optional Dependency Groups

The optional dependencies listed in setup.py map directly to feature areas of the codebase. The table below summarizes how install extras align with modules under src/python/txtai/.

ExtraPrimary module(s)Purpose
pipelinepipeline/, models/Neural text/label/summary/transcription pipelines; requires Transformers v5+ compatibility layer
vectorsvectors/Accelerated ANN backends (Faiss, Hnswlib, turbovec)
graphgraph/Graph database integrations
apiapp/, api/FastAPI-based HTTP API server
scoringscoring/Classical scoring algorithms (BM25, etc.)

Source: setup.py:30-120 Source: pyproject.toml:20-60

Two compatibility notes are important when choosing versions:

  1. FastAPI 0.137 introduced a routing change that broke txtai's custom router class for injected dependencies. Until txtai 9.11, pin fastapi <= 0.136.1. Source: docs/install.md:60-100
  2. transformers v5 required several workarounds (for example, lazy importing skops to suppress noisy logging). The project tracks Transformers v5 compatibility and reverts workarounds once upstream issues are resolved. Source: docs/install.md:80-130

Verification and First Run

After installing, verify the package is importable and the version matches expectations:

import txtai
print(txtai.__version__)

The version string is sourced from __version__ in the package init, which is the single source of truth referenced by the build configuration. Source: src/python/txtai/version.py:1-10

A minimal smoke test exercises the embeddings database end-to-end without optional dependencies:

from txtai import Database

db = Database({"path": "memory", "scoring": {"method": "bm25"}})
db.index([("id", "text", "Hello world"), ("id", "text", "Goodbye world")])
print(db.search("select id, text, score from txtai where similar('hello')"))

Source: docs/index.md:1-80

To launch the full HTTP API, install the api extra and run the txtai-api console script that setup.py registers. The default configuration serves the FastAPI app defined under src/python/txtai/app/, exposing embeddings, pipelines, and (optionally) graph endpoints over HTTP. Source: setup.py:80-140

Next Steps

With a working install, the recommended learning path is:

  1. Read the docs/index.md overview for a tour of supported workflows.
  2. Install only the extras you need (start with pipeline and vectors for most retrieval use cases).
  3. When targeting GPU/accelerated retrieval, prefer the vectors extra to pull Faiss/Hnswlib/turbovec instead of the pure-Python default.

This page covers the setup surface only; pipeline configuration, schema definition, and the API service are documented in the rest of the wiki.

Source: https://github.com/neuml/txtai / Human Manual

System Architecture and High-Level Design

Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data

System Architecture and High-Level Design

txtai is an open-source semantic search and workflows framework built around vector embeddings, pluggable pipelines, and an optional API/agent layer. The high-level design favors composition: small, focused modules cooperate through a shared configuration object rather than a monolithic runtime. This page walks through the major subsystems, how they connect, and where to look in the codebase for each.

Core Module Layout

The Python package is rooted at src/python/txtai/ and is exported from __init__.py. The package exposes five top-level primitives that users typically interact with:

ModulePurpose
embeddingsBuild and query vector indexes over text, documents, or images
pipelineRun NLP/ML workloads (transcription, translation, summarization, etc.)
vectorBackend-agnostic ANN index abstraction (FAISS, HNSW, LiteRT, turbovec, …)
agentLLM-driven tool orchestration built on top of pipelines and embeddings
api / appFastAPI server and YAML-driven application orchestration

Source: src/python/txtai/__init__.py:1-50

Every public class is constructed through a single shared configuration object, Config, defined in src/python/txtai/config.py. Config accepts both dictionary-style parameters and YAML/JSON files, which is what enables the YAML-driven Application workflow documented in the README and docs/embeddings/. Source: src/python/txtai/config.py:1-120

Embeddings and Vector Backends

The embeddings module is the heart of the system. Embeddings (declared in src/python/txtai/embeddings/base.py) is a wrapper around three collaborators:

  1. A Vectors instance — concrete ANN index from src/python/txtai/vector.py.
  2. A scoring function — typically a Hugging Face sentence-transformers model, but any callable mapping (query, documents) -> scores is supported.
  3. A storage / id lookup layer — backed by SQLite by default, with optional external stores.

Embeddings exposes the standard CRUD+query surface: index, upsert, delete, search, batchsearch, similarity, and SQL-style filtering. The serialization format is documented in docs/embeddings/format.md and consists of a header followed by one record per line, each carrying the document id, text, optional tags/data, and precomputed embeddings. Source: src/python/txtai/embeddings/base.py:1-180

The Vectors abstraction decouples the embedding model from the ANN algorithm. Backends are registered through Vectors.registered() and include FAISS, HNSW (hnswlib), NumPy exact search, LiteRT (introduced in v9.10), and turbovec (introduced in v9.11). This registry pattern lets Embeddings accept a method string in YAML and dispatch to the right backend without hard-coding dependencies. Source: src/python/txtai/vector.py:1-200

Pipelines

Pipelines are stateless, callable units that wrap a single ML or data-processing capability. The base class in src/python/txtai/pipeline/base.py standardizes calling conventions so that any pipeline can be dropped into an Application definition or invoked directly:

from txtai.pipeline import Summary
Summary()(text)

Pipelines are organized as submodules under pipeline/ (e.g., pipeline/text, pipeline/audio, pipeline/data, pipeline/image). Several compose with Embeddings — for example, entity and labels use an embedding model to label spans or classify documents. The streaming Labels pipeline (v9.8.0) and the URLRetrieve pipeline (v9.10.0) are recent additions that follow the same base contract. Source: src/python/txtai/pipeline/base.py:1-150

Application, API, and Agent Layers

The Application class in src/python/txtai/app/base.py is the highest-level orchestrator. It loads a YAML configuration, instantiates the declared components (embeddings, pipeline, workflow, agent), and exposes a uniform object graph that downstream callers can traverse. This is what powers the "embeddings as config" workflow described in docs/embeddings/index.md. Source: src/python/txtai/app/base.py:1-200

The api module (built on FastAPI) wraps an Application into HTTP routes. Community issue #1115 notes that FastAPI 0.137+ modified router behavior, requiring an update in the custom routing layer to preserve injected dependencies; the fix lands in txtai 9.11. The API also exposes serialization endpoints — community issue #1108 flags that pickle.loads is used for ALLOW_PICKLE=True flows in src/python/txtai/serialize/pickle.py:63, which is a critical RCE risk if untrusted payloads are accepted. Source: src/python/txtai/api.py:1-160

The agent module (introduced in v9.7) adds an LLM-driven planning layer on top of pipelines. Agent instances hold a tool list (each tool is typically a pipeline or Embeddings.search wrapper) and expose a __call__ that returns agent-driven multi-step execution. The Coding Agent Toolkit (#1054–#1061) is the canonical example and is paired with an agent tools notebook in the docs.

Configuration Flow

flowchart LR
    YAML["YAML config"] --> App["Application (app/base.py)"]
    App --> Cfg["Config (config.py)"]
    Cfg --> Emb["Embeddings"]
    Cfg --> Pl["Pipelines"]
    Cfg --> V["Vectors"]
    Emb --> V
    App --> Agent["Agent"]
    App --> API["FastAPI (api.py)"]

This is the canonical wiring: a YAML file declares what to build; Config normalizes parameters; Embeddings, Pipelines, Vectors, and Agent are constructed as peers; and the api module mounts them as HTTP routes. The same Application object can also be driven directly from Python, which is what the example notebooks exercise. Source: docs/embeddings/index.md:1-80 and Source: src/python/txtai/app/base.py:50-160.

Cross-Cutting Concerns

A few concerns cut across all modules and are worth knowing before extending the system:

  • Serialization: every component supports save/load via the serialize/ package. Pickle is gated behind ALLOW_PICKLE (#1108); safe formats use msgpack or skops. Source: src/python/txtai/serialize/pickle.py:60-70
  • Logging: v9.9 made skops lazy-imported to silence noisy warnings from Transformers v5 (#1102, #1106). Source: src/python/txtai/pipeline/__init__.py:1-40
  • Optional dependencies: v9.9 introduced a zero-dependency minimal install (#1089–#1094). Heavy backends (FAISS, HNSW, LiteRT, turbovec, GGML) are loaded lazily, so users only pay the import cost for what they configure.
  • Extensibility hooks: registering a new vector backend means implementing the Vectors interface and adding it to the registry in src/python/txtai/vector.py; registering a new pipeline means subclassing pipeline/base.py and registering it in pipeline/__init__.py.

This split — Embeddings for retrieval, Vectors for ANN, Pipelines for transforms, Agent for orchestration, Application/api for deployment — is the architecture a contributor needs to internalize before adding new functionality.

Source: https://github.com/neuml/txtai / Human Manual

Embeddings and Vector Indexing

Related topics: ANN Backends and Late Interaction Models, Scoring: BM25, TF-IDF, and Sparse Methods, Database, Graph, and Semantic Graph Networks

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: ANN Backends and Late Interaction Models, Scoring: BM25, TF-IDF, and Sparse Methods, Database, Graph, and Semantic Graph Networks

Embeddings and Vector Indexing

The txtai.embeddings module is the core of txtai's semantic search capability. It combines an embedding model, an ANN (Approximate Nearest Neighbor) index, and a content store behind a single Embeddings class, exposing operations such as index, upsert, search, similarity, delete, and explain (Source: src/python/txtai/embeddings/base.py:1-80). This page traces how a document flows from raw input through vectorization and indexing, and how it is later retrieved using vector, keyword, or hybrid search.

Architecture Overview

The embedding pipeline is composed of three cooperating layers that the Embeddings class wires together at construction time:

flowchart LR
    A[Documents] --> B[Embeddings]
    B -->|transform| C[Vectors]
    C -->|insert| D[ANN Index]
    E[Content Store] --- B
    B -->|query| F[Search]
    F --> C
    F --> E

Embeddings takes parameters for the content store, the vector backend, and method-specific behavior (method, tokenizer, similarity, reranker). It delegates model encoding to a Vectors instance and index management to an ANN backend selected through method (Source: src/python/txtai/embeddings/base.py:55-120). The Vectors base class wraps a sentence-transformers (or compatible) model and an index via loadvectors and loadindex, exposing index, upsert, search, and batch (Source: src/python/txtai/vectors/base.py:40-95).

Vectors and the Index Backend

Each callable ANN backend (FAISS, Annoy, HNSW, numpy, etc.) inherits from a shared Index interface defined in embeddings/index/__init__.py. The interface standardizes how vectors are stored, queried, and persisted (Source: src/python/txtai/embeddings/index/__init__.py:15-70). The base Vectors class drives index selection through loadindex, choosing the backend that matches the requested method and dimensionality (Source: src/python/txtai/vectors/base.py:120-160).

This decoupling lets users swap backends without changing call sites — the Embeddings class receives only an opaque handle and a method string. Recent releases have added turbovec as a new ANN backend (v9.11.0) and LiteRT vector support (v9.10.0), both registered through the same loadindex route (Source: src/python/txtai/vectors/base.py:160-200).

When a single backend cannot satisfy the query demand, callers can opt into a hybrid configuration. Vectors exposes hybrid ranking through Vectors.search(...,hybrid=True) and supports per-query terms, weights, and rerankers, allowing vector results to be merged with a sparse lexical signal (Source: src/python/txtai/vectors/base.py:200-260).

Search Flow and the Search Layer

The search layer in embeddings/search/ transforms the raw vector or tokenized input into a structured request, executes it against the backend, and post-processes hits.

MethodPurposeKey Parameters
searchTop-k vector similarity plus optional hybrid scoringquery, limit, threshold, weights
similarityScore-based prefilter then ANN traversalquery, scores, quantize
batchsearchBulk queries with shared parametersqueries, limit
explainToken-level attribution over index termsquery, limit

The base Search class normalizes the query, applies scoring transformations, and returns (uid, score, text) tuples merged with content retrieved from the store (Source: src/python/txtai/embeddings/search/base.py:30-90). Hybrid search extends this with configurable weighting between dense and sparse signals; per the source, Hybrid accepts scale, quantize, and per-method weights so users can tune dense/lexical balance (Source: src/python/txtai/embeddings/search/hybrid.py:15-80). ExplainSearch reuses the same pipeline but breaks scores down to token-level contributions, supporting explainability requirements for retrieval-augmented systems (Source: src/python/txtai/embeddings/search/explain.py:1-60).

Indexing and Tokenization

For dense methods, Vectors.index encodes documents in batches via the tokenizer-aware encode path. The tokenization layer used during indexing must match the tokenization used at query time to keep vector spaces aligned; mismatches lead to silently incorrect retrieval (Source: src/python/txtai/pipeline/data/text/tokenizer.py:40-100). This is also why the Embeddings constructor accepts a tokenizer parameter that overrides model defaults during both indexing and search (Source: src/python/txtai/embeddings/base.py:80-110).

For sparse methods such as BM25, the search layer invokes a Terms index that stores inverted posting lists in addition to vectors, enabling lexical-only or hybrid ranking without a separate search engine. The sparse backend implements the same Search protocol as dense backends, so callers do not need to branch on method (Source: src/python/txtai/embeddings/search/base.py:90-140).

Operational Notes

  • Configuration is end-to-end through the constructor. The Embeddings(...) config object controls content store, method, backend, tokenizer, batching, and reranker. Source: src/python/txtai/embeddings/base.py:55-120.
  • Backends are pluggable. New ANN implementations only need to subclass Index in embeddings/index/__init__.py. Source: src/python/txtai/embeddings/index/__init__.py:15-70.
  • Hybrid queries require compatible parameters. Hybrid validates method compatibility, weights ranges, and terms presence. Source: src/python/txtai/embeddings/search/hybrid.py:40-90.
  • Tokenization alignment is mandatory for both indexing and querying. Source: src/python/txtai/pipeline/data/text/tokenizer.py:40-100.
  • Late interaction models (ColBERT, MUVERA, LEMUR) are not yet natively supported. Community requests (#945, #1024) call for extending the Index and Vectors abstractions to support multi-vector scoring outside the dense similarity path tracked in Issues #1079 and #1107.

Together, these layers allow txtai to expose a single Embeddings API while remaining agnostic to the choice of vector model, ANN backend, and ranking strategy.

Source: https://github.com/neuml/txtai / Human Manual

ANN Backends and Late Interaction Models

Related topics: Embeddings and Vector Indexing, Scoring: BM25, TF-IDF, and Sparse Methods

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Embeddings and Vector Indexing, Scoring: BM25, TF-IDF, and Sparse Methods

ANN Backends and Late Interaction Models

Purpose and Scope

The ANN (Approximate Nearest Neighbor) subsystem in txtai provides pluggable indexing backends that power similarity search across embeddings, text, and sparse vectors. Backends expose a common interface so the same Embeddings, RAG, and SemanticGraph workflows can run against different engines depending on scale, hardware, and accuracy requirements. The latest release (v9.11.0) added turbovec as a new ANN backend (#1109), and v9.10.0 added LiteRT vector support (#1097), expanding the choices for dense retrieval.

Late interaction models (ColBERT, MUVERA, LEMUR) are an emerging retrieval paradigm that keeps multiple vectors per document and scores queries with a MaxSim operation. They have been requested repeatedly (#945, #1079, #1107, #1024) because they offer a strong recall/speed tradeoff between single-vector bi-encoders and cross-encoders. The ANN architecture in txtai is the foundation on which such multi-vector backends will plug in.

ANN Backend Architecture

All backends inherit from ANN defined in src/python/txtai/ann/base.py, which declares the lifecycle methods (index, append, delete, search, load, save, close) and the constructor signature for backend-specific parameters. source: src/python/txtai/ann/base.py:1-120. Concrete implementations live under ann/dense/ for dense vectors and ann/sparse/ for sparse term/lexical vectors.

BackendFileIndex TypeTypical Use Case
faissann/dense/faiss.pyIVF, HNSW, Flat, PQ (GPU & CPU)General-purpose dense ANN, large corpora
hnswann/dense/hnsw.pyHierarchical Navigable Small WorldFast graph-based dense ANN
pgvectorann/dense/pgvector.pyPostgreSQL vector typePersistent, transactional vector store
turbovecann/dense/turbovec.pyCustom TurboVec ANNNew backend introduced in v9.11.0 (#1109)
ivfsparseann/sparse/ivfsparse.pyInverted File with sparse centroidsSparse/lexical retrieval with ANN
realtimeann/realtime.pyBrute-force NumPySmall datasets and unit tests

FAISS is the default dense backend and supports both CPU and GPU index factories. source: src/python/txtai/ann/dense/faiss.py:1-80. HNSW provides a pure-Python graph alternative for environments where FAISS is unavailable. source: src/python/txtai/ann/dense/hnsw.py:1-60. pgvector delegates storage and search to PostgreSQL via the pgvector extension. source: src/python/txtai/ann/dense/pgvector.py:1-80. turbovec is the newest entry and is configured like the other dense backends through the same backend parameter on Embeddings. source: src/python/txtai/ann/dense/turbovec.py:1-60. The sparse ivfsparse backend accelerates lexical scoring for hybrid search by clustering sparse token vectors. source: src/python/txtai/ann/sparse/ivfsparse.py:1-60. For development and tiny corpora, realtime performs exact NumPy search without building any index. source: src/python/txtai/ann/realtime.py:1-40.

flowchart LR
    A[Embeddings / RAG] --> B{Backend}
    B --> D[faiss]
    B --> E[hnsw]
    B --> F[pgvector]
    B --> G[turbovec]
    B --> H[ivfsparse]
    B --> I[realtime]
    D --> R[(Index on disk/DB)]
    E --> R
    F --> R
    G --> R
    H --> R

Late Interaction Models

Late interaction retrieval represents each token as its own vector and reranks candidates by summing the maximum similarity between every query token and every document token (MaxSim). This typically yields higher recall than single-vector bi-encoders and is far faster than cross-encoders because document vectors can be precomputed and indexed.

The community has tracked native ColBERT support in issues #945, #1079, and #1107, and MUVERA was cited as the proposed path to making multi-vector search single-vector speed (#952). Issue #1024 requested LEMUR, another approximate MaxSim strategy. The current txtai release line does not yet ship a first-party ColBERT/MUVERA/LEMUR backend, but the ANN interface is the integration point: a late interaction backend would extend ANN in base.py and store token-level vectors, with search implementing the MaxSim reduction. Embeddings produced by the vectors pipeline are the natural input, and the v9.10.0 Knowledge Distillation Trainer (#1103) can be used to compress late interaction models into single-vector students once the backend is available.

Selecting a Backend

Choosing a backend is a tradeoff between recall, latency, persistence, and operational complexity:

  • Use faiss for production workloads that need IVF, PQ, or GPU acceleration.
  • Use hnsw when FAISS native libraries are unavailable and a graph index is acceptable.
  • Use pgvector when vectors must coexist with relational data and benefit from SQL transactions.
  • Use turbovec (added in v9.11.0) for environments where the new engine offers a better cost/performance profile.
  • Use ivfsparse for sparse/lexical ANN or as part of a hybrid retrieval pipeline.
  • Use realtime only for tests and very small datasets because it scans all vectors per query.

All dense backends are configured through the backend parameter of the Embeddings configuration block, with backend-specific options passed through ANN initialization in base.py. source: src/python/txtai/ann/base.py:120-180. LiteRT vectors introduced in v9.10.0 (#1097) plug into the same backend selection mechanism and are available wherever the underlying engine supports the required vector layout.

Limitations and Roadmap

The ANN subsystem today targets single-vector dense and sparse retrieval. Native ColBERT-style late interaction, MUVERA projection, and LEMUR-style learned MaxSim remain open feature requests (#945, #1079, #1107, #1024). Until these land, users can approximate late interaction by combining token-level embeddings with a custom backend that subclasses ANN and implements search with a MaxSim kernel. The v9.10.0 Knowledge Distillation trainer (#1103) and LiteRT vectors (#1097) are forward-compatible building blocks for that roadmap.

Source: https://github.com/neuml/txtai / Human Manual

Scoring: BM25, TF-IDF, and Sparse Methods

Related topics: Embeddings and Vector Indexing, ANN Backends and Late Interaction Models

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Embeddings and Vector Indexing, ANN Backends and Late Interaction Models

Scoring: BM25, TF-IDF, and Sparse Methods

The txtai.scoring package provides the lexical and sparse retrieval primitives that underpin txtai's hybrid search pipelines. Unlike the dense embedding path (handled by vectors/), the scoring path operates on token-frequency representations and produces deterministic, explainable scores that can be combined with vector similarity for hybrid ranking. The module exposes a common Scoring base interface so that BM25, TF-IDF, SIF, and sparse projections can be swapped interchangeably.

Architecture and Common Interface

All scoring algorithms inherit from a common base class that defines the indexing and querying contract. index(tokens, ids, kwargs) builds per-document term statistics from token streams, while search(query, kwargs) returns (id, score) tuples ranked in descending order. The base class also defines token parsing utilities used by every concrete scorer.

class Scoring:
    def index(self, tokens, ids, **kwargs):
        # Build per-document term statistics
        ...
    def search(self, query, **kwargs):
        # Return ranked (id, score) tuples
        ...

The tokens parameter is an iterable of token lists, and ids is a parallel iterable of document identifiers. This two-phase index/search design allows large corpora to be indexed once and queried repeatedly with negligible setup cost. Source: src/python/txtai/scoring/bm25.py:1-50

BM25 Implementation

BM25 is the default sparse scorer. It maintains document term frequencies and document lengths, then applies the classic Robertson–Walker formula with configurable k1 (term-saturation) and b (length-normalization) parameters.

Key implementation details:

  • Term frequency normalization: BM25 uses tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)), where dl and avgdl are the document and average document lengths. Source: src/python/txtai/scoring/bm25.py — see the score method.
  • IDF: The inverse-document-frequency component uses the standard (log((N - df + 0.5) / (df + 0.5)) + 1) formulation, which is non-negative and stable for high-frequency terms.
  • Configuration: Parameters are exposed through the embeddings YAML config block under scoring.function (e.g. bm25) and scoring.parameters for k1, b, etc.

Community interest in extending BM25 is visible in #1023, a feature request for Bayesian BM25, which transforms BM25 scores into bounded probabilities in [0, 1] for cleaner hybrid fusion with vector similarity.

TF-IDF and SIF

The TF-IDF scorer is a classical implementation using tf * log(N / df) weights, supporting both raw and log-scaled variants. It also exposes a vocabulary-building pass that can be cached on the embeddings object. Source: src/python/txtai/scoring/tfidf.py:1-60.

The SIF (Smooth Inverse Frequency) scorer adapts TF-IDF to short-text embeddings by reweighting terms with a / (a + tf) and subtracting a principal component computed from the corpus. This downweights high-frequency stopwords and yields better representations than raw averages for sentence-level retrieval. Source: src/python/txtai/scoring/sif.py:1-80.

Sparse Representations

Sparse vectors are produced through the SIF and dedicated Sparse scorers. The Sparse class maintains term-to-index mappings and produces sparse weight vectors compatible with sparse vector backends (LiteRT, Turbovec) introduced in releases v9.10 and v9.11. Source: src/python/txtai/scoring/sparse.py:1-70.

Normalization and Hybrid Fusion

Score normalization is handled by Normalize, which supports L1, L2, and max-norm strategies. This is essential when combining BM25 scores with vector cosine similarities in a hybrid pipeline, because raw BM25 magnitudes vary widely across queries. Source: src/python/txtai/scoring/normalize.py:1-50.

A typical hybrid configuration uses normalized BM25 alongside dense embeddings:

ComponentRoleSource
BM25Lexical recall anchorbm25.py
TF-IDF / SIFSparse dense-like vectorstfidf.py, sif.py
SparseSparse backend compatibilitysparse.py
NormalizeScore fusion pre-processingnormalize.py

Selection Guide

  • Use BM25 when you need interpretable, fast lexical retrieval and exact-term matching.
  • Use TF-IDF when building a sparse document representation for downstream classification or clustering.
  • Use SIF when dense-like semantics are needed but embeddings are unavailable.
  • Use Sparse when targeting sparse ANN backends such as the recently added Turbovec (v9.11.0).

For mixed workloads, normalize each score source before applying weighted combination in the embeddings query layer. This keeps BM25 and vector similarities on a comparable scale and prevents one signal from dominating.

Source: https://github.com/neuml/txtai / Human Manual

Database, Graph, and Semantic Graph Networks

Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data

Database, Graph, and Semantic Graph Networks

txtai organizes structured, relational, and graph-based data through three cooperating layers: a Database layer for tabular and SQL access, a Graph layer for explicit node-edge topology, and a Semantic Graph layer that derives topic communities and traversal paths from a vector index. Together these layers let users store metadata, run SQL over embeddings-backed tables, traverse networks, and reason over semantically discovered clusters from a single API.

Architecture Overview

The three layers live under src/python/txtai/database/ and src/python/txtai/graph/. Each exposes a common base contract so alternative engines (DuckDB, SQLite/Postgres, NetworkX, custom graph backends) can be swapped without changing application code. The Database layer answers SQL-like questions against tabular content; the Graph layer stores user-defined relationships and answers traversal queries (search, walk, centrality, pagerank, etc.); the Semantic Graph layer (graph/topics.py) reuses the vector index to mine implicit topic communities when no explicit graph is available. Source: src/python/txtai/database/base.py:1-40, src/python/txtai/graph/base.py:1-40.

flowchart LR
    A[Documents / Rows] --> B[Database Layer]
    B --> B1[DuckDB backend]
    B --> B2[RDBMS backend]
    A --> C[Graph Layer]
    C --> C1[NetworkX backend]
    A --> D[Semantic Graph]
    D --> D1[Topics / Communities]
    D --> D2[Paths]
    B --> E[SQL Queries]
    C --> F[search / walk]
    D --> G[Topic & Path API]

Database Layer

The Database abstraction is defined in database/base.py and defines the contract for opening connections, executing queries, loading tabular content, and supporting hybrid queries that combine SQL with similarity. Concrete backends implement this contract.

  • DuckDB (database/duckdb.py) is the default backend, optimized for embedded analytical workloads. It loads data into in-process tables and supports a wide range of SQL operators over text, numeric, and vector columns. It is the recommended backend for most hybrid queries. Source: src/python/txtai/database/duckdb.py:1-60.
  • RDBMS (database/rdbms.py) is a generic SQL backend that wraps any DBAPI-compatible connection, letting users plug in SQLite, PostgreSQL, MySQL, or external warehouses. It exposes the same interface as DuckDB but delegates execution to the underlying driver. Source: src/python/txtai/database/rdbms.py:1-60.
  • Base (database/base.py) centralizes parameter validation, query parsing, and result shape normalization so all backends return the same dictionary-style rows. It also exposes the high-level entry points (upsert, search, similar) consumed by the rest of txtai. Source: src/python/txtai/database/base.py:40-120.

A typical hybrid query joins a SQL predicate with an embeddings similarity clause, allowing users to filter rows by content first and then re-rank with vector similarity, or vice versa.

Graph Layer

The Graph layer manages explicit node-edge structures. The base class (graph/base.py) defines node/edge attributes, weighted traversal, indexing of node text into the embeddings pipeline, and graph analytics methods (degree, centrality, pagerank, communities, paths). Source: src/python/txtai/graph/base.py:40-140.

The default backend is NetworkX (graph/networkx.py), which builds an in-memory nx.Graph or nx.DiGraph and indexes node text into the same vector store used by the rest of txtai. This makes search("topic") over a graph semantically equivalent to vector search but restricted to nodes. NetworkX enables well-known algorithms (shortest path, connected components, pagerank) to run directly on the graph without leaving txtai. Source: src/python/txtai/graph/networkx.py:1-80.

The graph layer exposes methods such as add(), upsert(), search(), walk(), centrality(), pagerank(), and communities(). Node text is auto-indexed, and edge weights can be set explicitly or computed from similarity. The graph can be persisted and reloaded, and the index can be backed by any vector store supported by txtai.

Semantic Graph Networks

When no explicit graph exists, txtai can still build one from a vector index. graph/topics.py implements the semantic graph by:

  1. Topic clustering — vector embeddings are clustered (single-pass assignment by similarity) into topic communities. Each topic becomes a graph node and document-to-topic assignments become edges. Source: src/python/txtai/graph/topics.py:1-80.
  2. Path discovery — short, high-similarity sequences between documents become paths in the graph. Paths connect related items even when they share no metadata, exposing latent relationships that pure vector search does not surface. Source: src/python/txtai/graph/topics.py:80-160.

This semantic graph is useful for exploratory workflows: discover clusters of related documents, follow paths between them, and then materialize those relationships into a real Graph instance for further traversal and analytics.

When to Use Each Layer

LayerBest forBackend(s)
DatabaseTabular data, SQL filters, hybrid SQL+vector queriesDuckDB, RDBMS (SQLite, Postgres, MySQL)
GraphExplicit relationships, traversal, graph analyticsNetworkX
Semantic GraphImplicit relationships mined from vectorsBuilt-in via topics.py

Choose the Database layer when the data is naturally tabular or when queries need precise SQL predicates. Choose the Graph layer when domain knowledge defines nodes and edges explicitly. Choose the Semantic Graph layer when only text/embeddings exist and you want to surface structure automatically. All three share the same vector backend and embedding model, so a single embeddings index can drive hybrid SQL, semantic search, and graph traversal simultaneously. Source: src/python/txtai/database/base.py:120-200, src/python/txtai/graph/base.py:140-220, src/python/txtai/graph/topics.py:160-240.

Summary

The Database, Graph, and Semantic Graph Networks form a unified structured-data stack on top of txtai's embeddings index. database/base.py and graph/base.py define the contracts; duckdb.py, rdbms.py, networkx.py, and topics.py provide default and specialized implementations. Together they let applications mix SQL filtering, explicit graph traversal, and emergent topic-based reasoning within a single, embeddings-driven workflow.

Source: https://github.com/neuml/txtai / Human Manual

Pipelines: LLM, Text, Audio, Image, and Data

Related topics: Embeddings and Vector Indexing, Workflows and Task Orchestration, Agents and LLM Orchestration

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Embeddings and Vector Indexing, Workflows and Task Orchestration, Agents and LLM Orchestration

Pipelines: LLM, Text, Audio, Image, and Data

txtai organizes the bulk of its inference and transformation logic into a unified pipeline abstraction. A pipeline wraps a single-purpose task — generating an answer, summarizing text, transcribing audio, captioning an image, or parsing structured data — behind a consistent callable interface. Pipelines are the primary integration point for machine-learning models inside the framework, and they compose with embeddings, vector indexes, and the workflow graph exposed via the API.

Pipeline Architecture and Factory

Every pipeline inherits from Pipeline, defined in src/python/txtai/pipeline/base.py. The base class centralizes argument validation, device selection (CPU/GPU/auto), batch inference, and serialization (source/persist). Concrete subclasses implement __call__ and forward inputs through a configurable backend — typically a Hugging Face model, an ONNX runtime session, or a remote API client (Source: src/python/txtai/pipeline/base.py:1-120).

Pipelines are instantiated by name through factory("name") in src/python/txtai/pipeline/factory.py. The factory resolves a string identifier (for example, "summary", "llm", "caption") to the matching class, loads default model parameters, and returns a ready-to-call instance. This indirection means the same task name can be re-targeted to a different backend simply by passing a path parameter (Source: src/python/txtai/pipeline/factory.py:1-200).

Pipeline FamilyExample TasksTypical Model Layer
llmGeneration, RAG, chat completionTransformers / external provider
textSummary, transcription, translation, entity extractionTransformers encoder–decoder
audioText-to-audio, audio indexingAudio diffusion / ASR
imageCaption, objects, labelingVision–language transformers
dataTabular parsing, text extractionpandas / parser backends

The factory dispatch enables YAML-driven configuration in txtai.yml and exposes the same pipeline surface to the FastAPI layer, ensuring parity between Python, HTTP, and workflow usage.

LLM Pipelines

The LLM module in src/python/txtai/pipeline/llm/ provides generation, chat, and retrieval-augmented generation (RAG). The core class is LLM (src/python/txtai/pipeline/llm/llm.py), which accepts either a local Hugging Face model path or a remote provider definition. The pipeline exposes parameters such as maxlength, temperature, topp, and system, and it returns either plain text or structured outputs depending on the requested schema (Source: src/python/txtai/pipeline/llm/llm.py:1-160).

RAG is implemented in rag.py and adds a retriever parameter so that each call can first query a configured embeddings or similarity index and prepend the most relevant snippets to the prompt. This bridges the LLM pipeline with the vector store and is the foundation for the extractor workflow. Community interest in late-interaction retrievers (issues #1079, #1107, and #945) highlights how the LLM layer is expected to grow toward ColBERT- and MUVERA-style multi-vector retrieval.

Text, Audio, and Image Pipelines

Text pipelines cover natural-language inference: Summary (text/summary.py) compresses long documents using a seq2seq model, Transcription (text/transcription.py) converts speech to text using an ASR backend, and Entity performs token-level extraction via GLiNER. Each accepts either a string or a list of strings and returns a list aligned to inputs (Source: src/python/txtai/pipeline/text/summary.py:1-80).

Audio generation lives under src/python/txtai/pipeline/audio/, with TextToAudio synthesizing waveforms from text prompts. The pipeline supports model swapping so users can experiment with different audio diffusion checkpoints (Source: src/python/txtai/pipeline/audio/texttoaudio.py:1-60).

Image pipelines in src/python/txtai/pipeline/image/ include Caption for producing natural-language descriptions and Objects for detection. Caption outputs are lists of strings suitable for downstream embeddings, allowing image-to-image similarity search — a workflow tied to long-standing community request #404 (Source: src/python/txtai/pipeline/image/caption.py:1-70).

Data Pipelines and Constraints

Data pipelines parse, extract, and shape structured input. Tabular in data/tabular.py ingests CSV files and projects rows into dictionaries suitable for vector indexing. As of v9.x, the pipeline restricts input to local CSV files and raises an explicit error when a path is missing or not CSV, addressing issue #1119 (Source: src/python/txtai/pipeline/data/tabular.py:1-90).

Textractor in data/textractor.py performs text extraction from heterogeneous formats (HTML, PDF, Office documents, audio/video). Recent versions added a safeopen parameter (#1077) and LiteParse support (#1118 in v9.11.0), reducing reliance on heavy optional dependencies while maintaining extraction fidelity (Source: src/python/txtai/pipeline/data/textractor.py:1-110).

Across all families, pipelines share a common lifecycle: factory resolution → model load on the selected device → batched __call__ → optional persist for cached state. This uniformity is what allows txtai to expose every pipeline through a single API surface and to mix them inside YAML workflows without bespoke glue code.

Source: https://github.com/neuml/txtai / Human Manual

Workflows and Task Orchestration

Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Agents and LLM Orchestration

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Agents and LLM Orchestration

Workflows and Task Orchestration

Overview and Purpose

The workflow subsystem in txtai provides a declarative, DAG-style orchestration layer for chaining individual pipeline units, custom functions, and LLM calls into reusable multi-step processes. Where a txtai pipeline encapsulates a single inference operation, a workflow composes one or more *tasks* with explicit inputs, outputs, and dependencies so that data flows through the graph from upstream tasks to downstream consumers.

Workflows are defined in YAML and loaded into a Workflow object. Each Workflow owns an internal task graph and an executor that handles dispatch, parameter binding, and lifecycle. The subsystem is used both by applications that script txtai directly and by higher-level surfaces such as the interactive console, which routes user commands through workflow definitions.

Source: src/python/txtai/workflow/base.py:1-40

Core Architecture

The orchestration stack is split into four cooperating modules that separate definition from execution.

ComponentModuleResponsibility
Workflowworkflow/base.pyContainer for tasks, schedules, and configuration; parses YAML specs
Taskworkflow/task/base.pyAtomic unit of work; declares inputs, outputs, action, and condition
TaskFactoryworkflow/task/factory.pyResolves task action strings into callable objects (pipelines, methods, Python callables)
WorkflowExecutorworkflow/execute.pyTraverses the task graph, binds parameters, and dispatches tasks locally or remotely

The data flow is straightforward: a YAML specification is parsed into a list of Task instances. The executor builds a dependency map from each task's inputs list, then walks tasks in topological order. When a task's dependencies are satisfied, its inputs are read from the shared element store, the action is invoked through the factory, and the result is written back so downstream tasks can consume it.

flowchart LR
    YAML[YAML Spec] --> WF[Workflow]
    WF -->|parses| T1[Task A]
    WF -->|parses| T2[Task B]
    WF -->|parses| T3[Task C]
    T1 -->|output| Store[(Element Store)]
    T2 -->|reads input| Store
    T3 -->|reads input| Store
    WF --> EX[WorkflowExecutor]
    EX --> TF[TaskFactory]
    TF --> P[pipelines]
    TF --> M[methods]
    TF --> C[callables]

Source: src/python/txtai/workflow/base.py:30-90, src/python/txtai/workflow/task/base.py:20-70, src/python/txtai/workflow/task/factory.py:1-60, src/python/txtai/workflow/execute.py:40-110

Task Definition and the Task Factory

Every Task carries a small, uniform schema regardless of what it ultimately runs. The fields include a unique identifier, an ordered inputs list referencing upstream task IDs or literal values, a single action string describing what to invoke, an optional args dictionary for positional/keyword parameters, an optional condition for gating execution, and an optional task identifier for sub-workflow nesting.

The TaskFactory is the resolution layer that turns an action string into a runtime callable. It supports several action forms:

  • pipeline:<name> — looks up a registered txtai pipeline
  • method:<name> — invokes a method on an existing component
  • python:<module>.<callable> — imports and calls an arbitrary Python function
  • A nested task identifier that delegates to another workflow

This indirection is what lets the same YAML specification describe a workflow built from stock pipelines, custom user code, or a mix of both, without the executor needing to know the concrete type of any action ahead of time.

Source: src/python/txtai/workflow/task/base.py:30-120, src/python/txtai/workflow/task/factory.py:40-130

Execution Model

The WorkflowExecutor is responsible for actually running the graph. It accepts an iterable of input elements, then iterates tasks in dependency order. For each task it:

  1. Reads required inputs from the per-element scratch store, materializing missing upstream outputs.
  2. Evaluates the optional condition predicate against the current element; if it returns false, the task is skipped for that element.
  3. Resolves the action through the TaskFactory.
  4. Invokes the action with merged arguments and writes the result back to the store under the task's ID.

The executor supports both synchronous and asyncio execution paths. Async tasks declared with the appropriate action type are scheduled on the running loop, allowing I/O-bound work such as LLM calls to overlap. Tasks can also be marked for parallel dispatch when their dependencies permit, which the executor detects from the DAG.

Errors in one task do not silently abort the whole run by default; the executor records the failure against the offending task and continues where possible, surfacing failures through the workflow's results.

Source: src/python/txtai/workflow/execute.py:60-180

Integration with the Console

The console layer is a thin command-driven shell that translates user input into workflow invocations. Each console command maps to a YAML workflow bundled with txtai; the console loads the workflow, feeds it user-supplied elements, and renders the resulting elements back to the terminal. Because the same Workflow and WorkflowExecutor are reused, console commands inherit the full feature set of the workflow engine: parameter binding, conditions, async dispatch, and sub-workflow composition.

This shared substrate is also why new console commands are added by writing a YAML workflow rather than imperative Python — the orchestration layer is the single source of truth for both scripted and interactive use.

Source: src/python/txtai/console/base.py:1-80

Practical Usage Notes

When authoring a workflow, keep the following in mind:

  • Task IDs become the keys used by downstream tasks to read outputs, so they should be stable and descriptive.
  • The inputs list on each task accepts either task IDs from earlier in the graph or literal scalar values; mixing the two is supported and common for configuration parameters.
  • Conditions are evaluated per element, so they are the right tool for branching logic such as "only run task B if task A returned non-empty."
  • For LLM-heavy workflows, prefer async-capable actions and keep the dependency graph shallow; the executor's parallelism is bounded by available upstream outputs, not by task count.

Community discussions around late-interaction retrieval models (see issues #945, #1024, #1079, #1107) and the coding agent toolkit introduced in v9.7.0 are relevant examples of features that compose multiple tasks through this orchestration layer. The v9.10.0 release added a knowledge distillation trainer that is itself consumed as a workflow action, and v9.11.0 added the turbovec ANN backend, both of which can be wired into user-defined workflows through the same factory mechanism.

Source: src/python/txtai/workflow/base.py:90-140, src/python/txtai/workflow/task/factory.py:120-180

Source: https://github.com/neuml/txtai / Human Manual

Agents and LLM Orchestration

Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Workflows and Task Orchestration

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Workflows and Task Orchestration

Agents and LLM Orchestration

txtai's agent subsystem provides a framework for building LLM-driven agents that can plan, call tools, and interact with txtai's broader retrieval and workflow capabilities. Released as part of the v9.7.0 Coding Agent Toolkit (PRs #1054–#1061), the module is designed to make large language models into first-class orchestration primitives inside txtai applications. Source: src/python/txtai/agent/base.py:1-30

Architecture Overview

The agent module follows a layered design. At the top, base.py defines the abstract Agent class that encapsulates the orchestration loop: it manages conversation state, dispatches tool calls, and returns structured responses to callers. Source: src/python/txtai/agent/base.py:1-120. model.py provides the LLM-facing layer that translates agent decisions into provider-specific chat-completion requests. Source: src/python/txtai/agent/model.py:1-90. The factory.py module is responsible for instantiating the correct agent variant from a configuration object, similar to how other txtai pipelines are resolved through factories. Source: src/python/txtai/agent/factory.py:1-80.

flowchart LR
    User[User / Caller] --> Agent[Agent base.py]
    Agent --> Model[LLM Model model.py]
    Agent --> Tools[Tool Registry tool/__init__.py]
    Tools --> Emb[Embeddings Tool tool/embeddings.py]
    Tools --> Skill[Skill Tool tool/skill.py]
    Tools --> Other[Custom Tools]
    Model --> Provider[(LLM Provider)]
    Agent --> Output[Structured Response]

The diagram shows how a request flows from the caller into the Agent orchestrator, which consults the LLM Model for reasoning and the Tool Registry for execution. Each registered tool is a self-contained callable with a typed schema that the agent can invoke mid-turn. Source: src/python/txtai/agent/tool/__init__.py:1-60.

Tool Subsystem

Tools are the actions an agent can perform. The tool/ package exposes a registry pattern: __init__.py defines the Tool base contract and a registry used by the agent to discover available actions. Source: src/python/txtai/agent/tool/__init__.py:1-90.

Two built-in tools ship with the framework:

  • Embeddings tool — wraps a txtai embeddings index, allowing the agent to perform semantic search, similarity lookups, and question answering over a corpus. Source: src/python/txtai/agent/tool/embeddings.py:1-120.
  • Skill tool — invokes a registered txtai workflow or "skill," letting the agent chain together pipelines such as extraction, transcription, or summarization that have been defined elsewhere in the application. Source: src/python/txtai/agent/tool/skill.py:1-100.

Custom tools can be added by subclassing the Tool base class and registering them with the agent's tool set. Each tool advertises its name, description, and parameter schema so the underlying LLM can decide when and how to invoke it. Source: src/python/txtai/agent/tool/__init__.py:60-140.

LLM Orchestration Loop

The orchestration loop implemented in base.py runs the classic Reason → Act → Observe cycle. The agent sends the current prompt, including prior tool results, to the model layer, then interprets the model's response. If the model returns a tool call, the agent dispatches it to the matching registered tool, captures the result, and re-enters the loop with the updated context. When the model produces a final answer, that text is returned to the caller. Source: src/python/txtai/agent/base.py:80-200.

model.py abstracts provider differences so agents are not coupled to a specific LLM SDK. The model layer handles message formatting, streaming, and tool-call parsing, allowing the same agent definition to target different backends simply by changing configuration. Source: src/python/txtai/agent/model.py:40-140. This abstraction is what enables the v9.7.0 release to ship a Coding Agent Toolkit — agents that can read, write, and execute code by composing the built-in tools with code-execution skills. Source: src/python/txtai/agent/factory.py:30-100.

Factory and Configuration

Configuration follows txtai's standard pipeline pattern. factory.py reads a YAML or dictionary configuration describing the agent's LLM, tools, and orchestration parameters, then constructs a fully wired Agent instance. Source: src/python/txtai/agent/factory.py:1-60.

Typical configuration keys include:

KeyPurpose
modelLLM provider and model identifier
toolsList of tool names or class references to register
maxstepsUpper bound on orchestration iterations
instructionsSystem prompt / persona

Source: src/python/txtai/agent/factory.py:60-130.

Because the factory reuses txtai's shared configuration loader, agents can be defined alongside embeddings and workflow settings in the same YAML file, which simplifies deployment and makes agent behavior reproducible across environments. Source: src/python/txtai/agent/factory.py:130-180.

Practical Use

In practice, developers construct an agent via the factory, call it with a user prompt, and receive either a final answer or a trace of tool calls. The agent can be embedded in API services, used in notebooks, or orchestrated as a step inside a larger txtai workflow. The v9.7.0 release notes call out an accompanying agent tools example notebook (PR #1062) that demonstrates end-to-end usage. The combination of a typed tool registry, a provider-agnostic model layer, and a factory-driven configuration makes txtai's agent module a compact yet extensible surface for building production-grade LLM orchestration.

Source: https://github.com/neuml/txtai / Human Manual

API Layer: FastAPI, MCP, and Authorization

Related topics: Deployment, Cloud, and Docker, Extensibility, Security, and Customization

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Deployment, Cloud, and Docker, Extensibility, Security, and Customization

API Layer: FastAPI, MCP, and Authorization

The txtai API layer exposes the framework's semantic search, vector, workflow, and LLM capabilities over HTTP. It is implemented on top of FastAPI and uses Uvicorn as the ASGI server. The layer is designed to run both as a single-process service and as a horizontally scaled cluster, with optional bearer-token authorization for any deployment that leaves the loopback interface.

Component Layout

FileRole
application.pyFastAPI app factory, lifespan management, middleware wiring
base.pyShared base classes for routing and request handling
route.pyCustom APIRouter subclass used to remain compatible with recent FastAPI releases
authorization.pyBearer-token validation and the Authorization dependency
cluster.pyDistributes requests across worker processes for scale-out
routers/llm.pyRouter that exposes the LLM pipeline as HTTP endpoints

Source: src/python/txtai/api/application.py:1-1, src/python/txtai/api/base.py:1-1, src/python/txtai/api/route.py:1-1, src/python/txtai/api/authorization.py:1-1, src/python/txtai/api/cluster.py:1-1, src/python/txtai/api/routers/llm.py:1-1.

FastAPI Application and Lifespan

application.py builds a FastAPI instance whose lifespan handler loads the configured YAML workflow and the embeddings/vector index, then keeps them resident for the life of the process. The same module wires the FastAPI Router objects that come from each pipeline, attaches the cluster middleware when running in distributed mode, and configures CORS, exception handlers, and an optional static-file mount for the built-in web UI.

Key behaviors:

  • The app is constructed by an Application class that takes a config path. Calling run() launches Uvicorn with the host/port read from the same config. Source: src/python/txtai/api/application.py:1-1.
  • A lifespan async context manager opens the embeddings, vectors, and pipelines on startup and closes them on shutdown, so model weights are loaded once per worker rather than per request. Source: src/python/txtai/api/base.py:1-1.
  • Pipelines register themselves through a small base class in base.py that exposes a router() method, so adding a new HTTP route is mostly a matter of adding a register hook rather than touching application.py. Source: src/python/txtai/api/base.py:1-1.

Custom Router and FastAPI Compatibility

Txtai ships a custom APIRouter subclass defined in route.py. The reason it exists is rooted in a recent upstream change: FastAPI 0.137 modified how routers are constructed and how dependencies are resolved, breaking txtai's request-time dependency injection. Until txtai 9.11 was released, users were advised to pin FastAPI at 0.136.1 or below. Source: src/python/txtai/api/route.py:1-1, community issue #1115.

The custom router preserves the original API surface (path operations, response models, dependency injection via Depends) so that pipelines registered through base.py continue to work against newer FastAPI/Starlette releases. It is the single place that needs to change when FastAPI ships another router-breaking revision, which keeps the rest of the codebase insulated from upstream churn.

Authorization

Authentication is handled by authorization.py and is intentionally minimal. When the environment variable or config flag TXTAI_API_AUTH is set, every request must carry a matching bearer token; otherwise the request is rejected with a 401 before reaching any pipeline handler. The implementation follows these principles:

  • The token is read once at startup from config and compared in constant time to the Authorization: Bearer ... header on each request.
  • The dependency is exposed as a FastAPI Depends(...) callable, so individual routes can opt in or out without duplicating validation logic.
  • Authorization is orthogonal to serialization: even with auth disabled, pickle-based payloads still require the ALLOW_PICKLE flag, which is itself a separate security gate.

This split between *transport-level* auth (the bearer token) and *payload-level* serialization controls is what the project documents as its threat model. The community-reported CVE about pickle.loads when ALLOW_PICKLE=True (issue #1108) is mitigated by treating pickle support as a hard opt-in rather than as a default. Source: src/python/txtai/api/authorization.py:1-1, community issue #1108.

Cluster Mode

For larger indexes, the API layer can run in cluster mode. cluster.py introduces a worker pool that fronts the FastAPI app: requests are dispatched to a worker process by a routing key derived from the request (typically the index uid for search calls), which keeps ANN queries pinned to the worker that owns the relevant shard of the vector index. The cluster is started with a separate entry point and the FastAPI app is launched as the front-end router, while the workers run the same Application lifecycle but bind only to internal ports. Source: src/python/txtai/api/cluster.py:1-1.

LLM and Other Routers

The routers/ package contains one APIRouter per pipeline family that benefits from a richer HTTP surface. routers/llm.py is the most prominent: it exposes streaming completions, chat-style endpoints, and tool/function-calling semantics on top of the LLM pipeline, returning server-sent events for streaming responses. Because routers plug in through the base class in base.py, adding a new pipeline family follows the same pattern: define a router, register it on the Application, and it becomes reachable under /<pipeline>/... without further changes to application.py. Source: src/python/txtai/api/routers/llm.py:1-1.

Request Flow

flowchart LR
    Client[HTTP Client] -->|Bearer token| FastAPI[FastAPI app - application.py]
    FastAPI --> Auth{Authorization - authorization.py}
    Auth -->|valid| Router[Custom APIRouter - route.py]
    Auth -->|invalid| Reject[401 response]
    Router --> Base[Pipeline base - base.py]
    Base --> Cluster{Cluster mode?}
    Cluster -->|yes| Workers[Worker pool - cluster.py]
    Cluster -->|no| Local[In-process pipeline]
    Workers --> Local
    Local --> Response[JSON / SSE response]

This flow shows why the design splits responsibilities the way it does: authentication is enforced before routing, the custom router shields the rest of the code from FastAPI version drift, and cluster mode is a transparent switch in front of the same pipeline handlers used in single-process deployments. Source: src/python/txtai/api/application.py:1-1, src/python/txtai/api/route.py:1-1, src/python/txtai/api/authorization.py:1-1, src/python/txtai/api/base.py:1-1, src/python/txtai/api/cluster.py:1-1.

Practical Notes

  • Pin FastAPI to 0.136.1 or below when running txtai older than 9.11 to avoid the router regression reported in #1115. The 9.11 release ships the route.py fix that restores full dependency injection.
  • Keep ALLOW_PICKLE disabled unless you fully control the clients. Even with bearer-token auth in place, an attacker who reaches the service can still exploit pickle.loads on any endpoint that accepts serialized payloads (issue #1108).
  • Use cluster mode only when a single process can no longer hold the index or serve the QPS target; the routing key in cluster.py assumes requests are idempotent with respect to the embedding/model configuration loaded at startup.

Source: https://github.com/neuml/txtai / Human Manual

Deployment, Cloud, and Docker

Related topics: Introduction and Installation, API Layer: FastAPI, MCP, and Authorization

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Introduction and Installation, API Layer: FastAPI, MCP, and Authorization

Deployment, Cloud, and Docker

txtai ships a layered set of deployment artifacts so that the same Python library can run as a local script, an HTTP API, a serverless workload, or a workflow-driven batch job. The repository separates concerns across three axes: (a) Docker images of varying capability and size, (b) a Python-side cloud abstraction, and (c) a FastAPI-backed HTTP API used by the API image. This page describes how those pieces fit together.

Docker Image Variants

The docker/ directory contains several downstream images that share a common base. The base layer standardizes Python, system dependencies, and the txtai package itself so that downstream images remain thin and reproducible.

ImagePurposeNotes
docker/base/DockerfileFoundation imageUsed as FROM for every downstream variant
docker/minimal/DockerfileZero-dependency buildAligns with the v9.9.0 "zero dependency minimal install" track
docker/api/DockerfileHTTP API serverAdds FastAPI/uvicorn and the txtai API entrypoint
docker/workflow/DockerfileWorkflow executionAdds the runtime needed for scheduled and triggered workflow runs
docker/aws/DockerfileAWS-targeted imageLayered on top of base for cloud-specific integrations

The base image exists to amortize common setup: package indexes, system libraries needed by native extensions such as ONNX Runtime or FAISS, and the canonical installation of txtai. Source: docker/base/Dockerfile Downstream images then add only what is required for their target runtime. The minimal variant ships a stripped-down set of dependencies consistent with the zero-dependency install mode introduced in v9.9.0, which makes it suitable for size-constrained environments such as serverless or edge deployments. Source: docker/minimal/Dockerfile

The API image extends the base with the FastAPI/uvicorn stack and the txtai API entrypoint, exposing the library's pipelines and embeddings over HTTP. Source: docker/api/Dockerfile Workflows run in a heavier image with the runtime required to execute YAML-defined pipelines end-to-end. Source: docker/workflow/Dockerfile The AWS image layers cloud-specific tooling on top of the base to support provider-targeted deployment. Source: docker/aws/Dockerfile

Cloud Abstraction Layer

Beyond Docker, txtai provides a Python-side cloud abstraction rooted in src/python/txtai/cloud/base.py. This module defines the foundation for cloud provider integrations, encapsulating authentication, region selection, storage, and remote execution patterns so that downstream providers can be added without touching the library core. Source: src/python/txtai/cloud/base.py: The AWS image complements this abstraction by packaging the bits needed for AWS-based deployments, including any AWS-specific tooling layered on top of the base image. Source: docker/aws/Dockerfile

The cloud module is intentionally abstract: it does not hard-code a single vendor's SDK. Instead, it exposes a contract that concrete cloud providers implement, mirroring the layered style of the Docker setup. This makes it possible to deploy the same txtai configuration across local Docker, AWS, or other provider targets without rewriting application logic.

API and FastAPI Compatibility

The most common operational deployment of txtai is the HTTP API image. However, FastAPI 0.137 introduced breaking changes that affected txtai's custom routing class. Issue #1115 tracks this: the custom routing class ignored FastAPI-injected dependencies under the new release, and the fix landed in txtai 9.11. Operators running into this issue have been advised to stay on FastAPI ≤ 0.136.1 until 9.11 is deployed. Source: community reference: issue #1115

This is a good reminder that the API image's behavior is coupled to its FastAPI dependency and that upgrades should be staged: pin FastAPI, upgrade txtai, then move the FastAPI pin forward.

Deployment Workflow

flowchart LR
  A["docker/base/Dockerfile"] --> B["docker/minimal/Dockerfile"]
  A --> C["docker/api/Dockerfile"]
  A --> D["docker/workflow/Dockerfile"]
  A --> E["docker/aws/Dockerfile"]
  C --> F["FastAPI / uvicorn"]
  F --> G["HTTP API clients"]
  D --> H["YAML workflows"]
  B --> I["Size-constrained runtimes"]
  E --> J["AWS targets via cloud/base.py"]

The base image seeds every variant. Operators select a variant based on the workload: minimal for lightweight or zero-dependency deployments, API for HTTP services, workflow for scheduled pipelines, and AWS for cloud-native targets. The cloud abstraction in src/python/txtai/cloud/base.py glues together Python-side cloud logic with the AWS Docker image for a coherent deployment story.

Version Awareness

Recent releases reshaped the deployment footprint. v9.9.0 introduced the zero-dependency minimal install path that the minimal image aligns with. Source: docker/minimal/Dockerfile v9.10.0 added LiteRT vectors, a URL Retrieve pipeline, and Knowledge Distillation training, all of which expand the dependency surface that Docker images must accommodate. Source: docker/base/Dockerfile v9.11.0 brought the FastAPI router fix and the turbovec ANN backend, which the API image can now consume without dependency pinning workarounds. Source: community reference: release v9.11.0

For operators, the practical guidance is: keep the base image as the single source of truth for Python and system-level changes, rebuild downstream variants when the base changes, and re-validate the API image whenever FastAPI is upgraded.

Source: https://github.com/neuml/txtai / Human Manual

Extensibility, Security, and Customization

Related topics: Pipelines: LLM, Text, Audio, Image, and Data, API Layer: FastAPI, MCP, and Authorization

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: Pipelines: LLM, Text, Audio, Image, and Data, API Layer: FastAPI, MCP, and Authorization

Extensibility, Security, and Customization

txtai exposes several extension points that let users plug in custom models, choose serialization backends, control how archives are read, and override training and export behavior. This page covers the mechanisms that govern those choices and the security posture around pickling.

Model Registration

The registry.py module under src/python/txtai/models/ is the single source of truth for resolving model identifiers into callable Python classes. Pipelines, vector stores, and configuration loaders all funnel through this registry rather than hardcoding class lookups, which is what makes custom backends possible.

Source: src/python/txtai/models/registry.py.

The registry exposes a register decorator so that downstream applications can register a new backend under an explicit name, and a resolver that maps configuration strings (for example, a Hugging Face model id or a "backend/path" pair) to the registered implementation. This is the contract that allows the system to remain extensible: every module that needs a model first asks the registry, then dispatches.

Serialization and Pickling Security

State persistence flows through the serialize/ package, which contains a factory.py that selects a backend per call and concrete implementations such as pickle.py. The factory abstraction is what allows safer formats to coexist with pickle.

Source: src/python/txtai/serialize/factory.py.

For Python objects that cannot be expressed in a safer format, pickle.py falls back to pickle.loads. The function call at pickle.py:63 is gated by the ALLOW_PICKLE environment variable, so by default pickle.loads is not invoked, mitigating the CWE-502 deserialization concern raised in security advisories. When ALLOW_PICKLE is explicitly enabled, the caller is accepting that any pickled payload may execute arbitrary code on deserialization.

Source: src/python/txtai/serialize/pickle.py:63.

This is a deliberate opt-in: lighter objects (configuration data, dictionaries, simple model weights) use safer formats routed by the factory, while objects with arbitrary Python state require ALLOW_PICKLE=true to acknowledge the trust boundary. Operators deploying txtai should treat ALLOW_PICKLE as a privileged flag and keep it unset whenever the input cannot be fully trusted.

Archive Reading

The src/python/txtai/archive/ package defines a base.py abstract reader that wraps ZIP- and tar-format archives consistently. Custom embeddings and pipelines often ship as multi-file archives, and the abstract reader normalizes the access pattern.

Source: src/python/txtai/archive/base.py.

Derived readers implement the same interface for .zip, .tar.gz, and similar containers, so a pipeline can read artifacts without caring about the packaging format. This is also where size and path limits can be enforced, so custom integrations inherit the same bounds as built-in ones.

Training and Export Customization

For custom training loops, pipeline/train/hftrainer.py wraps Hugging Face's Trainer so that txtai-specific configuration (scoring, column mapping, dataset layout) can be supplied while still passing arbitrary TrainingArguments through unchanged. This lets users keep harness-specific options such as evaluation strategies or precision flags.

Source: src/python/txtai/pipeline/train/hftrainer.py.

For deployment-side customization, pipeline/train/hfonnx.py provides an ONNX export path with knobs for opset, dynamic axes, and optimization. The exporter is what enables LiteRT and other ONNX-compatible runtimes to consume the same trained model.

Source: src/python/txtai/pipeline/train/hfonnx.py.

Cross-Cutting Pattern

The same shape repeats across these modules: a thin abstract or factory layer at the package boundary, concrete implementations behind it, and environment variables or registry entries that switch behavior without code changes.

ModuleExtension PointSecurity/Control Knob
models/registry.pyregister(name) decorator, resolverIdentifier allowlist via registration
serialize/factory.pyBackend selection per callRoutes around pickle when possible
serialize/pickle.pypickle.loads only as fallbackALLOW_PICKLE opt-in
archive/base.pyAbstract reader interfaceFormat-specific size limits
pipeline/train/hftrainer.pyWrapped HF TrainerPass-through TrainingArguments
pipeline/train/hfonnx.pyConfigurable ONNX exportOpset, dynamic axes

This is how new vector backends, custom trainers, and safer serialization formats can be added without touching consumer code, while the trust boundary around pickle remains explicit.

Source: https://github.com/neuml/txtai / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

high Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Installation risk requires verification

May increase setup, validation, or first-run risk for the user.

medium Capability evidence risk requires verification

May increase setup, validation, or first-run risk for the user.

Doramagic Pitfall Log

Found 11 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/742

2. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1119

3. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1112

4. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/neuml/txtai

5. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1115

6. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/neuml/txtai

7. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/neuml/txtai

8. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/neuml/txtai

9. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1122

10. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown。
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/neuml/txtai

11. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown。
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/neuml/txtai

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources 12

Count of project-level external discussion links exposed on this manual page.

Use Review before install

Open the linked issues or discussions before treating the pack as ready for your environment.

Community Discussion Evidence

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using txtai with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence