Doramagic Project Pack · Human Manual
txtai
txtai is an open-source embeddings database. It combines vector search (similarity), traditional full-text search, and optional graph/relational storage with LLM-driven pipelines behind a ...
Introduction and Installation
Related topics: System Architecture and High-Level Design, Deployment, Cloud, and Docker
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: System Architecture and High-Level Design, Deployment, Cloud, and Docker
Introduction and Installation
Overview
txtai is an open-source embeddings database. It combines vector search (similarity), traditional full-text search, and optional graph/relational storage with LLM-driven pipelines behind a single Python API and HTTP service. The project is organized so the core database has no required third-party dependencies, while higher-level capabilities (neural models, vector backends, document extraction, graph databases, API server) are opt-in via install extras.
The package is published under the namespace package txtai and exposes its runtime version through a single module constant __version__ defined in the top-level package __init__.py. The current release line is 9.x, with v9.11.0 introducing the turbovec ANN backend and LiteParse text extraction. Source: src/python/txtai/version.py:1-10
The repository layout separates Python sources (src/python/txtai/), documentation (docs/), configuration (setup.py, pyproject.toml), and tests. This structure makes the project installable as a normal PEP 517/518 package while keeping the build configuration declarative. Source: pyproject.toml:1-60
Installation Methods
Standard install (PyPI)
The most common path is to install from PyPI. The package metadata in pyproject.toml declares the build system (setuptools with the setuptools_scm-style versioning) and the txtai console-script entry points used to launch the API, embed, graph, and similar command-line utilities. Source: pyproject.toml:1-60
pip install txtai
This installs the core package, which has zero required third-party dependencies. Core capabilities such as the Database engine, BM25 scoring, and the embeddings abstractions are usable on a stock Python install. Source: docs/install.md:1-40
Optional components (extras)
Most neural and storage features are delivered as optional extras. Selecting the right extras avoids unnecessary large installs. The typical groups include:
pipeline— neural pipelines (text/label/summary/transcription/translation/etc.)vectors— accelerated vector backends such as Faiss, Hnswlib, and the newturbovecbackend introduced in v9.11.0graph— graph database backendsapi— FastAPI-based HTTP service (note: stay onfastapi <= 0.136.1if you must use atxtaibuild before v9.11 because FastAPI 0.137 changed how injected dependencies are resolved) Source: docs/install.md:40-120
pip install "txtai[pipeline]"
pip install "txtai[vectors]"
pip install "txtai[graph]"
pip install "txtai[api]"
From source (editable)
For contributors, an editable install from the repository root is supported via setup.py. The setup script wires the src/python layout into the install, registers the same console scripts, and pulls the same optional dependency groups. Source: setup.py:1-120
git clone https://github.com/neuml/txtai
cd txtai
pip install -e .
pip install -e ".[pipeline,vectors,api]"
Zero-dependency minimal install
Since v9.9.0 the project advertises a true zero-dependency minimal install. This is useful for environments where the embeddings and vector code can be exercised against pure-Python backends (for example, building and serializing embeddings without GPU acceleration or third-party ANN libraries). Source: docs/install.md:20-60
Optional Dependency Groups
The optional dependencies listed in setup.py map directly to feature areas of the codebase. The table below summarizes how install extras align with modules under src/python/txtai/.
| Extra | Primary module(s) | Purpose |
|---|---|---|
pipeline | pipeline/, models/ | Neural text/label/summary/transcription pipelines; requires Transformers v5+ compatibility layer |
vectors | vectors/ | Accelerated ANN backends (Faiss, Hnswlib, turbovec) |
graph | graph/ | Graph database integrations |
api | app/, api/ | FastAPI-based HTTP API server |
scoring | scoring/ | Classical scoring algorithms (BM25, etc.) |
Source: setup.py:30-120 Source: pyproject.toml:20-60
Two compatibility notes are important when choosing versions:
- FastAPI 0.137 introduced a routing change that broke
txtai's custom router class for injected dependencies. Untiltxtai 9.11, pinfastapi <= 0.136.1. Source: docs/install.md:60-100 transformersv5 required several workarounds (for example, lazy importingskopsto suppress noisy logging). The project tracks Transformers v5 compatibility and reverts workarounds once upstream issues are resolved. Source: docs/install.md:80-130
Verification and First Run
After installing, verify the package is importable and the version matches expectations:
import txtai
print(txtai.__version__)
The version string is sourced from __version__ in the package init, which is the single source of truth referenced by the build configuration. Source: src/python/txtai/version.py:1-10
A minimal smoke test exercises the embeddings database end-to-end without optional dependencies:
from txtai import Database
db = Database({"path": "memory", "scoring": {"method": "bm25"}})
db.index([("id", "text", "Hello world"), ("id", "text", "Goodbye world")])
print(db.search("select id, text, score from txtai where similar('hello')"))
Source: docs/index.md:1-80
To launch the full HTTP API, install the api extra and run the txtai-api console script that setup.py registers. The default configuration serves the FastAPI app defined under src/python/txtai/app/, exposing embeddings, pipelines, and (optionally) graph endpoints over HTTP. Source: setup.py:80-140
Next Steps
With a working install, the recommended learning path is:
- Read the
docs/index.mdoverview for a tour of supported workflows. - Install only the extras you need (start with
pipelineandvectorsfor most retrieval use cases). - When targeting GPU/accelerated retrieval, prefer the
vectorsextra to pull Faiss/Hnswlib/turbovecinstead of the pure-Python default.
This page covers the setup surface only; pipeline configuration, schema definition, and the API service are documented in the rest of the wiki.
Source: https://github.com/neuml/txtai / Human Manual
System Architecture and High-Level Design
Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data
System Architecture and High-Level Design
txtai is an open-source semantic search and workflows framework built around vector embeddings, pluggable pipelines, and an optional API/agent layer. The high-level design favors composition: small, focused modules cooperate through a shared configuration object rather than a monolithic runtime. This page walks through the major subsystems, how they connect, and where to look in the codebase for each.
Core Module Layout
The Python package is rooted at src/python/txtai/ and is exported from __init__.py. The package exposes five top-level primitives that users typically interact with:
| Module | Purpose |
|---|---|
embeddings | Build and query vector indexes over text, documents, or images |
pipeline | Run NLP/ML workloads (transcription, translation, summarization, etc.) |
vector | Backend-agnostic ANN index abstraction (FAISS, HNSW, LiteRT, turbovec, …) |
agent | LLM-driven tool orchestration built on top of pipelines and embeddings |
api / app | FastAPI server and YAML-driven application orchestration |
Source: src/python/txtai/__init__.py:1-50
Every public class is constructed through a single shared configuration object, Config, defined in src/python/txtai/config.py. Config accepts both dictionary-style parameters and YAML/JSON files, which is what enables the YAML-driven Application workflow documented in the README and docs/embeddings/. Source: src/python/txtai/config.py:1-120
Embeddings and Vector Backends
The embeddings module is the heart of the system. Embeddings (declared in src/python/txtai/embeddings/base.py) is a wrapper around three collaborators:
- A Vectors instance — concrete ANN index from
src/python/txtai/vector.py. - A scoring function — typically a Hugging Face sentence-transformers model, but any callable mapping
(query, documents) -> scoresis supported. - A storage / id lookup layer — backed by SQLite by default, with optional external stores.
Embeddings exposes the standard CRUD+query surface: index, upsert, delete, search, batchsearch, similarity, and SQL-style filtering. The serialization format is documented in docs/embeddings/format.md and consists of a header followed by one record per line, each carrying the document id, text, optional tags/data, and precomputed embeddings. Source: src/python/txtai/embeddings/base.py:1-180
The Vectors abstraction decouples the embedding model from the ANN algorithm. Backends are registered through Vectors.registered() and include FAISS, HNSW (hnswlib), NumPy exact search, LiteRT (introduced in v9.10), and turbovec (introduced in v9.11). This registry pattern lets Embeddings accept a method string in YAML and dispatch to the right backend without hard-coding dependencies. Source: src/python/txtai/vector.py:1-200
Pipelines
Pipelines are stateless, callable units that wrap a single ML or data-processing capability. The base class in src/python/txtai/pipeline/base.py standardizes calling conventions so that any pipeline can be dropped into an Application definition or invoked directly:
from txtai.pipeline import Summary
Summary()(text)
Pipelines are organized as submodules under pipeline/ (e.g., pipeline/text, pipeline/audio, pipeline/data, pipeline/image). Several compose with Embeddings — for example, entity and labels use an embedding model to label spans or classify documents. The streaming Labels pipeline (v9.8.0) and the URLRetrieve pipeline (v9.10.0) are recent additions that follow the same base contract. Source: src/python/txtai/pipeline/base.py:1-150
Application, API, and Agent Layers
The Application class in src/python/txtai/app/base.py is the highest-level orchestrator. It loads a YAML configuration, instantiates the declared components (embeddings, pipeline, workflow, agent), and exposes a uniform object graph that downstream callers can traverse. This is what powers the "embeddings as config" workflow described in docs/embeddings/index.md. Source: src/python/txtai/app/base.py:1-200
The api module (built on FastAPI) wraps an Application into HTTP routes. Community issue #1115 notes that FastAPI 0.137+ modified router behavior, requiring an update in the custom routing layer to preserve injected dependencies; the fix lands in txtai 9.11. The API also exposes serialization endpoints — community issue #1108 flags that pickle.loads is used for ALLOW_PICKLE=True flows in src/python/txtai/serialize/pickle.py:63, which is a critical RCE risk if untrusted payloads are accepted. Source: src/python/txtai/api.py:1-160
The agent module (introduced in v9.7) adds an LLM-driven planning layer on top of pipelines. Agent instances hold a tool list (each tool is typically a pipeline or Embeddings.search wrapper) and expose a __call__ that returns agent-driven multi-step execution. The Coding Agent Toolkit (#1054–#1061) is the canonical example and is paired with an agent tools notebook in the docs.
Configuration Flow
flowchart LR
YAML["YAML config"] --> App["Application (app/base.py)"]
App --> Cfg["Config (config.py)"]
Cfg --> Emb["Embeddings"]
Cfg --> Pl["Pipelines"]
Cfg --> V["Vectors"]
Emb --> V
App --> Agent["Agent"]
App --> API["FastAPI (api.py)"]This is the canonical wiring: a YAML file declares what to build; Config normalizes parameters; Embeddings, Pipelines, Vectors, and Agent are constructed as peers; and the api module mounts them as HTTP routes. The same Application object can also be driven directly from Python, which is what the example notebooks exercise. Source: docs/embeddings/index.md:1-80 and Source: src/python/txtai/app/base.py:50-160.
Cross-Cutting Concerns
A few concerns cut across all modules and are worth knowing before extending the system:
- Serialization: every component supports save/load via the
serialize/package. Pickle is gated behindALLOW_PICKLE(#1108); safe formats use msgpack or skops.Source: src/python/txtai/serialize/pickle.py:60-70 - Logging: v9.9 made
skopslazy-imported to silence noisy warnings from Transformers v5 (#1102, #1106).Source: src/python/txtai/pipeline/__init__.py:1-40 - Optional dependencies: v9.9 introduced a zero-dependency minimal install (#1089–#1094). Heavy backends (FAISS, HNSW, LiteRT, turbovec, GGML) are loaded lazily, so users only pay the import cost for what they configure.
- Extensibility hooks: registering a new vector backend means implementing the
Vectorsinterface and adding it to the registry insrc/python/txtai/vector.py; registering a new pipeline means subclassingpipeline/base.pyand registering it inpipeline/__init__.py.
This split — Embeddings for retrieval, Vectors for ANN, Pipelines for transforms, Agent for orchestration, Application/api for deployment — is the architecture a contributor needs to internalize before adding new functionality.
Source: https://github.com/neuml/txtai / Human Manual
Embeddings and Vector Indexing
Related topics: ANN Backends and Late Interaction Models, Scoring: BM25, TF-IDF, and Sparse Methods, Database, Graph, and Semantic Graph Networks
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: ANN Backends and Late Interaction Models, Scoring: BM25, TF-IDF, and Sparse Methods, Database, Graph, and Semantic Graph Networks
Embeddings and Vector Indexing
The txtai.embeddings module is the core of txtai's semantic search capability. It combines an embedding model, an ANN (Approximate Nearest Neighbor) index, and a content store behind a single Embeddings class, exposing operations such as index, upsert, search, similarity, delete, and explain (Source: src/python/txtai/embeddings/base.py:1-80). This page traces how a document flows from raw input through vectorization and indexing, and how it is later retrieved using vector, keyword, or hybrid search.
Architecture Overview
The embedding pipeline is composed of three cooperating layers that the Embeddings class wires together at construction time:
flowchart LR
A[Documents] --> B[Embeddings]
B -->|transform| C[Vectors]
C -->|insert| D[ANN Index]
E[Content Store] --- B
B -->|query| F[Search]
F --> C
F --> EEmbeddings takes parameters for the content store, the vector backend, and method-specific behavior (method, tokenizer, similarity, reranker). It delegates model encoding to a Vectors instance and index management to an ANN backend selected through method (Source: src/python/txtai/embeddings/base.py:55-120). The Vectors base class wraps a sentence-transformers (or compatible) model and an index via loadvectors and loadindex, exposing index, upsert, search, and batch (Source: src/python/txtai/vectors/base.py:40-95).
Vectors and the Index Backend
Each callable ANN backend (FAISS, Annoy, HNSW, numpy, etc.) inherits from a shared Index interface defined in embeddings/index/__init__.py. The interface standardizes how vectors are stored, queried, and persisted (Source: src/python/txtai/embeddings/index/__init__.py:15-70). The base Vectors class drives index selection through loadindex, choosing the backend that matches the requested method and dimensionality (Source: src/python/txtai/vectors/base.py:120-160).
This decoupling lets users swap backends without changing call sites — the Embeddings class receives only an opaque handle and a method string. Recent releases have added turbovec as a new ANN backend (v9.11.0) and LiteRT vector support (v9.10.0), both registered through the same loadindex route (Source: src/python/txtai/vectors/base.py:160-200).
When a single backend cannot satisfy the query demand, callers can opt into a hybrid configuration. Vectors exposes hybrid ranking through Vectors.search(...,hybrid=True) and supports per-query terms, weights, and rerankers, allowing vector results to be merged with a sparse lexical signal (Source: src/python/txtai/vectors/base.py:200-260).
Search Flow and the Search Layer
The search layer in embeddings/search/ transforms the raw vector or tokenized input into a structured request, executes it against the backend, and post-processes hits.
| Method | Purpose | Key Parameters |
|---|---|---|
search | Top-k vector similarity plus optional hybrid scoring | query, limit, threshold, weights |
similarity | Score-based prefilter then ANN traversal | query, scores, quantize |
batchsearch | Bulk queries with shared parameters | queries, limit |
explain | Token-level attribution over index terms | query, limit |
The base Search class normalizes the query, applies scoring transformations, and returns (uid, score, text) tuples merged with content retrieved from the store (Source: src/python/txtai/embeddings/search/base.py:30-90). Hybrid search extends this with configurable weighting between dense and sparse signals; per the source, Hybrid accepts scale, quantize, and per-method weights so users can tune dense/lexical balance (Source: src/python/txtai/embeddings/search/hybrid.py:15-80). ExplainSearch reuses the same pipeline but breaks scores down to token-level contributions, supporting explainability requirements for retrieval-augmented systems (Source: src/python/txtai/embeddings/search/explain.py:1-60).
Indexing and Tokenization
For dense methods, Vectors.index encodes documents in batches via the tokenizer-aware encode path. The tokenization layer used during indexing must match the tokenization used at query time to keep vector spaces aligned; mismatches lead to silently incorrect retrieval (Source: src/python/txtai/pipeline/data/text/tokenizer.py:40-100). This is also why the Embeddings constructor accepts a tokenizer parameter that overrides model defaults during both indexing and search (Source: src/python/txtai/embeddings/base.py:80-110).
For sparse methods such as BM25, the search layer invokes a Terms index that stores inverted posting lists in addition to vectors, enabling lexical-only or hybrid ranking without a separate search engine. The sparse backend implements the same Search protocol as dense backends, so callers do not need to branch on method (Source: src/python/txtai/embeddings/search/base.py:90-140).
Operational Notes
- Configuration is end-to-end through the constructor. The
Embeddings(...)config object controls content store, method, backend, tokenizer, batching, and reranker. Source: src/python/txtai/embeddings/base.py:55-120. - Backends are pluggable. New ANN implementations only need to subclass
Indexinembeddings/index/__init__.py. Source: src/python/txtai/embeddings/index/__init__.py:15-70. - Hybrid queries require compatible parameters.
Hybridvalidates method compatibility, weights ranges, andtermspresence. Source: src/python/txtai/embeddings/search/hybrid.py:40-90. - Tokenization alignment is mandatory for both indexing and querying. Source: src/python/txtai/pipeline/data/text/tokenizer.py:40-100.
- Late interaction models (ColBERT, MUVERA, LEMUR) are not yet natively supported. Community requests (#945, #1024) call for extending the
IndexandVectorsabstractions to support multi-vector scoring outside the dense similarity path tracked in Issues #1079 and #1107.
Together, these layers allow txtai to expose a single Embeddings API while remaining agnostic to the choice of vector model, ANN backend, and ranking strategy.
Source: https://github.com/neuml/txtai / Human Manual
ANN Backends and Late Interaction Models
Related topics: Embeddings and Vector Indexing, Scoring: BM25, TF-IDF, and Sparse Methods
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Embeddings and Vector Indexing, Scoring: BM25, TF-IDF, and Sparse Methods
ANN Backends and Late Interaction Models
Purpose and Scope
The ANN (Approximate Nearest Neighbor) subsystem in txtai provides pluggable indexing backends that power similarity search across embeddings, text, and sparse vectors. Backends expose a common interface so the same Embeddings, RAG, and SemanticGraph workflows can run against different engines depending on scale, hardware, and accuracy requirements. The latest release (v9.11.0) added turbovec as a new ANN backend (#1109), and v9.10.0 added LiteRT vector support (#1097), expanding the choices for dense retrieval.
Late interaction models (ColBERT, MUVERA, LEMUR) are an emerging retrieval paradigm that keeps multiple vectors per document and scores queries with a MaxSim operation. They have been requested repeatedly (#945, #1079, #1107, #1024) because they offer a strong recall/speed tradeoff between single-vector bi-encoders and cross-encoders. The ANN architecture in txtai is the foundation on which such multi-vector backends will plug in.
ANN Backend Architecture
All backends inherit from ANN defined in src/python/txtai/ann/base.py, which declares the lifecycle methods (index, append, delete, search, load, save, close) and the constructor signature for backend-specific parameters. source: src/python/txtai/ann/base.py:1-120. Concrete implementations live under ann/dense/ for dense vectors and ann/sparse/ for sparse term/lexical vectors.
| Backend | File | Index Type | Typical Use Case |
|---|---|---|---|
faiss | ann/dense/faiss.py | IVF, HNSW, Flat, PQ (GPU & CPU) | General-purpose dense ANN, large corpora |
hnsw | ann/dense/hnsw.py | Hierarchical Navigable Small World | Fast graph-based dense ANN |
pgvector | ann/dense/pgvector.py | PostgreSQL vector type | Persistent, transactional vector store |
turbovec | ann/dense/turbovec.py | Custom TurboVec ANN | New backend introduced in v9.11.0 (#1109) |
ivfsparse | ann/sparse/ivfsparse.py | Inverted File with sparse centroids | Sparse/lexical retrieval with ANN |
realtime | ann/realtime.py | Brute-force NumPy | Small datasets and unit tests |
FAISS is the default dense backend and supports both CPU and GPU index factories. source: src/python/txtai/ann/dense/faiss.py:1-80. HNSW provides a pure-Python graph alternative for environments where FAISS is unavailable. source: src/python/txtai/ann/dense/hnsw.py:1-60. pgvector delegates storage and search to PostgreSQL via the pgvector extension. source: src/python/txtai/ann/dense/pgvector.py:1-80. turbovec is the newest entry and is configured like the other dense backends through the same backend parameter on Embeddings. source: src/python/txtai/ann/dense/turbovec.py:1-60. The sparse ivfsparse backend accelerates lexical scoring for hybrid search by clustering sparse token vectors. source: src/python/txtai/ann/sparse/ivfsparse.py:1-60. For development and tiny corpora, realtime performs exact NumPy search without building any index. source: src/python/txtai/ann/realtime.py:1-40.
flowchart LR
A[Embeddings / RAG] --> B{Backend}
B --> D[faiss]
B --> E[hnsw]
B --> F[pgvector]
B --> G[turbovec]
B --> H[ivfsparse]
B --> I[realtime]
D --> R[(Index on disk/DB)]
E --> R
F --> R
G --> R
H --> RLate Interaction Models
Late interaction retrieval represents each token as its own vector and reranks candidates by summing the maximum similarity between every query token and every document token (MaxSim). This typically yields higher recall than single-vector bi-encoders and is far faster than cross-encoders because document vectors can be precomputed and indexed.
The community has tracked native ColBERT support in issues #945, #1079, and #1107, and MUVERA was cited as the proposed path to making multi-vector search single-vector speed (#952). Issue #1024 requested LEMUR, another approximate MaxSim strategy. The current txtai release line does not yet ship a first-party ColBERT/MUVERA/LEMUR backend, but the ANN interface is the integration point: a late interaction backend would extend ANN in base.py and store token-level vectors, with search implementing the MaxSim reduction. Embeddings produced by the vectors pipeline are the natural input, and the v9.10.0 Knowledge Distillation Trainer (#1103) can be used to compress late interaction models into single-vector students once the backend is available.
Selecting a Backend
Choosing a backend is a tradeoff between recall, latency, persistence, and operational complexity:
- Use
faissfor production workloads that need IVF, PQ, or GPU acceleration. - Use
hnswwhen FAISS native libraries are unavailable and a graph index is acceptable. - Use
pgvectorwhen vectors must coexist with relational data and benefit from SQL transactions. - Use
turbovec(added in v9.11.0) for environments where the new engine offers a better cost/performance profile. - Use
ivfsparsefor sparse/lexical ANN or as part of a hybrid retrieval pipeline. - Use
realtimeonly for tests and very small datasets because it scans all vectors per query.
All dense backends are configured through the backend parameter of the Embeddings configuration block, with backend-specific options passed through ANN initialization in base.py. source: src/python/txtai/ann/base.py:120-180. LiteRT vectors introduced in v9.10.0 (#1097) plug into the same backend selection mechanism and are available wherever the underlying engine supports the required vector layout.
Limitations and Roadmap
The ANN subsystem today targets single-vector dense and sparse retrieval. Native ColBERT-style late interaction, MUVERA projection, and LEMUR-style learned MaxSim remain open feature requests (#945, #1079, #1107, #1024). Until these land, users can approximate late interaction by combining token-level embeddings with a custom backend that subclasses ANN and implements search with a MaxSim kernel. The v9.10.0 Knowledge Distillation trainer (#1103) and LiteRT vectors (#1097) are forward-compatible building blocks for that roadmap.
Source: https://github.com/neuml/txtai / Human Manual
Scoring: BM25, TF-IDF, and Sparse Methods
Related topics: Embeddings and Vector Indexing, ANN Backends and Late Interaction Models
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Embeddings and Vector Indexing, ANN Backends and Late Interaction Models
Scoring: BM25, TF-IDF, and Sparse Methods
The txtai.scoring package provides the lexical and sparse retrieval primitives that underpin txtai's hybrid search pipelines. Unlike the dense embedding path (handled by vectors/), the scoring path operates on token-frequency representations and produces deterministic, explainable scores that can be combined with vector similarity for hybrid ranking. The module exposes a common Scoring base interface so that BM25, TF-IDF, SIF, and sparse projections can be swapped interchangeably.
Architecture and Common Interface
All scoring algorithms inherit from a common base class that defines the indexing and querying contract. index(tokens, ids, kwargs) builds per-document term statistics from token streams, while search(query, kwargs) returns (id, score) tuples ranked in descending order. The base class also defines token parsing utilities used by every concrete scorer.
class Scoring:
def index(self, tokens, ids, **kwargs):
# Build per-document term statistics
...
def search(self, query, **kwargs):
# Return ranked (id, score) tuples
...
The tokens parameter is an iterable of token lists, and ids is a parallel iterable of document identifiers. This two-phase index/search design allows large corpora to be indexed once and queried repeatedly with negligible setup cost. Source: src/python/txtai/scoring/bm25.py:1-50
BM25 Implementation
BM25 is the default sparse scorer. It maintains document term frequencies and document lengths, then applies the classic Robertson–Walker formula with configurable k1 (term-saturation) and b (length-normalization) parameters.
Key implementation details:
- Term frequency normalization: BM25 uses
tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)), wheredlandavgdlare the document and average document lengths. Source: src/python/txtai/scoring/bm25.py — see thescoremethod. - IDF: The inverse-document-frequency component uses the standard
(log((N - df + 0.5) / (df + 0.5)) + 1)formulation, which is non-negative and stable for high-frequency terms. - Configuration: Parameters are exposed through the embeddings YAML config block under
scoring.function(e.g.bm25) andscoring.parametersfork1,b, etc.
Community interest in extending BM25 is visible in #1023, a feature request for Bayesian BM25, which transforms BM25 scores into bounded probabilities in [0, 1] for cleaner hybrid fusion with vector similarity.
TF-IDF and SIF
The TF-IDF scorer is a classical implementation using tf * log(N / df) weights, supporting both raw and log-scaled variants. It also exposes a vocabulary-building pass that can be cached on the embeddings object. Source: src/python/txtai/scoring/tfidf.py:1-60.
The SIF (Smooth Inverse Frequency) scorer adapts TF-IDF to short-text embeddings by reweighting terms with a / (a + tf) and subtracting a principal component computed from the corpus. This downweights high-frequency stopwords and yields better representations than raw averages for sentence-level retrieval. Source: src/python/txtai/scoring/sif.py:1-80.
Sparse Representations
Sparse vectors are produced through the SIF and dedicated Sparse scorers. The Sparse class maintains term-to-index mappings and produces sparse weight vectors compatible with sparse vector backends (LiteRT, Turbovec) introduced in releases v9.10 and v9.11. Source: src/python/txtai/scoring/sparse.py:1-70.
Normalization and Hybrid Fusion
Score normalization is handled by Normalize, which supports L1, L2, and max-norm strategies. This is essential when combining BM25 scores with vector cosine similarities in a hybrid pipeline, because raw BM25 magnitudes vary widely across queries. Source: src/python/txtai/scoring/normalize.py:1-50.
A typical hybrid configuration uses normalized BM25 alongside dense embeddings:
| Component | Role | Source |
|---|---|---|
| BM25 | Lexical recall anchor | bm25.py |
| TF-IDF / SIF | Sparse dense-like vectors | tfidf.py, sif.py |
| Sparse | Sparse backend compatibility | sparse.py |
| Normalize | Score fusion pre-processing | normalize.py |
Selection Guide
- Use BM25 when you need interpretable, fast lexical retrieval and exact-term matching.
- Use TF-IDF when building a sparse document representation for downstream classification or clustering.
- Use SIF when dense-like semantics are needed but embeddings are unavailable.
- Use Sparse when targeting sparse ANN backends such as the recently added Turbovec (v9.11.0).
For mixed workloads, normalize each score source before applying weighted combination in the embeddings query layer. This keeps BM25 and vector similarities on a comparable scale and prevents one signal from dominating.
Source: https://github.com/neuml/txtai / Human Manual
Database, Graph, and Semantic Graph Networks
Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Embeddings and Vector Indexing, Pipelines: LLM, Text, Audio, Image, and Data
Database, Graph, and Semantic Graph Networks
txtai organizes structured, relational, and graph-based data through three cooperating layers: a Database layer for tabular and SQL access, a Graph layer for explicit node-edge topology, and a Semantic Graph layer that derives topic communities and traversal paths from a vector index. Together these layers let users store metadata, run SQL over embeddings-backed tables, traverse networks, and reason over semantically discovered clusters from a single API.
Architecture Overview
The three layers live under src/python/txtai/database/ and src/python/txtai/graph/. Each exposes a common base contract so alternative engines (DuckDB, SQLite/Postgres, NetworkX, custom graph backends) can be swapped without changing application code. The Database layer answers SQL-like questions against tabular content; the Graph layer stores user-defined relationships and answers traversal queries (search, walk, centrality, pagerank, etc.); the Semantic Graph layer (graph/topics.py) reuses the vector index to mine implicit topic communities when no explicit graph is available. Source: src/python/txtai/database/base.py:1-40, src/python/txtai/graph/base.py:1-40.
flowchart LR
A[Documents / Rows] --> B[Database Layer]
B --> B1[DuckDB backend]
B --> B2[RDBMS backend]
A --> C[Graph Layer]
C --> C1[NetworkX backend]
A --> D[Semantic Graph]
D --> D1[Topics / Communities]
D --> D2[Paths]
B --> E[SQL Queries]
C --> F[search / walk]
D --> G[Topic & Path API]Database Layer
The Database abstraction is defined in database/base.py and defines the contract for opening connections, executing queries, loading tabular content, and supporting hybrid queries that combine SQL with similarity. Concrete backends implement this contract.
DuckDB(database/duckdb.py) is the default backend, optimized for embedded analytical workloads. It loads data into in-process tables and supports a wide range of SQL operators over text, numeric, and vector columns. It is the recommended backend for most hybrid queries. Source: src/python/txtai/database/duckdb.py:1-60.RDBMS(database/rdbms.py) is a generic SQL backend that wraps any DBAPI-compatible connection, letting users plug in SQLite, PostgreSQL, MySQL, or external warehouses. It exposes the same interface asDuckDBbut delegates execution to the underlying driver. Source: src/python/txtai/database/rdbms.py:1-60.Base(database/base.py) centralizes parameter validation, query parsing, and result shape normalization so all backends return the same dictionary-style rows. It also exposes the high-level entry points (upsert,search,similar) consumed by the rest of txtai. Source: src/python/txtai/database/base.py:40-120.
A typical hybrid query joins a SQL predicate with an embeddings similarity clause, allowing users to filter rows by content first and then re-rank with vector similarity, or vice versa.
Graph Layer
The Graph layer manages explicit node-edge structures. The base class (graph/base.py) defines node/edge attributes, weighted traversal, indexing of node text into the embeddings pipeline, and graph analytics methods (degree, centrality, pagerank, communities, paths). Source: src/python/txtai/graph/base.py:40-140.
The default backend is NetworkX (graph/networkx.py), which builds an in-memory nx.Graph or nx.DiGraph and indexes node text into the same vector store used by the rest of txtai. This makes search("topic") over a graph semantically equivalent to vector search but restricted to nodes. NetworkX enables well-known algorithms (shortest path, connected components, pagerank) to run directly on the graph without leaving txtai. Source: src/python/txtai/graph/networkx.py:1-80.
The graph layer exposes methods such as add(), upsert(), search(), walk(), centrality(), pagerank(), and communities(). Node text is auto-indexed, and edge weights can be set explicitly or computed from similarity. The graph can be persisted and reloaded, and the index can be backed by any vector store supported by txtai.
Semantic Graph Networks
When no explicit graph exists, txtai can still build one from a vector index. graph/topics.py implements the semantic graph by:
- Topic clustering — vector embeddings are clustered (single-pass assignment by similarity) into topic communities. Each topic becomes a graph node and document-to-topic assignments become edges. Source: src/python/txtai/graph/topics.py:1-80.
- Path discovery — short, high-similarity sequences between documents become paths in the graph. Paths connect related items even when they share no metadata, exposing latent relationships that pure vector search does not surface. Source: src/python/txtai/graph/topics.py:80-160.
This semantic graph is useful for exploratory workflows: discover clusters of related documents, follow paths between them, and then materialize those relationships into a real Graph instance for further traversal and analytics.
When to Use Each Layer
| Layer | Best for | Backend(s) |
|---|---|---|
| Database | Tabular data, SQL filters, hybrid SQL+vector queries | DuckDB, RDBMS (SQLite, Postgres, MySQL) |
| Graph | Explicit relationships, traversal, graph analytics | NetworkX |
| Semantic Graph | Implicit relationships mined from vectors | Built-in via topics.py |
Choose the Database layer when the data is naturally tabular or when queries need precise SQL predicates. Choose the Graph layer when domain knowledge defines nodes and edges explicitly. Choose the Semantic Graph layer when only text/embeddings exist and you want to surface structure automatically. All three share the same vector backend and embedding model, so a single embeddings index can drive hybrid SQL, semantic search, and graph traversal simultaneously. Source: src/python/txtai/database/base.py:120-200, src/python/txtai/graph/base.py:140-220, src/python/txtai/graph/topics.py:160-240.
Summary
The Database, Graph, and Semantic Graph Networks form a unified structured-data stack on top of txtai's embeddings index. database/base.py and graph/base.py define the contracts; duckdb.py, rdbms.py, networkx.py, and topics.py provide default and specialized implementations. Together they let applications mix SQL filtering, explicit graph traversal, and emergent topic-based reasoning within a single, embeddings-driven workflow.
Source: https://github.com/neuml/txtai / Human Manual
Pipelines: LLM, Text, Audio, Image, and Data
Related topics: Embeddings and Vector Indexing, Workflows and Task Orchestration, Agents and LLM Orchestration
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Embeddings and Vector Indexing, Workflows and Task Orchestration, Agents and LLM Orchestration
Pipelines: LLM, Text, Audio, Image, and Data
txtai organizes the bulk of its inference and transformation logic into a unified pipeline abstraction. A pipeline wraps a single-purpose task — generating an answer, summarizing text, transcribing audio, captioning an image, or parsing structured data — behind a consistent callable interface. Pipelines are the primary integration point for machine-learning models inside the framework, and they compose with embeddings, vector indexes, and the workflow graph exposed via the API.
Pipeline Architecture and Factory
Every pipeline inherits from Pipeline, defined in src/python/txtai/pipeline/base.py. The base class centralizes argument validation, device selection (CPU/GPU/auto), batch inference, and serialization (source/persist). Concrete subclasses implement __call__ and forward inputs through a configurable backend — typically a Hugging Face model, an ONNX runtime session, or a remote API client (Source: src/python/txtai/pipeline/base.py:1-120).
Pipelines are instantiated by name through factory("name") in src/python/txtai/pipeline/factory.py. The factory resolves a string identifier (for example, "summary", "llm", "caption") to the matching class, loads default model parameters, and returns a ready-to-call instance. This indirection means the same task name can be re-targeted to a different backend simply by passing a path parameter (Source: src/python/txtai/pipeline/factory.py:1-200).
| Pipeline Family | Example Tasks | Typical Model Layer |
|---|---|---|
llm | Generation, RAG, chat completion | Transformers / external provider |
text | Summary, transcription, translation, entity extraction | Transformers encoder–decoder |
audio | Text-to-audio, audio indexing | Audio diffusion / ASR |
image | Caption, objects, labeling | Vision–language transformers |
data | Tabular parsing, text extraction | pandas / parser backends |
The factory dispatch enables YAML-driven configuration in txtai.yml and exposes the same pipeline surface to the FastAPI layer, ensuring parity between Python, HTTP, and workflow usage.
LLM Pipelines
The LLM module in src/python/txtai/pipeline/llm/ provides generation, chat, and retrieval-augmented generation (RAG). The core class is LLM (src/python/txtai/pipeline/llm/llm.py), which accepts either a local Hugging Face model path or a remote provider definition. The pipeline exposes parameters such as maxlength, temperature, topp, and system, and it returns either plain text or structured outputs depending on the requested schema (Source: src/python/txtai/pipeline/llm/llm.py:1-160).
RAG is implemented in rag.py and adds a retriever parameter so that each call can first query a configured embeddings or similarity index and prepend the most relevant snippets to the prompt. This bridges the LLM pipeline with the vector store and is the foundation for the extractor workflow. Community interest in late-interaction retrievers (issues #1079, #1107, and #945) highlights how the LLM layer is expected to grow toward ColBERT- and MUVERA-style multi-vector retrieval.
Text, Audio, and Image Pipelines
Text pipelines cover natural-language inference: Summary (text/summary.py) compresses long documents using a seq2seq model, Transcription (text/transcription.py) converts speech to text using an ASR backend, and Entity performs token-level extraction via GLiNER. Each accepts either a string or a list of strings and returns a list aligned to inputs (Source: src/python/txtai/pipeline/text/summary.py:1-80).
Audio generation lives under src/python/txtai/pipeline/audio/, with TextToAudio synthesizing waveforms from text prompts. The pipeline supports model swapping so users can experiment with different audio diffusion checkpoints (Source: src/python/txtai/pipeline/audio/texttoaudio.py:1-60).
Image pipelines in src/python/txtai/pipeline/image/ include Caption for producing natural-language descriptions and Objects for detection. Caption outputs are lists of strings suitable for downstream embeddings, allowing image-to-image similarity search — a workflow tied to long-standing community request #404 (Source: src/python/txtai/pipeline/image/caption.py:1-70).
Data Pipelines and Constraints
Data pipelines parse, extract, and shape structured input. Tabular in data/tabular.py ingests CSV files and projects rows into dictionaries suitable for vector indexing. As of v9.x, the pipeline restricts input to local CSV files and raises an explicit error when a path is missing or not CSV, addressing issue #1119 (Source: src/python/txtai/pipeline/data/tabular.py:1-90).
Textractor in data/textractor.py performs text extraction from heterogeneous formats (HTML, PDF, Office documents, audio/video). Recent versions added a safeopen parameter (#1077) and LiteParse support (#1118 in v9.11.0), reducing reliance on heavy optional dependencies while maintaining extraction fidelity (Source: src/python/txtai/pipeline/data/textractor.py:1-110).
Across all families, pipelines share a common lifecycle: factory resolution → model load on the selected device → batched __call__ → optional persist for cached state. This uniformity is what allows txtai to expose every pipeline through a single API surface and to mix them inside YAML workflows without bespoke glue code.
Source: https://github.com/neuml/txtai / Human Manual
Workflows and Task Orchestration
Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Agents and LLM Orchestration
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Agents and LLM Orchestration
Workflows and Task Orchestration
Overview and Purpose
The workflow subsystem in txtai provides a declarative, DAG-style orchestration layer for chaining individual pipeline units, custom functions, and LLM calls into reusable multi-step processes. Where a txtai pipeline encapsulates a single inference operation, a workflow composes one or more *tasks* with explicit inputs, outputs, and dependencies so that data flows through the graph from upstream tasks to downstream consumers.
Workflows are defined in YAML and loaded into a Workflow object. Each Workflow owns an internal task graph and an executor that handles dispatch, parameter binding, and lifecycle. The subsystem is used both by applications that script txtai directly and by higher-level surfaces such as the interactive console, which routes user commands through workflow definitions.
Source: src/python/txtai/workflow/base.py:1-40
Core Architecture
The orchestration stack is split into four cooperating modules that separate definition from execution.
| Component | Module | Responsibility |
|---|---|---|
Workflow | workflow/base.py | Container for tasks, schedules, and configuration; parses YAML specs |
Task | workflow/task/base.py | Atomic unit of work; declares inputs, outputs, action, and condition |
TaskFactory | workflow/task/factory.py | Resolves task action strings into callable objects (pipelines, methods, Python callables) |
WorkflowExecutor | workflow/execute.py | Traverses the task graph, binds parameters, and dispatches tasks locally or remotely |
The data flow is straightforward: a YAML specification is parsed into a list of Task instances. The executor builds a dependency map from each task's inputs list, then walks tasks in topological order. When a task's dependencies are satisfied, its inputs are read from the shared element store, the action is invoked through the factory, and the result is written back so downstream tasks can consume it.
flowchart LR
YAML[YAML Spec] --> WF[Workflow]
WF -->|parses| T1[Task A]
WF -->|parses| T2[Task B]
WF -->|parses| T3[Task C]
T1 -->|output| Store[(Element Store)]
T2 -->|reads input| Store
T3 -->|reads input| Store
WF --> EX[WorkflowExecutor]
EX --> TF[TaskFactory]
TF --> P[pipelines]
TF --> M[methods]
TF --> C[callables]Source: src/python/txtai/workflow/base.py:30-90, src/python/txtai/workflow/task/base.py:20-70, src/python/txtai/workflow/task/factory.py:1-60, src/python/txtai/workflow/execute.py:40-110
Task Definition and the Task Factory
Every Task carries a small, uniform schema regardless of what it ultimately runs. The fields include a unique identifier, an ordered inputs list referencing upstream task IDs or literal values, a single action string describing what to invoke, an optional args dictionary for positional/keyword parameters, an optional condition for gating execution, and an optional task identifier for sub-workflow nesting.
The TaskFactory is the resolution layer that turns an action string into a runtime callable. It supports several action forms:
pipeline:<name>— looks up a registered txtai pipelinemethod:<name>— invokes a method on an existing componentpython:<module>.<callable>— imports and calls an arbitrary Python function- A nested
taskidentifier that delegates to another workflow
This indirection is what lets the same YAML specification describe a workflow built from stock pipelines, custom user code, or a mix of both, without the executor needing to know the concrete type of any action ahead of time.
Source: src/python/txtai/workflow/task/base.py:30-120, src/python/txtai/workflow/task/factory.py:40-130
Execution Model
The WorkflowExecutor is responsible for actually running the graph. It accepts an iterable of input elements, then iterates tasks in dependency order. For each task it:
- Reads required inputs from the per-element scratch store, materializing missing upstream outputs.
- Evaluates the optional
conditionpredicate against the current element; if it returns false, the task is skipped for that element. - Resolves the action through the
TaskFactory. - Invokes the action with merged arguments and writes the result back to the store under the task's ID.
The executor supports both synchronous and asyncio execution paths. Async tasks declared with the appropriate action type are scheduled on the running loop, allowing I/O-bound work such as LLM calls to overlap. Tasks can also be marked for parallel dispatch when their dependencies permit, which the executor detects from the DAG.
Errors in one task do not silently abort the whole run by default; the executor records the failure against the offending task and continues where possible, surfacing failures through the workflow's results.
Source: src/python/txtai/workflow/execute.py:60-180
Integration with the Console
The console layer is a thin command-driven shell that translates user input into workflow invocations. Each console command maps to a YAML workflow bundled with txtai; the console loads the workflow, feeds it user-supplied elements, and renders the resulting elements back to the terminal. Because the same Workflow and WorkflowExecutor are reused, console commands inherit the full feature set of the workflow engine: parameter binding, conditions, async dispatch, and sub-workflow composition.
This shared substrate is also why new console commands are added by writing a YAML workflow rather than imperative Python — the orchestration layer is the single source of truth for both scripted and interactive use.
Source: src/python/txtai/console/base.py:1-80
Practical Usage Notes
When authoring a workflow, keep the following in mind:
- Task IDs become the keys used by downstream tasks to read outputs, so they should be stable and descriptive.
- The
inputslist on each task accepts either task IDs from earlier in the graph or literal scalar values; mixing the two is supported and common for configuration parameters. - Conditions are evaluated per element, so they are the right tool for branching logic such as "only run task B if task A returned non-empty."
- For LLM-heavy workflows, prefer async-capable actions and keep the dependency graph shallow; the executor's parallelism is bounded by available upstream outputs, not by task count.
Community discussions around late-interaction retrieval models (see issues #945, #1024, #1079, #1107) and the coding agent toolkit introduced in v9.7.0 are relevant examples of features that compose multiple tasks through this orchestration layer. The v9.10.0 release added a knowledge distillation trainer that is itself consumed as a workflow action, and v9.11.0 added the turbovec ANN backend, both of which can be wired into user-defined workflows through the same factory mechanism.
Source: src/python/txtai/workflow/base.py:90-140, src/python/txtai/workflow/task/factory.py:120-180
Source: https://github.com/neuml/txtai / Human Manual
Agents and LLM Orchestration
Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Workflows and Task Orchestration
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Pipelines: LLM, Text, Audio, Image, and Data, Workflows and Task Orchestration
Agents and LLM Orchestration
txtai's agent subsystem provides a framework for building LLM-driven agents that can plan, call tools, and interact with txtai's broader retrieval and workflow capabilities. Released as part of the v9.7.0 Coding Agent Toolkit (PRs #1054–#1061), the module is designed to make large language models into first-class orchestration primitives inside txtai applications. Source: src/python/txtai/agent/base.py:1-30
Architecture Overview
The agent module follows a layered design. At the top, base.py defines the abstract Agent class that encapsulates the orchestration loop: it manages conversation state, dispatches tool calls, and returns structured responses to callers. Source: src/python/txtai/agent/base.py:1-120. model.py provides the LLM-facing layer that translates agent decisions into provider-specific chat-completion requests. Source: src/python/txtai/agent/model.py:1-90. The factory.py module is responsible for instantiating the correct agent variant from a configuration object, similar to how other txtai pipelines are resolved through factories. Source: src/python/txtai/agent/factory.py:1-80.
flowchart LR
User[User / Caller] --> Agent[Agent base.py]
Agent --> Model[LLM Model model.py]
Agent --> Tools[Tool Registry tool/__init__.py]
Tools --> Emb[Embeddings Tool tool/embeddings.py]
Tools --> Skill[Skill Tool tool/skill.py]
Tools --> Other[Custom Tools]
Model --> Provider[(LLM Provider)]
Agent --> Output[Structured Response]The diagram shows how a request flows from the caller into the Agent orchestrator, which consults the LLM Model for reasoning and the Tool Registry for execution. Each registered tool is a self-contained callable with a typed schema that the agent can invoke mid-turn. Source: src/python/txtai/agent/tool/__init__.py:1-60.
Tool Subsystem
Tools are the actions an agent can perform. The tool/ package exposes a registry pattern: __init__.py defines the Tool base contract and a registry used by the agent to discover available actions. Source: src/python/txtai/agent/tool/__init__.py:1-90.
Two built-in tools ship with the framework:
- Embeddings tool — wraps a txtai embeddings index, allowing the agent to perform semantic search, similarity lookups, and question answering over a corpus. Source: src/python/txtai/agent/tool/embeddings.py:1-120.
- Skill tool — invokes a registered txtai workflow or "skill," letting the agent chain together pipelines such as extraction, transcription, or summarization that have been defined elsewhere in the application. Source: src/python/txtai/agent/tool/skill.py:1-100.
Custom tools can be added by subclassing the Tool base class and registering them with the agent's tool set. Each tool advertises its name, description, and parameter schema so the underlying LLM can decide when and how to invoke it. Source: src/python/txtai/agent/tool/__init__.py:60-140.
LLM Orchestration Loop
The orchestration loop implemented in base.py runs the classic Reason → Act → Observe cycle. The agent sends the current prompt, including prior tool results, to the model layer, then interprets the model's response. If the model returns a tool call, the agent dispatches it to the matching registered tool, captures the result, and re-enters the loop with the updated context. When the model produces a final answer, that text is returned to the caller. Source: src/python/txtai/agent/base.py:80-200.
model.py abstracts provider differences so agents are not coupled to a specific LLM SDK. The model layer handles message formatting, streaming, and tool-call parsing, allowing the same agent definition to target different backends simply by changing configuration. Source: src/python/txtai/agent/model.py:40-140. This abstraction is what enables the v9.7.0 release to ship a Coding Agent Toolkit — agents that can read, write, and execute code by composing the built-in tools with code-execution skills. Source: src/python/txtai/agent/factory.py:30-100.
Factory and Configuration
Configuration follows txtai's standard pipeline pattern. factory.py reads a YAML or dictionary configuration describing the agent's LLM, tools, and orchestration parameters, then constructs a fully wired Agent instance. Source: src/python/txtai/agent/factory.py:1-60.
Typical configuration keys include:
| Key | Purpose |
|---|---|
model | LLM provider and model identifier |
tools | List of tool names or class references to register |
maxsteps | Upper bound on orchestration iterations |
instructions | System prompt / persona |
Source: src/python/txtai/agent/factory.py:60-130.
Because the factory reuses txtai's shared configuration loader, agents can be defined alongside embeddings and workflow settings in the same YAML file, which simplifies deployment and makes agent behavior reproducible across environments. Source: src/python/txtai/agent/factory.py:130-180.
Practical Use
In practice, developers construct an agent via the factory, call it with a user prompt, and receive either a final answer or a trace of tool calls. The agent can be embedded in API services, used in notebooks, or orchestrated as a step inside a larger txtai workflow. The v9.7.0 release notes call out an accompanying agent tools example notebook (PR #1062) that demonstrates end-to-end usage. The combination of a typed tool registry, a provider-agnostic model layer, and a factory-driven configuration makes txtai's agent module a compact yet extensible surface for building production-grade LLM orchestration.
Source: https://github.com/neuml/txtai / Human Manual
API Layer: FastAPI, MCP, and Authorization
Related topics: Deployment, Cloud, and Docker, Extensibility, Security, and Customization
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Deployment, Cloud, and Docker, Extensibility, Security, and Customization
API Layer: FastAPI, MCP, and Authorization
The txtai API layer exposes the framework's semantic search, vector, workflow, and LLM capabilities over HTTP. It is implemented on top of FastAPI and uses Uvicorn as the ASGI server. The layer is designed to run both as a single-process service and as a horizontally scaled cluster, with optional bearer-token authorization for any deployment that leaves the loopback interface.
Component Layout
| File | Role |
|---|---|
application.py | FastAPI app factory, lifespan management, middleware wiring |
base.py | Shared base classes for routing and request handling |
route.py | Custom APIRouter subclass used to remain compatible with recent FastAPI releases |
authorization.py | Bearer-token validation and the Authorization dependency |
cluster.py | Distributes requests across worker processes for scale-out |
routers/llm.py | Router that exposes the LLM pipeline as HTTP endpoints |
Source: src/python/txtai/api/application.py:1-1, src/python/txtai/api/base.py:1-1, src/python/txtai/api/route.py:1-1, src/python/txtai/api/authorization.py:1-1, src/python/txtai/api/cluster.py:1-1, src/python/txtai/api/routers/llm.py:1-1.
FastAPI Application and Lifespan
application.py builds a FastAPI instance whose lifespan handler loads the configured YAML workflow and the embeddings/vector index, then keeps them resident for the life of the process. The same module wires the FastAPI Router objects that come from each pipeline, attaches the cluster middleware when running in distributed mode, and configures CORS, exception handlers, and an optional static-file mount for the built-in web UI.
Key behaviors:
- The app is constructed by an
Applicationclass that takes a config path. Callingrun()launches Uvicorn with the host/port read from the same config. Source: src/python/txtai/api/application.py:1-1. - A
lifespanasync context manager opens the embeddings, vectors, and pipelines on startup and closes them on shutdown, so model weights are loaded once per worker rather than per request. Source: src/python/txtai/api/base.py:1-1. - Pipelines register themselves through a small base class in
base.pythat exposes arouter()method, so adding a new HTTP route is mostly a matter of adding aregisterhook rather than touchingapplication.py. Source: src/python/txtai/api/base.py:1-1.
Custom Router and FastAPI Compatibility
Txtai ships a custom APIRouter subclass defined in route.py. The reason it exists is rooted in a recent upstream change: FastAPI 0.137 modified how routers are constructed and how dependencies are resolved, breaking txtai's request-time dependency injection. Until txtai 9.11 was released, users were advised to pin FastAPI at 0.136.1 or below. Source: src/python/txtai/api/route.py:1-1, community issue #1115.
The custom router preserves the original API surface (path operations, response models, dependency injection via Depends) so that pipelines registered through base.py continue to work against newer FastAPI/Starlette releases. It is the single place that needs to change when FastAPI ships another router-breaking revision, which keeps the rest of the codebase insulated from upstream churn.
Authorization
Authentication is handled by authorization.py and is intentionally minimal. When the environment variable or config flag TXTAI_API_AUTH is set, every request must carry a matching bearer token; otherwise the request is rejected with a 401 before reaching any pipeline handler. The implementation follows these principles:
- The token is read once at startup from config and compared in constant time to the
Authorization: Bearer ...header on each request. - The dependency is exposed as a FastAPI
Depends(...)callable, so individual routes can opt in or out without duplicating validation logic. - Authorization is orthogonal to serialization: even with auth disabled, pickle-based payloads still require the
ALLOW_PICKLEflag, which is itself a separate security gate.
This split between *transport-level* auth (the bearer token) and *payload-level* serialization controls is what the project documents as its threat model. The community-reported CVE about pickle.loads when ALLOW_PICKLE=True (issue #1108) is mitigated by treating pickle support as a hard opt-in rather than as a default. Source: src/python/txtai/api/authorization.py:1-1, community issue #1108.
Cluster Mode
For larger indexes, the API layer can run in cluster mode. cluster.py introduces a worker pool that fronts the FastAPI app: requests are dispatched to a worker process by a routing key derived from the request (typically the index uid for search calls), which keeps ANN queries pinned to the worker that owns the relevant shard of the vector index. The cluster is started with a separate entry point and the FastAPI app is launched as the front-end router, while the workers run the same Application lifecycle but bind only to internal ports. Source: src/python/txtai/api/cluster.py:1-1.
LLM and Other Routers
The routers/ package contains one APIRouter per pipeline family that benefits from a richer HTTP surface. routers/llm.py is the most prominent: it exposes streaming completions, chat-style endpoints, and tool/function-calling semantics on top of the LLM pipeline, returning server-sent events for streaming responses. Because routers plug in through the base class in base.py, adding a new pipeline family follows the same pattern: define a router, register it on the Application, and it becomes reachable under /<pipeline>/... without further changes to application.py. Source: src/python/txtai/api/routers/llm.py:1-1.
Request Flow
flowchart LR
Client[HTTP Client] -->|Bearer token| FastAPI[FastAPI app - application.py]
FastAPI --> Auth{Authorization - authorization.py}
Auth -->|valid| Router[Custom APIRouter - route.py]
Auth -->|invalid| Reject[401 response]
Router --> Base[Pipeline base - base.py]
Base --> Cluster{Cluster mode?}
Cluster -->|yes| Workers[Worker pool - cluster.py]
Cluster -->|no| Local[In-process pipeline]
Workers --> Local
Local --> Response[JSON / SSE response]This flow shows why the design splits responsibilities the way it does: authentication is enforced before routing, the custom router shields the rest of the code from FastAPI version drift, and cluster mode is a transparent switch in front of the same pipeline handlers used in single-process deployments. Source: src/python/txtai/api/application.py:1-1, src/python/txtai/api/route.py:1-1, src/python/txtai/api/authorization.py:1-1, src/python/txtai/api/base.py:1-1, src/python/txtai/api/cluster.py:1-1.
Practical Notes
- Pin FastAPI to
0.136.1or below when running txtai older than 9.11 to avoid the router regression reported in #1115. The 9.11 release ships theroute.pyfix that restores full dependency injection. - Keep
ALLOW_PICKLEdisabled unless you fully control the clients. Even with bearer-token auth in place, an attacker who reaches the service can still exploitpickle.loadson any endpoint that accepts serialized payloads (issue #1108). - Use cluster mode only when a single process can no longer hold the index or serve the QPS target; the routing key in
cluster.pyassumes requests are idempotent with respect to the embedding/model configuration loaded at startup.
Source: https://github.com/neuml/txtai / Human Manual
Deployment, Cloud, and Docker
Related topics: Introduction and Installation, API Layer: FastAPI, MCP, and Authorization
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Introduction and Installation, API Layer: FastAPI, MCP, and Authorization
Deployment, Cloud, and Docker
txtai ships a layered set of deployment artifacts so that the same Python library can run as a local script, an HTTP API, a serverless workload, or a workflow-driven batch job. The repository separates concerns across three axes: (a) Docker images of varying capability and size, (b) a Python-side cloud abstraction, and (c) a FastAPI-backed HTTP API used by the API image. This page describes how those pieces fit together.
Docker Image Variants
The docker/ directory contains several downstream images that share a common base. The base layer standardizes Python, system dependencies, and the txtai package itself so that downstream images remain thin and reproducible.
| Image | Purpose | Notes |
|---|---|---|
docker/base/Dockerfile | Foundation image | Used as FROM for every downstream variant |
docker/minimal/Dockerfile | Zero-dependency build | Aligns with the v9.9.0 "zero dependency minimal install" track |
docker/api/Dockerfile | HTTP API server | Adds FastAPI/uvicorn and the txtai API entrypoint |
docker/workflow/Dockerfile | Workflow execution | Adds the runtime needed for scheduled and triggered workflow runs |
docker/aws/Dockerfile | AWS-targeted image | Layered on top of base for cloud-specific integrations |
The base image exists to amortize common setup: package indexes, system libraries needed by native extensions such as ONNX Runtime or FAISS, and the canonical installation of txtai. Source: docker/base/Dockerfile Downstream images then add only what is required for their target runtime. The minimal variant ships a stripped-down set of dependencies consistent with the zero-dependency install mode introduced in v9.9.0, which makes it suitable for size-constrained environments such as serverless or edge deployments. Source: docker/minimal/Dockerfile
The API image extends the base with the FastAPI/uvicorn stack and the txtai API entrypoint, exposing the library's pipelines and embeddings over HTTP. Source: docker/api/Dockerfile Workflows run in a heavier image with the runtime required to execute YAML-defined pipelines end-to-end. Source: docker/workflow/Dockerfile The AWS image layers cloud-specific tooling on top of the base to support provider-targeted deployment. Source: docker/aws/Dockerfile
Cloud Abstraction Layer
Beyond Docker, txtai provides a Python-side cloud abstraction rooted in src/python/txtai/cloud/base.py. This module defines the foundation for cloud provider integrations, encapsulating authentication, region selection, storage, and remote execution patterns so that downstream providers can be added without touching the library core. Source: src/python/txtai/cloud/base.py: The AWS image complements this abstraction by packaging the bits needed for AWS-based deployments, including any AWS-specific tooling layered on top of the base image. Source: docker/aws/Dockerfile
The cloud module is intentionally abstract: it does not hard-code a single vendor's SDK. Instead, it exposes a contract that concrete cloud providers implement, mirroring the layered style of the Docker setup. This makes it possible to deploy the same txtai configuration across local Docker, AWS, or other provider targets without rewriting application logic.
API and FastAPI Compatibility
The most common operational deployment of txtai is the HTTP API image. However, FastAPI 0.137 introduced breaking changes that affected txtai's custom routing class. Issue #1115 tracks this: the custom routing class ignored FastAPI-injected dependencies under the new release, and the fix landed in txtai 9.11. Operators running into this issue have been advised to stay on FastAPI ≤ 0.136.1 until 9.11 is deployed. Source: community reference: issue #1115
This is a good reminder that the API image's behavior is coupled to its FastAPI dependency and that upgrades should be staged: pin FastAPI, upgrade txtai, then move the FastAPI pin forward.
Deployment Workflow
flowchart LR A["docker/base/Dockerfile"] --> B["docker/minimal/Dockerfile"] A --> C["docker/api/Dockerfile"] A --> D["docker/workflow/Dockerfile"] A --> E["docker/aws/Dockerfile"] C --> F["FastAPI / uvicorn"] F --> G["HTTP API clients"] D --> H["YAML workflows"] B --> I["Size-constrained runtimes"] E --> J["AWS targets via cloud/base.py"]
The base image seeds every variant. Operators select a variant based on the workload: minimal for lightweight or zero-dependency deployments, API for HTTP services, workflow for scheduled pipelines, and AWS for cloud-native targets. The cloud abstraction in src/python/txtai/cloud/base.py glues together Python-side cloud logic with the AWS Docker image for a coherent deployment story.
Version Awareness
Recent releases reshaped the deployment footprint. v9.9.0 introduced the zero-dependency minimal install path that the minimal image aligns with. Source: docker/minimal/Dockerfile v9.10.0 added LiteRT vectors, a URL Retrieve pipeline, and Knowledge Distillation training, all of which expand the dependency surface that Docker images must accommodate. Source: docker/base/Dockerfile v9.11.0 brought the FastAPI router fix and the turbovec ANN backend, which the API image can now consume without dependency pinning workarounds. Source: community reference: release v9.11.0
For operators, the practical guidance is: keep the base image as the single source of truth for Python and system-level changes, rebuild downstream variants when the base changes, and re-validate the API image whenever FastAPI is upgraded.
Source: https://github.com/neuml/txtai / Human Manual
Extensibility, Security, and Customization
Related topics: Pipelines: LLM, Text, Audio, Image, and Data, API Layer: FastAPI, MCP, and Authorization
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Pipelines: LLM, Text, Audio, Image, and Data, API Layer: FastAPI, MCP, and Authorization
Extensibility, Security, and Customization
txtai exposes several extension points that let users plug in custom models, choose serialization backends, control how archives are read, and override training and export behavior. This page covers the mechanisms that govern those choices and the security posture around pickling.
Model Registration
The registry.py module under src/python/txtai/models/ is the single source of truth for resolving model identifiers into callable Python classes. Pipelines, vector stores, and configuration loaders all funnel through this registry rather than hardcoding class lookups, which is what makes custom backends possible.
Source: src/python/txtai/models/registry.py.
The registry exposes a register decorator so that downstream applications can register a new backend under an explicit name, and a resolver that maps configuration strings (for example, a Hugging Face model id or a "backend/path" pair) to the registered implementation. This is the contract that allows the system to remain extensible: every module that needs a model first asks the registry, then dispatches.
Serialization and Pickling Security
State persistence flows through the serialize/ package, which contains a factory.py that selects a backend per call and concrete implementations such as pickle.py. The factory abstraction is what allows safer formats to coexist with pickle.
Source: src/python/txtai/serialize/factory.py.
For Python objects that cannot be expressed in a safer format, pickle.py falls back to pickle.loads. The function call at pickle.py:63 is gated by the ALLOW_PICKLE environment variable, so by default pickle.loads is not invoked, mitigating the CWE-502 deserialization concern raised in security advisories. When ALLOW_PICKLE is explicitly enabled, the caller is accepting that any pickled payload may execute arbitrary code on deserialization.
Source: src/python/txtai/serialize/pickle.py:63.
This is a deliberate opt-in: lighter objects (configuration data, dictionaries, simple model weights) use safer formats routed by the factory, while objects with arbitrary Python state require ALLOW_PICKLE=true to acknowledge the trust boundary. Operators deploying txtai should treat ALLOW_PICKLE as a privileged flag and keep it unset whenever the input cannot be fully trusted.
Archive Reading
The src/python/txtai/archive/ package defines a base.py abstract reader that wraps ZIP- and tar-format archives consistently. Custom embeddings and pipelines often ship as multi-file archives, and the abstract reader normalizes the access pattern.
Source: src/python/txtai/archive/base.py.
Derived readers implement the same interface for .zip, .tar.gz, and similar containers, so a pipeline can read artifacts without caring about the packaging format. This is also where size and path limits can be enforced, so custom integrations inherit the same bounds as built-in ones.
Training and Export Customization
For custom training loops, pipeline/train/hftrainer.py wraps Hugging Face's Trainer so that txtai-specific configuration (scoring, column mapping, dataset layout) can be supplied while still passing arbitrary TrainingArguments through unchanged. This lets users keep harness-specific options such as evaluation strategies or precision flags.
Source: src/python/txtai/pipeline/train/hftrainer.py.
For deployment-side customization, pipeline/train/hfonnx.py provides an ONNX export path with knobs for opset, dynamic axes, and optimization. The exporter is what enables LiteRT and other ONNX-compatible runtimes to consume the same trained model.
Source: src/python/txtai/pipeline/train/hfonnx.py.
Cross-Cutting Pattern
The same shape repeats across these modules: a thin abstract or factory layer at the package boundary, concrete implementations behind it, and environment variables or registry entries that switch behavior without code changes.
| Module | Extension Point | Security/Control Knob |
|---|---|---|
models/registry.py | register(name) decorator, resolver | Identifier allowlist via registration |
serialize/factory.py | Backend selection per call | Routes around pickle when possible |
serialize/pickle.py | pickle.loads only as fallback | ALLOW_PICKLE opt-in |
archive/base.py | Abstract reader interface | Format-specific size limits |
pipeline/train/hftrainer.py | Wrapped HF Trainer | Pass-through TrainingArguments |
pipeline/train/hfonnx.py | Configurable ONNX export | Opset, dynamic axes |
This is how new vector backends, custom trainers, and safer serialization formats can be added without touching consumer code, while the trust boundary around pickle remains explicit.
Source: https://github.com/neuml/txtai / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
Doramagic Pitfall Log
Found 11 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/742
2. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1119
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1112
4. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/neuml/txtai
5. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1115
6. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/neuml/txtai
7. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/neuml/txtai
8. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/neuml/txtai
9. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neuml/txtai/issues/1122
10. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown。
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/neuml/txtai
11. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown。
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/neuml/txtai
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Count of project-level external discussion links exposed on this manual page.
Open the linked issues or discussions before treating the pack as ready for your environment.
Community Discussion Evidence
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using txtai with real data or production workflows.
- Feature request : Advanced Ontology Management - github / github_issue
- [[Security] RCE via __import__() in /reindex function parameter](https://github.com/neuml/txtai/issues/1122) - github / github_issue
- [[Feature] Native support for ColBERT-style late interaction retrieval](https://github.com/neuml/txtai/issues/1079) - github / github_issue
- Limit
tabularpipeline to local CSV files - github / github_issue - Feature request: Add LEMUR: Learned Multi-Vector Retrieval - github / github_issue
- FastAPI 0.137+ modified how routers work - github / github_issue
- Use gliner fork to relax transformers version caps - github / github_issue
- Revert noisy logging workaround when fixed upstream - github / github_issue
- [[Security] Insecure Deserialization via pickle.loads - RCE when ALLOW_PI](https://github.com/neuml/txtai/issues/1108) - github / github_issue
- [[Feature] Native support for ColBERT-style late interaction retrieval](https://github.com/neuml/txtai/issues/1107) - github / github_issue
- Reduce noisy logging messages with Transformers v5 - github / github_issue
- Capability evidence risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence