# https://github.com/timescale/pgai Project Manual

Generated at: 2026-06-23 15:52:14 UTC

## Table of Contents

- [Overview & System Architecture](#page-1)
- [Vectorizer Pipeline & Worker](#page-2)
- [Semantic Catalog (Text-to-SQL)](#page-3)
- [Embedding Providers, Model Integration & Known Issues](#page-4)

<a id='page-1'></a>

## Overview & System Architecture

### Related Pages

Related topics: [Vectorizer Pipeline & Worker](#page-2), [Semantic Catalog (Text-to-SQL)](#page-3), [Embedding Providers, Model Integration & Known Issues](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/timescale/pgai/blob/main/README.md)
- [projects/extension/README.md](https://github.com/timescale/pgai/blob/main/projects/extension/README.md)
- [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)
- [examples/embeddings_from_documents/README.md](https://github.com/timescale/pgai/blob/main/examples/embeddings_from_documents/README.md)
- [examples/text_to_sql/README.md](https://github.com/timescale/pgai/blob/main/examples/text_to_sql/README.md)
- [examples/evaluations/litellm_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/litellm_vectorizer/README.md)
- [scripts/vectorizer-load-test/README.md](https://github.com/timescale/pgai/blob/main/scripts/vectorizer-load-test/README.md)
- [examples/evaluations/voyage_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/voyage_vectorizer/README.md)

</details>

# Overview & System Architecture

## Purpose and Scope

pgai is a Python library and PostgreSQL extension that turns PostgreSQL into the retrieval engine behind production-ready Retrieval-Augmented Generation (RAG) and Agentic applications. It does this by automatically creating and maintaining vector embeddings for data stored in PostgreSQL using a configurable component called a **vectorizer**.

Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

The project is organized into three primary deliverables:

| Deliverable | Description |
|---|---|
| `pgai` Python library | Installs the PostgreSQL extension objects and provides a Python SDK (e.g., `Worker`, `CreateVectorizer`) |
| `ai` PostgreSQL extension | Provides SQL-callable functions such as `ai.create_vectorizer`, `ai.embedding_openai`, `ai.ollama_generate`, `ai.text_to_sql`, and `ai.create_semantic_catalog` |
| Vectorizer worker | A stateless process that reads vectorizer configuration, processes the work queue, calls embedding providers, and writes results back into the database |

Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md), [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)

pgai works with any PostgreSQL database, including Timescale Cloud, Amazon RDS (a recurring community request tracked in issue #304), Supabase, and self-hosted installations.

## High-Level Architecture

The system is designed around three roles: the **application**, the **PostgreSQL database**, and one or more **stateless vectorizer workers**. The application defines a vectorizer configuration that specifies how to embed data from sources like PostgreSQL columns or remote URIs (e.g., S3). Workers poll the database, consume work items, and write resulting embeddings back. The application then queries these embeddings to power semantic search, RAG, and text-to-SQL.

Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

```mermaid
flowchart LR
    A[Application<br/>INSERT / UPDATE / DELETE] --> B[(PostgreSQL<br/>ai schema)]
    B <--> C[Vectorizer Worker<br/>stateless, scalable]
    C --> D[Embedding Provider<br/>OpenAI / Ollama / Voyage / Cohere / ...]
    D --> C
    C --> B
    A --> E[Semantic Search / RAG<br/>using pgvector and pgvectorscale]
    E --> B
```

The decoupling of write operations from embedding generation is the central resilience property of the architecture: if an embedding provider is slow or failing, application writes to the source table still succeed, and the worker simply retries queued work later.

Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

## Core Components

### The Vectorizer Pipeline

A vectorizer is a configurable, sequential pipeline. Each stage is composed independently and is mapped 1:1 with an `ai.*` SQL helper used in the `ai.create_vectorizer` call:

- **Loading** — defines the data source: either a column of the source table (`ai.loading_column`) or a URI pointing to a file or S3 object (`ai.loading_uri`).
- **Parsing** — for non-text documents (PDF, HTML, Markdown, XLSX), defines how to extract text.
- **Chunking** — defines how text is split into chunks.
- **Formatting** — defines how each chunk is decorated before embedding (e.g., prepending the document title).
- **Embedding** — specifies the LLM provider, model, and parameters (e.g., `ai.embedding_openai`, `ai.embedding_ollama`, `ai.embedding_voyageai`, or `ai.embedding_litellm` for Cohere, HuggingFace, Mistral, Azure OpenAI, AWS Bedrock, and Vertex AI).

Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md), [examples/embeddings_from_documents/README.md](https://github.com/timescale/pgai/blob/main/examples/embeddings_from_documents/README.md)

### The Vectorizer Worker

The worker is a Python process that polls the database for queued work created by triggers or change events on vectorized tables. It can be invoked as:

- A one-shot process: `worker = Worker(DB_URL, once=True); worker.run()` (suitable for batch backfills).
- A continuously-running background process (CLI, Docker container, or in-process FastAPI lifespan).
- A horizontally-scaled set of replicas, since workers are stateless.

Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md), [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)

A known community issue (#925) reports that `from pgai import Worker` fails because the `Worker` class lives behind the `vectorizer-worker` extra; users must install `pip install "pgai[vectorizer-worker]"` correctly and import from the appropriate submodule.

Source: [community context, issue #925](https://github.com/timescale/pgai/issues/925)

### Model Calling from SQL

Beyond the vectorizer pipeline, pgai exposes SQL-callable LLM functions so users can do classification, summarization, or data enrichment directly in queries. Examples include `ai.ollama_generate`, `ai.ollama_chat_complete`, and provider-specific embedders. The RAG helper function in the project README composes similarity search with `ai.ollama_embed` against a `*_embedding` view and then calls `ai.ollama_chat_complete` to produce an answer.

Source: [projects/extension/README.md](https://github.com/timescale/pgai/blob/main/projects/extension/README.md)

### Semantic Catalog and Text-to-SQL

The semantic catalog lets users attach natural-language descriptions and example SQL statements to database objects. `ai.create_semantic_catalog` provisions vectorizers for the descriptions and configures the chat-completion provider used by `ai.text_to_sql`. The function performs a semantic search of the catalog, selects relevant objects and examples, and feeds them to an LLM to author SQL. A known bug (#926) reports that text-to-SQL can get stuck in refinement iterations until it exhausts the allowed budget.

Source: [examples/text_to_sql/README.md](https://github.com/timescale/pgai/blob/main/examples/text_to_sql/README.md), [community context, issue #926](https://github.com/timescale/pgai/issues/926)

## Typical Data Flow

1. **Install**: `pgai.install(DB_URL)` creates the `ai` schema and extension objects.
2. **Create a vectorizer**: The application runs `ai.create_vectorizer(...)` (or `CreateVectorizer(...).to_sql()` from Python), which writes a configuration row and installs triggers on the source table.
3. **Insert / update source data**: The application performs ordinary DML. Triggers enqueue work rows; the application's write latency is unaffected by embedding latency.
4. **Worker processes the queue**: Workers claim pending rows, perform the loading → parsing → chunking → formatting → embedding pipeline, and write results into the destination (typically a view over a storage table).
5. **Query**: The application runs pgvector similarity queries against the destination view, optionally composed with chat-completion functions for RAG or text-to-SQL.

Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md), [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)

## Known Architectural Considerations

- **Embedding provider reliability**: Workers batch and retry, but prolonged outages can grow the queue. The project ships a load-test harness (`scripts/vectorizer-load-test/`) that creates a ~1.5M-row `wiki` table for stress testing.
- **HTTP client lifecycle in embedders**: Community issue #919 reports that AsyncOpenAI, Ollama, and VoyageAI embedders can leak HTTP connections in `CLOSE_WAIT` over time, eventually exhausting file descriptors. Long-running deployments should monitor open FDs and reuse clients.
- **Version coupling between extension and library**: The extension (currently `extension-0.11.2`) and Python library (`pgai-v0.12.x`) are versioned and released independently; users should consult release notes when upgrading.
- **Outdated dependency pins**: Issue #915 reports stale upper bounds on `litellm`, `openai`, and similar libraries that conflict with newer ecosystems.

Source: [scripts/vectorizer-load-test/README.md](https://github.com/timescale/pgai/blob/main/scripts/vectorizer-load-test/README.md), [community context, issues #915 and #919](https://github.com/timescale/pgai/issues/915)

## See Also

- [Vectorizer Overview](/docs/vectorizer/overview.md)
- [Vectorizer API Reference](/docs/vectorizer/api-reference.md)
- [Vectorizer Worker](/docs/vectorizer/worker.md)
- [Semantic Catalog](/docs/semantic_catalog/README.md)
- [Handling API Keys](/projects/extension/docs/security/handling-api-keys.md)

---

<a id='page-2'></a>

## Vectorizer Pipeline & Worker

### Related Pages

Related topics: [Overview & System Architecture](#page-1), [Embedding Providers, Model Integration & Known Issues](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/timescale/pgai/blob/main/README.md)
- [projects/extension/README.md](https://github.com/timescale/pgai/blob/main/projects/extension/README.md)
- [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)
- [scripts/vectorizer-load-test/README.md](https://github.com/timescale/pgai/blob/main/scripts/vectorizer-load-test/README.md)
- [examples/evaluations/litellm_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/litellm_vectorizer/README.md)
- [examples/evaluations/voyage_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/voyage_vectorizer/README.md)
- [projects/pgai/pgai/vectorizer/vectorizer.py](https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/vectorizer.py)
- [projects/pgai/pgai/vectorizer/worker.py](https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/worker.py)
- [projects/pgai/pgai/vectorizer/create_vectorizer.py](https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/create_vectorizer.py)
- [projects/pgai/pgai/vectorizer/loading.py](https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/loading.py)
- [projects/pgai/pgai/vectorizer/parsing.py](https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/parsing.py)
</details>

# Vectorizer Pipeline & Worker

## Overview

The Vectorizer is the core component of pgai that automatically creates and maintains vector embeddings for data stored in PostgreSQL. It is designed to be resilient: data modifications performed by the application (INSERT, UPDATE, DELETE) are decoupled from the embedding process, ensuring that intermittent failures, rate limits, or latency spikes in LLM endpoints never affect primary data operations. Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

A vectorizer is a configurable pipeline defined through a single SQL call (`ai.create_vectorizer`) that runs against any PostgreSQL database, including Timescale Cloud, Amazon RDS, and Supabase. The vectorizer produces a destination view containing the original columns plus a vector `embedding` column and a `chunk` column holding the chunked text corresponding to each embedding. Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

The system consists of three logical parts: the application that defines a vectorizer configuration, the PostgreSQL database that stores configuration and queue state in the `ai` schema, and one or more stateless vectorizer workers that perform the actual embedding work. Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

## Pipeline Architecture

A vectorizer defines a five-stage pipeline that is applied in sequence to source data. Each stage is independently configurable. Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

```mermaid
flowchart LR
    A[Source Table<br/>or S3 URI] --> B[Loading]
    B --> C[Parsing]
    C --> D[Chunking]
    D --> E[Formatting]
    E --> F[Embedding]
    F --> G[Destination View<br/>with embeddings]
```

The pipeline stages are described in the table below.

| Stage | Responsibility | Source |
|-------|---------------|--------|
| Loading | Pulls data either from a column of the source table or from a URI (S3, file, etc.) referenced by a column | [README.md](https://github.com/timescale/pgai/blob/main/README.md) |
| Parsing | Converts non-text documents (PDF, HTML, markdown) into text when the loader returned a URI | [README.md](https://github.com/timescale/pgai/blob/main/README.md) |
| Chunking | Splits the text into smaller pieces sized appropriately for the embedding model | [README.md](https://github.com/timescale/pgai/blob/main/README.md) |
| Formatting | Shapes each chunk before it is sent for embedding (e.g., prepend the document title) | [README.md](https://github.com/timescale/pgai/blob/main/README.md) |
| Embedding | Calls the configured LLM provider (Ollama, OpenAI, Voyage AI, Cohere, HuggingFace, Mistral, Azure OpenAI, AWS Bedrock, Vertex AI) to produce a vector | [README.md](https://github.com/timescale/pgai/blob/main/README.md) |

The pipeline is created with the SQL function `ai.create_vectorizer`, which accepts the source relation and stage-specific configuration values such as `ai.loading_column(column_name=>'text')`, `ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>'1536')`, and `ai.destination_table(view_name=>'wiki_embedding')`. Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

The Python package ships convenience builders under `projects/pgai/pgai/vectorizer/` (for example `create_vectorizer.py`) so that the same configuration can be expressed as Python dataclasses and translated to SQL with a `.to_sql()` call. Source: [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)

## Worker Operation and Deployment

The vectorizer worker is a stateless Python process that polls the database for pending items and executes the pipeline against them. A worker is instantiated with a database URL and started with `worker.run()`. Setting `once=True` causes the worker to drain the queue and exit, which is useful for batch jobs, cron schedules, or smoke tests. Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)

A typical lifecycle inside an application looks like:

1. Install the database objects with `pgai.install(DB_URL)`, which creates the `ai` schema and its catalog tables. Source: [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)
2. Define the vectorizer (e.g., `ai.create_vectorizer(...)` or `CreateVectorizer(...).to_sql()`). Source: [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)
3. Run the worker either as a sidecar process, a CLI, or in a Docker container. Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)
4. Query the destination view with the pgvector cosine distance operator (`<=>`) to perform semantic search. Source: [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)

Because the worker is stateless, you can horizontally scale by running multiple workers. The `ai.vectorizer_status` view reports the live state of each vectorizer, and `ai.vectorizer_errors` surfaces failures that need operator attention. Source: [examples/evaluations/litellm_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/litellm_vectorizer/README.md)

## Common Failure Modes and Operational Notes

Several recurring community-reported issues map directly to the vectorizer pipeline and worker.

- **HTTP client connection leaks in embedders**: The `AsyncOpenAI`, `Ollama`, and `VoyageAI` embedders historically created HTTP clients that were never closed, exhausting file descriptors under sustained load. Long-running workers should be restarted on a schedule or the leak fixed at the embedder level. Source: community issue #919 in context.
- **Outdated dependency pins**: Strict version constraints on `litellm` and `openai` in the vectorizer worker prevent adoption alongside newer AI/ML libraries. Source: community issue #915 in context.
- **llama.cpp / OpenAI-compatible endpoints**: Workers configured against OpenAI-compatible servers (e.g., llama.cpp) may fail at the `tiktoken` tokenization step because no matching tokenizer exists for the model; users either patch the tokenizer or run an OpenAI-compatible proxy. Source: community issue #850 in context.
- **Ollama version compatibility**: Changes in the Ollama HTTP API above version 13.3 broke `ollama_embed` in pgai extension 0.11.2 / library 0.12.1. Pinning Ollama to <= 13.3 or upgrading the extension is required. Source: community issue #921 in context.
- **Worker import path**: `from pgai import Worker` fails when installing with `pip install "pgai[vectorizer-worker]"` because the extra does not yet expose the `Worker` symbol at the package root in some releases. Source: community issue #925 in context.
- **Empty-string batching**: Documents with empty chunks historically caused pipeline failures; the worker now skips empty strings when batching. Source: release notes pgai v0.11.4 in context.
- **Quantization and storage cost**: For high-dimensional embeddings (e.g., 8,192-dim) the storage cost of `float` vectors is substantial; RaBitQ-style quantization is an open feature request tracked in community issue #920.

The repository also ships a load-test harness under `scripts/vectorizer-load-test/` that materialises approximately 1.5M rows into a `wiki` table to exercise the worker at scale. Source: [scripts/vectorizer-load-test/README.md](https://github.com/timescale/pgai/blob/main/scripts/vectorizer-load-test/README.md)

## See Also

- [Vectorizer Overview and API Reference](https://github.com/timescale/pgai/blob/main/docs/vectorizer/overview.md)
- [Embedding Model Configuration](https://github.com/timescale/pgai/blob/main/docs/vectorizer/api-reference.md#embedding-configuration)
- [Worker Deployment Guide](https://github.com/timescale/pgai/blob/main/docs/vectorizer/worker.md)
- [Python Integration Helpers](https://github.com/timescale/pgai/blob/main/docs/vectorizer/python-integration.md)
- [Simple FastAPI Example](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)
- [LiteLLM Embedding Evaluation](https://github.com/timescale/pgai/blob/main/examples/evaluations/litellm_vectorizer/README.md)
- [Voyage AI Vectorizer Quickstart](https://github.com/timescale/pgai/blob/main/docs/vectorizer/quick-start-voyage.md)

---

<a id='page-3'></a>

## Semantic Catalog (Text-to-SQL)

### Related Pages

Related topics: [Overview & System Architecture](#page-1), [Embedding Providers, Model Integration & Known Issues](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [docs/semantic_catalog/README.md](https://github.com/timescale/pgai/blob/main/docs/semantic_catalog/README.md)
- [docs/semantic_catalog/cli.md](https://github.com/timescale/pgai/blob/main/docs/semantic_catalog/cli.md)
- [docs/semantic_catalog/quickstart-your-data.md](https://github.com/timescale/pgai/blob/main/docs/semantic_catalog/quickstart-your-data.md)
- [docs/semantic_catalog/quickstart-demo-data.md](https://github.com/timescale/pgai/blob/main/docs/semantic_catalog/quickstart-demo-data.md)
- [docs/semantic_catalog/more-questions.md](https://github.com/timescale/pgai/blob/main/docs/semantic_catalog/more-questions.md)
- [projects/pgai/pgai/semantic_catalog/semantic_catalog.py](https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/semantic_catalog/semantic_catalog.py)
- [examples/text_to_sql/README.md](https://github.com/timescale/pgai/blob/main/examples/text_to_sql/README.md)
- [projects/extension/README.md](https://github.com/timescale/pgai/blob/main/projects/extension/README.md)
- [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)
- [docs/vectorizer/overview.md](https://github.com/timescale/pgai/blob/main/docs/vectorizer/overview.md)
</details>

# Semantic Catalog (Text-to-SQL)

## Overview and Purpose

The Semantic Catalog is a pgai subsystem that lets users pose a natural-language question in SQL and receive an executable SQL statement generated by a large language model (LLM). It augments PostgreSQL's built-in `pg_catalog` with a machine-readable layer of *descriptions* and *example queries* about schema objects, then uses semantic search to inject the most relevant descriptions into the LLM prompt. The motivation and original design are recorded in the long-standing feature request [Issue #24 "Text-to-SQL"](https://github.com/timescale/pgai/issues/24), which asked whether embedding the `pg_catalog` could power text-to-SQL from inside the database.

The user-facing entry point is the `ai.text_to_sql(...)` SQL function. A working example from the demo database illustrates the intended behavior:

```sql
select ai.text_to_sql('How many flights arrived in Houston, TX in June 2024?');
```

returns a candidate query such as `SELECT COUNT(*) ... FROM postgres_air.flight WHERE arrival_airport = 'IAH' ...` Source: [examples/text_to_sql/README.md:1-25](). The catalog itself is provisioned through `ai.create_semantic_catalog(...)`, which configures both the embedding model and the LLM that will perform SQL generation. The two providers do not have to be the same; the example uses OpenAI for both via `ai.embedding_openai('text-embedding-3-small', 1024)` and `ai.text_to_sql_openai(model=>'o3-mini')` Source: [examples/text_to_sql/README.md:30-50]().

## Architecture and Data Flow

Conceptually the catalog stores three classes of object: schema object descriptions (tables, views, columns, functions), example question/SQL pairs, and short object-level summaries. pgai then attaches a Vectorizer to this catalog so that the embeddings of every description stay in sync as the schema changes, exactly the same way a regular vectorizer keeps embeddings in sync with application tables. This is documented as "pgai uses Vectorizers to create and maintain embeddings for each of the descriptions stored in the semantic catalog" Source: [examples/text_to_sql/README.md:50-65]().

At query time, `ai.text_to_sql` performs a semantic search against the catalog to identify the schema objects and example statements most relevant to the user question, then asks the configured LLM to author a SQL statement that uses those objects. The relevant results are provided to the LLM as context, so the LLM does not have to be retrained on the user's schema Source: [examples/text_to_sql/README.md:55-75]().

```mermaid
flowchart LR
    A[Natural language question] --> B[ai.text_to_sql]
    B --> C[Semantic search over<br/>Semantic Catalog]
    C --> D[Top-k object descriptions<br/>+ example queries]
    D --> E[Prompt assembly]
    E --> F[LLM provider<br/>e.g. text_to_sql_openai]
    F --> G[Candidate SQL]
    G --> H[Optional refinement loop<br/>bounded by max iterations]
    H -->|still failing| C
    H -->|ok| I[Return SQL to caller]
```

The refinement step is internal to `ai.text_to_sql`; the LLM may be re-prompted with diagnostic context if the generated SQL fails validation. Community evidence indicates that this loop is bounded, and an open bug ([Issue #926](https://github.com/timescale/pgai/issues/926)) reports the function becoming "stuck in a refinement loop" until it "exhausts the number of allowed iterations and then throws an exception." This is the most prominent failure mode users encounter today.

## Setup and Configuration

The semantic catalog is gated behind a feature flag and must be enabled before the extension is created. The demo bootstrap sequence is:

```sql
drop extension if exists ai cascade;
drop schema if exists ai cascade;
select set_config('ai.enable_feature_flag_text_to_sql', 'true', false);
create extension ai cascade;
```

Source: [examples/text_to_sql/README.md:10-20](). The catalog is then created in a single call:

```sql
select ai.create_semantic_catalog
( embedding  => ai.embedding_openai('text-embedding-3-small', 1024)
, text_to_sql => ai.text_to_sql_openai(model=>'o3-mini')
);
```

Source: [examples/text_to_sql/README.md:30-45](). The `embedding` argument accepts any vectorizer-style embedding config, so the same provider-mixing described in the vectorizer overview applies: "any provider supported by Vectorizer" can be used for embeddings, and a different provider can be used for chat completions Source: [examples/text_to_sql/README.md:35-50]().

After creation, the catalog must be populated. The `Generate Descriptions` step in the demo walks the configured database, producing natural-language descriptions of tables, views, columns, and functions, and stores them in catalog tables. The Vectorizer then asynchronously embeds those rows. API keys are required at worker run time and follow the standard pgai conventions documented in `projects/extension/docs/security/handling-api-keys.md` Source: [examples/text_to_sql/README.md:55-65]().

The catalog is decoupled from the data pipeline, mirroring the broader pgai architecture described in the project README: "data modifications made by the application are decoupled from the embedding process, ensuring that failures in the embedding service do not affect the core data operations" Source: [projects/extension/README.md:30-50](). The Python integration uses the same `CreateVectorizer` builder pattern shown in the FastAPI example, where vectorizer definitions are produced as SQL statements and executed against the database Source: [examples/simple_fastapi_app/README.md:30-50]().

## Known Issues and Limitations

| Symptom | Reference | Notes |
| --- | --- | --- |
| `ai.text_to_sql` exhausts the refinement iteration budget and throws | [Issue #926](https://github.com/timescale/pgai/issues/926) | The function has a bounded internal loop; pathological prompts can hit the cap. |
| Feature is off by default | Source: [examples/text_to_sql/README.md:10-20]() | Requires the `ai.enable_feature_flag_text_to_sql` GUC and a fresh `create extension`. |
| Provider coverage | Source: [examples/text_to_sql/README.md:35-50]() | Only providers with a corresponding `ai.text_to_sql_<provider>` config can author SQL; embedding and authoring providers are chosen independently. |

Practitioners should treat the generated SQL as a candidate, not a final answer, especially in environments with unusual schemas where the refinement loop is more likely to iterate to its limit.

## See Also

- [Vectorizer Overview](/docs/vectorizer/overview.md)
- [Text-to-SQL demo README](/examples/text_to_sql/README.md)
- [pgai extension README](/projects/extension/README.md)
- [Issue #24 — original Text-to-SQL request](https://github.com/timescale/pgai/issues/24)
- [Issue #926 — refinement loop bug](https://github.com/timescale/pgai/issues/926)

---

<a id='page-4'></a>

## Embedding Providers, Model Integration & Known Issues

### Related Pages

Related topics: [Overview & System Architecture](#page-1), [Vectorizer Pipeline & Worker](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/timescale/pgai/blob/main/README.md)
- [projects/extension/README.md](https://github.com/timescale/pgai/blob/main/projects/extension/README.md)
- [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)
- [examples/embeddings_from_documents/README.md](https://github.com/timescale/pgai/blob/main/examples/embeddings_from_documents/README.md)
- [examples/evaluations/litellm_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/litellm_vectorizer/README.md)
- [examples/evaluations/voyage_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/voyage_vectorizer/README.md)
- [examples/evaluations/ollama_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/ollama_vectorizer/README.md)
- [scripts/vectorizer-load-test/README.md](https://github.com/timescale/pgai/blob/main/scripts/vectorizer-load-test/README.md)
- [Releases: extension-0.11.2](https://github.com/timescale/pgai/releases/tag/extension-0.11.2)
- [Releases: pgai-v0.12.1](https://github.com/timescale/pgai/releases/tag/pgai-v0.12.1)
- [Releases: pgai-v0.12.0](https://github.com/timescale/pgai/releases/tag/pgai-v0.12.0)
- [Releases: pgai-v0.11.4](https://github.com/timescale/pgai/releases/tag/pgai-v0.11.4)
- [Issue #919 — HTTP client connection leak in embedders](https://github.com/timescale/pgai/issues/919)
- [Issue #921 — ollama_embed error on ollama higher 13.3](https://github.com/timescale/pgai/issues/921)
- [Issue #925 — cannot import name 'Worker' from 'pgai'](https://github.com/timescale/pgai/issues/925)
- [Issue #915 — Outdated dependency versions](https://github.com/timescale/pgai/issues/915)
- [Issue #866 — Use ReRank models through Ollama](https://github.com/timescale/pgai/issues/866)
- [Issue #850 — llama.cpp embedding worker integration](https://github.com/timescale/pgai/issues/850)
- [Issue #920 — RaBitQ feature request](https://github.com/timescale/pgai/issues/920)
- [Issue #795 — OCR for PDF and DOCX](https://github.com/timescale/pgai/issues/795)
- [Issue #926 — semantic catalog text to sql stuck in refinement](https://github.com/timescale/pgai/issues/926)
</details>

# Embedding Providers, Model Integration & Known Issues

## Overview

pgai is designed as a thin orchestration layer between PostgreSQL and external AI services. Its core abstraction is the **vectorizer**, which reads a user-defined configuration, fetches data from a source (table or URI), chunks it, calls an embedding provider, and persists results in a destination view [Source: [README.md](https://github.com/timescale/pgai/blob/main/README.md)].

Embedding providers are pluggable. The `embedding` parameter of `ai.create_vectorizer` selects which provider is used at runtime. This page documents the supported providers, how integration is performed in SQL and Python, and the most prominent open issues observed in the community.

## Supported Embedding Providers

The `README.md` enumerates the providers that ship with the extension. Each is exposed as a SQL function and as a corresponding Python config helper:

| Provider | SQL entry point | Notes |
|---|---|---|
| Ollama | `ai.embedding_ollama` | Local OSS models via `ollama serve` |
| OpenAI | `ai.embedding_openai` | `text-embedding-ada-002`, `text-embedding-3-small`, etc. |
| Voyage AI | `ai.embedding_voyageai` | Domain-specific models (e.g., `voyage-finance-2`) |
| Cohere | routed through `ai.embedding_litellm` | `embed-english-v3.0` |
| HuggingFace | routed through `ai.embedding_litellm` | Serverless inference API |
| Mistral | routed through `ai.embedding_litellm` | `mistral-embed` |
| Azure OpenAI | routed through `ai.embedding_litellm` | Azure-deployed OpenAI |
| AWS Bedrock | routed through `ai.embedding_litellm` | Amazon-hosted models |
| Vertex AI | routed through `ai.embedding_litellm` | GCP-hosted models |

Source: [README.md — "Supported embedding models"](https://github.com/timescale/pgai/blob/main/README.md).

In Python, the matching config objects (`EmbeddingOllamaConfig`, `EmbeddingOpenAIConfig`, etc.) build the SQL `ai.create_vectorizer` statement. The simple FastAPI example shows Ollama and OpenAI being instantiated this way [Source: [examples/simple_fastapi_app/README.md](https://github.com/timescale/pgai/blob/main/examples/simple_fastapi_app/README.md)].

### Provider Architecture

```mermaid
flowchart LR
    A[Application INSERT/UPDATE] --> B[(PostgreSQL source table)]
    B --> C[ai.create_vectorizer config]
    C --> D[ai._vectorizer_queue]
    D --> E[Vectorizer Worker]
    E --> F{Embedding Provider}
    F --> G[Ollama]
    F --> H[OpenAI]
    F --> I[Voyage AI]
    F --> J[LiteLLM → Cohere/Bedrock/Vertex/Azure]
    G --> K[(Destination view)]
    H --> K
    I --> K
    J --> K
    K --> L[App reads via pgvector cosine distance]
```

The worker is stateless and decoupled from application writes, so an outage at the provider does not block DML [Source: [README.md — "Basic Architecture"](https://github.com/timescale/pgai/blob/main/README.md)].

## Evaluation Recipes

Three evaluation examples demonstrate provider selection with identical scaffolding:

- **LiteLLM** uses Cohere `embed-english-v3.0`, Mistral, and OpenAI `text-embedding-3-small` against Paul Graham essays [Source: [examples/evaluations/litellm_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/litellm_vectorizer/README.md)].
- **Voyage** compares OpenAI `text-embedding-3-small` (768 dim) against Voyage `voyage-finance-2` (1024 dim) on SEC filings [Source: [examples/evaluations/voyage_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/voyage_vectorizer/README.md)].
- **Ollama** compares Nomic `nomic-embed-text`, BGE Large, and OpenAI Large models [Source: [examples/evaluations/ollama_vectorizer/README.md](https://github.com/timescale/pgai/blob/main/examples/evaluations/ollama_vectorizer/README.md)].

All three follow the same workflow: load a HuggingFace dataset via `ai.load_dataset`, create one vectorizer per provider, then run identical question-generation/retrieval pipelines. This makes A/B model evaluation a matter of swapping one configuration parameter.

The document-loading path (`ai.loading_uri`) extends embedding beyond text columns to PDFs, DOCX, MD, HTML, and XLSX, demonstrated in [examples/embeddings_from_documents/README.md](https://github.com/timescale/pgai/blob/main/examples/embeddings_from_documents/README.md).

## Known Issues & Community Pain Points

### Connection Leaks in Async Embedders

`AsyncOpenAI`, `Ollama`, and `VoyageAI` embedders create HTTP clients that are never explicitly closed. Under sustained load, sockets accumulate in `CLOSE_WAIT` until the OS raises `Too many open files`. The recommended workaround is to manage HTTP client lifecycle externally rather than relying on garbage collection [Source: [Issue #919](https://github.com/timescale/pgai/issues/919)].

### Ollama Version Compatibility

`ai.ollama_embed` breaks on Ollama server versions above 13.3, affecting extension 0.11.2 and library 0.12.1 on PostgreSQL 17 / Ubuntu 24.04. The breakage appears to be caused by changes in the Ollama API response shape [Source: [Issue #921](https://github.com/timescale/pgai/issues/921)].

### Worker Import Regression

After `pip install "pgai[vectorizer-worker]"`, users receive `cannot import name 'Worker' from 'pgai'`. The expected import path changed and the docs have not yet caught up [Source: [Issue #925](https://github.com/timescale/pgai/issues/925)].

### Outdated Dependency Pins

`litellm` is pinned to `>=1.65.0,<1.73.0` (current upstream: 1.81.5), and `openai` to `>=1.44,<2.0` (current upstream: 2.16.0). Projects on newer transitive dependencies collide and cannot install pgai side-by-side [Source: [Issue #915](https://github.com/timescale/pgai/issues/915)].

### Semantic Catalog Refinement Loop

When the Semantic Catalog text-to-SQL flow is enabled, generation can iterate until the configured iteration cap is exhausted, then raise an exception. This is tracked as a refinement-loop bug rather than a provider bug [Source: [Issue #926](https://github.com/timescale/pgai/issues/926)].

## Open Feature Requests

Several provider-related enhancements remain open:

- **ReRank through Ollama** — only embedding is currently exposed; `qwen-reranker`-style models are unsupported [Source: [Issue #866](https://github.com/timescale/pgai/issues/866)].
- **llama.cpp embedding worker** — needs an OpenAI-compatible base URL override; the existing `embedding_openai` path does not work with llama.cpp's tokenizer [Source: [Issue #850](https://github.com/timescale/pgai/issues/850)].
- **RaBitQ quantization** — large-dimension embeddings (e.g., 8192) need compact storage formats beyond the current Q1–Q16 options [Source: [Issue #920](https://github.com/timescale/pgai/issues/920)].
- **OCR for PDF/DOCX** — `PARSING_AUTO` should expose an OCR toggle via docling for scanned documents [Source: [Issue #795](https://github.com/timescale/pgai/issues/795)].

## See Also

- [Vectorizer Overview](/docs/vectorizer/overview.md)
- [Vectorizer API Reference](/docs/vectorizer/api-reference.md)
- [Vectorizer Worker Guide](/docs/vectorizer/worker.md)
- [Semantic Catalog](/docs/semantic_catalog/README.md)

---

<!-- evidence_pipeline_checked: true -->
<!-- evidence_injected: true -->

---

## Pitfall Log

Project: timescale/pgai

Summary: Found 13 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/926

## 2. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/915

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/921

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/925

## 5. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/866

## 6. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/timescale/pgai

## 7. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/919

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/timescale/pgai

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/timescale/pgai

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/timescale/pgai

## 11. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/850

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown。
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/timescale/pgai

## 13. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown。
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/timescale/pgai

<!-- canonical_name: timescale/pgai; human_manual_source: deepwiki_human_wiki -->