Doramagic Project Pack · Human Manual
pgai
A suite of tools to develop RAG, semantic search, and other AI applications more easily with PostgreSQL
Overview & System Architecture
Related topics: Vectorizer Pipeline & Worker, Semantic Catalog (Text-to-SQL), Embedding Providers, Model Integration & Known Issues
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Vectorizer Pipeline & Worker, Semantic Catalog (Text-to-SQL), Embedding Providers, Model Integration & Known Issues
Overview & System Architecture
Purpose and Scope
pgai is a Python library and PostgreSQL extension that turns PostgreSQL into the retrieval engine behind production-ready Retrieval-Augmented Generation (RAG) and Agentic applications. It does this by automatically creating and maintaining vector embeddings for data stored in PostgreSQL using a configurable component called a vectorizer.
Source: README.md
The project is organized into three primary deliverables:
| Deliverable | Description |
|---|---|
pgai Python library | Installs the PostgreSQL extension objects and provides a Python SDK (e.g., Worker, CreateVectorizer) |
ai PostgreSQL extension | Provides SQL-callable functions such as ai.create_vectorizer, ai.embedding_openai, ai.ollama_generate, ai.text_to_sql, and ai.create_semantic_catalog |
| Vectorizer worker | A stateless process that reads vectorizer configuration, processes the work queue, calls embedding providers, and writes results back into the database |
Source: README.md, examples/simple_fastapi_app/README.md
pgai works with any PostgreSQL database, including Timescale Cloud, Amazon RDS (a recurring community request tracked in issue #304), Supabase, and self-hosted installations.
High-Level Architecture
The system is designed around three roles: the application, the PostgreSQL database, and one or more stateless vectorizer workers. The application defines a vectorizer configuration that specifies how to embed data from sources like PostgreSQL columns or remote URIs (e.g., S3). Workers poll the database, consume work items, and write resulting embeddings back. The application then queries these embeddings to power semantic search, RAG, and text-to-SQL.
Source: README.md
flowchart LR
A[Application<br/>INSERT / UPDATE / DELETE] --> B[(PostgreSQL<br/>ai schema)]
B <--> C[Vectorizer Worker<br/>stateless, scalable]
C --> D[Embedding Provider<br/>OpenAI / Ollama / Voyage / Cohere / ...]
D --> C
C --> B
A --> E[Semantic Search / RAG<br/>using pgvector and pgvectorscale]
E --> BThe decoupling of write operations from embedding generation is the central resilience property of the architecture: if an embedding provider is slow or failing, application writes to the source table still succeed, and the worker simply retries queued work later.
Source: README.md
Core Components
The Vectorizer Pipeline
A vectorizer is a configurable, sequential pipeline. Each stage is composed independently and is mapped 1:1 with an ai.* SQL helper used in the ai.create_vectorizer call:
- Loading — defines the data source: either a column of the source table (
ai.loading_column) or a URI pointing to a file or S3 object (ai.loading_uri). - Parsing — for non-text documents (PDF, HTML, Markdown, XLSX), defines how to extract text.
- Chunking — defines how text is split into chunks.
- Formatting — defines how each chunk is decorated before embedding (e.g., prepending the document title).
- Embedding — specifies the LLM provider, model, and parameters (e.g.,
ai.embedding_openai,ai.embedding_ollama,ai.embedding_voyageai, orai.embedding_litellmfor Cohere, HuggingFace, Mistral, Azure OpenAI, AWS Bedrock, and Vertex AI).
Source: README.md, examples/embeddings_from_documents/README.md
The Vectorizer Worker
The worker is a Python process that polls the database for queued work created by triggers or change events on vectorized tables. It can be invoked as:
- A one-shot process:
worker = Worker(DB_URL, once=True); worker.run()(suitable for batch backfills). - A continuously-running background process (CLI, Docker container, or in-process FastAPI lifespan).
- A horizontally-scaled set of replicas, since workers are stateless.
Source: README.md, examples/simple_fastapi_app/README.md
A known community issue (#925) reports that from pgai import Worker fails because the Worker class lives behind the vectorizer-worker extra; users must install pip install "pgai[vectorizer-worker]" correctly and import from the appropriate submodule.
Source: community context, issue #925
Model Calling from SQL
Beyond the vectorizer pipeline, pgai exposes SQL-callable LLM functions so users can do classification, summarization, or data enrichment directly in queries. Examples include ai.ollama_generate, ai.ollama_chat_complete, and provider-specific embedders. The RAG helper function in the project README composes similarity search with ai.ollama_embed against a *_embedding view and then calls ai.ollama_chat_complete to produce an answer.
Source: projects/extension/README.md
Semantic Catalog and Text-to-SQL
The semantic catalog lets users attach natural-language descriptions and example SQL statements to database objects. ai.create_semantic_catalog provisions vectorizers for the descriptions and configures the chat-completion provider used by ai.text_to_sql. The function performs a semantic search of the catalog, selects relevant objects and examples, and feeds them to an LLM to author SQL. A known bug (#926) reports that text-to-SQL can get stuck in refinement iterations until it exhausts the allowed budget.
Source: examples/text_to_sql/README.md, community context, issue #926
Typical Data Flow
- Install:
pgai.install(DB_URL)creates theaischema and extension objects. - Create a vectorizer: The application runs
ai.create_vectorizer(...)(orCreateVectorizer(...).to_sql()from Python), which writes a configuration row and installs triggers on the source table. - Insert / update source data: The application performs ordinary DML. Triggers enqueue work rows; the application's write latency is unaffected by embedding latency.
- Worker processes the queue: Workers claim pending rows, perform the loading → parsing → chunking → formatting → embedding pipeline, and write results into the destination (typically a view over a storage table).
- Query: The application runs pgvector similarity queries against the destination view, optionally composed with chat-completion functions for RAG or text-to-SQL.
Source: README.md, examples/simple_fastapi_app/README.md
Known Architectural Considerations
- Embedding provider reliability: Workers batch and retry, but prolonged outages can grow the queue. The project ships a load-test harness (
scripts/vectorizer-load-test/) that creates a ~1.5M-rowwikitable for stress testing. - HTTP client lifecycle in embedders: Community issue #919 reports that AsyncOpenAI, Ollama, and VoyageAI embedders can leak HTTP connections in
CLOSE_WAITover time, eventually exhausting file descriptors. Long-running deployments should monitor open FDs and reuse clients. - Version coupling between extension and library: The extension (currently
extension-0.11.2) and Python library (pgai-v0.12.x) are versioned and released independently; users should consult release notes when upgrading. - Outdated dependency pins: Issue #915 reports stale upper bounds on
litellm,openai, and similar libraries that conflict with newer ecosystems.
Source: scripts/vectorizer-load-test/README.md, community context, issues #915 and #919
See Also
Source: https://github.com/timescale/pgai / Human Manual
Vectorizer Pipeline & Worker
Related topics: Overview & System Architecture, Embedding Providers, Model Integration & Known Issues
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Overview & System Architecture, Embedding Providers, Model Integration & Known Issues
Vectorizer Pipeline & Worker
Overview
The Vectorizer is the core component of pgai that automatically creates and maintains vector embeddings for data stored in PostgreSQL. It is designed to be resilient: data modifications performed by the application (INSERT, UPDATE, DELETE) are decoupled from the embedding process, ensuring that intermittent failures, rate limits, or latency spikes in LLM endpoints never affect primary data operations. Source: README.md
A vectorizer is a configurable pipeline defined through a single SQL call (ai.create_vectorizer) that runs against any PostgreSQL database, including Timescale Cloud, Amazon RDS, and Supabase. The vectorizer produces a destination view containing the original columns plus a vector embedding column and a chunk column holding the chunked text corresponding to each embedding. Source: README.md
The system consists of three logical parts: the application that defines a vectorizer configuration, the PostgreSQL database that stores configuration and queue state in the ai schema, and one or more stateless vectorizer workers that perform the actual embedding work. Source: README.md
Pipeline Architecture
A vectorizer defines a five-stage pipeline that is applied in sequence to source data. Each stage is independently configurable. Source: README.md
flowchart LR
A[Source Table<br/>or S3 URI] --> B[Loading]
B --> C[Parsing]
C --> D[Chunking]
D --> E[Formatting]
E --> F[Embedding]
F --> G[Destination View<br/>with embeddings]The pipeline stages are described in the table below.
| Stage | Responsibility | Source |
|---|---|---|
| Loading | Pulls data either from a column of the source table or from a URI (S3, file, etc.) referenced by a column | README.md |
| Parsing | Converts non-text documents (PDF, HTML, markdown) into text when the loader returned a URI | README.md |
| Chunking | Splits the text into smaller pieces sized appropriately for the embedding model | README.md |
| Formatting | Shapes each chunk before it is sent for embedding (e.g., prepend the document title) | README.md |
| Embedding | Calls the configured LLM provider (Ollama, OpenAI, Voyage AI, Cohere, HuggingFace, Mistral, Azure OpenAI, AWS Bedrock, Vertex AI) to produce a vector | README.md |
The pipeline is created with the SQL function ai.create_vectorizer, which accepts the source relation and stage-specific configuration values such as ai.loading_column(column_name=>'text'), ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>'1536'), and ai.destination_table(view_name=>'wiki_embedding'). Source: README.md
The Python package ships convenience builders under projects/pgai/pgai/vectorizer/ (for example create_vectorizer.py) so that the same configuration can be expressed as Python dataclasses and translated to SQL with a .to_sql() call. Source: examples/simple_fastapi_app/README.md
Worker Operation and Deployment
The vectorizer worker is a stateless Python process that polls the database for pending items and executes the pipeline against them. A worker is instantiated with a database URL and started with worker.run(). Setting once=True causes the worker to drain the queue and exit, which is useful for batch jobs, cron schedules, or smoke tests. Source: README.md
A typical lifecycle inside an application looks like:
- Install the database objects with
pgai.install(DB_URL), which creates theaischema and its catalog tables. Source: examples/simple_fastapi_app/README.md - Define the vectorizer (e.g.,
ai.create_vectorizer(...)orCreateVectorizer(...).to_sql()). Source: examples/simple_fastapi_app/README.md - Run the worker either as a sidecar process, a CLI, or in a Docker container. Source: README.md
- Query the destination view with the pgvector cosine distance operator (
<=>) to perform semantic search. Source: examples/simple_fastapi_app/README.md
Because the worker is stateless, you can horizontally scale by running multiple workers. The ai.vectorizer_status view reports the live state of each vectorizer, and ai.vectorizer_errors surfaces failures that need operator attention. Source: examples/evaluations/litellm_vectorizer/README.md
Common Failure Modes and Operational Notes
Several recurring community-reported issues map directly to the vectorizer pipeline and worker.
- HTTP client connection leaks in embedders: The
AsyncOpenAI,Ollama, andVoyageAIembedders historically created HTTP clients that were never closed, exhausting file descriptors under sustained load. Long-running workers should be restarted on a schedule or the leak fixed at the embedder level. Source: community issue #919 in context. - Outdated dependency pins: Strict version constraints on
litellmandopenaiin the vectorizer worker prevent adoption alongside newer AI/ML libraries. Source: community issue #915 in context. - llama.cpp / OpenAI-compatible endpoints: Workers configured against OpenAI-compatible servers (e.g., llama.cpp) may fail at the
tiktokentokenization step because no matching tokenizer exists for the model; users either patch the tokenizer or run an OpenAI-compatible proxy. Source: community issue #850 in context. - Ollama version compatibility: Changes in the Ollama HTTP API above version 13.3 broke
ollama_embedin pgai extension 0.11.2 / library 0.12.1. Pinning Ollama to <= 13.3 or upgrading the extension is required. Source: community issue #921 in context. - Worker import path:
from pgai import Workerfails when installing withpip install "pgai[vectorizer-worker]"because the extra does not yet expose theWorkersymbol at the package root in some releases. Source: community issue #925 in context. - Empty-string batching: Documents with empty chunks historically caused pipeline failures; the worker now skips empty strings when batching. Source: release notes pgai v0.11.4 in context.
- Quantization and storage cost: For high-dimensional embeddings (e.g., 8,192-dim) the storage cost of
floatvectors is substantial; RaBitQ-style quantization is an open feature request tracked in community issue #920.
The repository also ships a load-test harness under scripts/vectorizer-load-test/ that materialises approximately 1.5M rows into a wiki table to exercise the worker at scale. Source: scripts/vectorizer-load-test/README.md
See Also
Source: https://github.com/timescale/pgai / Human Manual
Semantic Catalog (Text-to-SQL)
Related topics: Overview & System Architecture, Embedding Providers, Model Integration & Known Issues
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Overview & System Architecture, Embedding Providers, Model Integration & Known Issues
Semantic Catalog (Text-to-SQL)
Overview and Purpose
The Semantic Catalog is a pgai subsystem that lets users pose a natural-language question in SQL and receive an executable SQL statement generated by a large language model (LLM). It augments PostgreSQL's built-in pg_catalog with a machine-readable layer of *descriptions* and *example queries* about schema objects, then uses semantic search to inject the most relevant descriptions into the LLM prompt. The motivation and original design are recorded in the long-standing feature request Issue #24 "Text-to-SQL", which asked whether embedding the pg_catalog could power text-to-SQL from inside the database.
The user-facing entry point is the ai.text_to_sql(...) SQL function. A working example from the demo database illustrates the intended behavior:
select ai.text_to_sql('How many flights arrived in Houston, TX in June 2024?');
returns a candidate query such as SELECT COUNT(*) ... FROM postgres_air.flight WHERE arrival_airport = 'IAH' ... Source: examples/text_to_sql/README.md:1-25. The catalog itself is provisioned through ai.create_semantic_catalog(...), which configures both the embedding model and the LLM that will perform SQL generation. The two providers do not have to be the same; the example uses OpenAI for both via ai.embedding_openai('text-embedding-3-small', 1024) and ai.text_to_sql_openai(model=>'o3-mini') Source: examples/text_to_sql/README.md:30-50.
Architecture and Data Flow
Conceptually the catalog stores three classes of object: schema object descriptions (tables, views, columns, functions), example question/SQL pairs, and short object-level summaries. pgai then attaches a Vectorizer to this catalog so that the embeddings of every description stay in sync as the schema changes, exactly the same way a regular vectorizer keeps embeddings in sync with application tables. This is documented as "pgai uses Vectorizers to create and maintain embeddings for each of the descriptions stored in the semantic catalog" Source: examples/text_to_sql/README.md:50-65.
At query time, ai.text_to_sql performs a semantic search against the catalog to identify the schema objects and example statements most relevant to the user question, then asks the configured LLM to author a SQL statement that uses those objects. The relevant results are provided to the LLM as context, so the LLM does not have to be retrained on the user's schema Source: examples/text_to_sql/README.md:55-75.
flowchart LR
A[Natural language question] --> B[ai.text_to_sql]
B --> C[Semantic search over<br/>Semantic Catalog]
C --> D[Top-k object descriptions<br/>+ example queries]
D --> E[Prompt assembly]
E --> F[LLM provider<br/>e.g. text_to_sql_openai]
F --> G[Candidate SQL]
G --> H[Optional refinement loop<br/>bounded by max iterations]
H -->|still failing| C
H -->|ok| I[Return SQL to caller]The refinement step is internal to ai.text_to_sql; the LLM may be re-prompted with diagnostic context if the generated SQL fails validation. Community evidence indicates that this loop is bounded, and an open bug (Issue #926) reports the function becoming "stuck in a refinement loop" until it "exhausts the number of allowed iterations and then throws an exception." This is the most prominent failure mode users encounter today.
Setup and Configuration
The semantic catalog is gated behind a feature flag and must be enabled before the extension is created. The demo bootstrap sequence is:
drop extension if exists ai cascade;
drop schema if exists ai cascade;
select set_config('ai.enable_feature_flag_text_to_sql', 'true', false);
create extension ai cascade;
Source: examples/text_to_sql/README.md:10-20. The catalog is then created in a single call:
select ai.create_semantic_catalog
( embedding => ai.embedding_openai('text-embedding-3-small', 1024)
, text_to_sql => ai.text_to_sql_openai(model=>'o3-mini')
);
Source: examples/text_to_sql/README.md:30-45. The embedding argument accepts any vectorizer-style embedding config, so the same provider-mixing described in the vectorizer overview applies: "any provider supported by Vectorizer" can be used for embeddings, and a different provider can be used for chat completions Source: examples/text_to_sql/README.md:35-50.
After creation, the catalog must be populated. The Generate Descriptions step in the demo walks the configured database, producing natural-language descriptions of tables, views, columns, and functions, and stores them in catalog tables. The Vectorizer then asynchronously embeds those rows. API keys are required at worker run time and follow the standard pgai conventions documented in projects/extension/docs/security/handling-api-keys.md Source: examples/text_to_sql/README.md:55-65.
The catalog is decoupled from the data pipeline, mirroring the broader pgai architecture described in the project README: "data modifications made by the application are decoupled from the embedding process, ensuring that failures in the embedding service do not affect the core data operations" Source: projects/extension/README.md:30-50. The Python integration uses the same CreateVectorizer builder pattern shown in the FastAPI example, where vectorizer definitions are produced as SQL statements and executed against the database Source: examples/simple_fastapi_app/README.md:30-50.
Known Issues and Limitations
| Symptom | Reference | Notes |
|---|---|---|
ai.text_to_sql exhausts the refinement iteration budget and throws | Issue #926 | The function has a bounded internal loop; pathological prompts can hit the cap. |
| Feature is off by default | Source: examples/text_to_sql/README.md:10-20 | Requires the ai.enable_feature_flag_text_to_sql GUC and a fresh create extension. |
| Provider coverage | Source: examples/text_to_sql/README.md:35-50 | Only providers with a corresponding ai.text_to_sql_<provider> config can author SQL; embedding and authoring providers are chosen independently. |
Practitioners should treat the generated SQL as a candidate, not a final answer, especially in environments with unusual schemas where the refinement loop is more likely to iterate to its limit.
See Also
Source: https://github.com/timescale/pgai / Human Manual
Embedding Providers, Model Integration & Known Issues
Related topics: Overview & System Architecture, Vectorizer Pipeline & Worker
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Continue reading this section for the full explanation and source context.
Related Pages
Related topics: Overview & System Architecture, Vectorizer Pipeline & Worker
Embedding Providers, Model Integration & Known Issues
Overview
pgai is designed as a thin orchestration layer between PostgreSQL and external AI services. Its core abstraction is the vectorizer, which reads a user-defined configuration, fetches data from a source (table or URI), chunks it, calls an embedding provider, and persists results in a destination view Source: [README.md].
Embedding providers are pluggable. The embedding parameter of ai.create_vectorizer selects which provider is used at runtime. This page documents the supported providers, how integration is performed in SQL and Python, and the most prominent open issues observed in the community.
Supported Embedding Providers
The README.md enumerates the providers that ship with the extension. Each is exposed as a SQL function and as a corresponding Python config helper:
| Provider | SQL entry point | Notes |
|---|---|---|
| Ollama | ai.embedding_ollama | Local OSS models via ollama serve |
| OpenAI | ai.embedding_openai | text-embedding-ada-002, text-embedding-3-small, etc. |
| Voyage AI | ai.embedding_voyageai | Domain-specific models (e.g., voyage-finance-2) |
| Cohere | routed through ai.embedding_litellm | embed-english-v3.0 |
| HuggingFace | routed through ai.embedding_litellm | Serverless inference API |
| Mistral | routed through ai.embedding_litellm | mistral-embed |
| Azure OpenAI | routed through ai.embedding_litellm | Azure-deployed OpenAI |
| AWS Bedrock | routed through ai.embedding_litellm | Amazon-hosted models |
| Vertex AI | routed through ai.embedding_litellm | GCP-hosted models |
Source: README.md — "Supported embedding models".
In Python, the matching config objects (EmbeddingOllamaConfig, EmbeddingOpenAIConfig, etc.) build the SQL ai.create_vectorizer statement. The simple FastAPI example shows Ollama and OpenAI being instantiated this way Source: [examples/simple_fastapi_app/README.md].
Provider Architecture
flowchart LR
A[Application INSERT/UPDATE] --> B[(PostgreSQL source table)]
B --> C[ai.create_vectorizer config]
C --> D[ai._vectorizer_queue]
D --> E[Vectorizer Worker]
E --> F{Embedding Provider}
F --> G[Ollama]
F --> H[OpenAI]
F --> I[Voyage AI]
F --> J[LiteLLM → Cohere/Bedrock/Vertex/Azure]
G --> K[(Destination view)]
H --> K
I --> K
J --> K
K --> L[App reads via pgvector cosine distance]The worker is stateless and decoupled from application writes, so an outage at the provider does not block DML Source: [README.md — "Basic Architecture"].
Evaluation Recipes
Three evaluation examples demonstrate provider selection with identical scaffolding:
- LiteLLM uses Cohere
embed-english-v3.0, Mistral, and OpenAItext-embedding-3-smallagainst Paul Graham essays Source: [examples/evaluations/litellm_vectorizer/README.md]. - Voyage compares OpenAI
text-embedding-3-small(768 dim) against Voyagevoyage-finance-2(1024 dim) on SEC filings Source: [examples/evaluations/voyage_vectorizer/README.md]. - Ollama compares Nomic
nomic-embed-text, BGE Large, and OpenAI Large models Source: [examples/evaluations/ollama_vectorizer/README.md].
All three follow the same workflow: load a HuggingFace dataset via ai.load_dataset, create one vectorizer per provider, then run identical question-generation/retrieval pipelines. This makes A/B model evaluation a matter of swapping one configuration parameter.
The document-loading path (ai.loading_uri) extends embedding beyond text columns to PDFs, DOCX, MD, HTML, and XLSX, demonstrated in examples/embeddings_from_documents/README.md.
Known Issues & Community Pain Points
Connection Leaks in Async Embedders
AsyncOpenAI, Ollama, and VoyageAI embedders create HTTP clients that are never explicitly closed. Under sustained load, sockets accumulate in CLOSE_WAIT until the OS raises Too many open files. The recommended workaround is to manage HTTP client lifecycle externally rather than relying on garbage collection Source: [Issue #919].
Ollama Version Compatibility
ai.ollama_embed breaks on Ollama server versions above 13.3, affecting extension 0.11.2 and library 0.12.1 on PostgreSQL 17 / Ubuntu 24.04. The breakage appears to be caused by changes in the Ollama API response shape Source: [Issue #921].
Worker Import Regression
After pip install "pgai[vectorizer-worker]", users receive cannot import name 'Worker' from 'pgai'. The expected import path changed and the docs have not yet caught up Source: [Issue #925].
Outdated Dependency Pins
litellm is pinned to >=1.65.0,<1.73.0 (current upstream: 1.81.5), and openai to >=1.44,<2.0 (current upstream: 2.16.0). Projects on newer transitive dependencies collide and cannot install pgai side-by-side Source: [Issue #915].
Semantic Catalog Refinement Loop
When the Semantic Catalog text-to-SQL flow is enabled, generation can iterate until the configured iteration cap is exhausted, then raise an exception. This is tracked as a refinement-loop bug rather than a provider bug Source: [Issue #926].
Open Feature Requests
Several provider-related enhancements remain open:
- ReRank through Ollama — only embedding is currently exposed;
qwen-reranker-style models are unsupported Source: [Issue #866]. - llama.cpp embedding worker — needs an OpenAI-compatible base URL override; the existing
embedding_openaipath does not work with llama.cpp's tokenizer Source: [Issue #850]. - RaBitQ quantization — large-dimension embeddings (e.g., 8192) need compact storage formats beyond the current Q1–Q16 options Source: [Issue #920].
- OCR for PDF/DOCX —
PARSING_AUTOshould expose an OCR toggle via docling for scanned documents Source: [Issue #795].
See Also
Source: https://github.com/timescale/pgai / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
May increase setup, validation, or first-run risk for the user.
Doramagic Pitfall Log
Found 13 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/926
2. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/915
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/921
4. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/925
5. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: Project evidence flags a capability evidence risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/866
6. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/timescale/pgai
7. Runtime risk: Runtime risk requires verification
- Severity: medium
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/919
8. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/timescale/pgai
9. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/timescale/pgai
10. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/timescale/pgai
11. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/timescale/pgai/issues/850
12. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown。
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/timescale/pgai
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Count of project-level external discussion links exposed on this manual page.
Open the linked issues or discussions before treating the pack as ready for your environment.
Community Discussion Evidence
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using pgai with real data or production workflows.
- [[Bug]: semantic catalog text to sql stuck in refinement iterations](https://github.com/timescale/pgai/issues/926) - github / github_issue
- [[Feature]: Use ReRank models (i.e. qwen-reranker) through Ollama](https://github.com/timescale/pgai/issues/866) - github / github_issue
- cannot import name 'Worker' from 'pgai' - github / github_issue
- [[Bug]: ollama_embed error on ollama higher 13.3](https://github.com/timescale/pgai/issues/921) - github / github_issue
- [[Feature]: RaBitQ](https://github.com/timescale/pgai/issues/920) - github / github_issue
- HTTP client connection leak in embedders causes file descriptor exhausti - github / github_issue
- Outdated dependency versions are blocking adoption - github / github_issue
- [[Feature]: OCR for PDF and DOCX](https://github.com/timescale/pgai/issues/795) - github / github_issue
- [[Feature]: llama.cpp embedding worker integration](https://github.com/timescale/pgai/issues/850) - github / github_issue
- extension-0.11.2 - github / github_release
- pgai: v0.12.1 - github / github_release
- pgai: v0.12.0 - github / github_release
Source: Project Pack community evidence and pitfall evidence