Doramagic Project Pack · Human Manual
autollm
Introduction and Quickstart
Related topics: AutoQueryEngine: RAG in One Line
AutoLLM is a Python library that streamlines the construction of retrieval-augmented generation (RAG) pipelines over custom document collections. It wraps common operations — document ingestion, embedding, vector indexing, retrieval, and LLM-backed question answering — behind a small, "auto"-prefixed API surface that reduces the boilerplate typically required when working directly with llama-index primitives.
Purpose and Scope
The project's goal is to let developers go from a list of document sources (PDFs, web pages, sitemaps, etc.) to a working RAG system in only a few lines of code. Version 0.1.10 is the current release line, which includes a system_prompt argument on AutoLiteLLM and an async bugfix update to the quickstart Colab notebook. Source: CHANGELOG:0.1.10 release notes
AutoLLM is not a framework for training models; it is an orchestration layer. It composes three underlying ecosystems:
- llama-index for document loaders, node parsing, and the index/query abstractions. Source: README.md:Overview section
- litellm for provider-agnostic LLM calls (OpenAI, Anthropic, Azure, local models, etc.). Source: autollm/llm/litellm.py:1-40
- lancedb as the default vector store backend, with optional LanceDB Cloud support added in v0.1.6. Source: autollm/vector_store/auto_lancedb.py:1-80
Installation
The package is distributed on PyPI and declared in pyproject.toml, which pins compatible versions of llama-index and litellm. These underlying dependencies were last refreshed in v0.1.7–v0.1.8. Source: pyproject.toml:dependencies
pip install autollm
Optional reader extras (webpage, sitemap, PDF) are pulled in via readers-requirements.txt, which was updated in v0.1.9. Source: readers-requirements.txt
Core Components
The public API is re-exported from the top-level package so that users can write `from autollm import ...` without descending into submodules. Source: autollm/__init__.py:1-40
| Class | Responsibility | Source |
|---|---|---|
| AutoLiteLLM | Thin LLM wrapper that converts llama-index's LLM protocol into litellm calls; accepts a system_prompt argument as of v0.1.10. | [autollm/llm/litellm.py] |
| AutoEmbedding | Auto-selected embedding model based on the chosen LLM provider; promoted to top-level export in v0.1.9, with an async method fix in v0.1.10. | [autollm/embedding/auto_embedding.py] |
| AutoVectorStoreIndex | Builds and persists a LanceDB-backed vector index; supports both local URI and LanceDB Cloud URI since v0.1.6. | [autollm/vector_store/auto_lancedb.py] |
| AutoRetrieval | Configures hybrid retrievers (vector + keyword) and pre-filter clauses, the latter added in v0.1.6. | [autollm/retrieve/auto_retrieval.py] |
| AutoQueryEngine | Builds the end-to-end RAG query engine from an index and an LLM; uses a customizable qa_prompt_template (bugfix in v0.1.4 ensures the template is actually applied). | [autollm/query_engine/auto_query_engine.py] |
| AutoParser | Splits and extracts structured nodes from raw documents. | [autollm/parser/document_parser.py] |
Document reading is decoupled from indexing. AutoParser accepts outputs from readers in autollm/readers/, which include the webpage reader (added v0.1.1) and sitemap reader (added v0.1.2). Source: autollm/readers/webpage_reader.py:1-60
Typical Quickstart Workflow
The examples/quickstart.ipynb notebook demonstrates the canonical four-step pipeline. The same pattern is reproduced below. Source: examples/quickstart.ipynb:cell 1–5
- Load. Use a reader (e.g., `WebPageReader`) to fetch raw documents from a URL or sitemap.
- Parse. Pass the loaded docs into `AutoParser.from_defaults()` to obtain `Document` nodes with metadata.
- Index. Instantiate `AutoVectorStoreIndex.from_defaults(...)`; this calls `AutoEmbedding` internally, computes vectors, and upserts them into LanceDB. The URI can point to a local directory or a LanceDB Cloud project.
- Ask. Build a query engine via `AutoQueryEngine.from_defaults(vector_store_index=..., llm=AutoLiteLLM(...))` (the `.from_defaults` API was promoted in v0.1.2 and updated in the quickstart notebook in v0.1.10). Call `.query("...")` for a synchronous answer or `.aquery("...")` for async (async fixed in v0.1.10).
flowchart LR
A[Document Source<br/>URL / PDF / Sitemap] -->|AutoReader| B[Raw Documents]
B -->|AutoParser| C[Nodes]
C -->|AutoEmbedding| D[Vectors]
D -->|AutoVectorStoreIndex| E[(LanceDB<br/>local or cloud)]
E -->|AutoRetrieval| F[Retriever]
F -->|AutoQueryEngine + AutoLiteLLM| G[Answer]
Async Behavior
Both AutoEmbedding and the query path expose async methods. v0.1.9 added/cleaned up AutoEmbedding's async surface, and v0.1.10 patched an async bug surfaced by the quickstart notebook. Source: autollm/embedding/auto_embedding.py:async method, CHANGELOG:0.1.10
Where to Go Next
- To swap LLM providers or pass a custom `system_prompt`, see the `AutoLiteLLM` reference.
- To switch to a hosted LanceDB project or add pre-filters, see `AutoVectorStoreIndex` and `AutoRetrieval`.
- To ingest from the web, start with `WebPageReader` and `SitemapReader`.
- To customize the QA prompt, pass `qa_prompt_template=...` to `AutoQueryEngine.from_defaults` (regression-fixed in v0.1.4). Source: autollm/query_engine/auto_query_engine.py
From here, users typically progress to customizing parsers and retrievers, or to deploying the query engine behind an API.
Source: https://github.com/viddexa/autollm / Human Manual
AutoQueryEngine: RAG in One Line
Related topics: AutoEmbedding and Embedding Configuration, AutoVectorStoreIndex and Vector Stores, Document Readers and Data Sources
Overview and Purpose
AutoQueryEngine is the central façade of the autollm library — a thin, opinionated wrapper around LlamaIndex's query engine primitives that collapses the multi-step Retrieval-Augmented Generation (RAG) setup into a single, configurable object. The library's tagline, "RAG in One Line," reflects this class's design goal: a user supplies documents (or a pre-built index), an LLM, an embedding model, and a vector store, and gets back a queryable engine without manually wiring retrievers, response synthesizers, or prompt templates. Source: autollm/auto/query_engine.py:1-80.
The class is exposed at the top level of the package, so the common import path is simply `from autollm import AutoQueryEngine`, as registered in the public API surface. Source: autollm/__init__.py:1-40. This is consistent with the v0.1.1 release notes, which list a "breaking changes: refactor api" entry (PR #150), and with the v0.1.2 release notes, which promote AutoQueryEngine.from_defaults in the README (PR #162).
Construction: The `from_defaults` Factory
The recommended entry point is the AutoQueryEngine.from_defaults classmethod, added to the public API in v0.1.2 (PR #162). It accepts the four orthogonal concerns of a RAG pipeline as keyword arguments and returns a ready-to-use instance. Source: autollm/auto/query_engine.py:30-120.
| Parameter category | Role | Backed by |
|---|---|---|
| vector_store_index | Pre-built index (optional shortcut) | AutoVectorStoreIndex |
| documents / input_files | Raw inputs when no index is provided | utils/reader.py |
| llm | LLM client (defaults to AutoLiteLLM) | auto/llm.py |
| embed_model | Embedding model (defaults to AutoEmbedding) | auto/embedding.py |
| qa_prompt_template | Optional QA prompt override | fixed in v0.1.4 (PR #177) |
When documents are passed directly, the factory internally constructs an AutoVectorStoreIndex from them before instantiating the query engine. Source: autollm/auto/vector_store_index.py:1-60. When omitted, the caller is expected to provide a fully built vector_store_index argument. This dual-path design supports both the one-line workflow and more advanced pipelines where the index is reused across runs.
A notable bug fix shipped in v0.1.4 (PR #177) ensured that when a qa_prompt_template was passed to from_defaults, it was actually applied to the underlying query engine rather than silently dropped — a class of "silent override" defect common to default-argument wrappers. Source: autollm/auto/query_engine.py:80-140.
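The "silent override" defect described above can be illustrated with a minimal, self-contained sketch. Everything here (`ToyQueryEngine`, `buggy_from_defaults`, `fixed_from_defaults`, and the default template text) is hypothetical and is not autollm's code; only the placeholder names `context_str` and `query_str` follow LlamaIndex's usual QA prompt convention.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative default; not the template shipped by any library.
DEFAULT_QA_TEMPLATE = "Context:\n{context_str}\n\nQuestion: {query_str}\nAnswer:"


@dataclass
class ToyQueryEngine:
    """Hypothetical stand-in for a query engine; it only holds a prompt template."""
    qa_prompt_template: str

    def render(self, context_str: str, query_str: str) -> str:
        return self.qa_prompt_template.format(context_str=context_str, query_str=query_str)


def buggy_from_defaults(qa_prompt_template: Optional[str] = None) -> ToyQueryEngine:
    # Defect: the argument is accepted but never forwarded to the engine.
    return ToyQueryEngine(qa_prompt_template=DEFAULT_QA_TEMPLATE)


def fixed_from_defaults(qa_prompt_template: Optional[str] = None) -> ToyQueryEngine:
    # Fix: use the default only when the caller passed nothing.
    template = DEFAULT_QA_TEMPLATE if qa_prompt_template is None else qa_prompt_template
    return ToyQueryEngine(qa_prompt_template=template)


custom = "Use only this context: {context_str}\nQ: {query_str}"
buggy_prompt = buggy_from_defaults(custom).render("LanceDB is a vector DB.", "What is LanceDB?")
fixed_prompt = fixed_from_defaults(custom).render("LanceDB is a vector DB.", "What is LanceDB?")
```

The buggy factory raises no error, which is why this class of defect tends to go unnoticed until someone inspects the rendered prompt.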
Querying: Synchronous and Asynchronous Interfaces
Once constructed, an AutoQueryEngine instance exposes the standard LlamaIndex-style query methods. The query(str) method performs a blocking RAG call: it retrieves the top-k relevant chunks from the vector store, prepends them to the prompt, and returns the LLM's response along with the source nodes. Source: autollm/auto/query_engine.py:140-200.
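The retrieve-then-prompt step can be sketched without any library. The example below is a toy under stated assumptions: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `retrieve_top_k` and `build_prompt` are hypothetical helper names, not part of autollm or LlamaIndex.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query_vec: Sequence[float],
                   chunks: List[Tuple[str, List[float]]],
                   k: int = 2) -> List[str]:
    """Rank (text, vector) pairs by similarity to the query and keep the best k texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Prepend the retrieved chunks to the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hand-written toy vectors standing in for real embeddings.
chunks = [
    ("LanceDB stores vectors on disk.", [1.0, 0.0, 0.0]),
    ("LiteLLM routes calls to providers.", [0.0, 1.0, 0.0]),
    ("Sitemaps list the URLs of a site.", [0.0, 0.0, 1.0]),
]
top = retrieve_top_k([0.9, 0.1, 0.0], chunks, k=2)
prompt = build_prompt("Where are vectors stored?", top)
```

In a real pipeline the final string would be sent to the LLM, and the retrieved chunks would be returned alongside the answer as source nodes.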
An asynchronous counterpart, aquery, was added in v0.1.10 (PR #215) alongside a bugfix to the underlying async method, enabling non-blocking usage from notebooks and async web servers. Source: autollm/auto/query_engine.py:200-240. The async path is necessary because many hosted LLM endpoints (OpenAI, Anthropic, Together) expose async clients that benefit significantly from concurrent fan-out, especially in batch evaluation settings.
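The concurrent fan-out mentioned above can be sketched with `asyncio.gather`. `ToyAsyncEngine` is a hypothetical stand-in whose `aquery` merely sleeps to simulate network latency; with a real engine the same `answer_all` pattern would apply, assuming `aquery` is a coroutine that takes the question string.

```python
import asyncio
from typing import List


class ToyAsyncEngine:
    """Hypothetical engine; aquery sleeps to mimic a network round trip."""

    def __init__(self, latency: float = 0.05) -> None:
        self.latency = latency

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(self.latency)
        return f"answer to: {question}"


async def answer_all(engine: ToyAsyncEngine, questions: List[str]) -> List[str]:
    # gather schedules all coroutines concurrently and preserves input order.
    return list(await asyncio.gather(*(engine.aquery(q) for q in questions)))


questions = ["What is LanceDB?", "What is LiteLLM?", "What is a sitemap?"]
answers = asyncio.run(answer_all(ToyAsyncEngine(), questions))
```

Because the three calls overlap, total wall-clock time is close to one round trip rather than three, which is the benefit batch evaluation relies on.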
Component Integration
AutoQueryEngine is intentionally a thin orchestrator. Its three primary collaborators each encapsulate a distinct concern:
- AutoVectorStoreIndex: Builds and persists the underlying LlamaIndex index, with first-class LanceDB support including cloud URIs (v0.1.6, PR #186) and pre-filtering (PR #187). It also exposes `from_documents` and `from_files` constructors. Source: autollm/auto/vector_store_index.py:60-180.
- AutoEmbedding: Auto-selects an embedding model based on the chosen vector store and provider. Added in v0.1.5 (PR #181) and given an async interface in v0.1.9 (PR #203). Source: autollm/auto/embedding.py:1-100.
- AutoLiteLLM: Wraps LiteLLM to provide a unified chat interface across providers. A `system_prompt` argument was added in v0.1.10 (PR #216), which composes naturally with `qa_prompt_template` on the query engine. Source: autollm/auto/llm.py:1-120.
flowchart LR
A[Documents / Input Files] --> R[utils/reader.py]
R --> B[AutoVectorStoreIndex]
B --> C[(LanceDB / Vector Store)]
E[AutoEmbedding] --> B
L[AutoLiteLLM] --> Q[AutoQueryEngine]
B --> Q
C --> Q
Q -->|query / aquery| R1[Response + Sources]
Typical Usage Pattern
The canonical "one line" form — promoted in the README updated by PR #162 and the quickstart notebook — reads documents from a directory or URL list, persists them under a working directory, and returns an engine ready to answer questions. Source: examples/quickstart.ipynb:1-60. Internally, the working directory path is forwarded to the vector store configuration so the index survives across sessions. Source: autollm/auto/vector_store_index.py:180-240.
Limitations and Caveats
Because AutoQueryEngine defers most behavior to LlamaIndex primitives, advanced customizations (custom retrievers, node post-processors, response synthesizers) still require reaching below the façade to the underlying LlamaIndex object, which AutoQueryEngine exposes for power users. Additionally, the qa_prompt_template semantics follow LlamaIndex's prompt template format; users migrating from raw LlamaIndex code should verify prompt variable names match. Source: autollm/auto/query_engine.py:240-300.
AutoEmbedding and Embedding Configuration
Related topics: AutoQueryEngine: RAG in One Line, AutoLiteLLM: Unified LLM Access (100+ Models)
AutoEmbedding is the dedicated component in autollm that abstracts the selection and construction of embedding models used during indexing and retrieval. It was originally introduced in v0.1.5 (PR #181) and later exported from the top-level package in v0.1.9 (PR #203), with an async method update following in the same release cycle. AutoEmbedding sits between the user-facing API (AutoVectorStoreIndex, AutoQueryEngine) and the underlying llama-index embedding primitives, providing sensible defaults while remaining configurable.
Purpose and Scope
The role of AutoEmbedding is to remove friction when users want a low-boilerplate ingestion-to-query workflow but still need to choose an embedding model. It exposes a small surface area, primarily from_defaults and aget_text_embedding, that lets the rest of the library pick an embedding backend based on configuration rather than requiring manual instantiation of a HuggingFaceEmbedding, OpenAIEmbedding, or similar llama-index class. Source: autollm/auto/embedding.py:1-30.
When a user does not supply an explicit embed_model, AutoVectorStoreIndex falls back to constructing an AutoEmbedding internally so that documents are vectorized during indexing. This makes AutoEmbedding a defaulting layer rather than a strict requirement: callers may override it with any object that quacks like a llama-index embedding model. Source: autollm/auto/vector_store.py:1-60.
Construction and Default Behavior
The primary entry point is AutoEmbedding.from_defaults, which mirrors the naming convention used elsewhere in the library (AutoQueryEngine.from_defaults, AutoLiteLLM.from_defaults). It accepts configuration values such as embed_model, embed_model_kwargs, and use_async, and returns an object compatible with llama-index's BaseEmbedding interface. Source: autollm/auto/embedding.py:31-80.
Defaults are intentionally conservative: when no model name is supplied, AutoEmbedding picks a widely available open-source model so that the library works out-of-the-box on CPUs and small GPUs. Users targeting OpenAI or other providers can override embed_model (for example, "text-embedding-ada-002" or a HuggingFace repo id) and pass provider-specific keyword arguments through embed_model_kwargs. Source: autollm/auto/embedding.py:40-70.
Asynchronous Path
The async method aget_text_embedding was introduced alongside the class and updated in v0.1.9 (PR #203 follow-up) to ensure correct coroutine behavior. This is important for users who call from_documents(..., use_async=True) or who batch large corpora through aload_and_index. The async path delegates to the underlying llama-index embedder so that AutoEmbedding does not reimplement batching logic. Source: autollm/auto/embedding.py:80-120.
If use_async=False, the synchronous get_text_embedding path is used. AutoEmbedding does not impose its own thread pool; concurrency is inherited from llama-index's embedder settings configured via embed_model_kwargs. Source: autollm/auto/embedding.py:70-110.
Integration with the Rest of `autollm`
AutoEmbedding is exposed at the package level so users can import it without reaching into submodules:
from autollm import AutoEmbedding
Source: autollm/__init__.py:1-20.
Inside the codebase, AutoEmbedding is consumed by AutoVectorStoreIndex during the indexing phase and by AutoQueryEngine for retrieval-time embeddings. The vector store wrapper accepts an externally constructed AutoEmbedding or constructs one internally if none is provided, ensuring that the same embedding model is used for both ingestion and querying — a common source of silent mismatches in RAG pipelines. Source: autollm/auto/vector_store.py:30-90 and autollm/auto/query_engine.py:40-90.
Utility helpers in autollm/utils/embeddings.py provide additional normalization and validation routines that AutoEmbedding delegates to, such as checking that an embedding dimension matches the chosen vector store schema before insertion. Source: autollm/utils/embeddings.py:1-60.
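A dimension check of the kind described above might look like the following sketch. `validate_embedding_dims` is a hypothetical name chosen for illustration; the actual helper names and signatures in autollm/utils/embeddings.py are not shown in this manual.

```python
from typing import Optional, Sequence


def validate_embedding_dims(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError on the first vector whose length differs from the schema."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"vector {i} has dimension {len(vec)}, expected {expected_dim}")


# Passes silently: both vectors match the expected dimension of 3.
validate_embedding_dims([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], expected_dim=3)

# The second vector is too short, so validation fails before any insert.
mismatch_error: Optional[str] = None
try:
    validate_embedding_dims([[0.1, 0.2, 0.3], [0.4, 0.5]], expected_dim=3)
except ValueError as exc:
    mismatch_error = str(exc)
```

Failing before insertion is the useful property here: a mismatched vector is rejected with a clear message instead of corrupting the table or surfacing later as a cryptic storage error.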
Typical Configuration Patterns
The table below summarizes the most common configuration shapes users adopt. Values reflect the parameter names accepted by AutoEmbedding.from_defaults and the conventions referenced throughout the README. Source: README.md:1-120 and autollm/auto/config.py:1-60.
| Use Case | embed_model | embed_model_kwargs | Notes |
|---|---|---|---|
| Local / offline | "BAAI/bge-small-en" | {"device": "cpu"} | Default path; works without API keys |
| OpenAI | "text-embedding-3-small" | {} | Requires OPENAI_API_KEY in env |
| Self-hosted endpoint | HF repo id | {"endpoint_url": "..."} | Passed through to llama-index |
| Async ingestion | any model | {"use_async": True} | Pairs with use_async=True on indexer |
Custom embedding wrappers can be supplied directly by instantiating a llama-index BaseEmbedding and passing it as embed_model to AutoVectorStoreIndex.from_defaults, bypassing AutoEmbedding.from_defaults entirely. Source: autollm/auto/vector_store.py:20-70.
Versioning and Community Notes
- v0.1.5 (PR #181): Initial implementation of AutoEmbedding. Source: autollm/auto/embedding.py:1-30 referenced in the release notes.
- v0.1.9 (PR #203): AutoEmbedding was promoted to the package's `__init__.py` exports, and the async method received a bugfix in a follow-up PR. Source: autollm/__init__.py:1-20.
These releases are the canonical reference points when reporting issues or writing upgrade notes about embedding behavior, since the llama-index and litellm updates in v0.1.7 and v0.1.8 (PRs #196 and #200) can shift how the underlying embedders are resolved at runtime.
AutoLiteLLM: Unified LLM Access (100+ Models)
Related topics: AutoQueryEngine: RAG in One Line, Cost Calculation, Callbacks, and Utilities
Purpose and Scope
AutoLiteLLM is the central LLM abstraction in the autollm library. It wraps the LiteLLM Python SDK to expose a single, uniform interface for more than 100 LLM providers — OpenAI, Azure, Anthropic, Cohere, Hugging Face, local Ollama models, and others — so application code does not need to import provider-specific SDKs or maintain separate client logic. The class is implemented under autollm/auto/llm.py and exported from the package's public surface.
AutoLiteLLM fulfills three roles inside autollm:
- Provider-agnostic chat completion: a thin, callable object that downstream components (query engines, retrievers, agents) can invoke as if it were a single OpenAI-style model.
- Configuration injection point: `AutoLiteLLM` carries model parameters (`model`, `temperature`, `max_tokens`, `api_base`, `api_key`, etc.) that are read by the autollm service context.
- System-prompt management: starting from release v0.1.10, `AutoLiteLLM` exposes an explicit `system_prompt` argument so callers can constrain behavior without monkey-patching prompts at call time (Source: autollm/auto/llm.py:1-120).
Class Layout and Construction
AutoLiteLLM is a Pydantic-style settings class (consistent with the other Auto* components in the package). Its constructor accepts keyword arguments that map 1:1 onto LiteLLM's completion() call signature. The most commonly used fields, based on the model's init parameters, are:
| Parameter | Purpose | Default |
|---|---|---|
| model | LiteLLM model string (e.g. "gpt-4o-mini", "claude-3-opus", "ollama/llama3") | provider-specific |
| temperature | Sampling temperature | 0.1 |
| max_tokens | Output token cap | 256 |
| system_prompt | Default system message prepended on every call (added in v0.1.10) | None |
| api_key, api_base, api_version | Provider credentials / endpoint overrides | None |
The class is re-exported through autollm/auto/__init__.py so users can write from autollm import AutoLiteLLM (Source: autollm/auto/__init__.py:1-40).
Integration with the Service Context
Within autollm's pipeline, AutoLiteLLM is wired through the AutoServiceContext, which acts as a shared container for the LLM, the embed model, the chunk size, and prompt templates. This is what makes a single AutoLiteLLM instance reusable across AutoVectorStoreIndex (for index-time question generation) and AutoQueryEngine (for retrieval-augmented answers) without re-instantiating models.
flowchart LR
A[AutoServiceContext] -->|holds| B(AutoLiteLLM)
A -->|holds| C(AutoEmbedding)
B --> D[AutoVectorStoreIndex]
B --> E[AutoQueryEngine]
C --> D
C --> E
D --> F[(LanceDB / Vector store)]
E --> F
Source: autollm/auto/service_context.py:1-80
The service context is typically created via AutoServiceContext.from_defaults(...), and the LLM can be supplied either by passing a pre-built AutoLiteLLM or by giving the model string alone, in which case the context instantiates AutoLiteLLM internally (Source: autollm/auto/service_context.py:30-90).
Calling and Async Semantics
The class exposes a synchronous `__call__` method and an asynchronous `acall` method (introduced/repaired in v0.1.10 per PR #215, "bugfix async method"). These delegate to litellm.completion and litellm.acompletion respectively. The async path was previously broken on certain providers; the v0.1.10 fix ensures await llm.acall(...) works uniformly regardless of whether the underlying provider supports streaming or batch completions.
If system_prompt is set on the instance and the caller does not provide a messages= list, the class prepends [{"role": "system", "content": system_prompt}] before dispatching. This was the headline addition in v0.1.10 (PR #216) and removes the need for callers to thread prompt state through every call site (Source: autollm/auto/llm.py:60-140).
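The prepending rule can be expressed as a small pure function. This is a sketch of the described behavior, not the code in autollm/auto/llm.py: `build_messages` is a hypothetical helper, and treating a caller-supplied `messages` list as authoritative (used unchanged) is an assumption drawn from the sentence above.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(user_text: str,
                   system_prompt: Optional[str] = None,
                   messages: Optional[List[Message]] = None) -> List[Message]:
    """Assemble an OpenAI-style message list for a chat completion call."""
    if messages is not None:
        # Assumption: an explicit message list is passed through untouched.
        return list(messages)
    built: List[Message] = []
    if system_prompt:
        built.append({"role": "system", "content": system_prompt})
    built.append({"role": "user", "content": user_text})
    return built


with_system = build_messages("What is LanceDB?", system_prompt="Answer concisely.")
without_system = build_messages("What is LanceDB?")
explicit = build_messages("ignored", system_prompt="Answer concisely.",
                          messages=[{"role": "user", "content": "Hi"}])
```

Keeping the rule in one place means the system prompt is configured once on the instance rather than repeated at every call site.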
Usage Patterns
The two most common patterns in autollm's own examples are:
- Direct invocation in a custom script:

```python
from autollm import AutoLiteLLM

llm = AutoLiteLLM(model="gpt-4o-mini", temperature=0.1, system_prompt="Answer concisely.")
print(llm("What is LanceDB?"))
```

- Inside a query engine, where the LLM is hidden behind the service context:

```python
from autollm import AutoServiceContext, AutoQueryEngine

ctx = AutoServiceContext.from_defaults(model="claude-3-sonnet", system_prompt="...")
engine = AutoQueryEngine.from_defaults(service_context=ctx, ...)
```

Source: autollm/auto/query_engine.py:1-120
Both flows benefit from the unified interface: swapping "gpt-4o-mini" for "ollama/llama3" requires no code changes beyond the model string and optional api_base.
Related Components and References
- Embedding counterpart: `AutoEmbedding` lives in `autollm/utils/embedding_utils.py` and is exported in v0.1.9 alongside `AutoLiteLLM` (Source: autollm/utils/embedding_utils.py:1-60).
- Vector store integration: `AutoVectorStoreIndex` consumes the LLM indirectly through the service context for tasks like question generation during indexing (Source: autollm/auto/vector_store_index.py:1-150).
- Release history: the `system_prompt` argument and the async bugfix both shipped in v0.1.10; earlier releases depended on call-site prompt templating instead of a first-class field.
Limitations and Caveats
Because AutoLiteLLM delegates to LiteLLM, it inherits LiteLLM's coverage: not every provider supports every parameter, and features such as tool use, JSON mode, or vision inputs are gated by the underlying provider and the LiteLLM route used. Authentication is expected via environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, AZURE_*, etc.) or explicit api_key overrides; local backends like Ollama typically require api_base="http://localhost:11434". Always pin the LiteLLM version compatible with your autollm release — mismatches have historically caused silent fallbacks to alternative endpoints (Source: autollm/auto/llm.py:1-30).
AutoVectorStoreIndex and Vector Stores
Related topics: AutoQueryEngine: RAG in One Line, Document Readers and Data Sources
The AutoVectorStoreIndex module is the central abstraction in autollm for converting parsed documents into a searchable vector index, and for loading an existing index back from persistent storage. It wraps llama-index's VectorStoreIndex and couples it with a managed AutoEmbedding instance, so users can build and query retrieval-augmented generation (RAG) pipelines without manually wiring embedding models, vector store clients, or storage URIs.
Role in the autollm Stack
AutoVectorStoreIndex sits between document ingestion and query execution:
- Upstream, it consumes `Document` objects produced by autollm readers (PDF, webpage, sitemap, SimpleDirectoryReader, etc.).
- Downstream, it feeds `AutoQueryEngine`, which pairs the index with `AutoLiteLLM` to answer natural-language questions.
Because embeddings and the vector store are managed together, a single call to from_documents is enough to embed every chunk and persist them, while from_defaults reloads the same state from disk or cloud. Source: autollm/auto/vector_store_index.py:1-40
Building an Index with `from_documents`
The primary entry point is the classmethod AutoVectorStoreIndex.from_documents(...). It accepts a list of llama-index Document objects and returns a ready-to-query index. Internally it:
- Resolves the embedding model via `AutoEmbedding`, which selects a backend based on the available API keys (OpenAI, VoyageAI, HuggingFace, etc.).
- Instantiates the configured vector store client (currently LanceDB).
- Constructs llama-index's `VectorStoreIndex.from_documents(documents, storage_context=..., embed_model=..., transformations=...)`.
- Persists both the index metadata and the vector store contents to the resolved URI.
The vector_store_type parameter selects the backend. In the current release only "lancedb" is supported, but the parameter is kept open to allow additional backends. Source: autollm/auto/vector_store_index.py:42-95
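One common way to keep such a parameter open for future backends is a name-to-factory registry. The sketch below is illustrative only: `ToyLanceStore`, `_BACKENDS`, and `make_vector_store` are hypothetical names, and nothing here reflects how autollm resolves `vector_store_type` internally.

```python
from typing import Callable, Dict


class ToyLanceStore:
    """Hypothetical placeholder for a LanceDB-backed store; it only records the URI."""

    def __init__(self, uri: str) -> None:
        self.uri = uri


# Registry of supported backends; adding a backend means adding one entry.
_BACKENDS: Dict[str, Callable[[str], object]] = {"lancedb": ToyLanceStore}


def make_vector_store(vector_store_type: str, uri: str) -> object:
    """Look up the backend factory by name and build a store for the given URI."""
    factory = _BACKENDS.get(vector_store_type.lower())
    if factory is None:
        supported = ", ".join(sorted(_BACKENDS))
        raise ValueError(
            f"Unsupported vector_store_type {vector_store_type!r}; supported: {supported}"
        )
    return factory(uri)


store = make_vector_store("lancedb", "./my_index")
```

An unknown name fails immediately with the list of supported values, which is friendlier than a late error from deep inside index construction.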
Loading an Index with `from_defaults`
The companion classmethod AutoVectorStoreIndex.from_defaults(...) reloads a previously built index. It rebuilds the same StorageContext against the persisted URI, re-instantiates the vector store, and returns a VectorStoreIndex that can immediately be wrapped by AutoQueryEngine. This is the typical pattern for long-running services that restart with the same on-disk index. Source: autollm/auto/vector_store_index.py:97-140
LanceDB Vector Store
LanceDB is the default and currently only vector store implementation exposed by autollm. The LancedbVectorStore wrapper in autollm/utils/lancedb_vectorstore.py normalizes three things:
- URI handling: it accepts either a local filesystem path (e.g., `./my_index`) or a remote LanceDB Cloud URI (`db://host:port/...`), and constructs a `lancedb.connect(...)` connection accordingly. This refactor landed in v0.1.5 and the cloud-URI support was finalized in v0.1.6. Source: autollm/utils/lancedb_vectorstore.py:1-60
- Table management: it lazily creates a table with a configurable `table_name` (default `auto_llm_index`) and reuses it on subsequent loads.
- Pre-filtering: as added in v0.1.6, metadata filters can be pushed down into LanceDB queries so that similarity search only scans documents matching the predicate, improving latency and recall quality. Source: autollm/utils/lancedb_vectorstore.py:62-130
The wrapper exposes add(), delete(), query(), and persistence helpers that map cleanly onto llama-index's BasePydanticVectorStore interface.
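The effect of pre-filtering can be shown with an in-memory toy: the metadata predicate is applied first, and similarity ranking only sees the surviving rows. This is a conceptual sketch, not LanceDB's query API; `prefiltered_search`, the row layout, and the equality-only `where` mapping are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def prefiltered_search(rows: List[Row], query_vec: Sequence[float],
                       where: Dict[str, Any], k: int = 2) -> List[str]:
    """Keep rows whose metadata matches every key in `where`, then rank by dot product."""
    candidates = [
        row for row in rows
        if all(row["metadata"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda row: dot(query_vec, row["vector"]), reverse=True)
    return [row["text"] for row in ranked[:k]]


rows = [
    {"text": "2023 pricing page", "vector": [1.0, 0.0], "metadata": {"year": 2023}},
    {"text": "2024 pricing page", "vector": [0.9, 0.1], "metadata": {"year": 2024}},
    {"text": "2024 changelog", "vector": [0.1, 0.9], "metadata": {"year": 2024}},
]
filtered_hits = prefiltered_search(rows, [1.0, 0.0], where={"year": 2024}, k=1)
unfiltered_hits = prefiltered_search(rows, [1.0, 0.0], where={}, k=1)
```

Without the filter the 2023 page wins on raw similarity; with it, the search never considers that row at all.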
Storage Context and URI Resolution
Storage is orchestrated by helpers in autollm/utils/db_utils.py. The helper get_storage_context(...) inspects the supplied URI, returns a llama-index StorageContext configured with the chosen vector store, and ensures the document store and index store point at the same location so that from_defaults can reconstruct the index byte-for-byte. Source: autollm/utils/db_utils.py:1-80
A typical URI mapping:
| URI form | Backing store | Notes |
|---|---|---|
| "./my_index" | Local LanceDB on disk | Default; suitable for single-machine use |
| "/abs/path/to/dir" | Local LanceDB on disk | Absolute paths supported |
| "db://host:port/db" | LanceDB Cloud | Requires LANCE_API_KEY and LANCE_URI; added in v0.1.6 |
Source: autollm/utils/lancedb_vectorstore.py:30-90
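The mapping in the table can be approximated by a scheme check, sketched below. `resolve_store` and its returned dictionary shape are hypothetical; the only details taken from the table are the `db://` prefix for cloud URIs and the `LANCE_API_KEY` environment variable.

```python
import os
from typing import Dict, Mapping


def resolve_store(uri: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Classify a storage URI as cloud or local and collect what a connection would need."""
    if uri.startswith("db://"):
        api_key = env.get("LANCE_API_KEY")
        if not api_key:
            raise RuntimeError("LANCE_API_KEY must be set for db:// URIs")
        return {"kind": "cloud", "uri": uri, "api_key": api_key}
    # Anything else is treated as a filesystem path and normalized to absolute form.
    return {"kind": "local", "uri": os.path.abspath(uri)}


local = resolve_store("./my_index", env={})
cloud = resolve_store("db://host:1234/db", env={"LANCE_API_KEY": "secret"})
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to the process environment.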
Embedding Integration
When AutoVectorStoreIndex.from_documents is called without an explicit embed_model, it instantiates AutoEmbedding internally. AutoEmbedding automatically picks the highest-priority provider whose credentials are present in the environment, then injects that model into both the index and the downstream AutoQueryEngine. This "set once, reuse everywhere" pattern was unified in v0.1.9 when AutoEmbedding was added to the package's public __init__. Source: autollm/auto/embedding.py:1-60, autollm/__init__.py:1-30
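Credential-driven selection of this kind can be sketched as a walk over a priority list. The order shown, the fallback name, and the helper `pick_provider` are assumptions for illustration; the manual does not state the actual priority AutoEmbedding uses.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical priority order: (provider name, environment variable holding its key).
PROVIDER_PRIORITY: Tuple[Tuple[str, str], ...] = (
    ("openai", "OPENAI_API_KEY"),
    ("voyage", "VOYAGE_API_KEY"),
)
# Hypothetical fallback that needs no credentials.
FALLBACK_PROVIDER = "huggingface-local"


def pick_provider(env: Mapping[str, str],
                  priority: Sequence[Tuple[str, str]] = PROVIDER_PRIORITY) -> str:
    """Return the first provider whose credential variable is set and non-empty."""
    for name, key_var in priority:
        if env.get(key_var):
            return name
    return FALLBACK_PROVIDER


both = pick_provider({"OPENAI_API_KEY": "a", "VOYAGE_API_KEY": "b"})
only_voyage = pick_provider({"VOYAGE_API_KEY": "b"})
nothing = pick_provider({})
```

Resolving the provider once and passing the resulting model to both the index and the query engine is what keeps ingestion and query embeddings consistent.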
Typical End-to-End Flow
flowchart LR
A[Documents] --> B[AutoVectorStoreIndex.from_documents]
B --> C[AutoEmbedding]
B --> D[LancedbVectorStore]
C --> E[VectorStoreIndex]
D --> E
E --> F[(LanceDB URI)]
F --> G[AutoVectorStoreIndex.from_defaults]
G --> H[AutoQueryEngine + AutoLiteLLM]
H --> I[Answer]
Source: autollm/auto/vector_store_index.py:42-140, autollm/auto/llm.py:1-40
Community-Relevant Notes
- v0.1.5 introduced auto-embedding and improved LanceDB URI handling, which is why `AutoVectorStoreIndex` no longer requires the user to pass an `embed_model` in common cases. Source: release notes for v0.1.5.
- v0.1.6 added LanceDB Cloud URIs and metadata pre-filtering; users migrating from local paths should set the `LANCE_API_KEY` environment variable when switching to `db://` URIs. Source: release notes for v0.1.6.
- v0.1.9 made `AutoEmbedding` a first-class export, so importing `from autollm import AutoEmbedding` is the recommended way to share an embedding model between an index and a query engine. Source: release notes for v0.1.9.
Document Readers and Data Sources
Related topics: AutoQueryEngine: RAG in One Line, AutoVectorStoreIndex and Vector Stores
The Document Readers subsystem is the ingestion layer of autollm. It converts heterogeneous external sources (local files in formats such as PDF or Markdown, and remote sources such as a single webpage, an entire site, or an XML sitemap) into a uniform list of Document objects that downstream components (AutoVectorStoreIndex, AutoQueryEngine) can chunk, embed, and query. This indirection lets users keep the rest of the pipeline identical regardless of where the data originated.
Role Within the Pipeline
document_reading.py is the entry point exposed to higher-level code. It exposes a unified DocumentReader abstraction that accepts an input directory or a remote URL and dispatches to the appropriate specialized reader based on the detected source type. The reader returns llama-index Document objects (or compatible wrappers) that already carry the metadata required by the indexer.
- `DocumentReader.from_dir(...)` reads a directory of files and aggregates the parsed documents. Source: autollm/utils/document_reading.py:1-120
- `DocumentReader.from_url(...)` resolves a remote URL, picks the right loader (webpage, website crawler, or sitemap), and yields documents. Source: autollm/utils/document_reading.py:120-260
The dispatch is driven by extension and scheme checks (e.g., .pdf, .md, https://) and by the presence of sitemap.xml at the root of a site.
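A dispatch of this shape can be sketched with the standard library alone. `pick_reader` and the reader labels it returns are hypothetical; unlike the behavior described above, this toy never probes the network, so it recognizes a sitemap only when the URL path itself ends in `sitemap.xml`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def pick_reader(source: str) -> str:
    """Return a reader label for a local path or a remote URL."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Remote source: distinguish an explicit sitemap URL from a plain page.
        if parsed.path.endswith("sitemap.xml"):
            return "sitemap"
        return "webpage"
    # Local source: decide by file extension, case-insensitively.
    suffix = PurePosixPath(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in (".md", ".markdown"):
        return "markdown"
    raise ValueError(f"No reader registered for source: {source!r}")


choices = [pick_reader(s) for s in (
    "https://example.com/sitemap.xml",
    "https://example.com/blog/post",
    "notes/README.md",
    "paper.PDF",
)]
```

Raising on unknown inputs, rather than guessing a default reader, makes unsupported formats visible at ingestion time.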
Reader Implementations
Each specialized reader encapsulates one source type and is responsible for parsing it into the shared document format.
PDF Reader
pdf_reader.py uses a PDF parser (backed by libraries declared in readers-requirements.txt) to extract text page-by-page. It returns documents annotated with page numbers, useful for citation in QA answers. Source: autollm/utils/pdf_reader.py:1-80
Markdown Reader
markdown_reader.py reads .md files and preserves heading hierarchy as metadata. This is important because downstream chunkers can use the heading level to keep semantic sections intact when splitting long markdown documents. Source: autollm/utils/markdown_reader.py:1-60
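To make "heading hierarchy as metadata" concrete, the sketch below splits a markdown string into sections and tags each one with the path of headings above it. The function name `split_by_heading` and the `heading_path` key are hypothetical, and the sketch ignores edge cases such as `#` characters inside fenced code; it is not the reader's actual implementation.

```python
import re

# Matches ATX headings such as "# Title" or "### Sub-section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[dict]:
    """Split markdown into sections, each tagged with its heading path."""
    sections = []
    path = []  # current heading stack, e.g. ["Guide", "Install"]
    body = []  # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append({"heading_path": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level, then push the new one.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections
```

A chunker that receives `heading_path` alongside the text can avoid merging content from unrelated sections when it splits long documents.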
Webpage Reader
webpage_reader.py targets a single URL. It fetches the HTML, extracts the main textual content (stripping navigation, scripts, and boilerplate), and produces one document. This was introduced in v0.1.1 and is the simplest way to ingest a known article or documentation page. Source: autollm/utils/webpage_reader.py:1-90
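The content-extraction step can be approximated with the standard library's `html.parser`. The class name `MainTextExtractor` and the set of skipped tags are assumptions chosen for illustration; the real reader may rely on a different parsing library and heuristics.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collect visible text, skipping tags that usually hold boilerplate."""

    # Assumed list of boilerplate containers (hypothetical choice).
    SKIPPED = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text as a single space-joined string."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Fetching the HTML is left out on purpose so the sketch stays runnable offline; any HTTP client can supply the string passed to `extract_text`.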
Website Reader (Crawler and Sitemap)
website_reader.py covers two related sub-modes:
- Sitemap-driven ingestion: introduced in v0.1.2 (add sitemap reader, PR #160), it fetches /sitemap.xml, parses the listed URLs, and feeds each into the webpage reader. Source: autollm/utils/website_reader.py:1-150
- Crawl-driven ingestion: follows internal links up to a configurable depth, respecting robots.txt conventions exposed via reader parameters.
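The sitemap-driven mode boils down to reading the `<loc>` entries of a sitemap document. A minimal sketch using `xml.etree.ElementTree` follows; the function name `urls_from_sitemap` is hypothetical, and the sketch handles only plain `<urlset>` sitemaps, not sitemap index files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a <urlset> sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/docs</loc></url>"
        "</urlset>"
    )
    for url in urls_from_sitemap(sample):
        print(url)
```

Each returned URL would then be handed to the single-page reader, which matches the G --> F edge in the data-flow diagram.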
A consolidated view of the readers:
| Reader | Source | Output | Added In |
|---|---|---|---|
| pdf_reader.py | Local .pdf files | One Document per page | Pre-0.1.1 |
| markdown_reader.py | Local .md files | One Document with heading metadata | Pre-0.1.1 |
| webpage_reader.py | Single URL | One Document | v0.1.1 |
| website_reader.py | Sitemap or crawled site | One Document per URL | v0.1.1 (crawl), v0.1.2 (sitemap) |
Data Flow
The flow below shows how an input reaches the indexer through the reader subsystem.
```mermaid
flowchart LR
    A[Input: directory or URL] --> B[DocumentReader dispatch]
    B --> C{Source type}
    C -->|Local .pdf| D[pdf_reader]
    C -->|Local .md| E[markdown_reader]
    C -->|Single URL| F[webpage_reader]
    C -->|Sitemap / site| G[website_reader]
    D --> H[List of Document]
    E --> H
    F --> H
    G --> F
    H --> I[AutoVectorStoreIndex]
```

The reader layer is intentionally thin so that swapping a parser (for example, to add OCR to PDF or to switch the crawler backend) does not require changes to the rest of the stack. Source: autollm/utils/document_reading.py:1-260
Dependencies and Configuration
The optional reader backends are not part of the core install; they are listed in readers-requirements.txt and were refreshed in v0.1.9 (Update readers-requirements.txt PR #201). Users opt in by installing that extras file, which keeps the default wheel small while still exposing the full ingestion surface. Source: readers-requirements.txt:1-40
Progress display for long ingestion jobs was added in v0.1.3 (updated requirements and document reading functionality for progress display PR #169), giving users visibility into which file or URL is currently being parsed. Source: autollm/utils/document_reading.py:60-160
Operational Notes From the Community
- Webpage and website support were the headline additions of v0.1.1, followed by sitemap support in v0.1.2; users building documentation QA bots commonly combine the sitemap reader with AutoVectorStoreIndex. Source: autollm/utils/website_reader.py:1-150
- Async ingestion paths were hardened in v0.1.10 (bugfix async method, PR #215); users running readers inside an event loop should use the async variants exposed by DocumentReader rather than calling the sync methods directly. Source: autollm/utils/document_reading.py:120-260
- The reader layer normalizes metadata (URL, page number, heading path) so that the query engine can surface citations; this metadata contract is defined in the document constructors inside each reader module. Source: autollm/utils/pdf_reader.py:1-80 and autollm/utils/markdown_reader.py:1-60
Together, these readers form a pluggable ingestion front-end: adding a new source type means adding one module under autollm/utils/ and registering it in DocumentReader's dispatch logic, with no changes required to the vector store or query engine layers.
AutoFastAPI: One-Line API Deployment
Related topics: AutoQueryEngine: RAG in One Line, Cost Calculation, Callbacks, and Utilities
AutoFastAPI is the deployment surface of the autollm library. After a user builds an AutoQueryEngine over a vector index, AutoFastAPI exposes the same engine as a REST service through a single call (.serve() or serve()), wrapping the engine in a FastAPI application, registering the necessary routes, and returning a ready-to-launch server object. Source: autollm/auto/fastapi_app.py:1-40
Purpose and Scope
The goal of AutoFastAPI is to remove the boilerplate between "I have a working RAG pipeline" and "I have an HTTP endpoint that answers questions." It targets developers who want a local or containerized query API without manually wiring FastAPI routers, request/response schemas, or model wiring. Source: autollm/auto/fastapi_app.py:42-80
Scope of the module:
- Wraps an existing AutoQueryEngine instance in a FastAPI app.
- Registers a /query endpoint that accepts a natural-language question and returns the engine's answer.
- Registers a /query/stream endpoint for streaming responses when the underlying engine supports it.
- Exposes a serve() helper that starts a Uvicorn server bound to a configurable host and port.
- Delegates config defaults to autollm/serve/utils.py so deployment behavior stays consistent with CLI arguments. Source: autollm/serve/utils.py:1-60
How It Works
The deployment flow can be described as a three-stage pipeline: configure, build, serve.
```mermaid
flowchart LR
    A[AutoQueryEngine] --> B[AutoFastAPI]
    C[YAML Config] --> D[serve.utils]
    D --> B
    B --> E[FastAPI App]
    E --> F[Uvicorn Server]
    F --> G["/query & /query/stream"]
```

The AutoFastAPI constructor accepts an AutoQueryEngine plus optional server parameters. It instantiates a FastAPI app, defines Pydantic request/response models, and wires the engine's query (and aquery) methods into async route handlers. Source: autollm/auto/fastapi_app.py:80-140
For developers using a YAML-driven workflow, serve/utils.py parses the configuration file (matching the schema in examples/configs/config.example.yaml) and passes host, port, CORS, and reload flags into the server constructor. Source: autollm/serve/utils.py:60-120 and examples/configs/config.example.yaml:1-40
Endpoints and Schemas
Two endpoints are exposed by default:
- POST /query: accepts a JSON body with a query string, optional top_k, and optional metadata filters; returns the answer plus any source documents.
- POST /query/stream: same input shape, but returns a streaming response (Server-Sent Events) when the engine provides an async generator. Source: autollm/auto/fastapi_app.py:140-200
Request and response models are declared with Pydantic, providing automatic validation and OpenAPI schema generation. This means clients hitting the deployed service receive a self-documenting /docs Swagger UI out of the box. Source: autollm/auto/fastapi_app.py:200-240
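From the client side, a call to the query endpoint is an ordinary JSON POST. The sketch below builds such a request with the standard library without sending it. The helper name `build_query_request` is hypothetical, and the field names `query` and `top_k` are taken from the description above; the authoritative schema is whatever the deployed /docs page reports.

```python
import json
from typing import Optional
from urllib.request import Request


def build_query_request(base_url: str, question: str, top_k: Optional[int] = None) -> Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    payload = {"query": question}
    if top_k is not None:
        # Only include optional fields the caller actually set.
        payload["top_k"] = top_k
    return Request(
        url=f"{base_url.rstrip('/')}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once a server is running, passing the returned object to `urllib.request.urlopen` sends the request; the response body should then be decoded as JSON.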
Launching the Server
The companion serve() helper resolves host/port from CLI args or YAML, then calls uvicorn.run() with the built app. Typical usage:
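```python
from autollm import AutoQueryEngine
from autollm.auto.fastapi_app import AutoFastAPI, serve

# Build the engine first; replace "..." with your own documents and settings.
engine = AutoQueryEngine.from_defaults(...)

# Wrap the engine in a FastAPI app bound to all interfaces on port 8000.
app = AutoFastAPI(engine=engine, host="0.0.0.0", port=8000)
serve(app)  # blocks; launches uvicorn
```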
Source: autollm/auto/fastapi_app.py:240-280 and autollm/serve/docs.py:1-40
Configuration
Configuration is layered: programmatic arguments override YAML, which overrides library defaults. The example file in examples/configs/config.example.yaml documents the recognized keys. Source: examples/configs/config.example.yaml:1-60
| Key | Purpose | Default |
|---|---|---|
| host | Bind address for uvicorn | 127.0.0.1 |
| port | Bind port | 8000 |
| reload | Auto-reload on code change (dev only) | false |
| cors_origins | Allowed CORS origins (list) | ["*"] |
| title | OpenAPI title | "AutoLLM API" |
| version | OpenAPI version | package version |
Source: autollm/serve/utils.py:120-180 and examples/configs/config.example.yaml:20-60
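The layering rule (programmatic arguments over YAML over library defaults) maps naturally onto `collections.ChainMap`. The sketch below is an assumption about how such a merge could look, not the code in serve/utils.py; `resolve_config` is a hypothetical name and the defaults are copied from the table above.

```python
from collections import ChainMap

# Library defaults, as listed in the configuration table.
DEFAULTS = {"host": "127.0.0.1", "port": 8000, "reload": False, "cors_origins": ["*"]}


def resolve_config(yaml_values: dict, overrides: dict) -> dict:
    """Merge settings so that overrides beat YAML, which beats defaults."""
    # Drop unset (None) overrides so they do not mask lower layers.
    explicit = {k: v for k, v in overrides.items() if v is not None}
    # ChainMap looks keys up left to right, so earlier maps take precedence.
    return dict(ChainMap(explicit, yaml_values, DEFAULTS))
```

For example, `resolve_config({"port": 9000}, {"port": 8080})` resolves the port to 8080, while keys missing from both layers fall back to the defaults.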
Integration with the Rest of autollm
AutoFastAPI is the final stage in the typical autollm pipeline:
- Document ingestion via readers (webpage, sitemap, simple directory) produces the documents that populate a vector store.
- Index construction through AutoVectorStoreIndex builds the LanceDB-backed index.
- Query engine creation through AutoQueryEngine wires the LLM and retriever.
- Deployment via AutoFastAPI publishes the engine as an HTTP service. Source: autollm/auto/fastapi_app.py:1-40 and autollm/serve/docs.py:40-80
Because the deployment layer is decoupled from the engine layer, swapping LLMs, embeddings, or vector stores does not require any change to the API surface. Source: autollm/serve/utils.py:1-60
Community Notes
- The "one-line API deployment" framing is reflected in user-facing README usage patterns (e.g., `AutoQueryEngine.from_defaults(...)` followed by `.serve()`), which were updated in v0.1.2 (PR #162).
- Async and streaming behavior has been a recurring area of improvement; v0.1.10 includes an "async method bugfix" (PR #215), which affects users running AutoFastAPI behind async ASGI servers.
- Pre-filtering and LanceDB cloud support landed in v0.1.6 (PRs #186, #187), which is relevant when the deployed query endpoint needs to scope retrievals per request. Source: community release notes (v0.1.2, v0.1.6, v0.1.10).
Practical Tips
- Use reload=true only in development; in production, rely on a process manager and disable reload.
- Pin the package version (e.g., autollm==0.1.10) when deploying to ensure consistent route schemas.
- If CORS is needed for browser clients, restrict cors_origins to known hosts rather than leaving the wildcard in production. Source: autollm/serve/utils.py:180-220
Limitations
- AutoFastAPI exposes only the query surface; ingestion, reindexing, and admin endpoints are not part of this module and must be handled separately if needed.
- Streaming support depends on the underlying engine exposing an async generator; not every retriever/LLM combination supports it. Source: autollm/auto/fastapi_app.py:140-200
- The module ships a single Uvicorn launcher; production deployments behind Gunicorn, behind a reverse proxy, or with TLS termination are out of scope and must be wired by the user.
Cost Calculation, Callbacks, and Utilities
Related topics: AutoLiteLLM: Unified LLM Access (100+ Models), AutoFastAPI: One-Line API Deployment
Overview
The autollm library wraps llama-index primitives with a set of cross-cutting concerns: cost tracking, prompt templates, logging, environment loading, and source-document cloning. These concerns live in three areas of the codebase:
- autollm/callbacks/cost_calculating.py: token and USD cost accounting.
- autollm/auto/service_context.py: the wiring point that registers the cost callback on the active service context.
- autollm/utils/: templates, logging, environment helpers, and git cloning for remote document sources.
Together they let a user answer "how much did this run cost?", "what prompts were sent?", and "where did these documents come from?" without leaving autollm's API surface. The latest release notes (v0.1.10) emphasize async support and prompt-handling fixes, which surface in this module through AutoLiteLLM's system_prompt argument and the async paths inside the callback chain. Source: autollm/callbacks/cost_calculating.py:1-40
The Cost-Calculating Callback
cost_calculating.py defines a llama-index CallbackManager handler that records token usage and computes USD cost per LLM call. The handler hooks into on_event_start / on_event_end for LLMCompletionEvent (and equivalent async events), reads prompt_tokens and completion_tokens from the payload, and multiplies them by model-specific rates.
Key design points:
- Per-call accumulation. Each event stores the prompt/completion counts and the resulting dollar amount into a running total that is keyed by the event payload. This lets a single query that fans out to multiple LLM calls (retrieval-augmented generation, multi-step agents) report a cumulative cost.
- Async parity. Following the v0.1.10 bugfix ("bugfix async method"), the callback implements both sync and async hooks so cost tracking works with `await query_engine.aquery(...)`. Source: autollm/callbacks/cost_calculating.py:40-120
- Pricing source. Rates are read from a lightweight local table that mirrors the upstream LiteLLM pricing model; when a model is missing, the handler logs a warning via the logging utility and falls back to zero cost rather than raising. Source: autollm/utils/logging.py:1-40
- Reset semantics. Consumers call a reset() method between independent runs so totals from one query do not leak into the next.
To retrieve totals after a query, callers access the handler through the active ServiceContext and call total_cost, total_tokens, or a breakdown by model.
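The accumulation, zero-cost fallback, and reset behavior described above can be sketched without any llama-index dependency. Everything here is illustrative: `CostTracker` and `record` are hypothetical names, and the rates in `RATES` are made-up numbers, not real pricing for any model.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("autollm")

# Illustrative USD rates per 1,000 tokens; NOT real pricing (assumption).
RATES = {"example-model": {"prompt": 0.001, "completion": 0.002}}


@dataclass
class CostTracker:
    total_tokens: int = 0
    total_cost: float = 0.0
    by_model: dict = field(default_factory=dict)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        """Add one LLM call to the running totals and return its cost."""
        rate = RATES.get(model)
        if rate is None:
            # Unknown model: warn and count the call as free instead of raising.
            logger.warning("No pricing for %s; counting cost as 0", model)
            cost = 0.0
        else:
            cost = (prompt_tokens * rate["prompt"]
                    + completion_tokens * rate["completion"]) / 1000
        self.total_tokens += prompt_tokens + completion_tokens
        self.total_cost += cost
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        return cost

    def reset(self) -> None:
        """Clear totals so one run does not leak into the next."""
        self.total_tokens = 0
        self.total_cost = 0.0
        self.by_model.clear()
```

In a real callback handler, `record` would be invoked from the event-end hook with the token counts read from the event payload.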
Service Context Wiring
auto/service_context.py exposes a factory that builds a llama-index ServiceContext pre-configured with AutoLiteLLM, the cost callback, and the standard tokenizer. The factory accepts an AutoConfig (or equivalent kwargs) and:
- Constructs the AutoLiteLLM instance, including the system_prompt argument added in v0.1.10, and wraps it for llama-index. Source: autollm/auto/service_context.py:30-90
- Attaches the cost handler from cost_calculating.py to a CallbackManager.
- Returns the composed ServiceContext so downstream components (AutoQueryEngine, AutoVectorStoreIndex) inherit cost tracking transparently.
Because the callback is registered at service-context construction, every downstream query, aquery, chat, and achat call is automatically metered. The user does not need to manually install the handler. Source: autollm/auto/service_context.py:90-150
| Component | Module | Responsibility |
|---|---|---|
| AutoLiteLLM | auto/llm/ | Wraps LiteLLM with system_prompt and unified sync/async surface |
| Cost handler | callbacks/cost_calculating.py | Counts tokens, computes USD |
| CallbackManager | llama-index | Dispatches events to handlers |
| ServiceContext | llama-index | Carries LLM, embed model, callback manager |
Utility Modules
Templates — `utils/templates.py`
Holds the default system prompt, refine prompt, question-answer prompt, and a small registry that maps logical names (e.g. "qa", "refine", "system") to PromptTemplate objects. AutoQueryEngine.from_defaults looks templates up here when the caller does not pass a custom prompt, which addresses the v0.1.4 bugfix "AutoQueryEngine bug causing not use of qa_prompt_template its given". Source: autollm/utils/templates.py:1-80
Logging — `utils/logging.py`
Configures a named logger (autollm) with a leveled formatter and integrates with tqdm so that progress bars (e.g. the document-ingestion progress display added in v0.1.3) are not corrupted by log output. Cost-related warnings, embedding progress, and retrieval timing all flow through this logger. Source: autollm/utils/logging.py:1-60
Environment — `utils/env_utils.py`
Loads .env files, validates required variables (e.g. OPENAI_API_KEY, vector-store credentials), and exposes a single get_env(key, default=None, required=False) helper. Called early by the service-context factory so missing keys fail fast. Source: autollm/utils/env_utils.py:1-70
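A helper with the signature quoted above could look like the sketch below. It is an assumption about the behavior, written against `os.environ` only: loading the .env file itself is left out, and the choice of `RuntimeError` for a missing required key is illustrative rather than taken from the library.

```python
import os


def get_env(key, default=None, required=False):
    """Read an environment variable, optionally failing fast when it is unset."""
    value = os.environ.get(key, default)
    if required and value in (None, ""):
        # Fail early so a missing credential surfaces before any LLM call.
        raise RuntimeError(f"Required environment variable {key!r} is not set")
    return value
```

Calling `get_env("OPENAI_API_KEY", required=True)` at startup turns a late authentication failure into an immediate, clearly worded error.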
Git — `utils/git_utils.py`
Clones a remote git repository into a temporary directory so it can be ingested as a document source. The helper accepts a URL, an optional branch/tag, and a depth, returning a local path consumable by SimpleDirectoryReader. This is what powers the "GitHub repository as a document source" usage pattern highlighted in the README. Source: autollm/utils/git_utils.py:1-90
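The clone step can be sketched as a thin wrapper around the `git` command line. The function names `build_clone_command` and `clone_to_temp` are hypothetical and do not claim to match git_utils.py; the sketch assumes `git` is installed and on the PATH.

```python
import subprocess
import tempfile
from typing import Optional


def build_clone_command(url: str, dest: str, branch: Optional[str] = None, depth: int = 1) -> list:
    """Assemble the argument list for a shallow `git clone`."""
    cmd = ["git", "clone", "--depth", str(depth)]
    if branch:
        # --branch accepts either a branch name or a tag.
        cmd += ["--branch", branch]
    return cmd + [url, dest]


def clone_to_temp(url: str, branch: Optional[str] = None, depth: int = 1) -> str:
    """Clone `url` into a fresh temporary directory and return its path."""
    dest = tempfile.mkdtemp(prefix="autollm-repo-")
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(build_clone_command(url, dest, branch, depth), check=True)
    return dest
```

Separating command construction from execution keeps the argument logic testable without network access; the returned path can then be passed to a directory reader.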
End-to-End Flow
```mermaid
flowchart LR
    A[AutoConfig + .env] --> B[env_utils]
    B --> C[ServiceContext factory]
    D[templates.py] --> C
    E[AutoLiteLLM with system_prompt] --> C
    F[cost_calculating handler] --> C
    C --> G[AutoQueryEngine]
    G --> H[LLM call events]
    H --> F
    F --> I[total_cost, total_tokens]
    J[git_utils] --> K[Documents]
    K --> G
```

Source: autollm/auto/service_context.py:1-30, autollm/callbacks/cost_calculating.py:1-40, autollm/utils/templates.py:1-40, autollm/utils/env_utils.py:1-40, autollm/utils/git_utils.py:1-40, autollm/utils/logging.py:1-30
Practical Notes
- Always call handler.reset() between independent experiments to avoid cross-contamination of token counts.
- Missing-model cost warnings are emitted through autollm.utils.logging; enable the autollm logger at INFO to see them.
- Custom prompts should be passed positionally or by name to AutoQueryEngine.from_defaults; the templates module acts as the fallback registry rather than the primary API.
- For remote repositories, prefer `git_utils.clone_repository(url, branch=..., depth=1)` to keep clones small.
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.
1. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/viddexa/autollm
2. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/viddexa/autollm
3. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/viddexa/autollm
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/viddexa/autollm
5. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/viddexa/autollm
6. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/viddexa/autollm
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using autollm with real data or production workflows.
- v0.1.10 - github / github_release
- v0.1.9 - github / github_release
- v0.1.8 - github / github_release
- v0.1.7 - github / github_release
- v0.1.6 - github / github_release
- v0.1.5 - github / github_release
- v0.1.4 - github / github_release
- v0.1.3 - github / github_release
- v0.1.2 - github / github_release
- v0.1.1 - github / github_release
- Capability evidence risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence