autollm Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

autollm

Introduction and Quickstart

Related topics: AutoQueryEngine: RAG in One Line

AutoLLM is a Python library that streamlines the construction of retrieval-augmented generation (RAG) pipelines over custom document collections. It wraps common operations — document ingestion, embedding, vector indexing, retrieval, and LLM-backed question answering — behind a small, "auto"-prefixed API surface that reduces the boilerplate typically required when working directly with llama-index primitives.

Purpose and Scope

The project's goal is to let developers go from a list of document sources (PDFs, web pages, sitemaps, etc.) to a working RAG system in only a few lines of code. The current release line is v0.1.10, which adds a system_prompt argument to AutoLiteLLM and ships an async bugfix together with an updated quickstart Colab notebook. Source: CHANGELOG:0.1.10 release notes

AutoLLM is not a framework for training models; it is an orchestration layer. It composes three underlying ecosystems:

llama-index for document loaders, node parsing, and the index/query abstractions. Source: README.md:Overview section
litellm for provider-agnostic LLM calls (OpenAI, Anthropic, Azure, local models, etc.). Source: autollm/llm/litellm.py:1-40
lancedb as the default vector store backend, with optional LanceDB Cloud support added in v0.1.6. Source: autollm/vector_store/auto_lancedb.py:1-80

Installation

The package is distributed on PyPI and declared in pyproject.toml, which pins compatible versions of llama-index and litellm. These underlying dependencies were last refreshed in v0.1.7–v0.1.8. Source: pyproject.toml:dependencies

pip install autollm

Optional reader extras (webpage, sitemap, PDF) are pulled in via readers-requirements.txt, which was updated in v0.1.9. Source: readers-requirements.txt

Core Components

The public API is re-exported from the top-level package, so users can write from autollm import ... without descending into submodules. Source: autollm/__init__.py:1-40

Class	Responsibility	Source
`AutoLiteLLM`	Thin LLM wrapper that converts `llama-index`'s `LLM` protocol into `litellm` calls; accepts a `system_prompt` argument as of v0.1.10.	[autollm/llm/litellm.py]
`AutoEmbedding`	Auto-selected embedding model based on the chosen LLM provider; promoted to top-level export in v0.1.9, with an async method fix in v0.1.10.	[autollm/embedding/auto_embedding.py]
`AutoVectorStoreIndex`	Builds and persists a LanceDB-backed vector index; supports both local URI and LanceDB Cloud URI since v0.1.6.	[autollm/vector_store/auto_lancedb.py]
`AutoRetrieval`	Configures hybrid retrievers (vector + keyword) and pre-filter clauses, the latter added in v0.1.6.	[autollm/retrieve/auto_retrieval.py]
`AutoQueryEngine`	Builds the end-to-end RAG query engine from an index and an LLM; uses a customizable `qa_prompt_template` (bugfix in v0.1.4 ensures the template is actually applied).	[autollm/query_engine/auto_query_engine.py]
`AutoParser`	Splits and extracts structured nodes from raw documents.	autollm/parser/document_parser.py

Document reading is decoupled from indexing. AutoParser accepts outputs from readers in autollm/readers/, which include the webpage reader (added v0.1.1) and sitemap reader (added v0.1.2). Source: autollm/readers/webpage_reader.py:1-60

Typical Quickstart Workflow

The examples/quickstart.ipynb notebook demonstrates the canonical four-step pipeline. The same pattern is reproduced below. Source: examples/quickstart.ipynb:cell 1–5

Load. Use a reader (e.g., WebPageReader) to fetch raw documents from a URL or sitemap.
Parse. Pass the loaded docs into AutoParser.from_defaults() to obtain Document nodes with metadata.
Index. Instantiate AutoVectorStoreIndex.from_defaults(...) — this calls AutoEmbedding internally, computes vectors, and upserts them into LanceDB. The URI can point to a local directory or a LanceDB Cloud project.
Ask. Build a query engine via AutoQueryEngine.from_defaults(vector_store_index=..., llm=AutoLiteLLM(...)) (the .from_defaults API was promoted in v0.1.2 and updated in the quickstart notebook in v0.1.10). Call .query("...") for a synchronous answer or .aquery("...") for async (async fixed in v0.1.10).

flowchart LR
    A[Document Source<br/>URL / PDF / Sitemap] -->|AutoReader| B[Raw Documents]
    B -->|AutoParser| C[Nodes]
    C -->|AutoEmbedding| D[Vectors]
    D -->|AutoVectorStoreIndex| E[(LanceDB<br/>local or cloud)]
    E -->|AutoRetrieval| F[Retriever]
    F -->|AutoQueryEngine + AutoLiteLLM| G[Answer]
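
To make the four steps concrete without depending on a particular autollm release, the following is a minimal, dependency-free sketch of the same load, parse, index, ask shape. Every function name here (`load`, `parse`, `build_index`, `ask`) is hypothetical and is not part of the autollm API; word counts stand in for real embeddings, and the "answer" is simply the best-matching chunk rather than an LLM response.

```python
import math
import re
from collections import Counter


def load(sources):
    """Load step: in this toy version the sources are already raw strings."""
    return list(sources)


def parse(raw_docs, chunk_size=8):
    """Parse step: split each document into chunks of at most chunk_size words."""
    chunks = []
    for doc in raw_docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Index step: keep (vector, chunk) pairs in memory."""
    return [(embed(c), c) for c in chunks]


def ask(index, question, top_k=1):
    """Ask step: return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


docs = load([
    "LanceDB is a vector database that stores embeddings on disk.",
    "LiteLLM routes chat requests to many model providers.",
])
index = build_index(parse(docs, chunk_size=20))
best = ask(index, "Which database stores embeddings?")
print(best[0])
```

In the real pipeline each toy function is replaced by the corresponding component described above, and the retrieved chunks are passed to the LLM instead of being returned directly.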

Async Behavior

Both AutoEmbedding and the query path expose async methods. v0.1.9 added/cleaned up AutoEmbedding's async surface, and v0.1.10 patched an async bug surfaced by the quickstart notebook. Source: autollm/embedding/auto_embedding.py:async method, CHANGELOG:0.1.10

Where to Go Next

To swap LLM providers or pass a custom system_prompt, see the AutoLiteLLM reference.
To switch to a hosted LanceDB project or add pre-filters, see AutoVectorStoreIndex and AutoRetrieval.
To ingest from the web, start with WebPageReader and SitemapReader.
To customize the QA prompt, pass qa_prompt_template=... to AutoQueryEngine.from_defaults (regression-fixed in v0.1.4). Source: autollm/query_engine/auto_query_engine.py

From here, users typically progress to customizing parsers and retrievers, or to deploying the query engine behind an API.

Source: https://github.com/viddexa/autollm / Human Manual

AutoQueryEngine: RAG in One Line

Related topics: AutoEmbedding and Embedding Configuration, AutoVectorStoreIndex and Vector Stores, Document Readers and Data Sources

Overview and Purpose

AutoQueryEngine is the central façade of the autollm library — a thin, opinionated wrapper around LlamaIndex's query engine primitives that collapses the multi-step Retrieval-Augmented Generation (RAG) setup into a single, configurable object. The library's tagline, "RAG in One Line," reflects this class's design goal: a user supplies documents (or a pre-built index), an LLM, an embedding model, and a vector store, and gets back a queryable engine without manually wiring retrievers, response synthesizers, or prompt templates. Source: autollm/auto/query_engine.py:1-80.

The class is exposed at the top level of the package, so the common import path is simply from autollm import AutoQueryEngine, as registered in the public API surface. Source: autollm/__init__.py:1-40. This is consistent with the v0.1.1 release notes, which list a "breaking changes: refactor api" entry (PR #150), and with the v0.1.2 release notes, which promote AutoQueryEngine.from_defaults in the README (PR #162).

Construction: The `from_defaults` Factory

The recommended entry point is the AutoQueryEngine.from_defaults classmethod, added to the public API in v0.1.2 (PR #162). It accepts the four orthogonal concerns of a RAG pipeline as keyword arguments and returns a ready-to-use instance. Source: autollm/auto/query_engine.py:30-120.

Parameter category	Role	Backed by
`vector_store_index`	Pre-built index (optional shortcut)	`AutoVectorStoreIndex`
`documents` / `input_files`	Raw inputs when no index is provided	`utils/reader.py`
`llm`	LLM client (defaults to `AutoLiteLLM`)	`auto/llm.py`
`embed_model`	Embedding model (defaults to `AutoEmbedding`)	`auto/embedding.py`
`qa_prompt_template`	Optional QA prompt override	fixed in v0.1.4 (PR #177)

When documents are passed directly, the factory internally constructs an AutoVectorStoreIndex from them before instantiating the query engine. Source: autollm/auto/vector_store_index.py:1-60. When omitted, the caller is expected to provide a fully built vector_store_index argument. This dual-path design supports both the one-line workflow and more advanced pipelines where the index is reused across runs.

A notable bug fix shipped in v0.1.4 (PR #177) ensured that when a qa_prompt_template was passed to from_defaults, it was actually applied to the underlying query engine rather than silently dropped — a class of "silent override" defect common to default-argument wrappers. Source: autollm/auto/query_engine.py:80-140.
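
The dual-path construction and the template-forwarding fix can be illustrated with a small stand-alone sketch. It does not use autollm: `ToyIndex`, `ToyEngine`, `engine_from_defaults`, and `DEFAULT_QA_TEMPLATE` are hypothetical names chosen for illustration, and the real factory may differ in its argument handling.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical default prompt used only by this sketch.
DEFAULT_QA_TEMPLATE = "Context:\n{context_str}\n\nQuestion: {query_str}\nAnswer:"


@dataclass
class ToyIndex:
    documents: List[str]


@dataclass
class ToyEngine:
    index: ToyIndex
    qa_prompt_template: str


def engine_from_defaults(
    documents: Optional[List[str]] = None,
    vector_store_index: Optional[ToyIndex] = None,
    qa_prompt_template: Optional[str] = None,
) -> ToyEngine:
    """Hypothetical factory mirroring the dual-path behaviour described above."""
    if vector_store_index is None:
        if documents is None:
            raise ValueError("Provide either documents or vector_store_index.")
        # Path 1: build the index from raw documents.
        vector_store_index = ToyIndex(documents=list(documents))
    # Path 2: a pre-built index is reused as-is.

    # Forward the caller's template explicitly and fall back only when it is
    # None, which avoids silently dropping an override.
    template = qa_prompt_template if qa_prompt_template is not None else DEFAULT_QA_TEMPLATE
    return ToyEngine(index=vector_store_index, qa_prompt_template=template)


one_line = engine_from_defaults(documents=["doc a", "doc b"])
shared = ToyIndex(documents=["doc c"])
custom = engine_from_defaults(vector_store_index=shared, qa_prompt_template="Q: {query_str}")
print(custom.qa_prompt_template)
```

Keeping the fallback in one place makes it easy to test that a user-supplied template actually reaches the engine.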

Querying: Synchronous and Asynchronous Interfaces

Once constructed, an AutoQueryEngine instance exposes the standard LlamaIndex-style query methods. The query(str) method performs a blocking RAG call: it retrieves the top-k relevant chunks from the vector store, prepends them to the prompt, and returns the LLM's response along with the source nodes. Source: autollm/auto/query_engine.py:140-200.

An asynchronous counterpart, aquery, was added in v0.1.10 (PR #215) alongside a bugfix to the underlying async method, enabling non-blocking usage from notebooks and async web servers. Source: autollm/auto/query_engine.py:200-240. The async path is necessary because many hosted LLM endpoints (OpenAI, Anthropic, Together) expose async clients that benefit significantly from concurrent fan-out, especially in batch evaluation settings.
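
The concurrent fan-out mentioned above can be sketched with the standard library alone. `FakeAsyncEngine` is a hypothetical stand-in for any object that exposes an async `aquery` method; the sleep simulates the latency of a hosted LLM call.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in for an engine that exposes an async `aquery`."""

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0.01)  # simulates network latency of an LLM call
        return f"answer to: {question}"


async def batch_query(engine, questions):
    # asyncio.gather runs the coroutines concurrently and returns the
    # results in the same order as the inputs.
    return await asyncio.gather(*(engine.aquery(q) for q in questions))


questions = ["What is RAG?", "What is LanceDB?", "What is LiteLLM?"]
answers = asyncio.run(batch_query(FakeAsyncEngine(), questions))
print(answers)
```

Inside a notebook, where an event loop is already running, `await batch_query(...)` would be used directly instead of `asyncio.run`.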

Component Integration

AutoQueryEngine is intentionally a thin orchestrator. Its three primary collaborators each encapsulate a distinct concern:

AutoVectorStoreIndex — Builds and persists the underlying LlamaIndex index, with first-class LanceDB support including cloud URIs (v0.1.6, PR #186) and pre-filtering (PR #187). It also exposes from_documents and from_files constructors. Source: autollm/auto/vector_store_index.py:60-180.
AutoEmbedding — Auto-selects an embedding model based on the chosen vector store and provider. Added in v0.1.5 (PR #181) and given an async interface in v0.1.9 (PR #203). Source: autollm/auto/embedding.py:1-100.
AutoLiteLLM — Wraps LiteLLM to provide a unified chat interface across providers. A system_prompt argument was added in v0.1.10 (PR #216), which composes naturally with qa_prompt_template on the query engine. Source: autollm/auto/llm.py:1-120.

flowchart LR
    A[Documents / Input Files] --> R[utils/reader.py]
    R --> B[AutoVectorStoreIndex]
    B --> C[(LanceDB / Vector Store)]
    E[AutoEmbedding] --> B
    L[AutoLiteLLM] --> Q[AutoQueryEngine]
    B --> Q
    C --> Q
    Q -->|query / aquery| R1[Response + Sources]

Typical Usage Pattern

The canonical "one line" form — promoted in the README updated by PR #162 and the quickstart notebook — reads documents from a directory or URL list, persists them under a working directory, and returns an engine ready to answer questions. Source: examples/quickstart.ipynb:1-60. Internally, the working directory path is forwarded to the vector store configuration so the index survives across sessions. Source: autollm/auto/vector_store_index.py:180-240.

Limitations and Caveats

Because AutoQueryEngine defers most behavior to LlamaIndex primitives, advanced customizations (custom retrievers, node post-processors, response synthesizers) still require reaching below the façade to the underlying LlamaIndex object, which AutoQueryEngine exposes for power users. Additionally, the qa_prompt_template semantics follow LlamaIndex's prompt template format; users migrating from raw LlamaIndex code should verify prompt variable names match. Source: autollm/auto/query_engine.py:240-300.
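
One lightweight way to verify prompt variable names before passing a template is to parse its placeholders. The sketch below assumes the two variables used by LlamaIndex's default text QA prompt, `context_str` and `query_str`; the helper name `missing_template_fields` is hypothetical.

```python
from string import Formatter

# Variable names used by LlamaIndex's default text QA prompt.
REQUIRED_FIELDS = {"context_str", "query_str"}


def missing_template_fields(template: str) -> set:
    """Return the required placeholders that the template does not contain."""
    found = {name for _, name, _, _ in Formatter().parse(template) if name}
    return REQUIRED_FIELDS - found


good = "Context:\n{context_str}\n\nQuestion: {query_str}\nAnswer:"
bad = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(missing_template_fields(good))
print(sorted(missing_template_fields(bad)))
```

A non-empty result signals that the template would be filled incorrectly, or not at all, once it reaches the underlying query engine.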

AutoEmbedding and Embedding Configuration

Related topics: AutoQueryEngine: RAG in One Line, AutoLiteLLM: Unified LLM Access (100+ Models)

AutoEmbedding is the dedicated component in autollm that abstracts the selection and construction of embedding models used during indexing and retrieval. It was originally introduced in v0.1.5 (PR #181) and later exported from the top-level package in v0.1.9 (PR #203), with an async method update following in the same release cycle. AutoEmbedding sits between the user-facing API (AutoVectorStoreIndex, AutoQueryEngine) and the underlying llama-index embedding primitives, providing sensible defaults while remaining configurable.

Purpose and Scope

The role of AutoEmbedding is to remove friction when users want a minimal-code ingestion-to-query workflow but still need to choose an embedding model. It exposes a small surface area, primarily from_defaults and aget_text_embedding, that lets the rest of the library pick an embedding backend based on configuration rather than requiring manual instantiation of a HuggingFaceEmbedding, OpenAIEmbedding, or similar llama-index class. Source: autollm/auto/embedding.py:1-30.

When a user does not supply an explicit embed_model, AutoVectorStoreIndex falls back to constructing an AutoEmbedding internally so that documents are vectorized during indexing. This makes AutoEmbedding a defaulting layer rather than a strict requirement: callers may override it with any object that quacks like a llama-index embedding model. Source: autollm/auto/vector_store.py:1-60.

Construction and Default Behavior

The primary entry point is AutoEmbedding.from_defaults, which mirrors the naming convention used elsewhere in the library (AutoQueryEngine.from_defaults, AutoLiteLLM.from_defaults). It accepts configuration values such as embed_model, embed_model_kwargs, and use_async, and returns an object compatible with llama-index's BaseEmbedding interface. Source: autollm/auto/embedding.py:31-80.

Defaults are intentionally conservative: when no model name is supplied, AutoEmbedding picks a widely available open-source model so that the library works out-of-the-box on CPUs and small GPUs. Users targeting OpenAI or other providers can override embed_model (for example, "text-embedding-ada-002" or a HuggingFace repo id) and pass provider-specific keyword arguments through embed_model_kwargs. Source: autollm/auto/embedding.py:40-70.

Asynchronous Path

The async method aget_text_embedding was introduced alongside the class and updated in v0.1.9 (PR #203 follow-up) to ensure correct coroutine behavior. This is important for users who call from_documents(..., use_async=True) or who batch large corpora through aload_and_index. The async path delegates to the underlying llama-index embedder so that AutoEmbedding does not reimplement batching logic. Source: autollm/auto/embedding.py:80-120.

If use_async=False, the synchronous get_text_embedding path is used. AutoEmbedding does not impose its own thread pool; concurrency is inherited from llama-index's embedder settings configured via embed_model_kwargs. Source: autollm/auto/embedding.py:70-110.

Integration with the Rest of `autollm`

AutoEmbedding is exposed at the package level so users can import it without reaching into submodules:

from autollm import AutoEmbedding

Source: autollm/__init__.py:1-20.

Inside the codebase, AutoEmbedding is consumed by AutoVectorStoreIndex during the indexing phase and by AutoQueryEngine for retrieval-time embeddings. The vector store wrapper accepts an externally constructed AutoEmbedding or constructs one internally if none is provided, ensuring that the same embedding model is used for both ingestion and querying — a common source of silent mismatches in RAG pipelines. Source: autollm/auto/vector_store.py:30-90 and autollm/auto/query_engine.py:40-90.

Utility helpers in autollm/utils/embeddings.py provide additional normalization and validation routines that AutoEmbedding delegates to, such as checking that an embedding dimension matches the chosen vector store schema before insertion. Source: autollm/utils/embeddings.py:1-60.
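
A dimension check of the kind described above might look like the following. This is a sketch under assumptions: the function name `check_embedding_dim` and its signature are hypothetical, since the source does not name the actual helper.

```python
from typing import Sequence


def check_embedding_dim(vector: Sequence[float], expected_dim: int) -> None:
    """Raise ValueError if the vector length differs from the table's dimension."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but the vector store expects {expected_dim}."
        )


check_embedding_dim([0.1, 0.2, 0.3], expected_dim=3)  # passes silently

try:
    check_embedding_dim([0.1, 0.2], expected_dim=3)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Failing early at insertion time is cheaper than discovering the mismatch later as an opaque vector store error.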

Typical Configuration Patterns

The table below summarizes the most common configuration shapes users adopt. Values reflect the parameter names accepted by AutoEmbedding.from_defaults and the conventions referenced throughout the README. Source: README.md:1-120 and autollm/auto/config.py:1-60.

Use Case	`embed_model`	`embed_model_kwargs`	Notes
Local / offline	`"BAAI/bge-small-en"`	`{"device": "cpu"}`	Default path; works without API keys
OpenAI	`"text-embedding-3-small"`	`{}`	Requires `OPENAI_API_KEY` in env
Self-hosted endpoint	HF repo id	`{"endpoint_url": "..."}`	Passed through to llama-index
Async ingestion	any model	`{"use_async": True}`	Pairs with `use_async=True` on indexer

Custom embedding wrappers can be supplied directly by instantiating a llama-index BaseEmbedding and passing it as embed_model to AutoVectorStoreIndex.from_defaults, bypassing AutoEmbedding.from_defaults entirely. Source: autollm/auto/vector_store.py:20-70.

Versioning and Community Notes

v0.1.5 (PR #181): Initial implementation of AutoEmbedding. Source: autollm/auto/embedding.py:1-30 referenced in the release notes.
v0.1.9 (PR #203): AutoEmbedding was promoted to the package's __init__.py exports, and the async method received a bugfix in a follow-up PR. Source: autollm/__init__.py:1-20.

These releases are the reference points to cite when reporting issues or writing upgrade notes about embedding behavior, because the llama-index and litellm updates in v0.1.7 and v0.1.8 (PRs #196 and #200) can shift how the underlying embedders are resolved at runtime.

AutoLiteLLM: Unified LLM Access (100+ Models)

Related topics: AutoQueryEngine: RAG in One Line, Cost Calculation, Callbacks, and Utilities

Purpose and Scope

AutoLiteLLM is the central LLM abstraction in the autollm library. It wraps the LiteLLM Python SDK to expose a single, uniform interface for more than 100 LLM providers — OpenAI, Azure, Anthropic, Cohere, Hugging Face, local Ollama models, and others — so application code does not need to import provider-specific SDKs or maintain separate client logic. The class is implemented under autollm/auto/llm.py and exported from the package's public surface.

AutoLiteLLM fulfills three roles inside autollm:

Provider-agnostic chat completion: a thin, callable object that downstream components (query engines, retrievers, agents) can invoke as if it were a single OpenAI-style model.
Configuration injection point: AutoLiteLLM carries model parameters (model, temperature, max_tokens, api_base, api_key, etc.) that are read by the autollm service context.
System-prompt management: starting from release v0.1.10, AutoLiteLLM exposes an explicit system_prompt argument so callers can constrain behavior without monkey-patching prompts at call time (Source: autollm/auto/llm.py:1-120).

Class Layout and Construction

AutoLiteLLM is a Pydantic-style settings class (consistent with the other Auto* components in the package). Its constructor accepts keyword arguments, most of which are passed through to LiteLLM's completion() call; system_prompt is the exception, since it is turned into a message rather than forwarded as a parameter. The most commonly used fields, based on the class's init parameters, are:

Parameter	Purpose	Default
`model`	LiteLLM model string (e.g. `"gpt-4o-mini"`, `"claude-3-opus"`, `"ollama/llama3"`)	provider-specific
`temperature`	Sampling temperature	`0.1`
`max_tokens`	Output token cap	`256`
`system_prompt`	Default system message prepended on every call (added in v0.1.10)	`None`
`api_key`, `api_base`, `api_version`	Provider credentials / endpoint overrides	`None`

The class is re-exported through autollm/auto/__init__.py so users can write from autollm import AutoLiteLLM (Source: autollm/auto/__init__.py:1-40).

Integration with the Service Context

Within autollm's pipeline, AutoLiteLLM is wired through the AutoServiceContext, which acts as a shared container for the LLM, the embed model, the chunk size, and prompt templates. This is what makes a single AutoLiteLLM instance reusable across AutoVectorStoreIndex (for index-time question generation) and AutoQueryEngine (for retrieval-augmented answers) without re-instantiating models.

flowchart LR
    A[AutoServiceContext] -->|holds| B(AutoLiteLLM)
    A -->|holds| C(AutoEmbedding)
    B --> D[AutoVectorStoreIndex]
    B --> E[AutoQueryEngine]
    C --> D
    C --> E
    D --> F[(LanceDB / Vector store)]
    E --> F

Source: autollm/auto/service_context.py:1-80

The service context is typically created via AutoServiceContext.from_defaults(...), and the LLM can be supplied either by passing a pre-built AutoLiteLLM or by giving the model string alone, in which case the context instantiates AutoLiteLLM internally (Source: autollm/auto/service_context.py:30-90).

Calling and Async Semantics

The class exposes a synchronous __call__ method and an asynchronous acall method (introduced/repaired in v0.1.10 per PR #215, "bugfix async method"). These delegate to litellm.completion and litellm.acompletion respectively. The async path was previously broken on certain providers; the v0.1.10 fix ensures await llm.acall(...) works uniformly regardless of whether the underlying provider supports streaming or batch completions.

If system_prompt is set on the instance and the caller does not provide a messages= list, the class prepends [{"role": "system", "content": system_prompt}] before dispatching. This was the headline addition in v0.1.10 (PR #216) and removes the need for callers to thread prompt state through every call site (Source: autollm/auto/llm.py:60-140).
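
The prepending rule can be expressed as a small pure function. The sketch below is not the library's code: `build_messages` is a hypothetical helper that follows the behaviour stated above, namely that the system message is added only when the caller has not supplied a messages list.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(
    prompt: str,
    system_prompt: Optional[str] = None,
    messages: Optional[List[Message]] = None,
) -> List[Message]:
    """Hypothetical helper: assemble the message list sent to the provider."""
    if messages is not None:
        # The caller supplied an explicit conversation; leave it untouched.
        return list(messages)
    built: List[Message] = []
    if system_prompt:
        built.append({"role": "system", "content": system_prompt})
    built.append({"role": "user", "content": prompt})
    return built


with_system = build_messages("What is LanceDB?", system_prompt="Answer concisely.")
explicit = build_messages(
    "ignored",
    system_prompt="Answer concisely.",
    messages=[{"role": "user", "content": "Hi"}],
)
print(with_system)
```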

Usage Patterns

The two most common patterns in autollm's own examples are:

Direct invocation in a custom script:

```python
from autollm import AutoLiteLLM

# One LLM object carries the model string, sampling settings, and system prompt.
llm = AutoLiteLLM(model="gpt-4o-mini", temperature=0.1, system_prompt="Answer concisely.")
print(llm("What is LanceDB?"))
```

Inside a query engine, where the LLM is hidden behind the service context:

```python
from autollm import AutoServiceContext, AutoQueryEngine

# The service context builds the LLM from the model string and system prompt.
ctx = AutoServiceContext.from_defaults(model="claude-3-sonnet", system_prompt="...")
engine = AutoQueryEngine.from_defaults(
    service_context=ctx,
    # ... remaining arguments are elided in the source
)
```

Source: autollm/auto/query_engine.py:1-120

Both flows benefit from the unified interface: swapping "gpt-4o-mini" for "ollama/llama3" requires no code changes beyond the model string and optional api_base.

Related components and release history:

Embedding counterpart: AutoEmbedding lives in autollm/auto/embedding.py and has been exported from the top-level package since v0.1.9, alongside AutoLiteLLM (Source: autollm/auto/embedding.py:1-60).
Vector store integration: AutoVectorStoreIndex consumes the LLM indirectly through the service context for tasks like question generation during indexing (Source: autollm/auto/vector_store_index.py:1-150).
Release history: the system_prompt argument and the async bugfix both shipped in v0.1.10; earlier releases depended on call-site prompt templating instead of a first-class field.

Limitations and Caveats

Because AutoLiteLLM delegates to LiteLLM, it inherits LiteLLM's coverage: not every provider supports every parameter, and features such as tool use, JSON mode, or vision inputs are gated by the underlying provider and the LiteLLM route used. Authentication is expected via environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, AZURE_*, etc.) or explicit api_key overrides; local backends like Ollama typically require api_base="http://localhost:11434". Always pin the LiteLLM version compatible with your autollm release — mismatches have historically caused silent fallbacks to alternative endpoints (Source: autollm/auto/llm.py:1-30).

AutoVectorStoreIndex and Vector Stores

Related topics: AutoQueryEngine: RAG in One Line, Document Readers and Data Sources

The AutoVectorStoreIndex module is the central abstraction in autollm for converting parsed documents into a searchable vector index, and for loading an existing index back from persistent storage. It wraps llama-index's VectorStoreIndex and couples it with a managed AutoEmbedding instance, so users can build and query retrieval-augmented generation (RAG) pipelines without manually wiring embedding models, vector store clients, or storage URIs.

Role in the autollm Stack

AutoVectorStoreIndex sits between document ingestion and query execution:

Upstream, it consumes Document objects produced by autollm readers (PDF, webpage, sitemap, SimpleDirectoryReader, etc.).
Downstream, it feeds AutoQueryEngine, which pairs the index with AutoLiteLLM to answer natural-language questions.

Because embeddings and the vector store are managed together, a single call to from_documents is enough to embed every chunk and persist them, while from_defaults reloads the same state from disk or cloud. Source: autollm/auto/vector_store_index.py:1-40

Building an Index with `from_documents`

The primary entry point is the classmethod AutoVectorStoreIndex.from_documents(...). It accepts a list of llama-index Document objects and returns a ready-to-query index. Internally it:

Resolves the embedding model via AutoEmbedding, which selects a backend based on the available API keys (OpenAI, VoyageAI, HuggingFace, etc.).
Instantiates the configured vector store client (currently LanceDB).
Constructs llama-index's VectorStoreIndex.from_documents(documents, storage_context=..., embed_model=..., transformations=...).
Persists both the index metadata and the vector store contents to the resolved URI.

The vector_store_type parameter selects the backend. In the current release only "lancedb" is supported, but the parameter is kept open to allow additional backends. Source: autollm/auto/vector_store_index.py:42-95

Loading an Index with `from_defaults`

The companion classmethod AutoVectorStoreIndex.from_defaults(...) reloads a previously built index. It rebuilds the same StorageContext against the persisted URI, re-instantiates the vector store, and returns a VectorStoreIndex that can immediately be wrapped by AutoQueryEngine. This is the typical pattern for long-running services that restart with the same on-disk index. Source: autollm/auto/vector_store_index.py:97-140

LanceDB Vector Store

LanceDB is the default and currently only vector store implementation exposed by autollm. The LancedbVectorStore wrapper in autollm/utils/lancedb_vectorstore.py normalizes three things:

URI handling: it accepts either a local filesystem path (e.g., ./my_index) or a remote LanceDB Cloud URI (db://host:port/...), and constructs a lancedb.connect(...) connection accordingly. This refactor landed in v0.1.5 and the cloud-URI support was finalized in v0.1.6. Source: autollm/utils/lancedb_vectorstore.py:1-60
Table management: it lazily creates a table with a configurable table_name (default auto_llm_index) and reuses it on subsequent loads.
Pre-filtering: as added in v0.1.6, metadata filters can be pushed down into LanceDB queries so that similarity search only scans documents matching the predicate, which reduces latency and keeps non-matching documents out of the results. Source: autollm/utils/lancedb_vectorstore.py:62-130

The wrapper exposes add(), delete(), query(), and persistence helpers that map cleanly onto llama-index's BasePydanticVectorStore interface.
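
The effect of pre-filtering can be shown on a toy in-memory table. The sketch below does not call LanceDB; `prefiltered_search` and the row layout are hypothetical, and the point is only the ordering of the two steps: filter on metadata first, then rank the survivors by similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prefiltered_search(rows, query_vector, where, top_k=2):
    """Apply the metadata predicate first, then rank only the surviving rows."""
    candidates = [row for row in rows if where(row["metadata"])]
    ranked = sorted(candidates, key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
    return [row["text"] for row in ranked[:top_k]]


rows = [
    {"text": "2023 pricing", "vector": [1.0, 0.0], "metadata": {"year": 2023}},
    {"text": "2024 pricing", "vector": [0.9, 0.1], "metadata": {"year": 2024}},
    {"text": "2024 roadmap", "vector": [0.0, 1.0], "metadata": {"year": 2024}},
]
hits = prefiltered_search(rows, [1.0, 0.0], where=lambda m: m["year"] == 2024, top_k=1)
print(hits)
```

Without the filter, the 2023 row would rank first for this query vector; with it, that row is never scored.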

Storage Context and URI Resolution

Storage is orchestrated by helpers in autollm/utils/db_utils.py. The helper get_storage_context(...) inspects the supplied URI, returns a llama-index StorageContext configured with the chosen vector store, and ensures the document store and index store point at the same location so that from_defaults can reconstruct the same index. Source: autollm/utils/db_utils.py:1-80

A typical URI mapping:

URI form	Backing store	Notes
`"./my_index"`	Local LanceDB on disk	Default; suitable for single-machine use
`"/abs/path/to/dir"`	Local LanceDB on disk	Absolute paths supported
`"db://host:port/db"`	LanceDB Cloud	Requires `LANCE_API_KEY` and `LANCE_URI`; added in v0.1.6

Source: autollm/utils/lancedb_vectorstore.py:30-90
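
The mapping in the table reduces to a prefix check on the URI. The following sketch is illustrative only: `classify_lancedb_uri` is a hypothetical helper, and it encodes nothing beyond the table's rule that `db://` URIs are cloud-backed while everything else is treated as a local path.

```python
def classify_lancedb_uri(uri: str) -> dict:
    """Hypothetical helper: decide whether a URI is local or LanceDB Cloud."""
    if uri.startswith("db://"):
        # Cloud URIs need credentials supplied through the environment.
        return {"kind": "cloud", "uri": uri, "needs_api_key": True}
    # Relative and absolute filesystem paths are both treated as local stores.
    return {"kind": "local", "uri": uri, "needs_api_key": False}


for candidate in ["./my_index", "/abs/path/to/dir", "db://host:port/db"]:
    print(candidate, "->", classify_lancedb_uri(candidate)["kind"])
```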

Embedding Integration

When AutoVectorStoreIndex.from_documents is called without an explicit embed_model, it instantiates AutoEmbedding internally. AutoEmbedding automatically picks the highest-priority provider whose credentials are present in the environment, then injects that model into both the index and the downstream AutoQueryEngine. This "set once, reuse everywhere" pattern was unified in v0.1.9 when AutoEmbedding was added to the package's public __init__. Source: autollm/auto/embedding.py:1-60, autollm/__init__.py:1-30
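
Credential-driven selection of this kind can be sketched as a priority scan over environment variables. The order below, the `VOYAGE_API_KEY` variable name, and the local fallback are assumptions made for illustration; the source does not state the actual priority list, and `pick_embedding_provider` is a hypothetical name.

```python
import os
from typing import Mapping

# Hypothetical priority order: (provider name, environment variable holding its key).
PROVIDER_PRIORITY = [
    ("openai", "OPENAI_API_KEY"),
    ("voyageai", "VOYAGE_API_KEY"),
]
FALLBACK_PROVIDER = "huggingface"  # assumed local model that needs no credentials


def pick_embedding_provider(env: Mapping[str, str]) -> str:
    """Return the first provider whose credential is present and non-empty."""
    for provider, key_name in PROVIDER_PRIORITY:
        if env.get(key_name):
            return provider
    return FALLBACK_PROVIDER


# In real use the live environment would be passed in.
print(pick_embedding_provider(os.environ))
```

Accepting the environment as a parameter, rather than reading `os.environ` inside the function, keeps the selection logic easy to test.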

Typical End-to-End Flow

flowchart LR
    A[Documents] --> B[AutoVectorStoreIndex.from_documents]
    B --> C[AutoEmbedding]
    B --> D[LancedbVectorStore]
    C --> E[VectorStoreIndex]
    D --> E
    E --> F[(LanceDB URI)]
    F --> G[AutoVectorStoreIndex.from_defaults]
    G --> H[AutoQueryEngine + AutoLiteLLM]
    H --> I[Answer]

Source: autollm/auto/vector_store_index.py:42-140, autollm/auto/llm.py:1-40

Community-Relevant Notes

v0.1.5 introduced auto-embedding and improved LanceDB URI handling, which is why AutoVectorStoreIndex no longer requires the user to pass an embed_model in common cases. Source: release notes for v0.1.5.
v0.1.6 added LanceDB Cloud URIs and metadata pre-filtering; users migrating from local paths should set the LANCE_API_KEY environment variable when switching to db:// URIs. Source: release notes for v0.1.6.
v0.1.9 made AutoEmbedding a first-class export, so importing from autollm import AutoEmbedding is the recommended way to share an embedding model between an index and a query engine. Source: release notes for v0.1.9.

Document Readers and Data Sources

Related topics: AutoQueryEngine: RAG in One Line, AutoVectorStoreIndex and Vector Stores

The Document Readers subsystem is the ingestion layer of autollm. It converts heterogeneous external sources (local files in formats such as PDF or Markdown, and remote sources such as a single webpage, an entire site, or an XML sitemap) into a uniform list of Document objects that downstream components (AutoVectorStoreIndex, AutoQueryEngine) can chunk, embed, and query. This indirection lets users keep the rest of the pipeline identical regardless of where the data originated.

Role Within the Pipeline

document_reading.py is the entry point exposed to higher-level code. It exposes a unified DocumentReader abstraction that accepts an input directory or a remote URL and dispatches to the appropriate specialized reader based on the detected source type. The reader returns llama-index Document objects (or compatible wrappers) that already carry the metadata required by the indexer.

DocumentReader.from_dir(...) reads a directory of files and aggregates the parsed documents. Source: autollm/utils/document_reading.py:1-120
DocumentReader.from_url(...) resolves a remote URL, picks the right loader (webpage, website crawler, or sitemap), and yields documents. Source: autollm/utils/document_reading.py:120-260

The dispatch is driven by extension and scheme checks (e.g., .pdf, .md, https://) and by the presence of sitemap.xml at the root of a site.
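
A simplified version of this dispatch is sketched below. `detect_source_type` is a hypothetical function, and it departs from the description in one respect: it recognises a sitemap only when the URL itself ends in `sitemap.xml`, because probing the site root would require a network request.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def detect_source_type(source: str) -> str:
    """Hypothetical dispatcher: classify an input by scheme, then by extension."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if parsed.path.endswith("sitemap.xml"):
            return "sitemap"
        return "webpage"
    suffix = PurePosixPath(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in (".md", ".markdown"):
        return "markdown"
    return "unknown"


for src in ["docs/report.PDF", "notes.md", "https://example.com/sitemap.xml"]:
    print(src, "->", detect_source_type(src))
```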

Reader Implementations

Each specialized reader encapsulates one source type and is responsible for parsing it into the shared document format.

PDF Reader

pdf_reader.py uses a PDF parser (backed by libraries declared in readers-requirements.txt) to extract text page-by-page. It returns documents annotated with page numbers, useful for citation in QA answers. Source: autollm/utils/pdf_reader.py:1-80

Markdown Reader

markdown_reader.py reads .md files and preserves heading hierarchy as metadata. This is important because downstream chunkers can use the heading level to keep semantic sections intact when splitting long markdown documents. Source: autollm/utils/markdown_reader.py:1-60
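To illustrate what "heading hierarchy as metadata" can look like, here is a small standard-library sketch. It is an assumption about the general technique, not the implementation in markdown_reader.py; the function name split_markdown_by_heading and the heading_path key are hypothetical. It also ignores fenced code blocks, so a line starting with # inside a code sample would be misread as a heading.

```python
import re

# ATX-style headings: one to six '#' characters followed by text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_heading(text: str) -> list[dict]:
    """Split markdown into sections, each tagged with its heading path."""
    sections = []
    path = []    # current heading trail, e.g. ["Guide", "Install"]
    buffer = []  # body lines of the section being collected

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            sections.append({"heading_path": list(path), "text": body})
        buffer.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then push the new one.
            del path[level - 1:]
            path.append(match.group(2))
        else:
            buffer.append(line)
    flush()
    return sections
```

A chunker that receives these sections can split a long section further while copying heading_path onto every chunk, which keeps each chunk traceable to its place in the document.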

Webpage Reader

webpage_reader.py targets a single URL. It fetches the HTML, extracts the main textual content (stripping navigation, scripts, and boilerplate), and produces one document. This was introduced in v0.1.1 and is the simplest way to ingest a known article or documentation page. Source: autollm/utils/webpage_reader.py:1-90

Website Reader (Crawler and Sitemap)

website_reader.py covers two related sub-modes:

Sitemap-driven ingestion: introduced in v0.1.2 (add sitemap reader PR #160), it fetches /sitemap.xml, parses the listed URLs, and feeds each one into the webpage reader. Source: autollm/utils/website_reader.py:1-150
Crawl-driven ingestion: follows internal links up to a configurable depth, respecting robots.txt conventions exposed via reader parameters.
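The sitemap step can be sketched with the standard library alone. The snippet below is a hedged illustration of parsing a sitemap that follows the sitemaps.org 0.9 schema; the function name extract_sitemap_urls is hypothetical, and fetching the XML and handing each URL to a page reader are left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org 0.9 schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
</urlset>
"""


def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


urls = extract_sitemap_urls(SAMPLE_SITEMAP)
```

Each URL in the resulting list would then be passed to the single-page reader, which matches the website_reader to webpage_reader hand-off described above.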

A consolidated view of the readers:

Reader	Source	Output	Added In
`pdf_reader.py`	Local `.pdf` files	One `Document` per page	Pre-0.1.1
`markdown_reader.py`	Local `.md` files	One `Document` with heading metadata	Pre-0.1.1
`webpage_reader.py`	Single URL	One `Document`	v0.1.1
`website_reader.py`	Sitemap or crawled site	One `Document` per URL	v0.1.1 (website crawl), v0.1.2 (sitemap)

Data Flow

The flow below shows how an input reaches the indexer through the reader subsystem.

flowchart LR
    A[Input: directory or URL] --> B[DocumentReader dispatch]
    B --> C{Source type}
    C -->|Local .pdf| D[pdf_reader]
    C -->|Local .md| E[markdown_reader]
    C -->|Single URL| F[webpage_reader]
    C -->|Sitemap / site| G[website_reader]
    D --> H[List of Document]
    E --> H
    F --> H
    G --> F
    H --> I[AutoVectorStoreIndex]

The reader layer is intentionally thin so that swapping a parser (for example, to add OCR to PDF or to switch the crawler backend) does not require changes to the rest of the stack. Source: autollm/utils/document_reading.py:1-260

Dependencies and Configuration

The optional reader backends are not part of the core install; they are listed in readers-requirements.txt and were refreshed in v0.1.9 (Update readers-requirements.txt PR #201). Users opt in by installing that extras file, which keeps the default wheel small while still exposing the full ingestion surface. Source: readers-requirements.txt:1-40

Progress display for long ingestion jobs was added in v0.1.3 (updated requirements and document reading functionality for progress display PR #169), giving users visibility into which file or URL is currently being parsed. Source: autollm/utils/document_reading.py:60-160

Operational Notes From the Community

Webpage and website support were the headline additions of v0.1.1, followed by sitemap support in v0.1.2; users building documentation QA bots commonly combine the sitemap reader with AutoVectorStoreIndex. Source: autollm/utils/website_reader.py:1-150
Async ingestion paths were hardened in v0.1.10 (bugfix async method PR #215); users running readers inside an event loop should use the async variants exposed by DocumentReader rather than calling the sync methods directly. Source: autollm/utils/document_reading.py:120-260
The reader layer normalizes metadata (URL, page number, heading path) so that the query engine can surface citations; this metadata contract is defined in the document constructors inside each reader module. Source: autollm/utils/pdf_reader.py:1-80 and autollm/utils/markdown_reader.py:1-60

Together, these readers form a pluggable ingestion front-end: adding a new source type means adding one module under autollm/utils/ and registering it in DocumentReader's dispatch logic, with no changes required to the vector store or query engine layers.

Source: https://github.com/viddexa/autollm / Human Manual

AutoFastAPI: One-Line API Deployment

Related topics: AutoQueryEngine: RAG in One Line, Cost Calculation, Callbacks, and Utilities

AutoFastAPI is the deployment surface of the autollm library. After a user builds an AutoQueryEngine over a vector index, AutoFastAPI exposes the same engine as a REST service through a single call (.serve() or serve()), wrapping the engine in a FastAPI application, registering the necessary routes, and returning a ready-to-launch server object. Source: autollm/auto/fastapi_app.py:1-40

Purpose and Scope

The goal of AutoFastAPI is to remove the boilerplate between "I have a working RAG pipeline" and "I have an HTTP endpoint that answers questions." It targets developers who want a local or containerized query API without manually wiring FastAPI routers, request/response schemas, or model wiring. Source: autollm/auto/fastapi_app.py:42-80

Scope of the module:

Wraps an existing AutoQueryEngine instance in a FastAPI app.
Registers a /query endpoint that accepts a natural-language question and returns the engine's answer.
Registers a /query/stream endpoint for streaming responses when the underlying engine supports it.
Exposes a serve() helper that starts a Uvicorn server bound to configurable host/port.
Delegates config defaults to autollm/serve/utils.py so deployment behavior stays consistent with CLI arguments. Source: autollm/serve/utils.py:1-60

How It Works

The deployment flow can be described as a three-stage pipeline: configure, build, serve.

flowchart LR
    A[AutoQueryEngine] --> B[AutoFastAPI]
    C[YAML Config] --> D[serve.utils]
    D --> B
    B --> E[FastAPI App]
    E --> F[Uvicorn Server]
    F --> G["/query & /query/stream"]

The AutoFastAPI constructor accepts an AutoQueryEngine plus optional server parameters. It instantiates a FastAPI app, defines Pydantic request/response models, and wires the engine's query (and aquery) methods into async route handlers. Source: autollm/auto/fastapi_app.py:80-140

For developers using a YAML-driven workflow, serve/utils.py parses the configuration file (matching the schema in examples/configs/config.example.yaml) and passes host, port, CORS, and reload flags into the server constructor. Source: autollm/serve/utils.py:60-120 and examples/configs/config.example.yaml:1-40

Endpoints and Schemas

Two endpoints are exposed by default:

POST /query — accepts a JSON body with a query string, optional top_k, and optional metadata filters; returns the answer plus any source documents.
POST /query/stream — same input shape, but returns a streaming response (Server-Sent Events) when the engine provides an async generator. Source: autollm/auto/fastapi_app.py:140-200
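For readers unfamiliar with Server-Sent Events, the wire format is simple: each event is one or more "data:" lines followed by a blank line. The sketch below shows that framing for a sequence of text chunks. It is an illustration of the SSE format only; the function name to_sse is hypothetical and nothing here is taken from fastapi_app.py.

```python
from typing import Iterable, Iterator


def to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Frame each text chunk as one Server-Sent Events message."""
    for chunk in chunks:
        # A chunk containing newlines must be sent as several "data:" lines,
        # otherwise the embedded newline would end the event early.
        lines = chunk.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"


frames = list(to_sse(["Hello", "line one\nline two"]))
```

A streaming endpoint would typically send these frames with the text/event-stream content type so that browser clients can consume them incrementally.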

Request and response models are declared with Pydantic, providing automatic validation and OpenAPI schema generation. This means clients hitting the deployed service receive a self-documenting /docs Swagger UI out of the box. Source: autollm/auto/fastapi_app.py:200-240
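The kind of validation a request model performs can be shown without any web framework. The sketch below uses a standard-library dataclass as a stand-in for a Pydantic model; the field names query, top_k, and filters are assumptions based on the endpoint description above, not confirmed schema names.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class QueryRequest:
    """Hypothetical request shape for POST /query."""
    query: str
    top_k: Optional[int] = None
    filters: Optional[dict[str, Any]] = None


def parse_query_request(raw_body: str) -> QueryRequest:
    """Parse and validate a JSON request body, raising ValueError on bad input."""
    data = json.loads(raw_body)
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    query = data.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' must be a non-empty string")
    top_k = data.get("top_k")
    # bool is a subclass of int in Python, so reject it explicitly.
    if top_k is not None and (
        isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1
    ):
        raise ValueError("'top_k' must be a positive integer")
    filters = data.get("filters")
    if filters is not None and not isinstance(filters, dict):
        raise ValueError("'filters' must be a JSON object")
    return QueryRequest(query=query, top_k=top_k, filters=filters)
```

With Pydantic, the same checks are declared on the model and the framework returns a validation error response automatically, which is what makes the generated OpenAPI schema trustworthy.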

Launching the Server

The companion serve() helper resolves host/port from CLI args or YAML, then calls uvicorn.run() with the built app. Typical usage:

from autollm import AutoQueryEngine
from autollm.auto.fastapi_app import AutoFastAPI, serve

engine = AutoQueryEngine.from_defaults(...)
app = AutoFastAPI(engine=engine, host="0.0.0.0", port=8000)
serve(app)  # blocks; launches uvicorn

Source: autollm/auto/fastapi_app.py:240-280 and autollm/serve/docs.py:1-40

Configuration

Configuration is layered: programmatic arguments override YAML, which overrides library defaults. The example file in examples/configs/config.example.yaml documents the recognized keys. Source: examples/configs/config.example.yaml:1-60

Key	Purpose	Default
`host`	Bind address for uvicorn	`127.0.0.1`
`port`	Bind port	`8000`
`reload`	Auto-reload on code change (dev only)	`false`
`cors_origins`	Allowed CORS origins (list)	`["*"]`
`title`	OpenAPI title	`"AutoLLM API"`
`version`	OpenAPI version	package version

Source: autollm/serve/utils.py:120-180 and examples/configs/config.example.yaml:20-60
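The layering rule (programmatic arguments over YAML over library defaults) can be sketched with collections.ChainMap. The defaults below are copied from the table above, with version omitted because it depends on the installed package; the function name resolve_settings is hypothetical and YAML parsing is assumed to have already produced a plain dict.

```python
from collections import ChainMap

# Defaults as listed in the configuration table.
LIBRARY_DEFAULTS = {
    "host": "127.0.0.1",
    "port": 8000,
    "reload": False,
    "cors_origins": ["*"],
    "title": "AutoLLM API",
}


def resolve_settings(yaml_values=None, overrides=None) -> dict:
    """Merge settings so that overrides win over YAML, and YAML over defaults."""
    # Drop None values so an unset keyword argument does not mask a lower layer.
    explicit = {k: v for k, v in (overrides or {}).items() if v is not None}
    # ChainMap looks keys up left to right, so the first mapping has priority.
    return dict(ChainMap(explicit, yaml_values or {}, LIBRARY_DEFAULTS))


settings = resolve_settings(
    yaml_values={"host": "0.0.0.0", "port": 9000},
    overrides={"port": 8080, "host": None},
)
```

Here port comes from the programmatic override, host from the YAML layer, and the remaining keys from the library defaults.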

Integration with the Rest of autollm

AutoFastAPI is the final stage in the typical autollm pipeline:

Document ingestion via readers (webpage, sitemap, simple directory) populates a vector store.
Index construction through AutoVectorStoreIndex builds the LanceDB-backed index.
Query engine creation through AutoQueryEngine wires the LLM and retriever.
Deployment via AutoFastAPI publishes the engine as an HTTP service. Source: autollm/auto/fastapi_app.py:1-40 and autollm/serve/docs.py:40-80

Because the deployment layer is decoupled from the engine layer, swapping LLMs, embeddings, or vector stores does not require any change to the API surface. Source: autollm/serve/utils.py:1-60

Community Notes

The "one-line API deployment" framing is reflected in user-facing README usage patterns (e.g., AutoQueryEngine.from_defaults(...) followed by .serve()), which were updated in v0.1.2 (PR #162).
Async and streaming behavior has been a recurring area of improvement; v0.1.10 includes an "async method bugfix" (PR #215), which affects users running AutoFastAPI behind async ASGI servers.
Pre-filtering and LanceDB cloud support landed in v0.1.6 (PRs #186, #187), which is relevant when the deployed query endpoint needs to scope retrievals per request. Source: community release notes (v0.1.2, v0.1.6, v0.1.10).

Practical Tips

Use reload=true only in development; in production, rely on a process manager and disable reload.
Pin the package version (e.g., autollm==0.1.10) when deploying to ensure consistent route schemas.
If CORS is needed for browser clients, restrict cors_origins to known hosts rather than leaving the wildcard in production. Source: autollm/serve/utils.py:180-220

Limitations

AutoFastAPI exposes only the query surface; ingestion, reindexing, and admin endpoints are not part of this module and must be handled separately if needed.
Streaming support depends on the underlying engine exposing an async generator; not every retriever/LLM combination supports it. Source: autollm/auto/fastapi_app.py:140-200
The module ships a single Uvicorn launcher; production deployments behind Gunicorn, behind a reverse proxy, or with TLS termination are out of scope and must be wired by the user.


Cost Calculation, Callbacks, and Utilities

Related topics: AutoLiteLLM: Unified LLM Access (100+ Models), AutoFastAPI: One-Line API Deployment

Overview

The autollm library wraps llama-index primitives with a set of cross-cutting concerns: cost tracking, prompt templates, logging, environment loading, and source-document cloning. These concerns live in three areas of the codebase:

autollm/callbacks/cost_calculating.py — token and USD cost accounting.
autollm/auto/service_context.py — the wiring point that registers the cost callback on the active service context.
autollm/utils/ — templates, logging, environment helpers, and git cloning for remote document sources.

Together they let a user answer "how much did this run cost?", "what prompts were sent?", and "where did these documents come from?" without leaving autollm's API surface. The latest release notes (v0.1.10) emphasize async support and prompt-handling fixes, which surface in this module through AutoLiteLLM's system_prompt argument and the async paths inside the callback chain. Source: autollm/callbacks/cost_calculating.py:1-40

The Cost-Calculating Callback

cost_calculating.py defines a llama-index CallbackManager handler that records token usage and computes USD cost per LLM call. The handler hooks into on_event_start / on_event_end for LLMCompletionEvent (and equivalent async events), reads prompt_tokens and completion_tokens from the payload, and multiplies them by model-specific rates.

Key design points:

Per-call accumulation. Each event stores the prompt/completion counts and the resulting dollar amount into a running total that is keyed by the event payload. This lets a single query that fans out to multiple LLM calls (retrieval-augmented generation, multi-step agents) report a cumulative cost.
Async parity. Following the v0.1.10 bugfix ("bugfix async method"), the callback implements both sync and async hooks so cost tracking works with await query_engine.aquery(...). Source: autollm/callbacks/cost_calculating.py:40-120
Pricing source. Rates are read from a lightweight local table that mirrors the upstream LiteLLM pricing model; when a model is missing, the handler logs a warning via the logging utility and falls back to zero cost rather than raising. Source: autollm/utils/logging.py:1-40
Reset semantics. Consumers call a reset() method between independent runs so totals from one query do not leak into the next.

To retrieve totals after a query, callers access the handler through the active ServiceContext and call total_cost, total_tokens, or a breakdown by model.
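The accounting described in the design points above can be sketched independently of llama-index. This is a simplified model of the idea, not the handler in cost_calculating.py: the class name CostTracker, the record method, and the rates in RATES_PER_1K are all hypothetical, and real prices vary by provider and over time.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("autollm")

# Hypothetical USD rates per 1,000 tokens: (prompt, completion).
RATES_PER_1K = {"example-model": (0.001, 0.002)}


@dataclass
class CostTracker:
    """Accumulates token counts and USD cost across several LLM calls."""
    total_tokens: int = 0
    total_cost: float = 0.0
    by_model: dict = field(default_factory=dict)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        rates = RATES_PER_1K.get(model)
        if rates is None:
            # Unknown model: warn and count the call as free instead of raising.
            logger.warning("No pricing for model %r; counting cost as 0", model)
            rates = (0.0, 0.0)
        cost = prompt_tokens / 1000 * rates[0] + completion_tokens / 1000 * rates[1]
        self.total_tokens += prompt_tokens + completion_tokens
        self.total_cost += cost
        self.by_model[model] = self.by_model.get(model, 0.0) + cost
        return cost

    def reset(self) -> None:
        """Clear all totals between independent runs."""
        self.total_tokens = 0
        self.total_cost = 0.0
        self.by_model.clear()
```

Falling back to zero keeps a query from failing just because a price is missing, at the expense of under-reporting, which is why the warning matters.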

Service Context Wiring

auto/service_context.py exposes a factory that builds a llama-index ServiceContext pre-configured with AutoLiteLLM, the cost callback, and the standard tokenizer. The factory accepts an AutoConfig (or equivalent kwargs) and:

Constructs the AutoLiteLLM instance — including the system_prompt argument added in v0.1.10 — and wraps it for llama-index. Source: autollm/auto/service_context.py:30-90
Attaches the cost handler from cost_calculating.py to a CallbackManager.
Returns the composed ServiceContext so downstream components (AutoQueryEngine, AutoVectorStoreIndex) inherit cost tracking transparently.

Because the callback is registered at service-context construction, every downstream query, aquery, chat, and achat call is automatically metered. The user does not need to manually install the handler. Source: autollm/auto/service_context.py:90-150

Component	Module	Responsibility
`AutoLiteLLM`	`auto/llm/`	Wraps LiteLLM with `system_prompt` and unified sync/async surface
Cost handler	`callbacks/cost_calculating.py`	Counts tokens, computes USD
`CallbackManager`	llama-index	Dispatches events to handlers
`ServiceContext`	llama-index	Carries LLM, embed model, callback manager

Utility Modules

Templates — `utils/templates.py`

Holds the default system prompt, refine prompt, question-answer prompt, and a small registry that maps logical names (e.g. "qa", "refine", "system") to PromptTemplate objects. AutoQueryEngine.from_defaults looks templates up here only when the caller does not pass a custom prompt. This is the behavior restored by the v0.1.4 bugfix "AutoQueryEngine bug causing not use of qa_prompt_template its given", in which a caller-supplied qa_prompt_template was being ignored. Source: autollm/utils/templates.py:1-80
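The fallback behavior of such a registry can be illustrated as follows. This sketch uses string.Template from the standard library as a stand-in for llama-index's PromptTemplate; the template texts and the function name resolve_template are hypothetical.

```python
from string import Template

# Placeholder prompt texts, not the library's actual defaults.
DEFAULT_TEMPLATES = {
    "system": Template("You are a helpful assistant."),
    "qa": Template("Context:\n$context\n\nQuestion: $question\nAnswer:"),
}


def resolve_template(name: str, custom=None) -> Template:
    """Prefer a caller-supplied template; fall back to the registry default."""
    if custom is not None:
        return custom if isinstance(custom, Template) else Template(custom)
    return DEFAULT_TEMPLATES[name]


default_prompt = resolve_template("qa").substitute(context="c", question="q")
custom_prompt = resolve_template("qa", "Q: $question").substitute(question="q")
```

Checking the caller's argument first is the important part: the registry is only consulted when nothing was supplied.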

Logging — `utils/logging.py`

Configures a named logger (autollm) with a leveled formatter and integrates with tqdm so that progress bars (e.g. the document-ingestion progress display added in v0.1.3) are not corrupted by log output. Cost-related warnings, embedding progress, and retrieval timing all flow through this logger. Source: autollm/utils/logging.py:1-60
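A named logger with a leveled formatter can be set up as shown below. This is a generic standard-library sketch rather than the contents of utils/logging.py; the function name configure_logger and the format string are assumptions, and the tqdm integration is left out.

```python
import logging
import sys


def configure_logger(level=logging.WARNING, stream=None) -> logging.Logger:
    """Configure the 'autollm' logger with a single formatted stream handler."""
    logger = logging.getLogger("autollm")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream or sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    # Replace existing handlers so repeated calls do not duplicate output.
    logger.handlers = [handler]
    # Keep records from also being emitted by the root logger.
    logger.propagate = False
    return logger
```

Passing an explicit stream makes the logger easy to capture in tests, and setting the level to logging.INFO or logging.DEBUG surfaces more detail during ingestion.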

Environment — `utils/env_utils.py`

Loads .env files, validates required variables (e.g. OPENAI_API_KEY, vector-store credentials), and exposes a single get_env(key, default=None, required=False) helper. Called early by the service-context factory so missing keys fail fast. Source: autollm/utils/env_utils.py:1-70
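A helper with the signature quoted above could behave roughly like this. The body is an assumption, since only the signature appears in this manual; the choice of RuntimeError and the treatment of empty strings as missing are illustrative, and loading the .env file itself is omitted.

```python
import os


def get_env(key: str, default=None, required: bool = False):
    """Read an environment variable, optionally failing fast when it is missing."""
    value = os.environ.get(key, default)
    # An empty string is treated the same as a missing value here.
    if required and not value:
        raise RuntimeError(f"Missing required environment variable: {key}")
    return value
```

Calling this with required=True at start-up turns a missing API key into an immediate, readable error instead of a failure deep inside the first LLM call.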

Git — `utils/git_utils.py`

Clones a remote git repository into a temporary directory so it can be ingested as a document source. The helper accepts a URL, an optional branch/tag, and a depth, returning a local path consumable by SimpleDirectoryReader. This is what powers the "GitHub repository as a document source" usage pattern highlighted in the README. Source: autollm/utils/git_utils.py:1-90
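A shallow clone into a temporary directory can be sketched with subprocess and tempfile. The function names build_clone_command and clone_to_temp are hypothetical and do not reflect the actual helper in git_utils.py; the sketch assumes the git executable is available on PATH.

```python
import subprocess
import tempfile
from typing import Optional


def build_clone_command(
    url: str, dest: str, branch: Optional[str] = None, depth: int = 1
) -> list[str]:
    """Build the argument list for a shallow 'git clone'."""
    cmd = ["git", "clone", "--depth", str(depth)]
    if branch:
        # --branch accepts either a branch name or a tag.
        cmd += ["--branch", branch]
    cmd += [url, dest]
    return cmd


def clone_to_temp(url: str, branch: Optional[str] = None, depth: int = 1) -> str:
    """Clone a repository into a fresh temporary directory and return its path."""
    dest = tempfile.mkdtemp(prefix="autollm-repo-")
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(build_clone_command(url, dest, branch, depth), check=True)
    return dest
```

Passing the command as a list rather than a shell string avoids quoting problems with unusual URLs or branch names. The caller is responsible for removing the temporary directory once ingestion is finished.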

End-to-End Flow

flowchart LR
    A[AutoConfig + .env] --> B[env_utils]
    B --> C[ServiceContext factory]
    D[templates.py] --> C
    E[AutoLiteLLM with system_prompt] --> C
    F[cost_calculating handler] --> C
    C --> G[AutoQueryEngine]
    G --> H[LLM call events]
    H --> F
    F --> I[total_cost, total_tokens]
    J[git_utils] --> K[Documents]
    K --> G

Source: autollm/auto/service_context.py:1-30, autollm/callbacks/cost_calculating.py:1-40, autollm/utils/templates.py:1-40, autollm/utils/env_utils.py:1-40, autollm/utils/git_utils.py:1-40, autollm/utils/logging.py:1-30

Practical Notes

Always call handler.reset() between independent experiments to avoid cross-contamination of token counts.
Missing-model cost warnings are emitted through autollm.utils.logging; enable the autollm logger at INFO to see them.
Custom prompts should be passed positionally or by name to AutoQueryEngine.from_defaults; the templates module acts as the fallback registry rather than the primary API.
For remote repositories, prefer git_utils.clone_repository(url, branch=..., depth=1) to keep clones small.


Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.

1. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | https://github.com/viddexa/autollm

2. Maintenance risk: Maintenance risk requires verification

Severity: medium
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/viddexa/autollm

3. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: downstream_validation.risk_items | https://github.com/viddexa/autollm

4. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: risks.scoring_risks | https://github.com/viddexa/autollm

5. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/viddexa/autollm

6. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: release_recency=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/viddexa/autollm

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using autollm with real data or production workflows.

v0.1.10 - github / github_release
v0.1.9 - github / github_release
v0.1.8 - github / github_release
v0.1.7 - github / github_release
v0.1.6 - github / github_release
v0.1.5 - github / github_release
v0.1.4 - github / github_release
v0.1.3 - github / github_release
v0.1.2 - github / github_release
v0.1.1 - github / github_release
Capability evidence risk requires verification - GitHub / issue

Source: Project Pack community evidence and pitfall evidence
