# https://github.com/sinaptik-ai/pandas-ai Project Manual Generated at: 2026-06-22 18:11:28 UTC ## Table of Contents - [Overview, Installation, and Quickstart](#page-1) - [Code Execution, Sandbox, and Security Model](#page-2) - [LLM Backends, Local Models, and Extension Ecosystem](#page-3) - [Agent Lifecycle, Prompts, and Semantic Layer](#page-4) ## Overview, Installation, and Quickstart ### Related Pages Related topics: [Code Execution, Sandbox, and Security Model](#page-2), [LLM Backends, Local Models, and Extension Ecosystem](#page-3)

Related Source Files

The following source files were used to generate this page: - [README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/README.md) - [CONTRIBUTING.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/CONTRIBUTING.md) - [pandasai/llm/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py) - [pandasai/core/prompts/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py) - [pandasai/core/prompts/generate_system_message.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_system_message.py) - [pandasai/vectorstores/vectorstore.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/vectorstores/vectorstore.py) - [extensions/llms/openai/pandasai_openai/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py) - [extensions/llms/openai/pandasai_openai/openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py)

# Overview, Installation, and Quickstart ## What is PandasAI PandasAI is a Python library that lets users ask questions about their data in natural language. It targets two audiences: non-technical users who want to query datasets conversationally, and technical users who want to accelerate exploratory data analysis. The library is distributed as the core `pandasai` package on PyPI, with separate extension packages for additional LLM providers and vector stores. Source: [README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/README.md). At a high level, the library works by sending the user's question together with a serialized representation of the dataframe to a Large Language Model (LLM), receiving generated Python code in response, and executing that code to produce a result. This flow is reflected in the prompt architecture: a `BasePrompt` renders Jinja2 templates, which are passed to an LLM that extends `pandasai.llm.base.LLM`. Source: [pandasai/core/prompts/base.py:1-45](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py), [pandasai/llm/base.py:1-40](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py). The system message prompt class, for example, loads its template from disk through `BasePrompt.template_path`, which is how instructions are injected into the LLM context. Source: [pandasai/core/prompts/generate_system_message.py:1-5](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_system_message.py). ## Installation PandasAI requires **Python 3.8+ up to 3.11** at the time of v3.0.0. This constraint is enforced through a dependency on `scipy==1.10.1`, which itself caps Python at `<3.12`. Community requests to support Python 3.12 are tracked in issues #1850, #1787, and #1872. Source: [README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/README.md). Install the core library and an LLM provider extension with either `pip` or `poetry`: ```bash # pip pip install pandasai pip install pandasai-litellm # poetry poetry add pandasai poetry add pandasai-litellm ``` The `pandasai-litellm` extension is a common choice because it routes requests through [LiteLLM](https://github.com/BerriAI/litellm), supporting many providers with a single interface. Source: [README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/README.md). ### Development install Contributors are instructed to use Poetry (not pip or conda) and to install all extras plus dev dependencies: ```bash poetry install --all-extras --with dev pre-commit install ``` The project uses `ruff` for linting and `pytest` for tests. Source: [CONTRIBUTING.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/CONTRIBUTING.md). ### Optional extensions | Extension | Purpose | Source | |-----------|---------|--------| | `pandasai-litellm` | Multi-provider LLM routing | [README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/README.md) | | `pandasai-openai` | Native OpenAI / Azure OpenAI | [extensions/llms/openai/pandasai_openai/openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py) | | `pandasai-chromadb` | ChromaDB vector store (EE) | [extensions/ee/vectorstores/chromadb/](https://github.com/sinaptik-ai/pandas-ai/tree/main/extensions/ee/vectorstores/chromadb) | | `pandasai-pinecone` | Pinecone vector store (EE) | [extensions/ee/vectorstores/pinecone/](https://github.com/sinaptik-ai/pandas-ai/tree/main/extensions/ee/vectorstores/pinecone) | | `pandasai-milvus` | Milvus vector store (EE) | [extensions/ee/vectorstores/milvus/](https://github.com/sinaptik-ai/pandas-ai/tree/main/extensions/ee/vectorstores/milvus) | | `pandasai-qdrant` | Qdrant vector store (EE) | [extensions/ee/vectorstores/qdrant/](https://github.com/sinaptik-ai/pandas-ai/tree/main/extensions/ee/vectorstores/qdrant) | Vector-store extensions fall under the **Sinaptik GmbH Enterprise License** and are intended for commercial use under that license. Source: [extensions/ee/vectorstores/pinecone/README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/pinecone/README.md), [extensions/ee/vectorstores/qdrant/README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/qdrant/README.md). ## Quickstart The minimal end-to-end example uses `pandasai` together with the LiteLLM extension. Source: [README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/README.md). ```python import pandasai as pai from pandasai_litellm.litellm import LiteLLM # 1. Configure the LLM llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY") pai.config.set("llm", llm) # 2. Load a dataframe df = pai.read_csv("employees.csv") # 3. Ask a question print(df.chat("Which employee has the highest salary?")) ``` Each `.chat()` call serializes the dataframe, builds prompts, calls the LLM, and executes the returned code. The default OpenAI extension supports a wide range of chat and completion model IDs, including `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-4o`, and `gpt-4o-mini`. Source: [extensions/llms/openai/pandasai_openai/openai.py:1-40](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py). The `BaseOpenAI` class exposes standard inference parameters such as `temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`, and `seed`, all of which are forwarded to the underlying OpenAI client. Source: [extensions/llms/openai/pandasai_openai/base.py:1-40](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py). ## Architecture and Data Flow The following diagram summarizes the request path for a single `chat()` call: ```mermaid flowchart LR A[User question] --> B[SmartDataframe / Agent] B --> C[Prompt Builder
BasePrompt + Jinja2] C --> D[LLM
LLM subclass] D --> E[Generated Python code] E --> F[Code Executor] F --> G[Result] B -.optional.-> H[Vector Store
SemanticLayer / Memory] D <-. context .-> H ``` - The **prompt builder** uses `BasePrompt`, which supports both inline `template` strings and `template_path` files loaded via Jinja2's `FileSystemLoader`. Source: [pandasai/core/prompts/base.py:1-45](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py). - The **LLM** is any subclass of `pandasai.llm.base.LLM`. The base class exposes `is_pandasai_llm`, `type`, and a `_polish_code` helper that strips leading `python` markers and stray backticks from generated snippets. Source: [pandasai/llm/base.py:1-40](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py). - The **vector store** is an optional component used to retrieve relevant documents or past question/answer pairs. `VectorStore` defines abstract methods such as `add_docs`, `update_docs`, `delete_docs`, `get_relevant_docs`, and `get_relevant_qa_documents`. Source: [pandasai/vectorstores/vectorstore.py:1-90](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/vectorstores/vectorstore.py). ## Common Failure Modes and Limitations A few caveats from the v3.0.0 release and community discussions are worth noting during setup: - **Python version pin.** Running on Python 3.12 will fail to install `scipy==1.10.1`. Use 3.8–3.11 or wait for upstream support. Source: [README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/README.md), community issue [#1872](https://github.com/sinaptik-ai/pandas-ai/issues/1872). - **Code execution is not sandboxed by default.** The default code executor runs LLM-generated code with full builtins available, so untrusted model output can lead to arbitrary code execution. Community issues [#1893](https://github.com/sinaptik-ai/pandas-ai/issues/1893) and [#1895](https://github.com/sinaptik-ai/pandas-ai/issues/1895) describe this risk; production deployments should add a sandbox or restrict input sources. - **Pillow CVE.** Older transitive dependencies on `pillow ^10.1.0` carry an out-of-bounds CVE; community issue [#1871](https://github.com/sinaptik-ai/pandas-ai/issues/1871) requests upgrading to 12.1.1. - **Reasoning models.** GPT-5 and other reasoning models are not yet supported out of the box. See community issue [#1867](https://github.com/sinaptik-ai/pandas-ai/issues/1867). - **Local models.** Local LLM support (Ollama, LM Studio, Open WebUI) is requested frequently in community issues [#187](https://github.com/sinaptik-ai/pandas-ai/issues/187), [#799](https://github.com/sinaptik-ai/pandas-ai/issues/799), [#1181](https://github.com/sinaptik-ai/pandas-ai/issues/1181), and [#1888](https://github.com/sinaptik-ai/pandas-ai/issues/1888). ## See Also - LLM base class and extension model: `pandasai/llm/base.py`, `extensions/llms/openai/pandasai_openai/` - Prompt system: `pandasai/core/prompts/base.py`, `pandasai/core/prompts/generate_system_message.py` - Vector store abstractions and EE backends: `pandasai/vectorstores/vectorstore.py`, `extensions/ee/vectorstores/` - Contributing and dev setup: `CONTRIBUTING.md` --- ## Code Execution, Sandbox, and Security Model ### Related Pages Related topics: [Overview, Installation, and Quickstart](#page-1), [LLM Backends, Local Models, and Extension Ecosystem](#page-3)

Related Source Files

The following source files were used to generate this page: - [pandasai/sandbox/sandbox.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/sandbox/sandbox.py) - [pandasai/sandbox/__init__.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/sandbox/__init__.py) - [pandasai/core/prompts/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py) - [pandasai/core/prompts/generate_system_message.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_system_message.py) - [pandasai/core/prompts/generate_python_code_with_sql.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_python_code_with_sql.py) - [pandasai/core/prompts/correct_output_type_error_prompt.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/correct_output_type_error_prompt.py) - [pandasai/llm/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py) - [extensions/llms/openai/pandasai_openai/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py) - [extensions/llms/openai/pandasai_openai/openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py)

# Code Execution, Sandbox, and Security Model ## Overview PandasAI converts natural-language questions into Python (and sometimes SQL) code via a language model, then executes that code against the user's data. The path that LLM-generated code travels — from prompt construction, to LLM call, to code polishing, to `exec` — is therefore a security boundary. The repository exposes a `Sandbox` abstraction as the designated extension point for isolating execution, but the default code path runs generated code in-process with no sandbox attached. The community has flagged this surface as a significant risk vector (see issues #1893 and #1895), and understanding where isolation is — and is not — applied is essential for any production deployment. ## The Sandbox Abstraction The sandbox contract is defined in [pandasai/sandbox/sandbox.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/sandbox/sandbox.py) and re-exported from [pandasai/sandbox/__init__.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/sandbox/__init__.py). The base class declares four abstract methods that concrete implementations must provide: | Method | Purpose | |---|---| | `start()` | Boot the isolated runtime (e.g. container, microVM) | | `stop()` | Tear the runtime down | | `execute(code, environment)` | Run generated `code` inside the sandbox with the supplied `environment` namespace | | `transfer_file(csv_data, filename)` | Move a CSV payload into the sandbox | | `_exec_code(code, environment)` | Internal worker that performs the actual execution | `execute()` lazily calls `start()` on first use, then delegates to `_exec_code()`. The base class also defines `_extract_sql_queries_from_code()`, a small `ast.NodeVisitor` that walks generated Python source looking for `SELECT`/`WITH` query string assignments and call arguments — useful for routing SQL fragments to a query engine rather than executing them via Python `eval`. Because every method other than `execute()` raises `NotImplementedError`, any subclass must implement the full lifecycle, and a missing sandbox means the agent's executor falls back to a non-isolated path. ## The Code Generation and Prompt Pipeline Generated code is shaped by the prompt templates before it ever reaches an executor. [pandasai/core/prompts/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py) defines `BasePrompt`, which renders either an inline `template` string or a file-loaded Jinja2 template from a `templates/` sibling directory, collapses runs of three or more newlines, and caches the resolved string in `_resolved_prompt`. Subclasses specialize the rendering surface: - [pandasai/core/prompts/generate_system_message.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_system_message.py) → `generate_system_message.tmpl` - [pandasai/core/prompts/generate_python_code_with_sql.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_python_code_with_sql.py) → `generate_python_code_with_sql.tmpl` - [pandasai/core/prompts/correct_output_type_error_prompt.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/correct_output_type_error_prompt.py) → `correct_output_type_error_prompt.tmpl`, which injects the failing `code`, the `error_trace`, the conversation memory, and the `agent_description` system prompt into a self-correction request. Once the LLM responds, [pandasai/llm/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py) runs the raw response through `_polish_code()`, which strips leading `python`/`py` markers, removes surrounding backtick fences, and trims non-code preamble. The polished string is what the executor ultimately receives. ```mermaid flowchart LR A[BasePrompt render] --> B[LLM call] B --> C[Raw response] C --> D["_polish_code()"] D --> E{Sandbox configured?} E -- yes --> F["Sandbox.execute()"] E -- no --> G["In-process exec"] F --> H[Result] G --> H[Result] ``` ## LLM Integration and the Security Boundary The LLM transport layer is itself relevant to the threat model because the prompt — and therefore any data, schema, or instruction the model sees — is fully under the caller's control until it leaves the boundary. [extensions/llms/openai/pandasai_openai/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py) shows the two transport paths: `completion()` prepends a system prompt to a raw string and hits the legacy completions endpoint, while `chat_completion()` builds an OpenAI-style message list from `Memory.to_openai_messages()` and calls the chat endpoint. [extensions/llms/openai/pandasai_openai/openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py) fixes the default model to `gpt-4.1-mini`, supports a broad set of `gpt-4.1*` chat models, and reads `OPENAI_API_KEY`, `OPENAI_API_BASE`, and `OPENAI_PROXY` from the environment. Critically, none of these layers sanitize the LLM's output before it is executed — `_polish_code()` only normalizes formatting. ## Security Implications and Community Concerns Because the default executor is a plain `exec` with no sandbox and no `__builtins__` restriction, the system trusts the LLM's output completely. Two community issues document the resulting exposure: - **Issue #1895** — "Default code executor runs LLM-generated code with full builtins (no sandbox by default) → RCE via indirect prompt injection." The reporter notes that the default namespace exposes `pd`, `plt`, and `np` and leaves `__builtins__` unrestricted, so any prompt-injection payload that reaches the model can return arbitrary Python that runs in the host process. - **Issue #1893** — "Code Injection in `CodeExecutor.execute` Allows Arbitrary Code Execution via LLM-Generated Code" in `pandasai 3.0.0`. The same pattern is reported against the v3.0.0 release line, indicating the exposure is not historical. The architectural mitigation present in the codebase is the `Sandbox` class itself: a deployment can substitute a hardened `Sandbox` subclass (e.g. a container-based runner) and wire it into the agent so that `execute()` is invoked with the generated code and a deliberately minimal `environment` dictionary. Until such a sandbox is configured, the practical guidance from the source is that PandasAI should be treated as running LLM-generated code with full local privileges, and untrusted data sources should not be allowed to flow into the prompt without external filtering. ## See Also - [LLM Backends and Prompt Rendering](#) - [Agent State and Memory](#) - [Vector Store Extensions (ChromaDB, Milvus, Pinecone)](#) --- ## LLM Backends, Local Models, and Extension Ecosystem ### Related Pages Related topics: [Overview, Installation, and Quickstart](#page-1), [Code Execution, Sandbox, and Security Model](#page-2), [Agent Lifecycle, Prompts, and Semantic Layer](#page-4)

Related Source Files

The following source files were used to generate this page: - [pandasai/llm/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py) - [pandasai/llm/__init__.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/__init__.py) - [pandasai/config.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/config.py) - [pandasai/core/prompts/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py) - [pandasai/core/prompts/generate_system_message.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_system_message.py) - [pandasai/vectorstores/vectorstore.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/vectorstores/vectorstore.py) - [extensions/llms/litellm/pandasai_litellm/litellm.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/litellm/pandasai_litellm/litellm.py) - [extensions/llms/openai/pandasai_openai/openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py) - [extensions/llms/openai/pandasai_openai/azure_openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/azure_openai.py) - [extensions/llms/openai/pandasai_openai/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py) - [extensions/ee/vectorstores/pinecone/pandasai_pinecone/pinecone.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/pinecone/pandasai_pinecone/pinecone.py) - [extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py) - [extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py) - [extensions/ee/vectorstores/pinecone/README.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/pinecone/README.md)

# LLM Backends, Local Models, and Extension Ecosystem ## Overview PandasAI ships with a pluggable LLM abstraction so that the same conversational dataframe interface can be driven by hosted providers, local inference servers, or enterprise vector stores. The `LLM` base class defines the contract every backend must implement, while individual backends live in optional `extensions/` packages that can be installed independently. This design lets users switch providers without modifying their analytics code. The base interface is intentionally minimal: an LLM must expose a `type` property, a `call(instruction, context)` method, and code-polishing helpers. The full set of `pandasai/llm/__init__.py` re-exports only the `LLM` symbol, indicating that the package is a framework for subclasses rather than a list of preconfigured clients. Source: [pandasai/llm/__init__.py:1-4](). Source: [pandasai/llm/base.py:1-15](). ## Built-in LLM Base Class The `LLM` class in [pandasai/llm/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py) provides the contract that every backend must satisfy. The constructor stores an optional `api_key` and additional keyword arguments, while the `is_pandasai_llm()` method returns `True` so the agent loop can recognize first-party backends. The `type` property raises `APIKeyNotFoundError` if a subclass does not override it, enforcing that each backend declares an identifier. Source: [pandasai/llm/base.py:25-65](). The `_polish_code` helper strips Markdown code fences, leading language tags, and stray backticks so the LLM-generated snippet can be fed directly to the code executor. The `call()` method is declared `abstractmethod`, requiring every concrete backend to translate a `BasePrompt` instruction into a string response. Source: [pandasai/llm/base.py:67-110](). Prompts themselves are Jinja2 templates rendered through the `BasePrompt` class in [pandasai/core/prompts/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py), which supports both inline `template` strings and external `template_path` files such as `generate_system_message.tmpl`. Source: [pandasai/core/prompts/base.py:1-65](). Source: [pandasai/core/prompts/generate_system_message.py:1-7](). ## Extension Ecosystem PandasAI organizes optional backends under a top-level `extensions/` directory, split into two tiers: | Tier | Location | License | Examples | |------|----------|---------|----------| | LLM backends | `extensions/llms//pandasai_/` | Open source | `openai`, `litellm` | | Enterprise extensions | `extensions/ee//pandasai_/` | Sinaptik GmbH Enterprise | `pinecone`, `milvus`, `chromadb` | ### LLM Backend Extensions The **OpenAI** extension in [extensions/llms/openai/pandasai_openai/openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py) declares a `model` default of `gpt-4.1-mini` and lists supported chat models including `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, and their dated snapshots. The constructor resolves the API token from the `OPENAI_API_KEY` environment variable, raises `APIKeyNotFoundError` when missing, and supports a custom `api_base` for OpenAI-compatible proxies. Source: [extensions/llms/openai/pandasai_openai/openai.py:1-60](). The **Azure OpenAI** backend in [extensions/llms/openai/pandasai_openai/azure_openai.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/azure_openai.py) extends the OpenAI client with `azure_endpoint`, `api_version`, and `deployment_name` parameters, and validates each one with explicit `APIKeyNotFoundError` and `MissingModelError` exceptions. The shared [base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py) supplies `completion()` and `chat_completion()` helpers, default sampling parameters (`temperature=0`, `max_tokens=1000`, `presence_penalty=0.6`), and an `http_client` hook for custom transport configuration. Source: [extensions/llms/openai/pandasai_openai/base.py:1-80]()). Source: [extensions/llms/openai/pandasai_openai/azure_openai.py:1-70](). The **LiteLLM** wrapper in [extensions/llms/litellm/pandasai_litellm/litellm.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/litellm/pandasai_litellm/litellm.py) is the community-favoured universal adapter. It accepts a `model` string plus arbitrary `**kwargs` that LiteLLM forwards to the underlying provider, and overrides `call()` to call `litellm.completion` directly. The README example shows the recommended usage: ```python from pandasai_litellm.litellm import LiteLLM llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY") pai.config.set({"llm": llm}) ``` Source: [extensions/llms/litellm/pandasai_litellm/litellm.py:1-50](). Source: [README.md:1-30](). ### Vector Store Extensions PandasAI extends its semantic-layer caching with vector-store adapters under `extensions/ee/vectorstores/`. The abstract base in [pandasai/vectorstores/vectorstore.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/vectorstores/vectorstore.py) defines the contract: `add_docs`, `update_docs`, `delete_question_and_answers`, `get_relevant_docs`, and `get_relevant_qa_documents`. Each concrete backend must implement these methods or inherit the `NotImplementedError` defaults. Source: [pandasai/vectorstores/vectorstore.py:1-80](). | Vendor | Package | Notable Method | |--------|---------|----------------| | Pinecone | `pandasai-pinecone` | `_filter_docs_based_on_distance` cosine threshold | | Milvus | `pandasai-milvus` | `_initiate_docs_collection` with `COSINE` index params | | ChromaDB | `pandasai-chromadb` | `_filter_docs_based_on_distance` over `QueryResult` | All three Enterprise extensions are licensed under the Sinaptik GmbH Enterprise License, as stated in the [Pinecone README](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/pinecone/README.md). Source: [extensions/ee/vectorstores/pinecone/README.md:1-20](). Source: [extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py:1-60](). Source: [extensions/ee/vectorstores/pinecone/pandasai_pinecone/pinecone.py:1-30](). Source: [extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py:1-20](). ## Local Model Support and Community Demand A large share of community engagement is driven by requests for self-hosted inference. Issue [#187](https://github.com/sinaptik-ai/pandas-ai/issues/187) (38 comments) calls for StarCoder/MPT support, [#799](https://github.com/sinaptik-ai/pandas-ai/issues/799) (15 comments) requests LM Studio, and [#1181](https://github.com/sinaptik-ai/pandas-ai/issues/1181) requests Open WebUI compatibility. The historical `LocalLLM` import path (`from pandasai.llm.local_llm import LocalLLM`) raised `ModuleNotFoundError` in v3.0.0 as reported in issue [#1888](https://github.com/sinaptik-ai/pandas-ai/issues/1888). The recommended pattern for local models is therefore to point an OpenAI-compatible backend (LiteLLM or the base OpenAI extension) at a local server, or to use LiteLLM's broad provider coverage. The `pai.config.set({"llm": llm})` call in [pandasai/config.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/config.py) is the single integration point regardless of which backend is chosen. Source: [pandasai/config.py:1-30](). ## Configuration and Operational Notes Two recurring operational themes appear in community discussions. First, Python 3.12 compatibility is blocked by an upper-bound dependency on `scipy==1.10.1` and is tracked in issues [#1850](https://github.com/sinaptik-ai/pandas-ai/issues/1850) and [#1787](https://github.com/sinaptik-ai/pandas-ai/issues/1787). Second, the default code executor in v3.0.0 invokes LLM-generated code via `exec` with `__builtins__` exposed, which has been flagged as a remote-code-execution risk in issues [#1893](https://github.com/sinaptik-ai/pandas-ai/issues/1893) and [#1895](https://github.com/sinaptik-ai/pandas-ai/issues/1895); users handling untrusted data should sandbox the executor or restrict `__builtins__` explicitly. ```mermaid flowchart LR User[User Prompt] --> Agent[Agent / SmartDataframe] Agent --> Config[pai.config] Config --> LLM[LLM Backend] LLM -->|completion| Provider[(OpenAI / Azure / LiteLLM / Local)] Provider --> Code[Generated Code] Code --> Executor[CodeExecutor] Executor -->|result| Agent Agent --> Vector[(Vector Store\nPinecone / Milvus / ChromaDB)] Agent --> Response[Answer + Chart] ``` ## See Also - SmartDataframe and Agent architecture - Prompt templates and code generation pipeline - Code sandboxing and security best practices - Contributing guide and pre-commit setup ([CONTRIBUTING.md](https://github.com/sinaptik-ai/pandas-ai/blob/main/CONTRIBUTING.md)) --- ## Agent Lifecycle, Prompts, and Semantic Layer ### Related Pages Related topics: [Overview, Installation, and Quickstart](#page-1), [Code Execution, Sandbox, and Security Model](#page-2), [LLM Backends, Local Models, and Extension Ecosystem](#page-3)

Related Source Files

The following source files were used to generate this page: - [pandasai/agent/__init__.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/agent/__init__.py) - [pandasai/llm/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py) - [pandasai/core/prompts/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py) - [pandasai/core/prompts/generate_system_message.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_system_message.py) - [pandasai/core/prompts/correct_output_type_error_prompt.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/correct_output_type_error_prompt.py) - [pandasai/vectorstores/vectorstore.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/vectorstores/vectorstore.py) - [extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py) - [extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py) - [extensions/llms/openai/pandasai_openai/base.py](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py)

# Agent Lifecycle, Prompts, and Semantic Layer PandasAI exposes a thin conversational layer on top of pandas through an `Agent` class that orchestrates prompt construction, LLM invocation, code execution, and (optionally) a semantic layer for retrieval-augmented few-shot prompting. This page describes how an Agent is instantiated, how prompts are rendered and sent to the LLM, and how the semantic-layer vector stores plug in to supply prior question/answer context. ## 1. Agent Entry Point and Lifecycle The public Agent surface is intentionally narrow. `pandasai/agent/__init__.py` re-exports a single symbol: ```python from .base import Agent __all__ = ["Agent"] ``` Source: [pandasai/agent/__init__.py:1-3](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/agent/__init__.py) Internally, the `Agent` keeps an `AgentState` that holds datasets, memory of past turns, the most recent generated code, and the configured LLM. Each call to `chat()` is expected to be a "clean start" turn, but community reports (#1855) document that `agent.chat()` sometimes fails to fully reset `last_code_generated`, so residual state can leak into the next prompt. When this happens, the LLM sees stale code mixed with the new user question and may produce an answer that depends on identifiers from the previous turn. The LLM contract is defined abstractly in `pandasai/llm/base.py`. Every concrete LLM (OpenAI, Azure, LiteLLM, etc.) must implement: - `call(instruction, context)` – execute the prompt against the model. - `type` – a string identifier (e.g. `"openai"`, `"litellm"`, `"azure-openai"`). - `generate_code(instruction, context)` – wraps `call` and extracts a runnable Python code block via `_extract_code` / `_polish_code`. Source: [pandasai/llm/base.py:96-115](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/llm/base.py) The base class also exposes helpers that the Agent uses to assemble a turn: `prepend_system_prompt(prompt, memory)` and `get_messages(memory)`, which read the conversation history from the `Memory` object before the request is dispatched. ## 2. Prompt Construction Pipeline All prompts in pandas-ai inherit from `BasePrompt` defined in `pandasai/core/prompts/base.py`. A prompt is either an inline Jinja2 string (`template`) or a file loaded from the `templates/` directory next to the module (`template_path`). The class resolves the template at construction time and caches the rendered output in `_resolved_prompt`, exposed via `to_string()` / `__str__`. A `to_json()` hook lets structured prompts (e.g. the SQL prompt or the error-correction prompt) serialise themselves for chat-style APIs. Source: [pandasai/core/prompts/base.py:13-58](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/base.py) The system message used to instruct the model that it must answer with Python code is built by `GenerateSystemMessagePrompt`, which simply loads `generate_system_message.tmpl`. This template is rendered with the agent description, the conversation memory, and any custom instructions. Source: [pandasai/core/prompts/generate_system_message.py:1-6](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/generate_system_message.py) A specialised structured prompt, `CorrectOutputTypeErrorPrompt`, is rendered into JSON and serialised the conversation, datasets, system prompt, the failing code, the exception trace, and the expected `output_type` whenever the executor returns the wrong type. The LLM is then asked to produce a corrected snippet. Source: [pandasai/core/prompts/correct_output_type_error_prompt.py:1-28](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/core/prompts/correct_output_type_error_prompt.py) The diagram below summarises the lifecycle from user input to executed code: ```mermaid flowchart LR A[User query] --> B[Agent.chat] B --> C[Build system prompt
GenerateSystemMessagePrompt] C --> D[Render instruction prompt
BasePrompt.to_string] B --> M[Query semantic layer
VectorStore.get_relevant_qa_documents] M --> D D --> E[LLM.call] E --> F[extract_code / polish_code] F --> G[CodeExecutor.execute] G -->|type error| H[CorrectOutputTypeErrorPrompt] H --> E G --> I[Result] ``` A known bug in this pipeline (#1853) is that `agent.description` is extracted on the Python side but never reaches the LLM because the corresponding Jinja template omits the `System Prompt` placeholder. A second bug (#1856) causes the SQL variant (`generate_python_code_with_sql.tmpl`) to skip the conversation-history block, so multi-turn SQL agents lose context. ## 3. LLM Backends and the Semantic Layer PandasAI ships several concrete LLM implementations. The OpenAI family (`extensions/llms/openai/pandasai_openai/`) shares `BaseOpenAI`, which sets defaults for `temperature`, `max_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`, and supports an injectable `http_client` and proxy. `OpenAI` validates the API key and the model name against a whitelist that includes `gpt-4.1-mini`, `gpt-4.1-mini-2025-04-14`, and `gpt-3.5-turbo-instruct`. Source: [extensions/llms/openai/pandasai_openai/base.py:18-43](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/base.py), [extensions/llms/openai/pandasai_openai/openai.py:1-40](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/llms/openai/pandasai_openai/openai.py) The semantic layer is built on top of an abstract `VectorStore` (`pandasai/vectorstores/vectorstore.py`) that defines a contract with two collections – documents and question/answer pairs – and a uniform set of methods: | Method | Purpose | | --- | --- | | `add_docs` / `update_docs` | Insert or update free-form documents | | `add_question_answer` / `update_question_answer` | Insert or update few-shot Q/A examples | | `get_relevant_docs(question, k)` | Retrieve similar documents for a query | | `get_relevant_question_answers(question, k)` | Retrieve similar prior Q/A pairs | | `delete_docs` / `delete_question_and_answers` | Remove entries by ID | Source: [pandasai/vectorstores/vectorstore.py:1-90](https://github.com/sinaptik-ai/pandas-ai/blob/main/pandasai/vectorstores/vectorstore.py) Concrete backends implement this contract. `ChromaDBVectorStore` queries two collections and post-filters by a similarity threshold; `MilvusVectorStore` creates explicit schemas with `VARCHAR` IDs and `FLOAT_VECTOR` embeddings indexed by `COSINE` distance; `LanceDB` and `Pinecone` follow the same dual-collection pattern. Source: [extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py:1-40](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py), [extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py:1-40](https://github.com/sinaptik-ai/pandas-ai/blob/main/extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py) Community issue #1874 reports that the `limit` attribute on `SemanticLayerSchema` appears to have no effect on how many rows are included in the prompt; this is consistent with the observation that schema-level controls are not always wired through the rendering pipeline. ## 4. Known Failure Modes Several recurring failure modes surface from the community and are reflected in the code: - **Unsafe code execution (#1893, #1895)** – the default executor runs LLM-generated code with full `__builtins__`, exposing the host to RCE through indirect prompt injection. Mitigations must be applied at the executor level, not via prompts. - **Dead system-prompt placeholder (#1853)** – `agent.description` is dropped before the LLM call. - **Missing conversation context in SQL prompt (#1856)** – the SQL template does not render prior turns. - **Stale state on `chat()` (#1855)** – `last_code_generated` is not cleared, contaminating the next prompt. - **Python version constraints (#1850, #1872)** – the package pins `<3.12` because of `scipy==1.10.1`. - **Local-model ergonomics (#187, #799, #1181, #1888)** – repeated requests for first-class Ollama, LM Studio, and Open WebUI support; these depend on the `LocalLLM` shim and a working `BaseOpenAI`-style client. ## See Also - SmartDataframe and SmartDatalake - Vector store extensions (ChromaDB, Milvus, LanceDB, Pinecone) - LLM backends (OpenAI, Azure OpenAI, LiteLLM, Bedrock) --- --- ## Pitfall Log Project: sinaptik-ai/pandas-ai Summary: Found 19 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification. ## 1. Installation risk - Installation risk requires verification - Severity: high - Evidence strength: source_linked - Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1872 ## 2. Runtime risk - Runtime risk requires verification - Severity: high - Evidence strength: source_linked - Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1896 ## 3. Installation risk - Installation risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1868 ## 4. Installation risk - Installation risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a installation risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1853 ## 5. Configuration risk - Configuration risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1856 ## 6. Capability evidence risk - Capability evidence risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: README/documentation is current enough for a first validation pass. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: capability.assumptions | https://github.com/sinaptik-ai/pandas-ai ## 7. Runtime risk - Runtime risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1888 ## 8. Runtime risk - Runtime risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: packet_text.keyword_scan | https://github.com/sinaptik-ai/pandas-ai ## 9. Maintenance risk - Maintenance risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1874 ## 10. Maintenance risk - Maintenance risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1855 ## 11. Maintenance risk - Maintenance risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: evidence.maintainer_signals | https://github.com/sinaptik-ai/pandas-ai ## 12. Security or permission risk - Security or permission risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: no_demo - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: downstream_validation.risk_items | https://github.com/sinaptik-ai/pandas-ai ## 13. Security or permission risk - Security or permission risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: no_demo - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: risks.scoring_risks | https://github.com/sinaptik-ai/pandas-ai ## 14. Security or permission risk - Security or permission risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1893 ## 15. Security or permission risk - Security or permission risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1895 ## 16. Security or permission risk - Security or permission risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1887 ## 17. Security or permission risk - Security or permission risk requires verification - Severity: medium - Evidence strength: source_linked - Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow. - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1871 ## 18. Maintenance risk - Maintenance risk requires verification - Severity: low - Evidence strength: source_linked - Finding: issue_or_pr_quality=unknown。 - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: evidence.maintainer_signals | https://github.com/sinaptik-ai/pandas-ai ## 19. Maintenance risk - Maintenance risk requires verification - Severity: low - Evidence strength: source_linked - Finding: release_recency=unknown。 - User impact: May increase setup, validation, or first-run risk for the user. - Evidence: evidence.maintainer_signals | https://github.com/sinaptik-ai/pandas-ai