Doramagic Project Pack · Human Manual
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
Overview, Installation, and Quickstart
Related topics: Code Execution, Sandbox, and Security Model, LLM Backends, Local Models, and Extension Ecosystem
What is PandasAI
PandasAI is a Python library that lets users ask questions about their data in natural language. It targets two audiences: non-technical users who want to query datasets conversationally, and technical users who want to accelerate exploratory data analysis. The library is distributed as the core pandasai package on PyPI, with separate extension packages for additional LLM providers and vector stores. Source: README.md.
At a high level, the library works by sending the user's question together with a serialized representation of the dataframe to a Large Language Model (LLM), receiving generated Python code in response, and executing that code to produce a result. This flow is reflected in the prompt architecture: a BasePrompt renders Jinja2 templates, which are passed to an LLM that extends pandasai.llm.base.LLM. Source: pandasai/core/prompts/base.py:1-45, pandasai/llm/base.py:1-40.
The system message prompt class, for example, loads its template from disk through BasePrompt.template_path, which is how instructions are injected into the LLM context. Source: pandasai/core/prompts/generate_system_message.py:1-5.
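The request flow described above can be sketched with the standard library alone. This is a toy illustration under stated assumptions, not PandasAI's implementation: `build_prompt`, `fake_llm`, and `answer` are hypothetical names, the "dataframe" is a list of dicts, and the stand-in model returns a fixed snippet instead of calling a provider.

```python
import json


def build_prompt(question, rows):
    """Combine the question with a serialized preview of the data."""
    preview = json.dumps(rows[:5])
    return (
        f"Data preview: {preview}\n"
        f"Question: {question}\n"
        "Reply with Python that sets `result`."
    )


def fake_llm(prompt):
    """Hypothetical stand-in for a model call; returns a fixed snippet."""
    return "result = max(rows, key=lambda r: r['salary'])['name']"


def answer(question, rows, llm=fake_llm):
    prompt = build_prompt(question, rows)
    code = llm(prompt)
    namespace = {"rows": rows}
    # Running model output with exec mirrors an unsandboxed execution path.
    exec(code, namespace)
    return namespace["result"]


rows = [{"name": "Ada", "salary": 120}, {"name": "Bob", "salary": 95}]
print(answer("Which employee has the highest salary?", rows))
```

The point of the sketch is the shape of the loop: whatever string the model returns is what ends up being executed.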
Installation
PandasAI requires Python 3.8+ up to 3.11 at the time of v3.0.0. This constraint is enforced through a dependency on scipy==1.10.1, which itself caps Python at <3.12. Community requests to support Python 3.12 are tracked in issues #1850, #1787, and #1872. Source: README.md.
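A small pre-install check can make the version constraint explicit. The sketch below only encodes the 3.8 to 3.11 range stated above; `is_supported` and the two constants are hypothetical names, not part of PandasAI.

```python
import sys

SUPPORTED_MIN = (3, 8)
SUPPORTED_MAX = (3, 11)


def is_supported(version=None):
    """Return True when the (major, minor) pair falls in the stated range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


if not is_supported():
    print("This interpreter is outside the 3.8-3.11 range described above.")
```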
Install the core library and an LLM provider extension with either pip or poetry:
# pip
pip install pandasai
pip install pandasai-litellm
# poetry
poetry add pandasai
poetry add pandasai-litellm
The pandasai-litellm extension is a common choice because it routes requests through LiteLLM, supporting many providers with a single interface. Source: README.md.
Development install
Contributors are instructed to use Poetry (not pip or conda) and to install all extras plus dev dependencies:
poetry install --all-extras --with dev
pre-commit install
The project uses ruff for linting and pytest for tests. Source: CONTRIBUTING.md.
Optional extensions
| Extension | Purpose | Source |
|---|---|---|
| pandasai-litellm | Multi-provider LLM routing | README.md |
| pandasai-openai | Native OpenAI / Azure OpenAI | extensions/llms/openai/pandasai_openai/openai.py |
| pandasai-chromadb | ChromaDB vector store (EE) | extensions/ee/vectorstores/chromadb/ |
| pandasai-pinecone | Pinecone vector store (EE) | extensions/ee/vectorstores/pinecone/ |
| pandasai-milvus | Milvus vector store (EE) | extensions/ee/vectorstores/milvus/ |
| pandasai-qdrant | Qdrant vector store (EE) | extensions/ee/vectorstores/qdrant/ |
Vector-store extensions fall under the Sinaptik GmbH Enterprise License and are intended for commercial use under that license. Source: extensions/ee/vectorstores/pinecone/README.md, extensions/ee/vectorstores/qdrant/README.md.
Quickstart
The minimal end-to-end example uses pandasai together with the LiteLLM extension. Source: README.md.
import pandasai as pai
from pandasai_litellm.litellm import LiteLLM
# 1. Configure the LLM
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")
pai.config.set({"llm": llm})
# 2. Load a dataframe
df = pai.read_csv("employees.csv")
# 3. Ask a question
print(df.chat("Which employee has the highest salary?"))
Each .chat() call serializes the dataframe, builds prompts, calls the LLM, and executes the returned code. The default OpenAI extension supports a wide range of chat and completion model IDs, including gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, and gpt-4o-mini. Source: extensions/llms/openai/pandasai_openai/openai.py:1-40.
The BaseOpenAI class exposes standard inference parameters such as temperature, max_tokens, top_p, frequency_penalty, presence_penalty, and seed, all of which are forwarded to the underlying OpenAI client. Source: extensions/llms/openai/pandasai_openai/base.py:1-40.
Architecture and Data Flow
The following diagram summarizes the request path for a single chat() call:
flowchart LR
A[User question] --> B[SmartDataframe / Agent]
B --> C[Prompt Builder<br/>BasePrompt + Jinja2]
C --> D[LLM<br/>LLM subclass]
D --> E[Generated Python code]
E --> F[Code Executor]
F --> G[Result]
B -.optional.-> H[Vector Store<br/>SemanticLayer / Memory]
D <-. context .-> H
- The prompt builder uses BasePrompt, which supports both inline template strings and template_path files loaded via Jinja2's FileSystemLoader. Source: pandasai/core/prompts/base.py:1-45.
- The LLM is any subclass of pandasai.llm.base.LLM. The base class exposes is_pandasai_llm, type, and a _polish_code helper that strips leading python markers and stray backticks from generated snippets. Source: pandasai/llm/base.py:1-40.
- The vector store is an optional component used to retrieve relevant documents or past question/answer pairs. VectorStore defines abstract methods such as add_docs, update_docs, delete_docs, get_relevant_docs, and get_relevant_qa_documents. Source: pandasai/vectorstores/vectorstore.py:1-90.
Common Failure Modes and Limitations
A few caveats from the v3.0.0 release and community discussions are worth noting during setup:
- Python version pin. Running on Python 3.12 will fail to install scipy==1.10.1. Use 3.8–3.11 or wait for upstream support. Source: README.md, community issue #1872.
- Code execution is not sandboxed by default. The default code executor runs LLM-generated code with full builtins available, so untrusted model output can lead to arbitrary code execution. Community issues #1893 and #1895 describe this risk; production deployments should add a sandbox or restrict input sources.
- Pillow CVE. Older transitive dependencies on pillow ^10.1.0 carry an out-of-bounds CVE; community issue #1871 requests upgrading to 12.1.1.
- Reasoning models. GPT-5 and other reasoning models are not yet supported out of the box. See community issue #1867.
- Local models. Local LLM support (Ollama, LM Studio, Open WebUI) is requested frequently in community issues #187, #799, #1181, and #1888.
See Also
- LLM base class and extension model: pandasai/llm/base.py, extensions/llms/openai/pandasai_openai/
- Prompt system: pandasai/core/prompts/base.py, pandasai/core/prompts/generate_system_message.py
- Vector store abstractions and EE backends: pandasai/vectorstores/vectorstore.py, extensions/ee/vectorstores/
- Contributing and dev setup: CONTRIBUTING.md
Source: https://github.com/sinaptik-ai/pandas-ai / Human Manual
Code Execution, Sandbox, and Security Model
Related topics: Overview, Installation, and Quickstart, LLM Backends, Local Models, and Extension Ecosystem
Overview
PandasAI converts natural-language questions into Python (and sometimes SQL) code via a language model, then executes that code against the user's data. The path that LLM-generated code travels — from prompt construction, to LLM call, to code polishing, to exec — is therefore a security boundary. The repository exposes a Sandbox abstraction as the designated extension point for isolating execution, but the default code path runs generated code in-process with no sandbox attached. The community has flagged this surface as a significant risk vector (see issues #1893 and #1895), and understanding where isolation is — and is not — applied is essential for any production deployment.
The Sandbox Abstraction
The sandbox contract is defined in pandasai/sandbox/sandbox.py and re-exported from pandasai/sandbox/__init__.py. The base class defines five lifecycle methods; the four other than execute() raise NotImplementedError and must be provided by concrete implementations:
| Method | Purpose |
|---|---|
| start() | Boot the isolated runtime (e.g. container, microVM) |
| stop() | Tear the runtime down |
| execute(code, environment) | Run generated code inside the sandbox with the supplied environment namespace |
| transfer_file(csv_data, filename) | Move a CSV payload into the sandbox |
| _exec_code(code, environment) | Internal worker that performs the actual execution |
execute() lazily calls start() on first use, then delegates to _exec_code(). The base class also defines _extract_sql_queries_from_code(), a small ast.NodeVisitor that walks generated Python source looking for SELECT/WITH query strings in assignments and call arguments, which is useful for routing SQL fragments to a query engine rather than executing them via Python eval. Because every lifecycle method other than execute() raises NotImplementedError, any subclass must implement the full lifecycle, and a missing sandbox means the agent's executor falls back to a non-isolated path.
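The lifecycle described above can be sketched as a toy base class plus one subclass. This is an illustration, not the real pandasai Sandbox: `SketchSandbox`, `InProcessSketch`, the `_started` flag, and the `files` attribute are assumptions made for the example, and the subclass runs code in-process, so it isolates nothing.

```python
class SketchSandbox:
    """Toy base class mirroring the lifecycle described above."""

    def __init__(self):
        self._started = False  # assumed bookkeeping flag

    def start(self):
        raise NotImplementedError

    def stop(self):
        raise NotImplementedError

    def transfer_file(self, csv_data, filename):
        raise NotImplementedError

    def _exec_code(self, code, environment):
        raise NotImplementedError

    def execute(self, code, environment):
        # Lazily boot the runtime on first use, then delegate.
        if not self._started:
            self.start()
        return self._exec_code(code, environment)


class InProcessSketch(SketchSandbox):
    """Hypothetical subclass; a real one would target a container or VM."""

    def start(self):
        self._started = True

    def stop(self):
        self._started = False

    def transfer_file(self, csv_data, filename):
        self.files = getattr(self, "files", {})
        self.files[filename] = csv_data

    def _exec_code(self, code, environment):
        namespace = dict(environment)
        exec(code, namespace)
        return namespace.get("result")


sandbox = InProcessSketch()
print(sandbox.execute("result = x + 1", {"x": 41}))
sandbox.stop()
```

Calling execute() on the bare base class raises NotImplementedError from start(), which is the behaviour the paragraph above attributes to a missing implementation.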
The Code Generation and Prompt Pipeline
Generated code is shaped by the prompt templates before it ever reaches an executor. pandasai/core/prompts/base.py defines BasePrompt, which renders either an inline template string or a file-loaded Jinja2 template from a templates/ sibling directory, collapses runs of three or more newlines, and caches the resolved string in _resolved_prompt. Subclasses specialize the rendering surface:
- pandasai/core/prompts/generate_system_message.py → generate_system_message.tmpl
- pandasai/core/prompts/generate_python_code_with_sql.py → generate_python_code_with_sql.tmpl
- pandasai/core/prompts/correct_output_type_error_prompt.py → correct_output_type_error_prompt.tmpl, which injects the failing code, the error_trace, the conversation memory, and the agent_description system prompt into a self-correction request.
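The render, collapse, and cache steps attributed to BasePrompt can be sketched as follows. The sketch swaps Jinja2 for the standard library's string.Template so it has no dependencies; `SketchPrompt` is a hypothetical class, and collapsing to exactly one blank line is an assumption about the intended result.

```python
import re
from string import Template


class SketchPrompt:
    """Toy prompt class; string.Template stands in for Jinja2 here."""

    template = "You are $role.\n\n\n\n$instructions\n"

    def __init__(self, **props):
        self.props = props
        self._resolved_prompt = None

    def render(self):
        text = Template(self.template).substitute(**self.props)
        # Collapse runs of three or more newlines into a single blank line.
        return re.sub(r"\n{3,}", "\n\n", text)

    def to_string(self):
        # Render once and reuse the cached string afterwards.
        if self._resolved_prompt is None:
            self._resolved_prompt = self.render()
        return self._resolved_prompt

    def __str__(self):
        return self.to_string()


prompt = SketchPrompt(role="a data analyst", instructions="Return Python code.")
print(str(prompt))
```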
Once the LLM responds, pandasai/llm/base.py runs the raw response through _polish_code(), which strips leading python/py markers, removes surrounding backtick fences, and trims non-code preamble. The polished string is what the executor ultimately receives.
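A formatting-only normalizer in the spirit of _polish_code might look like the sketch below. It is not the library's code: `polish_code` and its regular expressions are assumptions, it handles only a surrounding triple-backtick fence, a leading python/py marker line, and single surrounding backticks, and it does not attempt to trim free-text preamble.

```python
import re

FENCE = "`" * 3  # built programmatically so the example is easy to embed
_FENCED = re.compile(r"^" + FENCE + r"[a-zA-Z]*\n(.*?)\n?" + FENCE + r"$", re.DOTALL)


def polish_code(raw):
    """Normalize a model reply into bare source text (formatting only)."""
    code = raw.strip()
    # Remove a surrounding fence, with an optional language tag.
    fenced = _FENCED.match(code)
    if fenced:
        code = fenced.group(1)
    # Drop a leading "python" / "py" marker left on its own line.
    code = re.sub(r"^(python|py)\s*\n", "", code)
    # Strip single surrounding backticks, e.g. `x = 1`.
    if len(code) >= 2 and code.startswith("`") and code.endswith("`"):
        code = code[1:-1]
    return code.strip()


reply = FENCE + "python\nresult = df['salary'].max()\n" + FENCE
print(polish_code(reply))
```

Nothing here inspects what the code does, which is why normalization of this kind cannot serve as a security control.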
flowchart LR
A[BasePrompt render] --> B[LLM call]
B --> C[Raw response]
C --> D["_polish_code()"]
D --> E{Sandbox configured?}
E -- yes --> F["Sandbox.execute()"]
E -- no --> G["In-process exec"]
F --> H[Result]
G --> H[Result]
LLM Integration and the Security Boundary
The LLM transport layer is itself relevant to the threat model because the prompt — and therefore any data, schema, or instruction the model sees — is fully under the caller's control until it leaves the boundary. extensions/llms/openai/pandasai_openai/base.py shows the two transport paths: completion() prepends a system prompt to a raw string and hits the legacy completions endpoint, while chat_completion() builds an OpenAI-style message list from Memory.to_openai_messages() and calls the chat endpoint. extensions/llms/openai/pandasai_openai/openai.py fixes the default model to gpt-4.1-mini, supports a broad set of gpt-4.1* chat models, and reads OPENAI_API_KEY, OPENAI_API_BASE, and OPENAI_PROXY from the environment. Critically, none of these layers sanitize the LLM's output before it is executed — _polish_code() only normalizes formatting.
Security Implications and Community Concerns
Because the default executor is a plain exec with no sandbox and no __builtins__ restriction, the system trusts the LLM's output completely. Two community issues document the resulting exposure:
- Issue #1895: "Default code executor runs LLM-generated code with full builtins (no sandbox by default) → RCE via indirect prompt injection." The reporter notes that the default namespace exposes pd, plt, and np and leaves __builtins__ unrestricted, so any prompt-injection payload that reaches the model can return arbitrary Python that runs in the host process.
- Issue #1893: "Code Injection in CodeExecutor.execute Allows Arbitrary Code Execution via LLM-Generated Code" in pandasai 3.0.0. The same pattern is reported against the v3.0.0 release line, indicating the exposure is not historical.
The architectural mitigation present in the codebase is the Sandbox class itself: a deployment can substitute a hardened Sandbox subclass (e.g. a container-based runner) and wire it into the agent so that execute() is invoked with the generated code and a deliberately minimal environment dictionary. Until such a sandbox is configured, the practical guidance from the source is that PandasAI should be treated as running LLM-generated code with full local privileges, and untrusted data sources should not be allowed to flow into the prompt without external filtering.
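To make the difference between an unrestricted and a narrowed namespace concrete, the sketch below runs a snippet with a hand-picked `__builtins__` mapping. `run_restricted` and `SAFE_BUILTINS` are hypothetical names, and this is emphatically not a sandbox: restricting builtins in-process only removes the most direct routes (import, open) and can be bypassed through object introspection, so it does not replace process or container isolation.

```python
SAFE_BUILTINS = {
    "len": len,
    "max": max,
    "min": min,
    "sum": sum,
    "sorted": sorted,
    "range": range,
}


def run_restricted(code, environment):
    """Execute code with only the names in SAFE_BUILTINS available as builtins."""
    namespace = dict(environment)
    namespace["__builtins__"] = dict(SAFE_BUILTINS)
    exec(code, namespace)
    return namespace.get("result")


print(run_restricted("result = sum(values)", {"values": [1, 2, 3]}))

try:
    run_restricted("import os", {})
except ImportError as exc:
    # With no __import__ in the builtins mapping, the import statement fails.
    print("import blocked:", exc)
```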
LLM Backends, Local Models, and Extension Ecosystem
Related topics: Overview, Installation, and Quickstart, Code Execution, Sandbox, and Security Model, Agent Lifecycle, Prompts, and Semantic Layer
Overview
PandasAI ships with a pluggable LLM abstraction so that the same conversational dataframe interface can be driven by hosted providers, local inference servers, or enterprise vector stores. The LLM base class defines the contract every backend must implement, while individual backends live in optional extensions/ packages that can be installed independently. This design lets users switch providers without modifying their analytics code.
The base interface is intentionally minimal: an LLM must expose a type property, a call(instruction, context) method, and code-polishing helpers. pandasai/llm/__init__.py re-exports only the LLM symbol, indicating that the package is a framework for subclasses rather than a list of preconfigured clients. Source: pandasai/llm/__init__.py:1-4, pandasai/llm/base.py:1-15.
Built-in LLM Base Class
The LLM class in pandasai/llm/base.py provides the contract that every backend must satisfy. The constructor stores an optional api_key and additional keyword arguments, while the is_pandasai_llm() method returns True so the agent loop can recognize first-party backends. The type property raises APIKeyNotFoundError if a subclass does not override it, enforcing that each backend declares an identifier. Source: pandasai/llm/base.py:25-65.
The _polish_code helper strips Markdown code fences, leading language tags, and stray backticks so the LLM-generated snippet can be fed directly to the code executor. The call() method is declared abstractmethod, requiring every concrete backend to translate a BasePrompt instruction into a string response. Source: pandasai/llm/base.py:67-110. Prompts themselves are Jinja2 templates rendered through the BasePrompt class in pandasai/core/prompts/base.py, which supports both inline template strings and external template_path files such as generate_system_message.tmpl. Source: pandasai/core/prompts/base.py:1-65. Source: pandasai/core/prompts/generate_system_message.py:1-7.
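The contract described above (a type identifier, an abstract call(), and a constructor that stores an optional key plus extra keyword arguments) can be sketched without importing the library. `SketchLLM` and `EchoLLM` are hypothetical stand-ins, the `params` attribute is an assumption, and NotImplementedError replaces the APIKeyNotFoundError mentioned above so that the example stays self-contained.

```python
from abc import ABC, abstractmethod


class SketchLLM(ABC):
    """Toy stand-in for the base contract (not pandasai.llm.base.LLM)."""

    def __init__(self, api_key=None, **kwargs):
        self.api_key = api_key
        self.params = kwargs  # assumed name for the stored extras

    def is_pandasai_llm(self):
        return True

    @property
    def type(self):
        # Subclasses are expected to override this with an identifier.
        raise NotImplementedError("type must be overridden")

    @abstractmethod
    def call(self, instruction, context=None):
        """Turn a rendered prompt into a string response."""


class EchoLLM(SketchLLM):
    """Hypothetical backend returning canned code instead of calling a provider."""

    @property
    def type(self):
        return "echo"

    def call(self, instruction, context=None):
        return "result = 'echo: ' + " + repr(str(instruction))


llm = EchoLLM(api_key="not-a-real-key", temperature=0)
print(llm.type, "|", llm.call("How many rows?"))
```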
Extension Ecosystem
PandasAI organizes optional backends under a top-level extensions/ directory, split into two tiers:
| Tier | Location | License | Examples |
|---|---|---|---|
| LLM backends | extensions/llms/<provider>/pandasai_<provider>/ | Open source | openai, litellm |
| Enterprise extensions | extensions/ee/<category>/pandasai_<vendor>/ | Sinaptik GmbH Enterprise | pinecone, milvus, chromadb |
LLM Backend Extensions
The OpenAI extension in extensions/llms/openai/pandasai_openai/openai.py declares a model default of gpt-4.1-mini and lists supported chat models including gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, and their dated snapshots. The constructor resolves the API token from the OPENAI_API_KEY environment variable, raises APIKeyNotFoundError when missing, and supports a custom api_base for OpenAI-compatible proxies. Source: extensions/llms/openai/pandasai_openai/openai.py:1-60.
The Azure OpenAI backend in extensions/llms/openai/pandasai_openai/azure_openai.py extends the OpenAI client with azure_endpoint, api_version, and deployment_name parameters, and validates each one with explicit APIKeyNotFoundError and MissingModelError exceptions. The shared base.py supplies completion() and chat_completion() helpers, default sampling parameters (temperature=0, max_tokens=1000, presence_penalty=0.6), and an http_client hook for custom transport configuration. Source: extensions/llms/openai/pandasai_openai/base.py:1-80. Source: extensions/llms/openai/pandasai_openai/azure_openai.py:1-70.
The LiteLLM wrapper in extensions/llms/litellm/pandasai_litellm/litellm.py is the community-favoured universal adapter. It accepts a model string plus arbitrary **kwargs that LiteLLM forwards to the underlying provider, and overrides call() to call litellm.completion directly. The README example shows the recommended usage:
import pandasai as pai
from pandasai_litellm.litellm import LiteLLM
llm = LiteLLM(model="gpt-4.1-mini", api_key="YOUR_OPENAI_API_KEY")
pai.config.set({"llm": llm})
Source: extensions/llms/litellm/pandasai_litellm/litellm.py:1-50. Source: README.md:1-30.
Vector Store Extensions
PandasAI extends its semantic-layer caching with vector-store adapters under extensions/ee/vectorstores/. The abstract base in pandasai/vectorstores/vectorstore.py defines the contract: add_docs, update_docs, delete_question_and_answers, get_relevant_docs, and get_relevant_qa_documents. Each concrete backend must implement these methods or inherit the NotImplementedError defaults. Source: pandasai/vectorstores/vectorstore.py:1-80.
| Vendor | Package | Notable Method |
|---|---|---|
| Pinecone | pandasai-pinecone | _filter_docs_based_on_distance cosine threshold |
| Milvus | pandasai-milvus | _initiate_docs_collection with COSINE index params |
| ChromaDB | pandasai-chromadb | _filter_docs_based_on_distance over QueryResult |
All three Enterprise extensions are licensed under the Sinaptik GmbH Enterprise License, as stated in the Pinecone README. Source: extensions/ee/vectorstores/pinecone/README.md:1-20. Source: extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py:1-60. Source: extensions/ee/vectorstores/pinecone/pandasai_pinecone/pinecone.py:1-30. Source: extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py:1-20.
Local Model Support and Community Demand
A large share of community engagement is driven by requests for self-hosted inference. Issue #187 (38 comments) calls for StarCoder/MPT support, #799 (15 comments) requests LM Studio, and #1181 requests Open WebUI compatibility. The historical LocalLLM import path (from pandasai.llm.local_llm import LocalLLM) raised ModuleNotFoundError in v3.0.0 as reported in issue #1888.
The recommended pattern for local models is therefore to point an OpenAI-compatible backend (LiteLLM or the base OpenAI extension) at a local server, or to use LiteLLM's broad provider coverage. The pai.config.set({"llm": llm}) call in pandasai/config.py is the single integration point regardless of which backend is chosen. Source: pandasai/config.py:1-30.
Configuration and Operational Notes
Two recurring operational themes appear in community discussions. First, Python 3.12 compatibility is blocked by an upper-bound dependency on scipy==1.10.1 and is tracked in issues #1850 and #1787. Second, the default code executor in v3.0.0 invokes LLM-generated code via exec with __builtins__ exposed, which has been flagged as a remote-code-execution risk in issues #1893 and #1895; users handling untrusted data should sandbox the executor or restrict __builtins__ explicitly.
flowchart LR
User[User Prompt] --> Agent[Agent / SmartDataframe]
Agent --> Config[pai.config]
Config --> LLM[LLM Backend]
LLM -->|completion| Provider[(OpenAI / Azure / LiteLLM / Local)]
Provider --> Code[Generated Code]
Code --> Executor[CodeExecutor]
Executor -->|result| Agent
Agent --> Vector[(Vector Store\nPinecone / Milvus / ChromaDB)]
Agent --> Response[Answer + Chart]
See Also
- SmartDataframe and Agent architecture
- Prompt templates and code generation pipeline
- Code sandboxing and security best practices
- Contributing guide and pre-commit setup (CONTRIBUTING.md)
Agent Lifecycle, Prompts, and Semantic Layer
Related topics: Overview, Installation, and Quickstart, Code Execution, Sandbox, and Security Model, LLM Backends, Local Models, and Extension Ecosystem
PandasAI exposes a thin conversational layer on top of pandas through an Agent class that orchestrates prompt construction, LLM invocation, code execution, and (optionally) a semantic layer for retrieval-augmented few-shot prompting. This page describes how an Agent is instantiated, how prompts are rendered and sent to the LLM, and how the semantic-layer vector stores plug in to supply prior question/answer context.
1. Agent Entry Point and Lifecycle
The public Agent surface is intentionally narrow. pandasai/agent/__init__.py re-exports a single symbol:
from .base import Agent
__all__ = ["Agent"]
Source: pandasai/agent/__init__.py:1-3
Internally, the Agent keeps an AgentState that holds datasets, memory of past turns, the most recent generated code, and the configured LLM. Each call to chat() is expected to be a "clean start" turn, but community reports (#1855) document that agent.chat() sometimes fails to fully reset last_code_generated, so residual state can leak into the next prompt. When this happens, the LLM sees stale code mixed with the new user question and may produce an answer that depends on identifiers from the previous turn.
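The effect of residual state can be shown with a toy model of one turn. Everything here is hypothetical (`SketchState`, `build_prompt`, `chat`, and the `reset` switch are not PandasAI APIs); the sketch only demonstrates how an uncleared last_code_generated value ends up in the next prompt.

```python
from dataclasses import dataclass


@dataclass
class SketchState:
    """Hypothetical per-agent state with a single per-turn field."""
    last_code_generated: str = ""


def build_prompt(state, question):
    parts = [f"Question: {question}"]
    if state.last_code_generated:
        parts.append(f"Previous code:\n{state.last_code_generated}")
    return "\n".join(parts)


def chat(state, question, reset=True):
    if reset:
        # A clean-start turn clears per-turn artifacts first.
        state.last_code_generated = ""
    prompt = build_prompt(state, question)
    # Placeholder for whatever the model would have generated this turn.
    state.last_code_generated = f"result = {question!r}"
    return prompt


state = SketchState()
chat(state, "total sales")
print(chat(state, "average price"))               # no stale code in the prompt
print(chat(state, "top customer", reset=False))   # stale code leaks in
```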
The LLM contract is defined abstractly in pandasai/llm/base.py. Every concrete LLM (OpenAI, Azure, LiteLLM, etc.) must implement:
- call(instruction, context) – execute the prompt against the model.
- type – a string identifier (e.g. "openai", "litellm", "azure-openai").
- generate_code(instruction, context) – wraps call and extracts a runnable Python code block via _extract_code / _polish_code.
Source: pandasai/llm/base.py:96-115
The base class also exposes helpers that the Agent uses to assemble a turn: prepend_system_prompt(prompt, memory) and get_messages(memory), which read the conversation history from the Memory object before the request is dispatched.
2. Prompt Construction Pipeline
All prompts in pandas-ai inherit from BasePrompt defined in pandasai/core/prompts/base.py. A prompt is either an inline Jinja2 string (template) or a file loaded from the templates/ directory next to the module (template_path). The class resolves the template at construction time and caches the rendered output in _resolved_prompt, exposed via to_string() / __str__. A to_json() hook lets structured prompts (e.g. the SQL prompt or the error-correction prompt) serialise themselves for chat-style APIs.
Source: pandasai/core/prompts/base.py:13-58
The system message used to instruct the model that it must answer with Python code is built by GenerateSystemMessagePrompt, which simply loads generate_system_message.tmpl. This template is rendered with the agent description, the conversation memory, and any custom instructions.
Source: pandasai/core/prompts/generate_system_message.py:1-6
A specialised structured prompt, CorrectOutputTypeErrorPrompt, is rendered to JSON and serialises the conversation, datasets, system prompt, failing code, exception trace, and expected output_type. It is used whenever the executor returns the wrong type: the LLM is then asked to produce a corrected snippet.
Source: pandasai/core/prompts/correct_output_type_error_prompt.py:1-28
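The self-correction step can be pictured as a bounded retry loop. The sketch below is an assumption-laden illustration rather than the agent's actual control flow: `run`, `chat_with_correction`, `fake_llm`, and the dictionary-shaped prompt are hypothetical, and the stand-in model simply returns a fixed answer once it sees an error trace.

```python
def run(code):
    """Execute a snippet and return the value bound to `result`."""
    namespace = {}
    exec(code, namespace)
    return namespace["result"]


def chat_with_correction(llm, question, expected_type, max_retries=2):
    prompt = {"question": question}
    for _ in range(max_retries + 1):
        code = llm(prompt)
        value = run(code)
        if isinstance(value, expected_type):
            return value
        # Re-prompt with the failing code and a description of the mismatch.
        prompt = {
            "question": question,
            "code": code,
            "error_trace": (
                f"expected {expected_type.__name__}, got {type(value).__name__}"
            ),
            "output_type": expected_type.__name__,
        }
    raise TypeError(
        f"no {expected_type.__name__} result after {max_retries} retries"
    )


def fake_llm(prompt):
    # Hypothetical model: wrong type first, corrected once it sees the error.
    return "result = 42" if "error_trace" in prompt else "result = '42'"


print(chat_with_correction(fake_llm, "How many rows?", int))
```

Bounding the retries matters: a model that keeps returning the wrong type would otherwise loop indefinitely.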
The diagram below summarises the lifecycle from user input to executed code:
flowchart LR
A[User query] --> B[Agent.chat]
B --> C[Build system prompt<br/>GenerateSystemMessagePrompt]
C --> D[Render instruction prompt<br/>BasePrompt.to_string]
B --> M[Query semantic layer<br/>VectorStore.get_relevant_qa_documents]
M --> D
D --> E[LLM.call]
E --> F[extract_code / polish_code]
F --> G[CodeExecutor.execute]
G -->|type error| H[CorrectOutputTypeErrorPrompt]
H --> E
G --> I[Result]
A known bug in this pipeline (#1853) is that agent.description is extracted on the Python side but never reaches the LLM because the corresponding Jinja template omits the System Prompt placeholder. A second bug (#1856) causes the SQL variant (generate_python_code_with_sql.tmpl) to skip the conversation-history block, so multi-turn SQL agents lose context.
3. LLM Backends and the Semantic Layer
PandasAI ships several concrete LLM implementations. The OpenAI family (extensions/llms/openai/pandasai_openai/) shares BaseOpenAI, which sets defaults for temperature, max_tokens, top_p, frequency_penalty, presence_penalty, and supports an injectable http_client and proxy. OpenAI validates the API key and the model name against a whitelist that includes gpt-4.1-mini, gpt-4.1-mini-2025-04-14, and gpt-3.5-turbo-instruct.
Source: extensions/llms/openai/pandasai_openai/base.py:18-43, extensions/llms/openai/pandasai_openai/openai.py:1-40
The semantic layer is built on top of an abstract VectorStore (pandasai/vectorstores/vectorstore.py) that defines a contract with two collections – documents and question/answer pairs – and a uniform set of methods:
| Method | Purpose |
|---|---|
| add_docs / update_docs | Insert or update free-form documents |
| add_question_answer / update_question_answer | Insert or update few-shot Q/A examples |
| get_relevant_docs(question, k) | Retrieve similar documents for a query |
| get_relevant_question_answers(question, k) | Retrieve similar prior Q/A pairs |
| delete_docs / delete_question_and_answers | Remove entries by ID |
Source: pandasai/vectorstores/vectorstore.py:1-90
Concrete backends implement this contract. ChromaDBVectorStore queries two collections and post-filters by a similarity threshold; MilvusVectorStore creates explicit schemas with VARCHAR IDs and FLOAT_VECTOR embeddings indexed by COSINE distance; LanceDB and Pinecone follow the same dual-collection pattern.
Source: extensions/ee/vectorstores/chromadb/pandasai_chromadb/chroma.py:1-40, extensions/ee/vectorstores/milvus/pandasai_milvus/milvus.py:1-40
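The dual-collection pattern with a similarity threshold can be sketched in memory. This is a toy under stated assumptions: `InMemoryStore`, `embed`, and `cosine` are hypothetical, the "embedding" is a bag-of-words count rather than a learned vector, and the method signatures are simplified relative to the contract in the table above.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; real backends use learned vectors."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical store with separate document and Q/A collections."""

    def __init__(self, similarity_threshold=0.3):
        self.similarity_threshold = similarity_threshold
        self._docs = {}
        self._qa = {}

    def add_docs(self, docs):
        ids = []
        for doc in docs:
            doc_id = f"doc-{len(self._docs)}"
            self._docs[doc_id] = doc
            ids.append(doc_id)
        return ids

    def add_question_answer(self, questions, codes):
        ids = []
        for question, code in zip(questions, codes):
            qa_id = f"qa-{len(self._qa)}"
            self._qa[qa_id] = (question, code)
            ids.append(qa_id)
        return ids

    def _search(self, question, items, k):
        query = embed(question)
        scored = [(cosine(query, embed(text)), key) for key, text in items]
        # Post-filter: drop anything below the similarity threshold.
        scored = [(s, key) for s, key in scored if s >= self.similarity_threshold]
        scored.sort(reverse=True)
        return [key for _, key in scored[:k]]

    def get_relevant_docs(self, question, k=1):
        keys = self._search(question, self._docs.items(), k)
        return [self._docs[key] for key in keys]

    def get_relevant_question_answers(self, question, k=1):
        items = [(key, q) for key, (q, _) in self._qa.items()]
        return [self._qa[key] for key in self._search(question, items, k)]


store = InMemoryStore()
store.add_docs(["salary is stored in USD", "dates use ISO format"])
print(store.get_relevant_docs("what currency is salary in"))
```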
Community issue #1874 reports that the limit attribute on SemanticLayerSchema appears to have no effect on how many rows are included in the prompt; this is consistent with the observation that schema-level controls are not always wired through the rendering pipeline.
4. Known Failure Modes
Several recurring failure modes surface from the community and are reflected in the code:
- Unsafe code execution (#1893, #1895) – the default executor runs LLM-generated code with full __builtins__, exposing the host to RCE through indirect prompt injection. Mitigations must be applied at the executor level, not via prompts.
- Dead system-prompt placeholder (#1853) – agent.description is dropped before the LLM call.
- Missing conversation context in SQL prompt (#1856) – the SQL template does not render prior turns.
- Stale state on chat() (#1855) – last_code_generated is not cleared, contaminating the next prompt.
- Python version constraints (#1850, #1872) – the package pins <3.12 because of scipy==1.10.1.
- Local-model ergonomics (#187, #799, #1181, #1888) – repeated requests for first-class Ollama, LM Studio, and Open WebUI support; these depend on the LocalLLM shim and a working BaseOpenAI-style client.
See Also
- SmartDataframe and SmartDatalake
- Vector store extensions (ChromaDB, Milvus, LanceDB, Pinecone)
- LLM backends (OpenAI, Azure OpenAI, LiteLLM, Bedrock)
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 19 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1872
2. Runtime risk: Runtime risk requires verification
- Severity: high
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1896
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1868
4. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1853
5. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1856
6. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/sinaptik-ai/pandas-ai
7. Runtime risk: Runtime risk requires verification
- Severity: medium
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1888
8. Runtime risk: Runtime risk requires verification
- Severity: medium
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: packet_text.keyword_scan | https://github.com/sinaptik-ai/pandas-ai
9. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1874
10. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/sinaptik-ai/pandas-ai/issues/1855
11. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/sinaptik-ai/pandas-ai
12. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Downstream validation flagged this item as no_demo.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/sinaptik-ai/pandas-ai
Source: Doramagic discovery, validation, and Project Pack records
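The recommended check repeated above (reproduce the install and quickstart path in an isolated environment) can be sketched as a small pre-flight script. This is a minimal sketch, not part of pandas-ai: the function names and the environment directory are hypothetical, and the version bounds are an assumption taken from the installation notes for v3.0.0 (Python 3.8 through 3.11). It uses only the standard-library `sys` and `venv` modules.

```python
import sys
import venv

# Assumed bounds, taken from the installation notes for v3.0.0 (3.8 through 3.11).
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version=None):
    """Return True if (major, minor) lies inside the assumed supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


def create_isolated_env(env_dir=".pandasai-check"):
    """Create a throwaway virtual environment (hypothetical directory name)."""
    if not is_supported_python():
        raise RuntimeError("Interpreter is outside the assumed 3.8 to 3.11 range")
    # with_pip=True bootstraps pip so the package can be installed inside the env.
    venv.create(env_dir, with_pip=True)
    return env_dir


if __name__ == "__main__":
    # Only report the version check here; creating the env is left to the caller.
    print("supported interpreter:", is_supported_python())
```

After calling `create_isolated_env()`, activate the environment and follow the official install and quickstart steps inside it, so that any failure from the linked issues is reproduced without touching the system interpreter.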
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using pandas-ai with real data or production workflows.
- Free trial is mentioned as no cc required but it expects a cc - github / github_issue
- Default code executor runs LLM-generated code with full builtins (no san - github / github_issue
- Agent ignores Conversation History: Missing placeholders in generate_pyt - github / github_issue
- agent.chat() Fails to Reset Code Context (State Residue) - github / github_issue
- agent.description (System Prompt) is dead-code - github / github_issue
- SemanticLayerSchema limit seems to do nothing - github / github_issue
- Code Injection in CodeExecutor.execute Allows Arbitrary Code Execution v - github / github_issue
- Upgrade supported Python version >=3.12 for pandasai - github / github_issue
- pillow of version ^10.1.0 has OOB CVE - github / github_issue
- Feature request: optional WFGY 16-problem RAG debugger for pandas-ai - github / github_issue
- Support for GPT-5 or above Reasoning Model - github / github_issue
- Issue on Ollama Models - github / github_issue
Source: Project Pack community evidence and pitfall evidence