# https://github.com/Cinnamon/kotaemon Project Manual

Generated at: 2026-06-22 15:48:56 UTC

## Table of Contents

- [Overview and System Architecture](#page-overview)
- [RAG Pipeline, LLMs, and Reasoning](#page-rag-pipeline)
- [Document Loading, Parsing, and Indexing](#page-document-indexing)
- [Deployment, Configuration, and Extensibility](#page-deployment-config)

<a id='page-overview'></a>

## Overview and System Architecture

### Related Pages

Related topics: [RAG Pipeline, LLMs, and Reasoning](#page-rag-pipeline), [Document Loading, Parsing, and Indexing](#page-document-indexing), [Deployment, Configuration, and Extensibility](#page-deployment-config)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md)
- [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md)
- [libs/kotaemon/kotaemon/agents/tools/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py)
- [libs/kotaemon/kotaemon/agents/tools/mcp.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/mcp.py)
- [libs/kotaemon/kotaemon/agents/tools/wikipedia.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/wikipedia.py)
- [libs/kotaemon/kotaemon/agents/rewoo/agent.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py)
- [libs/kotaemon/kotaemon/agents/rewoo/prompt.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/prompt.py)
- [libs/kotaemon/kotaemon/loaders/ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py)
- [libs/kotaemon/kotaemon/loaders/docling_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py)
- [libs/kotaemon/kotaemon/loaders/docx_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docx_loader.py)
- [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py)
- [libs/ktem/ktem/utils/render.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)
</details>

# Overview and System Architecture

## Purpose and Scope

Kotaemon is an open-source RAG (Retrieval-Augmented Generation) UI and framework designed for two audiences: end users who want a clean interface to chat with their documents, and developers who want to build custom RAG pipelines. The repository is organized as a monorepo with two principal Python packages: a reusable core library (`kotaemon`) and an application/UI layer (`ktem`) that consumes it.

The project positions itself as a framework, not just an app. The top-level README addresses three groups: end users who "use an app like the one in the demo", developers who "have `import kotaemon` somewhere in your project", and contributors who "make `kotaemon` better" by opening pull requests [README.md:1-60](https://github.com/Cinnamon/kotaemon/blob/main/README.md). The library can also be installed on its own via `pip install kotaemon@git+...` from the `libs/kotaemon` sub-project [libs/kotaemon/README.md:1-30](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md).

## Two-Tier Architecture

Kotaemon follows a clear two-tier separation. The lower tier is the `kotaemon` Python package, which provides reusable building blocks — base abstractions, document loaders, embedding/LLM clients, retrievers, and agents. The upper tier is the `ktem` package, which implements the Gradio-based web application, UI rendering helpers, and user-facing flows.

```mermaid
graph TD
    A[End User] -->|Gradio UI| B[ktem Application Layer]
    B -->|Imports & orchestrates| C[kotaemon Core Library]
    C --> D[Document Loaders]
    C --> E[Base Components & LLMs]
    C --> F[Agent & Tool System]
    D --> D1[OCR / Docling / DOCX / Mathpix / PaddleOCR]
    F --> F1[ReWOO Agent]
    F --> F2[Tools: MCP, Wikipedia, Google, LLM]
```

### Core Library (`kotaemon`)

The core library contains the abstractions that power any RAG pipeline built on top of kotaemon. The agent subsystem is structured around a `BaseTool` interface and a set of concrete tool implementations exposed via the `kotaemon.agents.tools` package [libs/kotaemon/kotaemon/agents/tools/__init__.py:1-30](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py). The available tools include `GoogleSearchTool`, `LLMTool`, `WikipediaTool`, and `MCPTool`, along with helper functions (`build_args_model`, `create_tools_from_config`, `discover_tools_info`, `format_tool_list`, `parse_mcp_config`) for tool discovery and configuration.

The MCP bridge is particularly noteworthy: `MCPTool` wraps the MCP SDK's tool schema and converts MCP server tools into kotaemon-compatible `BaseTool` instances, including a JSON-Schema-to-Pydantic conversion via `build_args_model` [libs/kotaemon/kotaemon/agents/tools/mcp.py:1-80](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/mcp.py). This lets MCP-compatible servers be plugged into agent pipelines without hand-written wrappers.
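The schema-to-model step can be illustrated with a minimal, standard-library sketch. It is not kotaemon's `build_args_model`: it swaps Pydantic for `dataclasses.make_dataclass` so that it runs without extra dependencies, handles only flat schemas with scalar defaults, and the names `build_args_dataclass`, `JSON_TYPE_MAP`, and the example schema are assumptions made for illustration.

```python
from dataclasses import field, make_dataclass
from typing import Any, Optional

# Assumed mapping from JSON Schema primitive types to Python types.
JSON_TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
}


def build_args_dataclass(name: str, schema: dict) -> type:
    """Build a dataclass whose fields mirror a flat JSON Schema object."""
    required = set(schema.get("required", []))
    required_fields, optional_fields = [], []
    for prop, spec in schema.get("properties", {}).items():
        py_type = JSON_TYPE_MAP.get(spec.get("type"), Any)
        if prop in required:
            required_fields.append((prop, py_type))
        else:
            # Optional properties fall back to the schema default, or None.
            default = spec.get("default")
            optional_fields.append(
                (prop, Optional[py_type], field(default=default))
            )
    # Fields without defaults must come first in a dataclass.
    return make_dataclass(name, required_fields + optional_fields)


# Hypothetical tool schema, shaped like a tool's JSON Schema input description.
schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer", "default": 5},
    },
    "required": ["query"],
}
SearchArgs = build_args_dataclass("SearchArgs", schema)
args = SearchArgs(query="kotaemon")
```

Calling `SearchArgs()` without `query` raises a `TypeError`, which is the kind of early argument check a generated model gives an agent before a remote tool is invoked.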

The `WikipediaTool` provides a thin wrapper around the `wikipedia` Python package, returning either a `Document` (with a `page` URL in metadata) or a fallback string listing similar entries when the page is ambiguous or missing [libs/kotaemon/kotaemon/agents/tools/wikipedia.py:1-90](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/wikipedia.py). A Pydantic `WikipediaArgs` model defines the `query` argument schema, following the same pattern as other tools in the package.

### Application Layer (`ktem`)

The `ktem` package wraps the core library into a usable product. Rendering helpers in `ktem.utils.render` transform retrieval results, markdown tables, and PDF page references into HTML suitable for the Gradio UI. For example, `Render.collapsible` builds a `<details>` evidence block, and PDF citations inject a `data-page` / `data-search` anchor that links to the in-browser PDF viewer [libs/ktem/ktem/utils/render.py:1-100](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py). The `get_header` helper composes citation headers from `page_label` and `file_name` metadata, which is the same metadata produced by the loaders in the core library.

## Agent System: ReWOO Pattern

The agent implementation in `kotaemon.agents.rewoo` follows a three-stage ReWOO (Reasoning WithOut Observation) pattern: a **Planner** decomposes a task into a sequence of plans and tool calls, a **Worker** executes those calls and gathers evidence, and a **Solver** synthesizes the final answer from the accumulated evidence [libs/kotaemon/kotaemon/agents/rewoo/agent.py:1-60](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py).

The prompts enforce a strict `#Plan1` / `#E1` / `#Plan2` / `#E2` notation, where each `#En` is a named evidence slot that later plans can reference as input, for example `#E2: Calculator[#E1^3]` [libs/kotaemon/kotaemon/agents/rewoo/prompt.py:1-60](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/prompt.py). The solver prompt then summarizes each step in natural language and produces a final conclusion, with optional citation generation via a `CitationPipeline` [libs/kotaemon/kotaemon/agents/rewoo/agent.py:30-60](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py).
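To make the notation concrete, the following sketch parses planner output written in the `#Plan` / `#E` style into structured steps. It is an illustration only: the regular expressions, the `parse_plan` function, the `Lookup` tool name, and the sample plan are assumptions, not code taken from `kotaemon.agents.rewoo`.

```python
import re
from typing import Dict, List

# Assumed line formats: "#Plan1: <text>" and "#E1: Tool[input]".
PLAN_RE = re.compile(r"#Plan(\d+):\s*(.+)")
EVIDENCE_RE = re.compile(r"#E(\d+):\s*(\w+)\[(.*)\]\s*$")


def parse_plan(text: str) -> List[Dict[str, str]]:
    """Pair each evidence slot with its plan text, tool name, and tool input."""
    plans: Dict[int, str] = {}
    steps: List[Dict[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        plan_match = PLAN_RE.match(line)
        if plan_match:
            plans[int(plan_match.group(1))] = plan_match.group(2)
            continue
        evidence_match = EVIDENCE_RE.match(line)
        if evidence_match:
            index = int(evidence_match.group(1))
            steps.append(
                {
                    "evidence": f"#E{index}",
                    "plan": plans.get(index, ""),
                    "tool": evidence_match.group(2),
                    "input": evidence_match.group(3),
                }
            )
    return steps


# Hypothetical planner output.
planner_output = """
#Plan1: Look up the side length of the cube.
#E1: Lookup[side length of the cube]
#Plan2: Raise the side length to the third power.
#E2: Calculator[#E1^3]
"""
steps = parse_plan(planner_output)
```

Here the second step keeps the literal reference `#E1` in its input; resolving that reference to the evidence gathered for the first step is the worker's job.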

## Document Loading Subsystem

The `kotaemon.loaders` package provides a pluggable loader architecture for ingesting heterogeneous documents. Concrete loaders target specific formats and extraction strategies:

| Loader | Target | Notes |
| --- | --- | --- |
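| `OCRReader` / `ImageReader` | Scanned PDFs, images | Calls an OCR FullOCR endpoint, focused on table extraction [libs/kotaemon/kotaemon/loaders/ocr_loader.py:1-50](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py) |
| `DoclingLoader` | Complex PDFs | Uses Docling's `data.grid` table representation and converts bounding boxes from bottom-left to top-left origin [libs/kotaemon/kotaemon/loaders/docling_loader.py:1-60](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py) |
| `DocxReader` | Microsoft Word | Splits tables and text into separate `Document` objects with `type` metadata [libs/kotaemon/kotaemon/loaders/docx_loader.py:1-40](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docx_loader.py) |
| `MathpixLoader` | PDFs (math-heavy) | Streams pages with explicit `table_origin` and `page_label` metadata [libs/kotaemon/kotaemon/loaders/mathpix_loader.py:1-50](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py) |
| `PaddleOCR` (VL + adapter) | Scanned docs, charts, seals | Added in v0.12.0; the adapter normalizes PPStructureV3 / PaddleOCRVL output into kotaemon `Document`s with `text`, `table`, `image`, and `formula` block categories [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py:1-50](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py), [libs/kotaemon/kotaemon/loaders/paddleocr_loader/paddleocr_vl_loader.py:1-40](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/paddleocr_vl_loader.py) |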

Loaders yield `Document` objects that share a core set of metadata keys (typically `page_label`, `file_name`, and `type`), which the rendering layer in `ktem` uses for citation headers and PDF page previews.

## Community-Driven Extensibility

The architecture is intentionally open to extension, which is reflected in the most upvoted community requests. Issue #392 requests AWS Bedrock as an LLM provider (Anthropic, Meta, Mistral, Cohere via the Converse API), #160 requests Gemini API support, and #243 requests integration of alternative GraphRAG backends such as `nano-graphrag`. Issue #647 requests exposing FastAPI endpoints alongside the Gradio UI for programmatic access. Each of these maps cleanly onto existing extension points: new LLM clients plug into the base component layer, new tools conform to `BaseTool`, and the MCP bridge already demonstrates how external server schemas can be auto-wrapped.

## See Also

- Agents and Tools — detailed coverage of `BaseTool`, `MCPTool`, and the ReWOO planner/worker/solver pipeline.
- Document Loaders — full reference for OCR, Docling, Mathpix, and PaddleOCR loaders.
- Rendering and UI — how `ktem.utils.render` produces HTML evidence, tables, and PDF previews.
- LLM Providers — supported API and local model integrations.

---

<a id='page-rag-pipeline'></a>

## RAG Pipeline, LLMs, and Reasoning

### Related Pages

Related topics: [Overview and System Architecture](#page-overview), [Document Loading, Parsing, and Indexing](#page-document-indexing), [Deployment, Configuration, and Extensibility](#page-deployment-config)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md)
- [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md)
- [libs/kotaemon/kotaemon/agents/tools/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py)
- [libs/kotaemon/kotaemon/agents/tools/wikipedia.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/wikipedia.py)
- [libs/kotaemon/kotaemon/agents/tools/google.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/google.py)
- [libs/kotaemon/kotaemon/agents/tools/mcp.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/mcp.py)
- [libs/kotaemon/kotaemon/agents/rewoo/agent.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py)
- [libs/kotaemon/kotaemon/agents/rewoo/prompt.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/prompt.py)
- [libs/kotaemon/kotaemon/loaders/ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py)
- [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py)
- [libs/kotaemon/kotaemon/loaders/docling_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py)
- [libs/kotaemon/kotaemon/loaders/mathpix_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py)
- [libs/ktem/ktem/utils/render.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)
</details>

# RAG Pipeline, LLMs, and Reasoning

## Overview

kotaemon is an open-source RAG (Retrieval-Augmented Generation) UI and framework that supports the full lifecycle of document question answering: ingestion, indexing, retrieval, reasoning, and answer rendering. According to the project README, it ships a "Hybrid RAG pipeline" with full-text and vector retrieval plus re-ranking, supports multiple LLM providers (OpenAI, Azure, Ollama, Groq, local llama.cpp), and exposes advanced reasoning via question decomposition, `ReAct`, and `ReWOO` agents. Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md).

The framework is split into two layers:

| Layer | Package | Role |
| --- | --- | --- |
| Core library | `libs/kotaemon/kotaemon/` | Reusable components: loaders, LLMs, agents, tools, embeddings, retrievers |
| Application | `libs/ktem/` | Gradio-based UI, conversation flow, rendering, settings |

The core library can be installed from source in editable mode (`pip install -e ".[dev]"`, run inside `libs/kotaemon`) and is the integration point for developers building custom RAG pipelines. Source: [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md).

## Multi-Modal Document Loading

A defining feature of kotaemon is its pluggable document loader layer, which converts heterogeneous inputs (PDFs, images, scanned pages, math-heavy documents) into normalized `Document` objects that downstream retrievers can consume.

- **OCRReader / ImageReader** reads PDFs through an external OCR endpoint, yielding `Document` objects with `type: "table"` or `type: "text"` and `page_label` metadata. Source: [libs/kotaemon/kotaemon/loaders/ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py).
- **DoclingLoader** parses complex PDFs and returns three streams: text pages, markdown tables (built from `data.grid`), and figures with bounding-box coordinates normalized to the page. Source: [libs/kotaemon/kotaemon/loaders/docling_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py).
- **MathpixLoader** preserves raw `table_origin` and per-page metadata for math/equation-heavy PDFs. Source: [libs/kotaemon/kotaemon/loaders/mathpix_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py).
- **PaddleOCRLoader adapter** (added in v0.12.0, see release notes) wraps PaddleOCR's `PPStructureV3` and `PaddleOCRVL` outputs, classifying blocks by `block_label` (text, table, image, formula) and optionally cropping figures to base64 data URLs. Source: [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py).

```mermaid
flowchart LR
    PDF[PDF / Image] --> Loader{Loader Selection}
    Loader -->|Scanned| OCR[OCRReader]
    Loader -->|Structured| Docling[DoclingLoader]
    Loader -->|Math| Mathpix[MathpixLoader]
    Loader -->|Layout VL| PaddleOCR[PaddleOCRLoader]
    OCR --> Docs[Document objects]
    Docling --> Docs
    Mathpix --> Docs
    PaddleOCR --> Docs
    Docs --> Index[Vector + Full-text Index]
    Index --> QA[RAG QA Pipeline]
```
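The block classification performed by the PaddleOCR adapter can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are plain dictionaries carrying `block_label` and `block_content`, unknown labels fall back to `"text"`, and the function name `blocks_to_documents` and the sample blocks are hypothetical rather than taken from the adapter.

```python
from typing import Dict, List

# Categories named in the adapter description; anything else is treated as text.
KNOWN_CATEGORIES = {"text", "table", "image", "formula"}


def blocks_to_documents(
    blocks: List[Dict[str, str]], page_label: int, file_name: str
) -> List[dict]:
    """Convert raw OCR blocks into Document-like dictionaries."""
    documents = []
    for block in blocks:
        label = block.get("block_label", "text")
        category = label if label in KNOWN_CATEGORIES else "text"
        content = block.get("block_content", "")
        if not content and category != "image":
            # Skip empty blocks; images may legitimately carry no text.
            continue
        documents.append(
            {
                "text": content,
                "metadata": {
                    "type": category,
                    "page_label": page_label,
                    "file_name": file_name,
                },
            }
        )
    return documents


# Hypothetical OCR output for one page.
blocks = [
    {"block_label": "table", "block_content": "| a | b |"},
    {"block_label": "seal", "block_content": "APPROVED"},
    {"block_label": "text", "block_content": ""},
]
documents = blocks_to_documents(blocks, page_label=1, file_name="scan.pdf")
```

Keeping the category in `metadata["type"]` means downstream indexing can treat all four block kinds through one code path while the UI can still render tables and figures differently.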

## Reasoning Agents

For complex or multi-hop questions, kotaemon offers agent-based reasoning on top of the standard RAG pipeline. The `agents` package exposes pluggable planners, solvers, and tool registries.

### ReWOO Agent

The ReWOO (Reasoning WithOut Observation) agent splits a task into a plan-then-execute pipeline. The README lists it alongside `ReAct` among the supported agentic strategies. Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md).

Internally, `ReWOOAgent` runs in three stages:

1. **Plan**: A planner LLM emits a textual plan, with steps referencing evidence variables `#E1`, `#E2`, ... Source: [libs/kotaemon/kotaemon/agents/rewoo/prompt.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/prompt.py).
2. **Work**: `_get_worker_evidences` executes the tool calls referenced by each `#E` symbol, accumulating plugin cost and token usage. Source: [libs/kotaemon/kotaemon/agents/rewoo/agent.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py).
3. **Solve**: A solver LLM condenses the evidence into a final answer. A `CitationPipeline` can optionally attach citations to the final output. Source: [libs/kotaemon/kotaemon/agents/rewoo/agent.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py).

Both zero-shot and few-shot prompt templates are provided for planner and solver, allowing developers to bias the agent toward particular reasoning styles. Source: [libs/kotaemon/kotaemon/agents/rewoo/prompt.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/prompt.py).
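The plan, work, and solve stages can be sketched end to end with stub tools in place of real LLM calls. Everything here is an assumption for illustration: `run_workers`, `solve`, the tuple-based step format, and the `Lookup` and `Calculator` stubs are hypothetical, and the cost and token accounting done by `_get_worker_evidences` is omitted.

```python
import re
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (evidence name, tool name, tool input)


def run_workers(
    steps: List[Step], tools: Dict[str, Callable[[str], str]]
) -> Dict[str, str]:
    """Execute each planned tool call, substituting earlier evidence."""
    evidences: Dict[str, str] = {}
    for name, tool_name, tool_input in steps:
        # Replace references such as "#E1" with evidence gathered so far.
        resolved = re.sub(
            r"#E\d+",
            lambda match: evidences.get(match.group(0), match.group(0)),
            tool_input,
        )
        evidences[name] = tools[tool_name](resolved)
    return evidences


def solve(task: str, evidences: Dict[str, str]) -> str:
    """Stand-in for the solver LLM: report the last piece of evidence."""
    last = list(evidences.values())[-1]
    return f"{task} -> {last}"


def power(expression: str) -> str:
    """Stub calculator that only understands 'base^exponent'."""
    base, exponent = expression.split("^")
    return str(int(base) ** int(exponent))


tools = {"Lookup": lambda query: "4", "Calculator": power}
steps = [
    ("#E1", "Lookup", "side length of the cube"),
    ("#E2", "Calculator", "#E1^3"),
]
evidences = run_workers(steps, tools)
answer = solve("Volume of the cube", evidences)
```

Because the whole plan is produced before any tool runs, the planner is called once and the workers only perform string substitution between steps, which is the main cost advantage of this pattern over interleaved reasoning and observation.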

## Agent Tools

Tools are the building blocks that let agents act on the world outside the indexed documents. The `agents.tools` module exports a uniform `BaseTool` interface that all concrete tools implement. Source: [libs/kotaemon/kotaemon/agents/tools/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py).

| Tool | Class | Purpose | Source |
| --- | --- | --- | --- |
| Wikipedia | `WikipediaTool` | Fetch holistic knowledge about a topic; uses the `wikipedia` package, returns a `Document` with `metadata.page` URL on hit | [tools/wikipedia.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/wikipedia.py) |
| Google Search | `GoogleSearchTool` | Short, succinct web answers via `SerpAPIWrapper` | [tools/google.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/google.py) |
| LLM-as-tool | `LLMTool` | Wrap any LLM component as a callable tool for other agents | [tools/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py) |
| MCP | `MCPTool` | Bridge Model Context Protocol server tools into kotaemon; auto-generates a Pydantic args model from the MCP JSON Schema | [tools/mcp.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/mcp.py) |

The MCP integration is particularly notable for extensibility: `build_args_model` converts a JSON Schema into a typed Pydantic model, while `discover_tools_info`, `create_tools_from_config`, and `parse_mcp_config` let developers wire an MCP server into a ReAct or ReWOO agent declaratively. Source: [libs/kotaemon/kotaemon/agents/tools/mcp.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/mcp.py).
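The idea of a uniform tool interface can be shown with a small standalone sketch. It does not reproduce kotaemon's `BaseTool`: the `Tool` base class, the `EchoTool` example, and the `format_tools` helper are hypothetical names chosen for illustration, and a real tool would also carry an argument schema.

```python
from abc import ABC, abstractmethod
from typing import List


class Tool(ABC):
    """Hypothetical minimal tool contract: a name, a description, a run method."""

    name: str = "tool"
    description: str = ""

    @abstractmethod
    def run(self, query: str) -> str:
        """Execute the tool on a single string input."""


class EchoTool(Tool):
    name = "echo"
    description = "Return the query unchanged."

    def run(self, query: str) -> str:
        return query


def format_tools(tools: List[Tool]) -> str:
    """Render one 'name: description' line per tool for a planner prompt."""
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)


registry = {tool.name: tool for tool in [EchoTool()]}
prompt_section = format_tools(list(registry.values()))
result = registry["echo"].run("hello")
```

An agent that only depends on `name`, `description`, and `run` can accept a search tool, an LLM wrapper, or a remote server tool interchangeably, which is the property the shared interface is meant to provide.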

## Rendering, Citations, and UI Integration

The application layer (`libs/ktem/`) consumes the core library to render answers, citations, and evidence previews. `Render.collapsible` produces the collapsible evidence blocks shown beside each answer, and a `pdf-link` helper embeds an in-browser PDF previewer with `data-page` and `data-search` attributes that drive the highlight-on-page feature advertised in the README. Source: [libs/ktem/ktem/utils/render.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py).

This is what makes kotaemon's "Advanced citations with document preview" feature work end-to-end: loaders attach `page_label` and table origin metadata, retrievers attach relevance scores, and the renderer turns the scored evidence into a navigable, highlighted document view. Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md).

## Community Considerations

Several community requests map directly to the LLM and reasoning layer covered on this page:

- **Provider coverage** — Requests for AWS Bedrock (#392) and Gemini API (#160) reflect the ongoing effort to widen the LLM adapter set beyond the defaults documented in the README. Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md).
- **GraphRAG backends** — Issue #243 (integrate `nano-graphrag`) relates to the "Support multiple strategies for document indexing & retrieval" claim in the README and sits adjacent to the reasoning layer, since GraphRAG outputs are typically consumed by agentic pipelines. Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md).
- **Programmatic API access** — Request #647 for FastAPI endpoints inside the Docker image would expose the same RAG and agent pipelines currently driven by the Gradio UI. Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md).
- **Apple Silicon / local install** — Issue #138 documents a workaround path for M1 Macs that bypasses the published Docker image, which is relevant for users running local LLMs through `llama-cpp-python` or `Ollama`.

## See Also

- Document loaders and multi-modal ingestion
- Agent framework (ReAct / ReWOO)
- Tool integrations (Wikipedia, Google, MCP)
- Gradio UI and rendering pipeline

---

<a id='page-document-indexing'></a>

## Document Loading, Parsing, and Indexing

### Related Pages

Related topics: [Overview and System Architecture](#page-overview), [RAG Pipeline, LLMs, and Reasoning](#page-rag-pipeline), [Deployment, Configuration, and Extensibility](#page-deployment-config)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [libs/kotaemon/kotaemon/loaders/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/__init__.py)
- [libs/kotaemon/kotaemon/loaders/ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py)
- [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py)
- [libs/kotaemon/kotaemon/loaders/docling_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py)
- [libs/kotaemon/kotaemon/loaders/mathpix_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py)
- [libs/kotaemon/kotaemon/loaders/docx_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docx_loader.py)
- [libs/ktem/ktem/utils/render.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)
- [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md)
- [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md)
</details>

# Document Loading, Parsing, and Indexing

## Overview

Kotaemon is an open-source RAG UI for chatting with documents. At the heart of any RAG workflow is the **document ingestion pipeline**, which converts raw files (PDF, DOCX, images) into structured `Document` objects that can be chunked, embedded, indexed, and later retrieved. The `kotaemon.loaders` package provides a pluggable set of readers, each targeting a different file format or extraction quality tier ([libs/kotaemon/kotaemon/loaders/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/__init__.py)).

The pipeline follows a consistent contract: every loader returns one or more `Document` objects with `text` and `metadata` fields, where `metadata` typically carries `page_label`, `file_name`, and `type` (e.g., `"text"` or `"table"`). Downstream indexers and the `ktem` UI consume this uniform representation ([libs/kotaemon/kotaemon/loaders/docling_loader.py:50-58](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py)).

```mermaid
flowchart LR
    A[Raw file<br/>PDF / DOCX / Image] --> B{Loader}
    B --> C1[OCRReader]
    B --> C2[DoclingLoader]
    B --> C3[MathpixLoader]
    B --> C4[DocxReader]
    B --> C5[PaddleOCR]
    C1 --> D[Document objects<br/>text + metadata]
    C2 --> D
    C3 --> D
    C4 --> D
    C5 --> D
    D --> E[Chunking & Embedding]
    E --> F[Vector / Graph Index]
    F --> G[Render utilities<br/>evidence + preview]
    G --> H[Gradio UI]
```

## Available Document Loaders

Kotaemon ships several loaders, each tuned for a specific content type. The list below summarizes the most prominent ones and the metadata they emit.

| Loader | Best For | Key Metadata Emitted | Source |
|---|---|---|---|
| `OCRReader` / `ImageReader` | PDFs with scanned tables and images | `table_origin`, `type`, `page_label` | [libs/kotaemon/kotaemon/loaders/ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py) |
| `DoclingLoader` | Structured PDFs with layout-aware text | `page_label`, `file_name`, `file_path` | [libs/kotaemon/kotaemon/loaders/docling_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py) |
| `MathpixLoader` | Scientific PDFs (equations, tables) | `table_origin`, `source`, `page_label` | [libs/kotaemon/kotaemon/loaders/mathpix_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py) |
| `DocxReader` | Microsoft Word documents | `type` (table/text), `page_label: 1` | [libs/kotaemon/kotaemon/loaders/docx_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docx_loader.py) |
| PaddleOCR adapter | OCR via PPStructureV3 / PaddleOCRVL | `block_label`, `block_content`, cropped image data URL | [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py) |

### OCR-Based Loading

The OCR loader wraps a FullOCR HTTP endpoint and is geared toward table-heavy PDFs. By default it targets `http://127.0.0.1:8000/v2/ai/infer/`, overridable via `OCR_READER_ENDPOINT`. Tables are stripped of markdown special characters and emitted as `Document` objects with `type: "table"` and `table_origin` carrying the raw text ([libs/kotaemon/kotaemon/loaders/ocr_loader.py:1-30](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py)).
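The endpoint fallback can be sketched as below. This assumes `OCR_READER_ENDPOINT` is read as an environment variable and that an explicitly passed endpoint takes precedence; the function `resolve_ocr_endpoint` and its parameters are hypothetical and only illustrate the lookup order.

```python
import os
from typing import Mapping, Optional

# Default endpoint quoted in the loader description.
DEFAULT_OCR_ENDPOINT = "http://127.0.0.1:8000/v2/ai/infer/"


def resolve_ocr_endpoint(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the OCR endpoint: explicit argument, then environment, then default."""
    environment = os.environ if env is None else env
    return explicit or environment.get("OCR_READER_ENDPOINT", DEFAULT_OCR_ENDPOINT)


# The env mapping is injectable so the lookup can be exercised without
# touching the real process environment.
from_default = resolve_ocr_endpoint(env={})
from_env = resolve_ocr_endpoint(
    env={"OCR_READER_ENDPOINT": "http://ocr.internal:9000/infer/"}
)
from_argument = resolve_ocr_endpoint(
    explicit="http://localhost:8080/infer/",
    env={"OCR_READER_ENDPOINT": "http://ocr.internal:9000/infer/"},
)
```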

### Layout-Aware Loading with Docling

`DoclingLoader` groups raw text by `page_no` from the docling provenance (`prov`) field, joins the text on each page, and emits one `Document` per page alongside separately parsed tables and figures. Bounding boxes returned by docling are converted from bottom-left to top-left coordinates so the UI can overlay highlights correctly ([libs/kotaemon/kotaemon/loaders/docling_loader.py:30-70](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py)).
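The coordinate conversion amounts to flipping the vertical axis against the page height. The sketch below assumes boxes are given as `(left, bottom, right, top)` tuples in a bottom-left origin system; that tuple layout and the function name `bottomleft_to_topleft` are assumptions for illustration, not the loader's actual data structures.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def bottomleft_to_topleft(bbox: BBox, page_height: float) -> BBox:
    """Convert (left, bottom, right, top) with a bottom-left origin
    into (left, top, right, bottom) with a top-left origin."""
    left, bottom, right, top = bbox
    # In a top-left system, y grows downward, so distances are measured
    # from the top edge of the page instead of the bottom edge.
    return (left, page_height - top, right, page_height - bottom)


# A box spanning y=20..70 from the bottom of a 100-unit-tall page
# spans y=30..80 when measured from the top.
converted = bottomleft_to_topleft((10.0, 20.0, 110.0, 70.0), page_height=100.0)
```

The horizontal coordinates are unchanged, and the box height (50 units in this example) is preserved by the conversion.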

### Scientific PDFs with Mathpix

`MathpixLoader` first sends the PDF to the Mathpix API and receives back markdown-style content. It then regex-splits the response into tables and prose sections per page, yielding `Document` objects with `type: "text"` or `type: "table"`. If the response is empty, a single fallback `Document` with `page_label: 1` is produced ([libs/kotaemon/kotaemon/loaders/mathpix_loader.py:1-60](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py)).
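The split into table and prose sections can be sketched with a single regular expression. The delimiter is an assumption: this example treats tables as `\begin{tabular}` ... `\end{tabular}` spans, which may not match the pattern the loader actually uses, and `split_tables_and_text` is a hypothetical helper name.

```python
import re
from typing import Dict, List

# Assumed table delimiter; the real loader's pattern may differ.
TABLE_RE = re.compile(r"\\begin\{tabular\}.*?\\end\{tabular\}", re.DOTALL)


def split_tables_and_text(page_text: str, page_label: int) -> List[Dict]:
    """Split one page of converted content into text and table chunks."""
    chunks: List[Dict] = []

    def add(kind: str, text: str) -> None:
        if text:
            chunks.append(
                {"text": text, "metadata": {"type": kind, "page_label": page_label}}
            )

    position = 0
    for match in TABLE_RE.finditer(page_text):
        # Prose between the previous table (or page start) and this table.
        add("text", page_text[position:match.start()].strip())
        add("table", match.group(0))
        position = match.end()
    add("text", page_text[position:].strip())
    return chunks


page_text = (
    "Intro paragraph.\n"
    r"\begin{tabular}{ll} a & b \\ 1 & 2 \end{tabular}"
    "\nClosing remark."
)
chunks = split_tables_and_text(page_text, page_label=2)
```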

### DOCX and PaddleOCR

`DocxReader` extracts tables as CSV strings (via `pandas.DataFrame.to_csv`) and joins the document's prose into a single `Document` with `page_label: 1`, since DOCX lacks reliable pagination ([libs/kotaemon/kotaemon/loaders/docx_loader.py:1-20](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docx_loader.py)). The PaddleOCR adapter, introduced in v0.12.0, normalizes raw OCR output (figures, tables, formulas) into `Document` instances and can crop images out of the source page when `crop_image` is available, returning base64 data URLs ([libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py:1-50](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py)).
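The DOCX strategy of emitting each table as CSV plus one text chunk can be sketched without third-party packages. This example swaps `pandas.DataFrame.to_csv` for the standard-library `csv` module, takes already-extracted rows and paragraphs as input instead of parsing a real `.docx` file, and uses hypothetical names (`table_to_csv`, `docx_to_documents`).

```python
import csv
import io
from typing import Dict, List


def table_to_csv(rows: List[List[str]]) -> str:
    """Serialize table rows to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def docx_to_documents(
    paragraphs: List[str], tables: List[List[List[str]]], file_name: str
) -> List[Dict]:
    """Emit one table chunk per table and a single text chunk for all prose."""
    documents = [
        {
            "text": table_to_csv(rows),
            "metadata": {"type": "table", "page_label": 1, "file_name": file_name},
        }
        for rows in tables
    ]
    documents.append(
        {
            "text": "\n".join(paragraphs),
            "metadata": {"type": "text", "page_label": 1, "file_name": file_name},
        }
    )
    return documents


documents = docx_to_documents(
    paragraphs=["Quarterly summary.", "Sales rose."],
    tables=[[["Item", "Qty"], ["Widget", "2"]]],
    file_name="report.docx",
)
```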

## Parsing, Indexing, and Rendering

### Uniform `Document` Schema

Every loader must produce objects compatible with the project's `Document` base class. The common shape is:

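```python
from kotaemon.base import Document

# Illustrative values; real loaders fill these in from the source file.
doc = Document(
    text="<chunk text>",
    metadata={
        "page_label": 3,  # page number as an int
        "file_name": "report.pdf",
        "type": "text",  # or "table"
        # optional: table_origin, source, bbox, ...
    },
)
```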

This uniformity allows downstream indexers to treat text and table chunks the same way during embedding and retrieval, while preserving the metadata needed for evidence citation in the UI ([libs/kotaemon/kotaemon/loaders/ocr_loader.py:10-25](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py)).

### Custom Indexing Pipelines

The `ktem` application ships with pluggable indexing pipelines. The README points developers to the GraphRAG example at `libs/ktem/ktem/index/file/graph` for implementing custom flows. Community interest in alternative GraphRAG backends (e.g., `nano-graphrag`) is tracked in issue #243, indicating that the indexing layer is intentionally modular ([README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md)).

### Evidence Rendering in the UI

Once retrieved, `RetrievedDocument` objects are rendered into HTML by the `Render` utility class in `libs/ktem/ktem/utils/render.py`. Notable helpers include:

- `collapsible(header, content)` — wraps evidence in a `<details>` block ([libs/ktem/ktem/utils/render.py:30-40](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)).
- `get_header(doc)` — builds an evidence header such as `[Page 3] report.pdf` from metadata ([libs/ktem/ktem/utils/render.py:18-25](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)).
- `highlight(text, elem_id)` — wraps matched text in `<mark id="mark-...">` so the front-end can scroll to the highlighted span ([libs/ktem/ktem/utils/render.py:50-60](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)).
- A PDF preview link generator that uses `fast_langdetect` to decide whether to highlight the full snippet or just the first line, and emits a `<a class="pdf-link">` tag pointing at the served file with `data-page` and `data-search` attributes ([libs/ktem/ktem/utils/render.py:60-110](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)).

This rendering layer is what closes the loop between ingestion and UX: the metadata produced by loaders (`page_label`, `file_name`, `table_origin`) is consumed directly by `Render` to build evidence panels and click-to-preview anchors.
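The helpers listed above can be approximated with a few string-building functions. This is a simplified sketch and not the code in `render.py`: the function signatures, the use of `href` for the file URL, the `[Preview]` link text, and the escaping choices are assumptions, and the language detection step is left out.

```python
from html import escape
from typing import Optional


def get_header(metadata: dict) -> str:
    """Build a header such as '[Page 3] report.pdf' from loader metadata."""
    file_name = metadata.get("file_name", "-")
    page = metadata.get("page_label")
    return f"[Page {page}] {file_name}" if page is not None else str(file_name)


def collapsible(header: str, content: str, is_open: bool = False) -> str:
    """Wrap evidence in a <details> block with the header as its summary."""
    open_attr = " open" if is_open else ""
    return f"<details{open_attr}><summary>{header}</summary>{content}</details>"


def highlight(text: str, elem_id: Optional[str] = None) -> str:
    """Wrap matched text in <mark>, optionally with an id the UI can scroll to."""
    id_attr = f' id="mark-{elem_id}"' if elem_id else ""
    return f"<mark{id_attr}>{text}</mark>"


def pdf_link(file_url: str, page: int, search: str) -> str:
    """Emit an anchor carrying the page and search phrase for the PDF viewer."""
    return (
        f'<a class="pdf-link" href="{escape(file_url, quote=True)}" '
        f'data-page="{page}" data-search="{escape(search, quote=True)}">'
        "[Preview]</a>"
    )


metadata = {"page_label": 3, "file_name": "report.pdf"}
header = get_header(metadata)
block = collapsible(header, highlight("total revenue", "1"))
link = pdf_link("/files/report.pdf", page=3, search="total revenue")
```

The sketch shows why consistent loader metadata matters: `get_header` and `pdf_link` need nothing beyond `page_label`, `file_name`, and the retrieved text to produce a clickable citation.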

## Common Pitfalls and Community Notes

- **Local installation on Apple Silicon (M1/M2).** Issue #138 highlights that the published Docker image is `amd64`-only and is out of date; many users instead run the native install via `pip install -e ".[dev]"` from the `libs/kotaemon` package ([libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md)).
- **OCR endpoint availability.** `OCRReader` requires a FullOCR service reachable at the configured endpoint; if the service is unreachable, ingestion may fail or return empty pages.
- **DOCX pagination.** Because DOCX has no native page numbers, `DocxReader` always emits `page_label: 1`. Users who need per-section pagination should post-process the output.
- **PaddleOCR dependency.** The new PaddleOCR adapter (v0.12.0) requires the `crop_image` helper and the PaddleOCR runtime; without these, figure extraction silently returns `None` ([libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py:30-50](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py)).

## See Also

- [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md) — project overview and quick start
- [User Guide](https://cinnamon.github.io/kotaemon/) — UI walkthrough and configuration
- [Developer Guide](https://cinnamon.github.io/kotaemon/development/) — extending loaders and index pipelines
- [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md) — package install and contribution guide

---

<a id='page-deployment-config'></a>

## Deployment, Configuration, and Extensibility

### Related Pages

Related topics: [Overview and System Architecture](#page-overview), [RAG Pipeline, LLMs, and Reasoning](#page-rag-pipeline), [Document Loading, Parsing, and Indexing](#page-document-indexing)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md)
- [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md)
- [libs/kotaemon/kotaemon/agents/tools/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py)
- [libs/kotaemon/kotaemon/agents/tools/mcp.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/mcp.py)
- [libs/kotaemon/kotaemon/agents/tools/wikipedia.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/wikipedia.py)
- [libs/kotaemon/kotaemon/agents/rewoo/agent.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py)
- [libs/kotaemon/kotaemon/agents/rewoo/prompt.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/prompt.py)
- [libs/kotaemon/kotaemon/loaders/ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py)
- [libs/kotaemon/kotaemon/loaders/docx_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docx_loader.py)
- [libs/kotaemon/kotaemon/loaders/docling_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py)
- [libs/kotaemon/kotaemon/loaders/mathpix_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py)
- [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py)
- [libs/ktem/ktem/utils/render.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py)
</details>

Kotaemon positions itself as an open-source RAG UI that serves three audiences simultaneously — end users chatting with documents, developers building pipelines on top of `kotaemon`, and contributors extending the framework itself. This page documents the deployment surface, the configuration points, and the extension mechanisms exposed by the core library, all backed by source-level evidence.

## 1. Installation and Deployment Surface

The library is distributed as the `kotaemon` package and is installed in editable mode for development. The recommended workflow is a Conda environment on Python 3.10+ followed by an editable install with the dev extras and pre-commit hooks. Source: [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md)

```shell
conda create -n kotaemon python=3.10
conda activate kotaemon
pip install -e ".[dev]"
pre-commit install
pytest tests
```

For end users, the top-level README advertises "Easy Installation: Simple scripts to get you started quickly" together with a Gradio-based web UI. Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md). The same README points to an "Online Install" flow and a Colab notebook for local RAG, so the project supports both a hosted demo path and a self-hosted local path. Community reports describe friction with the published Docker image on Apple Silicon (issue #138: "I did not use the docker image because it is out of date and because it does not match my amd64 platform"), which is why the editable Conda flow tends to be the more reliable path for developers on M-series Macs.

A rendering helper in the `ktem` app package resolves the `GR_FILE_ROOT_PATH` environment variable to construct absolute URLs for evidence files used by the UI's PDF preview link generation. Source: [libs/ktem/ktem/utils/render.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/ktem/ktem/utils/render.py) This is the most visible runtime configuration knob for production deployments: the UI must know the file root from which to serve user-uploaded documents and evidence artifacts.
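
As a rough sketch of how such a knob is typically consumed, the snippet below prefixes a file route with `GR_FILE_ROOT_PATH` when it is set. The helper name `evidence_url` and the `/file=` route are assumptions made for illustration; check `render.py` for the actual URL shape.

```python
import os
from typing import Mapping, Optional


def evidence_url(file_path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a URL for a served file, prefixed by GR_FILE_ROOT_PATH if set."""
    env = os.environ if env is None else env
    # Strip a trailing slash so the join below never produces "//".
    base = env.get("GR_FILE_ROOT_PATH", "").rstrip("/")
    # The "/file=" route is an assumption about how the UI serves files.
    return f"{base}/file={file_path}"


behind_proxy = evidence_url("docs/a.pdf", {"GR_FILE_ROOT_PATH": "/kotaemon/"})
at_root = evidence_url("docs/a.pdf", {})
```

When the app is mounted under a sub-path behind a reverse proxy, leaving the variable unset would produce root-relative links that miss the prefix.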

## 2. Configuration: Models, Credentials, and Runtime

Kotaemon is designed to be model-agnostic. The README states the framework is "Compatible with LLM API providers (OpenAI, AzureOpenAI, Cohere, etc.) and local LLMs (via `ollama` and `llama-cpp-python`)" with the explicit goal to "Organize your LLM & Embedding models." Source: [README.md](https://github.com/Cinnamon/kotaemon/blob/main/README.md) The release notes for v0.12.0 also highlight that embedding and chat model slots can be swapped independently through the configuration layer (release line: "feat: integrate PaddleOCR as document loaders + enhance chat/index UX").

The core library uses `python-dotenv` for credential management in development, but with a strict boundary: dotenv-based credentials are explicitly allowed only in `examples/`, not inside the main source (`kotaemon/`, `tests/`). Secrets are encrypted at rest with `git-secret` and `gpg`. Source: [libs/kotaemon/README.md](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/README.md) This separation keeps production credentials injectable via environment variables while protecting examples and shared configs.

Community requests make the configuration extension points concrete. Issue #392 requests an AWS Bedrock provider (Converse API for Anthropic, Meta, Mistral, Cohere), and issue #160 requests a Gemini API provider. Both are integration-shape requests — they target the model-provider configuration slot rather than core retrieval logic — which confirms that adding a new provider is a configuration and adapter task, not a framework change. Similarly, issue #243 ("Integrate another GraphRAG backend") targets the GraphRAG adapter slot, while issue #647 requests exposing FastAPI endpoints alongside the existing Gradio UI for headless deployments.

## 3. Extensibility: Tools, Loaders, and Agents

The extensibility surface is structured around three orthogonal axes, each with a stable base class and a registration point.

### 3.1 Tools

The agent tool registry is centralized in `kotaemon/agents/tools/__init__.py`, which exports `BaseTool`, `ComponentTool`, `GoogleSearchTool`, `LLMTool`, `MCPTool`, and `WikipediaTool`. Source: [libs/kotaemon/kotaemon/agents/tools/__init__.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py) Each tool subclasses `BaseTool` and is described by a name, a natural-language description, and a Pydantic `args_schema` — the schema is what an LLM planner uses to decide when and how to call the tool.
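
The name, description, and schema triple can be illustrated with a standard-library stand-in. This sketch deliberately does not import `BaseTool` or Pydantic: `ToolSpec`, `EchoArgs`, `describe`, and `call` are hypothetical names, and a dataclass plays the role of the argument schema.

```python
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class EchoArgs:
    """Hypothetical argument schema with a single required field."""
    query: str


@dataclass
class ToolSpec:
    """Stand-in for a tool: name, description, schema, and a callable."""
    name: str
    description: str
    args_schema: type
    run: Callable[..., str]

    def describe(self) -> str:
        # Render the signature a planner prompt could list for this tool.
        params = ", ".join(
            f"{f.name}: {f.type.__name__}" for f in fields(self.args_schema)
        )
        return f"{self.name}({params}): {self.description}"

    def call(self, **kwargs) -> str:
        # Building the schema rejects missing or unexpected arguments.
        args = self.args_schema(**kwargs)
        return self.run(**vars(args))


echo = ToolSpec("echo", "Return the query unchanged.", EchoArgs, lambda query: query)
```

The planner only ever sees the output of `describe()`, which is why a precise description and schema matter more than the implementation behind `run`.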

The Wikipedia tool is a representative example: it defines `WikipediaArgs` with a single `query` field, lazily imports the `wikipedia` package, and returns a `Document` carrying the page URL in its metadata. Source: [libs/kotaemon/kotaemon/agents/tools/wikipedia.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/wikipedia.py)

The MCP integration shows the depth of the extension API. The module bridges the MCP SDK's JSON-Schema-based tool definitions to Pydantic models via `build_args_model`, which maps JSON types (`string`, `integer`, `number`, `boolean`, `object`, `array`) to Python types and marks required fields with `Field(...)`. Source: [libs/kotaemon/kotaemon/agents/tools/mcp.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/mcp.py). The `parse_mcp_config` helper normalizes a JSON config into `{transport, command, args, env}`. It also handles the common mistake of pasting a full shell command (e.g., `"npx -y mcp-remote https://..."`) into the `command` field by tokenizing it with `shlex.split`, which respects shell quoting rather than splitting on raw whitespace. The helpers `create_tools_from_config`, `discover_tools_info`, and `format_tool_list` are also exported and used by agent runners to materialize `MCPTool` instances from a declarative config block.
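
The two normalization steps described above can be approximated with the standard library alone. This is a sketch under stated assumptions, not the code in `mcp.py`: `schema_to_fields` and `normalize_command` are hypothetical names, the `"stdio"` default transport is assumed, and the real module builds Pydantic models rather than plain tuples.

```python
import shlex
from typing import Any

# JSON Schema primitive types mapped to Python types.
JSON_TO_PYTHON = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def schema_to_fields(schema: dict) -> dict:
    """Map JSON Schema properties to (python_type, is_required) pairs."""
    required = set(schema.get("required", []))
    result = {}
    for name, prop in schema.get("properties", {}).items():
        # Unknown or missing types fall back to Any.
        py_type = JSON_TO_PYTHON.get(prop.get("type"), Any)
        result[name] = (py_type, name in required)
    return result


def normalize_command(config: dict) -> dict:
    """Split a full shell command pasted into `command` into command + args."""
    command = config.get("command", "")
    args = list(config.get("args", []))
    parts = shlex.split(command)
    if len(parts) > 1:
        command, args = parts[0], parts[1:] + args
    return {
        "transport": config.get("transport", "stdio"),  # assumed default
        "command": command,
        "args": args,
        "env": dict(config.get("env", {})),
    }


fields_map = schema_to_fields(
    {
        "properties": {"q": {"type": "string"}, "n": {"type": "integer"}},
        "required": ["q"],
    }
)
normalized = normalize_command({"command": "npx -y mcp-remote https://example.com/mcp"})
```

Using `shlex.split` instead of `str.split` keeps quoted arguments such as `"--name 'my server'"` intact as a single token.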

### 3.2 Loaders

Document loaders are pluggable readers that yield `Document` objects. The codebase ships with specialized loaders for the formats that are hardest to handle correctly:

- **DOCX**: produces text and `type: "table"` documents with CSV-encoded table origin metadata. Source: [libs/kotaemon/kotaemon/loaders/docx_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docx_loader.py)
- **OCR (FullOCR pipeline)**: `ImageReader` targets an HTTP endpoint (default `http://127.0.0.1:8000/v2/ai/infer/`, overridable via the `OCR_READER_ENDPOINT` environment variable) and emits tables with `table_origin` plus per-page text. Source: [libs/kotaemon/kotaemon/loaders/ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py)
- **Mathpix**: yields per-page text and table documents with `page_label` and `page_number` metadata, plus a fallback that emits a single text document when no structure is parsed. Source: [libs/kotaemon/kotaemon/loaders/mathpix_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/mathpix_loader.py)
- **Docling**: groups parsed text by `page_no`, converts bbox coordinates from bottom-left to top-left, and renders tables from Docling's grid format into Markdown. Source: [libs/kotaemon/kotaemon/loaders/docling_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/docling_loader.py)
- **PaddleOCR** (added in v0.12.0): `Adapter` normalizes both PPStructureV3 and PaddleOCR-VL outputs into kotaemon `Document` objects, classifying blocks into text, table, image, and formula label sets and cropping figure regions from page images when available. Source: [libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/paddleocr_loader/adapter.py)
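
Two of the Docling steps listed above, the origin flip and the grid-to-Markdown rendering, are simple enough to sketch. The function names are hypothetical, the `(left, bottom, right, top)` tuple order is an assumption, and the grid is simplified to a list of rows whose first row is the header; the real loader works on Docling's own objects.

```python
def to_top_left(bbox, page_height):
    """Flip a bottom-left-origin box to a top-left-origin box.

    Input order (assumed): (left, bottom, right, top), y growing upward.
    Output order: (left, top, right, bottom), y growing downward.
    """
    left, bottom, right, top = bbox
    return (left, page_height - top, right, page_height - bottom)


def grid_to_markdown(grid):
    """Render a list of rows (first row is the header) as a Markdown table."""
    if not grid:
        return ""
    header, *rows = grid
    lines = [
        "| " + " | ".join(map(str, header)) + " |",
        "|" + "|".join(["---"] * len(header)) + "|",
    ]
    lines += ["| " + " | ".join(map(str, row)) + " |" for row in rows]
    return "\n".join(lines)


flipped = to_top_left((50, 600, 300, 700), page_height=800)
table_md = grid_to_markdown([["item", "qty"], ["bolt", 4]])
```

Only the vertical coordinates change in the flip: a box whose top edge sits 100 units below the top of an 800-unit page has `top = 700` in bottom-left coordinates and `top = 100` in top-left coordinates.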

### 3.3 Agents

The ReWOO agent implements a three-stage plan–work–solve loop. The planner produces `#Plan1` / `#E1` style outputs that reference prior evidence variables (e.g., `#E1^3`); the worker executes the tool calls; the solver produces the final answer. Source: [libs/kotaemon/kotaemon/agents/rewoo/prompt.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/prompt.py). The `Agent.stream` method runs the planner, accumulates token and cost totals, optionally invokes a `CitationPipeline` on the worker log, and returns an `AgentOutput` with `text`, `agent_type`, `status`, `total_tokens`, `total_cost`, and `metadata` (including the worker log and citation). Source: [libs/kotaemon/kotaemon/agents/rewoo/agent.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py).
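
The plan and work stages can be sketched in a few lines. This is an illustration of the general ReWOO idea, not the agent in `agent.py`: the `#E1: tool[input]` line shape, the regular expression, and the helper names are assumptions, and the solver stage and citation handling are omitted.

```python
import re

# Assumed line shape: "#E<number>: <tool>[<input>]".
STEP_RE = re.compile(r"^#E(\d+)\s*[:=]\s*(\w+)\[(.*)\]\s*$")


def parse_evidence_steps(plan_text):
    """Extract (variable, tool, tool_input) triples from planner output."""
    steps = []
    for line in plan_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            number, tool, tool_input = match.groups()
            steps.append((f"#E{number}", tool, tool_input))
    return steps


def run_steps(steps, tools):
    """Execute steps in order, substituting earlier evidence into later inputs."""
    evidence = {}
    for variable, tool, tool_input in steps:
        # Replace longer names first so "#E10" is not clobbered by "#E1".
        for name in sorted(evidence, key=len, reverse=True):
            tool_input = tool_input.replace(name, evidence[name])
        evidence[variable] = tools[tool](tool_input)
    return evidence


plan = (
    "#Plan1: Find the capital.\n"
    "#E1: lookup[France]\n"
    "#Plan2: Shout it.\n"
    "#E2: upper[#E1]"
)
toy_tools = {"lookup": lambda q: {"France": "Paris"}[q], "upper": str.upper}
steps = parse_evidence_steps(plan)
evidence = run_steps(steps, toy_tools)
```

Because the whole plan is produced before any tool runs, the planner is called once and the worker can resolve evidence variables purely by string substitution.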

## 4. Common Failure Modes and Operational Notes

| Symptom | Likely cause | Source-side mitigation |
|---|---|---|
| `Could not import wikipedia python package` | Optional dep missing | Tool raises a clear `ValueError` instructing `pip install wikipedia` [wikipedia.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/wikipedia.py) |
| OCR endpoint unreachable | Sidecar service down | `ImageReader` reads `OCR_READER_ENDPOINT` and falls back to a default URL [ocr_loader.py](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/loaders/ocr_loader.py) |
| Docker image stale or wrong arch | Published image lags releases | Use the editable Conda flow described in libs/kotaemon/README.md |
| PDF preview links broken | `GR_FILE_ROOT_PATH` unset | `Render` class in libs/ktem/ktem/utils/render.py uses it as a base for `data-src` URLs |
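
Several rows in the table above can be checked before the app starts. The following preflight sketch is a hypothetical helper, not part of kotaemon; it only inspects the environment and the import system, and it does not contact the OCR service.

```python
import importlib.util
import os
from typing import Mapping, Optional

# Default endpoint quoted from the loader description in this manual.
DEFAULT_OCR_ENDPOINT = "http://127.0.0.1:8000/v2/ai/infer/"


def preflight(env: Optional[Mapping[str, str]] = None) -> list:
    """Return human-readable warnings for common misconfigurations."""
    env = os.environ if env is None else env
    warnings = []
    # Optional dependency used by the Wikipedia tool.
    if importlib.util.find_spec("wikipedia") is None:
        warnings.append("wikipedia package missing: pip install wikipedia")
    if "OCR_READER_ENDPOINT" not in env:
        warnings.append(f"OCR_READER_ENDPOINT unset: using {DEFAULT_OCR_ENDPOINT}")
    if not env.get("GR_FILE_ROOT_PATH"):
        warnings.append("GR_FILE_ROOT_PATH unset: PDF preview links may be broken")
    return warnings


configured = preflight({"OCR_READER_ENDPOINT": "http://ocr:8000/", "GR_FILE_ROOT_PATH": "/app"})
unconfigured = preflight({})
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to test without mutating the process environment.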

## See Also

- README.md — High-level overview and feature list
- libs/kotaemon/README.md — Developer setup, testing, and credential handling
- [Agent Tools registry](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/tools/__init__.py) — Tool export list
- [ReWOO agent implementation](https://github.com/Cinnamon/kotaemon/blob/main/libs/kotaemon/kotaemon/agents/rewoo/agent.py) — Plan–work–solve loop and citation pipeline

---


## Pitfall Log

Project: Cinnamon/kotaemon

Summary: Found 12 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Cinnamon/kotaemon/issues/833

## 2. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Cinnamon/kotaemon/issues/834

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: runtime_trace
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Repro command: `docker run -e GRADIO_SERVER_NAME=0.0.0.0 -e GRADIO_SERVER_PORT=7860 -v ./ktem_app_data:/app/ktem_app_data -p 7860:7860 -it --rm ghcr.io/cinnamon/kotaemon:main-full`
- Evidence: identity.distribution | https://github.com/Cinnamon/kotaemon

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Cinnamon/kotaemon/issues/839

## 5. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Cinnamon/kotaemon/issues/821

## 6. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/Cinnamon/kotaemon

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Cinnamon/kotaemon

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/Cinnamon/kotaemon

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/Cinnamon/kotaemon

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Cinnamon/kotaemon/issues/841

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Cinnamon/kotaemon

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Cinnamon/kotaemon

<!-- canonical_name: Cinnamon/kotaemon; human_manual_source: deepwiki_human_wiki -->
