# minirag - Doramagic AI Context Pack

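> Positioning: a pre-installation trial and decision asset. It helps the host AI get off to a good start, but it does not mean the target project has been installed, executed, or verified.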

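## Sufficiency Principle

- **Sufficiency, not compression**: An AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: Compress only noise and repetition, never context that affects judgment or the quality of the start.

## How the Host AI Should Use This

You are reading the AI Context Pack that Doramagic compiled for minirag. Treat it as pre-work context: help the user understand who it suits, what it can do, how to get started, what must be verified after installation, and where the risks lie. Do not claim that you have installed, run, or executed the target project.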

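## Claim Consumption Rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph; the Human Wiki supplies only salience, terminology, and narrative structure.
- **Minimum status for facts**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: a low-confidence lead only; the user must be asked to verify it further.
- `inferred`: usable only for risk notes or open questions; must not be presented as a project fact.
- `unverified`: must not be used as fact; state clearly that the evidence is insufficient.
- `contradicted`: the conflicting sources must be shown; do not pick one version on the user's behalf.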
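
The status rules above can be sketched as a small lookup table. This is only an illustration: the five status names come from the list, while `ClaimPolicy`, `POLICIES`, `policy_for`, and the fallback for unknown statuses are hypothetical names and choices made for this sketch, not part of the pack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimPolicy:
    """How a host AI may use a claim with a given status (illustrative)."""
    usable_as_fact: bool
    instruction: str


# Status names are taken from the rules above; the wording is paraphrased.
POLICIES = {
    "supported": ClaimPolicy(True, "cite claim_id and evidence path"),
    "weak": ClaimPolicy(False, "low-confidence lead; ask the user to verify"),
    "inferred": ClaimPolicy(False, "risk note or open question only"),
    "unverified": ClaimPolicy(False, "state that evidence is insufficient"),
    "contradicted": ClaimPolicy(False, "show conflicting sources; do not pick one"),
}


def policy_for(status: str) -> ClaimPolicy:
    # Assumption of this sketch: unknown statuses get the strictest handling.
    return POLICIES.get(status, POLICIES["unverified"])
```

Only `supported` claims pass the fact threshold here, which mirrors the stated minimum status.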

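## Who It Suits Best

- **Users who want to understand an open-source project's value and boundaries before installing**: current evidence comes mainly from the project documentation. Evidence: `README.md` Claim: `clm_0002` supported 0.86

## What It Can Do

- **Command-line startup or installation flow** (requires post-installation verification): the project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86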

## How to Get Started

- `pip install -e .` Evidence: `README.md` Claim: `clm_0003` supported 0.86
- `pip install lightrag-hku` Evidence: `README.md` Claim: `clm_0004` supported 0.86
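
A minimal sketch of how the two commands above could be planned for a throwaway virtual environment rather than a primary one. It only builds the command lists and runs nothing. The function name `plan_isolated_install` and the sandbox path are hypothetical; the pip arguments are the ones quoted from `README.md`.

```python
import sys
from pathlib import Path


def plan_isolated_install(sandbox: Path, editable: bool = True) -> list[list[str]]:
    """Return the commands for a trial install inside a throwaway virtualenv.

    Nothing is executed here; the caller decides whether to run the plan.
    """
    venv_dir = sandbox / ".venv"
    # The interpreter lives in a different subdirectory on Windows and POSIX.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    venv_python = str(venv_dir / bin_dir / "python")
    # Either the editable install or the package install quoted above.
    install_args = ["-e", "."] if editable else ["lightrag-hku"]
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        [venv_python, "-m", "pip", "install", *install_args],
    ]
```

Passing each list to `subprocess.run(cmd, check=True, cwd=...)` from a scratch checkout would perform the install; whether that is worth doing is still a decision for the user.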

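## Pre-Continuation Decision Card

- **Current recommendation**: Start with a permission-sandboxed trial
- **Why**: The project shows signs of installation commands, host configuration, or local writes. Going straight into a primary environment is not recommended; trial-install in an isolated environment first.

### 30-Second Decision

- **What to do now**: Start with a permission-sandboxed trial
- **Minimum safe next step**: Run Prompt Preview first; if you still want to install, trial-install only in an isolated environment
- **Do not trust yet**: Tool permission boundaries cannot be trusted before installation.
- **Continuing will touch**: command execution, the local environment or project files, and the host AI context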

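### What You Can Trust Now

- **Audience signal: users who want to understand an open-source project's value and boundaries before installing** (supported): backed by a supported claim or project evidence, but this still does not equal real installation results. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Capability exists: command-line startup or installation flow** (supported): you can trust that the project shows signs of this capability; whether it fits your specific task still requires a trial or post-installation verification. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **Quick Start / installation command signals exist** (supported): you can trust that the project documentation contains a startup or installation entry point; do not run it directly in your primary environment because of that. Evidence: `README.md` Claim: `clm_0003` supported 0.86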

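### What You Cannot Trust Yet

- **Tool permission boundaries cannot be trusted before installation.** (unverified): MCP/tool projects typically touch files, the network, the browser, or external APIs; permissions and logs must be checked for real.
- **Real output quality cannot be trusted before installation.** (unverified): Prompt Preview can only show how the guidance works; it cannot prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): Loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **That it will not contaminate existing host AI behavior cannot be taken on trust.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): Unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.
- **Will a real installation be compatible with the user's current host AI version?** (unverified): Compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): A pre-installation preview can only show the flow and boundaries; it is no substitute for real evaluation.
- **Do the installation commands require network access, elevated permissions, or global writes?** (unverified): This affects installation risk in both enterprise and personal environments. Evidence: `README.md`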

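### What Continuing Will Touch

- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: running even the first command can change the environment, so first decide whether it is worth running. Evidence: `README.md`
- **Local environment or project files**: installation artifacts, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback path cannot be proven before installation, so isolated verification is required. Evidence: `README.md`
- **Host AI context**: AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context influences the host AI's later judgments, so unverified items must not be presented as facts.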

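### Minimum Safe Next Step

- **Run Prompt Preview first**: use the interactive pre-installation trial to judge whether the way of working fits; no authorization or environment changes are needed. (Applies to: any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or test account**: keep the installation commands from contaminating your primary host AI, real projects, or home directory. (Applies when: there are signs of command execution, plugin configuration, or local writes.)
- **After installing, verify a single minimal task**: verify loading, compatibility, output quality, and rollback first, then decide whether to adopt it more deeply. (Applies when: moving from trial into a real workflow.)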

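### Exit Path

- **Preserve the pre-installation state**: record the original host configuration and project state so that you can later judge whether recovery is possible.
- **Record the installation commands and write paths**: without explicit uninstall instructions, you at least need to know which directories or configuration need manual cleanup.
- **No rollback path, no primary environment**: the absence of a rollback path is a blocker; do not proceed on trust or luck.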
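
One way to record write paths is to snapshot a directory before and after a trial install and compare the two. The sketch below assumes the directories worth watching are known in advance; `snapshot` and `diff_snapshots` are hypothetical helper names, and hashing every file is a simplification that suits small sandboxes better than a full home directory.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (relative path) to the SHA-256 of its bytes."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """List files that were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Calling `snapshot(sandbox)` before the install and again afterwards, then `diff_snapshots(before, after)`, gives a concrete list of paths to clean up if no uninstall instructions exist.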

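## Preview Only

- Explaining who the project suits and what it can do
- Demonstrating a typical conversation flow based on the project documentation
- Helping the user decide whether installing or researching further is worthwhile

## Must Be Verified After Installation

- Actually installing a Skill, plugin, or CLI
- Executing scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility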

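## Boundary and Risk Card

- **Mistaking the pre-installation preview for a real run**: the user may overestimate how much configuration, permission, and compatibility verification the project has already completed. Mitigation: clearly separate prompt_preview_can_do from runtime_required. Claim: `clm_0005` inferred 0.45
- **Command execution modifies the local environment**: installation commands may write to the user's home directory, the host plugin directory, or project configuration. Mitigation: run them first in an isolated environment or test account. Evidence: `README.md` Claim: `clm_0006` supported 0.86
- **To confirm**: Will a real installation be compatible with the user's current host AI version? Reason: compatibility can only be verified in the actual host environment.
- **To confirm**: Does the project's output quality meet the user's specific task? Reason: a pre-installation preview can only show the flow and boundaries; it is no substitute for real evaluation.
- **To confirm**: Do the installation commands require network access, elevated permissions, or global writes? Reason: this affects installation risk in both enterprise and personal environments.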

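## Pre-Work Context

### Load Order

- First read how_to_use.host_ai_instruction to establish the boundaries of this pre-installation decision asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a specific task needs to be carried out, consult role_skill_index first, then evidence_index.
- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.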
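
The first three load steps can be sketched as an ordered lookup. The key names are the ones listed above; treating the pack as a nested Python dict, and the helper names `get_path` and `load_in_order`, are assumptions made only for this illustration.

```python
from typing import Any

# Keys in the order given above; dotted names address nested sections.
LOAD_ORDER = [
    "how_to_use.host_ai_instruction",
    "claim_graph_summary",
    "intended_users",
    "capabilities",
    "quick_start_candidates",
]


def get_path(pack: dict, dotted: str) -> Any:
    """Follow a dotted key through nested dicts; return None if any part is missing."""
    node: Any = pack
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def load_in_order(pack: dict) -> list[tuple[str, Any]]:
    """Return (key, value) pairs in the prescribed reading order."""
    return [(key, get_path(pack, key)) for key in LOAD_ORDER]
```

A `None` value signals a missing section, which is better reported to the user than filled in.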

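### Task Routing

- **Command-line startup or installation flow**: first explain that this capability requires post-installation verification, then give a pre-installation checklist. Boundary: must be verified by a real installation or run. Evidence: `README.md` Claim: `clm_0001` supported 0.86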

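### Context Size

- Total files: 69
- Important file coverage: 40/69
- Evidence index entries: 39
- Role / Skill entries: 8

### Handling Insufficient Evidence

- **missing_evidence**: state that the evidence is insufficient and ask the user for the target file, README passage, or post-installation verification record; do not fill in facts.
- **out_of_scope_request**: explain that the task is beyond the evidence scope of this AI Context Pack, and suggest that the user consult the Human Manual or verify after a real installation.
- **runtime_request**: give a pre-installation checklist and the source of the commands, but do not run commands for the user or claim to have run them.
- **source_conflict**: show the conflicting sources side by side, mark them as pending verification, and do not pick one version.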

## Prompt Recipes

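### Fit Assessment

- Goal: judge whether this project suits the user's current task.
- Expected output: a fit verdict, key reasons, evidence citations, what can be previewed before installation, what must be verified after installation, and suggested next steps.

```text
Based on the minirag AI Context Pack, first ask me 3 necessary questions, then judge whether it suits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.
```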

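### Pre-Installation Experience

- Goal: let the user get a feel for the core workflow before installing, without presenting the preview as real capability or a marketing promise.
- Expected output: an experience script with boundary labels, a post-installation verification checklist, and cautious advice; no promises about real runs and no hard-sell wording.

```text
Treat minirag as a pre-installation experience asset, not as an installed tool or a real runtime environment.

Output exactly four parts:
1. First ask me 3 necessary questions.
2. Give an "experience script": use the three labels [previewable before installation], [must be verified after installation], and [insufficient evidence] to show how it might guide the workflow.
3. Give a post-installation verification checklist: list the capabilities that can only be confirmed after a real installation, real host loading, and a real project run.
4. Give cautious advice: you may only say "worth further research or a trial install", "provide more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, tested, modified files, or produced real results.
- Do not use promissory wording such as "adapts automatically", "guaranteed to pass", "a perfect fit", or "strongly recommend installing".
- When describing how it works after installation, use a conditional such as "if installation succeeds and the host loads the Skill correctly, it may ...".
- The experience script may only be written as example lines or a hypothetical flow: use "may ask / may suggest / may show", and do not write "has written, has generated, has passed, is running, is generating".
- Prompt Preview is not responsible for giving installation commands; if the user is preparing a trial install, only advise reading the Quick Start and Risk Card first and verifying in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may only serve as risks or items to confirm.
```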

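### Role / Skill Selection

- Goal: pick the best-matching assets from the project's roles or Skills.
- Expected output: a list of candidate roles or Skills, each with its applicable scenario, evidence path, risk boundary, and whether post-installation verification is needed.

```text
Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, explain the applicable scenario, likely output, risk boundary, and evidence_refs.
```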

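### Risk Pre-Check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.

```text
Based on risk_card, boundaries, and quick_start_candidates, give me a pre-installation risk checklist. Do not run commands for me; only explain what I should check, why, and what happens if a check fails.
```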

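### Host AI Kickoff Instruction

- Goal: turn the project context into an instruction for the host AI at the start of a conversation.
- Expected output: a kickoff instruction with clear boundaries and clear evidence citations, ready to paste into the host AI.

```text
Based on the minirag AI Context Pack, generate a kickoff instruction that I can paste into the host AI. The instruction must respect not_runtime=true and must not claim that the project has been installed, run, or has produced real results.
```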

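## Role / Skill Index

- 8 role / Skill / project-document entries are indexed in total.

- **MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation** (project_doc): MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation. Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `README.md`
- **LiHua-World Dataset** (project_doc): ! LiHuaWorld https://files.mdnice.com/user/87760/39923168-2267-4caf-b715-7f28764549de.jpg Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `dataset/LiHua-World/README.md`
- **Install with API Support** (project_doc): MiniRAG now provides optional API support through FastAPI servers that add RAG capabilities to existing LLM services. You can install MiniRAG with API support in two ways: using MiniRAG is the same as LightRAG. Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `minirag/api/README.md`
- **LightRag Webui** (project_doc): LightRag Webui A simple webui to interact with the lightrag datalake. Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `minirag/api/static/README.md`
- **MiniRAG: 迈向极简检索增强生成** (project_doc): ! MiniRAG https://files.mdnice.com/user/87760/ff711e74-c382-4432-bec2-e6f2aa787df1.jpg Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `README_CN.md`
- **LiHua-World 数据集** (project_doc): ! LiHuaWorld https://files.mdnice.com/user/87760/39923168-2267-4caf-b715-7f28764549de.jpg Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `dataset/LiHua-World/README_CN.md`
- **Communication** (project_doc): We provide QR codes for joining the HKUDS discussion groups on WeChat and Feishu. Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `Communication.md`
- **MiniRAG: 極めてシンプルな検索強化生成に向けて** (project_doc): ! MiniRAG https://files.mdnice.com/user/87760/ff711e74-c382-4432-bec2-e6f2aa787df1.jpg Activation hint: refer to this when the user needs to understand the project structure, installation, or boundaries. Evidence: `README_JA.md`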

## Evidence Index

- 39 evidence entries are indexed in total.

- **MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation** (documentation): MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation Evidence: `README.md`
- **LiHua-World Dataset** (documentation): ! LiHuaWorld https://files.mdnice.com/user/87760/39923168-2267-4caf-b715-7f28764549de.jpg Evidence: `dataset/LiHua-World/README.md`
- **Install with API Support** (documentation): MiniRAG now provides optional API support through FastAPI servers that add RAG capabilities to existing LLM services. You can install MiniRAG with API support in two ways: using MiniRAG is the same as LightRAG Evidence: `minirag/api/README.md`
- **LightRag Webui** (documentation): LightRag Webui A simple webui to interact with the lightrag datalake Evidence: `minirag/api/static/README.md`
- **License** (source_file): Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files the "Software" , to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: Evidence: `LICENSE`
- **MiniRAG: 迈向极简检索增强生成** (documentation): ! MiniRAG https://files.mdnice.com/user/87760/ff711e74-c382-4432-bec2-e6f2aa787df1.jpg Evidence: `README_CN.md`
- **LiHua-World 数据集** (documentation): ! LiHuaWorld https://files.mdnice.com/user/87760/39923168-2267-4caf-b715-7f28764549de.jpg Evidence: `dataset/LiHua-World/README_CN.md`
- **A toy query** (source_file): EMBEDDING MODEL = "sentence-transformers/all-MiniLM-L6-v2" ⋮---- def get args ⋮---- parser = argparse.ArgumentParser description="MiniRAG" ⋮---- args = parser.parse args ⋮---- args = get args ⋮---- LLM MODEL = "microsoft/Phi-3.5-mini-instruct" ⋮---- LLM MODEL = "THUDM/glm-edge-1.5b-chat" ⋮---- LLM MODEL = "openbmb/MiniCPM3-4B" ⋮---- LLM MODEL = "Qwen/Qwen2.5-3B-Instruct" ⋮---- WORKING DIR = args.workingdir DATA PATH = args.datapath QUERY PATH = args.querypath OUTPUT PATH = args.outputpath ⋮---- rag = MiniRAG def find txt files root path ⋮---- txt files = ⋮---- WEEK LIST = find txt files DATA PATH ⋮---- id = WEEK LIST.index WEEK ⋮---- A toy query query = 'What does LiHua predict will happen… Evidence: `main.py`
- **Init** (source_file): version = "0.0.2" author = "Tianyu Fan" url = "https://github.com/HKUDS/MiniRAG" Evidence: `minirag/__init__.py`
- **Base** (source_file): TextChunkSchema = TypedDict T = TypeVar "T" ⋮---- @dataclass class QueryParam ⋮---- mode: Literal "light", "naive", "mini" = "mini" only need context: bool = False only need prompt: bool = False response type: str = "Multiple Paragraphs" stream: bool = False top k: int = int os.getenv "TOP K", "60" max token for text unit: int = 4000 max token for global context: int = 4000 max token for local context: int = 4000 max token for node context: int = 500 hl keywords: list str = field default factory=list ll keywords: list str = field default factory=list conversation history: list dict = field history turns: int = ⋮---- @dataclass class StorageNameSpace ⋮---- namespace: str global config: dict… Evidence: `minirag/base.py`
- **Exceptions** (source_file): class APIStatusError Exception ⋮---- response: httpx.Response status code: int request id: str None ⋮---- class APIConnectionError Exception class BadRequestError APIStatusError ⋮---- status code: Literal 400 = 400 class AuthenticationError APIStatusError ⋮---- status code: Literal 401 = 401 class PermissionDeniedError APIStatusError ⋮---- status code: Literal 403 = 403 class NotFoundError APIStatusError ⋮---- status code: Literal 404 = 404 class ConflictError APIStatusError ⋮---- status code: Literal 409 = 409 class UnprocessableEntityError APIStatusError ⋮---- status code: Literal 422 = 422 class RateLimitError APIStatusError ⋮---- status code: Literal 429 = 429 class APITimeoutError APIC… Evidence: `minirag/exceptions.py`
- **Llm** (source_file): class Model BaseModel ⋮---- gen func: Callable Any , str = Field kwargs: Dict str, Any = Field class Config ⋮---- arbitrary types allowed = True class MultiModel ⋮---- def init self, models: List Model def next model self ⋮---- next model = self. next model args = dict ⋮---- async def main ⋮---- result = await gpt 4o mini complete "How are you?" Evidence: `minirag/llm.py`
- **RAGmode: str = 'minirag'** (source_file): STORAGES = { ⋮---- def lazy external import module name: str, class name: str ⋮---- caller frame = inspect.currentframe .f back module = inspect.getmodule caller frame package = module. package if module else None def import class args, kwargs ⋮---- module = importlib.import module module name, package=package cls = getattr module, class name ⋮---- def always get an event loop - asyncio.AbstractEventLoop ⋮---- current loop = asyncio.get event loop ⋮---- new loop = asyncio.new event loop ⋮---- @dataclass class MiniRAG ⋮---- working dir: str = field RAGmode: str = 'minirag' kv storage: str = field default="JsonKVStorage" vector storage: str = field default="NanoVectorDBStorage" graph storage:… Evidence: `minirag/minirag.py`
- **use llm func is wrapped in ascynio.Semaphore, limiting max async callings** (source_file): tokens = encode string by tiktoken content, model name=tiktoken model results = ⋮---- chunk content = decode tokens by tiktoken ⋮---- tiktoken model name = global config "tiktoken model name" summary max tokens = global config "entity summary to max tokens" tokens = encode string by tiktoken description, model name=tiktoken model name ⋮---- entity name = clean str record attributes 1 .upper ⋮---- entity type = clean str record attributes 2 .upper entity description = clean str record attributes 3 entity source id = chunk key ⋮---- source = clean str record attributes 1 .upper target = clean str record attributes 2 .upper edge description = clean str record attributes 3 edge keywords = clean… Evidence: `minirag/operate.py`
- **Prompt** (source_file): GRAPH FIELD SEP = " " PROMPTS = {} Evidence: `minirag/prompt.py`
- **Regular expression to find all Unicode escape sequences of the form \uXXXX** (source_file): ENCODER = None logger = logging.getLogger "minirag" def set logger log file: str ⋮---- file handler = logging.FileHandler log file ⋮---- formatter = logging.Formatter ⋮---- @dataclass class EmbeddingFunc ⋮---- embedding dim: int max token size: int func: callable async def call self, args, kwargs - np.ndarray def compute mdhash id content, prefix: str = "" def compute args hash args, cache type: str None = None - str ⋮---- args str = "".join str arg for arg in args ⋮---- args str = f"{cache type}:{args str}" ⋮---- def clean text text: str - str ⋮---- """Clean text by removing null bytes 0x00 and whitespace""" ⋮---- def get content summary content: str, max length: int = 100 - str ⋮---- """G… Evidence: `minirag/utils.py`
- **Step 0 Index** (source_file): EMBEDDING MODEL = "sentence-transformers/all-MiniLM-L6-v2" ⋮---- def get args ⋮---- parser = argparse.ArgumentParser description="MiniRAG" ⋮---- args = parser.parse args ⋮---- args = get args ⋮---- LLM MODEL = "microsoft/Phi-3.5-mini-instruct" ⋮---- LLM MODEL = "THUDM/glm-edge-1.5b-chat" ⋮---- LLM MODEL = "openbmb/MiniCPM3-4B" ⋮---- LLM MODEL = "Qwen/Qwen2.5-3B-Instruct" ⋮---- WORKING DIR = args.workingdir DATA PATH = args.datapath QUERY PATH = args.querypath OUTPUT PATH = args.outputpath ⋮---- rag = MiniRAG def find txt files root path ⋮---- txt files = ⋮---- WEEK LIST = find txt files DATA PATH ⋮---- id = WEEK LIST.index WEEK Evidence: `reproduce/Step_0_index.py`
- **if name == " main ":** (source_file): EMBEDDING MODEL = "sentence-transformers/all-MiniLM-L6-v2" ⋮---- def get args ⋮---- parser = argparse.ArgumentParser description="MiniRAG" ⋮---- args = parser.parse args ⋮---- args = get args ⋮---- LLM MODEL = "microsoft/Phi-3.5-mini-instruct" ⋮---- LLM MODEL = "THUDM/glm-edge-1.5b-chat" ⋮---- LLM MODEL = "openbmb/MiniCPM3-4B" ⋮---- LLM MODEL = "Qwen/Qwen2.5-3B-Instruct" ⋮---- WORKING DIR = args.workingdir DATA PATH = args.datapath QUERY PATH = args.querypath OUTPUT PATH = args.outputpath ⋮---- rag = MiniRAG QUESTION LIST = GA LIST = ⋮---- reader = csv.DictReader question file ⋮---- def run experiment output path ⋮---- headers = "Question", "Gold Answer", "minirag" q already = ⋮---- row cou… Evidence: `reproduce/Step_1_QA.py`
- **database packages** (source_file): accelerate aiofiles aiohttp configparser graspologic json repair httpx Evidence: `requirements.txt`
- **Setup** (source_file): def read long description def read requirements ⋮---- deps = ⋮---- deps = line.strip for line in f if line.strip ⋮---- def read api requirements ⋮---- api deps = ⋮---- api deps = line.strip for line in f if line.strip ⋮---- long description = read long description requirements = read requirements Evidence: `setup.py`
- **Init** (source_file): api version = "1.0.3" Evidence: `minirag/api/__init__.py`
- **Calculate estimated token count** (source_file): scan progress: Dict = { progress lock = threading.Lock ⋮---- def estimate tokens text: str - int ⋮---- chinese chars = len re.findall r" \u4e00-\u9fff ", text non chinese chars = len re.findall r" ^\u4e00-\u9fff ", text Calculate estimated token count tokens = chinese chars 1.5 + non chinese chars 0.25 ⋮---- class OllamaServerInfos ⋮---- Constants for emulated Ollama model information LIGHTRAG NAME = "minirag" LIGHTRAG TAG = os.getenv "OLLAMA EMULATING MODEL TAG", "latest" LIGHTRAG MODEL = f"{LIGHTRAG NAME}:{LIGHTRAG TAG}" LIGHTRAG SIZE = 7365960935 it's a dummy value LIGHTRAG CREATED AT = "2024-01-15T00:00:00Z" LIGHTRAG DIGEST = "sha256:minirag" KV STORAGE = "JsonKVStorage" DOC STATUS STOR… Evidence: `minirag/api/minirag_server.py`
- **Requirements** (source_file): ascii colors fastapi nest asyncio numpy pipmaster python-dotenv python-multipart tenacity tiktoken torch tqdm uvicorn json repair Evidence: `minirag/api/requirements.txt`
- **Milvus Impl** (source_file): @dataclass class MilvusVectorDBStorge BaseVectorStorage ⋮---- def post init self async def upsert self, data: dict str, dict ⋮---- list data = contents = v "content" for v in data.values batches = async def wrapped task batch ⋮---- result = await self.embedding func batch ⋮---- embedding tasks = wrapped task batch for batch in batches pbar = tqdm async embeddings list = await asyncio.gather embedding tasks embeddings = np.concatenate embeddings list ⋮---- results = self. client.upsert collection name=self.namespace, data=list data ⋮---- async def query self, query, top k=5 ⋮---- embedding = await self.embedding func query results = self. client.search Evidence: `minirag/kg/milvus_impl.py`
- **Convert None to 0 for addition** (source_file): @dataclass class Neo4JStorage BaseGraphStorage ⋮---- @staticmethod def load nx graph file name def init self, namespace, global config, embedding func ⋮---- URI = os.environ "NEO4J URI" USERNAME = os.environ "NEO4J USERNAME" PASSWORD = os.environ "NEO4J PASSWORD" MAX CONNECTION POOL SIZE = os.environ.get "NEO4J MAX CONNECTION POOL SIZE", 800 DATABASE = os.environ.get ⋮---- database name = "home database" if DATABASE is None else f"database {DATABASE}" ⋮---- def post init self async def close self async def aexit self, exc type, exc, tb async def index done callback self async def has node self, node id: str - bool ⋮---- entity name label = node id.strip '"' ⋮---- query = result = await sess… Evidence: `minirag/kg/neo4j_impl.py`
- **print v "ddl"** (source_file): class OracleDB ⋮---- def init self, config, kwargs def numpy converter in self, value ⋮---- """Convert numpy array to array.array""" ⋮---- dtype = "d" ⋮---- dtype = "f" ⋮---- dtype = "b" ⋮---- def input type handler self, cursor, value, arraysize def numpy converter out self, value ⋮---- dtype = np.int8 ⋮---- dtype = np.float32 ⋮---- dtype = np.float64 ⋮---- def output type handler self, cursor, metadata async def check tables self ⋮---- print v "ddl" ⋮---- columns = column 0 .lower for column in cursor.description ⋮---- rows = await cursor.fetchall ⋮---- data = dict zip columns, row for row in rows ⋮---- data = ⋮---- row = await cursor.fetchone ⋮---- data = dict zip columns, row ⋮---- data… Evidence: `minirag/kg/oracle_impl.py`
- **INSERT METHODS** (source_file): class PostgreSQLDB ⋮---- def init self, config, kwargs async def initdb self async def check tables self ⋮---- rows = await connection.fetch sql, params.values ⋮---- rows = await connection.fetch sql ⋮---- columns = col for col in rows 0 .keys data = dict zip columns, row for row in rows ⋮---- data = ⋮---- columns = rows 0 .keys data = dict zip columns, rows 0 ⋮---- data = None ⋮---- @staticmethod async def prerequisite conn: asyncpg.Connection, graph name: str ⋮---- @dataclass class PGKVStorage BaseKVStorage ⋮---- db: PostgreSQLDB = None def post init self async def get by id self, id: str - Union dict, None ⋮---- sql = SQL TEMPLATES "get by id " + self.namespace params = {"workspace": sel… Evidence: `minirag/kg/postgres_impl.py`
- **Filter fields if specified** (source_file): @dataclass class RedisKVStorage BaseKVStorage ⋮---- def post init self ⋮---- redis url = os.environ.get "REDIS URI", "redis://localhost:6379" ⋮---- async def all keys self - list str ⋮---- keys = await self. redis.keys f"{self.namespace}: " ⋮---- async def get by id self, id ⋮---- data = await self. redis.get f"{self.namespace}:{id}" ⋮---- async def get by ids self, ids, fields=None ⋮---- pipe = self. redis.pipeline ⋮---- results = await pipe.execute ⋮---- Filter fields if specified ⋮---- async def filter keys self, data: list str - set str ⋮---- existing ids = {data i for i, exists in enumerate results if exists} ⋮---- async def upsert self, data: dict str, dict async def drop self Evidence: `minirag/kg/redis_impl.py`
- **Communication** (documentation): We provide QR codes for joining the HKUDS discussion groups on WeChat and Feishu. Evidence: `Communication.md`
- **MiniRAG: 極めてシンプルな検索強化生成に向けて** (documentation): ! MiniRAG https://files.mdnice.com/user/87760/ff711e74-c382-4432-bec2-e6f2aa787df1.jpg Evidence: `README_JA.md`
- **Query Set** (structured_config): { "0": { "question": "Did Adam Smith send a message to Li Hua about the upcoming building maintenance schedule before the administrators announced a temporary change in the construction schedule due to weather conditions?", "answer": "Yes", "evidence": "20260121 10:00 20260701 10:00", "type": "Multi" }, "1": { "question": "Did Wolfgang ask Li Hua about watching \"Star Wars: A New Hope\" after he asked Li Hua about going to see \"Overwatch 3\"?", "answer": "Yes", "evidence": "20260121 13:00 20261009 17:00", "type": "Multi" }, "2": { "question": "Did Li Hua agree to go out for dinner after Wolfgang first asked him if he wanted to go out for dinner?", "answer": "Yes", "evidence": "20260123 17:… Evidence: `dataset/LiHua-World/qa/query_set.json`
- **Remove config.ini from repo** (source_file): pycache .egg-info dickens/ book.txt lightrag-dev/ .idea/ dist/ env/ local neo4jWorkDir/ neo4jWorkDir/ ignore this.txt .venv/ .ignore. .ruff cache/ gui/ .log .vscode inputs rag storage .env venv/ examples/input/ examples/output/ .DS Store Remove config.ini from repo .ini build/ minirag-venv/ Evidence: `.gitignore`
- **.Pre Commit Config** (source_file): repos: - repo: https://github.com/pre-commit/pre-commit-hooks rev: v5.0.0 hooks: - id: trailing-whitespace - id: end-of-file-fixer - id: requirements-txt-fixer - repo: https://github.com/astral-sh/ruff-pre-commit rev: v0.6.4 hooks: - id: ruff-format - id: ruff args: --fix, --ignore=E402 - repo: https://github.com/mgedmin/check-manifest rev: "0.49" hooks: - id: check-manifest stages: manual Evidence: `.pre-commit-config.yaml`
- **Build stage** (source_file): Build stage FROM python:3.11-slim as builder Evidence: `Dockerfile`
- **Manifest** (source_file): include README.md include requirements.txt include minirag/api/requirements.txt" MANIFEST.in Evidence: `MANIFEST.in`
- **Docker Compose** (source_file): version: '3.8' services: lightrag: build: . ports: - "${PORT:-9721}:9721" volumes: - ./data/rag storage:/app/data/rag storage - ./data/inputs:/app/data/inputs env file: - .env environment: - TZ=UTC restart: unless-stopped networks: - lightrag net extra hosts: - "host.docker.internal:host-gateway" networks: lightrag net: driver: bridge Evidence: `docker-compose.yml`
- **Graph With Html** (source_file): G = nx.read graphml "./LiHua-World/graph chunk entity relation.graphml" net = Network height="100vh", notebook=True Evidence: `graph-visuals/graph_with_html.py`
- **Graph With Neo4J** (source_file): WORKING DIR = "./LiHua-World" BATCH SIZE NODES = 500 BATCH SIZE EDGES = 100 NEO4J URI = "bolt://localhost:7687" NEO4J USERNAME = "neo4j" NEO4J PASSWORD = "your password" def convert xml to json xml path, output path ⋮---- json data = xml to json xml path ⋮---- def process in batches tx, query, data, batch size ⋮---- batch = data i : i + batch size ⋮---- def main ⋮---- xml file = os.path.join WORKING DIR, "graph chunk entity relation.graphml" json file = os.path.join WORKING DIR, "graph data.json" json data = convert xml to json xml file, json file ⋮---- nodes = json data.get "nodes", edges = json data.get "edges", create nodes query = """ create edges query = """ set displayname and labels… Evidence: `graph-visuals/graph_with_neo4j.py`
- **Pyproject** (source_file): build-system requires = "setuptools =45", "wheel" build-backend = "setuptools.build meta" Evidence: `pyproject.toml`
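
The flattened excerpt from `minirag/api/minirag_server.py` in the index above describes a token estimate based on character counts. The sketch below is a reconstruction under stated assumptions, not the file's verified source: the extraction dropped brackets and operators, so the character classes `[...]`, the multiplications, and the final `int(...)` conversion are all assumed.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate: CJK ideographs weigh 1.5, everything else 0.25."""
    # Count characters in the CJK Unified Ideographs range U+4E00..U+9FFF.
    chinese_chars = len(re.findall(r"[\u4e00-\u9fff]", text))
    # Count every other character (the newline is included).
    non_chinese_chars = len(re.findall(r"[^\u4e00-\u9fff]", text))
    # Weights 1.5 and 0.25 appear in the excerpt; multiplication is assumed.
    return int(chinese_chars * 1.5 + non_chinese_chars * 0.25)
```

Under these assumptions a four-letter ASCII word counts as one token and two ideographs count as three. The real behavior should be confirmed against the source file at the inspected commit.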

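## Rules the Host AI Must Follow

- **Treat this asset as pre-work context, not a runtime environment.** The AI Context Pack contains only evidence-based understanding of the project, not the target project's executable state. Evidence: `README.md`, `dataset/LiHua-World/README.md`, `minirag/api/README.md`
- **When answering the user, separate what can be previewed from what can only be verified after installation.** The value of a pre-installation experience lies in reducing mistaken installs and misjudgments, not in posing as a real run. Evidence: `README.md`, `dataset/LiHua-World/README.md`, `minirag/api/README.md`

## Questions the User Should Answer Before Starting

- Which host AI or local environment do you plan to use it in?
- Do you only want to try the workflow first, or are you preparing a real installation?
- What matters most to you: installation cost, output quality, or conflicts with your existing rules?

## Acceptance Criteria

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present the preview as a real run.
- The user can understand within 3 minutes who it suits, what it can do, how to start, and where the risk boundaries are.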
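
The first criterion lends itself to a mechanical check. The sketch below assumes claims are available as dicts with a `claim_id` and an `evidence` list of paths; those field names and the function `untraceable_claims` are hypothetical, chosen only to illustrate the check.

```python
def untraceable_claims(claims: list[dict], evidence_paths: list[str]) -> list[str]:
    """Return the ids of claims whose evidence is empty or not in the index."""
    known = set(evidence_paths)
    failing = []
    for claim in claims:
        paths = claim.get("evidence") or []
        # A claim passes only if it cites at least one path and all are indexed.
        if not paths or not set(paths) <= known:
            failing.append(claim["claim_id"])
    return failing
```

An empty result means every claim points at an indexed file; it says nothing about whether the file actually supports the claim.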

---

## Doramagic Context Augmentation

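The content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual provides only a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.

## Human Manual Skeleton

Usage rule: this is only a reading route through the project and a set of salience signals, not an authority on facts. Specific facts must still go back to repo evidence or the Claim Graph.

Hard rules for the host AI:
- Do not treat page titles, chapter order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route and salience signal.
- Judgments about capability, installation, compatibility, runtime state, and risk must cite repo evidence, a source path, or the Claim Graph.

- **Project overview, installation, and quick start**: importance `high`
  - source_paths: README.md, README_CN.md, setup.py, requirements.txt, pyproject.toml
- **Core architecture, heterogeneous graph indexing, and retrieval algorithm**: importance `high`
  - source_paths: minirag/minirag.py, minirag/operate.py, minirag/base.py, minirag/prompt.py, minirag/utils.py
- **Heterogeneous storage backends and LLM/embedding bindings**: importance `high`
  - source_paths: minirag/kg/__init__.py, minirag/kg/neo4j_impl.py, minirag/kg/oracle_impl.py, minirag/kg/postgres_impl.py, minirag/kg/redis_impl.py
- **API service, Docker deployment, and common community questions**: importance `high`
  - source_paths: minirag/api/minirag_server.py, minirag/api/README.md, minirag/api/__init__.py, minirag/api/requirements.txt, minirag/api/.env.aoi.example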

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `e204d239421f45004852953679927fdf6733f236`
- inspected_files: `Dockerfile`, `README.md`, `docker-compose.yml`, `pyproject.toml`, `requirements.txt`

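Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not state judgments based on README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has been run successfully.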
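
These three rules amount to a gate on what the host AI may say. A minimal sketch follows, assuming the flags arrive as a dict; the flag names are the ones used above, while the function name and the output keys are hypothetical. A missing flag is treated the same as `false`.

```python
def allowed_statements(flags: dict) -> dict[str, bool]:
    """Map verification flags to the statements they unlock; missing means False."""
    return {
        "may_claim_source_was_read": flags.get("repo_clone_verified") is True,
        "may_state_file_judgments_as_fact": flags.get("repo_inspection_verified") is True,
        "may_claim_quick_start_ran": flags.get("quick_start_verified") is True,
    }
```

With the values recorded in this section (`repo_clone_verified: true`, `repo_inspection_verified: true`, and no `quick_start_verified` entry), the first two statements are unlocked and the Quick Start claim stays blocked.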

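## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found during Doramagic's discovery, verification, or compilation process. The host AI must treat them as working constraints, not as ordinary explanatory text.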

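### Constraint 1: Source evidence: MiniRAG re-runs entity extraction on existing chunks without cache checks

- Trigger: GitHub community evidence shows an unverified, security/permission-related issue in this project: MiniRAG re-runs entity extraction on existing chunks without cache checks
- Why it matters: May affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/HKUDS/MiniRAG/issues/104 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.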

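### Constraint 2: Source evidence: [bug] In hybrid mode, the context is excessively long and fails to be truncated correctly.

- Trigger: GitHub community evidence shows an unverified, security/permission-related issue in this project: [bug] In hybrid mode, the context is excessively long and fails to be truncated correctly.
- Why it matters: May affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/HKUDS/MiniRAG/issues/108 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.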

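### Constraint 3: Source evidence: New project. Missing one dependency not listed in requirements.txt

- Trigger: GitHub community evidence shows an unverified, installation-related issue in this project: New project. Missing one dependency not listed in requirements.txt
- Why it matters: May raise the cost of trials for new users and of production adoption.
- Evidence: community_evidence:github | https://github.com/HKUDS/MiniRAG/issues/97 | The source discussion mentions Python-related conditions; recheck before installing or trialing.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.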

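### Constraint 4: Capability assessment depends on assumptions

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Turn the assumptions into a downstream verification checklist.
- Why it matters: If an assumption does not hold, the user will not get the promised capability.
- Evidence: capability.assumptions | https://github.com/HKUDS/MiniRAG | README/documentation is current enough for a first validation pass.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.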

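### Constraint 5: Source evidence: 发现了一个bug，导致无法获取type_pool (found a bug that prevents type_pool from being retrieved)

- Trigger: GitHub community evidence shows an unverified, runtime-related issue in this project: 发现了一个bug，导致无法获取type_pool (found a bug that prevents type_pool from being retrieved)
- Host AI rule: The source suggests that a fix, workaround, or version change may already exist; the manual must state the applicable version.
- Why it matters: May raise the cost of trials for new users and of production adoption.
- Evidence: community_evidence:github | https://github.com/HKUDS/MiniRAG/issues/95 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.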

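### Constraint 6: Source evidence: Indicate explicitly minimum or recommended python version

- Trigger: GitHub community evidence shows an unverified, maintenance/version-related issue in this project: Indicate explicitly minimum or recommended python version
- Why it matters: May raise the cost of trials for new users and of production adoption.
- Evidence: community_evidence:github | https://github.com/HKUDS/MiniRAG/issues/102 | The source discussion mentions Python-related conditions; recheck before installing or trialing.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.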
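
Constraints 3 and 6 both point at environment checks worth doing inside the sandbox before trusting an install. The sketch below is one hedged way to do that. The reference version `(3, 11)` is taken from the `python:3.11-slim` base image in the `Dockerfile` evidence entry and is not a documented minimum; `preinstall_report` is a hypothetical helper, and the module names passed to it must be top-level import names that the caller supplies.

```python
import importlib.util
import sys


def preinstall_report(modules: list[str], reference: tuple[int, int] = (3, 11)) -> dict:
    """Report the running Python version and which top-level modules are absent."""
    # find_spec returns None when a top-level module cannot be located.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return {
        "python": tuple(sys.version_info[:3]),
        "matches_reference_minor": tuple(sys.version_info[:2]) == reference,
        "missing_modules": missing,
    }
```

Running it after the trial install with the import names behind `requirements.txt` gives a quick signal on whether a dependency was left out, which is the situation issue 97 describes.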

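### Constraint 7: Maintenance activity unknown

- Trigger: last_activity_observed is not recorded.
- Host AI rule: Add signals from recent GitHub commits, releases, and issue/PR responsiveness.
- Why it matters: New, abandoned, and active projects get lumped together, which lowers confidence in any recommendation.
- Evidence: evidence.maintainer_signals | https://github.com/HKUDS/MiniRAG | last_activity_observed missing
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.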

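### Constraint 8: Downstream validation risk item

- Trigger: no_demo
- Evidence: downstream_validation.risk_items | https://github.com/HKUDS/MiniRAG | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.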

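### Constraint 9: Scoring risk present

- Trigger: no_demo
- Why it matters: The risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | https://github.com/HKUDS/MiniRAG | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.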

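### Constraint 10: Source evidence: [Performance] MiniRAG underperforms NaiveRAG significantly with Phi-3.5-mini（MiniRAG 在 Phi-3.5-mini 模型上的表现显著不如 NaiveRAG）

- Trigger: GitHub community evidence shows an unverified, security/permission-related issue in this project: [Performance] MiniRAG underperforms NaiveRAG significantly with Phi-3.5-mini（MiniRAG 在 Phi-3.5-mini 模型上的表现显著不如 NaiveRAG）
- Why it matters: May affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/HKUDS/MiniRAG/issues/109 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.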