# nano-graphrag - Doramagic AI Context Pack

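> Positioning: a pre-installation preview and assessment asset. It helps the host AI get off to a good start, but it does not mean the target project has been installed, run, or verified.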

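## Sufficiency principle

- **Sufficiency, not compression**: an AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: compress only noise and duplicated content; never compress context that affects judgment or the quality of the work that follows.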

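## How the host AI should use this pack

You are reading the AI Context Pack that Doramagic compiled for nano-graphrag. Treat it as pre-work context: help the user understand who the project suits, what it can do, how to get started, what must be verified after installation, and where the risks are. Do not claim that you have installed, run, or executed the target project.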

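## Claim consumption rules

- **Sources of fact**: Repo Evidence plus the Claim/Evidence Graph. The Human Wiki contributes only salience, terminology, and narrative structure.
- **Minimum status for facts**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: only a low-confidence lead; the user must be asked to verify it further.
- `inferred`: may be used only for risk warnings or open questions, never presented as a project fact.
- `unverified`: must not be used as a fact; state explicitly that the evidence is insufficient.
- `contradicted`: show the conflicting sources; do not pick one version on the user's behalf.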
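The rules above amount to a small lookup from claim status to allowed phrasing. A minimal Python sketch, assuming a hypothetical `Claim` record with `claim_id`, `status`, `text`, and `evidence_paths` fields (this pack does not define a claim schema), might gate how a host AI words each claim:

```python
from dataclasses import dataclass, field

# Hypothetical claim record; field names are illustrative, not a Doramagic schema.
@dataclass
class Claim:
    claim_id: str
    status: str  # supported | weak | inferred | unverified | contradicted
    text: str
    evidence_paths: list[str] = field(default_factory=list)

def render_claim(claim: Claim) -> str:
    """Return how a host AI may phrase a claim under the consumption rules."""
    if claim.status == "supported":
        # Usable as a fact, but only together with the claim_id and evidence path(s).
        if not claim.evidence_paths:
            return f"Insufficient evidence: {claim.claim_id} has no evidence path."
        return f"{claim.text} [{claim.claim_id}; {', '.join(claim.evidence_paths)}]"
    if claim.status == "weak":
        return f"Low-confidence lead, please verify: {claim.text} [{claim.claim_id}]"
    if claim.status == "inferred":
        return f"Risk / open question (not a fact): {claim.text} [{claim.claim_id}]"
    if claim.status == "unverified":
        return f"Insufficient evidence to state: {claim.text} [{claim.claim_id}]"
    if claim.status == "contradicted":
        sources = ", ".join(claim.evidence_paths) or "unknown sources"
        return f"Conflicting sources ({sources}); not choosing a version: {claim.text}"
    raise ValueError(f"unknown claim status: {claim.status!r}")
```

Treating a `supported` claim without an evidence path as insufficient evidence is a design choice of this sketch: the rule requires a citation, so a claim that cannot be cited is not used as a fact.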

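## Who it suits best

- **Users who want to understand an open-source project's value and limits before installing it**: the current evidence comes mainly from project documentation. Claim: `clm_0002` unverified 0.25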

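## What it can do

- **Command-line launch or installation flow** (requires post-install verification): the project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `readme.md` Claim: `clm_0001` supported 0.86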

## How to get started

- `pip install -e .` Evidence: `readme.md` Claim: `clm_0003` supported 0.86
- `pip install nano-graphrag` Evidence: `readme.md` Claim: `clm_0004` supported 0.86
- `curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt` Evidence: `readme.md` Claim: `clm_0005` supported 0.86
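After an install in an isolated environment, the README's Quick Start drives the package from Python. A hedged sketch: it assumes the `GraphRAG` / `QueryParam` API shown in the project README and that `OPENAI_API_KEY` is set (the default configuration calls OpenAI); `./dickens` is only an example working directory. It checks that `nano_graphrag` is importable and that a key is present before doing anything, so nothing runs unless both are actually available.

```python
import importlib.util
import os

def nano_graphrag_available() -> bool:
    """True only if the nano_graphrag package can be imported in this interpreter."""
    return importlib.util.find_spec("nano_graphrag") is not None

def quick_start(book_path: str = "./book.txt", working_dir: str = "./dickens") -> None:
    # Imported lazily so this file still loads when the package is absent.
    from nano_graphrag import GraphRAG, QueryParam

    graph_func = GraphRAG(working_dir=working_dir)
    with open(book_path, encoding="utf-8") as f:
        graph_func.insert(f.read())  # builds the graph; calls the LLM and embedding APIs

    question = "What are the top themes in this story?"
    print(graph_func.query(question))  # "global" is the QueryParam default mode
    print(graph_func.query(question, param=QueryParam(mode="local")))

if __name__ == "__main__":
    if not nano_graphrag_available():
        print("nano_graphrag is not installed here; install it in an isolated environment first.")
    elif not os.environ.get("OPENAI_API_KEY"):
        print("OPENAI_API_KEY is not set; use a test key, never a production one.")
    else:
        quick_start()
```

Indexing the sample book consumes API quota, so run this only with a test key.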

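## Before-you-proceed decision card

- **Current recommendation**: requires administrator / security approval
- **Why**: proceeding may involve keys, accounts, external services, or sensitive context, so it should first pass administrator or security review.

### 30-second verdict

- **What to do now**: obtain administrator / security approval
- **Minimal safe next step**: run the Prompt Preview first; if credentials or an enterprise environment are involved, get approval before a trial install.
- **Do not trust yet**: tool permission boundaries cannot be trusted before installation.
- **Proceeding will touch**: command execution, the local environment or project files, environment variables / API keys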

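### What you can trust now

- **Capability exists: command-line launch or installation flow** (supported): you can trust that the project contains signals of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `readme.md` Claim: `clm_0001` supported 0.86
- **Quick Start / installation commands are documented** (supported): you can trust that the project documentation contains a launch or installation entry point; that is not a reason to run it directly in your primary environment. Evidence: `readme.md` Claim: `clm_0003` supported 0.86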

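### What you cannot trust yet

- **Tool permission boundaries cannot be trusted before installation.** (unverified): MCP/tool-style projects often touch files, the network, browsers, or external APIs; permissions and logs must be checked for real.
- **Real output quality cannot be trusted before installation.** (unverified): the Prompt Preview only shows how the project guides the user; it cannot prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **That it will not alter your existing host AI's behavior cannot be taken on trust.** (inferred): Skills, plugins, and AGENTS/CLAUDE/GEMINI instruction files can change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): unless the project explicitly documents uninstall and restore steps, verify in an isolated environment first.
- **Is it compatible with the user's current host AI version after a real install?** (unverified): compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): a pre-install preview only shows the workflow and its boundaries; it is no substitute for a real evaluation.
- **Do the installation commands need network access, elevated permissions, or global writes?** (unverified): this determines installation risk in both enterprise and personal environments. Evidence: `readme.md`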

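### What proceeding will touch

- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: the very first command can change the environment, so decide whether it is worth running before you run it. Evidence: `readme.md`
- **Local environment or project files**: installation output, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback path cannot be proven before installation, so verify in isolation. Evidence: `readme.md`
- **Environment variables / API keys**: the project's entry documentation explicitly mentions configuring API keys, tokens, secrets, or account credentials. Reason: if a real install needs credentials, use test credentials first and pass a permissions/compliance review. Evidence: `readme.md`
- **Host AI context**: the AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context shapes the host AI's later judgments, so unverified items must never be presented as facts.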
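Before a trial install, it can help to list which credential-like environment variables are already exported in the shell you plan to use, so that a production key is not picked up by accident. A minimal sketch, assuming that variable names containing API, TOKEN, KEY, or SECRET (the hints named in this card) are the ones worth flagging; it reports names only, never values:

```python
import os
import re

# Name patterns taken from the hints in this card; extend as needed (assumption).
CREDENTIAL_HINT = re.compile(r"API|TOKEN|KEY|SECRET", re.IGNORECASE)

def credential_like_vars(environ=None) -> list[str]:
    """Return sorted names of env vars that look like credentials (values are not returned)."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if CREDENTIAL_HINT.search(name))

if __name__ == "__main__":
    for name in credential_like_vars():
        print(f"set: {name}")  # print the name only; never echo secret values
```

Name matching is deliberately coarse and can over-report (any name containing "KEY" matches); a false positive costs a second look, while a miss could leak a real key.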

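### Minimal safe next steps

- **Run the Prompt Preview first**: use the interactive pre-install trial to judge whether the way it works fits you; it needs no authorization and changes nothing in your environment. (Applies to: every project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or with a test account**: keep installation commands from polluting your primary host AI, real projects, or home directory. (Applies to: whenever there are signs of command execution, plugin configuration, or local writes.)
- **Do not use real production credentials**: once an environment variable or API key enters the host or toolchain, it can create account and compliance risks. (Applies to: whenever API, TOKEN, KEY, SECRET, or similar environment hints appear.)
- **After installing, verify one minimal task only**: check loading, compatibility, output quality, and rollback before deciding to go deeper. (Applies to: moving from a trial into a real workflow.)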
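One way to do the isolated trial install is a throwaway virtual environment in a temporary directory. A sketch using only the standard-library `venv` module: it builds the pip command for `nano-graphrag` (the package name from the Quick Start) but deliberately does not execute it, so running the sketch changes nothing outside the temporary directory. Skipping the pip bootstrap (`with_pip=False`) is an assumption that keeps the sketch fast; a real trial needs `with_pip=True`.

```python
import os
import tempfile
import venv
from pathlib import Path

def make_trial_env(base_dir: str) -> Path:
    """Create an empty virtual environment under base_dir and return its Python path."""
    env_dir = Path(base_dir) / "nano-graphrag-trial"
    # with_pip=False keeps this fast; use with_pip=True for a real trial install.
    venv.create(env_dir, with_pip=False)
    scripts = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return env_dir / scripts / exe

def install_command(env_python: Path) -> list[str]:
    # Built but not executed: review it, then run it yourself inside the trial env.
    return [str(env_python), "-m", "pip", "install", "nano-graphrag"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        py = make_trial_env(tmp)
        print("trial interpreter:", py, py.exists())
        print("command to review:", " ".join(install_command(py)))
```

Deleting the environment directory is the rollback, which is the point of isolating the trial.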

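### Exit strategy

- **Preserve the pre-install state**: record the original host configuration and project state so you can later tell whether they can be restored.
- **Record installation commands and write paths**: without explicit uninstall instructions, you at least need to know which directories or settings to clean up by hand.
- **Be ready to revoke test API keys or tokens**: if test credentials leak or are misused, you can cut losses quickly.
- **No rollback path, no primary environment**: a missing rollback path blocks proceeding; do not continue on trust or luck.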
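To record write paths when a project ships no uninstall notes, one low-tech approach is to snapshot a directory tree before and after the install and diff the two. A standard-library sketch, assuming modification time plus size is a good-enough change signal (it misses edits that preserve both); which directories to snapshot, such as the trial environment or the home directory, is up to you:

```python
import os
import tempfile
from pathlib import Path

def snapshot(root: str) -> dict[str, tuple[float, int]]:
    """Map each file path under root to (mtime, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            state[path] = (st.st_mtime, st.st_size)
    return state

def diff_snapshots(before: dict, after: dict) -> dict[str, list[str]]:
    """Classify paths as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        before = snapshot(tmp)
        Path(tmp, "installed.txt").write_text("written by a hypothetical install step")
        print(diff_snapshots(before, snapshot(tmp)))
```

Saving the "added" list next to the install command gives you the manual cleanup list that the exit strategy asks for.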

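## Preview only

- Explain who the project suits and what it can do
- Walk through typical conversation flows based on the project documentation
- Help the user decide whether it is worth installing or researching further

## Must be verified after installation

- Actually installing the Skill, plugin, or CLI
- Running scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility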

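## Boundary and risk card

- **Mistaking the pre-install preview for a real run**: users may overestimate how much configuration, permission, and compatibility verification has already been done. Mitigation: clearly separate prompt_preview_can_do from runtime_required. Claim: `clm_0006` inferred 0.45
- **Command execution modifies the local environment**: installation commands may write to the user's home directory, the host plugin directory, or project configuration. Mitigation: run them first in an isolated environment or with a test account. Evidence: `readme.md` Claim: `clm_0007` supported 0.86
- **Open question**: is it compatible with the user's current host AI version after a real install? Reason: compatibility can only be verified in the actual host environment.
- **Open question**: does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows the workflow and its boundaries; it is no substitute for a real evaluation.
- **Open question**: do the installation commands need network access, elevated permissions, or global writes? Reason: this determines installation risk in both enterprise and personal environments.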

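## Pre-work context

### Loading order

- First read how_to_use.host_ai_instruction to establish the boundaries of a pre-install assessment asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a concrete task needs doing, check role_skill_index first, then evidence_index.
- For real installation, file modification, network access, performance, or compatibility questions, switch to risk_card and boundaries.runtime_required.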

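### Task routing

- **Command-line launch or installation flow**: first explain that this capability can only be verified after installation, then give a pre-install checklist. Boundary: must be verified by a real install or run. Evidence: `readme.md` Claim: `clm_0001` supported 0.86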

### Context size

- Total files: 53
- Key-file coverage: 40/53
- Evidence index entries: 51
- Role / Skill entries: 8

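### When evidence is insufficient

- **missing_evidence**: say the evidence is insufficient and ask the user for the target file, the relevant README passage, or a post-install verification record; do not fill in facts.
- **out_of_scope_request**: say the task is outside the evidence scope of this AI Context Pack and suggest that the user consult the Human Manual first or verify after a real install.
- **runtime_request**: give a pre-install checklist and the source of each command, but do not run commands for the user or claim to have run them.
- **source_conflict**: show the conflicting sources side by side and mark them as pending verification; do not pick one version.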

## Prompt Recipes

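### Fit assessment

- Goal: decide whether this project fits the user's current task.
- Expected output: a fit verdict, key reasons, evidence citations, what can be previewed before installation, what must be verified after installation, and recommended next steps.

```text
Based on the AI Context Pack for nano-graphrag, first ask me 3 essential questions, then decide whether it fits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or claim_id.
```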

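### Pre-install experience

- Goal: let the user get a feel for the core workflow before installing, without presenting the preview as a real capability or a marketing promise.
- Expected output: an experience script with boundary labels, a post-install verification checklist, and a cautious recommendation; no promises of real execution and no hard-sell language.

```text
Treat nano-graphrag as a pre-install experience asset, not as an installed tool or a real runtime environment.

Output exactly four sections:
1. First ask me 3 essential questions.
2. Give an "experience script": use the three labels [Previewable before install], [Must verify after install], and [Insufficient evidence] to show how it might guide the workflow.
3. Give a post-install verification checklist: list which capabilities can only be confirmed after a real install, real host loading, and a real project run.
4. Give a cautious recommendation: say only "worth further research / a trial install", "provide more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not use promissory phrases such as "automatically adapts", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing how it would work after installation, use a conditional such as "If installation succeeds and the host loads the Skill correctly, it might...".
- Write the experience script only as "sample lines / hypothetical flow": use "might ask / might suggest / might show", never "has written, has generated, has passed, is running, is generating".
- The Prompt Preview does not provide installation commands; if the user is ready for a trial install, only advise reading the Quick Start and Risk Card first and verifying in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may appear only as risks or open questions.

```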

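### Role / Skill selection

- Goal: pick the best-matching roles or Skills from the project.
- Expected output: a list of candidate roles or Skills, each with applicable scenarios, evidence paths, risk boundaries, and whether post-install verification is needed.

```text
Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, state the applicable scenario, likely output, risk boundaries, and evidence_refs.
```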

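### Risk pre-check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.

```text
Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not run any commands for me; only explain what I should check, why, and what the impact of a failure would be.
```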

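### Host AI kickoff instruction

- Goal: turn the project context into an instruction for the host AI before a conversation begins.
- Expected output: a kickoff instruction with clear boundaries and explicit evidence citations, ready to paste into the host AI.

```text
Based on the AI Context Pack for nano-graphrag, generate a kickoff instruction I can paste into my host AI. It must respect not_runtime=true and must not claim that the project has been installed, run, or produced real results.
```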

## Role / Skill index

- 8 role / Skill / project-doc entries indexed in total.

- **Contributing to nano-graphrag** (project_doc): Submit your contribution through a PR. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/CONTRIBUTING.md`
- **Install** (project_doc): A simple, easy-to-hack GraphRAG implementation (README badge: Python >=3.9.11). Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `readme.md`
- **Leiden.EmptyNetworkError: EmptyNetworkError** (project_doc): FAQ entry on `Leiden.EmptyNetworkError:EmptyNetworkError`. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/FAQ.md`
- **Next Version** (project_doc): Add DSPy for prompt tuning so that small models (Qwen2 7B, Llama 3.1 8B...) can extract entities (@NumberChiffre @gusye1234). Optimize the algorithm: add a global-local query method that rewrites the query globally, then performs a local search. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/ROADMAP.md`
- **Chain Of Thought Prompting with DSPy-AI v2.4.16** (project_doc): Main takeaways: time difference 156.99 seconds; execution time with DSPy-AI 304.38 seconds; execution time without DSPy-AI 147.39 seconds; entities extracted: 22 without DSPy-AI vs 37 with DSPy-AI; relationships extracted: 21 without DSPy-AI vs 36 with DSPy-AI. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/benchmark-dspy-entity-extraction.md`
- **Index Benchmark** (project_doc): We use *A Christmas Carol* by Dickens (https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt) as the benchmark corpus. We use commit 61b5eea34783c58074b3c53f1689ad8a5ba6b6ee of the official GraphRAG implementation (https://github.com/microsoft/graphrag/tree/main). Both GraphRAG and nano-graphrag use OpenAI Embedding and `gpt-4o`. No cache for either. Same device and network connection.… Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/benchmark-en.md`
- **Index Benchmark** (project_doc): We use 三国演义 (*Romance of the Three Kingdoms*, https://github.com/tennessine/corpus/blob/master/%E4%B8%89%E5%9B%BD%E6%BC%94%E4%B9%89.txt) by 罗贯中 (Luo Guanzhong) as the benchmark corpus. We use commit 61b5eea34783c58074b3c53f1689ad8a5ba6b6ee of the official GraphRAG implementation (https://github.com/microsoft/graphrag/tree/main). Both GraphRAG and nano-graphrag use OpenAI Embedding and `gpt-4o`. No cache for either. Same device and network connection. Grapg… Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/benchmark-zh.md`
- **Use Neo4J For Graphrag** (project_doc): 1. Install Neo4j (https://neo4j.com/docs/operations-manual/current/installation/); please use a 5.x version. 2. Install the Neo4j GDS (Graph Data Science) plugin (https://neo4j.com/docs/graph-data-science/current/installation/neo4j-server/). 3. Start the Neo4j server. 4. Get the `NEO4J_URL`, `NEO4J_USER`, and `NEO4J_PASSWORD`. By default, `NEO4J_URL` is `neo4j://localhost:7687`, `NEO4J_USER` is `neo4j`, and `NEO4J_PASSWORD` is `neo4j`. Activation hint: consult when the user needs to understand the project's structure, installation, or boundaries. Evidence: `docs/use_neo4j_for_graphrag.md`
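The Neo4j doc lists three connection settings and their defaults. A sketch of resolving them from the environment, assuming environment variables named `NEO4J_URL`, `NEO4J_USER`, and `NEO4J_PASSWORD` (how nano-graphrag itself consumes these settings is not shown in this pack; check `docs/use_neo4j_for_graphrag.md`). It also flags the default `neo4j` password, which should be changed before anything beyond a local trial:

```python
import os

# Defaults as listed in docs/use_neo4j_for_graphrag.md.
NEO4J_DEFAULTS = {
    "NEO4J_URL": "neo4j://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "neo4j",
}

def neo4j_settings(environ=None) -> dict[str, str]:
    """Resolve Neo4j connection settings from the environment, falling back to the doc defaults."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in NEO4J_DEFAULTS.items()}

def uses_default_password(settings: dict[str, str]) -> bool:
    """True if the password is still the documented default."""
    return settings["NEO4J_PASSWORD"] == NEO4J_DEFAULTS["NEO4J_PASSWORD"]

if __name__ == "__main__":
    settings = neo4j_settings()
    print("Neo4j URL:", settings["NEO4J_URL"], "user:", settings["NEO4J_USER"])
    if uses_default_password(settings):
        print("Warning: NEO4J_PASSWORD is still the default; change it before real use.")
```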
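The DSPy entity-extraction benchmark figures in this index can be cross-checked with simple arithmetic. The reported time difference is consistent with the two execution times, and the ratios below are derived only from the reported numbers, not from new measurements:

```python
# Figures as reported in docs/benchmark-dspy-entity-extraction.md.
with_dspy_s, without_dspy_s = 304.38, 147.39
entities = {"without": 22, "with": 37}
relationships = {"without": 21, "with": 36}

time_diff = round(with_dspy_s - without_dspy_s, 2)           # 156.99 s, matches the reported difference
slowdown = with_dspy_s / without_dspy_s                      # about 2.07x longer with DSPy-AI
entity_gain = entities["with"] / entities["without"]         # about 1.68x more entities
rel_gain = relationships["with"] / relationships["without"]  # about 1.71x more relationships

print(f"time difference: {time_diff} s, slowdown: {slowdown:.2f}x")
print(f"entity gain: {entity_gain:.2f}x, relationship gain: {rel_gain:.2f}x")
```

In other words, the reported run roughly doubled extraction time in exchange for about 70% more entities and relationships on that single benchmark.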

## Evidence index

- 51 evidence entries indexed in total.

- **Contributing to nano-graphrag** (documentation): Submit your contribution through a PR. Evidence: `docs/CONTRIBUTING.md`
- **Install** (documentation): A simple, easy-to-hack GraphRAG implementation (README badge: Python >=3.9.11). Evidence: `readme.md`
- **License** (source_file): Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: (excerpt) Evidence: `LICENSE`
- **Leiden.EmptyNetworkError: EmptyNetworkError** (documentation): FAQ entry on `Leiden.EmptyNetworkError:EmptyNetworkError`. Evidence: `docs/FAQ.md`
- **Next Version** (documentation): Add DSPy for prompt tuning so that small models (Qwen2 7B, Llama 3.1 8B...) can extract entities (@NumberChiffre @gusye1234). Optimize the algorithm: add a global-local query method that rewrites the query globally, then performs a local search. Evidence: `docs/ROADMAP.md`
- **Chain Of Thought Prompting with DSPy-AI v2.4.16** (documentation): Main takeaways: time difference 156.99 seconds; execution time with DSPy-AI 304.38 seconds; execution time without DSPy-AI 147.39 seconds; entities extracted: 22 without DSPy-AI vs 37 with DSPy-AI; relationships extracted: 21 without DSPy-AI vs 36 with DSPy-AI. Evidence: `docs/benchmark-dspy-entity-extraction.md`
- **Index Benchmark** (documentation): We use *A Christmas Carol* by Dickens (https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt) as the benchmark corpus. We use commit 61b5eea34783c58074b3c53f1689ad8a5ba6b6ee of the official GraphRAG implementation (https://github.com/microsoft/graphrag/tree/main). Both GraphRAG and nano-graphrag use OpenAI Embedding and `gpt-4o`. No cache for either. Same device and network connection. GraphRAG max async API requests: 25; nano-graphrag max async API requests: 16. Evidence: `docs/benchmark-en.md`
- **Index Benchmark** (documentation): We use 三国演义 (*Romance of the Three Kingdoms*, https://github.com/tennessine/corpus/blob/master/%E4%B8%89%E5%9B%BD%E6%BC%94%E4%B9%89.txt) by 罗贯中 (Luo Guanzhong) as the benchmark corpus. We use commit 61b5eea34783c58074b3c53f1689ad8a5ba6b6ee of the official GraphRAG implementation (https://github.com/microsoft/graphrag/tree/main). Both GraphRAG and nano-graphrag use OpenAI Embedding and `gpt-4o`. No cache for either. Same device and network connection. GraphRAG max async API requests: 25; nano-graphrag max async API requests: 16. Evidence: `docs/benchmark-zh.md`
- **Use Neo4J For Graphrag** (documentation): 1. Install Neo4j (https://neo4j.com/docs/operations-manual/current/installation/); please use a 5.x version. 2. Install the Neo4j GDS (Graph Data Science) plugin (https://neo4j.com/docs/graph-data-science/current/installation/neo4j-server/). 3. Start the Neo4j server. 4. Get the `NEO4J_URL`, `NEO4J_USER`, and `NEO4J_PASSWORD`. By default, `NEO4J_URL` is `neo4j://localhost:7687`, `NEO4J_USER` is `neo4j`, and `NEO4J_PASSWORD` is `neo4j`. Evidence: `docs/use_neo4j_for_graphrag.md`
- **Have to re-enable the standard pragma** (source_file): `[report]`, `exclude_lines =`, `# Have to re-enable the standard pragma`, `pragma: no cover` Evidence: `.coveragerc`
- **.Env.Example** (source_file): `API_KEY_EMB=" "`, `AZURE_ENDPOINT_EMB=" "`, `API_VERSION_EMB=" "` Evidence: `.env.example.azure`
- **Created by https://www.toptal.com/developers/gitignore/api/python** (source_file): `# Created by https://www.toptal.com/developers/gitignore/api/python`, `# Edit at https://www.toptal.com/developers/gitignore?templates=python`, `test_cache.json`, `run_test*.py`, `nano_graphrag_cache*/`, `*.txt`, `examples/benchmarks/fixtures/`, `tests/original_workflow.txt`, `### Python ###`, `# Byte-compiled / optimized / DLL files`, `__pycache__/`, `*.py[cod]`, `*$py.class`, `.vscode`, `.DS_Store`, `# C extensions`, `*.so` Evidence: `.gitignore`
- **Manifest** (source_file): `include readme.md` Evidence: `MANIFEST.in`
- **Finetune Entity Relationship Dspy** (source_file): Jupyter notebook whose first Markdown cell reads: Evaluating Entity Relationship Extraction with DSPy & Fine-Tune Prompt Instructions. Steps: load DSPy examples, separated into train, val, and dev sets, that are saved locally; evaluate the extraction module on the dev examples to determine the baseline scores, i.e. the fine-tuned extraction module should score higher; run bootstrapping with random search on the train examples, then evaluate its compiled extraction module on the same dev examples to compare against the baseline scores; run MIPROv2 with the train and dev examples, then evaluate its compiled extraction module on the same dev examples to compare against… Evidence: `examples/finetune_entity_relationship_dspy.ipynb`
- **Generate Entity Relationship Dspy** (source_file): Jupyter notebook. Markdown cell: Generate Examples for Entity Relationship Extraction; take datasets from Hugging Face containing news articles and generate entities and relationships from each news article; save them locally as DSPy examples to be used for fine-tuning prompt instructions. Code cell: `import nest_asyncio`, `nest_asyncio.apply()`. Next code cell (execution count 2) stderr output: `/opt/homebrew/Caskroom/miniconda/base/envs/nano-graphrag/lib/python3.10/site-packages/…` Evidence: `examples/generate_entity_relationship_dspy.ipynb`
- **main function** (source_file): `def graphml_to_json(graphml_file)` ⋮ `G = nx.read_graphml(graphml_file)`, `data = nx.node_link_data(G)` ⋮ `def create_html(html_path)` ⋮ `html_content = '''` ⋮ `def create_json(json_data, json_path)` ⋮ `json_data = "var graphJson = " + json_data.replace('\\"', '').replace("'", "\\'").replace("\n", "")` ⋮ `def start_server(port)` ⋮ `handler = http.server.SimpleHTTPRequestHandler` ⋮ `# main function`, `def visualize_graphml(graphml_file, html_path, port=8000)` ⋮ `json_data = graphml_to_json(graphml_file)`, `html_dir = os.path.dirname(html_path)` ⋮ `json_path = os.path.join(html_dir, 'graph_json.js')` ⋮ `# start server in background`, `server_thread = threading.Thread(target=start_server(port))` ⋮… Evidence: `examples/graphml_visualize.py`
- **No Openai Key At All** (source_file): `WORKING_DIR = "./nano_graphrag_cache_ollama_TEST"`, `MODEL = "qwen2"`, `EMBED_MODEL = SentenceTransformer(…)` ⋮ `async def local_embedding(texts: list[str]) -> np.ndarray` ⋮ `ollama_client = ollama.AsyncClient()`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(MODEL, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await ollama_client.chat(model=MODEL, messages=messages, **kwargs)`, `result = response["message"]["content"]` ⋮ `def remove_if_exist(file)`, `def query()` ⋮ `rag = GraphRAG(…)` ⋮ `def insert()` ⋮ `FAKE_TEXT = f.read()` ⋮ `start = time()` Evidence: `examples/no_openai_key_at_all.py`
- **Using Amazon Bedrock** (source_file): `graph_func = GraphRAG(…)` ⋮ `prompt = "What are the top themes in this story?"` Evidence: `examples/using_amazon_bedrock.py`
- **Using Custom Chunking Method** (source_file): `results = []` ⋮ `chunk_token = []`, `lengths = []` ⋮ `chunk_token = tiktoken_model.decode_batch(chunk_token)` ⋮ `WORKING_DIR = "./nano_graphrag_cache_local_embedding_TEST"`, `rag = GraphRAG(…)` Evidence: `examples/using_custom_chunking_method.py`
- **Using Deepseek Api As Llm+Glm Api As Embedding** (source_file): `GLM_API_KEY = "XXXX"`, `DEEPSEEK_API_KEY = "sk-XXXX"`, `MODEL = "deepseek-chat"` ⋮ `openai_async_client = AsyncOpenAI(…)`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(MODEL, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await openai_async_client.chat.completions.create(…)` ⋮ `def remove_if_exist(file)` ⋮ `@dataclass class EmbeddingFunc` ⋮ `embedding_dim: int`, `max_token_size: int`, `func: callable`, `async def __call__(self, *args, **kwargs) -> np.ndarray`, `def wrap_embedding_func_with_attrs(**kwargs)` ⋮ `def final_decro(func) -> EmbeddingFunc` ⋮ `new_func = EmbeddingFunc(**kwargs, func=func)` ⋮ `@wrap_embedding_func…` Evidence: `examples/using_deepseek_api_as_llm+glm_api_as_embedding.py`
- **Using Deepseek As Llm** (source_file): `DEEPSEEK_API_KEY = "sk-XXXX"`, `MODEL = "deepseek-chat"` ⋮ `openai_async_client = AsyncOpenAI(…)`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(MODEL, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await openai_async_client.chat.completions.create(…)` ⋮ `def remove_if_exist(file)`, `WORKING_DIR = "./nano_graphrag_cache_deepseek_TEST"`, `def query()` ⋮ `rag = GraphRAG(…)` ⋮ `def insert()` ⋮ `FAKE_TEXT = f.read()` ⋮ `start = time()` Evidence: `examples/using_deepseek_as_llm.py`
- **Using Dspy Entity Extraction** (source_file): `WORKING_DIR = "./nano_graphrag_cache_using_dspy_entity_extraction"` ⋮ `EMBED_MODEL = SentenceTransformer(…)` ⋮ `async def local_embedding(texts: list[str]) -> np.ndarray` ⋮ `openai_async_client = AsyncOpenAI(…)`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(model, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await openai_async_client.chat.completions.create(…)` ⋮ `def remove_if_exist(file)`, `def insert()` ⋮ `FAKE_TEXT = f.read()` ⋮ `rag = GraphRAG(…)`, `start = time()` ⋮ `def query()` ⋮ `lm = dspy.LM(…)` Evidence: `examples/using_dspy_entity_extraction.py`
- **Using Faiss As Vectordb** (source_file): `WORKING_DIR = "./nano_graphrag_cache_faiss_TEST"` ⋮ `@dataclass class FAISSStorage(BaseVectorStorage)` ⋮ `def __post_init__(self)`, `async def upsert(self, data: dict[str, dict])` ⋮ `contents = [v["content"] for v in data.values()]`, `batches = …`, `embeddings_list = await asyncio.gather(…)`, `embeddings = np.concatenate(embeddings_list)`, `ids = []` ⋮ `id = xxhash.xxh32_intdigest(k.encode())`, `metadata = {k1: v1 for k1, v1 in v.items() if k1 in self.meta_fields}` ⋮ `ids = np.array(ids, dtype=np.int64)` ⋮ `async def query(self, query, top_k=5)` ⋮ `embedding = await self.embedding_func([query])` ⋮ `results = []` ⋮ `metadata = self._metadata[id]` ⋮ `async def index_done_callback(self)` ⋮ `graph_func = GraphRAG(…)` Evidence: `examples/using_faiss_as_vextorDB.py`
- **Using Hnsw As Vectordb** (source_file): `WORKING_DIR = "./nano_graphrag_cache_using_hnsw_as_vectorDB"` ⋮ `EMBED_MODEL = SentenceTransformer(…)` ⋮ `async def local_embedding(texts: list[str]) -> np.ndarray` ⋮ `openai_async_client = AsyncOpenAI(…)`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(model, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await openai_async_client.chat.completions.create(…)` ⋮ `def remove_if_exist(file)`, `def insert()` ⋮ `FAKE_TEXT = f.read()` ⋮ `rag = GraphRAG(…)`, `start = time()` ⋮ `def query()` Evidence: `examples/using_hnsw_as_vectorDB.py`
- **Using Llm Api As Llm+Ollama Embedding** (source_file): `LLM_BASE_URL = "https://your.api.url"`, `LLM_API_KEY = "your api key"`, `MODEL = "your model name"`, `EMBEDDING_MODEL = "nomic-embed-text"`, `EMBEDDING_MODEL_DIM = 768`, `EMBEDDING_MODEL_MAX_TOKENS = 8192` ⋮ `openai_async_client = AsyncOpenAI(…)`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(MODEL, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await openai_async_client.chat.completions.create(…)` ⋮ `def remove_if_exist(file)`, `WORKING_DIR = "./nano_graphrag_cache_llm_TEST"`, `def query()` ⋮ `rag = GraphRAG(…)` ⋮ `def insert()` ⋮ `FAKE_TEXT = f.read()` ⋮ `start = time()` ⋮ `async def ollama_embedding(texts :list[str]) -…` Evidence: `examples/using_llm_api_as_llm+ollama_embedding.py`
- **Using Local Embedding Model** (source_file): `WORKING_DIR = "./nano_graphrag_cache_local_embedding_TEST"`, `EMBED_MODEL = SentenceTransformer(…)` ⋮ `async def local_embedding(texts: list[str]) -> np.ndarray`, `rag = GraphRAG(…)` ⋮ `FAKE_TEXT = f.read()` Evidence: `examples/using_local_embedding_model.py`
- **Using Milvus As Vectordb** (source_file): `@dataclass class MilvusLiteStorge(BaseVectorStorage)` ⋮ `@staticmethod def create_collection_if_not_exist(client, collection_name: str, **kwargs)`, `def __post_init__(self)`, `async def upsert(self, data: dict[str, dict])` ⋮ `list_data = …`, `contents = [v["content"] for v in data.values()]`, `batches = …`, `embeddings_list = await asyncio.gather(…)`, `embeddings = np.concatenate(embeddings_list)` ⋮ `results = self._client.upsert(collection_name=self.namespace, data=list_data)` ⋮ `async def query(self, query, top_k=5)` ⋮ `embedding = await self.embedding_func([query])`, `results = self._client.search(…)` ⋮ `def insert()` ⋮ `data = ["YOUR TEXT DATA HERE", "YOUR TEXT DATA HERE"]`, `rag = GraphRAG(…)` ⋮ `def query()` Evidence: `examples/using_milvus_as_vectorDB.py`
- **Using Ollama As Llm** (source_file): `MODEL = "qwen2"` ⋮ `ollama_client = ollama.AsyncClient()`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(MODEL, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await ollama_client.chat(model=MODEL, messages=messages, **kwargs)`, `result = response["message"]["content"]` ⋮ `def remove_if_exist(file)`, `WORKING_DIR = "./nano_graphrag_cache_ollama_TEST"`, `def query()` ⋮ `rag = GraphRAG(…)` ⋮ `def insert()` ⋮ `FAKE_TEXT = f.read()` ⋮ `start = time()` Evidence: `examples/using_ollama_as_llm.py`
- **Using Ollama As Llm And Embedding** (source_file): `MODEL = "your model name"`, `EMBEDDING_MODEL = "nomic-embed-text"`, `EMBEDDING_MODEL_DIM = 768`, `EMBEDDING_MODEL_MAX_TOKENS = 8192` ⋮ `ollama_client = ollama.AsyncClient()`, `messages = []` ⋮ `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)` ⋮ `args_hash = compute_args_hash(MODEL, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await ollama_client.chat(model=MODEL, messages=messages, **kwargs)`, `result = response["message"]["content"]` ⋮ `def remove_if_exist(file)`, `WORKING_DIR = "./nano_graphrag_cache_ollama_TEST"`, `def query()` ⋮ `rag = GraphRAG(…)` ⋮ `def insert()` ⋮ `FAKE_TEXT = f.read()` ⋮ `start = time()` ⋮ `async def ollama_embedding(texts: list[str]) -> np.ndarray…` Evidence: `examples/using_ollama_as_llm_and_embedding.py`
- **Using Qdrant As Vectordb** (source_file): `@dataclass class QdrantStorage(BaseVectorStorage)` ⋮ `def __post_init__(self)`, `async def upsert(self, data: dict[str, dict])` ⋮ `list_data = …`, `contents = [v["content"] for v in data.values()]`, `batches = …`, `embeddings_list = await asyncio.gather(…)`, `embeddings = np.concatenate(embeddings_list)`, `points = …`, `results = self._client.upsert(collection_name=self.namespace, points=points)` ⋮ `async def query(self, query, top_k=5)` ⋮ `embedding = await self.embedding_func([query])`, `results = self._client.query_points(…)` ⋮ `def insert()` ⋮ `data = ["YOUR TEXT DATA HERE", "YOUR TEXT DATA HERE"]`, `rag = GraphRAG(…)` ⋮ `def query()` Evidence: `examples/using_qdrant_as_vectorDB.py`
- **Init** (source_file): `__version__ = "0.0.8.2"`, `__author__ = "Jianbai Ye"`, `__url__ = "https://github.com/gusye1234/nano-graphrag"` Evidence: `nano_graphrag/__init__.py`
- **Llm** (source_file): `global_openai_async_client = None`, `global_azure_openai_async_client = None`, `global_amazon_bedrock_async_client = None`, `def get_openai_async_client_instance()` ⋮ `global_openai_async_client = AsyncOpenAI()` ⋮ `def get_azure_openai_async_client_instance()` ⋮ `global_azure_openai_async_client = AsyncAzureOpenAI()` ⋮ `def get_amazon_bedrock_async_client_instance()` ⋮ `global_amazon_bedrock_async_client = aioboto3.Session()` ⋮ `openai_async_client = get_openai_async_client_instance()`, `hashing_kv: BaseKVStorage = kwargs.pop("hashing_kv", None)`, `messages = []` ⋮ `args_hash = compute_args_hash(model, messages)`, `if_cache_return = await hashing_kv.get_by_id(args_hash)` ⋮ `response = await openai_async_client.c…` Evidence: `nano_graphrag/_llm.py`
- **add this record as a node in the G** (source_file): `results = []` ⋮ `chunk_token = []`, `lengths = []` ⋮ `chunk_texts = tokenizer_wrapper.decode_batch(chunk_token)` ⋮ `separators = [tokenizer_wrapper.encode(s) for s in PROMPTS["default_text_separator"]]`, `splitter = SeparatorSplitter(…)` ⋮ `chunk_tokens = splitter.split_tokens(tokens)`, `lengths = [len(c) for c in chunk_tokens]`, `decoded_chunks = tokenizer_wrapper.decode_batch(chunk_tokens)` ⋮ `def get_chunks(new_docs, chunk_func=chunking_by_token_size, tokenizer_wrapper: TokenizerWrapper = None, **chunk_func_params)` ⋮ `inserting_chunks = {}`, `new_docs_list = list(new_docs.items())`, `docs = [new_doc[1]["content"] for new_doc in new_docs_list]`, `doc_keys = [new_doc[0] for new_doc in new_docs_list]`, `tokens = tokenizer_wrapper.encod…` Evidence: `nano_graphrag/_op.py`
- **Splitter** (source_file): `class SeparatorSplitter` ⋮ `def split_tokens(self, tokens: List[int]) -> List[List[int]]` ⋮ `splits = self._split_tokens_with_separators(tokens)` ⋮ `def _split_tokens_with_separators(self, tokens: List[int]) -> List[List[int]]` ⋮ `splits = []`, `current_split = []`, `i = 0` ⋮ `separator_found = False` ⋮ `separator_found = True` ⋮ `def _merge_splits(self, splits: List[List[int]]) -> List[List[int]]` ⋮ `merged_splits = []`, `current_chunk = []` ⋮ `current_chunk = split` ⋮ `def _split_chunk(self, chunk: List[int]) -> List[List[int]]` ⋮ `result = []` ⋮ `new_chunk = chunk[i:i + self._chunk_size]` ⋮ `def _enforce_overlap(self, chunks: List[List[int]]) -> List[List[int]]` ⋮ `overlap = chunks[i-1][-self._chunk_overlap:]`, `new_chun…` Evidence: `nano_graphrag/_splitter.py`
- **+++ New +++: add a batch-decode method to improve efficiency while keeping the interface consistent** (source_file): `logger = logging.getLogger("nano-graphrag")` ⋮ `def always_get_an_event_loop() -> asyncio.AbstractEventLoop` ⋮ `loop = asyncio.get_event_loop()` ⋮ `loop = asyncio.new_event_loop()` ⋮ `def extract_first_complete_json(s: str)` ⋮ `stack = []`, `first_json_start = None` ⋮ `first_json_start = i` ⋮ `start = stack.pop()` ⋮ `first_json_str = s[first_json_start:i+1]` ⋮ `def parse_value(value: str)` ⋮ `"""Convert a string value to its appropriate type (int, float, bool, None, or keep as string). Work as a more broad 'eval()'"""`, `value = value.strip()` ⋮ `def extract_values_from_json(json_string, keys=["reasoning", "answer", "data"], allow_no_quotes=False)` ⋮ `extracted_values = {}`, `regex_pattern = r'(?…` Evidence: `nano_graphrag/_utils.py`
- **Base** (source_file): `@dataclass class QueryParam` ⋮ `mode: Literal["local", "global", "naive"] = "global"`, `only_need_context: bool = False`, `response_type: str = "Multiple Paragraphs"`, `level: int = 2`, `top_k: int = 20`, `naive_max_token_for_text_unit = 12000`, `local_max_token_for_text_unit: int = 4000`, `local_max_token_for_local_context: int = 4800`, `local_max_token_for_community_report: int = 3200`, `local_community_single_one: bool = False`, `global_min_community_rating: float = 0`, `global_max_consider_community: float = 512`, `global_max_token_for_community_report: int = 16384`, `global_special_community_map_llm_kwargs: dict = field(…)`, `TextChunkSchema = TypedDict(…)`, `SingleCommunitySchema = TypedDict(…)`, `class CommunitySchema(SingleCommunitySchem…` Evidence: `nano_graphrag/base.py`
- **graph mode** (source_file): `@dataclass class GraphRAG` ⋮ `working_dir: str = field(…)`, `# graph mode`, `enable_local: bool = True`, `enable_naive_rag: bool = False`, `# text chunking`, `tokenizer_type: str = "tiktoken"`, `tiktoken_model_name: str = "gpt-4o"`, `huggingface_model_name: str = "bert-base-uncased"`, `chunk_func: Callable…`, `chunk_token_size: int = 1200`, `chunk_overlap_token_size: int = 100`, `entity_extract_max_gleaning: int = 1`, `entity_summary_to_max_tokens: int = 500`, `graph_cluster_algorithm: str = "leiden"`, `max_graph_cluster_size: int = 10`, `graph_cluster_seed: int = 0xDEADBEEF`, `node_embedding_algorithm: str = "node2vec"`, `node2vec_params: dict = field(…)`, `special_community_report_llm_kwargs: dict = field(…)`, `embedding_func: EmbeddingFunc = field(default…` Evidence: `nano_graphrag/graphrag.py`
- **Prompt** (source_file): `GRAPH_FIELD_SEP = "<SEP>"`, `PROMPTS = {}` Evidence: `nano_graphrag/prompt.py`
- **Setup** (source_file): `long_description = fh.read()`, `vars2find = ["__author__", "__version__", "__url__"]`, `vars2readme = {}` ⋮ `line = line.replace(" ", "").replace('"', "").replace("'", "").strip()` ⋮ `deps = []` Evidence: `setup.py`
- **Dspy Entity**（source_file）：WORKING DIR = "./nano graphrag cache dspy entity" ⋮---- logger = logging.getLogger "nano-graphrag" ⋮---- openai async client = AsyncOpenAI messages = ⋮---- hashing kv: BaseKVStorage = kwargs.pop "hashing kv", None ⋮---- args hash = compute args hash model, messages if cache return = await hashing kv.get by id args hash ⋮---- response = await openai async client.chat.completions.create ⋮---- async def benchmark entity extraction text: str, system prompt: str, use dspy: bool = False ⋮---- working dir = os.path.join WORKING DIR, f"use dspy={use dspy}" ⋮---- start time = time.time graph storage = NetworkXStorage namespace="test", global config={ chunks = {compute mdhash id text, prefix="chunk-"… 证据：`examples/benchmarks/dspy_entity.py`
- **Eval Naive Graphrag On Multi Hop**（source_file）：{ "cells": { "cell type": "markdown", "metadata": {}, "source": "In this tutorial, we are going to evaluate the performance of the naive RAG and the GraphRAG algorithm on a multi-hop RAG task https://github.com/yixuantt/MultiHop-RAG ." }, { "cell type": "markdown", "metadata": {}, "source": " Setup\n", "Make sure you install the necessary dependencies by running the following commands:" }, { "cell type": "code", "execution count": null, "metadata": {}, "outputs": , "source": "!pip install ragas nest asyncio datasets" }, { "cell type": "markdown", "metadata": {}, "source": "Import the necessary libraries, and set up your openai api key if needed:" }, { "cell type": "code", "execution count":… 证据：`examples/benchmarks/eval_naive_graphrag_on_multi_hop.ipynb`
- **Hnsw Vs Nano Vector Storage**（source_file）：WORKING DIR = "./nano graphrag cache benchmark hnsw vs nano vector storage" DATA LEN = 100 000 FAKE DIM = 1024 BATCH SIZE = 100000 ⋮---- @wrap embedding func with attrs embedding dim=FAKE DIM, max token size=8192 async def sample embedding texts: list str - np.ndarray def generate test data async def benchmark storage storage class, name ⋮---- rag = GraphRAG working dir=WORKING DIR, embedding func=sample embedding storage = storage class test data = generate test data ⋮---- start time = time.time ⋮---- batch = {k: test data k for k in list test data.keys i:i+BATCH SIZE } ⋮---- insert time = time.time - start time save start time = time.time ⋮---- save time = time.time - save start time ⋮---… 证据：`examples/benchmarks/hnsw_vs_nano_vector_storage.py`
- **Md5 Vs Xxhash**（source_file）：def xxhash ids data: list str - np.ndarray def md5 ids data: list str - np.ndarray ⋮---- num ids = 1000000 num iterations = 100 xxhash times = md5 times = ⋮---- test data = f"{i} {j}" for j in range num ids start time = time.time xxhash result = xxhash ids test data ⋮---- md5 result = md5 ids test data ⋮---- avg xxhash time = np.mean xxhash times avg md5 time = np.mean md5 times std xxhash time = np.std xxhash times std md5 time = np.std md5 times 证据：`examples/benchmarks/md5_vs_xxhash.py`
- **Drop the projected graph**（source_file）：neo4j lock = asyncio.Lock def make path idable path ⋮---- @dataclass class Neo4jStorage BaseGraphStorage ⋮---- def post init self async def init workspace self async def index start callback self async def has node self, node id: str - bool ⋮---- result = await session.run record = await result.single ⋮---- async def has edge self, source node id: str, target node id: str - bool async def node degree self, node id: str - int ⋮---- results = await self.node degrees batch node id ⋮---- async def node degrees batch self, node ids: List str - List str ⋮---- result dict = {node id: 0 for node id in node ids} ⋮---- async def edge degree self, src id: str, tgt id: str - int ⋮---- results = await s… 证据：`nano_graphrag/_storage/gdb_neo4j.py`
- **Gdb Networkx** (source_file): `@dataclass class NetworkXStorage(BaseGraphStorage)` ⋮ `@staticmethod def load_nx_graph(file_name) -> nx.Graph` ⋮ `@staticmethod def write_nx_graph(graph: nx.Graph, file_name)` ⋮ `@staticmethod def stable_largest_connected_component(graph: nx.Graph) -> nx.Graph` ⋮ `graph = graph.copy()`, `graph = cast(nx.Graph, largest_connected_component(graph))`, `node_mapping = {node: html.unescape(node.upper().strip()) for node in graph.nodes()}`, `graph = nx.relabel_nodes(graph, node_mapping)` ⋮ `@staticmethod def _stabilize_graph(graph: nx.Graph) -> nx.Graph` ⋮ `fixed_graph = nx.DiGraph() if graph.is_directed() else nx.Graph()`, `sorted_nodes = graph.nodes(data=True)`, `sorted_nodes = sorted(sorted_nodes, key=lambda x: x[0])` ⋮ `e…` Evidence: `nano_graphrag/_storage/gdb_networkx.py`
- **Kv Json** (source_file): `@dataclass class JsonKVStorage(BaseKVStorage)` ⋮ `def __post_init__(self)` ⋮ `working_dir = self.global_config["working_dir"]` ⋮ `async def all_keys(self) -> list[str]`, `async def index_done_callback(self)`, `async def get_by_id(self, id)`, `async def get_by_ids(self, ids, fields=None)`, `async def filter_keys(self, data: list[str]) -> set[str]`, `async def upsert(self, data: dict[str, dict])`, `async def drop(self)`. Evidence: `nano_graphrag/_storage/kv_json.py`
- **Vdb Hnswlib** (source_file): `@dataclass class HNSWVectorStorage(BaseVectorStorage)` ⋮ `ef_construction: int = 100`, `M: int = 16`, `max_elements: int = 1000000`, `ef_search: int = 50`, `num_threads: int = -1`, `_index: Any = field(init=False)`, `_metadata: dict[str, dict] = field(default_factory=dict)`, `_current_elements: int = 0`, `def __post_init__(self)` ⋮ `hnsw_params = self.global_config.get("vector_db_storage_cls_kwargs", {})` ⋮ `async def upsert(self, data: dict[str, dict]) -> np.ndarray` ⋮ `list_data = […]`, `contents = [v["content"] for v in data.values()]`, `batch_size = min(self._embedding_batch_num, len(contents))`, `embeddings = np.concatenate(…)`, `ids = np.fromiter(…)` ⋮ `async def query(self, query: str, top_k: int = 5) -> list[dict]` ⋮ `top_k = min(top_k, self…` Evidence: `nano_graphrag/_storage/vdb_hnswlib.py`
- **Vdb Nanovectordb** (source_file): `@dataclass class NanoVectorDBStorage(BaseVectorStorage)` ⋮ `cosine_better_than_threshold: float = 0.2`, `def __post_init__(self)`, `async def upsert(self, data: dict[str, dict])` ⋮ `list_data = […]`, `contents = [v["content"] for v in data.values()]`, `batches = […]`, `embeddings_list = await asyncio.gather(…)`, `embeddings = np.concatenate(embeddings_list)` ⋮ `results = self._client.upsert(datas=list_data)` ⋮ `async def query(self, query: str, top_k=5)` ⋮ `embedding = await self.embedding_func([query])`, `embedding = embedding[0]`, `results = self._client.query(…)`, `results = […]` ⋮ `async def index_done_callback(self)`. Evidence: `nano_graphrag/_storage/vdb_nanovectordb.py`
- **Extract** (source_file): `entity_extractor = TypedEntityRelationshipExtractor(num_refine_turns=1, self_refine=True)` ⋮ `ordered_chunks = list(chunks.items())`, `already_processed = 0`, `already_entities = 0`, `already_relations = 0` ⋮ `chunk_dp = chunk_key_dp[1]`, `content = chunk_dp["content"]` ⋮ `prediction = await asyncio.to_thread(entity_extractor, input_text=content)` ⋮ `example = dspy.Example(…)` ⋮ `now_ticks = PROMPTS["process_tickers"]` ⋮ `examples = await asyncio.gather(…)`, `filtered_examples = […]`, `num_filtered_examples = len(examples) - len(filtered_examples)` ⋮ `async def _process_single_content(chunk_key_dp: tuple[str, TextChunkSchema])` ⋮ `chunk_key = chunk_key_dp[0]` ⋮ `maybe_nodes = defaultdict(list)`, `maybe_edges = default…` Evidence: `nano_graphrag/entity_extraction/extract.py`
- **Metric** (source_file): `class AssessRelationships(dspy.Signature)` ⋮ `gold_relationships: list[Relationship] = dspy.InputField(…)`, `predicted_relationships: list[Relationship] = dspy.InputField(…)`, `similarity_score: float = dspy.OutputField(…)` ⋮ `model = dspy.ChainOfThought(AssessRelationships)`, `gold_relationships = [Relationship(**item) for item in gold["relationships"]]`, `predicted_relationships = [Relationship(**item) for item in pred["relationships"]]`, `similarity_score = float(…)` ⋮ `true_set = set(item["entity_name"] for item in gold["entities"])`, `pred_set = set(item["entity_name"] for item in pred["entities"])`, `true_positives = len(pred_set.intersection(true_set))`, `false_negatives = len(true_set - pred_set)`, `recall = …` Evidence: `nano_graphrag/entity_extraction/metric.py`
- **Module** (source_file): `ENTITY_TYPES = […]`, `class Entity(BaseModel)` ⋮ `entity_name: str = Field(..., description="The name of the entity.")`, `entity_type: str = Field(..., description="The type of the entity.")`, `description: str = Field(…)`, `importance_score: float = Field(…)`, `def to_dict(self)`, `class Relationship(BaseModel)` ⋮ `src_id: str = Field(..., description="The name of the source entity.")`, `tgt_id: str = Field(..., description="The name of the target entity.")` ⋮ `weight: float = Field(…)`, `order: int = Field(…)` ⋮ `class CombinedExtraction(dspy.Signature)` ⋮ `input_text: str = dspy.InputField(…)`, `entity_types: list[str] = dspy.InputField(…)`, `entities: list[Entity] = dspy.OutputField(…)`, `relationships: list[Relationship] = dspy.OutputField(…)`, `cla…` Evidence: `nano_graphrag/entity_extraction/module.py`
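The `NetworkXStorage` entry relabels nodes with `html.unescape(node.upper().strip())` and sorts nodes and edges, so the same graph always serializes the same way. The sketch below shows that idea using only the standard library. It assumes the graph arrives as a plain dict of node name → attributes plus a list of `(source, target, data)` edge tuples, not as a `networkx` graph. `normalize_node_name` and `stabilize` are hypothetical names, not nano-graphrag's API. The largest-connected-component step is left out.

```python
import html


def normalize_node_name(name: str) -> str:
    # Same relabeling as the summary: upper-case, strip whitespace,
    # then unescape HTML entities.
    return html.unescape(name.upper().strip())


def stabilize(nodes: dict[str, dict], edges: list[tuple[str, str, dict]], directed: bool = False):
    """Return (sorted_nodes, sorted_edges) in a deterministic order (hypothetical helper)."""
    mapping = {n: normalize_node_name(n) for n in nodes}
    # Relabel nodes; if two names collapse to the same key, the later one wins.
    fixed_nodes = {mapping[n]: data for n, data in nodes.items()}
    sorted_nodes = sorted(fixed_nodes.items(), key=lambda item: item[0])

    fixed_edges = []
    for src, tgt, data in edges:
        src, tgt = mapping[src], mapping[tgt]
        # For undirected graphs, put endpoints in a canonical order so that
        # (A, B) and (B, A) serialize identically.
        if not directed and src > tgt:
            src, tgt = tgt, src
        fixed_edges.append((src, tgt, data))
    fixed_edges.sort(key=lambda e: f"{e[0]} -> {e[1]}")
    return sorted_nodes, fixed_edges
```

Deterministic ordering matters here because a GraphML file written from an unordered graph can differ between runs even when its content has not changed.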
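The `JsonKVStorage` entry lists an async key-value interface (`all_keys`, `get_by_id`, `get_by_ids`, `filter_keys`, `upsert`, `index_done_callback`, `drop`). Below is a minimal, stdlib-only sketch of that interface, backed by one JSON file. `TinyJsonKV` is a hypothetical class, not the project's implementation. Two points are assumptions: `filter_keys` returns the keys that are *not* stored yet, and data is written to disk only when `index_done_callback` runs.

```python
import asyncio
import json
import os
from dataclasses import dataclass, field


@dataclass
class TinyJsonKV:
    """Hypothetical JSON-file key-value store mirroring the listed method names."""

    file_path: str
    _data: dict = field(default_factory=dict)

    def __post_init__(self):
        # Load previously persisted data if the file already exists.
        if os.path.exists(self.file_path):
            with open(self.file_path, encoding="utf-8") as f:
                self._data = json.load(f)

    async def all_keys(self) -> list[str]:
        return list(self._data.keys())

    async def get_by_id(self, id):
        return self._data.get(id)

    async def get_by_ids(self, ids, fields=None):
        if fields is None:
            return [self._data.get(i) for i in ids]
        # Keep only the requested fields; unknown ids map to None.
        return [
            {k: v for k, v in self._data[i].items() if k in fields} if i in self._data else None
            for i in ids
        ]

    async def filter_keys(self, data: list[str]) -> set[str]:
        # Assumed semantics: return keys that still need to be processed.
        return {k for k in data if k not in self._data}

    async def upsert(self, data: dict[str, dict]):
        self._data.update(data)

    async def index_done_callback(self):
        # Persist once a batch of writes has finished.
        with open(self.file_path, "w", encoding="utf-8") as f:
            json.dump(self._data, f, ensure_ascii=False, indent=2)

    async def drop(self):
        self._data = {}


async def demo(path: str):
    kv = TinyJsonKV(path)
    await kv.upsert({"chunk-1": {"content": "hello", "tokens": 1}})
    todo = await kv.filter_keys(["chunk-1", "chunk-2"])
    await kv.index_done_callback()
    return todo
```

Batching writes and flushing them in a callback keeps the hot path in memory. The trade-off is that anything upserted after the last flush is lost if the process crashes.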
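The `NanoVectorDBStorage` entry has a field `cosine_better_than_threshold: float = 0.2` next to a `query(query, top_k=5)` method. This suggests that low-similarity hits are filtered out before the top-k results are returned. The sketch below is a brute-force, stdlib-only version of that idea. `query` and `cosine` here are hypothetical helpers, not the nano-vectordb API. Keeping only scores *strictly* above the threshold is an assumption.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; a zero vector scores 0.0 instead of dividing by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(store: dict[str, list[float]], q: list[float], top_k: int = 5,
          better_than_threshold: float = 0.2) -> list[dict]:
    """Brute-force top-k cosine search with a score floor (hypothetical helper)."""
    scored = [{"id": k, "score": cosine(v, q)} for k, v in store.items()]
    # Drop weak matches, then keep the best top_k by descending score.
    kept = [r for r in scored if r["score"] > better_than_threshold]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:top_k]
```

A score floor means a query can return fewer than `top_k` results, or none at all. The pattern favors precision over always filling the context window.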
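The `Metric` entry builds `true_set` and `pred_set` from `entity_name` fields and computes `true_positives` and `false_negatives`. The `recall =` expression itself is cut off. The sketch below assumes the standard definition, recall = TP / (TP + FN). `entity_recall` is a hypothetical name. Returning 0.0 when the gold set is empty is a choice made for this sketch only.

```python
def entity_recall(gold: dict, pred: dict) -> float:
    """Entity-name recall between gold and predicted extractions (hypothetical helper)."""
    true_set = {item["entity_name"] for item in gold["entities"]}
    pred_set = {item["entity_name"] for item in pred["entities"]}
    true_positives = len(pred_set & true_set)
    false_negatives = len(true_set - pred_set)
    denom = true_positives + false_negatives
    # Assumed convention: an empty gold set yields 0.0 instead of raising.
    return true_positives / denom if denom else 0.0
```

For example, if the gold entities are A, B, C and the prediction contains A, B, D, then TP = 2 and FN = 1, so recall = 2/3. The extra D lowers precision but does not affect recall.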

## Rules the Host AI Must Follow

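- **Treat this asset as pre-work context, not as a runtime environment.** The AI Context Pack contains only evidence-backed understanding of the project, not the executable state of the target project. Evidence: `docs/CONTRIBUTING.md`, `readme.md`, `LICENSE`
- **When answering users, separate what can be previewed from what can only be verified after installation.** A pre-install experience is valuable because it reduces mistaken installs and misjudgments, not because it imitates a real run. Evidence: `docs/CONTRIBUTING.md`, `readme.md`, `LICENSE`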

## Questions the User Should Answer Before Starting

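- Which host AI or local environment do you plan to use it in?
- Do you only want to try the workflow first, or are you preparing a real installation?
- What matters most to you: installation cost, output quality, or conflicts with your existing rules?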

## Acceptance Criteria

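- Every capability claim can be traced back to a file path in `evidence_refs`.
- `AI_CONTEXT_PACK.md` does not present a preview as a real run.
- Within 3 minutes, users can understand who the project suits, what it can do, how to start, and where the risk boundaries are.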

---

## Doramagic Context Augmentation

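The content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual provides only a reading skeleton. The pitfall log is turned into working constraints that the host AI must follow.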

## Human Manual Skeleton

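Usage rule: this section is only a reading route through the project and a salience signal, not an authority on facts. Specific facts must still be traced back to repo evidence or the Claim Graph.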

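Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance ratings as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route and salience signal.
- Judgments about capabilities, installation, compatibility, runtime status, and risks must cite repo evidence, a source path, or the Claim Graph.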

- **Project Overview and Quick Start**: importance `high`
  - source_paths: readme.md, nano_graphrag/__init__.py, nano_graphrag/graphrag.py, setup.py, requirements.txt
- **System Architecture, Core Modules, and Query Flow**: importance `high`
  - source_paths: nano_graphrag/graphrag.py, nano_graphrag/base.py, nano_graphrag/_op.py, nano_graphrag/_splitter.py, nano_graphrag/_utils.py
- **Storage Backends, Visualization, and Data Flow**: importance `high`
  - source_paths: nano_graphrag/_storage/__init__.py, nano_graphrag/_storage/kv_json.py, nano_graphrag/_storage/vdb_nanovectordb.py, nano_graphrag/_storage/vdb_hnswlib.py, nano_graphrag/_storage/gdb_networkx.py
- **LLM/Embedding Extensions, Prompts, and Troubleshooting**: importance `high`
  - source_paths: nano_graphrag/_llm.py, nano_graphrag/prompt.py, nano_graphrag/_op.py, examples/using_ollama_as_llm.py, examples/using_ollama_as_llm_and_embedding.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `acb35c065614eb5a2f5f1be9a56b235f5a2e0a7a`
- inspected_files: `readme.md`, `requirements.txt`, `docs/CONTRIBUTING.md`, `docs/FAQ.md`, `docs/ROADMAP.md`, `docs/benchmark-dspy-entity-extraction.md`, `docs/benchmark-en.md`, `docs/benchmark-zh.md`, `docs/use_neo4j_for_graphrag.md`, `examples/benchmarks/dspy_entity.py`, `examples/benchmarks/hnsw_vs_nano_vector_storage.py`, `examples/benchmarks/md5_vs_xxhash.py`, `examples/graphml_visualize.py`, `examples/no_openai_key_at_all.py`, `examples/using_amazon_bedrock.py`, `examples/using_custom_chunking_method.py`, `examples/using_deepseek_api_as_llm+glm_api_as_embedding.py`, `examples/using_deepseek_as_llm.py`, `examples/using_dspy_entity_extraction.py`, `examples/using_faiss_as_vextorDB.py`

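Hard rules for the host AI:
- Without `repo_clone_verified=true`, do not claim to have read the source code.
- Without `repo_inspection_verified=true`, do not present judgments drawn from README, docs, or package files as facts.
- Without `quick_start_verified=true`, do not claim that the Quick Start has run successfully.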

## Doramagic Pitfall Constraints

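These rules come from project-specific pitfalls that Doramagic found, verified, or compiled. The host AI must treat them as working constraints, not as ordinary explanatory text.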

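### Constraint 1: Source evidence: JSONDecodeError using `no_openai_key_at_all.py`

- Trigger: GitHub community evidence shows an unverified configuration-related issue in this project: JSONDecodeError using `no_openai_key_at_all.py`
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | https://github.com/gusye1234/nano-graphrag/issues/75 | The source discussion mentions Python-related conditions; re-check them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.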
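This pack does not confirm the root cause of Constraint 1. One plausible cause, stated here only as an assumption, is that a local model returns text that is not pure JSON, such as JSON wrapped in prose or in a Markdown code fence. The minimal stdlib sketch below shows a tolerant parse for that case. `parse_json_loosely` is a hypothetical helper, not nano-graphrag's code. It does not replace checking the linked issue.

```python
import json
import re


def parse_json_loosely(text: str) -> dict:
    """Hypothetical helper: tolerate prose or a code fence around a JSON object."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span; DOTALL lets it cross newlines.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    # Give up explicitly instead of raising, so the caller can log and retry.
    return {}
```

Returning an empty dict hides the failure. In a real pipeline it is usually better to log the raw model output at this point, so a malformed response can be told apart from an empty one.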

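### Constraint 2: Source evidence: "'charmap' codec can't decode" error encountered when installing on windows from source

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: "'charmap' codec can't decode" error encountered when installing on windows from source
- Host AI rule: The source suggests that a fix, workaround, or version change may already exist; the manual must state which versions this applies to.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | https://github.com/gusye1234/nano-graphrag/issues/163 | The source discussion mentions Windows-related conditions; re-check them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.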
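On Windows, "'charmap' codec can't decode" usually means a UTF-8 file was opened without an explicit encoding. Python then falls back to the locale encoding, often cp1252. This pack does not confirm that this is the cause of issue #163, so treat that as an assumption. The minimal sketch below shows the usual source-side pattern of always passing `encoding="utf-8"`, for example when a `setup.py` reads a README or requirements file. `read_text_utf8` and `read_requirements` are hypothetical names.

```python
from pathlib import Path


def read_text_utf8(path: str) -> str:
    # An explicit encoding stops Python from using the locale default
    # (often cp1252 on Windows), which fails on non-ASCII UTF-8 bytes.
    return Path(path).read_text(encoding="utf-8")


def read_requirements(path: str) -> list[str]:
    # Parse a requirements.txt-style file: keep non-empty, non-comment lines.
    stripped = (line.strip() for line in read_text_utf8(path).splitlines())
    return [line for line in stripped if line and not line.startswith("#")]
```

On the user side, Python's UTF-8 mode (`PYTHONUTF8=1` or `python -X utf8`, available since Python 3.7) is a common workaround. Whether it resolves this particular report still needs to be checked against the issue.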

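### Constraint 3: Source evidence: docker部署 (Docker deployment)

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: docker部署 (Docker deployment)
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | https://github.com/gusye1234/nano-graphrag/issues/166 | The source discussion mentions Docker-related conditions; re-check them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.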

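### Constraint 4: May modify host AI configuration

- Trigger: The project targets hosts such as Claude/Cursor/Codex/Gemini/OpenCode, or its install commands touch user configuration directories.
- Host AI rule: List the configuration files and directories that will be written, along with uninstall/rollback steps.
- Why it matters: Installation may change how local AI tools behave, so users need to know where files are written and how to roll back.
- Evidence: capability.host_targets | https://github.com/gusye1234/nano-graphrag | host_targets=claude, chatgpt
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.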

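### Constraint 5: Source evidence: Writing graph with 0 edges triggers leiden error

- Trigger: GitHub community evidence shows an unverified configuration-related issue in this project: Writing graph with 0 edges triggers leiden error
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | https://github.com/gusye1234/nano-graphrag/issues/167 | The source discussion mentions Python-related conditions; re-check them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.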
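For Constraint 5, a defensive pattern is to check for an edgeless graph before calling the community-detection step at all. The minimal sketch below assumes the failure comes from passing a graph with zero edges to Leiden clustering. `cluster_or_fallback` and `cluster_fn` are hypothetical names, not nano-graphrag's API. Singleton communities are only one possible fallback, and the sketch says nothing about how the project itself resolves the issue.

```python
from typing import Callable


def cluster_or_fallback(
    nodes: list[str],
    edges: list[tuple[str, str]],
    cluster_fn: Callable[[list[str], list[tuple[str, str]]], dict[str, int]],
) -> dict[str, int]:
    """Hypothetical guard: skip community detection on an edgeless graph."""
    if not edges:
        # With no edges there is nothing for Leiden to optimize; give each
        # node its own community (in sorted order) instead of calling it.
        return {node: idx for idx, node in enumerate(sorted(nodes))}
    return cluster_fn(nodes, edges)
```

A fallback like this lets indexing finish on tiny or badly extracted inputs. An empty edge set often points to a failed entity-extraction step, so it is worth logging as well.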

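### Constraint 6: Capability judgment depends on assumptions

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Turn the assumptions into a downstream validation checklist.
- Why it matters: If an assumption does not hold, users will not get the promised capability.
- Evidence: capability.assumptions | https://github.com/gusye1234/nano-graphrag | README/documentation is current enough for a first validation pass.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.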

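### Constraint 7: Maintenance activity unknown

- Trigger: `last_activity_observed` was not recorded.
- Host AI rule: Add GitHub signals for recent commits, releases, and issue/PR responsiveness.
- Why it matters: New, abandoned, and active projects get lumped together, which lowers confidence in any recommendation.
- Evidence: evidence.maintainer_signals | https://github.com/gusye1234/nano-graphrag | last_activity_observed missing
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.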

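### Constraint 8: Downstream validation risk item

- Trigger: no_demo
- Evidence: downstream_validation.risk_items | https://github.com/gusye1234/nano-graphrag | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.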

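### Constraint 9: Scoring risk present

- Trigger: no_demo
- Why it matters: The risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | https://github.com/gusye1234/nano-graphrag | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.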

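### Constraint 10: Source evidence: Installation issue on Windows machine related to encoding

- Trigger: GitHub community evidence shows an unverified security/permissions-related issue in this project: Installation issue on Windows machine related to encoding
- Host AI rule: The source suggests that a fix, workaround, or version change may already exist; the manual must state which versions this applies to.
- Why it matters: It may affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/gusye1234/nano-graphrag/issues/125 | The source discussion mentions Python-related conditions; re-check them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence explicitly shows that it has been closed.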