neo4j-graphrag-python 项目说明书

Doramagic 项目包 · 项目说明书

neo4j-graphrag-python 项目

Neo4j GraphRPython：用于在 Python 中构建基于 Neo4j 知识图谱的检索增强生成（RAG）应用的官方库。

概述、架构与快速开始

neo4j-graphrag-python 是 Neo4j 官方的 Python SDK，用于在 Neo4j 图数据库之上构建生成式 AI（GenAI）与检索增强生成（RAG）应用。它把大语言模型（LLM）、Embedding、文本切分、实体关系抽取与 Neo4j 图写入能力组合为可复用组件与流水线（Pipeline），支持从 PDF / Markdown / 纯文本自动构...

章节 相关页面

继续阅读本节完整说明和来源证据。

项目概述

neo4j-graphrag-python 是 Neo4j 官方的 Python SDK，用于在 Neo4j 图数据库之上构建生成式 AI（GenAI）与检索增强生成（RAG）应用。它把大语言模型（LLM）、Embedding、文本切分、实体关系抽取与 Neo4j 图写入能力组合为可复用组件与流水线（Pipeline），支持从 PDF / Markdown / 纯文本自动构建知识图谱（KG），并基于该图谱执行问答与多工具检索。资料来源：README.md

核心架构

整套 SDK 围绕「组件（Component）+ 流水线（Pipeline）」模式构建。Pipeline 通过 add_component 注册阶段化任务，SimpleKGPipeline 则是对完整流程的高层封装，底层组件实现统一位于 src/neo4j_graphrag/experimental/components/。

flowchart LR
  A[DataLoader<br/>PDF/MD/TXT] --> B[TextSplitter]
  B --> C[SchemaFromTextExtractor]
  C --> D[EntityRelationExtractor]
  D --> E[LexicalGraphBuilder]
  E --> F[Neo4jWriter]
  F --> G[(Neo4j Graph)]
  H[Embedder] --> G
  I[LLM] --> D

主要组件及职责：

DataLoader：读取 PDF/Markdown 等原始文档并产出 LoadedDocument。资料来源：src/neo4j_graphrag/experimental/components/data_loader.py
TextSplitter：把文本切分为 TextChunk，LangChain 适配器位于 components/text_splitters/langchain.py。资料来源：src/neo4j_graphrag/experimental/components/text_splitters/langchain.py
SchemaFromTextExtractor：从文本中抽取节点/关系类型并组装 GraphSchema，支持 V1 prompt-based JSON 与 V2 结构化输出（仅 OpenAI / Vertex AI）。资料来源：src/neo4j_graphrag/experimental/components/schema.py
EntityRelationExtractor：基于 Schema 抽取实体与关系，组装 Neo4jGraph，通过 OnError 控制 JSON 解析失败行为。资料来源：src/neo4j_graphrag/experimental/components/entity_relation_extractor.py
LexicalGraphBuilder：生成文档—分块—顺序的词法图，含 Document、__Chunk__ 节点与 NEXT_CHUNK 关系。资料来源：src/neo4j_graphrag/experimental/components/lexical_graph.py
Neo4jWriter：把图批量写入 Neo4j，自动创建 KEY / UNIQUENESS 约束（1.18.0 起禁止同一属性同时声明 KEY 与 EXISTENCE）。资料来源：src/neo4j_graphrag/experimental/components/kg_writer.py

SimpleKGPipeline 把以上链路串成一个 run() 调用，并接受 from_file/schema 等参数。资料来源：src/neo4j_graphrag/experimental/pipeline/kg_builder.py

快速开始

最小可运行的「PDF → 知识图谱」示例：

import asyncio
from neo4j import GraphDatabase
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

pipeline = SimpleKGPipeline(
    driver=driver,
    llm=OpenAILLM(model_name="gpt-4o"),
    embedder=OpenAIEmbeddings(model="text-embedding-3-large"),
    from_file=True,
    schema={
        "node_types": ["Person", "House", "Planet"],
        "relationship_types": ["PARENT_OF", "HEIR_OF", "RULES"],
        "patterns": [
            ("Person", "PARENT_OF", "Person"),
            ("Person", "HEIR_OF", "House"),
            ("House", "RULES", "Planet"),
        ],
    },
)
asyncio.run(pipeline.run(file_path="./doc.pdf"))

检索侧可使用 VectorCypherRetriever、HybridRetriever、Text2CypherRetriever 或 ToolsRetriever。ToolsRetriever 由 LLM 自动挑选并执行多个工具，要求每个工具名称唯一。资料来源：src/neo4j_graphrag/retrievers/tools_retriever.py、`examples/README.md]()

LLM 适配器覆盖 OpenAI、Anthropic、Google GenAI、Vertex AI、Ollama、Cohere、MistralAI 等。VertexAILLM 通过 _raw_generation_config 透传参数并支持结构化输出；OllamaLLM 提供 acall 与工具调用能力。资料来源：src/neo4j_graphrag/llm/vertexai_llm.py、src/neo4j_graphrag/llm/google_genai_llm.py、src/neo4j_graphrag/llm/ollama_llm.py

扩展与已知限制

Prompt 定制：ERExtractionTemplate 与 Text2CypherTemplate 暴露 format() 钩子，可注入 few-shot 示例与自定义字段。Text2CypherTemplate 同时支持 query_text 与已废弃的 query 参数。资料来源：src/neo4j_graphrag/generation/prompts.py
数据模型：Neo4jNode、Neo4jRelationship、TextChunk、TextChunks 等核心类型定义于 components/types.py，所有 KG 抽取/写入组件均围绕这些 Pydantic 模型进行交换。资料来源：src/neo4j_graphrag/experimental/components/types.py
检索器异步化：社区 Issue #406 指出当前 Retrievers 仅接受同步 Neo4j 驱动，全异步化仍在路线图中。
Message History 时间戳：Issue #321 提议为 Neo4jMessageHistory 写入节点补充 datetime() 属性，便于按时间排序分析。
return_context 默认值：Issue #148 讨论 GraphRAG.search() 默认行为——开启可观测性更好，但可能影响提示长度与延迟。

来源：https://github.com/neo4j/neo4j-graphrag-python / 项目说明书

检索器与 GraphRAG 生成

neo4j-graphrag-python 提供两大紧密耦合的子系统：检索器（Retrievers）与 GraphRAG 生成器（Generators）。前者从 Neo4j 图数据库中根据用户输入（文本查询或嵌入向量）召回上下文片段；后者将该上下文连同原始问题交给大语言模型（LLM），合成最终答案。最常见的端到端调用入口是 GraphRAG.search()，它把单个 R...

章节 相关页面

继续阅读本节完整说明和来源证据。

章节 抽象基类与通用生命周期

继续阅读本节完整说明和来源证据。

章节 主要检索器类型

继续阅读本节完整说明和来源证据。

章节 异步执行能力（社区关注点）

继续阅读本节完整说明和来源证据。

概述

neo4j-graphrag-python 提供两大紧密耦合的子系统：检索器（Retrievers） 与 GraphRAG 生成器（Generators）。前者从 Neo4j 图数据库中根据用户输入（文本查询或嵌入向量）召回上下文片段；后者将该上下文连同原始问题交给大语言模型（LLM），合成最终答案。最常见的端到端调用入口是 GraphRAG.search()，它把单个 Retriever 与 PromptTemplate、LLM、对话历史整合为一个统一的问答接口。资料来源：src/neo4j_graphrag/generation/graphrag.py

examples/README.md 列出的检索范式覆盖了几乎全部官方支持的入口，包括 VectorRetriever、VectorCypherRetriever、HybridRetriever、HybridCypherRetriever、Text2CypherRetriever 以及基于工具调用（Tool Calling）的 tools_retriever 变体。资料来源：examples/README.md

检索器架构

抽象基类与通用生命周期

所有 Retriever 都继承自 Retriever 抽象基类，统一持有同步 Neo4j 驱动实例。基类负责校验输入、执行搜索方法 search()，并返回标准化的 RetrieverResult（包含 items、metadata、custom_embeddings 等可选字段）。资料来源：src/neo4j_graphrag/retrievers/base.py

主要检索器类型

检索器	输入	检索机制	典型用途
`VectorRetriever`	文本或预生成向量	Neo4j 向量索引上的相似度搜索	纯语义召回
`VectorCypherRetriever`	文本或向量	向量召回 + Cypher 扩展遍历	带图关系扩展的语义召回
`HybridRetriever`	文本	向量 + 全文索引融合	兼顾语义与关键词命中
`HybridCypherRetriever`	文本	Hybrid 召回 + Cypher 扩展	兼顾关键词与图遍历
`Text2CypherRetriever`	自然语言查询	LLM 生成 Cypher 后执行	精确结构化查询
Cypher Template Tool	用户问题	由 LLM 选择预定义 Cypher 模板	受限的安全检索

资料来源：examples/README.md

异步执行能力（社区关注点）

社区提出 #406 反馈：Text2CypherRetriever、VectorCypherRetriever 等检索器目前只接受同步版 Neo4j 驱动，无法直接在 async 项目中复用事件循环，需要在外层额外起线程包装。这意味着在已有的 asyncio 应用中，要么把同步调用放到 to_thread，要么退而求其次使用线程池。资料来源：src/neo4j_graphrag/retrievers/text2cypher.py，src/neo4j_graphrag/retrievers/vector.py

flowchart LR
    Q[用户问题] --> R{检索器类型}
    R -->|向量| VR[VectorRetriever]
    R -->|向量+图| VCR[VectorCypherRetriever]
    R -->|混合| HR[HybridRetriever]
    R -->|文本→Cypher| T2C[Text2CypherRetriever]
    VR --> CTX[RetrieverResult]
    VCR --> CTX
    HR --> CTX
    T2C --> CTX
    CTX --> G[GraphRAG.search]
    G --> LLM[LLM 合成]
    LLM --> A[最终答案]

GraphRAG 生成

`GraphRAG.search()` 工作流

GraphRAG 是一个高阶封装，负责按顺序执行：① 通过注入的 retriever 获取上下文；② 将上下文与 prompt_template 渲染结果一起送入 LLMInterface.invoke() 或 ainvoke()；③ 用 MessageHistory（含内存或 Neo4jMessageHistory 后端）维护对话轮次。return_context 参数决定是否在响应中附带原始 RetrieverResult，社区讨论 #148 关注其默认值对调试体验的影响。资料来源：src/neo4j_graphrag/generation/graphrag.py

`Neo4jMessageHistory` 与时间戳

当配置 Neo4jMessageHistory 作为会话存档后端时，它会把每一轮消息以节点形式写入 Neo4j。社区 #321 提议为消息节点增加 datetime() 属性，方便按时间排序与检索。当前实现只记录角色与内容，分析时必须借助外部查询补齐时间字段。资料来源：examples/question_answering/graphrag_with_neo4j_message_history.py，src/neo4j_graphrag/generation/graphrag.py

结构化输出对生成的影响

EntityRelationExtractor 与 SchemaFromTextExtractor 均支持 use_structured_output=True 切换到 LLMInterfaceV2，可直接由 LLM 返回 Pydantic 对象，减少 JSON 解析失败。当前仅 OpenAILLM 与 VertexAILLM 具备该能力，社区 #493 提议为 AnthropicLLM 增加同等支持。资料来源：src/neo4j_graphrag/experimental/components/entity_relation_extractor.py，src/neo4j_graphrag/experimental/components/schema.py

模板与提示词

Text2CypherTemplate 在生成 Cypher 时强制要求 schema、输入查询都在提示词中出现，并禁止返回额外的 Markdown 围栏。该模板与 ERExtractionTemplate 共同决定了下游实体抽取与图谱构建的质量。资料来源：src/neo4j_graphrag/generation/prompts.py

使用模式与常见失败点

同步驱动阻塞：在异步项目中请使用 asyncio.to_thread(GraphRAG.search, ...)，或等待 issue #406 引入的原生 async driver 支持。资料来源：src/neo4j_graphrag/retrievers/base.py
Cypher 生成失败：Text2CypherRetriever 在 LLM 返回非合法 Cypher 时会抛 Text2CypherRetrievalError；可通过自定义 prompt_template 与 llm_params 提升稳定性。资料来源：src/neo4j_graphrag/retrievers/text2cypher.py
结构化输出非法：SchemaFromTextExtractor 在 LLM 返回内容违背 GraphSchemaExtractionOutput 时会记录日志并降级为 V1 路径，或按 OnError.RAISE 抛出 LLMGenerationError。资料来源：src/neo4j_graphrag/experimental/components/schema.py
工具调用结果为空：OllamaLLM 等实现当 LLM 未触发工具时，会将原始文本作为响应回传，使用方需自行区分普通回答与工具调用。资料来源：src/neo4j_graphrag/llm/ollama_llm.py

知识图谱构建管道（实验性）

neo4j-graphrag-python 提供了一套位于 neo4jgraphrag.experimental 命名空间下的实验性知识图谱构建管道，用于将非结构化文本（纯文本或 PDF）端到端地转换为存储在 Neo4j 数据库中的知识图谱。README 明确说明该包提供两种构建方式：高级的 Pipeline 类提供灵活的可定制管线，简化的 SimpleKGPipeline...

章节 相关页面

继续阅读本节完整说明和来源证据。

章节 构建管线主类

继续阅读本节完整说明和来源证据。

章节 实体与关系抽取器

继续阅读本节完整说明和来源证据。

章节 字面图与图谱写入

继续阅读本节完整说明和来源证据。

概述

neo4j-graphrag-python 提供了一套位于 neo4j_graphrag.experimental 命名空间下的实验性知识图谱构建管道，用于将非结构化文本（纯文本或 PDF）端到端地转换为存储在 Neo4j 数据库中的知识图谱。README 明确说明该包提供两种构建方式：高级的 Pipeline 类提供灵活的可定制管线，简化的 SimpleKGPipeline 类对 Pipeline 进行抽象封装，便于快速接入资料来源：[README.md]。

整个系统以"组件（Component）"为基本单元，通过编排器将加载、拆分、向量化、抽取与写入等步骤串联成可异步执行的管道。组件均继承自 neo4j_graphrag.experimental.pipeline.Component 抽象类，便于用户替换或扩展资料来源：[src/neo4j_graphrag/experimental/pipeline/kg_builder.py]。

核心组件

构建管线主类

SimpleKGPipeline 是快速入口，它接收 LLM、嵌入器、Neo4j 驱动以及可选的 schema / entities / relations / potential_schema 参数，将文本到图谱的常见步骤自动组装为一条可执行管道。它支持 from_file=True 时接收 file_path（PDF 或 Markdown），from_file=False 时直接接收 text 字段资料来源：[src/neo4j_graphrag/experimental/pipeline/kg_builder.py]。

实体与关系抽取器

EntityRelationExtractor 是 ER 抽取的核心组件，基于 ERExtractionTemplate 提示模板驱动 LLM 输出 JSON 形式的 Neo4jGraph。它使用 asyncio.Semaphore 控制并发度 max_concurrency，并对每个 chunk 异步调用 run_for_chunk，最终通过 combine_chunk_graphs 合并结果图资料来源：[src/neo4j_graphrag/experimental/components/entity_relation_extractor.py]。

当 use_structured_output=True 且 LLM 支持结构化输出时（如 OpenAILLM、VertexAILLM），抽取器使用 LLMInterfaceV2 的 response_format 参数将响应约束为 Neo4jGraph 模型；否则退回到基于提示词 + JSON 解析的 V1 流程资料来源：[src/neo4j_graphrag/experimental/components/entity_relation_extractor.py]。

字面图与图谱写入

LexicalGraphBuilder 负责生成字面图（lexical graph），包含文档节点、文本块节点以及它们之间的 NEXT_CHUNK、FROM_DOCUMENT 关系，便于在 RAG 阶段回溯原始文本段资料来源：[src/neo4j_graphrag/experimental/components/lexical_graph.py]。

Neo4jWriter（继承自 KGWriter）是默认的写入实现，按 batch_size（默认 1000）将 Neo4jGraph 批量写入目标数据库，并通过 __id__ 或首个属性的 KEY 约束处理实体去重资料来源：[src/neo4j_graphrag/experimental/components/kg_writer.py]。

Schema 组件

GraphSchema、NodeType、RelationshipType 共同构成对节点和关系类型的约束描述；SchemaFromTextExtractor 可以让 LLM 自动从文本中归纳 schema。当启用结构化输出（V2）时，其通过 ExtractedNodeType / ExtractedRelationshipType 等 wire DTO 与 LLM JSON Schema 对齐资料来源：[src/neo4j_graphrag/experimental/components/schema.py] 资料来源：[src/neo4j_graphrag/experimental/components/graph_schema_extraction.py]。

构建流程

下图展示了从原始输入到 Neo4j 中持久化知识图谱的整体数据流：

flowchart LR
    A[文本或 PDF] --> B[Loader]
    B --> C[TextSplitter]
    C --> D[ChunkEmbedder]
    D --> E[LexicalGraphBuilder]
    E --> F[EntityRelationExtractor]
    F --> G[Neo4jWriter]
    G --> H[(Neo4j Database)]
    F -.可选.-> I[SchemaFromTextExtractor]
    I -.约束.-> F

在 SimpleKGPipeline 内部，文本首先经过 from_file / text 入口进入，加载后通过文本拆分器（如 FixedSizeSplitter）切分为 TextChunks。LexicalGraphBuilder.run 在可选的 DocumentInfo 下生成字面图，随后 EntityRelationExtractor 并发执行 run_for_chunk 并通过 combine_chunk_graphs 合并 Neo4jGraph，最后由 Neo4jWriter 批量落库资料来源：[src/neo4j_graphrag/experimental/pipeline/kg_builder.py]。

配置与使用

通过 `SimpleKGPipeline` 快速接入

examples/README.md 推荐的方式是使用 SimpleKGPipeline：传入 driver、OpenAIEmbeddings、OpenAILLM，再声明 node_types、relationship_types 和 patterns 三元组，调用时直接 await pipeline.run(text=...)。该类还会自动套用 text_splitter、可选的 file_loader 等默认组件资料来源：[examples/README.md] 资料来源：[README.md]。

通过 `ObjectConfig` 从配置构建

ObjectConfig 提供了从类路径与构造参数字典反向实例化对象的能力，配合 ParamConfig 用于参数解析，从而支持从 YAML/JSON 配置文件重建整个管道资料来源：[src/neo4j_graphrag/experimental/pipeline/config/object_config.py]。

错误处理

抽取阶段若启用 use_structured_output，可能抛出 LLMGenerationError；若仅使用 V1 提示词路径，fix_invalid_json 与 json.loads 失败时将根据 on_error 参数决定 RAISE 还是回退为空 Neo4jGraph 并写入日志资料来源：[src/neo4j_graphrag/experimental/components/entity_relation_extractor.py]。

提示词模板

ERExtractionTemplate.DEFAULT_TEMPLATE 指导 LLM 抽取节点与关系，并强调：返回单一 JSON 对象（不要用列表包裹）、属性名必须加双引号、不要输出 Markdown 代码块包围，并要求为每个节点分配 id 字符串以在关系中复用资料来源：[src/neo4j_graphrag/generation/prompts.py]。

社区关注点

社区中关于同步 Neo4j 驱动阻塞异步流程的反馈（如 issue #406）目前主要针对检索器；构建管道在 1.18.0 版本中也持续增强 schema 处理，例如对 LLM 生成的重复关系类型进行自动调和（PR #536）以及禁止在同一属性上同时存在 KEY 与 EXISTENCE 约束（PR #537 后续）。这些改动影响了 SchemaFromTextExtractor 与 Neo4jWriter 的协作行为，建议升级时关注 examples/customize/build_graph/ 下的样例更新资料来源：[examples/README.md]。

参见

检索器与 GraphRAG 生成：GraphRAG.search 与 VectorCypherRetriever
LLM 接口层：LLMInterface / LLMInterfaceV2
消息历史：Neo4jMessageHistory

来源：https://github.com/neo4j/neo4j-graphrag-python / 项目说明书

LLM 与嵌入提供者集成

neo4j-graphrag-python 通过统一的抽象接口，将主流云端与本地的大语言模型（LLM）以及向量嵌入（Embedding）服务接入 GraphRAG 流水线。其核心目标是让用户在不修改业务代码的前提下，灵活切换 OpenAI、Anthropic、Cohere、Amazon Bedrock、Google GenAI、Ollama 等多家提供商 [资料来源：[RE...

章节 相关页面

继续阅读本节完整说明和来源证据。

概述

neo4j-graphrag-python 通过统一的抽象接口，将主流云端与本地的大语言模型（LLM）以及向量嵌入（Embedding）服务接入 GraphRAG 流水线。其核心目标是让用户在不修改业务代码的前提下，灵活切换 OpenAI、Anthropic、Cohere、Amazon Bedrock、Google GenAI、Ollama 等多家提供商资料来源：[README.md:1-200]。仓库在 src/neo4j_graphrag/llm/ 与 src/neo4j_graphrag/embeddings/ 目录下分别为 LLM 和 Embedder 提供独立模块，并通过 LLMInterface 与 Embedder 抽象基类约束实现资料来源：[src/neo4j_graphrag/llm/base.py:1-120]资料来源：[src/neo4j_graphrag/embeddings/base.py:1-80]。

flowchart LR
    A[Pipeline / Retriever] --> B[LLMInterface]
    A --> C[Embedder]
    B --> B1[OpenAILLM]
    B --> B2[AnthropicLLM]
    B --> B3[CohereLLM]
    B --> B4[BedrockLLM]
    B --> B5[GoogleGenAILLM]
    B --> B6[OllamaLLM]
    C --> C1[OpenAIEmbeddings]
    C --> C2[AzureOpenAIEmbeddings]
    C --> C3[VertexAIEmbeddings]
    C --> C4[MistralAIEmbeddings]
    C --> C5[CohereEmbeddings]
    C --> C6[OllamaEmbeddings]

LLM 提供者抽象层

所有 LLM 提供者均实现 LLMInterface（V1）或 LLMInterfaceV2（支持结构化输出）抽象类，提供同步 invoke 与异步 ainvoke 两种调用方式资料来源：[src/neo4j_graphrag/llm/openai_llm.py:1-200]。每个具体实现都负责将通用消息格式 LLMMessage 转换为对应服务商的协议，例如 OpenAILLM 在调用工具（Tool Calling）时将内部 Tool 对象转换为 ChatCompletionToolParam 资料来源：[src/neo4j_graphrag/llm/openai_llm.py:120-180]。类似地，GoogleGenAILLM 通过解析 response.candidates[0].content.parts 抽取 function_call 信息以生成 ToolCallResponse 资料来源：[src/neo4j_graphrag/llm/google_genai_llm.py:1-80]；OllamaLLM 则将工具转换为本地 ollama_tools 格式，并在没有工具调用时回退到普通文本响应资料来源：[src/neo4j_graphrag/llm/ollama_llm.py:1-180]。

下表列出当前已支持的主要 LLM 提供者及其关键能力：

提供者类	异步支持	工具调用	结构化输出 (V2)
`OpenAILLM`	是	是	是
`AnthropicLLM`	是	是	计划中 (Issue #493)
`CohereLLM`	是	是	否
`BedrockLLM`	是	是	否
`GoogleGenAILLM`	是	是	否
`OllamaLLM`	是	是	否

社区 Issue #493 已提出为 AnthropicLLM 增加 Anthropic SDK 的结构化输出支持，与 OpenAILLM、VertexAILLM 保持一致资料来源：[README.md:1-200]。

嵌入提供者集成

Embedder 抽象基类定义了 embed_query 方法，所有具体实现须返回 List[float] 形式的向量资料来源：[src/neo4j_graphrag/embeddings/base.py:1-80]。OpenAIEmbeddings 等内置类与 LLM 模块的设计保持一致：构造函数接收模型名称与厂商特定参数，调用方法在内部委托给对应 SDK。在 SimpleKGPipeline 中，嵌入器与 LLM 协同工作，将文本块向量化后写入 Neo4j 索引资料来源：[README.md:1-200]。仓库在 examples/customize/embeddings/ 下为 OpenAI、Azure OpenAI、VertexAI、MistralAI、Cohere、Ollama 以及自定义实现分别提供了示例资料来源：[examples/README.md:1-100]。

结构化输出与流水线集成

在知识图谱构建流水线中，LLM 需要返回符合 Neo4jGraph 或 GraphSchema 模型的 JSON。EntityRelationExtractor 在启用 use_structured_output 时，会调用支持 response_format 参数的 LLMInterfaceV2（目前仅 OpenAILLM 与 VertexAILLM），并使用 Neo4jGraph.model_validate_json 直接解析响应；否则回退到基于提示词的 V1 JSON 抽取，并通过 fix_invalid_json 修复常见格式错误资料来源：[src/neo4j_graphrag/experimental/components/entity_relation_extractor.py:1-200]资料来源：[src/neo4j_graphrag/experimental/components/schema.py:1-200]。

流水线层面的对象配置通过 ObjectConfig 统一管理，支持基于 YAML/JSON 配置文件按类路径和构造参数动态实例化任意 LLMInterface、Embedder 或 Component，从而使 LLM 与嵌入提供者的替换对上层业务透明资料来源：[src/neo4j_graphrag/experimental/pipeline/config/object_config.py:1-150]。

社区关注与已知限制

异步驱动支持：Issue #406 指出当前 Text2CypherRetriever 等检索器仅接受同步 Neo4j 驱动，全异步链路尚未打通。
消息历史时间戳：Issue #321 请求为 Neo4jMessageHistory 节点写入 datetime() 属性，以便做基于时间的会话分析。
return_context 默认值：Issue #148 讨论 GraphRAG.search() 中 return_context 默认应改为 True，以更好地体现 GraphRAG 上下文增强的价值资料来源：[src/neo4j_graphrag/generation/graphrag.py:80-120]。
Schema 去重：1.18.0 版本起，LLM 生成的关系类型在落库前会自动合并重复定义，并禁止在同一属性上同时使用 KEY 与 EXISTENCE 约束。

失败模式与踩坑日记

保留 Doramagic 在发现、验证和编译中沉淀的项目专属风险，不把社区讨论只当作装饰信息。

high 来源证据：Allow async driver in retrievers

可能增加新用户试用和生产接入成本。

high 来源证据：[FEATURE]: Add Anthropic's Structured Output feature

可能影响授权、密钥配置或安全边界。

high 来源证据：[FEATURE]: Add MistralAI Structured Output feature

可能影响授权、密钥配置或安全边界。

medium 仓库名和安装名不一致

用户照着仓库名搜索包或照着包名找仓库时容易走错入口。

Pitfall Log / 踩坑日志

项目：neo4j/neo4j-graphrag-python

摘要：发现 14 个潜在踩坑项，其中 3 个为 high/blocking；最高优先级：安装坑 - 来源证据：Allow async driver in retrievers。

1. 安装坑 · 来源证据：Allow async driver in retrievers

严重度：high
证据强度：source_linked
发现：GitHub 社区证据显示该项目存在一个安装相关的待验证问题：Allow async driver in retrievers
对用户的影响：可能增加新用户试用和生产接入成本。
证据：community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/406 | 来源类型 github_issue 暴露的待验证使用条件。

2. 安全/权限坑 · 来源证据：[FEATURE]: Add Anthropic's Structured Output feature

严重度：high
证据强度：source_linked
发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：[FEATURE]: Add Anthropic's Structured Output feature
对用户的影响：可能影响授权、密钥配置或安全边界。
证据：community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/493 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

3. 安全/权限坑 · 来源证据：[FEATURE]: Add MistralAI Structured Output feature

严重度：high
证据强度：source_linked
发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：[FEATURE]: Add MistralAI Structured Output feature
对用户的影响：可能影响授权、密钥配置或安全边界。
证据：community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/542 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

4. 身份坑 · 仓库名和安装名不一致

严重度：medium
证据强度：runtime_trace
发现：仓库名 neo4j-graphrag-python 与安装入口 neo4j-graphrag 不完全一致。
对用户的影响：用户照着仓库名搜索包或照着包名找仓库时容易走错入口。
复现命令：pip install neo4j-graphrag
证据：identity.distribution | https://github.com/neo4j/neo4j-graphrag-python | repo=neo4j-graphrag-python; install=neo4j-graphrag

5. 安装坑 · 来源证据：Migrate VertexAIEmbeddings to use google-genai SDK

严重度：medium
证据强度：source_linked
发现：GitHub 社区证据显示该项目存在一个安装相关的待验证问题：Migrate VertexAIEmbeddings to use google-genai SDK
对用户的影响：可能影响升级、迁移或版本选择。
证据：community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/430 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

6. 安装坑 · 来源证据：[QUESTION]: How can i customise the entity/node extracted from SimpleKGPipeline

严重度：medium
证据强度：source_linked
发现：GitHub 社区证据显示该项目存在一个安装相关的待验证问题：[QUESTION]: How can i customise the entity/node extracted from SimpleKGPipeline
对用户的影响：可能增加新用户试用和生产接入成本。
证据：community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/439 | 来源讨论提到 node 相关条件，需在安装/试用前复核。

7. 能力坑 · 能力判断依赖假设

严重度：medium
证据强度：source_linked
发现：README/documentation is current enough for a first validation pass.
对用户的影响：假设不成立时，用户拿不到承诺的能力。
证据：capability.assumptions | https://github.com/neo4j/neo4j-graphrag-python | README/documentation is current enough for a first validation pass.

8. 运行坑 · 来源证据：[FEATURE]: Add possibility to truncate retrieved context

严重度：medium
证据强度：source_linked
发现：GitHub 社区证据显示该项目存在一个运行相关的待验证问题：[FEATURE]: Add possibility to truncate retrieved context
对用户的影响：可能增加新用户试用和生产接入成本。
证据：community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/446 | 来源类型 github_issue 暴露的待验证使用条件。

9. 维护坑 · 维护活跃度未知

严重度：medium
证据强度：source_linked
发现：未记录 last_activity_observed。
对用户的影响：新项目、停更项目和活跃项目会被混在一起，推荐信任度下降。
证据：evidence.maintainer_signals | https://github.com/neo4j/neo4j-graphrag-python | last_activity_observed missing

严重度：medium
证据强度：source_linked
发现：no_demo
证据：downstream_validation.risk_items | https://github.com/neo4j/neo4j-graphrag-python | no_demo; severity=medium

11. 安全/权限坑 · 存在评分风险

严重度：medium
证据强度：source_linked
发现：no_demo
对用户的影响：风险会影响是否适合普通用户安装。
证据：risks.scoring_risks | https://github.com/neo4j/neo4j-graphrag-python | no_demo; severity=medium

12. 安全/权限坑 · 来源证据：Problem with OllamaEmbedding: "init: embeddings required but some input tokens were not marked as outputs -> overriding"

严重度：medium
证据强度：source_linked
发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Problem with OllamaEmbedding: "init: embeddings required but some input tokens were not marked as outputs -> overriding"
对用户的影响：可能影响授权、密钥配置或安全边界。
证据：community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/427 | 来源类型 github_issue 暴露的待验证使用条件。

13. 维护坑 · issue/PR 响应质量未知

严重度：low
证据强度：source_linked
发现：issue_or_pr_quality=unknown。
对用户的影响：用户无法判断遇到问题后是否有人维护。
证据：evidence.maintainer_signals | https://github.com/neo4j/neo4j-graphrag-python | issue_or_pr_quality=unknown

14. 维护坑 · 发布节奏不明确

严重度：low
证据强度：source_linked
发现：release_recency=unknown。
对用户的影响：安装命令和文档可能落后于代码，用户踩坑概率升高。
证据：evidence.maintainer_signals | https://github.com/neo4j/neo4j-graphrag-python | release_recency=unknown

来源：Doramagic 发现、验证与编译记录