Doramagic Project Pack · Human Manual
neo4j-graphrag-python
Neo4j GraphRAG for Python
Overview, Installation & Configuration
Related topics: Retrievers and Database Search, Knowledge Graph Construction Pipeline, Generation, LLM Providers, Embeddings & Message History
1. Project Purpose and Scope
The neo4j-graphrag-python library is an official Neo4j integration that enables Graph Retrieval-Augmented Generation (GraphRAG) workflows. It provides Python-first building blocks to construct knowledge graphs from unstructured data, retrieve context from Neo4j using vectors, full-text search, or Cypher, and feed the results into LLM-based question answering.
The library supports two complementary workflows:
- Knowledge Graph Construction — extracting entities, relationships, and a lexical graph from text/PDF inputs and writing them into Neo4j.
- Retrieval and Generation — querying the graph via vector search, hybrid search, Text2Cypher, or Cypher templates, and producing grounded answers.
A high-level representation of the system:
flowchart LR
A[Documents / Text] --> B[SimpleKGPipeline / Pipeline]
B --> C[(Neo4j Graph)]
C --> D[Retriever]
D --> E[GraphRAG / Prompt]
E --> F[LLM Answer]
classDef store fill:#eef,stroke:#447;
class C store;
Construction modules such as LexicalGraphBuilder (lexical_graph.py) and Neo4jWriter (kg_writer.py) write structured graphs, while retrieval classes consume the resulting index.
2. Installation
The project ships as a standard Python package and recommends using uv for dependency management. Source: README.md
Three install profiles are supported:
| Profile | Command | Purpose |
|---|---|---|
| Runtime only | uv sync (or pip install neo4j-graphrag) | Use the library in an application |
| Development | uv sync --group dev | Contributing, includes tooling |
| All extras (tests) | uv sync --all-extras | Run the full unit test suite |
Prerequisites:
- A reachable Neo4j instance (local, Docker, or Aura).
- For APOC-backed features, install the APOC plugin in the target Neo4j deployment (see the README.md note: *“the APOC plugin … must be installed … to use this feature”*).
- Credentials and connection URL exposed to the Python process (typically via environment variables).
3. Core Configuration Objects
3.1 Neo4j Driver and Database
A neo4j GraphDatabase.driver instance is the entry point used by retrievers, the SimpleKGPipeline, and Neo4jWriter. The writer accepts an optional neo4j_database parameter and falls back to the server default database when omitted (Neo4jWriter source: kg_writer.py). Note: the current retrievers and pipelines accept the synchronous driver; community issue #406 (“Allow async driver in retrievers”) tracks the request for full AsyncGraphDatabase support.
3.2 LLM Providers
LLMs are abstracted behind the LLMInterface. Built-in providers include:
- OpenAI: `OpenAILLM` with sync/async `invoke`, structured outputs (V2), and tool calling (openai_llm.py).
- VertexAI: analogous provider with structured output support.
- Anthropic: currently lacks structured output parity; community issue #493 tracks the request.
- Google Gemini: `GoogleGenAILLM` wraps `GenerateContentResponse` and parses `function_call` parts (google_genai_llm.py).
- Ollama: local-model provider with tool-call conversion helpers (ollama_llm.py).
Each LLM accepts model_params (temperature, max tokens, etc.) and a message_history that can be a list of LLMMessage or any MessageHistory subclass.
3.3 Embedders
Embedders live in neo4j_graphrag.embeddings and include OpenAIEmbeddings, AzureOpenAIEmbeddings, VertexAIEmbeddings, MistralAIEmbeddings, CohereEmbeddings, OllamaEmbeddings, and a SentenceTransformerEmbeddings baseline. They are consumed by retrievers for vector indexing and by the pipeline when chunk embeddings need to be persisted.
3.4 Pipeline Configuration via `ObjectConfig`
For declarative deployments, components are described through ObjectConfig, which is a Pydantic Generic[T] wrapper containing a fully-qualified class_ path and a params_ dict. This allows a pipeline to be specified entirely in YAML/JSON and instantiated via parse(...), while still being able to pass already-instantiated objects in code. The wrapper enforces type-correctness against Embedder and LLMInterface at parse time, and uses a DEFAULT_MODULE and INTERFACE to resolve the target class. Source: object_config.py:73-110
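The class-path-plus-parameters idea can be illustrated with a minimal, standard-library-only sketch. It is not the library's `ObjectConfig` implementation: `parse_object` is a hypothetical helper, and the `DEFAULT_MODULE` and `INTERFACE` values below are stand-ins (the `fractions` module and `numbers.Rational`) chosen only so the example runs without extra dependencies.

```python
import importlib
from fractions import Fraction
from numbers import Rational
from typing import Any

# Stand-ins for the DEFAULT_MODULE and INTERFACE attributes described above
# (assumed values, chosen so the sketch runs with the standard library only).
DEFAULT_MODULE = "fractions"
INTERFACE = Rational


def parse_object(config: dict[str, Any]) -> Any:
    """Instantiate the class named by ``class_`` with the ``params_`` kwargs."""
    path = config["class_"]
    # A bare class name (no dot) is resolved against the default module.
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name or DEFAULT_MODULE)
    klass = getattr(module, class_name)
    obj = klass(**config.get("params_", {}))
    # Reject objects that do not satisfy the expected interface.
    if not isinstance(obj, INTERFACE):
        raise TypeError(f"{path} does not implement {INTERFACE.__name__}")
    return obj


half = parse_object(
    {"class_": "Fraction", "params_": {"numerator": 1, "denominator": 2}}
)
```

The same dictionary could be loaded from YAML or JSON before being passed to the helper, which is what makes a declarative description of a component possible.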
4. Getting Started Patterns
4.1 Build a Knowledge Graph
The README’s quickstart instantiates a SimpleKGPipeline with a driver, embedder, LLM, schema hints, and from_file=True. The wrapper class internally composes an EntityRelationExtractor (see entity_relation_extractor.py) and a Neo4jWriter. Concurrency is controlled by max_concurrency through an asyncio.Semaphore, and chunks are extracted in parallel.
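The semaphore-bounded concurrency mentioned above can be sketched with `asyncio` alone. This is an illustration of the pattern, not the extractor's code: `extract_all` is a hypothetical function, and the `asyncio.sleep` call stands in for an LLM request.

```python
import asyncio


async def extract_all(chunks, max_concurrency=2):
    """Process chunks concurrently, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def extract(chunk):
        nonlocal in_flight, peak
        async with semaphore:  # waits here while the limit is reached
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for an LLM call
            in_flight -= 1
            return chunk.upper()

    # gather preserves input order even though work overlaps
    results = await asyncio.gather(*(extract(c) for c in chunks))
    return results, peak


results, peak = asyncio.run(extract_all(["a", "b", "c", "d", "e"]))
```

Raising the limit increases throughput at the cost of more simultaneous requests against the LLM provider's rate limits.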
4.2 Retrieval
Retrievers exposed by the package are accessible from the examples/ index (examples/README.md): similarity search (similarity_search_for_vector.py, similarity_search_for_text.py), VectorCypherRetriever, HybridRetriever, HybridCypherRetriever, Text2CypherRetriever, and CypherTemplateRetriever. Each retriever exposes a search(...) method returning nodes/records plus an optional context blob that can be forwarded to a GraphRAG or custom prompt.
4.3 Question Answering
The GraphRAG.search(...) method is the canonical entry point for retrieval-augmented QA. Community issue #148 (“Should return_context be enabled or not by default?”) flagged a behavioral concern at graphrag.py:88; users adopting the library should explicitly verify whether return_context is required by their prompt templates.
5. Common Failure Modes and Tips
- Synchronous driver only: As noted in #406, mixing async application code with sync drivers requires wrapping with `asyncio.to_thread`. Plan accordingly when embedding retrievers in async services.
- Message history without timestamps: `Neo4jMessageHistory` does not currently write timestamps (community #321). If you need audit/orderable message trails, add a custom property at write time.
- Structured output availability: Not every LLM provider exposes `LLMInterfaceV2` / `supports_structured_output`. The schema extractor schema.py only enables V2 when the underlying LLM advertises the capability.
- Prompt JSON drift: The V1 entity extraction prompt (prompts.py) is JSON-string based and relies on `fix_invalid_json`. For production, prefer V2 structured outputs when available.
- Constraint conflicts: Release 1.18.0 forbids applying `KEY` and `EXISTENCE` constraints on the same property (CHANGELOG notes from PR #537/#536); align your schema constraints to avoid writer failures.
See Also
- Retrievers & Generation
- Knowledge Graph Construction Pipeline
- Schema Extraction Components
- LLM Provider Reference
- Project README
- Examples Index
Source: https://github.com/neo4j/neo4j-graphrag-python / Human Manual
Retrievers and Database Search
Related topics: Overview, Installation & Configuration, Generation, LLM Providers, Embeddings & Message History, Knowledge Graph Construction Pipeline
Purpose and Scope
Retrievers are the data-access layer of neo4j-graphrag-python. They encapsulate every supported way of fetching relevant content from a Neo4j database — or, in the case of external vector stores, from third-party systems — so that the retrieved context can be fed into a generation step or a GraphRAG pipeline. The retriever interface is intentionally narrow: a retriever accepts a query and an optional configuration dictionary, runs a query (vector, fulltext, Cypher, or a hybrid combination), and returns a list of records transformed into RetrieverResultItem objects.
The full set of built-in retrievers exposed via the package entry point is defined in src/neo4j_graphrag/retrievers/__init__.py, which re-exports the concrete implementations. All concrete retrievers inherit from a shared abstract Retriever base class declared in src/neo4j_graphrag/retrievers/base.py. The base class performs driver validation, optionally verifies the Neo4j server version, and defines the public search() contract that downstream code (such as GraphRAG) relies on.
Built-in Retriever Types
The package ships several retrieval strategies, each targeting a different retrieval pattern.
Vector Retriever
VectorRetriever (implemented in src/neo4j_graphrag/retrievers/vector.py) performs k-Nearest-Neighbor similarity search against a Neo4j vector index. It accepts either a precomputed query_vector or a query_text together with an Embedder instance, and supports an optional retrieval_query Cypher fragment for post-filtering or expansion. A simple text-to-vector search example is documented in examples/README.md under retrieve/similarity_search_for_text.py.
Hybrid and HybridCypher Retrievers
HybridRetriever and HybridCypherRetriever (in src/neo4j_graphrag/retrievers/hybrid.py) combine vector similarity with a Neo4j fulltext index, fusing the two result lists using Reciprocal Rank Fusion (RRF) before applying any retrieval_query post-processing. HybridCypherRetriever is a strict superset that always requires a Cypher post-query, exposing the upstream node variable so callers can traverse related entities in the same round-trip. As stated in the docstring of src/neo4j_graphrag/retrievers/hybrid.py, when an embedder is supplied at construction time the retriever accepts query_text; otherwise the caller must pass a precomputed query_vector.
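For readers unfamiliar with Reciprocal Rank Fusion, the generic textbook form is sketched below. This is not taken from hybrid.py; the function name, the constant `k=60`, and the tie-breaking rule are assumptions made for illustration, and the library's actual scoring may differ in detail.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over all lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


vector_hits = ["n1", "n2", "n3"]      # hypothetical vector-index ranking
fulltext_hits = ["n3", "n1", "n4"]    # hypothetical fulltext-index ranking
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
```

Items that appear in both lists accumulate two contributions, so they tend to outrank items found by only one index.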
Text2Cypher Retriever
Text2CypherRetriever (in src/neo4j_graphrag/retrievers/text2cypher.py) delegates natural-language question understanding to an LLM. It uses the Text2CypherTemplate defined in src/neo4j_graphrag/generation/prompts.py to render a schema-aware prompt, generates a Cypher statement, and executes it. The template accepts schema and optional examples and explicitly instructs the LLM not to wrap the Cypher in triple backticks, which is critical because the generated query is executed directly against the database.
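Because the prompt only asks the LLM not to wrap its output, an application may still want a defensive cleanup step before executing generated Cypher. The sketch below is an application-level assumption, not part of the retriever: `strip_code_fence` is a hypothetical helper, and it only recognises a `cypher` language tag.

```python
FENCE = "`" * 3  # built programmatically so this example can itself be fenced


def strip_code_fence(llm_output: str) -> str:
    """Remove a surrounding code fence, if present, from LLM-generated text."""
    text = llm_output.strip()
    wrapped = (
        len(text) >= 2 * len(FENCE)
        and text.startswith(FENCE)
        and text.endswith(FENCE)
    )
    if wrapped:
        text = text[len(FENCE):-len(FENCE)]
        # Drop an optional "cypher" language tag on the opening line.
        first_line, newline, rest = text.partition("\n")
        if newline and first_line.strip().lower() == "cypher":
            text = rest
    return text.strip()


wrapped_query = f"{FENCE}cypher\nMATCH (n) RETURN n LIMIT 5\n{FENCE}"
clean_query = strip_code_fence(wrapped_query)
```

Output that is already clean passes through unchanged, so the helper is safe to apply unconditionally.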
Tools Retriever
ToolsRetriever (in src/neo4j_graphrag/retrievers/tools_retriever.py) acts as an LLM-driven orchestrator over a list of tool objects. Each tool is typically obtained by calling .convert_to_tool() on a child retriever (for example, a VectorRetriever converted into a vector_search tool and a Text2CypherRetriever converted into a cypher_search tool). The LLM analyzes the user query and selects which tools to invoke, after which ToolsRetriever collects and returns the combined results. Duplicate tool names are rejected at construction time with a ValueError, and the class explicitly disables Neo4j version verification because it does not issue its own queries.
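The duplicate-name check and the collect-and-combine step can be sketched without the library. `ToolDispatcher` is a hypothetical class, tools are modelled as plain `(name, callable)` pairs, and the list of selected tool names is passed in explicitly where the real retriever would ask an LLM to choose.

```python
from typing import Callable, Sequence

# A tool is modelled here as a (name, function) pair; an assumption for the sketch.
Tool = tuple[str, Callable[[str], list[str]]]


class ToolDispatcher:
    def __init__(self, tools: Sequence[Tool]) -> None:
        names = [name for name, _ in tools]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            # Ambiguous names would make tool selection unreliable.
            raise ValueError(f"Duplicate tool names: {duplicates}")
        self._tools = dict(tools)

    def run(self, selected: Sequence[str], query: str) -> list[str]:
        """Invoke the selected tools in order and concatenate their results."""
        results: list[str] = []
        for name in selected:
            results.extend(self._tools[name](query))
        return results


dispatcher = ToolDispatcher(
    [
        ("vector_search", lambda q: [f"vector hit for {q}"]),
        ("cypher_search", lambda q: [f"cypher hit for {q}"]),
    ]
)
combined = dispatcher.run(["vector_search", "cypher_search"], "movies")
```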
Data Flow Through a Retriever
The following diagram summarises how a user query is routed from a caller (e.g. GraphRAG.search) down to the database and back.
flowchart TD
A[Caller: GraphRAG.search or direct retriever.search] --> B{Retriever type}
B -- Vector --> C[VectorRetriever<br/>embed query_text if needed]
B -- Hybrid --> D[HybridRetriever<br/>vector + fulltext + RRF]
B -- Text2Cypher --> E[Text2CypherRetriever<br/>LLM generates Cypher]
B -- Tools --> F[ToolsRetriever<br/>LLM picks tools]
C --> G[Execute against Neo4j<br/>via neo4j.Driver]
D --> G
E --> G
F --> H[Invoke selected tool retrievers]
H --> G
G --> I[RetrieverResultItem list]
I --> J[Optional: prompt template<br/>injects context]
J --> K[LLM answer]
Configuration Parameters
The base retriever interface exposes a uniform retriever_config dictionary that is forwarded to each implementation. The following table summarises the common parameters and where they are documented.
| Parameter | Used by | Description |
|---|---|---|
| `top_k` | VectorRetriever, HybridRetriever, HybridCypherRetriever | Maximum number of items to return from the index search. |
| `query_text` | All retrievers with an embedder | Natural-language query; embedded on the fly. |
| `query_vector` | Vector and Hybrid variants | Precomputed embedding vector; bypasses the embedder. |
| `retrieval_query` | VectorRetriever, HybridCypherRetriever | Additional Cypher appended after the index lookup, with `node` bound to the matched record. |
| `filters` | Most retrievers | WHERE-clause fragment applied at query time. |
| `database` (or `neo4j_database` at construction) | All | Targets a non-default Neo4j database. |
| `return_context` | GraphRAG.search | When True, the raw retrieved context is included in the RagResultModel. |
GraphRAG.search() in src/neo4j_graphrag/generation/graphrag.py explicitly warns that the default value of return_context will change from False to True in a future version — a deprecation noted by the community in issue #148, which argues that surfacing the retrieved context by default is more aligned with GraphRAG's value proposition.
Known Limitations and Community Discussion
Two recurring community concerns are directly tied to the retriever interface.
Synchronous-only driver. Issue #406 observes that retrievers such as Text2CypherRetriever and VectorCypherRetriever accept only the synchronous neo4j.Driver, forcing callers who have adopted the async driver to wrap calls in threads. The Retriever base class is built around driver.execute_query()-style synchronous calls, and external retrievers for Weaviate, Pinecone, and Qdrant (listed in examples/README.md) follow the same convention. Until async support lands, the recommended pattern for async applications is to run retriever calls via asyncio.to_thread or to keep an async boundary only around the LLM invocation step.
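The `asyncio.to_thread` pattern recommended above can be sketched as follows. `blocking_search` is a hypothetical stand-in for a synchronous `retriever.search(...)` call; the sleep simulates the database round-trip.

```python
import asyncio
import time


def blocking_search(query_text: str) -> list[str]:
    """Stand-in for a synchronous retriever call that blocks on I/O."""
    time.sleep(0.05)
    return [f"result for {query_text}"]


async def handle_request(query_text: str) -> list[str]:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_search, query_text)


async def main() -> list[list[str]]:
    # Two requests overlap instead of running back to back.
    return await asyncio.gather(handle_request("a"), handle_request("b"))


search_results = asyncio.run(main())
```

`asyncio.to_thread` uses the default thread pool, so very high request volumes may need an explicitly sized executor instead.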
Default return_context. As noted above, GraphRAG.search() currently defaults to return_context=False, which drops the retrieval context from the result. Issue #148 argues that the default should be True so that callers can debug retrieval quality and so that GraphRAG's main benefit — richer context — is visible out of the box. The source has already been updated with a deprecation warning, signalling that a flip to True is on the roadmap.
See Also
- GraphRAG Generation Pipeline
- Embedders and Vector Index Configuration
- Text2Cypher Prompt Templates
- External Retrievers (Weaviate, Pinecone, Qdrant)
- Knowledge Graph Builder Pipeline
Knowledge Graph Construction Pipeline
Related topics: Overview, Installation & Configuration, Generation, LLM Providers, Embeddings & Message History
Purpose and Scope
The Knowledge Graph Construction Pipeline is the subsystem of neo4j-graphrag-python responsible for transforming unstructured text or PDF documents into a populated Neo4j graph. It provides two entry points: a low-level, highly customizable Pipeline class and a higher-level SimpleKGPipeline abstraction that wires the standard components together. According to the project README, both classes accept text or PDF input and target users who want to "build knowledge graph" pipelines (Source: README.md:50-70).
The pipeline is asynchronous, component-based, and designed to plug in custom loaders, splitters, embedders, schema builders, extractors, pruners, and writers (Source: README.md:55-65).
Pipeline Architecture
Construction follows a staged data flow. Each stage is a Component that produces typed outputs consumed by the next stage:
flowchart LR
A[Document Loader<br/>PDF / Text / Custom] --> B[Text Splitter<br/>FixedSizeSplitter]
B --> C[Chunk Embedder<br/>OpenAIEmbeddings]
B --> D[SchemaFromTextExtractor<br/>optional]
C --> E[LexicalGraphBuilder]
C --> F[EntityRelationExtractor<br/>LLM-driven]
F --> G[Schema / GraphPruner]
D --> G
E --> G
G --> H[Neo4jWriter]
H --> I[(Neo4j Graph DB)]
The SimpleKGPipeline composes this default chain but each node can be replaced through the lower-level Pipeline API (Source: src/neo4j_graphrag/experimental/pipeline/kg_builder.py:60-180).
Lexical Graph Construction
LexicalGraphBuilder creates the "bookkeeping" layer that connects chunks back to their source document. It builds a Document node, one Chunk node per text chunk, a FROM_DOCUMENT (or configurable) relationship from each chunk to the document, and a NEXT_CHUNK relationship between sequential chunks (Source: src/neo4j_graphrag/experimental/components/lexical_graph.py:40-95). When document metadata is absent, only chunks and NEXT_CHUNK edges are produced (Source: src/neo4j_graphrag/experimental/components/lexical_graph.py:55-65). Entity-to-chunk linking is established later, once entities have been extracted, by LexicalGraphBuilder.process_chunk_extracted_entities (Source: src/neo4j_graphrag/experimental/components/lexical_graph.py:150-175).
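The shape of the lexical graph can be sketched with plain dictionaries and tuples. This is an illustration only: `build_lexical_graph`, the id scheme, and the property names are hypothetical, and the direction of `NEXT_CHUNK` (earlier chunk to later chunk) is an assumption.

```python
def build_lexical_graph(chunks, document_path=None):
    """Return (nodes, relationships) for a document and its ordered chunks."""
    nodes, relationships = [], []
    if document_path is not None:
        nodes.append({"id": "doc", "label": "Document", "path": document_path})
    previous_id = None
    for index, text in enumerate(chunks):
        chunk_id = f"chunk-{index}"
        nodes.append({"id": chunk_id, "label": "Chunk", "text": text, "index": index})
        if document_path is not None:
            # Link every chunk back to its source document.
            relationships.append((chunk_id, "FROM_DOCUMENT", "doc"))
        if previous_id is not None:
            # Preserve reading order between consecutive chunks.
            relationships.append((previous_id, "NEXT_CHUNK", chunk_id))
        previous_id = chunk_id
    return nodes, relationships


lex_nodes, lex_rels = build_lexical_graph(["first", "second", "third"], "report.pdf")
```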
Entity and Relation Extraction
EntityRelationExtractor is the LLM-driven heart of the pipeline. For each chunk it formats a prompt using ERExtractionTemplate (Source: src/neo4j_graphrag/generation/prompts.py:80-115) and calls the configured LLM asynchronously. Two execution modes are supported:
| Mode | Mechanism | When to use |
|---|---|---|
| V1 (default) | Prompt-based JSON, repaired via fix_invalid_json | Any LLMInterface implementation |
| V2 | Structured output against Neo4jGraph schema | LLMs declaring supports_structured_output (e.g., OpenAILLM, VertexAILLM) |
Failure handling is controlled by the OnError enum: RAISE surfaces LLMGenerationError, while IGNORE logs and continues with an empty graph for that chunk (Source: src/neo4j_graphrag/experimental/components/entity_relation_extractor.py:110-160). Concurrency is bounded by max_concurrency using an asyncio.Semaphore (Source: src/neo4j_graphrag/experimental/components/entity_relation_extractor.py:85-105).
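The two failure policies can be sketched as follows. The enum below is a local stand-in that mirrors the `RAISE` and `IGNORE` members named above, `ExtractionError` stands in for `LLMGenerationError`, and `extract_chunk` and `flaky_extractor` are hypothetical.

```python
import enum
import logging

logger = logging.getLogger(__name__)


class OnError(enum.Enum):
    """Local stand-in for the library's enum of the same name."""
    RAISE = "RAISE"
    IGNORE = "IGNORE"


class ExtractionError(Exception):
    """Local stand-in for the library's LLMGenerationError."""


def extract_chunk(chunk, extractor, on_error=OnError.RAISE):
    try:
        return extractor(chunk)
    except Exception as exc:
        if on_error is OnError.RAISE:
            raise ExtractionError(f"extraction failed for chunk {chunk!r}") from exc
        # IGNORE: log and continue with an empty graph for this chunk.
        logger.warning("Skipping chunk %r: %s", chunk, exc)
        return {"nodes": [], "relationships": []}


def flaky_extractor(chunk):
    if "bad" in chunk:
        raise ValueError("LLM returned unparseable output")
    return {"nodes": [chunk], "relationships": []}


graphs = [extract_chunk(c, flaky_extractor, OnError.IGNORE) for c in ["good", "bad"]]
```

`IGNORE` trades completeness for robustness: a long ingestion run survives one bad chunk, but the resulting graph silently lacks that chunk's entities unless the logs are monitored.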
Schema Extraction, Validation, and Pruning
SchemaFromTextExtractor builds a GraphSchema either through prompt-based extraction or via the structured GraphSchemaExtractionOutput model (Source: src/neo4j_graphrag/experimental/components/schema.py:30-90). Recent releases (v1.18.0) auto-reconcile duplicate relationship types emitted by the LLM and forbid KEY + EXISTENCE constraints on the same property (Source: README.md community context / release notes).
GraphPruner enforces the schema on the extracted graph, removing nodes with missing labels, missing required properties, or unknown labels (when additional_node_types=False). If a node is dropped, all its relationships are dropped too (Source: src/neo4j_graphrag/experimental/components/graph_pruning.py:40-90). Pattern-level cross-reference checks happen both during extraction (_extraction_filter_invalid_patterns) and at validation time (_extraction_apply_cross_reference_filters) (Source: src/neo4j_graphrag/experimental/components/schema.py:200-260).
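The pruning rules described above can be sketched over plain dictionaries. `prune` is a hypothetical function, and the schema is simplified to a mapping from label to required property names; the real component works on the library's own graph and schema models.

```python
def prune(nodes, relationships, schema, additional_node_types=False):
    """Drop nodes that violate the schema, then drop dangling relationships."""
    kept = {}
    for node in nodes:
        label = node.get("label")
        if not label:
            continue  # missing label
        if label not in schema:
            if additional_node_types:
                kept[node["id"]] = node  # unknown labels tolerated
            continue
        required = schema[label]
        if all(prop in node.get("properties", {}) for prop in required):
            kept[node["id"]] = node
    # A relationship survives only if both of its endpoints survived.
    kept_rels = [
        r for r in relationships if r["start"] in kept and r["end"] in kept
    ]
    return list(kept.values()), kept_rels


schema = {"Person": ["name"], "Company": []}
nodes = [
    {"id": "1", "label": "Person", "properties": {"name": "Ada"}},
    {"id": "2", "label": "Person", "properties": {}},   # missing required property
    {"id": "3", "label": "Robot", "properties": {}},    # label not in schema
    {"id": "4", "label": "Company", "properties": {}},
]
rels = [
    {"start": "1", "type": "WORKS_AT", "end": "4"},
    {"start": "2", "type": "WORKS_AT", "end": "4"},
    {"start": "1", "type": "OWNS", "end": "3"},
]
pruned_nodes, pruned_rels = prune(nodes, rels, schema)
```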
Persistence
Neo4jWriter (subclass of KGWriter) persists the final graph in configurable batches (batch_size, default 1000) and supports a custom neo4j_database argument (Source: src/neo4j_graphrag/experimental/components/kg_writer.py:60-100). Nodes lacking a unique key are auto-assigned a synthetic __id__ so they can be referenced from relationships (Source: src/neo4j_graphrag/experimental/components/kg_writer.py:10-40).
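Batched persistence amounts to slicing the rows into fixed-size groups and issuing one write per group. A minimal sketch of the slicing step, where `batched` is a hypothetical helper and the actual database write is omitted:

```python
from itertools import islice


def batched(items, batch_size=1000):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


rows = [{"id": i} for i in range(2500)]
batch_sizes = [len(batch) for batch in batched(rows, batch_size=1000)]
```

Smaller batches reduce the memory held per transaction; larger batches reduce the number of round-trips.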
Using `SimpleKGPipeline`
SimpleKGPipeline is the recommended entry point for new users. The example in the README shows how to define node_types, relationship_types, patterns, an OpenAIEmbeddings embedder, and an OpenAILLM, then call await kg_builder.run_async(text=...) or kg_builder.run(file_path=...) (Source: README.md:60-110).
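Before handing a schema to the pipeline, it can help to check that every pattern refers to declared types. The sketch below assumes the schema is a plain dictionary with `node_types`, `relationship_types`, and `patterns` keys, and that each pattern is a `(source, relationship, target)` triple; `invalid_patterns` is a hypothetical helper, and the example labels are made up.

```python
def invalid_patterns(schema):
    """Return the patterns that reference undeclared node or relationship types."""
    node_types = set(schema["node_types"])
    relationship_types = set(schema["relationship_types"])
    return [
        (source, rel, target)
        for source, rel, target in schema["patterns"]
        if source not in node_types
        or rel not in relationship_types
        or target not in node_types
    ]


example_schema = {
    "node_types": ["Person", "House", "Planet"],
    "relationship_types": ["PARENT_OF", "HEIR_OF", "RULES"],
    "patterns": [
        ("Person", "PARENT_OF", "Person"),
        ("House", "RULES", "Planet"),
        ("Person", "LIVES_ON", "Planet"),  # LIVES_ON is not declared
    ],
}
problems = invalid_patterns(example_schema)
```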
Key constructor parameters (Source: src/neo4j_graphrag/experimental/pipeline/kg_builder.py:80-180):
| Parameter | Description |
|---|---|
| `driver` | Neo4j driver used by the writer |
| `llm` | LLM used for schema extraction and ER extraction |
| `embedder` | Embedder used for chunk embeddings |
| `schema` | Pre-defined GraphSchema (preferred over `entities`/`relations`/`potential_schema`) |
| `from_file` | If True, expects `file_path` (PDF/Markdown); otherwise expects `text` |
| `text_splitter`, `file_loader` | Customizable pipeline stages |
| `on_error` | RAISE or IGNORE for ER extraction failures |
| `create_lexical_graph` | Toggle lexical graph creation |
The entities, relations, and potential_schema parameters are deprecated since 1.7.1 in favor of schema (Source: src/neo4j_graphrag/experimental/pipeline/kg_builder.py:30-60).
Customization and Common Pitfalls
- Custom loaders/splitters/embedders: Every stage implements the `Component` interface and can be swapped when building a `Pipeline` directly (Source: examples/README.md:60-120).
- Config-file driven pipelines: Pipelines can be reconstructed from JSON/YAML config files (Source: examples/README.md:10-40).
- Structured output coverage: Issue #493 tracks adding Anthropic structured-output support; today only `OpenAILLM` and `VertexAILLM` expose `supports_structured_output` (Source: src/neo4j_graphrag/experimental/components/entity_relation_extractor.py:115-140).
- Async drivers: Issue #406 notes that retrievers still require a synchronous Neo4j driver; the construction pipeline itself is fully async via `run_async` (Source: src/neo4j_graphrag/experimental/pipeline/kg_builder.py:60-80).
- Empty-graph warnings: If pruning removes every node, `GraphPruner` logs a warning and returns an empty `Neo4jGraph`, which will result in a no-op write (Source: src/neo4j_graphrag/experimental/components/graph_pruning.py:55-70).
- APOC requirement: Several advanced schema/constraint operations require APOC to be installed in the target Neo4j instance (Source: README.md:45-55).
See Also
- Retrievers and `GraphRAG.search()` (see the "Retrievers and Database Search" page)
- LLM and Embedder provider abstraction
- Schema configuration and constraints
Generation, LLM Providers, Embeddings & Message History
Related topics: Overview, Installation & Configuration, Retrievers and Database Search, Knowledge Graph Construction Pipeline
Overview
The neo4j-graphrag-python library separates *what* the pipeline does (retrieval, extraction, schema inference) from *how* it talks to a model. The generation/ package hosts the user-facing GraphRAG orchestrator and its prompt templates, while llm/ (alongside the embedders package) exposes provider-neutral abstractions so users can swap vendors without rewriting retrieval logic. Message history lives next to the LLM layer and is reused both by retrievers and by GraphRAG.search(). Together these modules form the generation half of any GraphRAG pipeline.
flowchart LR
User[User query] --> GR[GraphRAG.search]
GR --> Ret[Retriever]
Ret --> Ctx[Retrieved context]
Ctx --> Prompt[RagTemplate]
Prompt --> LLM[LLMInterface / LLMInterfaceV2]
LLM --> Out[Answer + optional context]
History[MessageHistory / Neo4jMessageHistory] --> GR
History --> LLM
Source: src/neo4j_graphrag/generation/graphrag.py:1-200 · src/neo4j_graphrag/llm/base.py:1-200
The GraphRAG Generation Pipeline
The main entry point is GraphRAG in src/neo4j_graphrag/generation/graphrag.py. It accepts a Retriever, an LLM (typed as LLMInterface, LLMInterfaceV2, or any LangChain chat model), and a RagTemplate. Inputs are validated through RagInitModel; failures raise RagInitializationError. Its search() method takes query_text, optional message_history, examples, retriever_config, return_context, and response_fallback; the method runs the retriever, formats the prompt, and calls the LLM in a single call. Source: src/neo4j_graphrag/generation/graphrag.py:1-150
A long-running community discussion (#148) highlights that return_context currently defaults to False, even though the docstring warns the default will change to True in a future release. Applications that rely on the retrieved-context block today should pass return_context=True explicitly to stay forward-compatible.
Prompt construction happens in src/neo4j_graphrag/generation/prompts.py. Three PromptTemplate subclasses cover the core flows:
- `RagTemplate`: the default for `GraphRAG.search`; formats the retrieved context plus the user question.
- `Text2CypherTemplate`: builds a Cypher statement from a natural-language query, given a graph schema and optional few-shot examples. It still accepts the deprecated `query` argument and emits a `DeprecationWarning`.
- `ERExtractionTemplate`: instructs the LLM to emit entity/relationship JSON for knowledge-graph construction; the output is post-processed by `fix_invalid_json`.
Customising these templates is the simplest way to tune extraction or QA behaviour — the examples/README.md "Prompts" section ships a custom_prompt.py reference. Source: examples/README.md:1-200
LLM Provider Abstraction
Every vendor adapter implements one of two abstract interfaces in src/neo4j_graphrag/llm/base.py:
- `LLMInterface` (V1): exposes `invoke`, `ainvoke`, `invoke_with_tools`, and `ainvoke_with_tools`, returning either an `LLMResponse` or a `ToolCallResponse`. The default `invoke_with_tools` body raises `NotImplementedError` for adapters that do not provide tool/function calling.
- `LLMInterfaceV2` (V2): adds `response_format` support for structured outputs, used by the extraction and schema-inference components.
Shared message and response shapes are defined in src/neo4j_graphrag/llm/types.py: LLMMessage/BaseMessage (with UserMessage and SystemMessage subclasses), LLMResponse(content, usage), plus the tool types ToolCall and ToolCallResponse.
The library ships adapters for OpenAI (src/neo4j_graphrag/llm/openai_llm.py), Ollama, Google GenAI (src/neo4j_graphrag/llm/google_genai_llm.py), Anthropic, Vertex AI, MistralAI, and Cohere. Each adapter converts the neutral Tool objects to the provider's native format inside invoke_with_tools/ainvoke_with_tools. Examples for OpenAI, VertexAI, and Ollama tool calling live under examples/customize/llms/.
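The "default body raises `NotImplementedError`" convention for tool calling can be illustrated with a small abstract base class. `BaseLLM`, `EchoLLM`, and `answer` are hypothetical names used only for this sketch; they are not the library's classes, and the method signatures are simplified.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Simplified stand-in for a provider-neutral LLM interface."""

    @abstractmethod
    def invoke(self, prompt: str) -> str:
        ...

    def invoke_with_tools(self, prompt: str, tools: list) -> str:
        # Adapters with tool/function calling override this method.
        raise NotImplementedError("This LLM provider does not support tool calling.")


class EchoLLM(BaseLLM):
    """Adapter without tool support: only invoke is implemented."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(llm: BaseLLM, prompt: str, tools: list) -> str:
    """Prefer tool calling, fall back to a plain completion if unsupported."""
    try:
        return llm.invoke_with_tools(prompt, tools)
    except NotImplementedError:
        return llm.invoke(prompt)


reply = answer(EchoLLM(), "hello", tools=[])
```

Whether a silent fallback is appropriate depends on the application; some callers will prefer to let the error propagate so a misconfigured provider is noticed early.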
Structured Output (V2) and the Anthropic Gap
Structured output is opt-in via use_structured_output=True. src/neo4j_graphrag/experimental/components/schema.py (SchemaFromTextExtractor) verifies that the chosen LLM exposes supports_structured_output before issuing a V2 call with response_format=GraphSchemaExtractionOutput. The same gate exists in src/neo4j_graphrag/experimental/components/entity_relation_extractor.py, where V2 extraction returns Neo4jGraph.model_validate_json(llm_result.content) and the docstring explicitly states that V2 is currently supported only for OpenAILLM and VertexAILLM.
Community issue #493 requests the same treatment for AnthropicLLM. Until that is merged, Anthropic users must keep use_structured_output=False, forcing the V1 ERExtractionTemplate path together with the fix_invalid_json post-processor.
Embeddings and Message History
Embedders live outside the LLM package but follow the same plug-in philosophy: the Embedder base is referenced in src/neo4j_graphrag/experimental/pipeline/config/object_config.py as a first-class ObjectConfig target, so an embedder can be specified either in code or in a YAML pipeline config. examples/README.md lists adapters for OpenAI, Azure OpenAI, VertexAI, MistralAI, Cohere, Ollama, and a custom_embeddings.py template.
Message history is passed to GraphRAG.search() (and to every LLM call) as either a list[LLMMessage] or a MessageHistory object. The example llm_with_neo4j_message_history.py demonstrates Neo4j-backed persistence. Community issue #321 points out that Neo4jMessageHistory does not currently stamp a datetime() property on stored messages, which makes chronological querying awkward — a known limitation worth working around at the application layer if ordering matters.
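One application-layer workaround for the missing timestamp is to attach a creation time to each message before it is stored. The sketch below is independent of `Neo4jMessageHistory`: `TimestampedMessage` and `in_order` are hypothetical, and how the extra field is persisted is left to the application.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimestampedMessage:
    role: str
    content: str
    # Timezone-aware UTC timestamp recorded when the message is created.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def in_order(messages):
    """Return messages sorted chronologically by their creation time."""
    return sorted(messages, key=lambda m: m.created_at)


base = datetime(2024, 1, 1, tzinfo=timezone.utc)
history = [
    TimestampedMessage("assistant", "Hi, how can I help?", base + timedelta(seconds=5)),
    TimestampedMessage("user", "Hello", base),
]
ordered = in_order(history)
```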
Common Failure Modes
- Invoking tool calling on an adapter without tool support raises `NotImplementedError` from the default `LLMInterface.invoke_with_tools` implementation. Source: src/neo4j_graphrag/llm/base.py:1-200
- Enabling `use_structured_output` on an LLM lacking V2 support raises `RuntimeError` inside the extractor (entity_relation_extractor.py).
- LLM errors surface as `LLMGenerationError`; retrieval errors propagate from the retriever, not from the LLM layer.
See Also
- Retrievers page (covers async driver discussion #406)
- Pipeline and configuration loading (`experimental/pipeline/`)
- Examples index (`examples/README.md`)
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 14 structured pitfall item(s), including 3 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/406
2. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/493
3. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/542
4. Identity risk: Identity risk requires verification
- Severity: medium
- Finding: Project evidence flags an identity risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: identity.distribution | https://github.com/neo4j/neo4j-graphrag-python
5. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/430
6. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/439
7. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/neo4j/neo4j-graphrag-python
8. Runtime risk: Runtime risk requires verification
- Severity: medium
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/446
9. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/neo4j/neo4j-graphrag-python
10. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: No demo was found for this project (no_demo).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/neo4j/neo4j-graphrag-python
11. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: No demo was found for this project (no_demo).
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/neo4j/neo4j-graphrag-python
12. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/427
Source: Doramagic discovery, validation, and Project Pack records
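The recommended check repeated across the items above (reproduce the install path in an isolated environment) can be sketched roughly as follows. This is a minimal sketch, not the project's official procedure: it assumes the PyPI distribution name `neo4j-graphrag` and the import name `neo4j_graphrag`, and it uses the standard-library `venv` module rather than `uv`. The helper names (`venv_python`, `build_check_commands`, `run_check`) and the environment directory are hypothetical and exist only for illustration.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment directory."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_check_commands(env_dir: Path, package: str = "neo4j-graphrag") -> list[list[str]]:
    """Build the commands for an isolated install check (nothing is run here)."""
    py = str(venv_python(env_dir))
    return [
        # Install the package into the throwaway environment only.
        [py, "-m", "pip", "install", package],
        # Smoke test: the import name is assumed to be neo4j_graphrag.
        [py, "-c", "import neo4j_graphrag"],
    ]


def run_check(env_dir: Path) -> None:
    """Create a fresh environment and run each command, stopping on the first failure."""
    venv.create(env_dir, with_pip=True, clear=True)
    for cmd in build_check_commands(env_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_check(Path(".venv-graphrag-check"))` would create the environment and perform the real install over the network, so it is left out of the block. Keeping command construction separate from execution makes it possible to inspect exactly what will run before touching the environment. The quickstart step itself is not covered here, since it depends on a reachable Neo4j instance and LLM credentials.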
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using neo4j-graphrag-python with real data or production workflows.
- Allow async driver in retrievers - github / github_issue
- [[FEATURE]: Add Anthropic's Structured Output feature](https://github.com/neo4j/neo4j-graphrag-python/issues/493) - github / github_issue
- [[FEATURE]: Add MistralAI Structured Output feature](https://github.com/neo4j/neo4j-graphrag-python/issues/542) - github / github_issue
- [[FEATURE]: Add possibility to truncate retrieved context](https://github.com/neo4j/neo4j-graphrag-python/issues/446) - github / github_issue
- Problem with OllamaEmbedding: "init: embeddings required but some input - github / github_issue
- [[QUESTION]: How can i customise the entity/node extracted from SimpleKGP](https://github.com/neo4j/neo4j-graphrag-python/issues/439) - github / github_issue
- Migrate VertexAIEmbeddings to use google-genai SDK - github / github_issue
- 1.18.0 - github / github_release
- 1.17.0 - github / github_release
- 1.16.1 - github / github_release
- 1.16.0 - github / github_release
- 1.15.0 - github / github_release
Source: Project Pack community evidence and pitfall evidence