Doramagic Project Pack · Human Manual

neo4j-graphrag-python

Neo4j GraphRAG for Python

Overview, Installation & Configuration

Related topics: Retrievers and Database Search, Knowledge Graph Construction Pipeline, Generation, LLM Providers, Embeddings & Message History

1. Project Purpose and Scope

The neo4j-graphrag-python library is an official Neo4j integration that enables Graph Retrieval-Augmented Generation (GraphRAG) workflows. It provides Python-first building blocks to construct knowledge graphs from unstructured data, retrieve context from Neo4j using vectors, full-text search, or Cypher, and feed the results into LLM-based question answering.

The library supports two complementary workflows:

  • Knowledge Graph Construction — extracting entities, relationships, and a lexical graph from text/PDF inputs and writing them into Neo4j.
  • Retrieval and Generation — querying the graph via vector search, hybrid search, Text2Cypher, or Cypher templates, and producing grounded answers.

A high-level representation of the system:

flowchart LR
    A[Documents / Text] --> B[SimpleKGPipeline / Pipeline]
    B --> C[(Neo4j Graph)]
    C --> D[Retriever]
    D --> E[GraphRAG / Prompt]
    E --> F[LLM Answer]
    classDef store fill:#eef,stroke:#447;
    class C store;

Construction modules such as LexicalGraphBuilder (lexical_graph.py) and Neo4jWriter (kg_writer.py) write structured graphs, while retrieval classes consume the resulting index.

2. Installation

The project ships as a standard Python package and recommends using uv for dependency management. Source: README.md

Three install profiles are supported:

| Profile | Command | Purpose |
| --- | --- | --- |
| Runtime only | uv sync (or pip install neo4j-graphrag) | Use the library in an application |
| Development | uv sync --group dev | Contributing; includes tooling |
| All extras (tests) | uv sync --all-extras | Run the full unit test suite |

Prerequisites:

  • A reachable Neo4j instance (local, Docker, or Aura).
  • For APOC-backed features, install the APOC plugin in the target Neo4j deployment (see README.md note: *“the APOC plugin … must be installed … to use this feature”*).
  • Credentials and connection URL exposed to the Python process (typically via environment variables).
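As a minimal sketch of the last prerequisite, the snippet below reads connection settings from the environment and fails early when one is missing. The variable names (NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE) and the helper names are assumptions chosen for illustration; the library does not mandate them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed variable names; adapt them to your deployment.
REQUIRED = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


@dataclass(frozen=True)
class Neo4jSettings:
    uri: str
    username: str
    password: str
    database: Optional[str] = None  # None means "use the server default"


def load_settings(env: Mapping[str, str] = os.environ) -> Neo4jSettings:
    """Read connection settings, failing early if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Neo4jSettings(
        uri=env["NEO4J_URI"],
        username=env["NEO4J_USERNAME"],
        password=env["NEO4J_PASSWORD"],
        database=env.get("NEO4J_DATABASE") or None,
    )


def make_driver(settings: Neo4jSettings):
    """Create a synchronous driver; the import is deferred until it is needed."""
    from neo4j import GraphDatabase

    return GraphDatabase.driver(settings.uri, auth=(settings.username, settings.password))
```

Keeping the settings in a small dataclass makes it easy to pass the optional database name on to components that accept neo4j_database.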

3. Core Configuration Objects

3.1 Neo4j Driver and Database

A neo4j GraphDatabase.driver instance is the entry point used by retrievers, the SimpleKGPipeline, and Neo4jWriter. The writer accepts an optional neo4j_database parameter and falls back to the server default database when omitted (Neo4jWriter source: kg_writer.py). Note: the current retrievers and pipelines accept the synchronous driver; community issue #406 (“Allow async driver in retrievers”) tracks the request for full AsyncGraphDatabase support.

3.2 LLM Providers

LLMs are abstracted behind the LLMInterface. Built-in providers include:

  • OpenAI: OpenAILLM with sync/async invoke, structured outputs (V2), and tool calling (openai_llm.py).
  • VertexAI: analogous provider with structured output support.
  • Anthropic: currently lacks structured output parity; community issue #493 tracks the request.
  • Google Gemini: GoogleGenAILLM wraps GenerateContentResponse and parses function_call parts (google_genai_llm.py).
  • Ollama: local-model provider with tool-call conversion helpers (ollama_llm.py).

Each LLM accepts model_params (temperature, max tokens, etc.) and a message_history that can be a list of LLMMessage or any MessageHistory subclass.

3.3 Embedders

Embedders live in neo4j_graphrag.embeddings and include OpenAIEmbeddings, AzureOpenAIEmbeddings, VertexAIEmbeddings, MistralAIEmbeddings, CohereEmbeddings, OllamaEmbeddings, and a SentenceTransformerEmbeddings baseline. They are consumed by retrievers for vector indexing and by the pipeline when chunk embeddings need to be persisted.

3.4 Pipeline Configuration via `ObjectConfig`

For declarative deployments, components are described through ObjectConfig, which is a Pydantic Generic[T] wrapper containing a fully-qualified class_ path and a params_ dict. This allows a pipeline to be specified entirely in YAML/JSON and instantiated via parse(...), while still being able to pass already-instantiated objects in code. The wrapper enforces type-correctness against Embedder and LLMInterface at parse time, and uses a DEFAULT_MODULE and INTERFACE to resolve the target class. Source: object_config.py:73-110
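The class_/params_ idea can be illustrated with a small standard-library sketch. This is not the library's ObjectConfig implementation: the instantiate function, its interface argument, and the use of collections as the default module are hypothetical stand-ins for the parse step, INTERFACE, and DEFAULT_MODULE described above.

```python
import importlib
from typing import Any, Dict


def instantiate(
    config: Dict[str, Any],
    interface: type = object,
    default_module: str = "collections",
) -> Any:
    """Resolve config["class_"] to a class and call it with config["params_"]."""
    path = config["class_"]
    # A bare class name (no dot) is looked up in the default module.
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name or default_module)
    instance = getattr(module, class_name)(**config.get("params_", {}))
    # Reject objects that do not satisfy the expected interface.
    if not isinstance(instance, interface):
        raise TypeError(f"{path} is not a {interface.__name__}")
    return instance


# Fully-qualified path with keyword parameters.
fraction = instantiate(
    {"class_": "fractions.Fraction", "params_": {"numerator": 3, "denominator": 4}}
)
# Bare name resolved against the default module.
counter = instantiate({"class_": "Counter", "params_": {"a": 2}})
```

The same shape maps naturally onto YAML or JSON, since a config entry is just a class path plus a dictionary of keyword arguments.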

4. Getting Started Patterns

4.1 Build a Knowledge Graph

The README’s quickstart instantiates a SimpleKGPipeline with a driver, embedder, LLM, schema hints, and from_file=True. The wrapper class internally composes an EntityRelationExtractor (see entity_relation_extractor.py) and a Neo4jWriter. Concurrency is controlled by max_concurrency through an asyncio.Semaphore, and chunks are extracted in parallel.
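The bounded-concurrency pattern mentioned above can be sketched with the standard library alone. The extract_one coroutine is a placeholder for an awaited LLM call, and the function names are illustrative rather than taken from the library.

```python
import asyncio
from typing import List


async def extract_all(chunks: List[str], max_concurrency: int = 5) -> List[str]:
    """Process chunks in parallel while capping the number of in-flight calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def extract_one(chunk: str) -> str:
        async with semaphore:  # at most max_concurrency bodies run at once
            await asyncio.sleep(0)  # stand-in for the awaited LLM request
            return chunk.upper()

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(extract_one(c) for c in chunks)))
```

A lower max_concurrency trades throughput for fewer simultaneous requests against the LLM provider, which can help with rate limits.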

4.2 Retrieval

Usage examples for the retrievers exposed by the package are indexed in examples/README.md: similarity search (similarity_search_for_vector.py, similarity_search_for_text.py), VectorCypherRetriever, HybridRetriever, HybridCypherRetriever, Text2CypherRetriever, and CypherTemplateRetriever. Each retriever exposes a search(...) method returning nodes/records plus optional context that can be forwarded to GraphRAG or to a custom prompt.

4.3 Question Answering

The GraphRAG.search(...) method is the canonical entry point for retrieval-augmented QA. Community issue #148 (“Should return_context be enabled or not by default?”) flagged a behavioral concern at graphrag.py:88; users adopting the library should explicitly verify whether return_context is required by their prompt templates.

5. Common Failure Modes and Tips

  • Synchronous driver only: As noted in #406, mixing async application code with sync drivers requires wrapping with asyncio.to_thread. Plan accordingly when embedding retrievers in async services.
  • Message history without timestamps: Neo4jMessageHistory does not currently write timestamps (community #321). If you need audit/orderable message trails, add a custom property at write time.
  • Structured output availability: Not every LLM provider exposes LLMInterfaceV2/supports_structured_output. The schema extractor (schema.py) only enables V2 when the underlying LLM advertises the capability.
  • Prompt JSON drift: The V1 entity extraction prompt (prompts.py) is JSON-string based and relies on fix_invalid_json. For production, prefer V2 structured outputs when available.
  • Constraint conflicts: Release 1.18.0 forbids applying KEY and EXISTENCE constraints on the same property (CHANGELOG notes from PR #537/#536) — align your schema constraints to avoid writer failures.

Source: https://github.com/neo4j/neo4j-graphrag-python / Human Manual

Retrievers and Database Search

Related topics: Overview, Installation & Configuration, Generation, LLM Providers, Embeddings & Message History, Knowledge Graph Construction Pipeline

Purpose and Scope

Retrievers are the data-access layer of neo4j-graphrag-python. They encapsulate every supported way of fetching relevant content from a Neo4j database — or, in the case of external vector stores, from third-party systems — so that the retrieved context can be fed into a generation step or a GraphRAG pipeline. The retriever interface is intentionally narrow: a retriever accepts a query and an optional configuration dictionary, runs a query (vector, fulltext, Cypher, or a hybrid combination), and returns a list of records transformed into RetrieverResultItem objects.

The full set of built-in retrievers exposed via the package entry point is defined in src/neo4j_graphrag/retrievers/__init__.py, which re-exports the concrete implementations. All concrete retrievers inherit from a shared abstract Retriever base class declared in src/neo4j_graphrag/retrievers/base.py. The base class performs driver validation, optionally verifies the Neo4j server version, and defines the public search() contract that downstream code (such as GraphRAG) relies on.

Built-in Retriever Types

The package ships several retrieval strategies, each targeting a different retrieval pattern.

Vector Retriever

VectorRetriever (implemented in src/neo4j_graphrag/retrievers/vector.py) performs k-Nearest-Neighbor similarity search against a Neo4j vector index. It accepts either a precomputed query_vector or a query_text together with an Embedder instance, and supports an optional retrieval_query Cypher fragment for post-filtering or expansion. A simple text-to-vector search example is documented in examples/README.md under retrieve/similarity_search_for_text.py.

Hybrid and HybridCypher Retrievers

HybridRetriever and HybridCypherRetriever (in src/neo4j_graphrag/retrievers/hybrid.py) combine vector similarity with a Neo4j fulltext index, fusing the two result lists using Reciprocal Rank Fusion (RRF) before applying any retrieval_query post-processing. HybridCypherRetriever is a strict superset that always requires a Cypher post-query, exposing the upstream node variable so callers can traverse related entities in the same round-trip. As stated in the docstring of src/neo4j_graphrag/retrievers/hybrid.py, when an embedder is supplied at construction time the retriever accepts query_text; otherwise the caller must pass a precomputed query_vector.

Text2Cypher Retriever

Text2CypherRetriever (in src/neo4j_graphrag/retrievers/text2cypher.py) delegates natural-language question understanding to an LLM. It uses the Text2CypherTemplate defined in src/neo4j_graphrag/generation/prompts.py to render a schema-aware prompt, generates a Cypher statement, and executes it. The template accepts schema and optional examples and explicitly instructs the LLM not to wrap the Cypher in triple backticks, which is critical because the generated query is executed directly against the database.
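Because models do not always follow the no-backticks instruction, an application may want a defensive cleanup step before running generated Cypher. The helper below is a hypothetical sketch, not a function from the library; it only removes a single wrapping code fence and leaves everything else untouched.

```python
import re

FENCE = "`" * 3  # built at runtime so this sketch contains no literal fence
_FENCED = re.compile(
    rf"^\s*{FENCE}[A-Za-z]*[ \t]*\n(.*?)\n?[ \t]*{FENCE}\s*$",
    re.DOTALL,
)


def strip_code_fence(generated: str) -> str:
    """Return the query without a surrounding code fence, if one is present."""
    match = _FENCED.match(generated)
    return (match.group(1) if match else generated).strip()
```

This does not validate the Cypher itself; it only normalizes the wrapper, so any safety checks on the generated query still need to happen separately.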

Tools Retriever

ToolsRetriever (in src/neo4j_graphrag/retrievers/tools_retriever.py) acts as an LLM-driven orchestrator over a list of tool objects. Each tool is typically obtained by calling .convert_to_tool() on a child retriever (for example, a VectorRetriever converted into a vector_search tool and a Text2CypherRetriever converted into a cypher_search tool). The LLM analyzes the user query and selects which tools to invoke, after which ToolsRetriever collects and returns the combined results. Duplicate tool names are rejected at construction time with a ValueError, and the class explicitly disables Neo4j version verification because it does not issue its own queries.
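The orchestration idea, including the duplicate-name check, can be sketched with plain Python objects. Tool and ToolRegistry here are hypothetical miniatures, not the library's classes, and the list of selected names stands in for the LLM's tool choice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:  # hypothetical minimal tool, not the library's Tool class
    name: str
    run: Callable[[str], List[str]]


class ToolRegistry:
    def __init__(self, tools: List[Tool]) -> None:
        counts = Counter(tool.name for tool in tools)
        duplicates = sorted(name for name, count in counts.items() if count > 1)
        if duplicates:
            # Ambiguous names would make the selection step unreliable.
            raise ValueError(f"Duplicate tool names: {', '.join(duplicates)}")
        self._tools: Dict[str, Tool] = {tool.name: tool for tool in tools}

    def invoke(self, selected: List[str], query: str) -> List[str]:
        """Run the selected tools in order and concatenate their results."""
        results: List[str] = []
        for name in selected:
            results.extend(self._tools[name].run(query))
        return results
```

Rejecting duplicates at construction time means a misconfiguration is caught before any query is sent to the model.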

Data Flow Through a Retriever

The following diagram summarises how a user query is routed from a caller (e.g. GraphRAG.search) down to the database and back.

flowchart TD
    A[Caller: GraphRAG.search or direct retriever.search] --> B{Retriever type}
    B -- Vector --> C[VectorRetriever<br/>embed query_text if needed]
    B -- Hybrid --> D[HybridRetriever<br/>vector + fulltext + RRF]
    B -- Text2Cypher --> E[Text2CypherRetriever<br/>LLM generates Cypher]
    B -- Tools --> F[ToolsRetriever<br/>LLM picks tools]
    C --> G[Execute against Neo4j<br/>via neo4j.Driver]
    D --> G
    E --> G
    F --> H[Invoke selected tool retrievers]
    H --> G
    G --> I[RetrieverResultItem list]
    I --> J[Optional: prompt template<br/>injects context]
    J --> K[LLM answer]

Configuration Parameters

The base retriever interface exposes a uniform retriever_config dictionary that is forwarded to each implementation. The following table summarises the common parameters and where they are documented.

| Parameter | Used by | Description |
| --- | --- | --- |
| top_k | VectorRetriever, HybridRetriever, HybridCypherRetriever | Maximum number of items to return from the index search. |
| query_text | All retrievers with an embedder | Natural-language query; embedded on the fly. |
| query_vector | Vector and Hybrid variants | Precomputed embedding vector; bypasses the embedder. |
| retrieval_query | VectorRetriever, HybridCypherRetriever | Additional Cypher appended after the index lookup, with node bound to the matched record. |
| filters | Most retrievers | WHERE-clause fragment applied at query time. |
| database (or neo4j_database at construction) | All | Targets a non-default Neo4j database. |
| return_context | GraphRAG.search | When True, the raw retrieved context is included in the RagResultModel. |

GraphRAG.search() in src/neo4j_graphrag/generation/graphrag.py explicitly warns that the default value of return_context will change from False to True in a future version — a deprecation noted by the community in issue #148, which argues that surfacing the retrieved context by default is more aligned with GraphRAG's value proposition.

Known Limitations and Community Discussion

Two recurring community concerns are directly tied to the retriever interface.

Synchronous-only driver. Issue #406 observes that retrievers such as Text2CypherRetriever and VectorCypherRetriever accept only the synchronous neo4j.Driver, forcing callers who have adopted the async driver to wrap calls in threads. The Retriever base class is built around driver.execute_query()-style synchronous calls, and external retrievers for Weaviate, Pinecone, and Qdrant (listed in examples/README.md) follow the same convention. Until async support lands, the recommended pattern for async applications is to run retriever calls via asyncio.to_thread or to keep an async boundary only around the LLM invocation step.
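The asyncio.to_thread pattern can be sketched as follows. blocking_search is a stand-in for a synchronous retriever.search call; the wrapper name search_async is illustrative and not part of the library.

```python
import asyncio
import time
from typing import Any, Callable, List


def blocking_search(query_text: str, top_k: int = 5) -> List[str]:
    """Stand-in for a synchronous retriever.search call."""
    time.sleep(0.01)  # simulates the blocking database round-trip
    return [f"{query_text}-{i}" for i in range(top_k)]


async def search_async(search: Callable[..., Any], **kwargs: Any) -> Any:
    # Runs the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(search, **kwargs)
```

asyncio.to_thread requires Python 3.9 or later. The call still occupies a thread for its full duration, so this keeps the event loop responsive but does not make the underlying driver call asynchronous.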

Default return_context. As noted above, GraphRAG.search() currently defaults to return_context=False, which drops the retrieval context from the result. Issue #148 argues that the default should be True so that callers can debug retrieval quality and so that GraphRAG's main benefit — richer context — is visible out of the box. The source has already been updated with a deprecation warning, signalling that a flip to True is on the roadmap.

See Also

  • GraphRAG Generation Pipeline
  • Embedders and Vector Index Configuration
  • Text2Cypher Prompt Templates
  • External Retrievers (Weaviate, Pinecone, Qdrant)
  • Knowledge Graph Builder Pipeline


Knowledge Graph Construction Pipeline

Related topics: Overview, Installation & Configuration, Generation, LLM Providers, Embeddings & Message History

Purpose and Scope

The Knowledge Graph Construction Pipeline is the subsystem of neo4j-graphrag-python responsible for transforming unstructured text or PDF documents into a populated Neo4j graph. It provides two entry points: a low-level, highly customizable Pipeline class and a higher-level SimpleKGPipeline abstraction that wires the standard components together. According to the project README, both classes accept text or PDF input and target users who want to build knowledge graph pipelines (Source: README.md:50-70).

The pipeline is asynchronous, component-based, and designed to plug in custom loaders, splitters, embedders, schema builders, extractors, pruners, and writers (Source: README.md:55-65).

Pipeline Architecture

Construction follows a staged data flow. Each stage is a Component that produces typed outputs consumed by the next stage:

flowchart LR
    A[Document Loader<br/>PDF / Text / Custom] --> B[Text Splitter<br/>FixedSizeSplitter]
    B --> C[Chunk Embedder<br/>OpenAIEmbeddings]
    B --> D[SchemaFromTextExtractor<br/>optional]
    C --> E[LexicalGraphBuilder]
    C --> F[EntityRelationExtractor<br/>LLM-driven]
    F --> G[Schema / GraphPruner]
    D --> G
    E --> G
    G --> H[Neo4jWriter]
    H --> I[(Neo4j Graph DB)]

The SimpleKGPipeline composes this default chain but each node can be replaced through the lower-level Pipeline API (Source: src/neo4j_graphrag/experimental/pipeline/kg_builder.py:60-180).

Lexical Graph Construction

LexicalGraphBuilder creates the "bookkeeping" layer that connects chunks back to their source document. It builds a Document node, one Chunk node per text chunk, a FROM_DOCUMENT (or configurable) relationship from each chunk to the document, and a NEXT_CHUNK relationship between sequential chunks (Source: src/neo4j_graphrag/experimental/components/lexical_graph.py:40-95). When document metadata is absent, only chunks and NEXT_CHUNK edges are produced (Source: src/neo4j_graphrag/experimental/components/lexical_graph.py:55-65). Entity-to-chunk linking is established later by EntityRelationExtractor.process_chunk_extracted_entities (Source: src/neo4j_graphrag/experimental/components/lexical_graph.py:150-175).
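The shape of the lexical graph can be illustrated with plain dictionaries and tuples. This is a sketch under assumptions, not the LexicalGraphBuilder code: the id scheme, the property names, and the direction of NEXT_CHUNK (previous chunk to next chunk) are chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

Node = Dict[str, object]
Rel = Tuple[str, str, str]  # (start_id, relationship_type, end_id)


def build_lexical_graph(
    chunks: List[str], document_path: Optional[str] = None
) -> Tuple[List[Node], List[Rel]]:
    nodes: List[Node] = []
    rels: List[Rel] = []
    if document_path is not None:
        nodes.append({"id": "doc", "label": "Document", "path": document_path})
    previous_id: Optional[str] = None
    for index, text in enumerate(chunks):
        chunk_id = f"chunk-{index}"
        nodes.append({"id": chunk_id, "label": "Chunk", "index": index, "text": text})
        if document_path is not None:
            # Link each chunk back to its source document.
            rels.append((chunk_id, "FROM_DOCUMENT", "doc"))
        if previous_id is not None:
            # Preserve reading order between consecutive chunks.
            rels.append((previous_id, "NEXT_CHUNK", chunk_id))
        previous_id = chunk_id
    return nodes, rels
```

With three chunks and a document path this yields four nodes and five relationships; without document metadata only the chunk nodes and NEXT_CHUNK edges remain.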

Entity and Relation Extraction

EntityRelationExtractor is the LLM-driven heart of the pipeline. For each chunk it formats a prompt using ERExtractionTemplate (Source: src/neo4j_graphrag/generation/prompts.py:80-115) and calls the configured LLM asynchronously. Two execution modes are supported:

| Mode | Mechanism | When to use |
| --- | --- | --- |
| V1 (default) | Prompt-based JSON, repaired via fix_invalid_json | Any LLMInterface implementation |
| V2 | Structured output against Neo4jGraph schema | LLMs declaring supports_structured_output (e.g., OpenAILLM, VertexAILLM) |

Failure handling is controlled by the OnError enum: RAISE surfaces LLMGenerationError, while IGNORE logs and continues with an empty graph for that chunk (Source: src/neo4j_graphrag/experimental/components/entity_relation_extractor.py:110-160). Concurrency is bounded by max_concurrency using an asyncio.Semaphore (Source: src/neo4j_graphrag/experimental/components/entity_relation_extractor.py:85-105).
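The RAISE versus IGNORE behaviour can be sketched independently of the library. The OnError enum below mirrors the two values named above, while ExtractionError and extract_chunk are hypothetical stand-ins for LLMGenerationError and the extractor's per-chunk step.

```python
import asyncio
import enum
import logging
from typing import Awaitable, Callable, Dict, List

logger = logging.getLogger(__name__)

Graph = Dict[str, List[dict]]


class OnError(enum.Enum):
    RAISE = "RAISE"
    IGNORE = "IGNORE"


class ExtractionError(Exception):
    """Hypothetical stand-in for the library's LLMGenerationError."""


async def extract_chunk(
    chunk: str,
    extractor: Callable[[str], Awaitable[Graph]],
    on_error: OnError,
) -> Graph:
    try:
        return await extractor(chunk)
    except ExtractionError:
        if on_error is OnError.RAISE:
            raise  # surface the failure to the caller
        logger.warning("Extraction failed for chunk %r; using an empty graph", chunk)
        return {"nodes": [], "relationships": []}
```

IGNORE keeps a long ingestion run alive when a single chunk fails, at the cost of silently missing entities for that chunk, so the log output is worth monitoring.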

Schema Extraction, Validation, and Pruning

SchemaFromTextExtractor builds a GraphSchema either through prompt-based extraction or via the structured GraphSchemaExtractionOutput model (Source: src/neo4j_graphrag/experimental/components/schema.py:30-90). Recent releases (v1.18.0) auto-reconcile duplicate relationship types emitted by the LLM and forbid KEY + EXISTENCE constraints on the same property (Source: README.md community context / release notes).

GraphPruner enforces the schema on the extracted graph, removing nodes with missing labels, missing required properties, or unknown labels (when additional_node_types=False). If a node is dropped, all its relationships are dropped too (Source: src/neo4j_graphrag/experimental/components/graph_pruning.py:40-90). Pattern-level cross-reference checks happen both during extraction (_extraction_filter_invalid_patterns) and at validation time (_extraction_apply_cross_reference_filters) (Source: src/neo4j_graphrag/experimental/components/schema.py:200-260).
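The pruning rules can be sketched on plain dictionaries. This is an illustration of the described behaviour, not the GraphPruner implementation: the prune function, the dictionary layout, and the required mapping (label to required property names) are assumptions.

```python
from typing import Dict, List, Set, Tuple

Node = Dict[str, object]
Rel = Dict[str, str]


def prune(
    nodes: List[Node],
    relationships: List[Rel],
    required: Dict[str, Set[str]],
    additional_node_types: bool = False,
) -> Tuple[List[Node], List[Rel]]:
    kept: List[Node] = []
    for node in nodes:
        label = node.get("label")
        if not label:
            continue  # missing label: always dropped
        if label not in required:
            if additional_node_types:
                kept.append(node)  # unknown labels tolerated only when allowed
            continue
        if required[label] <= set(node.get("properties", {})):
            kept.append(node)  # all required properties are present
    kept_ids = {node["id"] for node in kept}
    # A relationship survives only if both of its endpoints survived.
    kept_rels = [
        rel for rel in relationships
        if rel["start"] in kept_ids and rel["end"] in kept_ids
    ]
    return kept, kept_rels
```

Filtering relationships after nodes ensures the writer never receives an edge that points at a node that was removed.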

Persistence

Neo4jWriter (subclass of KGWriter) persists the final graph in configurable batches (batch_size, default 1000) and supports a custom neo4j_database argument (Source: src/neo4j_graphrag/experimental/components/kg_writer.py:60-100). Nodes lacking a unique key are auto-assigned a synthetic __id__ so they can be referenced from relationships (Source: src/neo4j_graphrag/experimental/components/kg_writer.py:10-40).
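Batched persistence and synthetic identifiers can be sketched with the standard library. The helper names and the use of UUIDs for the __id__ value are assumptions for illustration; the writer's actual id scheme may differ.

```python
import itertools
import uuid
from typing import Dict, Iterable, Iterator, List


def in_batches(items: Iterable[dict], batch_size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(itertools.islice(iterator, batch_size)):
        yield batch


def ensure_ids(nodes: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Give every node an __id__ so relationships can reference it."""
    for node in nodes:
        node.setdefault("__id__", str(uuid.uuid4()))  # assumed id format
    return nodes
```

Each yielded batch would correspond to one write round-trip, which bounds the size of individual queries when ingesting large graphs.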

Using `SimpleKGPipeline`

SimpleKGPipeline is the recommended entry point for new users. The example in the README shows how to define node_types, relationship_types, patterns, an OpenAIEmbeddings embedder, and an OpenAILLM, then call await kg_builder.run_async(text=...) or kg_builder.run(file_path=...) (Source: README.md:60-110).

Key constructor parameters (Source: src/neo4j_graphrag/experimental/pipeline/kg_builder.py:80-180):

| Parameter | Description |
| --- | --- |
| driver | Neo4j driver used by the writer |
| llm | LLM used for schema extraction and ER extraction |
| embedder | Embedder used for chunk embeddings |
| schema | Pre-defined GraphSchema (preferred over entities/relations/potential_schema) |
| from_file | If True, expects file_path (PDF/Markdown); otherwise expects text |
| text_splitter, file_loader | Customizable pipeline stages |
| on_error | RAISE or IGNORE for ER extraction failures |
| create_lexical_graph | Toggle lexical graph creation |

The entities, relations, and potential_schema parameters are deprecated since 1.7.1 in favor of schema (Source: src/neo4j_graphrag/experimental/pipeline/kg_builder.py:30-60).
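A hedged sketch of wiring SimpleKGPipeline with the schema parameter follows. It is modelled on the README quickstart rather than copied from it: the node types, the model names ("gpt-4o", "text-embedding-3-large"), and the helper names are assumptions, and the imports are deferred so the module loads even when the packages are not installed. Check the constructor arguments against the installed version before relying on it.

```python
import asyncio

# Illustrative schema; replace with the types relevant to your documents.
SCHEMA = {
    "node_types": ["Person", "Organization"],
    "relationship_types": ["WORKS_FOR"],
    "patterns": [("Person", "WORKS_FOR", "Organization")],
}


def build_pipeline(uri: str, user: str, password: str):
    """Assemble the pipeline; requires neo4j and neo4j-graphrag with OpenAI support."""
    from neo4j import GraphDatabase
    from neo4j_graphrag.embeddings import OpenAIEmbeddings
    from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
    from neo4j_graphrag.llm import OpenAILLM

    driver = GraphDatabase.driver(uri, auth=(user, password))
    return SimpleKGPipeline(
        llm=OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0}),
        driver=driver,
        embedder=OpenAIEmbeddings(model="text-embedding-3-large"),
        schema=SCHEMA,
        on_error="IGNORE",
        from_file=False,  # the pipeline will expect text=..., not file_path=...
    )


async def ingest(text: str, uri: str, user: str, password: str):
    kg_builder = build_pipeline(uri, user, password)
    return await kg_builder.run_async(text=text)
```

A caller would typically run asyncio.run(ingest(...)) from synchronous code and close the driver once ingestion is finished.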

Customization and Common Pitfalls

See Also

  • Retrievers and GraphRAG.search() (see "Retrievers and Database Search" page)
  • LLM and Embedder provider abstraction
  • Schema configuration and constraints


Generation, LLM Providers, Embeddings & Message History

Related topics: Overview, Installation & Configuration, Retrievers and Database Search, Knowledge Graph Construction Pipeline

Overview

The neo4j-graphrag-python library separates *what* the pipeline does (retrieval, extraction, schema inference) from *how* it talks to a model. The generation/ package hosts the user-facing GraphRAG orchestrator and its prompt templates, while llm/ (alongside the embedders package) exposes provider-neutral abstractions so users can swap vendors without rewriting retrieval logic. Message history lives next to the LLM layer and is reused both by retrievers and by GraphRAG.search(). Together these modules form the generation half of any GraphRAG pipeline.

flowchart LR
    User[User query] --> GR[GraphRAG.search]
    GR --> Ret[Retriever]
    Ret --> Ctx[Retrieved context]
    Ctx --> Prompt[RagTemplate]
    Prompt --> LLM[LLMInterface / LLMInterfaceV2]
    LLM --> Out[Answer + optional context]
    History[MessageHistory / Neo4jMessageHistory] --> GR
    History --> LLM

Source: src/neo4j_graphrag/generation/graphrag.py:1-200 · src/neo4j_graphrag/llm/base.py:1-200

The GraphRAG Generation Pipeline

The main entry point is GraphRAG in src/neo4j_graphrag/generation/graphrag.py. It accepts a Retriever, an LLM (typed as LLMInterface, LLMInterfaceV2, or any LangChain chat model), and a RagTemplate. Inputs are validated through RagInitModel; failures raise RagInitializationError. Its search() method takes query_text, optional message_history, examples, retriever_config, return_context, and response_fallback; the method runs the retriever, formats the prompt, and calls the LLM in a single call. Source: src/neo4j_graphrag/generation/graphrag.py:1-150

A long-running community discussion (#148) highlights that return_context currently defaults to False, even though the docstring warns the default will change to True in a future release. Applications that rely on the retrieved-context block today should pass return_context=True explicitly to stay forward-compatible.
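Passing the flag explicitly might look like the sketch below. The ask helper is hypothetical; it accepts any object with a GraphRAG-style search method, and the answer and retriever_result attribute names on the result are assumptions about the result model that should be checked against the installed version.

```python
from typing import Any, Tuple


def ask(rag: Any, question: str, top_k: int = 5) -> Tuple[Any, Any]:
    """Query a GraphRAG-like object and return (answer, retrieved context)."""
    result = rag.search(
        query_text=question,
        retriever_config={"top_k": top_k},
        return_context=True,  # explicit, so a future default change has no effect
    )
    # Attribute names are assumed; adjust them if your version differs.
    return result.answer, result.retriever_result
```

Being explicit costs one keyword argument and removes any dependence on which default the installed release happens to use.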

Prompt construction happens in src/neo4j_graphrag/generation/prompts.py. Three PromptTemplate subclasses cover the core flows:

  • RagTemplate — the default for GraphRAG.search, formats the retrieved context plus the user question.
  • Text2CypherTemplate — builds a Cypher statement from a natural-language query, given a graph schema and optional few-shot examples. It still accepts the deprecated query argument and emits a DeprecationWarning.
  • ERExtractionTemplate — instructs the LLM to emit entity/relationship JSON for knowledge-graph construction; the output is post-processed by fix_invalid_json.
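Why prompt-based JSON needs post-processing can be shown with a small fallback parser. This sketch is not the algorithm used by fix_invalid_json; parse_llm_json is a hypothetical helper that only handles the common case of extra prose around an otherwise valid JSON object.

```python
import json
from typing import Any


def parse_llm_json(raw: str) -> Any:
    """Parse model output as JSON, tolerating text before or after the object."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise  # nothing that looks like an object; re-raise the original error
        # Retry on the outermost brace-delimited span only.
        return json.loads(raw[start:end + 1])
```

Structured outputs avoid this class of problem entirely, which is why they are preferable when the provider supports them.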

Customising these templates is the simplest way to tune extraction or QA behaviour — the examples/README.md "Prompts" section ships a custom_prompt.py reference. Source: examples/README.md:1-200

LLM Provider Abstraction

Every vendor adapter implements one of two abstract interfaces in src/neo4j_graphrag/llm/base.py:

  • LLMInterface (V1) — exposes invoke, ainvoke, invoke_with_tools, and ainvoke_with_tools, returning either an LLMResponse or a ToolCallResponse. The default invoke_with_tools body raises NotImplementedError for adapters that do not provide tool/function calling.
  • LLMInterfaceV2 (V2) — adds response_format support for structured outputs, used by the extraction and schema-inference components.

Shared message and response shapes are defined in src/neo4j_graphrag/llm/types.py: LLMMessage/BaseMessage (with UserMessage and SystemMessage subclasses), LLMResponse(content, usage), plus the tool types ToolCall and ToolCallResponse.

The library ships adapters for OpenAI (src/neo4j_graphrag/llm/openai_llm.py), Ollama, Google GenAI (src/neo4j_graphrag/llm/google_genai_llm.py), Anthropic, Vertex AI, MistralAI, and Cohere. Each adapter converts the neutral Tool objects to the provider's native format inside invoke_with_tools/ainvoke_with_tools. Examples for OpenAI, VertexAI, and Ollama tool calling live under examples/customize/llms/.

Structured Output (V2) and the Anthropic Gap

Structured output is opt-in via use_structured_output=True. src/neo4j_graphrag/experimental/components/schema.py (SchemaFromTextExtractor) verifies that the chosen LLM exposes supports_structured_output before issuing a V2 call with response_format=GraphSchemaExtractionOutput. The same gate exists in src/neo4j_graphrag/experimental/components/entity_relation_extractor.py, where V2 extraction returns Neo4jGraph.model_validate_json(llm_result.content) and the docstring explicitly states that V2 is currently supported only for OpenAILLM and VertexAILLM.

Community issue #493 requests the same treatment for AnthropicLLM. Until that is merged, Anthropic users must keep use_structured_output=False, forcing the V1 ERExtractionTemplate path together with the fix_invalid_json post-processor.
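The capability gate can be sketched as a small decision function. The function name, the returned mode strings, and the choice of RuntimeError are assumptions for illustration; only the supports_structured_output attribute and the use_structured_output flag come from the text above.

```python
from typing import Any


def choose_extraction_mode(llm: Any, use_structured_output: bool) -> str:
    """Return "v2" for structured output, otherwise "v1" (prompt-based JSON)."""
    if not use_structured_output:
        return "v1"
    # Providers that do not declare the capability are treated as unsupported.
    if not getattr(llm, "supports_structured_output", False):
        raise RuntimeError(
            f"{type(llm).__name__} does not support structured output; "
            "set use_structured_output=False"
        )
    return "v2"


class StructuredLLM:  # hypothetical provider that declares the capability
    supports_structured_output = True


class PlainLLM:  # hypothetical provider without the capability
    pass
```

Failing at configuration time is usually preferable to discovering the mismatch after the first chunk has been sent to the model.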

Embeddings and Message History

Embedders live outside the LLM package but follow the same plug-in philosophy: the Embedder base is referenced in src/neo4j_graphrag/experimental/pipeline/config/object_config.py as a first-class ObjectConfig target, so an embedder can be specified either in code or in a YAML pipeline config. examples/README.md lists adapters for OpenAI, Azure OpenAI, VertexAI, MistralAI, Cohere, Ollama, and a custom_embeddings.py template.

Message history is passed to GraphRAG.search() (and to every LLM call) as either a list[LLMMessage] or a MessageHistory object. The example llm_with_neo4j_message_history.py demonstrates Neo4j-backed persistence. Community issue #321 points out that Neo4jMessageHistory does not currently stamp a datetime() property on stored messages, which makes chronological querying awkward — a known limitation worth working around at the application layer if ordering matters.
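One application-layer workaround is to keep a timestamp next to each stored message and strip it before handing the history to the LLM. The sketch below assumes messages are role/content dictionaries; the created_at field and both helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def stamp(role: str, content: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build a stored message carrying a UTC timestamp."""
    moment = now or datetime.now(timezone.utc)
    # Fixed-width ISO format keeps string ordering chronological.
    return {
        "role": role,
        "content": content,
        "created_at": moment.isoformat(timespec="microseconds"),
    }


def to_llm_messages(stored: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order messages by timestamp and drop the extra field."""
    ordered = sorted(stored, key=lambda message: message["created_at"])
    return [{"role": m["role"], "content": m["content"]} for m in ordered]
```

Sorting on the string works here only because every timestamp is UTC and uses the same fixed-width format.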

Common Failure Modes

  • Invoking tool calling on an adapter without tool support raises NotImplementedError from the default LLMInterface.invoke_with_tools implementation. Source: src/neo4j_graphrag/llm/base.py:1-200
  • Enabling use_structured_output on an LLM lacking V2 support raises RuntimeError inside the extractor (entity_relation_extractor.py).
  • LLM errors surface as LLMGenerationError; retrieval errors propagate from the retriever, not from the LLM layer.

See Also

  • Retrievers page (covers async driver discussion #406)
  • Pipeline and configuration loading (experimental/pipeline/)
  • Examples index (examples/README.md)


Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 14 structured pitfall item(s), including 3 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

1. Installation risk: Installation risk requires verification

  • Severity: high
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/406

2. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/493

3. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/542

4. Identity risk: Identity risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an identity risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: identity.distribution | https://github.com/neo4j/neo4j-graphrag-python

5. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/430

6. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/439

7. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/neo4j/neo4j-graphrag-python

8. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/446

9. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/neo4j/neo4j-graphrag-python

10. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: No demo was recorded for this project (flag: no_demo).
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/neo4j/neo4j-graphrag-python

11. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: No demo was recorded for this project (flag: no_demo).
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/neo4j/neo4j-graphrag-python

12. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | https://github.com/neo4j/neo4j-graphrag-python/issues/427

Source: Doramagic discovery, validation, and Project Pack records
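
Every item above recommends the same check: reproduce the install and quickstart path in an isolated environment. A minimal sketch of that check follows, using Python's standard `venv` module. It assumes the PyPI distribution is named `neo4j-graphrag` and imports as `neo4j_graphrag`; the helper names (`venv_python`, `install_command`, `run_check`) are illustrative and are not part of the library or of Doramagic.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment directory."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(env_dir: Path, package: str = "neo4j-graphrag") -> list[str]:
    """Build the pip command that installs `package` into the environment.

    The default package name is an assumption about the PyPI distribution.
    """
    return [str(venv_python(env_dir)), "-m", "pip", "install", package]


def run_check(env_dir: Path) -> None:
    """Create a fresh environment, install the package, and try importing it.

    Needs network access; raises CalledProcessError if any step fails.
    """
    # clear=True wipes any previous contents so each run starts clean.
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
    subprocess.run(install_command(env_dir), check=True)
    # Import smoke test only; it does not contact Neo4j or any LLM provider.
    subprocess.run(
        [str(venv_python(env_dir)), "-c", "import neo4j_graphrag"],
        check=True,
    )
```

Calling `run_check(Path(tempfile.mkdtemp()))` (after `import tempfile`) keeps the experiment out of any existing project environment. It only confirms that the package installs and imports, so the quickstart itself still has to be exercised against a disposable Neo4j instance.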

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

  • Sources: 12 (count of project-level external discussion links exposed on this manual page)
  • Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using neo4j-graphrag-python with real data or production workflows.

  • Allow async driver in retrievers - github / github_issue
  • [[FEATURE]: Add Anthropic's Structured Output feature](https://github.com/neo4j/neo4j-graphrag-python/issues/493) - github / github_issue
  • [[FEATURE]: Add MistralAI Structured Output feature](https://github.com/neo4j/neo4j-graphrag-python/issues/542) - github / github_issue
  • [[FEATURE]: Add possibility to truncate retrieved context](https://github.com/neo4j/neo4j-graphrag-python/issues/446) - github / github_issue
  • Problem with OllamaEmbedding: "init: embeddings required but some input - github / github_issue
  • [[QUESTION]: How can i customise the entity/node extracted from SimpleKGP](https://github.com/neo4j/neo4j-graphrag-python/issues/439) - github / github_issue
  • Migrate VertexAIEmbeddings to use google-genai SDK - github / github_issue
  • 1.18.0 - github / github_release
  • 1.17.0 - github / github_release
  • 1.16.1 - github / github_release
  • 1.16.0 - github / github_release
  • 1.15.0 - github / github_release

Source: Project Pack community evidence and pitfall evidence