Doramagic Project Pack · Human Manual

graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

GraphRAG Overview and Architecture

Related topics: Indexing Pipeline, Data Flow & Incremental Updates, Query Engine and Search Methods, Configuration, LLM Integration, Storage & Extensibility

Purpose and Scope

GraphRAG is a data pipeline and transformation suite designed to extract meaningful, structured information from unstructured text using large language models (LLMs). The project implements a knowledge-graph–based memory layer that augments LLM reasoning over private datasets, as described in the upstream Microsoft Research blog post and the GraphRAG arXiv paper. Source: README.md:1-9.

The repository is a methodology demonstration, not an officially supported Microsoft product, and indexing is intentionally treated as an expensive operation that should be started on small data first. Source: README.md:17-19. The codebase is published as a monorepo of several Python packages (each with its own README) plus a unified-search-app demo that consumes the resulting index.

Repository and Package Architecture

The monorepo separates concerns into narrowly-scoped libraries that can be composed at runtime. The top-level graphrag package exposes the user-facing API and CLI; the remaining packages provide pluggable, factory-based building blocks.

graph TB
    subgraph "graphrag (main package)"
        API[api: index, query, prompt_tune]
        CLI[cli: graphrag init/index/query]
        CFG[config: GraphRagConfig + load_config]
        IDX[index: run_pipeline, workflows]
    end

    subgraph "Supporting packages"
        CHUNK[graphrag-chunking]
        LLM[graphrag-llm: completion]
        STORE[graphrag-storage]
        CACHE[graphrag-cache]
        IN[graphrag-input]
        COMMON[graphrag-common: factory + config]
    end

    subgraph "Reference consumers"
        APP[unified-search-app]
    end

    API --> IDX
    API --> CFG
    CLI --> API
    IDX --> CHUNK
    IDX --> LLM
    IDX --> STORE
    IDX --> CACHE
    IDX --> IN
    API --> LLM
    APP --> STORE
    APP --> API
    COMMON -. provides .-> CHUNK
    COMMON -. provides .-> LLM
    COMMON -. provides .-> STORE
    COMMON -. provides .-> CACHE

Key architectural conventions:

Core APIs and Pipelines

The public surface of the main package is concentrated in graphrag.api, with three entry points: indexing, query, and prompt tuning. Source: packages/graphrag/graphrag/api/__init__.py:11-33.

Indexing API

build_index(config, method, is_update_run, callbacks, input_documents, ...) runs a pipeline under a GraphRagConfig, choosing an IndexingMethod (e.g., Standard) and a PipelineFactory-resolved workflow. Source: packages/graphrag/graphrag/api/index.py:21-49. The function returns a list of PipelineRunResult records, allowing callers to inspect per-stage output. The is_update_run flag is the existing hook for the highly requested incremental-indexing workflow tracked in community discussion #741 ("Incremental indexing (adding new content)"), which has 35 comments and is currently in the design stage. Source: packages/graphrag/graphrag/api/index.py:30-33.

Query API

graphrag.api.query exposes six search entry points: global_search, global_search_streaming, local_search, local_search_streaming, drift_search, drift_search_streaming, plus basic_search variants re-exported from __init__. Source: packages/graphrag/graphrag/api/query.py:1-19 and packages/graphrag/graphrag/api/__init__.py:23-29. Internally these functions call get_global_search_engine, get_local_search_engine, get_drift_search_engine, and get_basic_search_engine from the query.factory module, then rehydrate the persisted index through read_indexer_* adapter helpers. Source: packages/graphrag/graphrag/api/query.py:33-44. The expected table names for those artifacts are codified in unified-search-app/app/data_config.py (output/communities, output/community_reports, output/entities, output/relationships, output/covariates, output/text_units). Source: unified-search-app/app/data_config.py:6-21.

Prompt Tuning API

generate_indexing_prompts (in graphrag.api.prompt_tune) drives auto-templating: it loads sample documents, detects language and domain, infers entity types, and synthesizes extraction, summarization, community-report, and reporter-role prompts. Source: packages/graphrag/graphrag/api/prompt_tune.py:11-39. This API is explicitly marked as under development and not yet stable. Source: packages/graphrag/graphrag/api/prompt_tune.py:9-11.

CLI Surface

The graphrag CLI is exported from graphrag.cli and is the recommended starting point. The recommended initialization command is graphrag init --root [path] --force, which should be rerun between minor version bumps to pick up the latest config format. Source: packages/graphrag/README.md:51-55.

Supporting Subsystems and Community-Driven Roadmap

Beyond the core APIs, several subsystems implement the storage, caching, and language-model abstractions that the indexer and query engines rely on:

  • The LLMCompletion returned by create_completion is the abstraction used throughout the codebase; it returns either an LLMCompletionResponse or an Iterator[LLMCompletionChunk] for streaming, and gather_completion_response collapses both into a single string. Source: packages/graphrag-llm/graphrag_llm/README.md:5-43.
  • The unified search app's data config defines reasonable defaults for downstream LLM use, including suggested follow-up questions and a 7-day Streamlit cache TTL, and notes that context-window settings should be tuned per model. Source: unified-search-app/app/data_config.py:23-30.
  • The most recent release (v3.1.0) introduced a native CosmosTableProvider with namespace partitioning, transactional batch writes, and a simplified AzureCosmosStorage, plus a litellm dependency update that broadens indirect model-provider support. Source: community release notes for v3.1.0.
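The first bullet above describes a result that is either a complete response or a stream of chunks, collapsed into one string by a gather helper. The sketch below illustrates that idea only. `Response`, `Chunk`, `gather_text`, and the field names `content` and `delta` are hypothetical stand-ins, not the graphrag-llm types.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class Response:
    """Hypothetical stand-in for a complete (non-streaming) completion."""
    content: str


@dataclass
class Chunk:
    """Hypothetical stand-in for one piece of a streamed completion."""
    delta: str


def gather_text(result: Union[Response, Iterable[Chunk]]) -> str:
    """Collapse either result shape into a single string."""
    if isinstance(result, Response):
        return result.content
    # Streaming case: concatenate the chunk deltas in arrival order.
    return "".join(chunk.delta for chunk in result)


def fake_stream() -> Iterator[Chunk]:
    """Simulate a streamed answer arriving in three pieces."""
    yield from (Chunk("Graph"), Chunk("RAG "), Chunk("answer"))


full = gather_text(Response("GraphRAG answer"))
streamed = gather_text(fake_stream())
```

Callers that only need the final text can then treat streaming and non-streaming completions the same way.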

Open community threads shape the near-term roadmap and are useful to know when planning an adoption:

  • Incremental indexing (#741, 35 comments). Add new documents to an existing index without a full re-run; design is in progress.
  • Additional model providers (#657, 15 comments; #345, 29 comments for Ollama). Native support beyond OpenAI/Azure is not planned by the core team; the litellm upgrade in v3.1.0 and community workarounds for Ollama remain the primary paths.
  • Cheaper triplet extraction (#632, 2 comments). Interest in integrating Triplex for cost reduction relative to gpt-4o.
  • LazyGraphRAG (#1512, 44 comments). The most-upvoted open question, awaiting a release announcement.

See Also

  • GraphRAG Indexing Pipeline (wiki)
  • GraphRAG Query Engine (wiki)
  • GraphRAG Configuration Reference (wiki)
  • GraphRAG Prompt Tuning Guide (wiki)
  • GraphRAG Storage Backends (wiki)

Source: https://github.com/microsoft/graphrag / Human Manual

Indexing Pipeline, Data Flow & Incremental Updates

Related topics: GraphRAG Overview and Architecture, Query Engine and Search Methods, Configuration, LLM Integration, Storage & Extensibility

Section Related Pages

Continue reading this section for the full explanation and source context.

Related topics: GraphRAG Overview and Architecture, Query Engine and Search Methods, Configuration, LLM Integration, Storage & Extensibility

Indexing Pipeline, Data Flow & Incremental Updates

Overview and Purpose

GraphRAG's indexing pipeline is the data transformation suite that converts unstructured text into a structured knowledge graph plus derived artifacts (entities, relationships, communities, community reports, embeddings, and covariates). The repository positions this suite as "a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs" README.md. The system warns users that "GraphRAG indexing can be an expensive operation, please read all of the documentation to understand the process and costs involved, and start small" README.md.

The pipeline is composed of modular Python packages:

  • packages/graphrag-input: loaders that ingest source documents from disk or blob storage, with markitdown available for parsing formats such as PDF packages/graphrag-input/README.md.
  • packages/graphrag-chunking: text splitters (sentence, token, factory-based) that produce text units packages/graphrag-chunking/README.md.
  • packages/graphrag: the core library, including graphrag.api.prompt_tune, graphrag.api.query, the CLI (graphrag.cli.prompt_tune, graphrag.cli.index), and the pipeline runner packages/graphrag/README.md.
  • packages/graphrag-common: shared infrastructure providing the Factory dependency-injection pattern and the load_config system that parses YAML/JSON with Pydantic, environment-variable substitution, and .env loading packages/graphrag-common/README.md.
  • unified-search-app: a Streamlit reference application that consumes the produced parquet outputs to expose search and community exploration unified-search-app/README.md.

The pipeline writes its final results as parquet tables under well-known paths consumed downstream by the query engine and the search app: output/communities, output/community_reports, output/entities, output/relationships, output/covariates, and output/text_units unified-search-app/app/data_config.py.
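Because downstream consumers depend on these table names, a pre-flight check can catch an incomplete run before any query is attempted. The following is a minimal sketch. The `.parquet` file suffix, the helper name `missing_tables`, and the choice to treat covariates as optional are assumptions of this example.

```python
from pathlib import Path
import tempfile
from typing import List

# Table names as listed in the manual. Covariates are left out of the
# required set here on the assumption that they are optional.
REQUIRED_TABLES = [
    "communities",
    "community_reports",
    "entities",
    "relationships",
    "text_units",
]


def missing_tables(output_dir: Path) -> List[str]:
    """Return the required table names with no parquet file in output_dir."""
    return [
        name
        for name in REQUIRED_TABLES
        if not (output_dir / f"{name}.parquet").is_file()
    ]


# Demonstration against a throwaway directory holding a partial index.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output"
    out.mkdir()
    for name in ["entities", "relationships", "text_units"]:
        (out / f"{name}.parquet").touch()
    missing = missing_tables(out)
```

An empty result means every required table is present. Anything else names the files to regenerate.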

Data Flow Stages

The runtime flow from raw documents to queryable index follows a five-stage pipeline, each stage producing artifacts that the next stage consumes.

flowchart LR
    A[Input Loader<br/>graphrag-input] --> B[Chunking<br/>graphrag-chunking]
    B --> C[Graph Extraction<br/>LLM: entities/relationships]
    C --> D[Community Detection<br/>+ Report Generation]
    D --> E[Embeddings & Covariates]
    E --> F[(Parquet Outputs<br/>text_units, entities,<br/>relationships, communities,<br/>community_reports, covariates)]
    F --> G[Query / Search App<br/>graphrag.api.query]

Key behaviors observed in the source:

  • Input — loaders read raw files according to a configured input.type (for example, markitdown with a file pattern such as ".*\\.pdf$$") and an input_storage block describing where the input lives (e.g. local type: file, base_dir: input) packages/graphrag-input/README.md. The unified-search-app's create_datasource switches between BlobDatasource and LocalDatasource based on whether blob_account_name is set, demonstrating the same pluggable strategy used inside the indexing CLI unified-search-app/app/knowledge_loader/data_sources/loader.py.
  • Chunking — the ChunkingConfig selects a strategy via create_chunker, with SentenceChunker for boundary detection and TokenChunker for fixed-size windows with overlap packages/graphrag-chunking/README.md. During prompt tuning, the chunking overrides are read from the loaded graph config: if chunk_size != graph_config.chunking.size: graph_config.chunking.size = chunk_size and the same pattern is used for overlap packages/graphrag/graphrag/cli/prompt_tune.py.
  • Prompt Tuning (optional pre-pass): generate_indexing_prompts chunks a sample of documents, derives a domain and persona from the LLM if not supplied, and returns the entity-extraction, entity-summarization, and community-summarization prompts that downstream index stages will use packages/graphrag/graphrag/api/prompt_tune.py. The CLI mirrors this API, writing logs to prompt-tuning.log and honoring overrides for chunk_size, overlap, limit, selection_method, domain, language, max_tokens, discover_entity_types, and min_examples_required packages/graphrag/graphrag/cli/prompt_tune.py.
  • Graph Extraction — text units are sent to the LLM with the tuned prompts to produce entities, relationships, claims/covariates, and descriptions. This stage is the dominant cost driver and is what makes GraphRAG "an expensive operation" README.md.
  • Community Detection and Reporting — Leiden/Leiden-like algorithms produce a hierarchy of communities; an LLM-driven reporter generates per-community summaries that the global search engine consumes packages/graphrag/graphrag/api/query.py.
  • Outputs — the canonical tables listed above are persisted and re-read by local_search and global_search via DataFrame parameters (entities, relationships, text_units, community_reports, covariates, communities) packages/graphrag/graphrag/api/query.py. The unified-search-app's UI then renders these as citations, hyperlinking entity/relationship IDs back to source text units unified-search-app/app/ui/search.py.
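The chunking stage above mentions fixed-size windows with overlap. The sketch below shows that windowing technique on a plain list of tokens. It is not the graphrag-chunking TokenChunker: `chunk_tokens` is a hypothetical helper, and whitespace-split words stand in for real tokenizer output.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


words = "a b c d e f g h i j".split()
chunks = chunk_tokens(words, size=4, overlap=1)
```

With a size of 4 and an overlap of 1, each window repeats the last token of the previous one, so content that straddles a boundary appears in both chunks.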

Incremental Updates: Current State

Incremental indexing — the ability to add new documents to an existing index without rebuilding from scratch — is the most engaged community topic, tracked in issue #741 "Incremental indexing (adding new content)". As of v3.1.0, the maintainers state that the feature is "in the design stages" and provide a manual workaround. The current repository architecture, however, still assumes a full re-run for the parquet outputs consumed by graphrag.api.query packages/graphrag/graphrag/api/query.py.

What users can do today, pending first-class incremental indexing:

  1. Append new files to the input directory configured via input_storage (local or blob) packages/graphrag-input/README.md.
  2. Re-run the full pipeline; the loaders will pick up the new files based on the configured file pattern (e.g. ".*\\.pdf$$") packages/graphrag-input/README.md.
  3. Swap in a different parquet output backend. The v3.1.0 release notes call out a "Native CosmosTableProvider with namespace partitioning, transactional batch writes, and simplified AzureCosmosStorage", which makes it easier to persist index artifacts in Azure Cosmos and treat each pipeline run as a partitioned namespace — a foundation for future incremental runs.

What is not yet first-class:

Until incremental indexing ships, the recommended operational pattern is to version output directories per run and treat each run as immutable.
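One way to follow that pattern is to give every run its own timestamped directory and refuse to overwrite an existing one. This is a sketch under stated assumptions: the `runs/<timestamp>` layout and the helper names are hypothetical, and pointing the GraphRAG output configuration at the new directory is not shown.

```python
from datetime import datetime, timezone
from pathlib import Path
import tempfile
from typing import Optional


def new_run_dir(root: Path, now: datetime) -> Path:
    """Create a fresh, timestamped output directory for one indexing run."""
    run_dir = root / "runs" / now.strftime("%Y%m%dT%H%M%SZ")
    # exist_ok=False makes an accidental overwrite of a previous run fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def latest_run_dir(root: Path) -> Optional[Path]:
    """Return the newest run directory; timestamp names sort chronologically."""
    runs_root = root / "runs"
    if not runs_root.is_dir():
        return None
    runs = sorted(p for p in runs_root.iterdir() if p.is_dir())
    return runs[-1] if runs else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = new_run_dir(root, datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc))
    second = new_run_dir(root, datetime(2025, 1, 2, 12, 0, 0, tzinfo=timezone.utc))
    latest_name = latest_run_dir(root).name
```

Query-time code can then resolve the latest run explicitly, and rolling back means pointing at an earlier directory.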

Configuration, CLI, and Extensibility

All pipeline behavior is driven by settings.yaml, parsed through load_config which "automatically discovers and parses YAML/JSON config files into Pydantic models with support for environment variable substitution and .env file loading" packages/graphrag-common/README.md. Strategies (chunkers, model providers, storage backends) are registered through the Factory class with transient or singleton scope, allowing new implementations to be plugged in without changing call sites packages/graphrag-common/README.md.

The prompt-tuning CLI explicitly honors per-invocation overrides for chunking parameters before delegating to the API, illustrating how users can experiment without rewriting the config file packages/graphrag/graphrag/cli/prompt_tune.py. Community demand for non-OpenAI/Azure providers (issue #657, with #345 focused on Ollama) flows through this same factory mechanism — new model providers are added by registering a strategy string in graphrag-common's Factory rather than by patching core code packages/graphrag-common/README.md.
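The difference between transient and singleton registration can be illustrated with a small registry. The class below is a hypothetical sketch of the pattern, not the graphrag-common Factory API. `SketchFactory`, `EchoModel`, and the method names `register` and `create` are assumptions made for this example.

```python
from typing import Any, Callable, Dict


class SketchFactory:
    """Hypothetical strategy registry with transient and singleton scopes."""

    def __init__(self) -> None:
        self._initializers: Dict[str, Callable[..., Any]] = {}
        self._scopes: Dict[str, str] = {}
        self._singletons: Dict[str, Any] = {}

    def register(self, strategy: str, initializer: Callable[..., Any],
                 scope: str = "transient") -> None:
        if scope not in ("transient", "singleton"):
            raise ValueError(f"unknown scope: {scope}")
        self._initializers[strategy] = initializer
        self._scopes[strategy] = scope

    def create(self, strategy: str, **kwargs: Any) -> Any:
        if strategy not in self._initializers:
            raise KeyError(f"strategy not registered: {strategy}")
        if self._scopes[strategy] == "singleton":
            # Build once, then hand back the same instance on every call.
            if strategy not in self._singletons:
                self._singletons[strategy] = self._initializers[strategy](**kwargs)
            return self._singletons[strategy]
        # Transient scope: a new instance per call.
        return self._initializers[strategy](**kwargs)


class EchoModel:
    """Toy provider standing in for a model implementation."""

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix

    def complete(self, prompt: str) -> str:
        return self.prefix + prompt


factory = SketchFactory()
factory.register("echo", EchoModel)
factory.register("shared_echo", EchoModel, scope="singleton")

a = factory.create("echo", prefix="> ")
b = factory.create("echo", prefix="> ")
s1 = factory.create("shared_echo", prefix="# ")
s2 = factory.create("shared_echo", prefix="# ")
```

Call sites only name a strategy string, so swapping implementations is a registration change rather than a code change at each call site.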

Downstream, the Streamlit unified-search-app renders citation tables for each context type (sources, reports, entities, relationships, covariates) by reading the parquet outputs the indexing pipeline produces, making the pipeline's contract with the query layer explicit and stable unified-search-app/app/ui/search.py.

Query Engine and Search Methods

Related topics: GraphRAG Overview and Architecture, Indexing Pipeline, Data Flow & Incremental Updates, Configuration, LLM Integration, Storage & Extensibility

Overview

The Query Engine is the retrieval layer of Microsoft GraphRAG. After the indexer produces a knowledge graph (entities, relationships, communities, community reports, text units, and optional covariates), the query engine consumes those parquet outputs and returns natural-language answers grounded in the graph. The module exposes a public API and a CLI, and is also embedded inside the Streamlit-based unified-search-app for interactive exploration.

The module's docstring states it "provides access to the query engine of graphrag, allowing external applications to hook into graphrag and run queries over a knowledge graph" and warns that "this API is under development and may undergo changes in future releases. Backwards compatibility is not guaranteed at this time" (packages/graphrag/graphrag/api/query.py). Because compatibility is not guaranteed, treat function signatures and return shapes as subject to change between releases.

Search Methods

The query engine implements multiple search strategies, each suited to a different question type. They are assembled through a factory in graphrag.query.factory (referenced as get_basic_search_engine, get_drift_search_engine, get_global_search_engine, and get_local_search_engine in the API module) and selected via the CLI's --method argument.

  • Local Search — entity-centric retrieval. Uses entities, their text-unit neighborhoods, relationships, and covariates to answer questions about specific people, places, or concepts. The API signature requires entities, communities, community_reports, text_units, relationships, community_level, and response_type (packages/graphrag/graphrag/api/query.py).
  • Global Search — map-reduce over community reports. The engine distributes the query across many community summaries and consolidates partial answers into a single response. A dynamic_community_selection flag enables runtime selection of communities, capped by community_level (packages/graphrag/graphrag/api/query.py).
  • DRIFT Search — a dynamic variant that combines local and global reasoning by introducing exploratory sub-queries; useful for comparative or "why/how" questions. The CLI exposes it as a distinct --method value (packages/graphrag/graphrag/cli/query.py).
  • Basic Search — text-unit-only retrieval, lightweight, with no graph traversal (packages/graphrag/graphrag/cli/query.py).
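The map-reduce shape described for Global Search can be sketched without an LLM. In the example below a word-overlap score stands in for the model's relevance rating. `PartialAnswer`, `map_report`, and `reduce_answers` are hypothetical names, and the real engine's prompts and scoring are not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartialAnswer:
    community_id: str
    score: int
    text: str


def map_report(query: str, community_id: str, report: str) -> PartialAnswer:
    """Map step: rate one community report against the query.

    A real engine would ask an LLM; word overlap is a stand-in here.
    """
    query_words = set(query.lower().split())
    score = len(query_words & set(report.lower().split()))
    return PartialAnswer(community_id, score, report)


def reduce_answers(partials: List[PartialAnswer], top_k: int) -> str:
    """Reduce step: drop irrelevant partials, keep the best top_k, merge them."""
    relevant = [p for p in partials if p.score > 0]
    ranked = sorted(relevant, key=lambda p: p.score, reverse=True)[:top_k]
    return " ".join(f"[{p.community_id}] {p.text}" for p in ranked)


reports = {
    "c1": "supply chain delays affect chip makers",
    "c2": "local football league results",
    "c3": "chip makers expand supply capacity",
}
query = "chip supply risks"
partials = [map_report(query, cid, text) for cid, text in reports.items()]
answer = reduce_answers(partials, top_k=2)
```

Each report is scored independently in the map step, which is what allows the work to be spread across many community summaries before the reduce step consolidates them.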

Every method has both a blocking variant (global_search, local_search) and a streaming variant (global_search_streaming, local_search_streaming) that yield chunks via an AsyncGenerator (packages/graphrag/graphrag/api/query.py).
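The relationship between a streaming variant and a blocking variant can be shown with a plain async generator. The functions below are hypothetical and take only a query string, whereas the real API functions also take a GraphRagConfig and DataFrames. The sketch makes no claim about how the real blocking functions are implemented.

```python
import asyncio
from typing import AsyncGenerator


async def answer_streaming(query: str) -> AsyncGenerator[str, None]:
    """Hypothetical streaming search: yields the answer piece by piece."""
    for piece in ("Answer ", "to: ", query):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield piece


async def answer_blocking(query: str) -> str:
    """Blocking counterpart: drain the stream and return the whole answer."""
    return "".join([piece async for piece in answer_streaming(query)])


result = asyncio.run(answer_blocking("who leads the project?"))
```

A UI would iterate over the stream with `async for` to render tokens as they arrive, while batch callers would await the blocking form.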

flowchart LR
    A[User Query] --> B{Method}
    B -->|local| C[Local Search Engine]
    B -->|global| D[Global Search Engine]
    B -->|drift| E[DRIFT Search Engine]
    B -->|basic| F[Basic Search Engine]
    C --> G[Index Artifacts]
    D --> G
    E --> G
    F --> G
    G --> H[Response + Context]

API and CLI Usage

The API functions take a GraphRagConfig (loaded from settings.yaml) plus the relevant pandas DataFrames. _resolve_output_files in the CLI is responsible for loading the parquet outputs required by a given method (packages/graphrag/graphrag/cli/query.py). Records are normalized into typed objects via read_indexer_entities, read_indexer_relationships, read_indexer_text_units, read_indexer_reports, read_indexer_report_embeddings, read_indexer_communities, and read_indexer_covariates (packages/graphrag/graphrag/api/query.py). Entity and Relationship builders accept configurable column names, so custom indexers can be plugged in by remapping columns (packages/graphrag/graphrag/query/input/loaders/dfs.py).
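Column remapping of the kind described above can be sketched with plain dictionaries standing in for DataFrame rows. `EntityRecord` and `read_entities` are hypothetical. The real builders operate on pandas DataFrames and carry more fields, so the parameter names here are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping


@dataclass
class EntityRecord:
    """Hypothetical typed record; the real Entity model has more fields."""
    id: str
    title: str
    description: str


def read_entities(
    rows: Iterable[Mapping[str, object]],
    id_col: str = "id",
    title_col: str = "title",
    description_col: str = "description",
) -> List[EntityRecord]:
    """Build typed records, reading each field from a configurable column."""
    return [
        EntityRecord(
            id=str(row[id_col]),
            title=str(row[title_col]),
            description=str(row[description_col]),
        )
        for row in rows
    ]


# A custom indexer that uses different column names than the defaults.
custom_rows = [{"node_id": 7, "name": "Contoso", "summary": "A fictional company"}]
entities = read_entities(
    custom_rows, id_col="node_id", title_col="name", description_col="summary"
)
```

A missing column raises `KeyError` at load time, which matches the failure mode noted later for mismatched column conventions.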

On the CLI side, graphrag query --method <local|global|drift|basic> is the entry point. Each method has a dedicated runner (run_global_search, run_local_search, run_drift_search, run_basic_search) and the streaming path is triggered with --streaming. Response style is controlled by --response-type (e.g., multiple_paragraphs, single_paragraph, prioritized_list) (packages/graphrag/graphrag/cli/query.py).
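Routing a method name to a dedicated runner is commonly done with a dispatch table. The sketch below shows that pattern with stub runners. The `stub_*` functions and `dispatch` are hypothetical and do not reproduce the signatures of the real runners in graphrag.cli.query.

```python
from typing import Callable, Dict


def stub_local(query: str) -> str:
    return f"local:{query}"


def stub_global(query: str) -> str:
    return f"global:{query}"


def stub_drift(query: str) -> str:
    return f"drift:{query}"


def stub_basic(query: str) -> str:
    return f"basic:{query}"


# One entry per value accepted by the --method argument.
RUNNERS: Dict[str, Callable[[str], str]] = {
    "local": stub_local,
    "global": stub_global,
    "drift": stub_drift,
    "basic": stub_basic,
}


def dispatch(method: str, query: str) -> str:
    """Look up the runner for `method` and call it, rejecting unknown names."""
    runner = RUNNERS.get(method)
    if runner is None:
        valid = ", ".join(sorted(RUNNERS))
        raise ValueError(f"unknown method {method!r}; expected one of: {valid}")
    return runner(query)


result = dispatch("drift", "compare the two vendors")
```

Keeping the table in one place means that adding a search method requires one new entry rather than another branch in each caller.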

Prompt Tuning and the Unified Search App

Before running queries, users typically tune their prompts via graphrag prompt-tune, which uses the same GraphRagConfig to load chunks and produce entity-extraction, entity-summarization, and community-summarization prompts (packages/graphrag/graphrag/api/prompt_tune.py). The CLI override pattern lets the tuning run inject chunk-size and overlap adjustments into the loaded config (packages/graphrag/graphrag/cli/prompt_tune.py).

The unified-search-app is a Streamlit reference consumer of the query engine. home_page.py wires search buttons to run_all_searches and run_generate_questions (unified-search-app/app/home_page.py). ui/search.py renders per-method responses, token usage, LLM call counts, and a Citations panel that lists the entities, relationships, reports, and source chunks the engine consumed (unified-search-app/app/ui/search.py). The expected parquet paths for the app live in app/data_config.py (e.g., output/communities, output/community_reports, output/entities, output/relationships, output/covariates, output/text_units) (unified-search-app/app/data_config.py).

Configuration and Known Constraints

Several practical constraints surface from the source and from community discussion:

  • Model providers. The API and CLI instantiate completion and embedding models through the GraphRagConfig, which natively targets OpenAI and Azure. Community requests for additional providers (Ollama, other SLMs, custom endpoints) are tracked but not yet supported in-tree (issue #657, issue #345).
  • Cheaper extraction. Triplex-style models have been proposed to lower extraction cost during indexing; this affects the indexer rather than the query engine, but the engine consumes the result (issue #632).
  • LazyGraphRAG. A deferred-evaluation variant has been requested; once shipped it would likely plug in alongside the existing factory methods (issue #1512).
  • Incremental indexing. Adding new documents to an existing index currently requires a re-run; the engine is unaffected, but the artifacts it loads would need to be regenerated or extended (issue #741).
  • Data layout. Because the engine loads from parquet, downstream tools must respect the column conventions expected by read_indexer_* helpers; mismatches will fail at load time (packages/graphrag/graphrag/query/input/loaders/dfs.py).
  • API stability. The module docstring explicitly warns that "backwards compatibility is not guaranteed at this time", so pin versions and avoid coupling external code to internal helper signatures (packages/graphrag/graphrag/api/query.py).

See Also

  • Indexing and Prompt Tuning pipeline
  • Configuration reference (settings.yaml and GraphRagConfig)
  • unified-search-app user guide
  • Chunking strategies (packages/graphrag-chunking)
  • Input loaders (packages/graphrag-input)

Configuration, LLM Integration, Storage & Extensibility

Related topics: GraphRAG Overview and Architecture, Indexing Pipeline, Data Flow & Incremental Updates, Query Engine and Search Methods

Overview

GraphRAG is a data pipeline and transformation suite designed to extract meaningful, structured data from unstructured text using LLMs. Source: packages/graphrag/README.md. Underneath the indexing and query APIs sit four foundational subsystems that determine how the project is configured, how it talks to language models, where it persists intermediate artifacts, and how third parties can plug in new behavior. These four subsystems — configuration, LLM integration, storage, and extensibility — are the primary surfaces users customize when adapting GraphRAG to their own data and infrastructure.

The configuration layer is built on Pydantic-style typed config models exposed under graphrag.config.models. Source: packages/graphrag/graphrag/config/models/__init__.py. It is loaded via load_config and accepts a settings.yaml file as the canonical user-facing configuration artifact.
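Environment-variable substitution in a config file can be sketched with the standard library. The example assumes the substitution follows Python's `string.Template` rules, where `${NAME}` is replaced and `$$` is an escaped literal `$`. That is an assumption, not a description of the real load_config. JSON is used because the standard library has no YAML parser, Pydantic validation is omitted, and `load_settings` and the key names are hypothetical.

```python
import json
from string import Template
from typing import Any, Dict, Mapping


def load_settings(raw_text: str, env: Mapping[str, str]) -> Dict[str, Any]:
    """Substitute ${VAR} references, then parse the result as JSON."""
    # Template.substitute raises KeyError for a missing variable, which
    # surfaces a forgotten secret early; "$$" becomes a literal "$".
    substituted = Template(raw_text).substitute(env)
    return json.loads(substituted)


# Raw config text: one secret reference and one regex ending in a literal "$".
raw = r'{"api_key": "${GRAPHRAG_API_KEY}", "file_pattern": ".*\\.pdf$$"}'
settings = load_settings(raw, {"GRAPHRAG_API_KEY": "secret-123"})
```

Under these rules, a regex that must end in `$` has to be written with `$$` in the config file. That is one plausible reading of the `".*\\.pdf$$"` pattern quoted elsewhere in this manual.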

Configuration System

The GraphRagConfig model is the central object that drives indexing, prompt tuning, and query workflows. Every public API accepts a GraphRagConfig instance and reads model, storage, chunking, and input settings from it. Source: packages/graphrag/graphrag/api/prompt_tune.py.

The prompt_tune API shows the typical usage pattern: the configuration is loaded, an LLM is instantiated via create_completion(default_llm_settings), and downstream operations are configured against the typed model. Source: packages/graphrag/graphrag/api/prompt_tune.py. The CLI mirror in graphrag.cli.prompt_tune calls load_config(root_dir=root) and allows runtime overrides such as chunk_size and chunking.overlap before invoking the prompt-tuning pipeline. Source: packages/graphrag/graphrag/cli/prompt_tune.py.

The unified search application uses a parallel data_config.py module that defines table names for downstream artifacts (output/communities, output/community_reports, output/entities, output/relationships, output/covariates, output/text_units). Source: unified-search-app/app/data_config.py. This reflects how a built index is consumed at query time and how output artifacts are addressed independently of storage backend.

LLM Integration

Native LLM support in GraphRAG is implemented through completion-model configuration objects and a create_completion factory. The prompt-tuning API explicitly retrieves the model via config.get_completion_model_config(PROMPT_TUNING_MODEL_ID) and instantiates the model with create_completion(...). Source: packages/graphrag/graphrag/api/prompt_tune.py.

A second completion model is retrieved for graph extraction when discover_entity_types is enabled: config.get_completion_model_config(config.extract_graph.completion_model_id). Source: packages/graphrag/graphrag/api/prompt_tune.py. This separation lets users run a cheaper model for prompt tuning while keeping a stronger model for entity/relationship extraction.

For query time, both global (global_search) and local (local_search) APIs in graphrag.api.query accept the same GraphRagConfig, ensuring a consistent model-selection surface across indexing and retrieval. Source: packages/graphrag/graphrag/api/query.py.

Storage Architecture

The graphrag-storage package provides a unified storage abstraction. By default the create_storage factory ships with four preregistered providers corresponding to a StorageType enum. Source: packages/graphrag-storage/README.md.

graph LR
    Config["GraphRagConfig"] --> Factory["create_storage / storage_factory"]
    Factory --> FS["FileStorage"]
    Factory --> ABS["AzureBlobStorage"]
    Factory --> ACS["AzureCosmosStorage"]
    Factory --> MS["MemoryStorage"]
    User["User-defined Storage subclass"] -.register.-> Factory

Registration is dynamic — FileStorage is only imported when requested — and users can bypass preregistration by importing storage_factory directly for a clean factory. Source: packages/graphrag-storage/README.md. The v3.1.0 release notes describe a native CosmosTableProvider with namespace partitioning, transactional batch writes, and a simplified AzureCosmosStorage, indicating that the storage layer is actively evolving toward richer table semantics. Source: packages/graphrag-storage/README.md.

Provider | Typical Use | Notes from source
FileStorage | Local development | Default; lazily imported
AzureBlobStorage | Cloud blob persistence | Pre-registered
AzureCosmosStorage | Cosmos DB-backed storage | Simplified in v3.1.0; adds table-provider semantics
MemoryStorage | Tests / ephemeral pipelines | Pre-registered

Extensibility

GraphRAG is designed for extension at every layer. Three concrete extension points are documented:

Common Failure Modes and Community Notes

Several recurring community discussions intersect with the topics on this page. Users have repeatedly asked for non-OpenAI/Azure model providers such as Ollama (#657, #345) and for cheaper extractors like Triplex (#632); because native support is limited to OpenAI and Azure, these integrations typically rely on OpenAI-compatible endpoints wired through create_completion. Source: packages/graphrag/graphrag/api/prompt_tune.py.

Incremental indexing (#741) is itself an extensibility concern: users wishing to add content today must re-run the full pipeline because the storage and configuration layers do not yet expose a partial-update API. Source: packages/graphrag-storage/README.md.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 6 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Capability evidence risk - Capability evidence risk requires verification.

1. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | https://github.com/microsoft/graphrag

2. Maintenance risk: Maintenance risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/microsoft/graphrag

3. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: downstream_validation.risk_items | https://github.com/microsoft/graphrag

4. Security or permission risk: Security or permission risk requires verification

  • Severity: medium
  • Finding: no_demo
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: risks.scoring_risks | https://github.com/microsoft/graphrag

5. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: issue_or_pr_quality=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/microsoft/graphrag

6. Maintenance risk: Maintenance risk requires verification

  • Severity: low
  • Finding: release_recency=unknown.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: evidence.maintainer_signals | https://github.com/microsoft/graphrag

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12. Count of project-level external discussion links exposed on this manual page.

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using graphrag with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence