Doramagic Project Pack · Human Manual

HippoRAG

HippoRAG is a graph-based Retrieval-Augmented Generation (RAG) framework designed to enable Large Language Models (LLMs) to identify and leverage connections within knowledge bases for improved retrieval and question answering.

Installation and Setup

Related topics: Configuration System, Deployment Options


Overview

HippoRAG is a graph-based Retrieval-Augmented Generation (RAG) framework designed to enable Large Language Models (LLMs) to identify and leverage connections within knowledge bases for improved retrieval and question answering. The installation process configures the necessary dependencies, environment variables, and model configurations to run HippoRAG in either cloud (OpenAI) or local (vLLM) deployment modes.

Sources: README.md

System Requirements

Python Version

| Requirement | Version |
|---|---|
| Python | >= 3.10 |

The package explicitly requires Python 3.10 or higher as specified in the setup.py configuration.

Sources: setup.py:16
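
The relevant setup.py entry looks roughly like the sketch below; only the python_requires constraint reflects the documented requirement, the other fields are illustrative placeholders.

from setuptools import setup, find_packages

# Illustrative packaging sketch; only python_requires reflects the documented
# constraint, the remaining fields are placeholders.
setup(
    name="hipporag",
    python_requires=">=3.10",
    packages=find_packages(),
)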

Hardware Requirements

| Component | Requirement |
|---|---|
| GPU | CUDA-compatible GPU(s) recommended |
| GPU Memory | Varies based on model size (see deployment sections) |

For local deployment with vLLM, the framework supports tensor parallelism across multiple GPUs. The README recommends reserving enough memory for embedding models when deploying LLM servers.

Sources: README.md

Installation Methods

Method 1: pip Installation (Recommended)

conda create -n hipporag python=3.10
conda activate hipporag
pip install hipporag

This method installs HippoRAG version 2.0.0-alpha.4 along with all core dependencies from PyPI.

Sources: README.md

Method 2: Source Installation (For Development)

git clone https://github.com/OSU-NLP-Group/HippoRAG.git
cd HippoRAG
pip install -e .

Clone the repository and install in editable mode to work with the latest source code.

Sources: CONTRIBUTING.md

Environment Variables

Proper configuration of environment variables is essential for HippoRAG to function correctly. These variables control GPU allocation, model caching, and API access.

Required Environment Variables

| Variable | Description | Example |
|---|---|---|
| CUDA_VISIBLE_DEVICES | Comma-separated list of GPU device IDs | 0,1,2,3 |
| HF_HOME | Path to Hugging Face cache directory | /path/to/huggingface/home |
| OPENAI_API_KEY | API key for OpenAI models (cloud mode only) | sk-... |

Setting Environment Variables

# Set CUDA visible devices
export CUDA_VISIBLE_DEVICES=0,1,2,3

# Set Hugging Face cache location
export HF_HOME=<path to Huggingface home directory>

# Set OpenAI API key (required for cloud deployment)
export OPENAI_API_KEY=<your openai api key>

Sources: README.md

Core Dependencies

HippoRAG depends on a comprehensive set of libraries for LLM inference, embedding models, graph processing, and data handling.

Dependency Overview

| Package | Version | Purpose |
|---|---|---|
| torch | 2.5.1 | PyTorch deep learning framework |
| transformers | 4.45.2 | Model architectures and tokenizers |
| vllm | 0.6.6.post1 | High-throughput LLM inference |
| openai | 1.91.1 | OpenAI API client |
| litellm | 1.73.1 | Unified LLM interface |
| gritlm | 1.0.2 | Embedding model |
| networkx | 3.4.2 | Graph data structures |
| python_igraph | 0.11.8 | Graph algorithms |
| tiktoken | 0.7.0 | Tokenization |
| pydantic | 2.10.4 | Data validation |
| tenacity | 8.5.0 | Retry logic |
| einops | (latest) | Tensor operations |
| tqdm | (latest) | Progress bars |
| boto3 | (latest) | AWS S3 integration |
Sources: setup.py:17-32, requirements.txt

Additional Dependencies

The requirements.txt file includes additional packages not pinned to specific versions:

| Package | Purpose |
|---|---|
| nest_asyncio | Asynchronous operations |
| numpy | Numerical computing |
| scipy | Scientific computing |

Sources: requirements.txt

Configuration

HippoRAG uses a Pydantic-based configuration system defined in BaseConfig within config_utils.py. This configuration controls all aspects of indexing, retrieval, and QA.

Configuration Parameters

#### Embedding Configuration

| Parameter | Default | Description |
|---|---|---|
| embedding_model_name | nvidia/NV-Embed-v2 | Name of the embedding model |
| embedding_batch_size | 16 | Batch size for embedding encoding |
| embedding_return_as_normalized | True | Whether to normalize embeddings |
| embedding_max_seq_len | 2048 | Maximum sequence length for embeddings |
| embedding_model_dtype | auto | Data type for local embedding models |

#### Retrieval Configuration

| Parameter | Default | Description |
|---|---|---|
| retrieval_top_k | 200 | Number of documents to retrieve |
| linking_top_k | 5 | Number of linked nodes per retrieval step |
| damping | 0.5 | Damping factor for the PPR algorithm |
| passage_node_weight | 0.05 | Weight modifier for passage nodes in PPR |

#### QA Configuration

| Parameter | Default | Description |
|---|---|---|
| max_qa_steps | 1 | Maximum steps for interleaved retrieval and reasoning |
| qa_top_k | 5 | Top-k documents fed to the QA model |

#### Graph Construction Configuration

| Parameter | Default | Description |
|---|---|---|
| synonymy_edge_topk | 2047 | K for KNN retrieval in synonymy edge building |
| synonymy_edge_sim_threshold | 0.8 | Similarity threshold for synonymy nodes |
| is_directed_graph | False | Whether the graph is directed |
| graph_type | facts_and_sim_passage_node_unidirectional | Type of graph structure |

#### Information Extraction Configuration

| Parameter | Default | Description |
|---|---|---|
| information_extraction_model_name | openie_openai_gpt | OpenIE model class name |
| openie_mode | online | Mode: "online" or "offline" |

#### Preprocessing Configuration

| Parameter | Default | Description |
|---|---|---|
| text_preprocessor_class_name | TextPreprocessor | Preprocessor class name |
| preprocess_encoder_name | gpt-4o | Encoder for preprocessing |
| preprocess_chunk_overlap_token_size | 128 | Overlap tokens between chunks |
| preprocess_chunk_max_token_size | None | Max tokens per chunk (None = whole doc) |
| preprocess_chunk_func | by_token | Chunking function type |

Sources: src/hipporag/utils/config_utils.py

Deployment Modes

HippoRAG supports two primary deployment modes for LLM inference.

graph TD
    A[HippoRAG Deployment] --> B[Cloud Mode]
    A --> C[Local Mode]
    
    B --> B1[OpenAI API]
    B --> B2[OpenAI Compatible API]
    
    C --> C1[vLLM Server]
    C --> C1b[Local Embedding Model]
    
    B1 --> D[Requires OPENAI_API_KEY]
    B2 --> E[Custom LLM Base URL]
    C1 --> F[Multi-GPU Support]

Cloud Mode (OpenAI)

Cloud mode uses OpenAI's API for both LLM and embedding inference.

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

#### OpenAI Compatible Embeddings

For OpenAI-compatible embedding endpoints:

hipporag = HippoRAG(
    save_dir=save_dir,
    llm_model_name='Your LLM Model name',
    llm_base_url='Your LLM Model url',
    embedding_model_name='Your Embedding model name',
    embedding_base_url='Your Embedding model url'
)

Sources: README.md

Local Mode (vLLM)

Local mode deploys LLM servers using vLLM for offline inference with GPU acceleration.

#### Step 1: Start vLLM Server

export CUDA_VISIBLE_DEVICES=0,1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export HF_HOME=<path to Huggingface home directory>

vllm serve meta-llama/Llama-3.3-70B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95 \
    --port 6578

#### Step 2: Run HippoRAG with Different GPUs

export CUDA_VISIBLE_DEVICES=2,3
export HF_HOME=<path to Huggingface home directory>
python main.py --dataset sample --llm_base_url http://localhost:6578/v1

Sources: README.md

Quick Start Workflow

graph LR
    A[Install HippoRAG] --> B[Configure Environment]
    B --> C[Set Environment Variables]
    C --> D[Initialize HippoRAG]
    D --> E[index Documents]
    E --> F[RAG QA Queries]

Complete Example

from hipporag import HippoRAG

# Define documents
docs = [
    "Oliver Badman is a politician.",
    "George Rankin is a politician.",
    "Cinderella attended the royal ball.",
    "The prince used the lost glass slipper to search the kingdom.",
    "Erik Hort's birthplace is Montebello.",
    "Montebello is a part of Rockland County."
]

# Initialize HippoRAG
hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

# Index documents
hipporag.index(docs)

# Define queries and gold standard answers
queries = [
    "What is George Rankin's occupation?",
    "How did Cinderella reach her happy ending?",
    "What county is Erik Hort's birthplace a part of?"
]

gold_docs = [
    ["George Rankin is a politician."],
    ["Cinderella attended the royal ball.",
     "The prince used the lost glass slipper to search the kingdom."],
    ["Montebello is a part of Rockland County."]
]

answers = [
    ["Politician"],
    ["By going to the ball."],
    ["Rockland County"]
]

# Run RAG QA
results = hipporag.rag_qa(
    queries=queries,
    gold_docs=gold_docs,
    gold_answers=answers
)

Sources: README.md

Testing Your Installation

OpenAI Test

Run this test to verify cloud mode functionality:

export OPENAI_API_KEY=<your openai api key>
conda activate hipporag
python tests_openai.py

Local Test

Run this test to verify local vLLM mode:

export CUDA_VISIBLE_DEVICES=0
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export HF_HOME=<path to Huggingface home directory>

# Start vLLM server
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95 \
    --port 6578

# Run local test
CUDA_VISIBLE_DEVICES=1 python tests_local.py

Sources: README.md

Troubleshooting

Out of Memory (OOM) Errors

If you encounter OOM errors during local deployment:

  1. Reduce gpu-memory-utilization parameter in vLLM
  2. Reduce max_model_len in vLLM server
  3. Adjust CUDA_VISIBLE_DEVICES to use more GPUs
  4. Reduce embedding_batch_size in configuration

Environment Variable Issues

Ensure all required environment variables are set before running HippoRAG:

# Verify environment variables are set
echo $CUDA_VISIBLE_DEVICES
echo $HF_HOME
echo $OPENAI_API_KEY

Conda Environment

Always activate the correct conda environment before running commands:

conda activate hipporag

Sources: README.md

Reproducing Experiments

To reproduce the paper's experiments:

  1. Clone the repository and install dependencies
  2. Download datasets from HuggingFace or use provided samples in reproduce/dataset
  3. Set required environment variables
  4. Run the main script with appropriate parameters:
# OpenAI model
python main.py \
    --dataset sample \
    --llm_base_url https://api.openai.com/v1 \
    --llm_name gpt-4o-mini \
    --embedding_name nvidia/NV-Embed-v2

# Local vLLM model
python main.py \
    --dataset sample \
    --llm_base_url http://localhost:6578/v1 \
    --llm_name meta-llama/Llama-3.3-70B-Instruct \
    --embedding_name nvidia/NV-Embed-v2

Sources: README.md, main.py

Sources: [README.md](https://github.com/OSU-NLP-Group/HippoRAG/blob/main/README.md)

Quick Start Guide

Related topics: Installation and Setup, HippoRAG Core Class


This guide provides a comprehensive walkthrough for setting up and running HippoRAG, enabling you to quickly leverage neurobiologically inspired long-term memory capabilities for Large Language Models.

Prerequisites

Before beginning, ensure your environment meets the following requirements:

| Requirement | Specification |
|---|---|
| Python | >= 3.10 |
| CUDA GPUs | Required for local embedding model inference |
| HuggingFace Home | Configured via HF_HOME environment variable |
| API Keys | OpenAI API key (if using OpenAI models) |

Environment Setup

# Create conda environment
conda create -n hipporag python=3.10
conda activate hipporag

# Install HippoRAG
pip install hipporag

# Configure environment variables
export CUDA_VISIBLE_DEVICES=0,1,2,3
export HF_HOME=<path to Huggingface home directory>
export OPENAI_API_KEY=<your openai api key>

Sources: README.md:150-165

Core Usage Patterns

HippoRAG supports three primary deployment configurations. The initialization workflow follows this pattern:

graph TD
    A[Initialize HippoRAG] --> B{Select LLM Backend}
    B -->|OpenAI| C[Set llm_model_name + llm_base_url]
    B -->|vLLM| D[Set llm_model_name + llm_base_url]
    B -->|Azure| E[Set azure_endpoint]
    A --> F{Select Embedding Backend}
    F -->|HuggingFace| G[Set embedding_model_name]
    F -->|Custom| H[Set embedding_base_url]

Sources: demo_azure.py:1-30

Pattern 1: OpenAI Models

The simplest configuration uses OpenAI for both LLM inference and embeddings:

from hipporag import HippoRAG

# Configuration
save_dir = 'outputs'
llm_model_name = 'gpt-4o-mini'
embedding_model_name = 'nvidia/NV-Embed-v2'

# Initialize HippoRAG instance
hipporag = HippoRAG(
    save_dir=save_dir, 
    llm_model_name=llm_model_name,
    embedding_model_name=embedding_model_name
)

Sources: README.md:175-195

Pattern 2: OpenAI Compatible Embeddings

For custom LLM endpoints that follow OpenAI's API format:

hipporag = HippoRAG(
    save_dir=save_dir, 
    llm_model_name='Your LLM Model name',
    llm_base_url='Your LLM Model url',
    embedding_model_name='Your Embedding model name',  
    embedding_base_url='Your Embedding model url'
)

Sources: README.md:210-220

Pattern 3: Azure OpenAI Integration

For Azure-hosted models:

hipporag = HippoRAG(
    save_dir=save_dir,
    llm_model_name=llm_model_name,
    embedding_model_name=embedding_model_name,
    azure_endpoint="https://[ENDPOINT NAME].openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview",
    azure_embedding_endpoint="https://[ENDPOINT NAME].openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
)

Sources: demo_azure.py:10-15

Indexing Documents

The indexing process converts raw documents into HippoRAG's knowledge graph structure:

graph LR
    A[Raw Documents] --> B[Chunking]
    B --> C[OpenIE Extraction]
    C --> D[Embedding Generation]
    D --> E[Graph Construction]
    E --> F[Knowledge Graph Index]

Input Data Format

Documents should be provided as a list of strings:

docs = [
    "Oliver Badman is a politician.",
    "George Rankin is a politician.",
    "Cinderella attended the royal ball.",
    "The prince used the lost glass slipper to search the kingdom.",
]

Execute Indexing

hipporag.index(docs=docs)

Sources: demo_azure.py:18-45

Retrieval and Question Answering

The rag_qa method performs retrieval-augmented question answering:

graph TD
    A[Query Input] --> B[Retrieval]
    B --> C[Personalized PageRank]
    C --> D[Document Selection]
    D --> E[QA Generation]
    E --> F[Final Answer]
    
    C -.->|links documents| G[Knowledge Graph]
    G -.->|context| D

Complete QA Example

# Prepare queries and evaluation data
queries = [
    "What is George Rankin's occupation?",
    "How did Cinderella reach her happy ending?"
]

answers = [
    ["Politician"],
    ["By going to the ball."]
]

gold_docs = [
    ["George Rankin is a politician."],
    ["Cinderella attended the royal ball.",
     "The prince used the lost glass slipper to search the kingdom.",
     "When the slipper fit perfectly, Cinderella was reunited with the prince."]
]

# Execute RAG QA
results = hipporag.rag_qa(
    queries=queries, 
    gold_docs=gold_docs,
    gold_answers=answers
)

print(results)

Sources: README.md:195-215

Local Deployment with vLLM

For running LLMs locally, HippoRAG supports vLLM server integration:

Step 1: Start vLLM Server

export CUDA_VISIBLE_DEVICES=0,1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export HF_HOME=<path to Huggingface home directory>

conda activate hipporag

# Adjust gpu-memory-utilization and max_model_len based on your GPU memory
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95 \
    --port 6578

Sources: README.md:225-240

Step 2: Initialize HippoRAG with vLLM

hipporag = HippoRAG(
    save_dir=save_dir, 
    llm_model_name='meta-llama/Llama-3.1-8B-Instruct',
    llm_base_url='http://localhost:6578/v1',
    embedding_model_name='nvidia/NV-Embed-v2'
)

Reproducing Experiments

For reproducing published experiments, follow the structured workflow:

Dataset Structure

| File Type | Naming Convention | Purpose |
|---|---|---|
| Corpus | {dataset}_corpus.json | Document collection |
| Queries | {dataset}.json | Questions with answers |
| Output | outputs/{dataset}/ | Index and results |

Corpus JSON Format

[
  {
    "title": "FIRST PASSAGE TITLE",
    "text": "FIRST PASSAGE TEXT",
    "idx": 0
  },
  {
    "title": "SECOND PASSAGE TITLE",
    "text": "SECOND PASSAGE TEXT",
    "idx": 1
  }
]

Sources: README.md:100-125
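
If you prefer to index such a corpus file through the Python API instead of main.py, a minimal sketch could look like the following; the file path and the title/text concatenation are assumptions rather than the exact logic of main.py.

import json

from hipporag import HippoRAG

# Load a corpus file in the format shown above (path is illustrative).
with open("reproduce/dataset/sample_corpus.json") as f:
    corpus = json.load(f)

# Combine title and text for each passage before indexing.
docs = [f"{passage['title']}\n{passage['text']}" for passage in corpus]

hipporag = HippoRAG(
    save_dir="outputs/sample",
    llm_model_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2",
)
hipporag.index(docs=docs)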

Running Experiments

# Set environment variables
export CUDA_VISIBLE_DEVICES=0,1,2,3
export HF_HOME=<path to Huggingface home directory>
export OPENAI_API_KEY=<your openai api key>
conda activate hipporag

# Run with OpenAI model
dataset=sample
python main.py --dataset $dataset \
    --llm_base_url https://api.openai.com/v1 \
    --llm_name gpt-4o-mini \
    --embedding_name nvidia/NV-Embed-v2

Sources: main.py:1-35

Testing Your Installation

OpenAI Test

Verify installation with minimal OpenAI API cost:

export OPENAI_API_KEY=<your openai api key> 
conda activate hipporag
python tests_openai.py

Local Test with vLLM

Test with a locally deployed model:

export CUDA_VISIBLE_DEVICES=0
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export HF_HOME=<path to Huggingface home directory>

conda activate hipporag

# Start vLLM server with smaller model
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95 \
    --port 6578

# Run test
CUDA_VISIBLE_DEVICES=1 python tests_local.py

Sources: README.md:250-280

Configuration Parameters

Core Parameters

| Parameter | Default | Description |
|---|---|---|
| save_dir | outputs | Directory for saving all related information |
| llm_model_name | - | LLM model identifier |
| llm_base_url | - | Base URL for LLM API endpoint |
| embedding_model_name | nvidia/NV-Embed-v2 | Embedding model identifier |
| embedding_batch_size | 16 | Batch size for embedding model |

Sources: src/hipporag/utils/config_utils.py:50-80

Retrieval Parameters

| Parameter | Default | Description |
|---|---|---|
| retrieval_top_k | 200 | Number of documents to retrieve initially |
| linking_top_k | 5 | Number of linked nodes at each retrieval step |
| qa_top_k | 5 | Number of documents fed to the QA model |
| max_qa_steps | 1 | Maximum interleaved retrieval-reasoning steps |
| damping | 0.5 | Damping factor for Personalized PageRank |

Sources: src/hipporag/utils/config_utils.py:30-50

Graph Construction Parameters

| Parameter | Default | Description |
|---|---|---|
| synonymy_edge_topk | 2047 | K for KNN retrieval in synonymy edge building |
| synonymy_edge_sim_threshold | 0.8 | Similarity threshold for synonymy nodes |
| graph_type | facts_and_sim_passage_node_unidirectional | Type of graph structure to construct |
| is_directed_graph | False | Whether to build a directed graph |

Sources: src/hipporag/utils/config_utils.py:80-110

Troubleshooting

Common Issues

| Issue | Solution |
|---|---|
| CUDA OOM errors | Reduce gpu-memory-utilization or max_model_len in vLLM; reduce embedding_batch_size |
| Connection errors | Verify API endpoint URLs and network connectivity |
| Index loading failures | Check that save_dir contains valid index files |

Environment Validation

Always verify your setup before running experiments:

# Verify CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# Verify package installation
pip list | grep hipporag


Sources: README.md:150-165

Configuration System

Related topics: Installation and Setup, HippoRAG Core Class


HippoRAG provides a comprehensive configuration system built on Pydantic's data validation framework. The configuration system enables fine-grained control over all aspects of the indexing, retrieval, and QA pipeline while maintaining type safety and default values for common use cases.

Architecture Overview

The configuration system is centered around the BaseConfig class defined in config_utils.py. This class uses Pydantic's BaseModel with Field definitions to provide structured configuration with metadata and validation.

graph TD
    A[BaseConfig] --> B[OpenIE Configuration]
    A --> C[Embedding Configuration]
    A --> D[Graph Construction Configuration]
    A --> E[Retrieval Configuration]
    A --> F[QA Configuration]
    A --> G[Save/Directory Configuration]
    A --> H[Dataset Configuration]
    
    I[main.py] --> A
    J[HippoRAG class] --> A
    K[StandardRAG class] --> A

Source: src/hipporag/utils/config_utils.py:1-100

Core Configuration Class

BaseConfig

The BaseConfig class serves as the single source of truth for all pipeline parameters. It inherits from Pydantic's BaseModel and provides automatic validation, serialization, and documentation through field metadata.

from hipporag.utils.config_utils import BaseConfig

global_config = BaseConfig(
    openie_mode='openai_gpt',
    information_extraction_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2',
    retrieval_top_k=200,
    linking_top_k=5,
    max_qa_steps=3,
    qa_top_k=5,
    graph_type="facts_and_sim_passage_node_unidirectional",
    embedding_batch_size=8
)

Source: main.py:20-35

Configuration Categories

OpenIE (Open Information Extraction) Configuration

Controls the information extraction module that identifies facts and entities from passages.

| Parameter | Type | Default | Description |
|---|---|---|---|
| openie_mode | Literal["openai_gpt", "vllm_offline", "Transformers-offline"] | "openai_gpt" | Mode for the information extraction model |
| information_extraction_model_name | str | "gpt-4o-mini" | Model name for information extraction |

The openie_mode parameter supports three execution modes:

  • openai_gpt: Uses OpenAI's GPT models for extraction via API
  • vllm_offline: Uses locally deployed LLMs through vLLM server
  • Transformers-offline: Uses HuggingFace Transformers models directly

Source: src/hipporag/utils/config_utils.py:config_fields

Embedding Model Configuration

Manages embedding generation for passages and queries.

| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding_model_name | str | "nvidia/NV-Embed-v2" | Name of the embedding model |
| embedding_batch_size | int | 16 | Batch size for embedding generation |
| embedding_return_as_normalized | bool | True | Whether to normalize embeddings |
| embedding_max_seq_len | int | 2048 | Maximum sequence length for the embedding model |
| embedding_model_dtype | Literal["float16", "float32", "bfloat16", "auto"] | "auto" | Data type for the local embedding model |
| embedding_base_url | Optional[str] | None | Base URL for OpenAI-compatible embedding endpoints |

Source: src/hipporag/utils/config_utils.py:embedding_batch_size-def

Graph Construction Configuration

Controls the knowledge graph construction process that forms the backbone of HippoRAG's memory system.

| Parameter | Type | Default | Description |
|---|---|---|---|
| synonymy_edge_topk | int | 2047 | K value for KNN retrieval when building synonymy edges |
| synonymy_edge_query_batch_size | int | 1000 | Batch size for query embeddings during KNN retrieval |
| synonymy_edge_key_batch_size | int | 10000 | Batch size for key embeddings during KNN retrieval |
| synonymy_edge_sim_threshold | float | 0.8 | Similarity threshold for including candidate synonymy nodes |
| is_directed_graph | bool | False | Whether the constructed graph is directed or undirected |
| graph_type | str | "facts_and_sim_passage_node_unidirectional" | Type of graph structure to build |

Supported graph_type values include:

  • facts_and_sim_passage_node_unidirectional - Passages connected via facts with similarity edges
  • facts_and_sim_passage_node_bidirectional - Bidirectional passage connections
  • facts_only - Only fact-based connections
  • sim_passage_node - Only passage similarity connections

Source: src/hipporag/utils/config_utils.py:synonymy_edge_topk-def

Retrieval Configuration

Parameters governing the retrieval and linking process using Personalized PageRank (PPR).

| Parameter | Type | Default | Description |
|---|---|---|---|
| linking_top_k | int | 5 | Number of linked nodes at each retrieval step |
| retrieval_top_k | int | 200 | Number of documents to retrieve at each step |
| damping | float | 0.5 | Damping factor for the PPR algorithm |

The damping parameter controls the probability of following graph edges (rather than restarting at the query's seed nodes) during the PPR random walk. Higher values (closer to 1.0) let the walk explore farther from the seed nodes, while lower values keep probability mass concentrated near them.

Source: src/hipporag/utils/config_utils.py:linking_top_k-def, main.py:28
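
To build intuition for what damping does, the sketch below runs personalized PageRank on a toy graph with networkx; this is an illustration of the algorithm only, not HippoRAG's actual igraph-based implementation.

import networkx as nx

# Toy graph standing in for the knowledge graph.
G = nx.Graph()
G.add_edges_from([
    ("fact_a", "passage_1"),
    ("fact_a", "fact_b"),
    ("fact_b", "passage_2"),
])

# Restart (personalization) mass is placed on the node(s) linked to the query.
seeds = {"fact_a": 1.0}

# alpha plays the role of the damping parameter (HippoRAG's default is 0.5).
scores = nx.pagerank(G, alpha=0.5, personalization=seeds)
print(sorted(scores.items(), key=lambda kv: -kv[1]))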

QA (Question Answering) Configuration

Controls the iterative QA process that interleaves retrieval with reasoning.

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_qa_steps | int | 1 | Maximum steps for interleaved retrieval and reasoning |
| qa_top_k | int | 5 | Number of top documents fed to the QA model |

The max_qa_steps parameter enables multi-step reasoning where the system can retrieve additional documents based on intermediate reasoning results before producing the final answer.

Source: src/hipporag/utils/config_utils.py:max_qa_steps-def, main.py:27
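
Conceptually, the interleaved loop behaves like the sketch below; the retrieve and answer callables are placeholders for HippoRAG's internal retrieval and LLM calls, so treat this purely as an illustration of the control flow.

def iterative_qa(question, retrieve, answer, max_qa_steps=1, qa_top_k=5):
    """Conceptual sketch of interleaved retrieval and reasoning."""
    context = []
    final_answer = None
    for step in range(max_qa_steps):
        # Retrieve more documents conditioned on the question and reasoning so far.
        context.extend(retrieve(question, context)[:qa_top_k])
        final_answer, is_final = answer(question, context)
        if is_final:
            break
    return final_answer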

LLM Configuration

Manages the language model used for QA and information extraction.

| Parameter | Type | Default | Description |
|---|---|---|---|
| llm_model_name | str | "gpt-4o-mini" | Name of the LLM |
| llm_base_url | Optional[str] | None | Base URL for OpenAI-compatible LLM endpoints |
| max_new_tokens | Optional[int] | None | Maximum new tokens for generation |

Source: src/hipporag/utils/config_utils.py:llm_model_name-def

Save and Directory Configuration

Controls output persistence and directory structure.

| Parameter | Type | Default | Description |
|---|---|---|---|
| save_dir | str | "outputs" | Top-level directory for saving all related information |
| corpus_len | int | Required | Length of the corpus being processed |

The save_dir parameter specifies where HippoRAG objects, intermediate results, and evaluation outputs are stored. When running on a specific dataset, outputs are saved by default to a dataset-specific subdirectory under save_dir.

Source: src/hipporag/utils/config_utils.py:save_dir-def, main.py:32

Configuration Workflow

graph LR
    A[Define BaseConfig] --> B[Initialize HippoRAG]
    B --> C[Index Documents]
    C --> D[Run RAG QA]
    D --> E[Results Saved to save_dir]
    
    F[Modify Config] -->|Update| B
    G[New Documents] -->|Index| C

Initialization Example

from hipporag.utils.config_utils import BaseConfig
from hipporag import HippoRAG

config = BaseConfig(
    openie_mode='openai_gpt',
    information_extraction_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2',
    retrieval_top_k=200,
    linking_top_k=5,
    max_qa_steps=3,
    qa_top_k=5,
    graph_type="facts_and_sim_passage_node_unidirectional",
    embedding_batch_size=8,
    max_new_tokens=None,
    corpus_len=len(corpus),
)

hipporag = HippoRAG(global_config=config)

Source: main.py:19-38

Configuration for Different Execution Modes

OpenAI API Mode

config = BaseConfig(
    openie_mode='openai_gpt',
    information_extraction_model_name='gpt-4o-mini',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2',
)

Source: main.py:20-26

Local vLLM Deployment Mode

config = BaseConfig(
    openie_mode='vllm_offline',
    information_extraction_model_name='meta-llama/Llama-3.1-8B-Instruct',
    llm_model_name='meta-llama/Llama-3.3-70B-Instruct',
    llm_base_url='http://localhost:8000/v1',
    embedding_model_name='nvidia/NV-Embed-v2',
)

Source: README.md:vllm_example

Transformers Offline Mode

config = BaseConfig(
    openie_mode='Transformers-offline',
    information_extraction_model_name='Transformers/Qwen/Qwen2.5-7B-Instruct',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2',
)

Source: test_transformers.py:16-20

Testing with Configuration

The test suite demonstrates configuration usage across different scenarios:

# tests_openai.py - Basic indexing and QA
hipporag = HippoRAG(
    save_dir=save_dir,
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

# tests_openai.py - Document deletion
hipporag.delete(docs_to_delete)

# test_transformers.py - Transformers offline mode
hipporag = HippoRAG(
    global_config=global_config,
    save_dir=save_dir,
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2',
)

Source: tests_openai.py:test_structure, test_transformers.py:16-25

Package Dependencies

The configuration system depends on the following packages specified in setup.py:

| Package | Version | Purpose |
|---|---|---|
| torch | 2.5.1 | PyTorch backend for models |
| transformers | 4.45.2 | HuggingFace Transformers |
| pydantic | 2.10.4 | Data validation and settings |
| vllm | 0.6.6.post1 | LLM inference server |
| openai | 1.91.1 | OpenAI API client |
| litellm | 1.73.1 | Unified LLM interface |
| gritlm | 1.0.2 | GritLM embedding model |
| networkx | 3.4.2 | Graph operations |
| python_igraph | 0.11.8 | Graph algorithms |
| tiktoken | 0.7.0 | Tokenization |
| tenacity | 8.5.0 | Retry logic |

Source: setup.py:14-27

Best Practices

  1. Use environment variables for sensitive configuration such as API keys:

     export OPENAI_API_KEY=<your_openai_api_key>
     export HF_HOME=<path_to_huggingface_home>

  2. Set GPU devices before initialization:

     export CUDA_VISIBLE_DEVICES=0,1,2,3

  3. Adjust batch sizes based on available GPU memory when using local models.
  4. Configure the damping factor carefully for retrieval - higher values (0.7-0.85) work better for complex multi-hop questions.
  5. Set corpus_len correctly to enable proper progress tracking and memory management.

Source: https://github.com/OSU-NLP-Group/HippoRAG / Human Manual

HippoRAG Core Class

Related topics: Knowledge Graph and Retrieval, Embedding Models


Overview

HippoRAG is a neurobiologically inspired graph-based Retrieval-Augmented Generation (RAG) framework designed to enable Large Language Models (LLMs) to identify and leverage connections within knowledge for improved retrieval and question answering. The project implements two primary RAG classes: HippoRAG (neurobiologically inspired with Personal Knowledge Graph) and StandardRAG (traditional DPR-based approach).

Sources: setup.py:8-9

Architecture Overview

graph TB
    subgraph "Input Layer"
        Docs[Documents/Passages]
        Queries[User Queries]
    end
    
    subgraph "HippoRAG Core"
        Index[Indexing Pipeline]
        Retrieve[Retrieval Pipeline]
        QA[Question Answering]
    end
    
    subgraph "Knowledge Graph Construction"
        OpenIE[OpenIE Information Extraction]
        Embed[Embedding Model]
        GraphBuild[Graph Building]
    end
    
    subgraph "Backend Services"
        LLM[LLM Inference]
        EmbedModel[Embedding Service]
    end
    
    Docs --> Index
    Index --> OpenIE
    Index --> Embed
    OpenIE --> GraphBuild
    Embed --> GraphBuild
    GraphBuild --> KG[Knowledge Graph]
    
    Queries --> Retrieve
    Retrieve --> KG
    KG --> QA
    QA --> LLM
    Retrieve --> EmbedModel

Core Classes

HippoRAG Class

The HippoRAG class is the main entry point for the neurobiologically inspired RAG system. It extends a base RAG implementation with Personal Knowledge Graph (PKG) capabilities.

Initialization Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| save_dir | str | Required | Directory to save all related information |
| llm_model_name | str | Required | LLM model identifier (e.g., gpt-4o-mini) |
| embedding_model_name | str | Required | Embedding model name (e.g., nvidia/NV-Embed-v2) |
| global_config | BaseConfig | None | Full configuration object |
| llm_base_url | str | None | Custom LLM API endpoint for OpenAI-compatible models |
| embedding_base_url | str | None | Custom embedding API endpoint |
| azure_endpoint | str | None | Azure OpenAI endpoint for the LLM |
| azure_embedding_endpoint | str | None | Azure OpenAI endpoint for embeddings |

Sources: main.py:19-28

Basic Usage Pattern

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

# Index documents
hipporag.index(docs=documents_list)

# Retrieve and answer queries
results = hipporag.rag_qa(
    queries=query_list,
    gold_docs=expected_documents,
    gold_answers=expected_answers
)

StandardRAG Class

The StandardRAG class provides traditional Dense Passage Retrieval (DPR) based RAG without the Personal Knowledge Graph components. This is useful for baseline comparisons.

Sources: main_dpr.py:19

Configuration System

BaseConfig Parameters

The BaseConfig class (defined in src/hipporag/utils/config_utils.py) provides comprehensive configuration options:

OpenIE Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| openie_mode | str | Required | OpenIE mode: OpenAI, vllm-offline, or Transformers-offline |
| information_extraction_model_name | str | None | Model for offline OpenIE (e.g., Qwen/Qwen2.5-7B-Instruct) |

Embedding Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding_batch_size | int | 16 | Batch size for embedding model inference |
| embedding_return_as_normalized | bool | True | Whether to normalize embeddings |
| embedding_max_seq_len | int | 2048 | Maximum sequence length for embedding |
| embedding_model_dtype | str | "auto" | Data type: float16, float32, bfloat16, or auto |

Graph Construction Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| synonymy_edge_topk | int | 2047 | K value for KNN retrieval in synonymy edge construction |
| synonymy_edge_query_batch_size | int | 1000 | Batch size for query embeddings |
| synonymy_edge_key_batch_size | int | 10000 | Batch size for key embeddings |
| synonymy_edge_sim_threshold | float | 0.8 | Similarity threshold for synonymy edges |
| is_directed_graph | bool | False | Whether the graph is directed |

Retrieval Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| retrieval_top_k | int | 200 | Number of documents to retrieve initially |
| linking_top_k | int | 5 | Number of linked nodes at each retrieval step |
| damping | float | 0.5 | Damping factor for Personalized PageRank |

QA Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_qa_steps | int | 1 | Maximum interleaved retrieval and reasoning steps |
| qa_top_k | int | 5 | Top-k documents fed to the QA model |

Sources: src/hipporag/utils/config_utils.py:1-80

Core Methods

Indexing Pipeline

graph LR
    A[Documents] --> B[Passage Embedding]
    B --> C[OpenIE Extraction]
    C --> D[Fact Node Creation]
    D --> E[Similarity Edge Building]
    E --> F[Knowledge Graph]

Method Signature

def index(self, docs: List[str], **kwargs) -> None

The indexing process:

  1. Embeds passages using the configured embedding model
  2. Runs OpenIE to extract factual triples from each passage
  3. Constructs fact nodes and passage nodes in the knowledge graph
  4. Builds synonymy edges based on embedding similarity
  5. Persists the graph structure to save_dir

RAG QA Pipeline

graph TD
    Q[Query] --> EP[Embedding]
    EP --> PPR[Personalized PageRank]
    PPR --> LN[Linked Nodes]
    LN --> LLM[LLM Reasoning]
    LLM -->|Iteration| Check{More Steps?}
    Check -->|Yes| EP
    Check -->|No| Final[Final Answer]

Method Signature

def rag_qa(
    self,
    queries: List[str],
    gold_docs: Optional[List[List[str]]] = None,
    gold_answers: Optional[List[List[str]]] = None,
    **kwargs
) -> Dict

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| queries | List[str] | Yes | List of questions to answer |
| gold_docs | List[List[str]] | No | Ground-truth documents for evaluation |
| gold_answers | List[List[str]] | No | Ground-truth answers for evaluation |

Returns

A dictionary containing evaluation metrics and retrieved results.

Document Deletion

def delete(self, docs_to_delete: List[str]) -> None

Removes specified documents from the knowledge graph and updates persistence.
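
For example, mirroring the deletion test in tests_openai.py:

# Remove previously indexed passages; the graph and stored embeddings are updated.
docs_to_delete = [
    "Tom Hort's birthplace is Montebello.",
    "Sam Hort's birthplace is Montebello."
]
hipporag.delete(docs_to_delete)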

Supported Backend Models

LLM Backends

| Backend | Configuration | Example Model |
|---|---|---|
| OpenAI | llm_model_name | gpt-4o-mini |
| Azure OpenAI | azure_endpoint | Azure deployment URL |
| vLLM (Local) | llm_base_url + vLLM server | meta-llama/Llama-3.1-8B-Instruct |
| OpenAI-Compatible | llm_model_name + llm_base_url | Custom endpoint |

Sources: README.md:80-95

Embedding Models

| Model Type | Configuration | Notes |
|---|---|---|
| NV-Embed-v2 | embedding_model_name='nvidia/NV-Embed-v2' | Recommended |
| GritLM | embedding_model_name='GritLM' | Supported |
| Contriever | embedding_model_name='Contriever' | Supported |
| Azure Embeddings | azure_embedding_endpoint | Via Azure OpenAI |
| Custom OpenAI-Compatible | embedding_base_url | Any compatible endpoint |

OpenIE Modes

HippoRAG supports three OpenIE (Open Information Extraction) modes:

| Mode | Description | Use Case |
|---|---|---|
| OpenAI | Uses OpenAI GPT models for extraction | Cloud-based, high quality |
| vllm-offline | Uses locally deployed vLLM models | GPU-equipped servers |
| Transformers-offline | Uses HuggingFace Transformers | CPU or limited GPU |

Sources: test_transformers.py:20-22

Workflow Example

from hipporag import HippoRAG

# Initialize
hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

# Prepare data
docs = [
    "Oliver Badman is a politician.",
    "George Rankin is a politician.",
    "Cinderella attended the royal ball."
]

# Index
hipporag.index(docs=docs)

# Query
queries = ["What is George Rankin's occupation?"]
answers = [["Politician"]]
gold_docs = [["George Rankin is a politician."]]

# Retrieve and evaluate
results = hipporag.rag_qa(
    queries=queries,
    gold_docs=gold_docs,
    gold_answers=answers
)

Graph Types

The framework supports configurable graph structures:

| Graph Type | Description |
|---|---|
| facts_and_sim_passage_node_unidirectional | Facts with similarity-based passage connections (default) |

Graph edges include:

  • Fact-to-Fact edges: Created from OpenIE extractions
  • Synonymy edges: Based on embedding similarity above threshold
  • Passage edges: Connect passages to their extracted facts

Dependencies

Key package dependencies managed in setup.py:

| Package | Version | Purpose |
|---|---|---|
| torch | 2.5.1 | Deep learning framework |
| transformers | 4.45.2 | Model architectures |
| vllm | 0.6.6.post1 | LLM inference |
| openai | 1.91.1 | OpenAI API client |
| gritlm | 1.0.2 | GritLM embedding model |
| networkx | 3.4.2 | Graph operations |
| python_igraph | 0.11.8 | Graph algorithms |
| pydantic | 2.10.4 | Configuration validation |
| tiktoken | 0.7.0 | Tokenization |

Sources: setup.py:15-30

Error Handling

The framework uses tenacity for retry mechanisms with configurable backoff strategies when interacting with external APIs (OpenAI, Azure, vLLM).

Persistence

All indexed data is persisted to the save_dir directory with the following structure:

save_dir/
└── {llm_model_name}_{embedding_model_name}/
    ├── knowledge_graph.pkl       # Serialized graph
    ├── passages.pkl              # Passage embeddings
    ├── fact_nodes.pkl            # Extracted facts
    └── config.json                # Configuration snapshot

Sources: setup.py:8-9

Knowledge Graph and Retrieval

Related topics: Embedding Store and Management, LLM Integrations


Overview

HippoRAG implements a neurobiologically inspired retrieval system that combines knowledge graph construction with advanced retrieval algorithms. The system is designed to enable LLMs to identify and leverage connections within new knowledge for improved retrieval performance. Sources: setup.py:8

The Knowledge Graph and Retrieval module forms the core of HippoRAG's architecture, providing mechanisms to:

  • Extract factual knowledge from text passages using Open Information Extraction (OpenIE)
  • Construct heterogeneous graphs with multiple node and edge types
  • Perform personalized PageRank (PPR) based retrieval over the constructed graphs
  • Support incremental updates and document deletion operations

Sources: src/hipporag/utils/config_utils.py:48-72

Architecture

High-Level System Design

HippoRAG's retrieval system integrates several key components working in concert to provide accurate and efficient knowledge retrieval:

graph TD
    A[Input Documents] --> B[OpenIE Processing]
    B --> C[Knowledge Graph Construction]
    C --> D[Embedding Generation]
    D --> E[Synonymy Edge Building]
    C --> F[Hybrid Graph]
    
    G[Query Input] --> H[Query Embedding]
    H --> I[Personalized PageRank]
    I --> F
    F --> J[Retrieval Results]
    J --> K[Reranking]
    K --> L[Final QA Output]
    
    M[LLM Inference] --> L

Graph Construction Pipeline

The graph construction process transforms raw text into a structured knowledge representation:

graph LR
    A[Passages] --> B[OpenIE Extractor]
    B --> C[Triplets/Entities]
    C --> D[Fact Nodes]
    
    E[Passages] --> F[Embedding Model]
    F --> G[Passage Embeddings]
    G --> H[Passage Nodes]
    
    D --> I[Passage-Fact Edges]
    H --> I
    
    G --> J[Synonymy Edges]
    J --> K[knn Retrieval]
    K --> L[Similarity Threshold Filter]
    L --> M[Synonymy Edge Network]

Knowledge Graph Components

Node Types

| Node Type | Description | Attributes |
|---|---|---|
| Passage Nodes | Represent original text passages | idx, title, text, embedding |
| Fact Nodes | Extracted facts/triplets from OpenIE | subject, predicate, object, embedding |

Edge Types

| Edge Type | Source | Target | Purpose |
|---|---|---|---|
| Passage-to-Fact | Passage Node | Fact Node | Links passages to their extracted facts |
| Fact-to-Fact | Fact Node | Fact Node | Connects semantically related facts |
| Synonymy | Passage Node | Passage Node | Links passages with high semantic similarity |
| Bidirectional | Both | Both | Full edge in both directions |

Sources: src/hipporag/utils/config_utils.py:70-85

Graph Types Configuration

The system supports multiple graph configurations via the graph_type parameter:

| Graph Type | Description |
|---|---|
| facts_and_sim_passage_node_unidirectional | Facts + similar passage nodes, unidirectional edges |
| facts_and_sim_passage_node_bidirectional | Facts + similar passage nodes, bidirectional edges |
| Custom types | Extensible graph construction patterns |

Sources: main.py:18

Retrieval Process

Personalized PageRank (PPR) Algorithm

HippoRAG uses Personalized PageRank for graph-based retrieval, which allows queries to propagate through the knowledge graph to identify relevant nodes.

graph TD
    A[Query] --> B[Query Embedding]
    B --> C[Initial PPR Scores]
    C --> D[Graph Propagation]
    D --> E{Iteration}
    E -->|Continue| F[Score Aggregation]
    F --> D
    E -->|Converge| G[Top-K Selection]
    G --> H[Linked Nodes]
    
    I[damping factor: 0.5] --> D
    J[linking_top_k: 5] --> G

Retrieval Configuration Parameters

| Parameter | Default | Description |
|---|---|---|
| retrieval_top_k | 200 | Number of documents retrieved at each step |
| linking_top_k | 5 | Number of linked nodes at each retrieval step |
| damping | 0.5 | Damping factor for the PPR algorithm |
| qa_top_k | 5 | Top-k documents fed to the QA model |

Sources: src/hipporag/utils/config_utils.py:60-72

Synonymy Edge Construction

Synonymy edges connect passages with high semantic similarity, enabling cross-document retrieval:

graph TD
    A[All Passage Embeddings] --> B[KNN Retrieval]
    B --> C[Top-K Candidates]
    C --> D{Similarity > Threshold?}
    D -->|Yes| E[Create Synonymy Edge]
    D -->|No| F[Discard]
    E --> G[Synonymy Edge Network]

#### Synonymy Edge Parameters

| Parameter | Default | Description |
|---|---|---|
| synonymy_edge_topk | 2047 | K for KNN retrieval when building synonymy edges |
| synonymy_edge_query_batch_size | 1000 | Batch size for query embeddings |
| synonymy_edge_key_batch_size | 10000 | Batch size for key embeddings |
| synonymy_edge_sim_threshold | 0.8 | Similarity threshold for candidate synonymy nodes |

Sources: src/hipporag/utils/config_utils.py:73-85
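
The sketch below shows the thresholded kNN step in its simplest form, assuming the passage embeddings are already L2-normalized; batching and the actual graph edge insertion are omitted, so this is an illustration rather than the project's implementation.

import numpy as np

def synonymy_candidates(embeddings: np.ndarray, topk: int = 2047, threshold: float = 0.8):
    """Return (i, j, score) pairs whose cosine similarity exceeds the threshold."""
    sims = embeddings @ embeddings.T          # dot product == cosine for unit vectors
    np.fill_diagonal(sims, -1.0)              # ignore self-similarity
    edges = []
    for i, row in enumerate(sims):
        for j in np.argsort(-row)[:topk]:     # kNN candidates for node i
            if row[j] >= threshold:
                edges.append((i, int(j), float(row[j])))
    return edges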

Embedding Integration

Embedding Model Configuration

| Parameter | Default | Description |
|---|---|---|
| embedding_model_name | - | Name of the embedding model |
| embedding_batch_size | 16 | Batch size for embedding calls |
| embedding_return_as_normalized | True | Whether to normalize embeddings |
| embedding_max_seq_len | 2048 | Maximum sequence length |
| embedding_model_dtype | auto | Data type for local models (float16/float32/bfloat16/auto) |

Sources: src/hipporag/utils/config_utils.py:40-54

Supported Embedding Models

The system integrates with multiple embedding model providers:

  • NV-Embed-v2: NVIDIA's embedding model
  • GritLM: GritLM embedding model
  • Contriever: Facebook's dense retriever
  • OpenAI Compatible: Any OpenAI-compatible embedding endpoint
  • Azure OpenAI: Azure-hosted embedding models

Reranking Module

After initial retrieval, HippoRAG applies reranking to improve result quality. The reranking module reorders retrieved candidates using additional scoring mechanisms.

graph LR
    A[Retrieved Candidates] --> B[Reranker Model]
    B --> C[Relevance Scores]
    C --> D[Ranked Results]
    D --> E[Top Results]

Sources: src/hipporag/rerank.py

QA Integration

Multi-Step Retrieval and Reasoning

HippoRAG supports interleaved retrieval and reasoning with configurable steps:

| Parameter | Default | Description |
|---|---|---|
| max_qa_steps | 1 | Maximum steps for interleaved retrieval and reasoning |
| qa_top_k | 5 | Number of documents for the QA model to process |

Sources: src/hipporag/utils/config_utils.py:68-72

QA Pipeline Flow

graph TD
    A[Query] --> B[QA Step 1]
    B --> C[Retrieval]
    C --> D[Read Documents]
    D --> E{More Steps Needed?}
    E -->|Yes| F[Update Context]
    F --> B
    E -->|No| G[Final Answer]
    
    H[gold_docs] --> I[Evaluation]
    I --> J[Metrics]
    J --> K[Recall, EM, F1]

Data Formats

Corpus JSON Structure

[
  {
    "title": "PASSAGE TITLE",
    "text": "PASSAGE TEXT",
    "idx": 0
  }
]

Query JSON Structure

[
  {
    "id": "question_id",
    "question": "QUESTION TEXT",
    "answer": ["ANSWER"],
    "answerable": true,
    "paragraphs": [
      {
        "title": "SUPPORTING TITLE",
        "text": "SUPPORTING TEXT",
        "is_supporting": true,
        "idx": 0
      }
    ]
  }
]
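
The fields in this format map onto rag_qa's arguments roughly as in the sketch below; the file path and the title/text join are assumptions for illustration.

import json

with open("reproduce/dataset/sample.json") as f:
    samples = json.load(f)

queries = [s["question"] for s in samples]
gold_answers = [s["answer"] for s in samples]
gold_docs = [
    [f"{p['title']}\n{p['text']}" for p in s["paragraphs"] if p.get("is_supporting")]
    for s in samples
]

results = hipporag.rag_qa(queries=queries, gold_docs=gold_docs, gold_answers=gold_answers)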

Usage Examples

Basic Retrieval with HippoRAG

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

# Index documents
docs = [
    "Oliver Badman is a politician.",
    "George Rankin is a politician.",
    "Erik Hort's birthplace is Montebello.",
    "Montebello is a part of Rockland County."
]

hipporag.index(docs)

# Query with evaluation
queries = ["What is George Rankin's occupation?"]
gold_docs = [["George Rankin is a politician."]]
answers = [["Politician"]]

results = hipporag.rag_qa(
    queries=queries,
    gold_docs=gold_docs,
    gold_answers=answers
)

Sources: README.md:Quick_Start, tests_openai.py:22-60

Incremental Updates

# Add new documents
new_docs = [
    "Tom Hort's birthplace is Montebello.",
    "Sam Hort's birthplace is Montebello."
]
hipporag.index(docs=new_docs)

# Delete documents
docs_to_delete = [
    "Tom Hort's birthplace is Montebello.",
    "Sam Hort's birthplace is Montebello."
]
hipporag.delete(docs_to_delete)

Sources: tests_openai.py:61-82

Evaluation Metrics

The retrieval system is evaluated using standard information retrieval metrics:

| Metric | Description |
|---|---|
| Recall@k | Fraction of relevant documents found in the top-k results |
| EM | Exact Match accuracy |
| F1 | Harmonic mean of precision and recall |
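
For reference, these metrics are conventionally computed along the lines of the generic sketch below; this is not HippoRAG's evaluation code, which may normalize answers differently.

def exact_match(prediction: str, gold: str) -> int:
    return int(prediction.strip().lower() == gold.strip().lower())

def f1(prediction: str, gold: str) -> float:
    pred_tokens, gold_tokens = prediction.lower().split(), gold.lower().split()
    common = set(pred_tokens) & set(gold_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def recall_at_k(retrieved: list, relevant: list, k: int) -> float:
    return len(set(retrieved[:k]) & set(relevant)) / max(len(relevant), 1)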

Summary

The Knowledge Graph and Retrieval module in HippoRAG provides a sophisticated pipeline for:

  1. Knowledge Extraction: Using OpenIE to extract factual triplets from text
  2. Graph Construction: Building heterogeneous graphs with passage nodes, fact nodes, and multiple edge types
  3. Synonymy Discovery: Creating semantic links between similar passages via embedding similarity
  4. PPR-based Retrieval: Performing personalized PageRank for graph-aware document retrieval
  5. Reranking: Refining retrieval results for improved accuracy
  6. Incremental Updates: Supporting document additions and deletions

This architecture enables HippoRAG to perform complex associativity and multi-hop reasoning tasks that traditional vector similarity retrieval cannot accomplish effectively.

Sources: src/hipporag/utils/config_utils.py:48-72

Embedding Store and Management

Related topics: LLM Integrations, Embedding Models


Overview

The Embedding Store and Management system in HippoRAG provides a unified interface for encoding text passages into vector embeddings, managing these embeddings throughout the indexing and retrieval lifecycle, and supporting multiple embedding model backends including NVIDIA NV-Embed-v2, GritLM, and Contriever. The system is designed to handle batch processing of documents with configurable parameters for sequence length, data type precision, and normalization behavior.

HippoRAG's embedding management is tightly integrated with the knowledge graph construction process, where embeddings serve dual purposes: enabling semantic similarity search for passage linking and powering the retrieval phase through Personalized PageRank (PPR) algorithms. The embedding store abstracts away the underlying model implementation details, allowing the framework to switch between different embedding providers without changing the core indexing and retrieval logic.

Sources: src/hipporag/utils/config_utils.py:1-50

Architecture

High-Level Components

The embedding system consists of three primary layers that work together to provide embedding services throughout the HippoRAG pipeline.

The Model Layer contains implementations for specific embedding models, each inheriting from a common base class that enforces a consistent interface. Currently supported models include NV-Embed-v2, GritLM, and Contriever, with the architecture supporting easy extension to additional models. Each model implementation handles the specific requirements of its underlying transformer architecture, including tokenizer configuration, padding strategies, and model-specific inference optimizations.

The Utility Layer provides helper functions for common embedding operations such as batch processing, embedding normalization, and similarity computation. These utilities ensure consistent handling of embeddings across different contexts and help optimize memory usage during large-scale indexing operations.

The Configuration Layer defines the parameters that control embedding behavior, including batch sizes, sequence length limits, and model-specific settings. This layer connects the embedding system to HippoRAG's global configuration management, allowing users to customize embedding behavior without modifying code.

graph TD
    A[Documents] --> B[Embedding Store]
    B --> C[Model Layer<br/>NV-Embed-v2<br/>GritLM<br/>Contriever]
    B --> D[Utility Layer<br/>Batch Processing<br/>Normalization]
    C --> E[Vector Storage]
    D --> E
    E --> F[Graph Construction]
    E --> G[Retrieval Phase]

Sources: src/hipporag/embedding_store.py:1-30

Data Flow

During the indexing phase, documents are first processed by the embedding store to generate passage vectors. These vectors are stored alongside the passage metadata and serve as the foundation for graph construction. The embedding store processes passages in configurable batch sizes to balance memory usage and throughput, with the default batch size set to 16 documents per batch.

During the retrieval phase, incoming queries are encoded using the same embedding model to produce a query vector. This query vector is then used for similarity computation against the indexed passage vectors, enabling semantic matching between the query intent and stored knowledge. The retrieval system can perform k-nearest neighbor (kNN) searches over the embedding space to identify candidate passages for further processing.

graph LR
    A[Indexing Flow] --> B[Input Documents]
    B --> C[Batch Processing<br/>batch_size=16]
    C --> D[Embedding Encoding]
    D --> E[Normalized Vectors]
    E --> F[Vector Storage]
    
    G[Retrieval Flow] --> H[Query Text]
    H --> I[Query Encoding]
    I --> J[Similarity Search]
    J --> K[kNN Retrieval<br/>top-k candidates]
    K --> L[Ranked Passages]

Sources: src/hipporag/utils/embed_utils.py:1-25
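
A schematic version of that batched encode-and-normalize loop is sketched below; encode_batch stands in for whichever embedding model backend is configured, so the helper name is an assumption.

import numpy as np

def encode_corpus(texts, encode_batch, batch_size=16, normalize=True):
    """encode_batch: any callable mapping a list of strings to an (n, d) array."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        emb = np.asarray(encode_batch(texts[start:start + batch_size]))
        if normalize:
            emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
        vectors.append(emb)
    return np.vstack(vectors)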

Configuration Parameters

The embedding system is controlled through several configuration parameters defined in the global configuration structure. These parameters allow fine-tuning of embedding behavior for different hardware configurations and use cases.

| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding_batch_size | int | 16 | Number of documents processed in each embedding batch |
| embedding_return_as_normalized | bool | true | Whether to L2-normalize output embeddings |
| embedding_max_seq_len | int | 2048 | Maximum sequence length in tokens for the embedding model |
| embedding_model_dtype | Literal | "auto" | Data type for local embedding models: float16, float32, bfloat16, or auto |
| embedding_model_name | str | varies | Identifier for the embedding model (e.g., "nvidia/NV-Embed-v2") |
| embedding_base_url | str | None | Base URL for OpenAI-compatible embedding endpoints |
| synonymy_edge_topk | int | 2047 | k value for kNN retrieval when building synonymy edges |
| synonymy_edge_sim_threshold | float | 0.8 | Minimum similarity threshold for synonymy edge candidates |

Sources: src/hipporag/utils/config_utils.py:15-40

Embedding Model Interface

Base Class Contract

All embedding models must inherit from BaseEmbeddingModel, which defines the core interface that HippoRAG expects. The base class enforces implementation of the __call__ method that accepts text inputs and returns embeddings, ensuring polymorphism across different model implementations.

The base class also defines the EmbeddingConfig dataclass that encapsulates model-specific settings. This configuration includes the model name, batch size, maximum sequence length, and data type settings. The configuration object is passed to the embedding model during initialization and can be modified to adjust model behavior without recreating the model instance.
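
Pictured as code, that contract looks roughly like the sketch below; the field names inside EmbeddingConfig are illustrative rather than the exact attribute names used in the source.

from abc import ABC, abstractmethod
from dataclasses import dataclass

import numpy as np

@dataclass
class EmbeddingConfig:
    # Illustrative fields mirroring the documented configuration options.
    model_name: str = "nvidia/NV-Embed-v2"
    batch_size: int = 16
    max_seq_len: int = 2048
    dtype: str = "auto"

class BaseEmbeddingModel(ABC):
    def __init__(self, config: EmbeddingConfig):
        self.config = config

    @abstractmethod
    def __call__(self, texts: list[str]) -> np.ndarray:
        """Encode a list of texts into a (len(texts), dim) embedding matrix."""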

Supported Models

NV-Embed-v2 is the primary embedding model recommended for production use, developed by NVIDIA. It provides high-quality sentence embeddings optimized for retrieval tasks. The model is accessed through HuggingFace and supports automatic device placement based on available GPU resources.

GritLM provides an alternative embedding approach that combines retrieval and generation capabilities. It can serve both as an embedding model and as a decoder for generation tasks, offering flexibility in deployment configurations.

Contriever is an open-source bi-encoder model for dense retrieval, useful for scenarios requiring a completely open-source embedding solution without proprietary dependencies.

Sources: src/hipporag/embedding_model/__init__.py:1-20

Embedding Store API

Initialization

The embedding store is typically instantiated through the main HippoRAG class rather than directly. When creating a HippoRAG instance, the embedding model name and optional endpoint configuration are passed as parameters:

hipporag = HippoRAG(
    save_dir="outputs",
    llm_model_name="gpt-4o-mini",
    embedding_model_name="nvidia/NV-Embed-v2"
)

For OpenAI-compatible embedding endpoints, the base URL can be specified:

hipporag = HippoRAG(
    save_dir="outputs",
    llm_model_name="gpt-4o-mini",
    embedding_model_name="text-embedding-3-small",
    embedding_base_url="https://api.openai.com/v1"
)

Sources: README.md:1-50

Encoding Operations

The embedding store provides batch encoding capabilities for processing multiple documents efficiently. The encoding operation returns normalized embeddings by default, which is required for proper similarity computation during retrieval. The normalization is L2 normalization, ensuring that all embedding vectors have unit length.
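
The reason unit length matters is that the dot product of two L2-normalized vectors equals their cosine similarity, as this small check illustrates:

import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

a_hat = a / np.linalg.norm(a)   # unit length
b_hat = b / np.linalg.norm(b)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
assert np.isclose(a_hat @ b_hat, cosine)   # dot product of unit vectors == cosine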

For Azure OpenAI deployments, specialized endpoint parameters are supported:

hipporag = HippoRAG(
    save_dir="save_dir",
    llm_model_name="gpt-4o-mini",
    embedding_model_name="text-embedding-3-small",
    azure_endpoint="https://[ENDPOINT].openai.azure.com/...",
    azure_embedding_endpoint="https://[ENDPOINT].openai.azure.com/..."
)

Sources: demo_azure.py:1-30

Integration with Knowledge Graph

The embedding system plays a critical role in HippoRAG's knowledge graph construction phase. After passages are indexed and encoded, the embeddings are used for two key graph-related operations.

Synonymy Edge Construction uses embeddings to identify semantically similar passage pairs that should be connected in the knowledge graph. The system performs k-nearest neighbor searches over the passage embedding space, where the synonymy_edge_topk parameter controls how many candidates are considered for each passage. The synonymy_edge_sim_threshold parameter filters these candidates, with only pairs exceeding the similarity threshold being connected as synonymy edges.

Retrieval-Graph Linking during the PPR retrieval process uses passage embeddings to establish the connection between the query and the knowledge graph. The query embedding enables the system to identify the most relevant starting nodes in the graph for the random walk algorithm.
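
The following sketch shows the general shape of the synonymy-edge search under these two parameters; it is a brute-force NumPy illustration, not the batched implementation in the embedding utilities:

import numpy as np

def synonymy_edge_candidates(embeddings: np.ndarray, topk: int = 2047, threshold: float = 0.8):
    # embeddings are assumed L2-normalized, so a matrix product gives cosine similarities.
    sims = embeddings @ embeddings.T
    np.fill_diagonal(sims, -1.0)  # ignore self-matches
    edges = []
    for i, row in enumerate(sims):
        for j in np.argsort(row)[::-1][:topk]:   # synonymy_edge_topk nearest neighbors
            if row[j] >= threshold:              # synonymy_edge_sim_threshold filter
                edges.append((i, int(j), float(row[j])))
    return edges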

Sources: src/hipporag/utils/config_utils.py:30-45

Memory Management and Optimization

Batch Processing Strategy

The embedding store implements batch processing to optimize GPU memory utilization and throughput. The batch size is configurable via embedding_batch_size with a default of 16, meaning 16 documents are processed simultaneously during encoding. For systems with larger GPU memory, increasing this value can significantly improve indexing performance.
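
As a hedged example, a larger batch size could be passed through the global configuration object; the BaseConfig import path below is assumed, and all other fields are left at their defaults:

from hipporag import HippoRAG
from hipporag.utils.config_utils import BaseConfig  # import path assumed

config = BaseConfig(
    embedding_batch_size=64,  # larger batches for GPUs with more free memory
)
hipporag = HippoRAG(global_config=config)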

The system also supports separate batch sizes for the synonymy edge construction phase. The synonymy_edge_query_batch_size (default 1000) controls how many passage embeddings are queried at once during kNN search, while synonymy_edge_key_batch_size (default 10000) controls the key batch size for the search index.

Data Type Selection

The embedding_model_dtype parameter allows selection of the precision for local embedding models. The "auto" setting allows the system to select an appropriate default based on the hardware and model. Available options include float16 for memory-constrained environments, float32 for maximum precision, and bfloat16 which offers a good balance of range and memory efficiency on newer GPUs.
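
How "auto" resolves is implementation-specific; the sketch below is only one plausible policy for mapping these strings to torch dtypes, not the loader's actual logic:

import torch

def resolve_embedding_dtype(name: str) -> torch.dtype:
    # Illustrative policy: prefer bfloat16 where supported, otherwise fall back.
    mapping = {"float16": torch.float16, "float32": torch.float32, "bfloat16": torch.bfloat16}
    if name != "auto":
        return mapping[name]
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16 if torch.cuda.is_available() else torch.float32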

Sources: src/hipporag/utils/config_utils.py:25-35

Error Handling and Resilience

The embedding system is designed with error handling patterns compatible with HippoRAG's overall resilience strategy. Batch processing allows partial failures to be identified and retried without losing all progress. The configuration system supports specifying fallback models or endpoints for production deployments requiring high availability.

Tenacity is used for retry logic in the embedding utilities, ensuring transient network failures or temporary service unavailability do not cause complete pipeline failures. This is particularly important when using remote embedding endpoints that may experience temporary connectivity issues.

Sources: setup.py:1-30

Performance Considerations

When optimizing HippoRAG for production deployment, the embedding configuration should be tuned based on the available hardware and expected workload characteristics. The primary tuning parameters include batch size for indexing throughput, sequence length limits for handling long documents, and data type selection for memory-constrained environments.

For maximum retrieval quality, the default normalization behavior should be maintained as it ensures consistent similarity computation across the retrieval pipeline. Disabling normalization may lead to suboptimal retrieval results as the similarity metrics assume unit-normalized vectors.

Sources: src/hipporag/utils/config_utils.py:18-22

The embedding system interacts closely with several other HippoRAG components. The Information Extraction module uses embeddings for processing extracted facts, the retrieval module depends on embeddings for kNN search and PPR initialization, and the evaluation module uses embeddings for computing retrieval metrics such as recall and MRR.

The embedding model implementations in src/hipporag/embedding_model/ follow a consistent interface defined in base.py, allowing the embedding store to work with any model that adheres to this contract.

Sources: src/hipporag/utils/config_utils.py:1-50

LLM Integrations

Related topics: Embedding Models, Deployment Options

Section Related Pages

Continue reading this section for the full explanation and source context.

Section OpenAI Models

Continue reading this section for the full explanation and source context.

Section vLLM Local Deployment

Continue reading this section for the full explanation and source context.

Section AWS Bedrock

Continue reading this section for the full explanation and source context.

Related topics: Embedding Models, Deployment Options

LLM Integrations

HippoRAG provides a flexible, pluggable architecture for integrating various Large Language Model (LLM) providers. This modular design enables the framework to support multiple inference backends including OpenAI, vLLM for local deployment, and AWS Bedrock, allowing researchers and developers to choose the most appropriate LLM backend for their specific use case and infrastructure requirements.

Architecture Overview

The LLM integration system follows a strategy pattern where a base abstract class defines the interface contract, and concrete implementations handle provider-specific details. This design ensures that the core HippoRAG logic remains independent of any particular LLM vendor while maintaining the ability to leverage specialized features offered by different providers.

graph TD
    A[HippoRAG Core] --> B[LLM Base Class]
    B --> C[OpenAIGPT]
    B --> D[VLLMOffline]
    B --> E[BedrockLLM]
    B --> F[Custom LLM Adapter]
    
    C --> G[OpenAI API]
    D --> H[Local vLLM Server]
    E --> I[AWS Bedrock]

The BaseLLM abstract class in src/hipporag/llm/base.py defines the common interface that all LLM adapters must implement, ensuring consistent behavior across different providers.

Supported LLM Providers

OpenAI Models

HippoRAG supports all OpenAI chat completion models through the OpenAIGPT class. This integration allows users to leverage the GPT family of models for both information extraction and question answering tasks.

Configuration Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model_name | string | required | OpenAI model identifier (e.g., gpt-4o-mini, gpt-4o) |
| api_key | string | env OPENAI_API_KEY | OpenAI API authentication key |
| base_url | string | https://api.openai.com/v1 | API endpoint base URL |
| max_tokens | int | None | Maximum tokens in generated response |
| temperature | float | 0.0 | Sampling temperature for generation |

Usage Example:

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

Sources: README.md:67-72

vLLM Local Deployment

For scenarios requiring local inference, HippoRAG supports vLLM-deployed models through the VLLMOffline class. This approach is particularly useful for privacy-sensitive applications, cost reduction at scale, or when working with custom fine-tuned models.

Server Setup:

export CUDA_VISIBLE_DEVICES=0,1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export HF_HOME=<path to Huggingface home directory>

vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95 \
    --port 6578

Sources: README.md:93-101

Configuration Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model_name | string | required | Model identifier for vLLM server |
| base_url | string | required | vLLM server endpoint URL |
| openie_mode | string | "online" | Mode for OpenIE processing (online or offline) |
| max_tokens | int | None | Maximum tokens in generated response |
| temperature | float | 0.0 | Sampling temperature for generation |

Offline Mode for OpenIE:

The vLLM integration supports an offline mode where OpenIE extraction runs separately from the main pipeline. This is useful for debugging or when OpenIE results can be cached and reused.

python main.py \
    --dataset sample \
    --llm_name meta-llama/Llama-3.3-70B-Instruct \
    --openie_mode offline \
    --skip_graph

Sources: README.md:130-135

AWS Bedrock

HippoRAG integrates with AWS Bedrock through the BedrockLLM class, enabling access to various foundation models hosted on AWS infrastructure. This integration is designed for enterprise deployments requiring scalable, managed LLM services.

Configuration Parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model_name | string | required | Bedrock model identifier |
| aws_region | string | "us-east-1" | AWS region for Bedrock endpoint |
| max_tokens | int | None | Maximum tokens in generated response |
| temperature | float | 0.0 | Sampling temperature for generation |

Azure OpenAI

For enterprise users with Azure OpenAI deployments, HippoRAG provides direct integration with Azure endpoints.

Configuration Example:

hipporag = HippoRAG(
    save_dir=save_dir,
    llm_model_name='gpt-4o-mini',
    embedding_model_name='embedding-model-name',
    azure_endpoint="https://[ENDPOINT NAME].openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview",
    azure_embedding_endpoint="https://[ENDPOINT NAME].openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
)

Sources: demo_azure.py:16-21

Base LLM Interface

All LLM adapters inherit from the BaseLLM abstract class, which defines the core contract for LLM interactions.

classDiagram
    class BaseLLM {
        <<abstract>>
        +generate(prompt: str) str
        +batch_generate(prompts: List[str]) List[str]
        +get_model_name() str
    }
    
    class OpenAIGPT {
        +generate(prompt: str) str
        +batch_generate(prompts: List[str]) List[str]
    }
    
    class VLLMOffline {
        +generate(prompt: str) str
        +batch_generate(prompts: List[str]) List[str]
    }
    
    class BedrockLLM {
        +generate(prompt: str) str
        +batch_generate(prompts: List[str]) List[str]
    }
    
    BaseLLM <|-- OpenAIGPT
    BaseLLM <|-- VLLMOffline
    BaseLLM <|-- BedrockLLM

Core Methods:

| Method | Parameters | Return Type | Description |
|--------|------------|-------------|-------------|
| generate | prompt: str | str | Generate a single response from a prompt |
| batch_generate | prompts: List[str] | List[str] | Generate responses for multiple prompts in batch |
| get_model_name | None | str | Return the configured model identifier |
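
Because every adapter honors this contract, downstream code can stay provider-agnostic. A minimal sketch follows; the helper function is hypothetical, and only the BaseLLM methods are taken from the table above:

from typing import List

from hipporag.llm.base import BaseLLM

def summarize_passages(llm: BaseLLM, passages: List[str]) -> List[str]:
    # Works with any adapter because it relies only on the BaseLLM contract.
    prompts = [f"Summarize the following passage in one sentence:\n{p}" for p in passages]
    return llm.batch_generate(prompts)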

OpenIE Integration

Open Information Extraction (OpenIE) is a critical component of HippoRAG's knowledge graph construction pipeline. The LLM integration system supports multiple OpenIE modes to accommodate different deployment scenarios.

graph LR
    A[Documents] --> B{HippoRAG}
    B --> C{OpenIE Mode}
    
    C -->|online| D[Real-time OpenIE]
    C -->|offline| E[Cached OpenIE Results]
    
    D --> F[OpenIE with LLM]
    E --> G[Load from JSON]
    
    F --> H[Knowledge Graph]
    G --> H

OpenIE Implementation Classes:

| Class | Provider | Use Case |
|-------|----------|----------|
| OpenAI_GPT | OpenAI API | Cloud-based OpenIE extraction |
| VLLM_Offline | Local vLLM | Private/on-site OpenIE extraction |

Sources: README.md:47-48

Configuration Schema

The LLM integration configuration is defined through the HippoRAGConfig class, which validates and manages all LLM-related settings.

Configuration Fields:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| llm_name | string | required | LLM model identifier |
| llm_base_url | string | None | Base URL for LLM API endpoint |
| llm_max_tokens | int | None | Maximum tokens per generation |
| llm_temperature | float | 0.0 | Sampling temperature |
| openie_mode | string | "online" | OpenIE processing mode |
| skip_graph | bool | False | Skip graph construction step |

Sources: main.py:18-26
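
A hedged sketch of overriding these fields programmatically, assuming the BaseConfig import path below and that unspecified fields keep their defaults:

from hipporag.utils.config_utils import BaseConfig  # import path assumed

config = BaseConfig(
    llm_name="gpt-4o-mini",
    llm_base_url="https://api.openai.com/v1",
    llm_temperature=0.0,
    openie_mode="online",
    skip_graph=False,
)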

Workflow Integration

The following diagram illustrates how LLM integrations fit into the HippoRAG indexing and retrieval pipeline:

graph TD
    subgraph Indexing
        A1[Input Documents] --> A2[Chunking]
        A2 --> A3[Embedding Generation]
        A3 --> A4[OpenIE with LLM]
        A4 --> A5[Knowledge Graph Construction]
        A5 --> A6[Graph Indexing]
    end
    
    subgraph Retrieval & QA
        B1[User Query] --> B2[Query Embedding]
        B2 --> B3[Graph Traversal]
        B3 --> B4[LLM for Answer Synthesis]
        B4 --> B5[Final Answer]
    end
    
    A4 -.->|Uses| LLM1[LLM Adapter]
    B4 -.->|Uses| LLM1

Environment Variables

Proper configuration of environment variables is essential for LLM integrations to function correctly.

| Variable | Required | Description |
|----------|----------|-------------|
| OPENAI_API_KEY | For OpenAI | OpenAI API authentication key |
| HF_HOME | For vLLM | Hugging Face cache directory |
| CUDA_VISIBLE_DEVICES | For GPU | Comma-separated GPU device IDs |
| AWS_ACCESS_KEY_ID | For Bedrock | AWS access credentials |
| AWS_SECRET_ACCESS_KEY | For Bedrock | AWS secret credentials |

Sources: README.md:58-66

Testing LLM Integrations

HippoRAG provides dedicated test scripts to verify LLM integration functionality.

OpenAI Test

export OPENAI_API_KEY=<your-api-key>
conda activate hipporag
python tests_openai.py

Local vLLM Test

# Terminal 1: Start vLLM server
export CUDA_VISIBLE_DEVICES=0
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 6578

# Terminal 2: Run test
CUDA_VISIBLE_DEVICES=1 python tests_local.py

Sources: README.md:137-148

Error Handling and Retries

The LLM integrations leverage the tenacity library for automatic retry behavior with exponential backoff. This ensures robust operation when dealing with network issues or rate limiting from LLM providers.

Configuration options for retry behavior:

| Parameter | Default | Description |
|-----------|---------|-------------|
| max_attempts | 3 | Maximum number of retry attempts |
| wait_exponential_multiplier | 1000 | Initial wait time in milliseconds |
| wait_exponential_max | 10000 | Maximum wait time in milliseconds |
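
A minimal sketch of this pattern with the tenacity library, using the defaults above converted to seconds (the wrapped call is a stand-in, not a HippoRAG function):

import random

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3),                   # max_attempts
       wait=wait_exponential(multiplier=1, max=10))  # 1000 ms initial wait, 10000 ms cap
def call_llm(prompt: str) -> str:
    # Stand-in for a provider call; a transient failure here triggers a retried attempt.
    if random.random() < 0.3:
        raise TimeoutError("transient provider error")
    return f"response to: {prompt}"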

Extending LLM Support

To add support for a new LLM provider, implement a new class that inherits from BaseLLM and implements the required abstract methods:

from typing import List

from hipporag.llm.base import BaseLLM

class CustomLLM(BaseLLM):
    def __init__(self, model_name: str, **kwargs):
        self.model_name = model_name
        # Initialize provider-specific client
        
    def generate(self, prompt: str) -> str:
        # Implement generation logic
        pass
        
    def batch_generate(self, prompts: List[str]) -> List[str]:
        # Implement batch generation
        pass
        
    def get_model_name(self) -> str:
        return self.model_name

Performance Considerations

When selecting and configuring LLM integrations, consider the following factors:

  1. Latency: OpenAI APIs typically offer lower latency for small workloads, while vLLM provides better performance for high-throughput scenarios
  2. Cost: Local vLLM deployment eliminates API costs but requires GPU infrastructure
  3. Privacy: For sensitive data, local deployment via vLLM or Bedrock private endpoints is recommended
  4. Model Size: Larger models (e.g., Llama-3.3-70B) require more GPU memory but often provide better extraction quality

Sources: README.md:67-72

Embedding Models

Related topics: Embedding Store and Management, LLM Integrations

Section Related Pages

Continue reading this section for the full explanation and source context.

Section NV-Embed-v2

Continue reading this section for the full explanation and source context.

Section GritLM

Continue reading this section for the full explanation and source context.

Section Transformers (SentenceTransformers)

Continue reading this section for the full explanation and source context.

Related topics: Embedding Store and Management, LLM Integrations

Embedding Models

HippoRAG provides a flexible, modular embedding model system that supports multiple embedding backends including NVIDIA's NV-Embed-v2, GritLM, HuggingFace Transformers, and vLLM endpoints. This modular architecture enables the system to generate high-quality text embeddings for both passage encoding and query understanding in the retrieval pipeline.

Architecture Overview

The embedding model subsystem follows a base class pattern with specialized implementations. All embedding models inherit from BaseEmbeddingModel which defines the common interface and configuration schema.

graph TD
    A[HippoRAG Core] --> B[Embedding Model Factory]
    B --> C[BaseEmbeddingModel]
    C --> D[NVEmbedV2]
    C --> E[GritLM]
    C --> F[TransformersEmbeddingModel]
    C --> G[VLLMEmbeddingModel]

The factory pattern in __init__.py dynamically instantiates the appropriate embedding model based on the model name prefix:

| Prefix | Model Class | Backend |
|--------|-------------|---------|
| nvidia/NV-Embed-v2 | NVEmbedV2 | HuggingFace |
| GritLM | GritLM | GritLM library |
| Transformers/ | TransformersEmbeddingModel | SentenceTransformers |
| VLLM/ | VLLMEmbeddingModel | vLLM endpoints |

Sources: src/hipporag/embedding_model/__init__.py
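
The dispatch itself amounts to a prefix check; the sketch below mirrors the table rather than the actual factory code, and returns class names as strings to avoid asserting exact imports:

def resolve_embedding_class(embedding_model_name: str) -> str:
    # Illustrative prefix dispatch based on the table above.
    if embedding_model_name.startswith("VLLM/"):
        return "VLLMEmbeddingModel"
    if embedding_model_name.startswith("Transformers/"):
        return "TransformersEmbeddingModel"
    if embedding_model_name.startswith("GritLM"):
        return "GritLM"
    if embedding_model_name == "nvidia/NV-Embed-v2":
        return "NVEmbedV2"
    raise ValueError(f"No embedding model registered for {embedding_model_name}")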

Base Configuration

The BaseEmbeddingModel and EmbeddingConfig classes define the configuration schema used across all embedding implementations. Configuration parameters include:

| Parameter | Default | Description |
|-----------|---------|-------------|
| embedding_batch_size | 16 | Batch size for encoding operations |
| embedding_return_as_normalized | True | Whether to normalize output embeddings |
| embedding_max_seq_len | 2048 | Maximum sequence length for tokenization |
| embedding_model_dtype | "auto" | Data type: float16, float32, bfloat16, or auto |

Sources: src/hipporag/utils/config_utils.py:16-35

Available Embedding Models

NV-Embed-v2

The NVEmbedV2 class provides integration with NVIDIA's NV-Embed-v2 embedding model, a high-performance encoder optimized for retrieval tasks.

class NVEmbedV2(BaseEmbeddingModel):
    def __init__(self, global_config: BaseConfig, embedding_model_name: str) -> None:
        super().__init__(global_config=global_config)
        # Model initialization with HuggingFace transformers

Sources: src/hipporag/embedding_model/NVEmbedV2.py

GritLM

The GritLM class wraps the GritLM library for generating embeddings with built-in instruction-following capabilities.

class GritLM(BaseEmbeddingModel):
    def __init__(self, global_config: BaseConfig, embedding_model_name: str) -> None:
        super().__init__(global_config=global_config)
        # GritLM-specific initialization

Sources: src/hipporag/embedding_model/GritLM.py

Transformers (SentenceTransformers)

The TransformersEmbeddingModel class enables using any model from the HuggingFace ecosystem via the SentenceTransformers library. Select this implementation by passing an embedding_model_name that starts with "Transformers/".

import torch
from sentence_transformers import SentenceTransformer

class TransformersEmbeddingModel(BaseEmbeddingModel):
    """
    To select this implementation you can initialise HippoRAG with:
        embedding_model_name starts with "Transformers/"
    """
    def __init__(self, global_config: BaseConfig, embedding_model_name: str) -> None:
        super().__init__(global_config=global_config)
        self.model_id = embedding_model_name[len("Transformers/"):]
        self.batch_size = 64
        self.model = SentenceTransformer(
            self.model_id, 
            device="cuda" if torch.cuda.is_available() else "cpu"
        )

Key characteristics:

  • Automatically detects CUDA availability for GPU acceleration
  • Uses batch size of 64 for efficient processing
  • Extracts model ID by removing the "Transformers/" prefix

Sources: src/hipporag/embedding_model/Transformers.py:1-40
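
A hedged usage example follows; the specific SentenceTransformers model id after the prefix is only an illustrative choice:

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    # Any SentenceTransformers-compatible model id can follow the "Transformers/" prefix.
    embedding_model_name='Transformers/sentence-transformers/all-MiniLM-L6-v2'
)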

VLLM (Endpoint-based)

The VLLMEmbeddingModel class provides integration with OpenAI-compatible vLLM embedding endpoints. Select this implementation by passing an embedding_model_name that starts with "VLLM/".

class VLLMEmbeddingModel(BaseEmbeddingModel):
    """
    To select this implementation you can initialise HippoRAG with:
        embedding_model_name starts with "VLLM/"
    The embedding base url should contain the v1/embeddings.
    """
    def __init__(self, global_config: BaseConfig, embedding_model_name: str) -> None:
        super().__init__(global_config=global_config)
        self.model_id = embedding_model_name[len("VLLM/"):]
        self.batch_size = 32
        self.url = global_config.embedding_base_url

The model communicates with the endpoint using the OpenAI embeddings API format:

payload = {
    "model": self.model_id,
    "input": input_text,
}
response = requests.post(self.base_url, headers=headers, json=payload)

Sources: src/hipporag/embedding_model/VLLM.py:1-50

Query Instructions

Embedding models support query instruction templates for improving retrieval relevance. The system uses instructions for mapping queries to facts and passages:

self.search_query_instr = set([
    get_query_instruction('query_to_fact'),
    get_query_instruction('query_to_passage')
])

Sources: src/hipporag/embedding_model/Transformers.py:23-27

Usage Patterns

Quick Start with OpenAI-style Models

hipporag = HippoRAG(
    save_dir=save_dir,
    llm_model_name='gpt-4o-mini',
    llm_base_url='https://api.openai.com/v1',
    embedding_model_name='nvidia/NV-Embed-v2',
    embedding_base_url='https://api.openai.com/v1'
)

Using Custom Endpoints

hipporag = HippoRAG(
    save_dir=save_dir,
    llm_model_name='Your LLM Model name',
    llm_base_url='Your LLM Model url',
    embedding_model_name='Your Embedding model name',
    embedding_base_url='Your Embedding model url'
)

Using vLLM Local Deployment

# Start vLLM server
vllm serve meta-llama/Llama-3.1-8B-Instruct --tensor-parallel-size 2

# Configure with VLLM prefix
hipporag = HippoRAG(
    save_dir=save_dir,
    llm_model_name='...',
    embedding_model_name='VLLM/your-model-name',
    embedding_base_url='http://localhost:8000/v1/embeddings'
)

Dependencies

The embedding model system depends on the following packages:

| Package | Version | Purpose |
|---------|---------|---------|
| transformers | 4.45.2 | Core model loading |
| sentence-transformers | (via Transformers) | Sentence encoding |
| gritlm | 1.0.2 | GritLM embeddings |
| torch | 2.5.1 | GPU acceleration |
| einops | (latest) | Tensor operations |

Sources: setup.py:19-32

Configuration Parameters Summary

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| embedding_batch_size | int | 16 | Batch size for embedding inference |
| embedding_return_as_normalized | bool | True | L2 normalize embeddings |
| embedding_max_seq_len | int | 2048 | Maximum token sequence length |
| embedding_model_dtype | str | "auto" | Model precision (float16/float32/bfloat16/auto) |

Sources: src/hipporag/utils/config_utils.py:16-29

Sources: src/hipporag/embedding_model/__init__.py

Open Information Extraction (OpenIE)

Related topics: Knowledge Graph and Retrieval, LLM Integrations

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Module Structure

Continue reading this section for the full explanation and source context.

Section ConfigUtils Class Parameters

Continue reading this section for the full explanation and source context.

Section Main Entry Point Configuration

Continue reading this section for the full explanation and source context.

Related topics: Knowledge Graph and Retrieval, LLM Integrations

Open Information Extraction (OpenIE)

Overview

Open Information Extraction (OpenIE) is a critical component in the HippoRAG pipeline that enables the extraction of structured knowledge triples from unstructured text. The system extracts entities, relations, and triples from passages to construct a knowledge graph that mimics hippocampal memory formation in biological systems.

In HippoRAG, OpenIE serves as the foundation for building the associative memory graph. Extracted triples form fact nodes in the knowledge graph, enabling Personalized PageRank (PPR) retrieval that connects related information across documents.

Sources: README.md

Architecture

The OpenIE system in HippoRAG supports multiple deployment modes and LLM backends:

graph TD
    A[Unstructured Text] --> B[Information Extraction Module]
    B --> C{openie_mode}
    C -->|online| D[OpenAI GPT]
    C -->|offline| E[vLLM Offline]
    D --> F[Triple Extraction]
    E --> F
    F --> G[NER Processing]
    G --> H[Knowledge Triples]
    H --> I[Knowledge Graph Construction]

Module Structure

| Module | File | Purpose |
|--------|------|---------|
| Base Interface | information_extraction/__init__.py | Exports model classes |
| OpenAI Integration | openie_openai_gpt.py | Online OpenIE via OpenAI API |
| vLLM Offline | openie_vllm_offline.py | Offline batch processing with vLLM |
| Triple Extraction Prompt | prompts/templates/triple_extraction.py | LLM prompt for triple extraction |
| NER Prompt | prompts/templates/ner.py | LLM prompt for named entity recognition |

Sources: README.md - Code Structure

Configuration

ConfigUtils Class Parameters

The InformationExtractionConfig dataclass provides the following configuration options:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| information_extraction_model_name | Literal["openie_openai_gpt"] | "openie_openai_gpt" | Class name indicating which information extraction model to use |
| openie_mode | Literal["offline", "online"] | "online" | Mode of the OpenIE model: online uses OpenAI API, offline uses vLLM batch processing |
| skip_graph | bool | False | Whether to skip graph construction. Set to True when running vLLM offline indexing for the first time |

Sources: src/hipporag/utils/config_utils.py

Main Entry Point Configuration

In the main.py script, OpenIE parameters are passed via command-line arguments:

config = BaseConfig(
    retrieval_top_k=200,
    linking_top_k=5,
    max_qa_steps=3,
    qa_top_k=5,
    graph_type="facts_and_sim_passage_node_unidirectional",
    embedding_batch_size=8,
    max_new_tokens=None,
    corpus_len=len(corpus),
    openie_mode=args.openie_mode  # 'online' or 'offline'
)

Command-line arguments:

  • --openie_mode: Choose between online (OpenAI API) or offline (vLLM)
  • --force_openie_from_scratch: If False, reuse existing OpenIE results if available

Sources: main.py

Extraction Workflow

Triple Extraction Process

The triple extraction workflow follows these steps:

sequenceDiagram
    participant Text as Raw Text Input
    participant Triple as Triple Extraction Prompt
    participant LLM as Language Model
    participant NER as NER Prompt
    participant Output as Knowledge Triples
    
    Text->>Triple: Passage text
    Triple->>LLM: Structured prompt
    LLM->>Output: Subject-Predicate-Object triples
    Output->>NER: Named Entity Recognition
    NER->>LLM: Entity labels
    LLM->>Output: Typed entities

Supported Deployment Modes

| Mode | Backend | Use Case | API Key Required |
|------|---------|----------|------------------|
| online | OpenAI GPT | Quick testing, small corpora | Yes (OPENAI_API_KEY) |
| offline | vLLM | Large-scale indexing, cost efficiency | No (local deployment) |

Knowledge Graph Integration

OpenIE extracted triples are converted into graph structures:

graph LR
    A[Passage Text] -->|OpenIE| B[Triple: Entity1 → Relation → Entity2]
    B --> C[Fact Node]
    C --> D[Knowledge Graph]
    D --> E[Personalized PageRank]
    E --> F[Associative Retrieval]

The extracted triples serve dual purposes:

  1. Fact Nodes: Create direct connections between related entities
  2. Association Links: Enable multi-hop reasoning through the graph

This design mirrors the dentate gyrus pattern separation mechanism in the hippocampus, where similar memories are differentiated to reduce interference.

Sources: README.md - Methodology

Usage Examples

Online Mode (OpenAI)

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

# OpenIE runs automatically during indexing
hipporag.index(docs=["Passage containing facts to extract."])

Offline Mode (vLLM)

# 1. Start vLLM server
vllm serve meta-llama/Llama-3.3-70B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95

# 2. Run indexing with offline OpenIE
python main.py --dataset sample --openie_mode offline

Sources: README.md - Quick Start

Dependencies

The OpenIE system requires the following core dependencies:

| Package | Version | Purpose |
|---------|---------|---------|
| torch | 2.5.1 | PyTorch backend |
| transformers | 4.45.2 | Model architecture |
| openai | 1.91.1 | Online OpenAI API |
| vllm | 0.6.6.post1 | Offline inference |
| litellm | 1.73.1 | Unified LLM interface |
| tqdm | - | Progress bars |

Sources: setup.py

Extracted Data Format

OpenIE produces structured triples in the following format:

| Field | Type | Description |
|-------|------|-------------|
| subject | str | First entity |
| predicate | str | Relation verb/phrase |
| object | str | Second entity |
| context | str | Source passage text |

These triples are then processed into graph nodes and edges for the knowledge graph construction phase.
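
A single extracted record, with purely illustrative values, would therefore look like:

triple = {
    "subject": "Stanford University",
    "predicate": "is located in",
    "object": "California",
    "context": "Stanford University is a private research university in California.",
}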

Sources: README.md

Deployment Options

Related topics: Installation and Setup, LLM Integrations

Section Related Pages

Continue reading this section for the full explanation and source context.

Section Configuration Parameters

Continue reading this section for the full explanation and source context.

Section Running with OpenAI Models

Continue reading this section for the full explanation and source context.

Section Programmatic Usage

Continue reading this section for the full explanation and source context.

Related topics: Installation and Setup, LLM Integrations

Deployment Options

HippoRAG supports multiple deployment configurations to accommodate different infrastructure requirements and use cases. This page documents the available deployment options, configuration parameters, and setup procedures for running HippoRAG in various environments.

Overview

HippoRAG provides three primary deployment models:

| Deployment Type | LLM Backend | Embedding Backend | Typical Use Case |
|-----------------|-------------|-------------------|------------------|
| OpenAI API | OpenAI hosted models | OpenAI/NVIDIA hosted | Quickstart, development |
| vLLM (Local) | Self-hosted LLMs via vLLM | Local embedding models | Production, cost-sensitive |
| Azure OpenAI | Azure-hosted models | Azure-hosted embeddings | Enterprise compliance |

Sources: README.md

Environment Setup

Regardless of deployment type, certain environment variables must be configured:

export CUDA_VISIBLE_DEVICES=0,1,2,3
export HF_HOME=<path to Huggingface home directory>

For OpenAI and Azure deployments, additional API credentials are required:

export OPENAI_API_KEY=<your openai api key>

Sources: README.md:1

OpenAI API Deployment

The simplest deployment option uses OpenAI's hosted API endpoints for both LLM inference and embeddings.

Configuration Parameters

| Parameter | Description | Example Value |
|-----------|-------------|---------------|
| --llm_base_url | OpenAI API endpoint | https://api.openai.com/v1 |
| --llm_name | OpenAI model identifier | gpt-4o-mini |
| --embedding_name | Embedding model name | nvidia/NV-Embed-v2 |

Running with OpenAI Models

dataset=sample

python main.py --dataset $dataset \
    --llm_base_url https://api.openai.com/v1 \
    --llm_name gpt-4o-mini \
    --embedding_name nvidia/NV-Embed-v2

Sources: README.md:1

Programmatic Usage

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2'
)

Sources: README.md:1

Local vLLM Deployment

For production environments or cost-sensitive deployments, HippoRAG supports self-hosted LLMs using vLLM.

Architecture

graph TD
    A[HippoRAG Main Process] --> B[vLLM Server]
    A --> C[Local Embedding Model]
    B --> D[GPU 0-1]
    C --> D
    E[Indexing Pipeline] --> A
    F[QA Pipeline] --> A

Starting vLLM Server

Launch the vLLM server with tensor parallelism for multi-GPU setups:

export CUDA_VISIBLE_DEVICES=0,1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export HF_HOME=<path to Huggingface home directory>

vllm serve meta-llama/Llama-3.3-70B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95 \
    --port 6578

Sources: README.md:1

Configuration Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| --llm_base_url | vLLM server endpoint | http://localhost:6578/v1 |
| --llm_name | Model name (must match deployed model) | meta-llama/Llama-3.1-8B-Instruct |
| --embedding_name | Local embedding model identifier | nvidia/NV-Embed-v2 |

Running Main Process

With vLLM server running on GPUs 0-1, run the main process on separate GPUs:

export CUDA_VISIBLE_DEVICES=2,3
export HF_HOME=<path to Huggingface home directory>

python main.py --dataset $dataset \
    --llm_base_url http://localhost:6578/v1 \
    --llm_name meta-llama/Llama-3.3-70B-Instruct \
    --embedding_name nvidia/NV-Embed-v2

Sources: README.md:1

Azure OpenAI Deployment

Enterprise deployments requiring Azure infrastructure can use Azure OpenAI endpoints.

Configuration Parameters

| Parameter | CLI Argument | Description |
|-----------|--------------|-------------|
| azure_endpoint | --azure_endpoint | Azure OpenAI chat completions endpoint |
| azure_embedding_endpoint | --azure_embedding_endpoint | Azure OpenAI embeddings endpoint |

Endpoint Format

azure_endpoint = (
    "https://[ENDPOINT_NAME].openai.azure.com/"
    "openai/deployments/gpt-4o-mini/chat/completions"
    "?api-version=2025-01-01-preview"
)

azure_embedding_endpoint = (
    "https://[ENDPOINT_NAME].openai.azure.com/"
    "openai/deployments/text-embedding-3-small/embeddings"
    "?api-version=2023-05-15"
)

Sources: demo_azure.py

Programmatic Usage

from hipporag import HippoRAG

hipporag = HippoRAG(
    save_dir='outputs',
    llm_model_name='gpt-4o-mini',
    embedding_model_name='nvidia/NV-Embed-v2',
    azure_endpoint="https://[ENDPOINT_NAME].openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview",
    azure_embedding_endpoint="https://[ENDPOINT_NAME].openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
)

hipporag.index(docs=docs)

Sources: demo_azure.py

CLI Usage

python main_azure.py \
    --dataset sample \
    --azure_endpoint "https://[ENDPOINT].openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2025-01-01-preview" \
    --azure_embedding_endpoint "https://[ENDPOINT].openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15" \
    --save_dir outputs

Sources: main_azure.py

Indexing Options

OpenIE Modes

HippoRAG supports two Open Information Extraction (OpenIE) modes:

| Mode | Description | Resource Usage |
|------|-------------|----------------|
| online | Uses OpenAI GPT for real-time extraction | API costs |
| offline | Uses local vLLM batch processing | GPU compute |

python main.py --dataset $dataset --openie_mode offline

Sources: main.py:1

Force Rebuild Options

| Parameter | Description |
|-----------|-------------|
| --force_index_from_scratch | Ignores existing storage and rebuilds from scratch |
| --force_openie_from_scratch | Ignores cached OpenIE results and recomputes |

python main_azure.py \
    --force_index_from_scratch true \
    --force_openie_from_scratch true

Sources: main_azure.py

StandardRAG vs HippoRAG

The codebase provides two RAG implementations selectable via configuration:

# Standard HippoRAG (default)
hipporag = HippoRAG(global_config=config)

# Alternative DPR-style implementation
standard_rag = StandardRAG(global_config=config)

Sources: main.py and main_dpr.py

Installation Requirements

All deployment options require the HippoRAG package and its dependencies:

conda create -n hipporag python=3.10
conda activate hipporag
pip install hipporag

Or install from source:

pip install -e .

Core dependencies include:

| Package | Version | Purpose |
|---------|---------|---------|
| torch | 2.5.1 | Deep learning framework |
| transformers | 4.45.2 | Model loading |
| vllm | 0.6.6.post1 | Local inference |
| openai | 1.91.1 | API client |
| litellm | 1.73.1 | Unified LLM interface |
| gritlm | 1.0.2 | Embedding models |
| networkx | 3.4.2 | Graph operations |
| pydantic | 2.10.4 | Configuration validation |

Sources: setup.py

Testing Deployments

OpenAI Test

export OPENAI_API_KEY=<your openai api key>
conda activate hipporag
python tests_openai.py

Sources: README.md:1

Local vLLM Test

export CUDA_VISIBLE_DEVICES=0
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export HF_HOME=<path to Huggingface home directory>

# Start vLLM server
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --max_model_len 4096 \
    --gpu-memory-utilization 0.95 \
    --port 6578

# Run tests
CUDA_VISIBLE_DEVICES=1 python tests_local.py

Sources: README.md:1

Azure Test

python tests_azure.py

Sources: tests_azure.py

Deployment Decision Matrix

| Criteria | OpenAI API | vLLM Local | Azure |
|----------|------------|------------|-------|
| Setup complexity | Low | High | Medium |
| Cost | Pay-per-use | GPU infrastructure | Azure subscription |
| Data privacy | Data leaves your environment | All data stays local | Configurable |
| Latency | Network dependent | Local, optimized | Network dependent |
| Model flexibility | Limited to API models | Any HuggingFace model | Limited to deployed models |
| Recommended for | Development, prototyping | Production, research | Enterprise compliance |

Sources: README.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.

1. Installation risk: add_fact_edges function adds the same edge twice?

  • Severity: high
  • Finding: Installation risk is backed by a source signal: add_fact_edges function adds the same edge twice?. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/174

2. Installation risk: pypi hipporag libraries

  • Severity: high
  • Finding: Installation risk is backed by a source signal: pypi hipporag libraries. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/168

3. Security or permission risk: Take the "musique" dataset as an example. The process of constructing an index based on individual paragraphs takes an…

  • Severity: high
  • Finding: Security or permission risk is backed by a source signal: Take the "musique" dataset as an example. The process of constructing an index based on individual paragraphs takes an…. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/173

4. Installation risk: OpenAI version incompatibility in latest 2.0.0a4 version

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: OpenAI version incompatibility in latest 2.0.0a4 version. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/140

5. Installation risk: Windows Compatibility Issues with vLLM dependency

  • Severity: medium
  • Finding: Installation risk is backed by a source signal: Windows Compatibility Issues with vLLM dependency. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/117

6. Configuration risk: How to use local embedding_model_

  • Severity: medium
  • Finding: Configuration risk is backed by a source signal: How to use local embedding_model_. Treat it as a review item until the current version is checked.
  • User impact: Users may get misleading failures or incomplete behavior unless configuration is checked carefully.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/127

7. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:805115184 | https://github.com/OSU-NLP-Group/HippoRAG | README/documentation is current enough for a first validation pass.

8. Project risk: Inquiry Regarding OpenIE Extraction Results for HippoRAG 2

  • Severity: medium
  • Finding: Project risk is backed by a source signal: Inquiry Regarding OpenIE Extraction Results for HippoRAG 2. Treat it as a review item until the current version is checked.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/177

9. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:805115184 | https://github.com/OSU-NLP-Group/HippoRAG | last_activity_observed missing

10. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:805115184 | https://github.com/OSU-NLP-Group/HippoRAG | no_demo; severity=medium

11. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:805115184 | https://github.com/OSU-NLP-Group/HippoRAG | no_demo; severity=medium

12. Security or permission risk: How to distinguish Hipporag1 from Hipporag2

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: How to distinguish Hipporag1 from Hipporag2. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/OSU-NLP-Group/HippoRAG/issues/167

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

This manual page exposes 12 project-level external discussion links. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using HippoRAG with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence