Exploring HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

SURUTHI S

HippoRAG is a novel retrieval framework inspired by the hippocampal indexing theory of the human brain. It is designed to retain long-term memory and to integrate information from multiple parts of a document corpus. It uses a knowledge graph together with the Personalized PageRank algorithm, allowing for more accurate retrieval results and ultimately mimicking the way the human brain indexes and retrieves memories stored in the neocortex.

Overview of RAG

Retrieval-augmented generation (RAG) is a widely used framework for integrating external data, giving the LLM additional context to improve the quality of information retrieval in domain-specific areas. The documents are split into chunks, and vector representations are created using embedding models. These vector representations are then stored in Vector Databases (Vector DBs).

When a query is made, the relevant vectors are retrieved from the Vector DB based on their similarity to the query. These retrieved chunks of information are then passed as context to the LLM. This allows the LLM to generate more accurate and contextually relevant responses by leveraging the specific information contained within the external documents.
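As a minimal illustration of this retrieve-then-read flow, the sketch below embeds a few toy chunks and returns the ones most similar to a query. The choice of the sentence-transformers library, the example chunks, and the retrieve helper are illustrative assumptions, not something prescribed by the HippoRAG paper.

```python
# Minimal sketch of the RAG retrieval step: embed chunks, embed the query,
# and return the most similar chunks as context for the LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

chunks = [
    "Thomas researches Alzheimer's disease at Stanford.",
    "Sarah studies protein folding.",
    "Stanford is located in California.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)  # one vector per chunk

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec          # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

print(retrieve("Who works on Alzheimer's at Stanford?"))
```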

Challenges in Current RAG Systems:

In multi-hop question answering (where the answer to the user's question consists of different fragments located across different documents), present RAG systems lack the ability to generate adequate responses. They use multiple retrieval and LLM generation steps iteratively to concatenate different pieces of information. However, even a perfectly implemented multi-step RAG pipeline can be insufficient for this kind of knowledge integration, because the passages are encoded in isolation.

Many challenging domains, like scientific literature review, legal case briefing, and medical diagnosis, require integrating content across passages or documents. Current RAG systems are not a one-size-fits-all solution for these integration tasks; this is where HippoRAG comes into action.

HippoRAG is inspired by the Hippocampal Indexing Theory, which states that the hippocampus (a C-shaped structure located in the medial temporal lobe and part of the limbic system) acts as an index or pointer to the locations of memory traces stored in the neocortex.

Mechanism:

  • Encoding: During the encoding of new information, the hippocampus forms a unique index that represents the new memory.
  • Storage: The actual memory traces (sensory, contextual, and conceptual information) are distributed across different regions of the neocortex.
  • Retrieval: When a memory needs to be retrieved, the hippocampus uses the index to reactivate the distributed neocortical patterns, bringing the memory back to consciousness.
The hippocampus creates an index for the memories stored in different parts of the neocortex.

The functions of the following parts are taken as inspiration for developing HippoRAG:

  1. Neocortex — processing and storing memory
  2. Hippocampus — indexing the memory stored in the neocortex
  3. Para-Hippocampal Region (PHR) — forms a pipeline between neocortex and hippocampus

Applying the same methodology to the RAG part:

  1. The LLM acts as an artificial neocortex, processing and extracting high-level information.
  2. A knowledge graph with Personalized PageRank (PPR) acts as the hippocampus.
  3. Retrieval encoders act as the PHR, sitting in the middle of the pipeline as dense encoders fine-tuned for retrieval; they determine similarity and synonymy among nodes.

This paper is centered around the Hippocampal Memory Indexing Theory, in which Teyler and DiScenna propose that human long-term memory is composed of three components (the neocortex, the hippocampus, and the para-hippocampal regions) that work together to accomplish two main objectives: pattern separation, which ensures that the representations of distinct perceptual experiences remain distinctive, and pattern completion, which allows full memories to be recalled from partial stimuli.

This happens in two steps:

  1. Memory Encoding: Memory encoding enables pattern separation. Perceptual stimuli are received and processed into high-level, manipulatable features in the neocortex, which then passes them through the para-hippocampal regions to be indexed by the hippocampus.
  2. Memory Retrieval: After memory encoding, whenever partial perceptual stimuli related to previously recorded memory traces arrive through the PHR pipeline, pattern completion drives memory retrieval.

The same two steps can be formulated as follows:

The corpora are stored as entities in a knowledge graph (offline indexing) and retrieved whenever a user query is passed (online retrieval).

The indexes created by the hippocampus are salient and high-level, with mutual connections to one another.

Let us analyze it in a detailed fashion:

Offline Indexing:

  1. In the first step of the pipeline, an instruction-tuned LLM processes the input documents, running named entity recognition and extracting the entities, which are then organized into subject-predicate-object triples. This process is known as Open Information Extraction (Open IE).
  2. The extracted triples are stored in a schemaless knowledge graph, which arranges the entities based on their connections. The triples are discrete noun phrases rather than dense vector representations; this is the key factor in this approach, as it allows more fine-grained pattern separation (a short code sketch of this step follows below).
Extracting triples from a passage using Open IE.
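As a rough sketch of this offline indexing step, the snippet below stores a handful of hand-written triples in a schemaless graph using networkx and records which passage each node came from. The passage contents, the variable names (kg, node_to_passages), and the choice of networkx are illustrative assumptions; in HippoRAG the triples would be produced by the instruction-tuned LLM.

```python
# Sketch of offline indexing: store Open IE triples in a schemaless knowledge graph.
import networkx as nx

# (passage_id -> triples extracted from that passage), normally produced by an LLM.
passages = {
    0: [("Thomas", "researches", "Alzheimer's"), ("Stanford", "employs", "Thomas")],
    1: [("Sarah", "researches", "Alzheimer's")],
    2: [("Stanford", "employs", "Mike")],
}

kg = nx.Graph()
node_to_passages = {}  # which passages mention each node (used later for ranking)

for pid, triples in passages.items():
    for subj, rel, obj in triples:
        kg.add_edge(subj, obj, relation=rel)      # nodes are plain noun phrases
        for node in (subj, obj):
            node_to_passages.setdefault(node, set()).add(pid)
```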

Online Retrieval:

So, whenever a user queries the system,

  1. We prompt the LLM with a 1-shot prompt to extract a set of named entities from the query q; these form the query named entities (Cq).
  2. The query nodes are chosen as the set of knowledge-graph nodes with the highest cosine similarity to the query named entities Cq.
  3. Once the query nodes are found, we run Personalized PageRank (PPR) over the knowledge graph. The PPR personalization is initialized so that each query node has an equal probability and all other nodes have a probability of zero.

The Personalized PageRank (PPR) algorithm is a version of PageRank that distributes probability across a graph only through a set of user-defined source nodes. This constraint allows us to bias the PPR output towards the set of query nodes.
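Continuing the toy graph from the indexing sketch above, the snippet below runs a personalized PageRank using networkx's built-in pagerank as a stand-in for HippoRAG's own PPR implementation; the query nodes are the ones used in the worked example that follows.

```python
# Sketch of the PPR step: equal probability on the query nodes, zero elsewhere.
# `kg` is the knowledge graph built in the offline indexing sketch above.
import networkx as nx

query_nodes = ["Stanford", "Alzheimer's"]

personalization = {n: 0.0 for n in kg.nodes}
for q in query_nodes:
    personalization[q] = 1.0 / len(query_nodes)

ppr_scores = nx.pagerank(kg, alpha=0.85, personalization=personalization)
print(sorted(ppr_scores.items(), key=lambda kv: -kv[1]))  # nodes ranked by PPR score
```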

Consider an example.

Offline indexing:

  1. Entities are segregated and triples are formed, e.g., (Sarah, researches, Alzheimer's), (Thomas, researches, Alzheimer's), (Stanford, employs, Mike), (Stanford, employs, Thomas), etc.
  2. They are processed through retrieval encoders, and their representations are stored in the knowledge graph.

Online Retrieval:

  1. The user queries, “Which Stanford professor works on the neuroscience of Alzheimer’s?”. The named entities (Stanford, Alzheimer’s) are extracted using 1-shot prompting.
  2. These named entities (Stanford, Alzheimer’s) are then linked to nodes in our KG based on the similarity determined by the retrieval encoders. Once the query nodes are chosen, they become the partial cues from which our synthetic hippocampus performs pattern completion.
  3. PPR is performed from the query nodes; the output PPR node probabilities are aggregated over the previously indexed passages and used to rank them for retrieval, returning Thomas (a sketch of this aggregation follows the list).
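A rough sketch of this last step, reusing ppr_scores, passages, and node_to_passages from the earlier snippets: node-level PPR probabilities are summed over the passages each node was extracted from, and passages are ranked by that total. Summation is used here as a simple aggregation consistent with the description above; the variable names are assumptions carried over from the sketches.

```python
# Turn node-level PPR scores into a passage ranking: sum the PPR probability of
# every node over the passages it was extracted from, then rank the passages.
passage_scores = {pid: 0.0 for pid in passages}
for node, score in ppr_scores.items():
    for pid in node_to_passages.get(node, ()):
        passage_scores[pid] += score

ranked = sorted(passage_scores, key=passage_scores.get, reverse=True)
print(ranked[0])   # passage 0, the one mentioning Thomas, Stanford, and Alzheimer's
```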

Additionally, to increase the retrieval relevancy, this paper makes use of a mechanism known as node specificity.

Global signals for word importance, such as inverse document frequency (IDF), are known to enhance information retrieval. However, IDF is a global property, and using it directly would be complicated: it would involve all nodes in the hippocampal index every time retrieval occurs.

To overcome this issue, HippoRAG uses node specificity as an alternative to IDF that relies only on local signals. The node specificity of node i is defined as si = |Pi|^-1, where Pi is the set of passages from which node i was extracted.

Node specificity is used in retrieval by multiplying each query node's probability by si before running PPR; this modulates the probabilities of each query node's neighborhood as well as its own.
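A small sketch of how node specificity could be folded into the personalization vector before PPR, reusing kg, query_nodes, and node_to_passages from the earlier snippets; the explicit renormalization is an implementation detail assumed here, not something specified by the paper.

```python
# Sketch of node specificity: si = 1 / |Pi|, where Pi is the set of passages
# node i was extracted from. Query-node probabilities are scaled by si before PPR.
import networkx as nx

specificity = {node: 1.0 / len(pids) for node, pids in node_to_passages.items()}

weights = {n: 0.0 for n in kg.nodes}
for q in query_nodes:
    weights[q] = specificity.get(q, 1.0)   # scale each query node by its specificity

total = sum(weights.values())
personalization = {n: w / total for n, w in weights.items()}  # normalize explicitly

ppr_scores = nx.pagerank(kg, personalization=personalization)
```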

Some of the prominent features of this approach are:

  1. Using knowledge graphs and retrieval encoders to link named entities
  2. Leveraging PPR for ranking passages based on query nodes
  3. Using salient noun phrases for the encoding
  4. Node specificity

These intriguing features streamline long-term memory handling and multi-hop question answering, making the retrieval process more efficient and accurate. Together they enhance the overall performance of the synthetic hippocampus in pattern completion and information retrieval tasks.

A major advantage of HippoRAG over conventional RAG methods in multi-hop QA is its ability to perform multi-hop retrieval in a single step.
