GraphRAG in Practice: How to Build Cost-Efficient, High-Recall Retrieval Systems

In my previous article, Do You Really Need GraphRAG? A Practitioner's Guide Beyond the Hype, I outlined the core principles of GraphRAG design and introduced an augmented retrieval-and-generation pipeline that combines graph search with vector search. I also discussed why building a perfectly complete graph—one that captures every entity and relation in the corpus—can be prohibitively complex, especially at scale.
In this article, I expand on those ideas with concrete examples and code, demonstrating the practical constraints encountered when building and querying real GraphRAG systems. I also illustrate that the retrieval pipeline helps balance cost and implementation complexity without sacrificing accuracy. Specifically, we will cover:
- Building the graph: Should entity extraction happen on chunks or full documents—and how much does this choice actually matter?
- Querying relations without a dense graph: Can we infer meaningful relations using iterative search-space optimisation instead of encoding every relationship in the graph explicitly?
- Handling weak embeddings: Why alphanumeric entities break vector search and how graph context fixes it.
GraphRAG pipeline
To recap from the previous article, the GraphRAG embedding pipeline used here is as follows. The graph nodes, relations, and their embeddings are stored in a graph database, and the document chunks and their embeddings are stored in the same database.
The proposed retrieval and response generation pipeline is as follows:

As can be seen, the graph result is not directly used to respond to the user query. Instead, it is used in the following ways:
- Node metadata (particularly doc_id) acts as a strong classifier, helping identify the relevant documents before vector search. This is crucial for large corpora where naive vector similarity would be noisy.
- Context enrichment of the user query to retrieve the most relevant chunks. This is crucial for certain types of query with weak vector semantics such as IDs, vehicle numbers, dates, and numeric strings.
- Iterative search space optimisation, first by selecting the most relevant documents, and within those, the most relevant chunks (using context enrichment). This enables us to keep the graph simple, whereby all relations between the entities need not be necessarily extracted into the graph for queries about them to be answered accurately.
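The three mechanisms above can be sketched end-to-end as a minimal, self-contained simulation. Everything below is illustrative: the toy chunks, the token-overlap scorer standing in for vector similarity, and all function names are assumptions for the sketch, not the actual pipeline code.

```python
# Sketch of iterative search-space optimisation:
# 1) graph search narrows the corpus to candidate doc_ids,
# 2) graph context enriches the user query,
# 3) vector-style scoring runs only within the candidate docs.
# Token overlap is a stand-in for real vector similarity.

def graph_candidate_docs(graph_hits):
    """Step 1: collect doc_ids attached to matched graph nodes."""
    return {hit["doc_id"] for hit in graph_hits}

def enrich_query(query, graph_entities):
    """Step 2: append graph-derived entity context to the user query."""
    return f"{query}\nKnown related entities: {', '.join(graph_entities)}"

def retrieve_chunks(chunks, candidate_docs, enriched_query, top_k=2):
    """Step 3: score only chunks from candidate docs."""
    q_tokens = set(enriched_query.lower().split())
    scored = [
        (len(q_tokens & set(c["text"].lower().split())), c)
        for c in chunks if c["doc_id"] in candidate_docs
    ]
    return [c for s, c in sorted(scored, key=lambda x: -x[0])[:top_k] if s > 0]

# Toy data
chunks = [
    {"doc_id": "SYN-REPORT-0008", "text": "Mr. Person_12 reported fraud in Mumbai"},
    {"doc_id": "SYN-REPORT-0010", "text": "vehicle theft near Mumbai station"},
    {"doc_id": "SYN-REPORT-0005", "text": "unrelated Kolkata case"},
]
graph_hits = [{"doc_id": "SYN-REPORT-0008"}, {"doc_id": "SYN-REPORT-0010"}]

docs = graph_candidate_docs(graph_hits)
query = enrich_query("incident in SYN-REPORT-0008", ["Mr. Person_12", "Mumbai"])
results = retrieve_chunks(chunks, docs, query)
```

Note how the third document never enters the scoring loop at all: the graph has already pruned it from the search space before any similarity computation happens.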
To demonstrate these ideas, we will use a dataset of 10 synthetically generated police reports, GPT-4o as the LLM, and Neo4j as the graph database.
Building the Graph
We will be building a simple star graph with the Report Id as the central node and entities connected to the central node. The prompt to build that would be as follows:
custom_prompt = ChatPromptTemplate.from_template("""
You are an information extraction assistant.
Read the text below and identify important entities.
**Extraction rules:**
- Always extract the **Report Id** (this is the central node).
- Extract **people**, **institutions**, **places**, **dates**, **monetary amounts**, and **vehicle registration numbers** (e.g., MH12AB1234, PK-02-4567, KA05MG2020).
- Do not ignore any people names; extract all people mentioned in the document, even if they seem minor or their role is not clear.
- Treat all types of vehicles (e.g., cars, bikes) as the same kind of entity called "Vehicle".
**Output format:**
1. List all nodes (unique entities).
2. Identify the central node (Report Id).
3. Create relationships of the form:
(Report Id)-[HAS_ENTITY]->(Entity),
4. Do not create any other types of relationships.
Text:
{input}
Return only structured data like:
Nodes:
- Report SYN-REP-2024
- Honda bike ABCD1234
- XYZ College, Chennai
- NNN College, Mumbai
- 1434800
- Mr. John
Relationships:
- (Report SYN-REP-2024)-[HAS_ENTITY]->(Honda bike ABCD1234)
- (Report SYN-REP-2024)-[HAS_ENTITY]->(XYZ college, Chennai)
- ...
""")
Note that in this prompt, we are not extracting any relations such as accused, witness, etc. into the graph. All nodes have a uniform "HAS_ENTITY" relation with the central node, which is the Report Id. I have designed this as an extreme case to illustrate that we can answer queries about relations between entities even with this minimal graph, using the retrieval pipeline depicted in the previous section. If you wish to include a few important relations, the prompt can be modified with clauses such as the following:
3. For person entities, the relation should be based on their role in the Report (e.g., complainant, accused, witness, investigator etc).
eg: (Report Id) -[Accused]-> (Person Name)
4. For all others, create relationships of the form:
(Report Id)-[HAS_ENTITY]->(Entity),
llm_transformer = LLMGraphTransformer(
llm=llm,
# allowed_relationships=["HAS_ENTITY"],
prompt= custom_prompt,
)
Next, we will create the graph for each document by creating a LangChain Document from the full text and then passing it to the transformer, storing the result in Neo4j.
# Read entire file (no chunking)
with open(file_path, "r", encoding="utf-8") as f:
    text_content = f.read()

# Create LangChain Document
document = Document(
    page_content=text_content,
    metadata={
        "doc_id": doc_id,
        "source": filename,
        "file_path": file_path,
    },
)

try:
    # Convert to graph (entire document)
    graph_docs = llm_transformer.convert_to_graph_documents([document])
    print(f"✅ Extracted {len(graph_docs[0].nodes)} nodes and {len(graph_docs[0].relationships)} relationships.")
    for gdoc in graph_docs:
        for node in gdoc.nodes:
            # Tag every node with its source document for later filtering
            node.properties["doc_id"] = doc_id
            original_id = node.properties.get("id") or getattr(node, "id", None)
            if original_id:
                node.properties["entity_id"] = original_id
    # Add to Neo4j
    graph.add_graph_documents(
        graph_docs,
        baseEntityLabel=True,
        include_source=False,
    )
except Exception as e:
    print(f"❌ Graph extraction failed for {doc_id}: {e}")
This creates a graph comprising 10 clusters as follows:

Key Observations
- The number of nodes extracted varies with the LLM used, and even across runs of the same LLM. With gpt-4o, each execution extracts between 15 and 30 nodes per document (depending on its size), for a total of 200 to 250 nodes. Since each cluster is a star graph, the number of relations per document is one less than its number of nodes.
- Lengthy documents cause attention dilution in LLMs, whereby they fail to recall and extract all the specified entities (people, places, etc.) present in the document.
To see how severe this effect is, let's look at the graph of one of the documents (SYN-REPORT-0008). The document has about 4,000 words, and the resulting graph has 22 nodes and looks like the following:

Now, let's try generating the graph for this document by chunking it, extracting entities from each chunk, and merging the results using the following steps (the entity-extraction prompt remains the same as before, except that it no longer extracts the Report Id, which is extracted separately):
a. First, extract the Report Id from the document using this prompt.
report_id_prompt = ChatPromptTemplate.from_template("""
Extract ONLY the Report Id from the text.
Report Ids typically look like:
- SYN-REP-2024
Return strictly one line:
Report:
Text:
{input}
""")
b. Then, extract entities from each chunk using the entities prompt.
def extract_entities_by_chunk(llm, text, chunk_size=2000, overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap
    )
    chunks = splitter.split_text(text)
    all_entities = []
    for i, chunk in enumerate(chunks):
        print(f"🔍 Processing chunk {i+1}/{len(chunks)}")
        raw = run_prompt(llm, entities_prompt, chunk)
        # Parse lines of the form "- Entity Name | Type"
        pairs = re.findall(r"- (.*?)\s*\|\s*(\w+)", raw)
        all_entities.extend([(e.strip(), t.strip()) for e, t in pairs])
    return all_entities
c. De-duplicate the entities.
d. Build the graph by connecting all the entities to the central Report Id node.
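Steps (c) and (d) can be sketched as follows. This is a simplified in-memory version with hypothetical helper names; the actual implementation writes the result to Neo4j via add_graph_documents as shown earlier.

```python
def dedupe_entities(entities):
    """Case-insensitive de-duplication of (name, type) pairs,
    keeping the first-seen surface form. Overlapping chunks often
    yield the same entity with differing capitalisation."""
    seen, unique = set(), []
    for name, etype in entities:
        key = (name.strip().lower(), etype.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append((name.strip(), etype.strip()))
    return unique

def build_star_relations(report_id, entities):
    """Connect every entity to the central Report Id node."""
    return [(report_id, "HAS_ENTITY", name) for name, _ in entities]

entities = [
    ("Mr. Person_12", "Person"),
    ("mr. person_12", "Person"),   # duplicate from an overlapping chunk
    ("Kottayam", "Place"),
]
unique = dedupe_entities(entities)
rels = build_star_relations("SYN-REPORT-0008", unique)
```

Exact-match de-duplication suffices here; fuzzier merging (e.g., "Ravi Verma" vs "Mr. Ravi Prasad Verma") would need string similarity or an LLM pass, at extra cost.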
The effect is quite remarkable. The graph of SYN-REPORT-0008 now has 78 nodes, roughly 3X the earlier count, and looks like the following. The trade-off of building this denser graph is the additional time and token usage incurred by the per-chunk extraction iterations.

What are the implications?
The variation in graph density affects the ability to answer entity-related questions directly and accurately: if an entity or relation is not present in the graph, a query about it cannot be answered from the graph alone.
An approach to minimise this effect with our sparse star graph is to phrase the query so that it references a prominent related entity that is likely to be present in the graph.
For instance, the investigating officer is mentioned relatively fewer times than the city in a police report, so the city is more likely than the officer to be present in the graph. Therefore, to find the investigating officer, instead of asking "Which reports have investigating officer as Ravi Sharma?", one can ask "Among the Mumbai reports, which ones have investigating officer as Ravi Sharma?", if it is known that this officer is from the Mumbai office. Our retrieval pipeline will then extract the reports related to Mumbai from the graph and, within those documents, accurately locate the chunks containing the officer's name. This is demonstrated in the following sections.
Handling weak embeddings
Consider the following pair of similar queries that are likely to be asked frequently of this data.
“Tell me about the incident involving Person_3”
“Tell me about the incident in report SYN-REPORT-0008”
The details about the incident cannot be found in the graph, which holds only entities and relations; the response therefore needs to be derived from vector similarity search.
So, can the graph be ignored in this case?
If you run these, the first query is likely to return a correct reply for a relatively small corpus like our test dataset, whereas the second one will not. The reason is that LLMs have an inherent understanding of person names and common words from their training, but find it hard to attach any semantic meaning to alphanumeric strings such as report IDs, vehicle numbers, amounts, and dates. The embedding of a person's name is therefore much stronger than that of an alphanumeric string, so the chunks retrieved via vector similarity for alphanumeric strings correlate weakly with the user query, resulting in an incorrect reply.
This is where context enrichment using the graph helps. For a query like "Tell me about the incident in SYN-REPORT-0008", we fetch all the details from the star graph of the central node SYN-REPORT-0008 using a generated Cypher query, then have the LLM turn this into a context (interpreting the JSON response in natural language). The context also contains the sources for the nodes, which in this case returns 2 documents, one of which is the correct document SYN-REPORT-0008. The other, SYN-REPORT-00010, appears because one of the attached nodes, the city (Mumbai), is common to both reports.
Now that the search space is refined to only 2 documents, chunks are extracted from both using this context along with the user query. Because the graph context mentions persons, places, amounts, and other details present in the first report but not the second, the LLM can easily determine during response synthesis that the correct chunks are those extracted from SYN-REPORT-0008 and not from 0010, and the reply is formed accurately. Here is the log of the graph query, the JSON response, and the natural language context depicting this.
Processing log
Generated Cypher:
MATCH (r:`__Entity__`:Report)
WHERE toLower(r.id) CONTAINS toLower("SYN-REPORT-0008")
OPTIONAL MATCH (r)-[]-(e)
RETURN DISTINCT
r.id AS report_id,
r.doc_id AS report_doc_id,
labels(e) AS entity_labels,
e.id AS entity_id,
e.doc_id AS entity_doc_id
JSON Response:
[{'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Person'], 'entity_id': 'Mr. Person_12', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Place'], 'entity_id': 'New Delhi', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Place'], 'entity_id': 'Kottayam', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Person'], 'entity_id': 'Person_4', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels':… truncated
Natural language context:
The context describes an incident involving multiple entities, including individuals, places, monetary amounts, and dates. The following details are extracted:
1. **Persons Involved**: Several individuals are mentioned, including "Mr. Person_12," "Person_4," "Person_11," "Person_8," "Person_5," "Person_6," "Person_3," "Person_7," "Person_10," and "Person_9."
2. **Places Referenced**: The places mentioned include "New Delhi," "Kottayam," "Delhi," and "Mumbai."
3. **Monetary Amounts**: Two monetary amounts are noted: "0.5 Million" and "43 Thousands."
4. **Dates**: Two specific dates are mentioned: "07/11/2024" and "04/02/2025."
Sources: [SYN-REPORT-0008, SYN-REPORT-00010]
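The step that turns graph query rows into a textual context with sources can be sketched as follows. In the real pipeline an LLM produces the natural-language summary; here plain string formatting stands in for it, and the row shape simply mirrors the Cypher output fields shown above.

```python
from collections import defaultdict

def rows_to_context(rows):
    """Summarise graph rows (shaped like the Cypher output above) into
    a textual context plus source doc_ids. An LLM would normally write
    this summary; simple formatting is used here for illustration."""
    by_label = defaultdict(list)
    sources = set()
    for row in rows:
        label = row["entity_labels"][-1]       # e.g. 'Person', 'Place'
        by_label[label].append(row["entity_id"])
        sources.add(row["entity_doc_id"])
    lines = [f"{label}s: {', '.join(ids)}" for label, ids in sorted(by_label.items())]
    lines.append(f"Sources: {sorted(sources)}")
    return "\n".join(lines)

rows = [
    {"entity_labels": ["__Entity__", "Person"], "entity_id": "Mr. Person_12",
     "entity_doc_id": "SYN-REPORT-0008"},
    {"entity_labels": ["__Entity__", "Place"], "entity_id": "Mumbai",
     "entity_doc_id": "SYN-REPORT-0010"},
]
context = rows_to_context(rows)
```

The collected source doc_ids are what shrink the vector search space in the next step; the entity listing is what disambiguates the correct document during response synthesis.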
Can relations be successfully found?
What about finding relations between entities? We have ignored all specific relations in our graph, simplifying it so that there is only one relation, "HAS_ENTITY", between the central report_id node and the rest of the entities. This would imply that querying for entities not present in the graph, or for relations between entities, should not be possible. Let's test our iterative search optimisation pipeline against a variety of such queries. We will consider two reports from Kolkata and the following queries for this test.

- Queries where the referred relation is not present in the graph. E.g., "Who is the investigating officer in SYN-REPORT-0006?" or "Who are the accused in SYN-REPORT-0006?"
- Relation between two entities present in the graph. E.g., "Is there a relation between Ravi Verma and Rakesh Prasad Verma?"
- Relation between any entities related to a third entity. E.g., "Are there brothers in reports from Kolkata?"
- Multi-hop relations: “Who is the investigating officer in the reports where brothers from Kolkata are accused?”
Using our pipeline, all the above queries yield accurate results. Let's look at the process for the last, multi-hop query, which is the most complex. Here the Cypher query yields no result, so the flow falls back to semantic matching of nodes. The entities are extracted (Place: Kolkata) from the user query, then matched to get references to all the reports connected to Kolkata, which are SYN-REPORT-0005 and SYN-REPORT-0006 in this case. Based on the context that the user query asks about brothers and investigating officers, the most relevant chunks are extracted from both documents. The resulting reply successfully retrieves the investigating officers for both reports.
Here is the response:
“The investigating officer in the reports where the brothers from Kolkata (Mr. Rakesh Prasad Verma, Mr. Ravi Prasad Verma, and Mr. Vijoy Kumar Varma) are accused is Ajay Kumar Tripathi, Inspector of Police, CBI, ACB, Kolkata, as mentioned in SYN-REPORT-0006. Additionally, Praveen Kumar, Deputy Superintendent of Police, EOB Kolkata, is noted as the investigating officer in SYN-REPORT-0005.
Sources: [SYN-REPORT-0005, SYN-REPORT-0006]”
You can view the processing log here
> Entering new GraphCypherQAChain chain...
2025-12-05 17:08:27 - HTTP Request: ... LLM called
Generated Cypher:
MATCH (p:`__Entity__`:Person)-[:HAS_ENTITY]-(r:`__Entity__`:Report)-[:HAS_ENTITY]-(pl:`__Entity__`:Place)
WHERE toLower(pl.id) CONTAINS toLower("kolkata") AND toLower(p.id) CONTAINS toLower("brother")
OPTIONAL MATCH (r)-[:HAS_ENTITY]-(officer:`__Entity__`:Person)
WHERE toLower(officer.id) CONTAINS toLower("investigating officer")
RETURN DISTINCT
r.id AS report_id,
r.doc_id AS report_doc_id,
officer.id AS officer_id,
officer.doc_id AS officer_doc_id
Cypher Response:
[]
2025-12-05 17:08:27 - HTTP Request: ...LLM called
> Finished chain.
is_empty: True
❌ Cypher did not produce a confident result.
🔎 Running semantic node search...
📋 Detected labels: ['Place', 'Person', 'Institution', 'Date', 'Vehicle', 'Monetary amount', 'Chunk', 'GraphNode', 'Report']
User query for node search: investigating officer in the reports where brothers from Kolkata are accused
2025-12-05 17:08:29 - HTTP Request: ...LLM called
🔍 Extracted entities: ['Kolkata']
2025-12-05 17:08:30 - HTTP Request: ...LLM called
📌 Hits for entity 'Kolkata': [Document(metadata={'labels': ['Place'], 'node_id': '4:5b11b2a8-045c-4499-9df0-7834359d3713:41'}, page_content='TYPE: Place\nCONTENT: Kolkata\nDOC: SYN-REPORT-0006')]
📚 Retrieved node hits: [Document(metadata={'labels': ['Place'], 'node_id': '4:5b11b2a8-045c-4499-9df0-7834359d3713:41'}, page_content='TYPE: Place\nCONTENT: Kolkata\nDOC: SYN-REPORT-0006')]
Expanded node context:
[Node] This is a __Place__ node. It represents 'TYPE: Place
CONTENT: Kolkata
DOC: SYN-REPORT-0006' (doc_id=N/A).
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Institution: Mrs.Sri Balaji Forest Product Private Limited (doc_id=SYN-REPORT-0005)
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Date: 2014 (doc_id=SYN-REPORT-0005)
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Person: Mr. Pallab Biswas (doc_id=SYN-REPORT-0005)
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Date: 2005 (doc_id=SYN-REPORT-0005).. truncated
[Report Syn-Report-0006 (doc_id=SYN-REPORT-0006)] --(HAS_ENTITY)--> __Entity__, Institution: M/S Jkjs & Co. (doc_id=SYN-REPORT-0006)
[Report Syn-Report-0006 (doc_id=SYN-REPORT-0006)] --(HAS_ENTITY)--> __Entity__, Person: B Mishra (doc_id=SYN-REPORT-0006)
[Report Syn-Report-0006 (doc_id=SYN-REPORT-0006)] --(HAS_ENTITY)--> __Entity__, Institution: Vishal Engineering Pvt. Ltd. (doc_id=SYN-REPORT-0006).. truncated
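The Cypher-then-fallback control flow visible in the log above can be sketched as follows. Both backends are injected as callables so the routing logic is testable without a live graph; the function names and result shapes are assumptions for this sketch, not the actual pipeline code.

```python
def answer_with_fallback(run_cypher, semantic_node_search, query):
    """Try the generated Cypher first; if it returns nothing,
    fall back to semantic node matching to recover candidate docs."""
    rows = run_cypher(query)
    if rows:
        return {"route": "cypher", "doc_ids": sorted({r["doc_id"] for r in rows})}
    # Cypher produced no confident result: semantic node search
    hits = semantic_node_search(query)
    return {"route": "semantic", "doc_ids": sorted({h["doc_id"] for h in hits})}

# Simulated backends: Cypher finds nothing (no 'brother' node exists
# in the graph), while semantic search matches the Kolkata place node.
empty_cypher = lambda q: []
kolkata_nodes = lambda q: [
    {"node": "Kolkata", "doc_id": "SYN-REPORT-0005"},
    {"node": "Kolkata", "doc_id": "SYN-REPORT-0006"},
]

result = answer_with_fallback(
    empty_cypher, kolkata_nodes,
    "investigating officer in the reports where brothers from Kolkata are accused",
)
```

The recovered doc_ids then bound the chunk retrieval step, exactly as in the log: the missing "brother" relation never blocks the answer because the vector stage handles it within the narrowed search space.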
Key Takeaways
- You don’t need a perfect graph. A minimally structured graph—even a star graph—can still support complex queries when combined with iterative search-space refinement.
- Chunking boosts recall, but increases cost. Chunk-level extraction captures far more entities than whole-document extraction, but requires more LLM calls. Use it selectively based on document length and importance.
- Graph context fixes weak embeddings. Entity types like IDs, dates, and numbers have poor semantic embeddings; enriching the vector search with graph-derived context is essential for accurate retrieval.
- Semantic node search is a powerful fallback, to be exercised with caution. Even when Cypher queries fail (due to missing relations), semantic matching can identify relevant nodes and shrink the search space reliably.
- Hybrid retrieval delivers accurate response on relations, without a dense graph. Combining graph-based document filtering with vector chunk retrieval allows accurate answers even when the graph lacks explicit relations.
Conclusion
Building a GraphRAG system that is both accurate and cost-efficient requires acknowledging the practical limitations of LLM-based graph construction. Large documents dilute attention, entity extraction is never perfect, and encoding every relationship quickly becomes expensive and brittle.
However, as shown throughout this article, we can achieve highly accurate retrieval without a fully detailed knowledge graph. A simple graph structure—paired with iterative search-space optimization, semantic node search, and context-enriched vector retrieval—can outperform more complex and expensive designs.
This approach shifts the focus from extracting everything upfront into the graph to extracting what is cost-effective, quick to extract, and essential, letting the retrieval pipeline fill the gaps. The pipeline balances functionality, scalability, and cost, while still enabling sophisticated multi-hop queries across messy, real-world data.
You can read more about the GraphRAG design principles underpinning the concepts demonstrated here in Do You Really Need GraphRAG? A Practitioner's Guide Beyond the Hype.
Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI
All images and data used in this article are synthetically generated. Figures and code were created by me.



