GraphRAG & Knowledge Graphs

Microsoft GraphRAG, entity extraction, and global summarization queries.

When Vector Search Fundamentally Fails

GraphRAG vs Naive RAG

[Diagram: Naive RAG (vector only) retrieves isolated chunks with no relationships between them and misses multi-hop queries. GraphRAG builds a knowledge graph (e.g., CEO —leads→ Acme —makes→ Product) and traverses relationships, handling multi-hop queries.]

Standard RAG is retrieval over isolated chunks. The question "What are the major themes across all of our quarterly earnings reports?" cannot be answered by Top-K retrieval. No single chunk contains a global answer. This is the problem Microsoft's GraphRAG was designed to solve.

Building a Knowledge Graph from Text

GraphRAG uses an LLM to parse documents and extract entities (companies, people, events) and their relationships (Acquired, Reported, Partnered) to construct a directed property graph. Nodes are entities; edges are typed relationships with attributes (date, sentiment, etc.).

For global queries, GraphRAG doesn't retrieve chunks — it traverses graph communities (clusters of tightly connected nodes), generates summaries for each community, and synthesizes a final answer across summaries. This enables cross-document reasoning that defeats standard vector search.
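The community step above can be sketched with off-the-shelf graph clustering. This toy example uses networkx's Louvain implementation (GraphRAG itself uses the closely related Leiden algorithm) on an invented two-cluster graph, and stands in member lists for the LLM-written summaries:

```python
import networkx as nx

# Toy knowledge graph: two tightly connected clusters plus one bridge edge
G = nx.Graph()
G.add_edges_from([
    ("Acme", "CEO_A"), ("Acme", "WidgetX"), ("CEO_A", "WidgetX"),    # company cluster
    ("BioLab", "DrugY"), ("BioLab", "TrialZ"), ("DrugY", "TrialZ"),  # research cluster
    ("Acme", "BioLab"),  # cross-cluster partnership edge
])

# Detect communities (GraphRAG uses Leiden; Louvain is a close stand-in)
communities = nx.community.louvain_communities(G, seed=42)

# In real GraphRAG each community's source text is summarized by an LLM;
# here we just list the members as a placeholder "summary"
summaries = [", ".join(sorted(c)) for c in communities]
for s in summaries:
    print(s)
```

A global query would then be answered by map-reducing over these per-community summaries rather than over raw chunks.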

Local vs. Global Query Routing

Smart GraphRAG systems route queries by type. Local queries ("What did Tim Cook say about AI in Q4 2024?") use standard vector retrieval since they target specific facts. Global queries ("What risks are mentioned across all board meetings?") use graph traversal and community summarization. Misrouting global queries to vector search is a common failure mode.
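A minimal router can be as simple as a keyword heuristic; production systems typically use an LLM classifier instead. The cue phrases below are illustrative assumptions, not an established list:

```python
# Phrases that usually signal a global, corpus-wide question (illustrative only)
GLOBAL_CUES = ("across all", "major themes", "overall", "summarize", "trends")

def route_query(question: str) -> str:
    """Route to 'global' (community summarization) or 'local' (vector retrieval)."""
    q = question.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return "global"
    return "local"

print(route_query("What did Tim Cook say about AI in Q4 2024?"))          # local
print(route_query("What risks are mentioned across all board meetings?")) # global
```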

Neo4j as the Graph Backend

Production knowledge graphs live in dedicated graph databases like Neo4j. The Cypher query language allows direct relationship traversal that SQL and vector databases cannot express. LangChain has a native GraphCypherQAChain that converts natural language queries into Cypher queries via an LLM, executes them, and returns structured results.

Code Example

Full GraphRAG pipeline: LLM extracts structured graph data from text, ingests into Neo4j, then uses natural language → Cypher to query it. This enables cross-document reasoning impossible with vector search.

```python
import json

from anthropic import Anthropic
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_openai import ChatOpenAI

client = Anthropic()

# Step 1: Extract entities and relationships from text using Claude
def extract_knowledge_graph(text: str) -> dict:
    """Use an LLM to extract a structured knowledge graph from raw text."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"""Extract all entities and relationships from this text.
Return ONLY valid JSON in this format:
{{
  "entities": [{{"id": "E1", "name": "Apple Inc", "type": "Company"}}],
  "relationships": [{{"source": "E1", "target": "E2", "type": "ACQUIRED", "year": 2023}}]
}}

TEXT: {text}"""
        }]
    )
    return json.loads(response.content[0].text)

# Step 2: Ingest into Neo4j
def ingest_to_neo4j(graph_data: dict, neo4j_graph: Neo4jGraph):
    for entity in graph_data["entities"]:
        neo4j_graph.query(
            "MERGE (e:Entity {id: $id}) SET e.name = $name, e.type = $type",
            params=entity,
        )
    for rel in graph_data["relationships"]:
        # Cypher cannot parameterize relationship types, so the type is
        # interpolated directly -- validate it against an allowlist in production
        neo4j_graph.query(
            f"""MATCH (a:Entity {{id: $source}}), (b:Entity {{id: $target}})
            MERGE (a)-[r:{rel['type']}]->(b) SET r.year = $year""",
            params=rel,
        )

# Step 3: Natural language -> Cypher -> answer
def query_knowledge_graph(question: str, neo4j_graph: Neo4jGraph):
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    chain = GraphCypherQAChain.from_llm(
        llm=llm,
        graph=neo4j_graph,
        verbose=True,  # Log the generated Cypher queries for debugging
    )
    return chain.invoke({"query": question})

# Example: "What companies did Apple acquire after 2020?"
# LLM generates: MATCH (a:Entity {name: "Apple Inc"})-[r:ACQUIRED]->(b)
#                WHERE r.year > 2020 RETURN b.name
```

Use Cases

Legal firms querying relationships between contracts, parties, and clauses
Financial analysis requiring cross-document entity relationship queries
Drug discovery knowledge bases linking proteins, compounds, and clinical trials

Common Mistakes

Using GraphRAG for simple factual lookups — the overhead is massive and overkill for local queries
Not defining a strict entity extraction schema, causing inconsistent node types in the graph
Forgetting to deduplicate entities (Apple vs Apple Inc vs AAPL) before ingestion
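The deduplication mistake above can be addressed with a canonical alias map plus light normalization before ingestion; a real pipeline would add fuzzy matching or embedding similarity. The alias table here is an illustrative assumption:

```python
# Canonical-name lookup for known entity variants (illustrative)
ALIASES = {
    "apple": "Apple Inc",
    "apple inc": "Apple Inc",
    "aapl": "Apple Inc",
}

def canonicalize(name: str) -> str:
    """Map entity-name variants to one canonical node name before MERGE."""
    key = name.strip().lower().rstrip(".")
    return ALIASES.get(key, name.strip())

# All three variants collapse to a single graph node
print({canonicalize(n) for n in ["Apple", "AAPL", "apple inc."]})  # {'Apple Inc'}
```

Running `canonicalize` on every extracted entity before the Neo4j MERGE step keeps "Apple", "Apple Inc", and "AAPL" from becoming three disconnected nodes.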

Interview Insight

Relevance

High - Cutting-edge architecture for complex enterprise knowledge systems.
