GraphRAG & Knowledge Graphs
Microsoft GraphRAG, entity extraction, and global summarization queries.
When Vector Search Fundamentally Fails
GraphRAG vs Naive RAG
Standard RAG is retrieval over isolated chunks. The question "What are the major themes across all of our quarterly earnings reports?" cannot be answered by Top-K retrieval. No single chunk contains a global answer. This is the problem Microsoft's GraphRAG was designed to solve.
Building a Knowledge Graph from Text
GraphRAG uses an LLM to parse documents and extract entities (Companies, People, Events) and their relationships (Acquired, Reported, Partnered) to construct a directed property Knowledge Graph. Nodes are entities. Edges are typed relationships with attributes (date, sentiment, etc.).
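The node/edge structure described above can be sketched with minimal Python types. These class names are illustrative, not part of GraphRAG's API:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A graph node: a named entity with a type label."""
    id: str
    name: str
    type: str  # e.g. "Company", "Person", "Event"

@dataclass
class Relationship:
    """A typed, directed edge with optional attributes."""
    source: str  # Entity id
    target: str  # Entity id
    type: str    # e.g. "ACQUIRED", "REPORTED", "PARTNERED"
    attributes: dict = field(default_factory=dict)  # e.g. {"year": 2023, "sentiment": "neutral"}

# An "Apple ACQUIRED startup" edge with attributes on the relationship:
acq = Relationship("E1", "E2", "ACQUIRED", {"year": 2023})
```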
For global queries, GraphRAG doesn't retrieve chunks — it traverses graph communities (clusters of tightly connected nodes), generates summaries for each community, and synthesizes a final answer across summaries. This enables cross-document reasoning that defeats standard vector search.
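The map-reduce pattern above can be sketched in plain Python. GraphRAG itself detects communities with hierarchical Leiden clustering; as a dependency-free stand-in, this sketch uses connected components, and `summarize` stands in for a per-community LLM summarization call:

```python
from collections import defaultdict

def find_communities(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Simplified community detection via connected components.
    (GraphRAG proper uses hierarchical Leiden clustering.)"""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, communities = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comm = [node], set()
        while stack:  # iterative DFS to collect one component
            n = stack.pop()
            if n in comm:
                continue
            comm.add(n)
            stack.extend(adj[n] - comm)
        seen |= comm
        communities.append(comm)
    return communities

def answer_global_query(question: str, edges, summarize) -> str:
    """Map: summarize each community. Reduce: synthesize across summaries.
    The join here stands in for a final LLM synthesis call."""
    partials = [summarize(question, c) for c in find_communities(edges)]
    return " | ".join(partials)
```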
Local vs. Global Query Routing
Smart GraphRAG systems route queries by type. Local queries ("What did Tim Cook say about AI in Q4 2024?") use standard vector retrieval since they target specific facts. Global queries ("What risks are mentioned across all board meetings?") use graph traversal and community summarization. Misrouting global queries to vector search is a common failure mode.
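A minimal router can be sketched with a keyword heuristic; the marker list below is an assumption for illustration, and production systems typically use an LLM classifier instead:

```python
# Aggregate-style wording suggests a global (cross-corpus) query -- illustrative list.
GLOBAL_MARKERS = ("across all", "major themes", "overall", "summarize", "all documents")

def route_query(query: str) -> str:
    """Return 'global' for corpus-wide questions, 'local' for specific facts."""
    q = query.lower()
    return "global" if any(m in q for m in GLOBAL_MARKERS) else "local"

def answer(query: str, vector_search, graph_search):
    """Dispatch: global -> graph traversal + community summaries, local -> vector retrieval."""
    if route_query(query) == "global":
        return graph_search(query)
    return vector_search(query)
```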
Neo4j as the Graph Backend
Production knowledge graphs live in dedicated graph databases like Neo4j. The Cypher query language allows direct relationship traversal that SQL and vector databases cannot express. LangChain has a native GraphCypherQAChain that converts natural language queries into Cypher queries via an LLM, executes them, and returns structured results.
Code Example
Full GraphRAG pipeline: LLM extracts structured graph data from text, ingests into Neo4j, then uses natural language → Cypher to query it. This enables cross-document reasoning impossible with vector search.
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI
from anthropic import Anthropic
import json

client = Anthropic()

# Step 1: Extract entities and relationships from text using Claude
def extract_knowledge_graph(text: str) -> dict:
    """Use an LLM to extract a structured knowledge graph from raw text."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"""Extract all entities and relationships from this text.
Return ONLY valid JSON in this format:
{{
  "entities": [{{"id": "E1", "name": "Apple Inc", "type": "Company"}}],
  "relationships": [{{"source": "E1", "target": "E2", "type": "ACQUIRED", "year": 2023}}]
}}

TEXT: {text}"""
        }]
    )
    return json.loads(response.content[0].text)

# Step 2: Ingest into Neo4j
def ingest_to_neo4j(graph_data: dict, neo4j_graph: Neo4jGraph):
    for entity in graph_data["entities"]:
        neo4j_graph.query(
            "MERGE (e:Entity {id: $id}) SET e.name = $name, e.type = $type",
            params=entity,
        )
    for rel in graph_data["relationships"]:
        # Relationship types cannot be parameterized in Cypher, so the type is
        # interpolated directly -- validate it against a whitelist in production.
        neo4j_graph.query(
            f"""MATCH (a:Entity {{id: $source}}), (b:Entity {{id: $target}})
            MERGE (a)-[r:{rel['type']}]->(b) SET r.year = $year""",
            params=rel,
        )

# Step 3: Natural Language -> Cypher -> Answer
def query_knowledge_graph(question: str, neo4j_graph: Neo4jGraph):
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    chain = GraphCypherQAChain.from_llm(
        llm=llm,
        graph=neo4j_graph,
        verbose=True,  # See the generated Cypher queries for debugging
        allow_dangerous_requests=True,  # Required by recent LangChain versions
    )
    return chain.invoke({"query": question})

# Example: "What companies did Apple acquire after 2020?"
# LLM generates: MATCH (a:Entity {name: "Apple Inc"})-[r:ACQUIRED]->(b)
#                WHERE r.year > 2020 RETURN b.name
Relevance
High - a cutting-edge architecture for complex enterprise knowledge systems.