From Local to Global: A Graph RAG Approach to Query-Focused Summarization
https://arxiv.org/abs/2404.16130
GraphRAG targets "global questions" that span an entire corpus (e.g., What are the main themes? What are the conflicting viewpoints?). An entity-relationship graph is built in advance, and a summary is generated for each community (cluster) in the graph. At query time, these community summaries are combined in a Map→Reduce synthesis to produce the final answer. (arXiv)
What the study did
- Conventional "vector RAG" is strong at retrieving local facts but weak at summarizing and synthesizing across an entire corpus (sensemaking); GraphRAG is proposed to address this. (arXiv)
- Pipeline (Figure 1)
- documents → chunking
- extract entities, relations, and (optionally) claims from each chunk with an LLM
- build a knowledge graph from them
- partition the graph into communities (e.g., with the Leiden algorithm)
- generate a summary for each community (hierarchically, bottom-up)
- generate partial answers to the question from each community summary (Map) → combine them into the final answer (Reduce). (arXiv)
- Entity extraction uses multi-part prompts; additional self-reflection ("gleaning") rounds compensate for the higher extraction miss rate in larger chunks. (arXiv)
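The pipeline above can be sketched end to end in plain Python. This is a toy illustration, not the paper's implementation: capitalized-word co-occurrence stands in for the LLM extraction prompts, connected components stand in for Leiden community detection, and the Map/Reduce LLM calls are replaced by keyword filtering and string joining. All names and the tiny corpus are made up.

```python
from collections import defaultdict

# Toy corpus standing in for the real documents.
DOCS = [
    "Alice founded Acme. Acme acquired Beta Labs.",
    "Beta Labs researches robotics. Carol leads Beta Labs.",
    "Dan writes about climate policy. Erin debates Dan on carbon taxes.",
]

def chunk(text, size=5, overlap=2):
    """Overlapping word windows (the paper uses token windows, e.g. 600/100)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def extract_relations(chunk_text):
    """Stand-in for the LLM extraction prompt: link capitalized words
    that co-occur within a chunk."""
    entities = [w.strip(".,") for w in chunk_text.split() if w[0].isupper()]
    return [(a, b) for i, a in enumerate(entities)
            for b in entities[i + 1:] if a != b]

# Steps 1-3: chunk, extract, build the graph (edge weight = co-occurrence count)
graph = defaultdict(lambda: defaultdict(int))
for doc in DOCS:
    for ch in chunk(doc):
        for a, b in extract_relations(ch):
            graph[a][b] += 1
            graph[b][a] += 1

# Step 4: community detection -- connected components as a crude Leiden stand-in
def communities(g):
    seen, comms = set(), []
    for start in g:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(g[n])
        seen |= comp
        comms.append(sorted(comp))
    return comms

# Step 5: one "summary" per community (an LLM call in the real system)
summaries = [f"Community about: {', '.join(c)}" for c in communities(graph)]

# Step 6: Map -> Reduce. The real Map asks an LLM for a partial answer per
# community summary; the real Reduce is another LLM call. We fake both.
question = "What organizations appear and who runs them?"
partials = [s for s in summaries if any(w in s for w in ("Acme", "Beta"))]
answer = " | ".join(partials)
print(answer)
```

The real system also scores and filters community summaries by relevance before the Reduce step; here that is collapsed into the keyword test.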
How it was evaluated
- Datasets (each roughly 1 million tokens)
- Podcast: public transcripts of Behind the Tech with Kevin Scott (1,669 chunks of 600 tokens, overlap 100).
- News: news articles from September 2013 to December 2023 (3,197 chunks of 600 tokens, overlap 100). (arXiv)
- Conditions compared: four GraphRAG levels (C0-C3), direct text summarization (TS), and vector RAG (SS). Generation prompts and context windows are held constant. Community detection uses Leiden via graspologic. (arXiv)
- Evaluation metrics are relative LLM-as-a-judge ratings (comprehensiveness, diversity, and empowerment, with directness as a control). Objective metrics are also used: the number of factual claims and claim clusters extracted with Claimify. (arXiv)
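The relative-rating scheme boils down to head-to-head comparisons: for each question and metric, a judge LLM picks a winner between two conditions, and win rates are tallied per pair. A minimal sketch, with entirely made-up verdicts (the condition labels C0/SS follow the paper; the numbers do not):

```python
# Hypothetical pairwise judgments: (metric, condition_a, condition_b, winner).
# In the real evaluation each tuple would come from a judge-LLM call.
judgments = [
    ("comprehensiveness", "C0", "SS", "C0"),
    ("comprehensiveness", "C0", "SS", "C0"),
    ("comprehensiveness", "C0", "SS", "SS"),
    ("directness", "C0", "SS", "SS"),
    ("directness", "C0", "SS", "SS"),
]

def win_rate(judgments, metric, cond, vs):
    """Fraction of head-to-head comparisons on `metric` that `cond` wins
    against `vs`."""
    games = [j for j in judgments
             if j[0] == metric and {j[1], j[2]} == {cond, vs}]
    wins = sum(1 for j in games if j[3] == cond)
    return wins / len(games)

print(win_rate(judgments, "comprehensiveness", "C0", "SS"))  # 2 wins of 3
print(win_rate(judgments, "directness", "C0", "SS"))         # 0 wins of 2
```

The paper additionally averages over repeated judge runs per question to reduce judge variance; that is omitted here.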
Main Results
- Global methods (GraphRAG/TS) > vector RAG
- Comprehensiveness win rates: 72-83% on Podcast, 72-80% on News.
- Diversity win rates: 75-82% on Podcast, 62-71% on News.
- On the other hand, vector RAG scores highest on directness (i.e., short, direct answers). (arXiv)
- Efficiency: the root level C0 uses 9-43x fewer context tokens per question, which makes it attractive for iterative global sensemaking over the same corpus. Deeper levels add information but also cost more tokens. (arXiv)
- On the objective metrics as well, the global methods outperform vector RAG in the number of claims (comprehensiveness) and the number of claim clusters (diversity) (e.g., p < .05). (arXiv)
Implementation Tips (from the paper)
- Chunk size: larger chunks increase extraction misses, so combine them with self-reflection (gleaning) rounds to balance cost and recall. (arXiv)
- Use of the hierarchy:
- C0 (top-level community summaries) = iterative querying at very low cost.
- C2/C3 (lower levels) = when you want somewhat more comprehensiveness and diversity. (arXiv)
- Realistic indexing cost: as an example, with 600-token windows and GPT-4-turbo, indexing the Podcast corpus took about 281 minutes (VM configuration and API conditions are specified in the paper). (arXiv)
- OSS: Microsoft's implementation is available; there are also integrations for LangChain, LlamaIndex, NebulaGraph, and Neo4j. (arXiv)
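The chunking settings used throughout the paper (600-token windows with 100-token overlap) are simple to reproduce. A minimal sketch; real systems would feed in tokenizer output (e.g., from tiktoken) rather than an arbitrary list:

```python
def chunk_tokens(tokens, size=600, overlap=100):
    """Split a token sequence into overlapping windows, as in the paper's
    600-token chunks with 100-token overlap."""
    assert 0 <= overlap < size
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks

# A corpus of ~1,000,000 tokens advances 500 tokens per window, giving
# about 2,000 chunks (the Podcast corpus yielded 1,669 at these settings).
toks = list(range(1_000_000))
chunks = chunk_tokens(toks)
print(len(chunks))  # 2000
```

Note the overlap invariant: the last 100 tokens of one chunk are the first 100 of the next, so entities straddling a window boundary are seen whole at least once.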
Limitations and Future Work
- Validation is limited to two corpora of roughly 1 million tokens each. Generalization across domains and comparison of hallucination rates (e.g., with SelfCheckGPT) are left as future work. (arXiv)
When to use GraphRAG (personal opinion)
- Questions that need the overall picture, broad coverage, and multiple perspectives (organizing trends, surveying conflicting views, extracting major themes).
- Workloads that repeatedly ask global questions of the same data (cheap iteration at C0, then "digging down" to lower levels as needed).
If the only goal is to pinpoint individual facts accurately, conventional vector RAG is more likely to give a short, direct answer. (arXiv)
This page is auto-translated from [/nishio/From Local to Global: A Graph RAG Approach to Query-Focused Summarization](https://scrapbox.io/nishio/From Local to Global: A Graph RAG Approach to Query-Focused Summarization) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.