ScatterAI
Issue #5 · March 13, 2026

Knowledge Graph RAG Breaks on Multi-Hop Questions — Entity Summaries Fix the Retrieval Phase

Research

01 [RAG] Knowledge Graph RAG Breaks on Multi-Hop Questions — Entity Summaries Fix the Retrieval Phase

Standard RAG (Retrieval-Augmented Generation) over KGs (Knowledge Graphs) converts text into triples — subject, predicate, object — to enable structured retrieval. That compression discards the contextual nuance that multi-hop questions depend on. Answering “Who founded the company that acquired DeepMind?” requires chaining three entities across two relations, and losing the surrounding context at indexing time means the retrieval phase never had a chance.

MDER-DR attacks this at two stages. The indexing pipeline, Map-Disambiguate-Enrich-Reduce, generates natural-language descriptions for each triple rather than storing bare structured facts, then fuses those descriptions into entity-level summaries. Retrieval no longer needs to traverse graph edges explicitly, because the contextual connections are already embedded in the index. The retrieval phase then uses query decomposition (breaking multi-hop questions into single-hop sub-questions) and re-ranking to assemble answers from the right entity summaries in sequence.
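
To make the two stages concrete, here is a minimal sketch in Python. Everything in it is illustrative: the llm, embed, and rerank callables are assumed stand-ins for whatever model stack and vector store you run, and none of the function names are the paper's API.

    # Illustrative sketch only: llm, embed, and rerank are assumed callables
    # standing in for your model stack; these names are not the paper's API.
    import math
    from collections import defaultdict

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / (norm + 1e-9)

    def index_entity_summaries(triples, llm, embed):
        """Indexing (Map-Disambiguate-Enrich-Reduce, compressed to its core):
        describe each triple in context, then fuse descriptions per entity."""
        per_entity = defaultdict(list)
        for subj, pred, obj in triples:
            # Map + Enrich: a contextual sentence instead of a bare fact.
            desc = llm(f"Describe this fact in one contextual sentence: ({subj}, {pred}, {obj})")
            # Disambiguate: canonical entity IDs are assumed upstream here.
            per_entity[subj].append(desc)
            per_entity[obj].append(desc)
        # Reduce: one summary per entity, embedded for vector search.
        index = {}
        for entity, descs in per_entity.items():
            summary = llm("Fuse into one entity summary:\n" + "\n".join(descs))
            index[entity] = (summary, embed(summary))
        return index

    def answer_multihop(question, index, llm, embed, rerank, top_k=5):
        """Retrieval: decompose into single-hop sub-questions, retrieve and
        re-rank entity summaries, and carry each partial answer forward."""
        subs = llm(f"Split into single-hop sub-questions:\n{question}").splitlines()
        answer = ""
        for sub in subs:
            qvec = embed(f"{sub} {answer}".strip())
            hits = sorted(index.values(), key=lambda s: -cosine(qvec, s[1]))[:top_k]
            best = rerank(sub, [summary for summary, _ in hits])[0]
            answer = llm(f"Answer '{sub}' given: {best}\nKnown so far: {answer}")
        return answer

Note where the hop happens: inside the loop's carried answer, not by walking graph edges. Each sub-question retrieves against fused summaries, which is exactly what the indexing stage buys.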

The framework is domain-agnostic, which matters for teams working across verticals. The limitation is real: this has only been evaluated on KG-based QA benchmarks, and production KGs vary wildly in completeness and triple quality. Garbage triples still produce garbage summaries regardless of how well the pipeline wraps them.

Key takeaways:

- Triple extraction discards the context multi-hop questions depend on; MDER-DR indexes natural-language descriptions fused into entity-level summaries instead.
- Query decomposition plus re-ranking chains single-hop sub-questions over those summaries, with no explicit graph traversal at retrieval time.
- Evaluation so far is on KG-based QA benchmarks, and summary quality remains bounded by triple quality.

Source: MDER-DR: Multi-Hop Question Answering with Entity-Centric Summaries


02 [RAG] VAE Posterior Collapse Is a Prior Selection Problem, Not an Architecture Problem

Posterior collapse in VAEs (Variational Autoencoders — generative models that compress data into a compact representation) has been treated as a training stability problem for years. The standard fixes are architectural constraints, KL (Kullback-Leibler divergence — a measure of how different two probability distributions are) annealing schedules, or careful hyperparameter tuning. This paper takes a different angle: collapse is inevitable when the prior is wrong, and the right prior can make collapse structurally impossible.

The mechanism is a GMM (Gaussian Mixture Model — a probabilistic model that represents data as a blend of several bell-curve distributions) prior refined through iterative alternating optimization. Rather than fitting one prior and hoping it aligns with the data, Historical Consensus Training maintains a set of candidate GMM clusterings and progressively selects among them. The key constraint: only clusterings that reach consensus across training history survive. This eliminates degenerate solutions where the approximate posterior collapses onto the prior, because the prior itself is iteratively forced to reflect actual data structure rather than a convenient mathematical default.
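
A minimal sketch of the selection loop, with the assumptions labeled loudly: the consensus measure below (adjusted Rand index between a candidate's cluster assignments at successive checkpoints) and the capacity tie-break are stand-ins for illustration, not the paper's exact criterion.

    # Sketch of consensus selection over candidate GMM priors. The ARI-based
    # consensus test and max-k tie-break are assumptions for illustration.
    import numpy as np
    from sklearn.metrics import adjusted_rand_score
    from sklearn.mixture import GaussianMixture

    def select_consensus_prior(latent_snapshots, candidate_ks=(5, 10, 20),
                               min_consensus=0.8, seed=0):
        """latent_snapshots: list of (N, d) arrays, the encoder's latents for
        the same N points at successive training checkpoints."""
        survivors = []
        for k in candidate_ks:
            # Cluster each checkpoint's latents with this candidate.
            labels = [GaussianMixture(n_components=k, random_state=seed).fit_predict(z)
                      for z in latent_snapshots]
            # Consensus: the clustering must stay stable across history.
            stability = [adjusted_rand_score(labels[t], labels[t + 1])
                         for t in range(len(labels) - 1)]
            if stability and min(stability) >= min_consensus:
                survivors.append(k)
        if not survivors:
            return None  # no consensus yet: keep the current prior unchanged
        best_k = max(survivors)  # capacity tie-break; a heuristic, not the paper's rule
        # Refit on the latest latents: this GMM replaces N(0, I) as the prior
        # in the KL term, so the posterior has real structure to match instead
        # of a default it can trivially collapse onto.
        return GaussianMixture(n_components=best_k, random_state=seed).fit(latent_snapshots[-1])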

The limitation is real: this adds a selection loop to training, and the compute overhead of maintaining candidate clusterings is not yet characterized for large datasets. For teams building retrieval or embedding pipelines on top of VAE-style architectures, the practical implication is direct: if your latent representations are collapsing and you’ve already tuned KL weight and learning rate, the prior is likely the culprit, and treating prior selection as an optimization target is a concrete path forward.

Key takeaways:

- Posterior collapse is reframed as a prior selection problem rather than an architecture or tuning problem.
- Historical Consensus Training maintains candidate GMM priors and keeps only clusterings that reach consensus across training history.
- The compute overhead of maintaining candidates at scale is uncharacterized; if KL annealing and hyperparameter tuning haven't helped, suspect the prior.

Source: Historical Consensus: Preventing Posterior Collapse via Iterative Selection of Gaussian Mixture Priors


03 [RAG] KV Cache Eviction Gets a Cheap Oracle — At a Fraction of the Lookahead Cost

KV (Key-Value) cache eviction methods that “glimpse into the future” — generating a draft response to estimate which cached tokens matter — produce better eviction decisions than static importance scoring. The problem: those draft generators are expensive, often requiring a full forward pass or a separate draft model. That cost undercuts the efficiency gains from eviction in the first place.

LookaheadKV keeps the future-glimpse insight but replaces the expensive draft generator with a cheap one. Rather than running a full speculative decode, it uses the model’s existing prefill computation to project a lightweight surrogate future response, with no extra generation step and no separate model. Importance scores are then computed against this surrogate, identifying which KV entries to evict. The surrogate is rough, but rough is enough: eviction quality depends on relative importance ranking, not on response fidelity.
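
A minimal sketch of the scoring step, with the surrogate construction labeled as an assumption: the stand-in for the "future" here is the mean of the last few prefill query states, my illustration of a generation-free surrogate rather than the paper's exact projection.

    # Sketch of surrogate-based eviction: score cached keys against a cheap
    # stand-in for future queries instead of decoding a draft. The surrogate
    # (mean of recent prefill query states) is an illustrative assumption.
    import torch

    def evict_with_surrogate(keys, values, query_states, cache_budget,
                             recent_window=32, sink_tokens=4):
        """keys, values: (seq_len, d) KV cache from prefill.
        query_states:   (seq_len, d) per-token query vectors from prefill."""
        seq_len, d = keys.shape
        if seq_len <= cache_budget:
            return keys, values
        # Generation-free surrogate for the future query distribution.
        surrogate_q = query_states[-recent_window:].mean(dim=0)
        # Importance = attention logit of each cached key under the surrogate.
        scores = keys @ surrogate_q / d ** 0.5
        # Pin attention sinks and the recent window, as eviction methods
        # commonly do; only the middle tokens compete for the budget.
        scores[:sink_tokens] = float("inf")
        scores[-recent_window:] = float("inf")
        keep = torch.topk(scores, cache_budget).indices.sort().values
        return keys[keep], values[keep]

Because only the relative ranking feeds the top-k selection, a noisy surrogate that preserves ordering is enough, which is exactly why skipping draft generation is viable.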

The catch is scope. Results are on standard long-context benchmarks (LongBench-class tasks), and the method’s advantage shrinks on tasks where token importance is uniformly distributed, such as dense retrieval over structured documents. For teams running LLM serving infrastructure with long-context workloads, this is a practical lever: better eviction decisions at near-zero overhead compared to draft-based alternatives.

Key takeaways:

- Future-glimpse eviction beats static importance scoring, but draft generation erases the efficiency gains; LookaheadKV builds the glimpse from the existing prefill computation instead.
- Eviction quality depends on relative importance ranking, not response fidelity, so a rough surrogate is enough.
- Results are on LongBench-class benchmarks, and the advantage shrinks when token importance is uniformly distributed.

Source: LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation