ScatterAI

Brief

AI research papers, explained for builders.

Thursday · 2026-03-26 · 4 entries
Paper 1

Long video QA breaks when models ignore what the video is already telling them

Most video QA systems fail on long videos because they match query words to segments in isolation, ignoring how scenes connect visually and temporally. VideoDetective treats the video as a graph where segments influence each other's relevance scores, letting it find clues that only make sense in context—fixing a fundamental flaw in how we retrieve answers from hours of footage.
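
A minimal sketch of the graph intuition (the propagation rule, weights, and toy scores below are illustrative, not the paper's algorithm): each segment starts with an isolated query-similarity score, then repeatedly blends in its neighbors' scores, so a weak match sandwiched between strong ones gets pulled up.

```python
# Illustrative only: propagate query relevance over a segment graph.
import numpy as np

def propagate_relevance(sim, adj, alpha=0.6, iters=10):
    """sim: (n,) per-segment query similarity; adj: (n, n) segment affinities."""
    # Row-normalize so each segment averages over its neighbors.
    P = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-9)
    scores = sim.copy()
    for _ in range(iters):
        # Blend each segment's own evidence with its neighbors' current scores.
        scores = alpha * sim + (1 - alpha) * P @ scores
    return scores

# Toy chain of 5 segments: segment 1 matches the query weakly on its own
# but sits between two strong matches, so propagation lifts it.
sim = np.array([0.9, 0.15, 0.85, 0.1, 0.05])
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]], dtype=float)
print(propagate_relevance(sim, adj).round(3))
```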

Paper 2

Deep research agents do not need the internet; they need the right offline corpus

Paper 3

DoRA's memory wall breaks at high rank: a systems fix, not a math fix

Also Worth Noting — 2026-03-26

A new benchmark evaluates AI video-generating world models more rigorously by testing their temporal dynamics and object interactions.

Monday · 2026-03-23 · 1 entry
Paper 1

OpenAI's Safety Stack for Sora 2 Reveals How Hard Real-Time Video Moderation Actually Is

Real-time video generation breaks old safety tools designed for images—watermarks degrade under compression, and new user behaviors outpace single-layer defenses. OpenAI's Sora now combines prompt filtering, output classification, and platform enforcement across multiple layers to catch harmful content at scale, but developers building on video APIs can't rely on upstream safety alone.
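
As a hedged sketch of what "multiple layers" means in code (placeholder checks, not OpenAI's actual stack): each layer can refuse independently, and the output classifier inspects what was generated rather than what was requested. Platform-level enforcement (takedowns, rate limits) would sit outside this function.

```python
# Illustrative layered moderation pipeline; the checks are placeholders
# you would back with real prompt and frame classifiers.
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    layer: str
    reason: str = ""

def check_prompt(prompt: str) -> Verdict:
    # Layer 1: refuse before spending compute on generation.
    for term in ("minor", "real-person likeness"):
        if term in prompt.lower():
            return Verdict(False, "prompt_filter", f"matched '{term}'")
    return Verdict(True, "prompt_filter")

def check_frames(frames) -> Verdict:
    # Layer 2: classify what was actually generated, not what was asked for.
    worst = max(f["unsafe_score"] for f in frames)
    return Verdict(worst < 0.8, "output_classifier", f"max score {worst:.2f}")

def moderate(prompt, generate):
    verdict = check_prompt(prompt)
    if not verdict.allowed:
        return verdict
    return check_frames(generate(prompt))

fake_generate = lambda p: [{"unsafe_score": 0.1}, {"unsafe_score": 0.3}]
print(moderate("a dog surfing at sunset", fake_generate))
```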

Sunday · 2026-03-22 · 2 entries
Paper 1

3D reasoning failures in VLMs stem from perception issues, not language processing

Vision-language models struggle with 3D spatial reasoning because they lack training signal, not because they need richer input data. This work trains models to reconstruct scenes and understand their own position within them, enabling video-based AI systems and AR applications to reason about space without preprocessing geometric data at inference time.
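
One way to picture that training signal, as a rough sketch: bolt auxiliary heads onto per-frame features, one reconstructing coarse scene occupancy and one predicting the viewer's pose. Shapes, heads, and loss weighting below are invented stand-ins, not the paper's architecture.

```python
# Invented stand-in, not the paper's architecture: auxiliary heads that
# turn ordinary frame features into spatial training signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialHeads(nn.Module):
    def __init__(self, feat_dim=768, voxels=512):
        super().__init__()
        self.recon = nn.Linear(feat_dim, voxels)  # coarse occupancy logits
        self.pose = nn.Linear(feat_dim, 6)        # xyz + yaw/pitch/roll

def spatial_loss(heads, feats, occ_target, pose_target, w=0.5):
    # Reconstruct the scene AND predict where the camera is within it.
    recon_l = F.binary_cross_entropy_with_logits(heads.recon(feats), occ_target)
    pose_l = F.mse_loss(heads.pose(feats), pose_target)
    return recon_l + w * pose_l

heads = SpatialHeads()
feats = torch.randn(4, 768)       # per-frame VLM features
occ = torch.rand(4, 512).round()  # toy occupancy targets
pose = torch.randn(4, 6)          # toy pose targets
print(spatial_loss(heads, feats, occ, pose))
```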

Also Worth Noting — 2026-03-22

New method trains satellite image AI using free OpenStreetMap data instead of expensive labeled datasets.

Thursday · 2026-03-19 · 4 entries
Paper 1

Real websites will get your agent banned — synthetic clones will get it trained

VeriEnv lets AI agents train on synthetic website clones instead of real sites, eliminating bot detection blocks and unreliable LLM judges. Agents now get deterministic feedback by reading internal site state, making web automation training 10x safer and faster—perfect for companies building search tools and automation pipelines before deploying to production.
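
The deterministic-feedback idea boils down to grading the agent with a predicate over the clone's internal state instead of an LLM judge. A toy sketch (the SyntheticShop environment and its reward are invented for illustration):

```python
# `SyntheticShop` and its reward are invented; the point is that success
# is a deterministic predicate over the clone's internal state.
class SyntheticShop:
    def __init__(self):
        self.cart, self.orders = [], []

    def add_to_cart(self, sku):   # actions exposed to the agent
        self.cart.append(sku)

    def checkout(self):
        self.orders.append(list(self.cart))
        self.cart = []

def reward(env: SyntheticShop, task_sku: str) -> float:
    # No LLM judge: just check whether an order with the target SKU exists.
    return 1.0 if any(task_sku in order for order in env.orders) else 0.0

env = SyntheticShop()
env.add_to_cart("sku-123")        # an agent trajectory would land here
env.checkout()
print(reward(env, "sku-123"))     # 1.0, reproducibly
```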

Paper 2

The Search Agent Data Gap Has a Structural Fix — and the Numbers Behind It Are Now Public

Paper 3

Residual connections assume every layer matters equally — these results say they're wrong by design

Also Worth Noting — 2026-03-19

Deep AI models now retain early insights better using a new attention mechanism that prevents information loss across layers.

Wednesday · 2026-03-18 · 3 entries
Paper 1

Most researchers are using AI wrong — here's the five-level map that shows why

For the first time, we have a clear map of where AI-assisted research actually sits—from asking ChatGPT questions to running fully autonomous agents overnight. The key insight: most teams lack guardrails to stop agents from reporting plausible-looking false results, making verification itself the critical failure point that needs explicit rules built into the agent's instructions.

Paper 2

Coding Agents Fail at Real-World Optimization—and Current Benchmarks Can't Even See It

Also Worth Noting — 2026-03-18

New attention mechanism lets AI models access useful information from earlier processing layers, improving accuracy without larger models.

Tuesday · 2026-03-17 · 4 entries
Paper 1

Ensemble weighting that punishes disagreement outperforms static mixing in non-stationary sequential tasks

For ensemble models in shifting environments, a new weighting system tracks both individual performance and how much each model agrees with the others—penalizing those that drift from consensus. This catches failing specialists before their raw accuracy numbers do, and comes with formal guarantees that the approach won't fall too far behind an ideal fixed strategy, even as the optimal expert changes over time.
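
A minimal sketch of this flavor of update, using multiplicative weights with an added disagreement penalty (the penalty form and constants are illustrative, not the paper's):

```python
# Illustrative penalty form and constants; not the paper's exact update.
import numpy as np

def update_weights(w, losses, preds, eta=0.5, lam=0.3):
    """w: current weights; losses: per-expert loss this round;
    preds: per-expert predictions, used to measure drift from consensus."""
    consensus = np.average(preds, weights=w)
    drift = (preds - consensus) ** 2
    # Multiplicative-weights step that charges experts for both their own
    # loss and their disagreement with the weighted consensus.
    w = w * np.exp(-eta * (losses + lam * drift))
    return w / w.sum()

w = np.ones(3) / 3
preds = np.array([0.9, 0.85, 0.2])    # expert 2 strays from the others...
losses = np.array([0.10, 0.12, 0.10]) # ...though its raw loss looks fine
for _ in range(5):
    w = update_weights(w, losses, preds)
print(w.round(3))  # weight drains from the incoherent expert first
```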

Paper 2

Industrial Crypto Benchmark Exposes the Gap Between Theorem Proving and Real Code Reasoning

Paper 3

Low-Resource Languages Expose a Structural Gap in Code LLMs

Also Worth Noting — 2026-03-17

AI safety and ethics communities clash over governance, but understanding their conflict patterns could improve policy-making.

Sunday · 2026-03-15 · 4 entries
Paper 1

LLMs That Ace Math Olympiads Collapse on Real Cryptographic Code Proofs

When LLMs retrieve documents to answer questions, they excel at math puzzles but fail catastrophically on cryptographic proofs—even when the correct answer sits in their retrieved context. The problem: models trained on clean benchmarks don't learn to verify retrieved information against subtle real-world constraints, leaving production systems vulnerable to confident hallucinations on security-critical tasks.

Paper 2

Static ensemble weights fail in non-stationary environments, and coherence between models carries the signal you're missing

Paper 3

LLMs That Ace Python Collapse on a General-Purpose Language With Thin Training Data

Also Worth Noting — 2026-03-15

An AI search agent learns from past mistakes, improving its search strategy over time instead of starting fresh each session.

Saturday · 2026-03-14 · 4 entries
Paper 1

Text-to-image models fail at complex text because glyph templates were never in the loop

GlyphBanana lets AI image generators finally render complex text—formulas, CJK characters, mathematical symbols—by anchoring them with pre-made character templates instead of relying on training data that never existed. It works instantly on existing models without retraining, making it a direct solution for design tools and document generation systems that need reliable text in images.
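
The template idea can be pictured as rasterizing the exact glyphs and attaching the bitmap as extra conditioning, so the model copies shapes instead of hallucinating them. A sketch with the diffusion hookup left schematic (PIL's default bitmap font stands in for real glyph rendering):

```python
# Schematic: only the template-rendering step is concrete here; PIL's
# default bitmap font stands in for a real glyph renderer.
import numpy as np
from PIL import Image, ImageDraw

def glyph_template(text: str, size=(256, 64)) -> np.ndarray:
    # Rasterize the exact glyphs so the model can copy shapes it was
    # never trained to draw from scratch.
    img = Image.new("L", size, 0)
    ImageDraw.Draw(img).text((8, 8), text, fill=255)
    return np.asarray(img, dtype=np.float32) / 255.0

template = glyph_template("E = mc^2")
# Hand-waved hookup: stack the template with a latent as an extra
# conditioning channel for the (unmodified) generator.
latent = np.random.randn(*template.shape).astype(np.float32)
conditioned = np.stack([latent, template])
print(conditioned.shape)  # (2, 64, 256): channel, height, width
```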

Paper 2

DeepSport: A Multimodal Large Language Model for Comprehensive Sports Video Reasoning via Agentic Reinforcement Learning

For the first time, a single AI system can understand complex sports videos across multiple sports and tasks simultaneously—recognizing plays, interpreting rules, and analyzing tactics all at once. This works because the system learns through trial-and-error reasoning rather than memorization, enabling it to handle the fast motion and rule complexity that stump previous narrow models. Sports analytics teams and video AI researchers now have a unified blueprint replacing fragmented tool chains.

Paper 3

Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework

AI agents that remember conversations over time are becoming common, but no one has yet figured out how to stop those memories from getting corrupted, manipulated, or drifting into false beliefs. This paper introduces the first framework to actively protect evolving agent memory—catching contradictions before they're stored and flagging memories that slowly change meaning—making long-term AI agents actually trustworthy.
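
A stripped-down sketch of a governed write path, with trivial stand-ins for the NLI and embedding models the real framework would use (its actual mechanisms are richer than this gate):

```python
# Stand-ins: `contradicts` would be an NLI model, `embed` an embedding
# model; the gate logic is the part being illustrated.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.normal(size=32)
    return v / np.linalg.norm(v)

def contradicts(a: str, b: str) -> bool:
    return ("open" in a and "closed" in b) or ("closed" in a and "open" in b)

class GovernedMemory:
    def __init__(self, drift_threshold=0.7):
        self.entries = []  # (text, embedding recorded at write time)
        self.drift_threshold = drift_threshold

    def write(self, fact: str) -> bool:
        # Gate 1: refuse facts that contradict what is already stored.
        if any(contradicts(fact, old) for old, _ in self.entries):
            return False
        self.entries.append((fact, embed(fact)))
        return True

    def drift_flags(self):
        # Gate 2: flag entries whose re-encoded meaning has moved away
        # from the embedding recorded when they were written.
        return [old for old, e0 in self.entries
                if float(embed(old) @ e0) < self.drift_threshold]

mem = GovernedMemory()
print(mem.write("the office is open on Fridays"))    # True
print(mem.write("the office is closed on Fridays"))  # False: contradiction
print(mem.drift_flags())                             # [] with a static embedder
```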

Also Worth Noting — 2026-03-14

A curated roundup of additional AI research papers worth tracking this week.

Friday · 2026-03-13 · 4 entries
Paper 1

Knowledge Graph RAG Breaks on Multi-Hop Questions — Entity Summaries Fix the Retrieval Phase

Knowledge graphs struggle to answer complex questions because indexing strips away context needed to trace connections across multiple steps. Entity-level summaries that preserve this context—built during indexing rather than at query time—restore the ability to answer "who founded the company that acquired X?" without graph traversal. This breaks the indexing bottleneck that's been silently capping multi-hop reasoning in knowledge graph systems.
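
The core move, sketched on a toy graph (triples and summarizer invented for illustration): gather every triple touching an entity once, at indexing time, so a single summary lookup answers what would otherwise require a multi-hop traversal.

```python
# Toy graph and summarizer, invented for illustration.
triples = [
    ("AcmeCorp", "acquired", "WidgetCo"),
    ("WidgetCo", "founded_by", "Ada Smith"),
    ("AcmeCorp", "headquartered_in", "Berlin"),
]

def entity_summary(entity: str) -> str:
    # Gather every triple touching the entity, in either direction, so
    # cross-hop context survives indexing.
    facts = [f"{h} {r} {t}" for h, r, t in triples if entity in (h, t)]
    return f"{entity}: " + "; ".join(facts)

# Build once at indexing time; retrieve summaries, not paths, at query time.
entities = {x for h, _, t in triples for x in (h, t)}
index = {e: entity_summary(e) for e in entities}
print(index["WidgetCo"])
# WidgetCo: AcmeCorp acquired WidgetCo; WidgetCo founded_by Ada Smith
# A retriever hitting this one summary can answer "who founded the company
# that AcmeCorp acquired?" without walking the graph.
```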

Paper 2

Using Code as Intermediate Representation Improves VLM Spatial Reasoning by 68.8%

AI image-understanding systems now accurately answer spatial questions like "where is the glass?" by first writing code to map object locations, boosting accuracy by 68.8%. This helps developers build more reliable robots and automation tools that need to understand physical layouts.
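
The shape of the idea, as a hedged sketch (detections and the generated snippet are invented; the model would emit the function, not the answer):

```python
# Invented detections and program; the VLM would emit `spatial_answer`,
# and the final answer comes from running it, not from free-form text.
detections = {            # detector output: object name -> (x, y) center
    "glass": (420, 310),
    "plate": (180, 300),
    "fork":  (120, 305),
}

def spatial_answer(objs):
    # Model-generated code: compare coordinates instead of guessing.
    gx, _ = objs["glass"]
    px, _ = objs["plate"]
    return "right of the plate" if gx > px else "left of the plate"

print("The glass is", spatial_answer(detections))
```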

Paper 3

Imitation Learning Can't Teach Judgment — Agents Trained on Perfect Demos Fail Out-of-Distribution

AI agents trained by copying human experts fail when conditions change slightly—they've never learned what *not* to do. New research shows agents need to experience and learn from failures in safe environments to develop real judgment, making them four times more resilient to unexpected situations.

Also Worth Noting — 2026-03-13

A curated roundup of additional AI research papers worth tracking this week.

Thursday · 2026-03-12 · 4 entries
Paper 1

Diffusion Models Don't Fail at Text Because They Can't Reason — They Fail Because They've Never Seen the Input

Text-to-image AI models fail at rendering complex text and formulas not because they can't reason, but because they've never encountered these inputs during training. GlyphBanana solves this by injecting character templates directly into the model's processing, bypassing the gap entirely—a practical tool for teams automating documents, scientific figures, and multilingual designs without retraining.

Paper 2

Unsupervised RLVR Hits a Ceiling Set by the Initial Distribution, Not Compute

A new study reveals that training AI systems through self-improvement has a hard limit set by the initial training data, not raw computing power. Once models exhaust the knowledge embedded in their starting point, they begin collapsing into repetitive, useless outputs—meaning better pre-training data is more critical than throwing more compute at the problem.

Paper 3

Sparse Attention Degrades Long-Form Quality in Ways Standard Perplexity Benchmarks Don't Catch

Sparse attention speeds up AI models for massive documents but secretly breaks their ability to connect ideas across long distances—while appearing perfect on standard tests. This discovery exposes a critical blind spot: efficiency tricks that look safe actually cripple reasoning on real long-document tasks, affecting anyone building document search or analysis systems.
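
A small numerical illustration of the blind spot (window size and shapes are arbitrary, and this is a generic sliding-window mask, not the paper's setup): positions whose context fits in the window produce outputs identical to full attention, so local perplexity-style metrics look clean, while distant token pairs become simply unreachable.

```python
# Generic sliding-window mask, not the paper's setup; sizes are arbitrary.
import numpy as np

def attention(q, k, v, mask):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)  # masked pairs get ~zero weight
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

n, d, window = 10, 8, 4
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(n, d))
causal = np.tril(np.ones((n, n), dtype=bool))
local = causal & (np.arange(n)[:, None] - np.arange(n)[None, :] < window)

full_out = attention(q, k, v, causal)
sparse_out = attention(q, k, v, local)
# Positions whose whole context fits in the window are bit-identical, so
# a perplexity-style average barely moves...
print(np.abs(full_out - sparse_out)[:window].max())  # 0.0
# ...but the long-range edge from position 9 back to token 0 is gone:
print(local[9, 0])  # False
```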

Also Worth Noting — 2026-03-12

A curated roundup of additional AI research papers worth tracking this week.

Tuesday · 2026-03-10 · 4 entries
Paper 1

CBCT Tells You Where the Tissue Was. Ultrasound Tells You Where It Is Now.

Surgeons navigate using cone-beam CT (CBCT) scans that become outdated the moment a patient breathes or tissue shifts. This framework pairs CBCT with a robotic ultrasound probe that continuously tracks tissue movement in real time, automatically updating the surgical map without new scans. It turns static imaging into live, deformable guidance for abdominal surgery.

Paper 2

High-Noise Diffusion Steps Contain Low-Res Information — Processing at Full Resolution Is Wasted Compute

Diffusion models waste compute by processing images at full resolution during high-noise denoising steps that only carry low-resolution information. This research cuts computational cost by 40% by dynamically lowering resolution in the early steps and gradually increasing it as details emerge, enabling faster image generation on phones and cheaper server inference without sacrificing quality.
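
A sketch of what a coarse-to-fine schedule could look like (the schedule shape and quarter-resolution floor are illustrative choices, not the paper's exact curve):

```python
# Illustrative schedule: quarter resolution while noise dominates,
# full resolution only for the final detail-heavy steps.
def resolution_at(step: int, total: int, full: int = 1024) -> int:
    frac = step / max(total - 1, 1)   # 0 = noisiest step, 1 = final step
    scale = 0.25 + 0.75 * frac        # linear ramp from 1/4 to full size
    return int(full * scale) // 8 * 8 # keep sizes divisible for the VAE

steps = 20
for s in (0, 5, 10, 19):
    print(f"step {s:2d}: denoise at {resolution_at(s, steps)}px")
# step  0: denoise at 256px ... step 19: denoise at 1024px
```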

Paper 3

Factual Associations in LLMs Are Stored as Low-Rank Subspaces in Mid-Layer MLP Weights

Scientists pinpointed exactly where language models store facts—in tiny, compressed sections of mid-layer weights—enabling surgical corrections to individual false beliefs without damaging related knowledge. This breakthrough lets AI developers fix errors and update outdated information without expensive retraining, moving toward safer, more maintainable AI systems.
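
The low-rank picture has a classic corollary: a rank-1 update can rewrite one key-to-value mapping while barely disturbing orthogonal directions. A toy demonstration in the spirit of ROME-style editing (dimensions and vectors are stand-ins, not this paper's procedure):

```python
# Toy stand-ins; mirrors ROME-style rank-1 editing rather than this
# paper's exact procedure.
import numpy as np

rng = np.random.default_rng(0)
d = 64
W = rng.normal(scale=0.1, size=(d, d))          # a mid-layer MLP weight

k = rng.normal(size=d); k /= np.linalg.norm(k)  # key for one fact
v_new = rng.normal(scale=0.1, size=d)           # corrected value

# Rank-1 edit: afterwards W_edited @ k equals v_new exactly.
W_edited = W + np.outer(v_new - W @ k, k)

k_other = rng.normal(size=d); k_other /= np.linalg.norm(k_other)
print(np.allclose(W_edited @ k, v_new))                   # True
# Spillover onto an unrelated key is roughly |k . k_other| times the edit,
# which shrinks as facts occupy near-orthogonal directions:
print(np.linalg.norm(W_edited @ k_other - W @ k_other))
```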

Also Worth Noting — 2026-03-10

A curated roundup of additional AI research papers worth tracking this week.
