ScatterAI
Issue # · March 13, 2026

Also Worth Noting — 2026-03-13

Research

04 [RAG] AI Search Agent That Learns From Its Own Past Mistakes MR-Search is a search agent that remembers how previous attempts went and adjusts its strategy accordingly, rather than starting from scratch each time. Most AI agents only learn within a single session with weak feedback signals, so building one that genuinely improves across separate attempts by reflecting on past failures is a meaningful step up in sophistication. This could make AI research assistants and information-retrieval tools dramatically more effective over time, getting smarter the more you use them. link

05 [Evaluation] Cloning Real Websites So AI Agents Can Practice Safely A new framework called VeriEnv automatically recreates real websites as safe, resettable practice environments where AI agents can learn to browse and complete tasks without touching live systems. The hard part is that real websites break when you poke them, can’t easily be rewound, and rarely tell you whether you did something right — VeriEnv solves all three by using language models to clone sites into fully checkable simulations. This means web-browsing AI agents could finally be trained and tested at scale without the risk of accidentally placing orders, deleting accounts, or triggering other irreversible real-world actions. link

06 [Evaluation] AI System That Judges Whether Research Ideas Are Truly New A new benchmark tests whether AI can automatically decide if a research idea is genuinely novel or just a minor rehash of existing work. This is surprisingly hard because it requires the system to understand not just what’s been done before, but how meaningfully different a new idea really is — a judgment that even human experts struggle to make consistently. As scientific publishing accelerates beyond anyone’s ability to manually track, tools like this could help researchers and reviewers quickly spot what’s actually worth pursuing. link

07 [Evaluation] Graph Transformers Catch Malicious Domains Without Labeled Data A new system learns to recognize suspicious internet domains by studying the patterns in how DNS queries connect to each other, rather than relying on pre-labeled examples of known attacks. This is technically tricky because most security datasets are heavily imbalanced — malicious domains are rare — and the system must generalize to threats it has never seen before. Security teams could use this to catch cyberattacks earlier and with less manual effort, even when dealing with novel or previously unknown threats. link

08 [RAG] Tiny Model Beats Giants by Training on Point Cloud Data Only A lightweight AI model learned to understand 3D point clouds — the dot-map data used by lidar sensors and 3D scanners — without borrowing knowledge from images or language. Most top-performing models cheat by training on millions of images or text first, so beating them using only 39,000 pure 3D examples is a meaningful technical achievement. This could make high-quality 3D perception cheaper and more accessible for robotics, self-driving cars, and 3D scanning tools that can’t rely on massive cross-modal datasets. link

09 [Evaluation] New Tool Ranks AI Reasoning Models More Rigorously A new library called Scorio gives researchers a rigorous way to compare AI reasoning models when each model is allowed to make multiple attempts at a problem before giving a final answer. Simply counting correct answers falls apart in this setting because models that “try harder” get an unfair advantage, so Scorio brings in statistical techniques borrowed from voting theory, psychometrics, and graph analysis to level the playing field. Anyone building or buying AI systems that use extended reasoning — like OpenAI’s o-series or DeepSeek-R1 — now has a more honest way to know which one is actually better. link
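Scorio's internal scoring isn't detailed above, but the pitfall it addresses is easy to demonstrate with the widely used unbiased pass@k estimator, a standard tool for multi-attempt evaluation (not necessarily what Scorio itself uses):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k sampled attempts is
    correct), given n total attempts of which c were correct."""
    if n - c < k:
        return 1.0  # fewer than k wrong attempts: some correct one must appear
    # 1 - C(n-c, k) / C(n, k): probability that not all k draws are wrong
    return 1.0 - comb(n - c, k) / comb(n, k)

# A model that samples 100 attempts with 10 correct looks strong under
# "any attempt counts" scoring, but its single-try success rate is only 10%.
print(round(pass_at_k(100, 10, 1), 3))   # 0.1
print(round(pass_at_k(100, 10, 10), 3))  # pass@10 is far higher
```

Reporting pass@1 alongside pass@k is the simplest way to keep "tries harder" models from gaming a raw-accuracy comparison.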

10 [Evaluation] Measuring How Easily Synthetic Data Leaks Real People Synthetic data — fake-but-realistic data meant to protect privacy — can still reveal whether a real person’s information was used to create it. Detecting this “membership inference” risk is tricky because it requires estimating statistical patterns across complex datasets, which the team tackled using kernel density estimators to build a precise, quantifiable risk score. Anyone using synthetic healthcare or financial data to claim privacy compliance now has a concrete tool to check whether that claim actually holds up. link
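The paper's exact estimator isn't reproduced here, but the kernel-density idea can be sketched in a few lines: score a candidate record by how concentrated the synthetic data is around it, relative to an independent reference sample. All names below are illustrative, and SciPy's `gaussian_kde` with its default bandwidth stands in for whatever estimator the authors use:

```python
import numpy as np
from scipy.stats import gaussian_kde

def membership_score(record, synthetic, reference):
    """Toy membership-inference signal: log-density of `record` under a
    KDE fit to the synthetic data, minus its log-density under a KDE fit
    to a reference sample. Large positive values suggest the generator is
    suspiciously concentrated around this record."""
    kde_syn = gaussian_kde(synthetic.T)   # gaussian_kde wants shape (dims, n)
    kde_ref = gaussian_kde(reference.T)
    x = np.asarray(record, dtype=float).reshape(-1, 1)
    return float(np.log(kde_syn(x)) - np.log(kde_ref(x)))

rng = np.random.default_rng(0)
member = np.array([2.0, 2.0])             # record the generator memorized
reference = rng.normal(0, 1, (500, 2))    # general population
# Leaky synthetic data: half the points cluster tightly around the member.
synthetic = np.vstack([rng.normal(0, 1, (250, 2)),
                       rng.normal(member, 0.1, (250, 2))])
print(membership_score(member, synthetic, reference))  # clearly positive
```

A real audit would calibrate this score against many known non-members to turn it into the kind of quantifiable risk estimate the paper describes.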

11 [RAG] Laser Scanning Identifies Street-Level Surface Materials in 3D A new system automatically identifies what materials (asphalt, concrete, metal, etc.) coat real-world urban surfaces by combining mobile laser scan data with existing 3D city maps. Matching physical reflectance “fingerprints” from lidar to semantic map objects is tricky because lighting, sensor angle, and surface wear all distort readings. Cities and infrastructure planners could use this to keep digital twins accurate and up-to-date without expensive manual surveys. link

12 [Image Gen] AI Image Colors Are Too Vivid — Here’s the Fix Most AI image generators secretly cheat by making colors punchier and more saturated than real photos, because that’s what makes humans click “thumbs up” in training. The problem runs deep: both human raters and the automated metrics used to judge image quality are systematically biased toward eye-catching over accurate, meaning generators have been optimized for the wrong target all along. This work exposes that bias and introduces a way to measure and correct for it, which could push the next generation of AI imagery toward something that actually looks like it came from a real camera. link
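The paper's corrected metric isn't reproduced here, but the bias it describes is easy to measure yourself by comparing mean HSV saturation between image sets. A minimal sketch in plain NumPy, where `punchy` stands in for a hypothetical over-saturated generator output:

```python
import numpy as np

def mean_saturation(rgb: np.ndarray) -> float:
    """Mean HSV saturation of an image given as float RGB in [0, 1]
    (shape H x W x 3). S = (max - min) / max per pixel, 0 where max == 0."""
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-12), 0.0)
    return float(sat.mean())

rng = np.random.default_rng(1)
photo = rng.uniform(0.2, 0.8, (64, 64, 3))       # stand-in "real" photo
gray = photo.mean(axis=-1, keepdims=True)
# Push colors away from gray, the way a reward-chasing generator might.
punchy = np.clip(gray + 1.5 * (photo - gray), 0.0, 1.0)
print(mean_saturation(photo), mean_saturation(punchy))  # punchy scores higher
```

Averaged over a large sample of generations versus reference photos, a gap in this statistic is exactly the kind of systematic vividness shift the paper flags.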

13 [Robotics] Robots That Keep Learning New Tasks Without Forgetting Old Ones A new training framework lets robots continuously learn new skills over time without losing what they already know how to do. The trick is storing tiny compressed snapshots of past experiences — combining what the robot saw, heard, and felt — instead of saving expensive raw data, making it practical under real memory limits. This could mean robots in homes or warehouses that genuinely improve on the job, picking up new tasks from demonstration without needing to be retrained from scratch. link

14 [Fine-Tuning] Lightweight LoRA Adapters Clear Hazy Photos Without Labeled Data A team built a system that removes haze from real-world photos by combining lightweight model add-ons (LoRA) with AI-powered text guidance — no clean reference images required for training. Getting this to work is tricky because haze looks wildly different across scenes, and retraining a full vision model for each new environment is prohibitively expensive. Photographers, autonomous vehicles, and surveillance systems operating in foggy or polluted conditions could now adapt to new environments quickly and cheaply. link
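LoRA itself is a published technique, so a minimal sketch helps make the "cheap per-environment adaptation" claim concrete. The class below is illustrative, not the paper's architecture:

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: a frozen weight W plus a trainable low-rank
    update (alpha/r) * B @ A. Adapting to a new domain (say, a new haze
    condition) touches only r*(d_in + d_out) parameters, not d_in*d_out."""

    def __init__(self, W: np.ndarray, r: int = 4, alpha: float = 8.0):
        d_out, d_in = W.shape
        self.W = W                                  # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01    # trainable down-projection
        self.B = np.zeros((d_out, r))               # trainable, starts at zero
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ (self.W + self.scale * self.B @ self.A).T

W = np.random.randn(256, 256)
layer = LoRALinear(W, r=4)
x = np.random.randn(1, 256)
# B starts at zero, so the adapter is a no-op until trained.
assert np.allclose(layer(x), x @ W.T)
full = W.size                        # 65,536 params to fine-tune fully
lora = layer.A.size + layer.B.size   # 2,048 params with rank-4 LoRA
print(full, lora)
```

That roughly 30x parameter reduction per layer is what makes keeping one small adapter per environment practical where full retraining is not.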

15 [Multimodal] Fixing AI’s Tendency to “Forget” Images in Long Conversations Multimodal AI models struggle to stay visually grounded as conversations get longer — images effectively fade from the model’s attention the more text piles up. The problem traces back to how position encoding works: existing methods treat the distance between image and text tokens as ever-growing, making the model mathematically discount visuals over time. The proposed fix keeps image tokens perpetually “close” to the text regardless of document length, meaning AI assistants could finally give reliable, image-consistent answers in long documents or extended chats. link
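The mechanism can be sketched with a toy distance penalty: an ALiBi-style linear decay stands in here for whatever positional scheme the paper targets, and `cap` is a hypothetical knob illustrating the fix's intuition of pinning image tokens at a fixed effective distance:

```python
import numpy as np

def image_attention_mass(n_text, n_image=16, slope=0.1, cap=None):
    """Fraction of attention the last text token gives to image tokens
    under a toy linear distance penalty (content is held uniform, so only
    the positional bias matters). With `cap`, image-token distances are
    clamped so the image never drifts far from the query."""
    image_pos = np.arange(n_image)                    # image at context start
    text_pos = np.arange(n_image, n_image + n_text)   # text follows it
    query = text_pos[-1]
    dist = query - np.concatenate([image_pos, text_pos])
    if cap is not None:
        dist[:n_image] = np.minimum(dist[:n_image], cap)
    scores = np.exp(-slope * dist)
    weights = scores / scores.sum()
    return float(weights[:n_image].sum())

short = image_attention_mass(n_text=32)
long = image_attention_mass(n_text=2000)
fixed = image_attention_mass(n_text=2000, cap=32)
print(short, long, fixed)  # image attention collapses with length unless capped
```

The same qualitative curve, attention mass on the image shrinking as text grows, is the "fading" behavior the item describes; clamping the effective distance restores it regardless of context length.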