ScatterAI
March 13, 2026 · Issue #3

An AI Agent Ran 700 Experiments and Found What Human Researchers Missed

Setup

The standard objection to AI-driven research has been that agents lack the intuition to form good hypotheses. That objection got harder to hold in March 2026, when Andrej Karpathy ran an autoresearch agent against nanochat, a small, clean language-model codebase, and let it work unsupervised. The agent logged 700 experiments and surfaced an 11% improvement that human researchers working the same codebase had missed. The gap wasn't creativity. It was throughput.

What They Found

The agent identified a configuration change — not a novel architecture, but a tuning adjustment in the training loop — that produced an 11% gain on the model’s benchmark suite. The result took 700 runs to surface. A human researcher running sequential experiments would have needed weeks; the agent ran them in parallel with no context switching cost. Karpathy’s conclusion was direct: every serious AI lab will run autoresearch pipelines within 18 months.

How It Works

Autoresearch agents follow a simple loop: propose a change, run a training or evaluation job, record the result, update a hypothesis table, repeat. The intelligence is in the proposal function, which decides what to try next based on what has worked before. This is structured search over a configuration space, not reasoning from first principles. It works because ML improvement is largely empirical: the correct answer already exists in the search space; it just takes enough runs to find it. Compute replaces intuition.
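The loop above can be sketched in a few dozen lines. This is a minimal illustration, not Karpathy's actual pipeline: the function names, the toy scoring function, and the explore/exploit split are all assumptions for the sake of a runnable example. In a real system, `run_experiment` would launch a training or evaluation job.

```python
import random

random.seed(0)  # reproducibility for this toy example

def run_experiment(config):
    """Toy stand-in for a training/eval job. Scores a config, with a
    hypothetical optimum at lr=0.003, warmup=200 (score 0 is best)."""
    return (-abs(config["lr"] - 0.003) * 100
            - abs(config["warmup"] - 200) * 0.001)

def propose(history, space):
    """The proposal function: usually perturb the best config seen so
    far (exploit), sometimes sample a fresh random config (explore)."""
    if not history or random.random() < 0.3:
        return {k: random.choice(v) for k, v in space.items()}
    best_config, _ = max(history, key=lambda h: h[1])
    candidate = dict(best_config)
    key = random.choice(list(space))      # mutate one hyperparameter
    candidate[key] = random.choice(space[key])
    return candidate

def autoresearch(space, budget):
    """Propose -> run -> record -> repeat, up to a fixed run budget."""
    history = []  # the "hypothesis table": (config, result) pairs
    for _ in range(budget):
        cfg = propose(history, space)
        history.append((cfg, run_experiment(cfg)))
    return max(history, key=lambda h: h[1])

space = {
    "lr": [0.001, 0.002, 0.003, 0.005, 0.01],
    "warmup": [0, 100, 200, 500],
}
best_cfg, best_score = autoresearch(space, budget=700)
print(best_cfg, best_score)
```

The design point the example makes concrete: none of the per-step logic is clever, and with a large enough budget it does not need to be. The 700-run budget buys exhaustive coverage of a space a human would sample a handful of points from.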

Why It Matters

Source: Karpathy on Autoresearch — X