Andrej Karpathy's AutoResearch: AI Agents Running ML Experiments Autonomously

Link: https://github.com/karpathy/autoresearch

Released: March 7, 2026 — 21,000+ GitHub stars within days

AutoResearch is a 630-line Python framework built around one premise: a human should only have to write what they want researched, not how to run the experiments. You describe your research directions in a plain Markdown file (program.md), point an AI coding agent at the repository, and by morning you have a validated git history of everything it tried. The agent modifies only train.py (the model training loop), and every experiment runs for exactly 5 minutes on a single NVIDIA GPU — a fixed budget that keeps all results comparable regardless of architectural changes. Karpathy ran the system for two continuous days: it executed 700 experiments, discovered 20 optimizations, and produced an 11% speed-up when those findings were applied to a larger language model.
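
The fixed-budget loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AutoResearch's actual code: the names Result, run_with_budget, and improvement_loop are hypothetical, the keep-only-if-better rule is an assumption about how findings get accepted, and a budget in seconds with an injectable clock stands in for the real 5-minute GPU run.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Result:
    name: str      # label for the experiment (hypothetical)
    metric: float  # lower is better, e.g. a validation loss
    kept: bool     # True if this run beat the best metric so far


def run_with_budget(step: Callable[[], float], budget_s: float,
                    clock: Callable[[], float] = time.monotonic) -> float:
    """Call step() repeatedly until the time budget is spent.

    Returns the metric reported by the last completed step, or
    infinity if the budget ran out before any step finished.
    """
    deadline = clock() + budget_s
    metric = float("inf")
    while clock() < deadline:
        metric = step()
    return metric


def improvement_loop(candidates: Iterable[Tuple[str, Callable[[], Callable[[], float]]]],
                     budget_s: float,
                     clock: Callable[[], float] = time.monotonic) -> List[Result]:
    """Give every candidate the same budget and keep only improvements.

    Each candidate is (name, make_step), where make_step() builds a
    fresh step function so no state leaks between experiments.
    """
    best = float("inf")
    history: List[Result] = []
    for name, make_step in candidates:
        metric = run_with_budget(make_step(), budget_s, clock)
        kept = metric < best  # assumed acceptance rule
        if kept:
            best = metric
        history.append(Result(name, metric, kept))
    return history
```

Because every candidate gets the same wall-clock budget, the metrics are directly comparable even when one candidate takes far fewer, slower steps than another. A real run would pass budget_s=300 and leave the default clock in place; the clock parameter exists only so the loop can be exercised without waiting.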

The design philosophy is worth noting: the agent's scope is deliberately narrow (only train.py is editable), human review stays tractable (diffs are small and bounded), and no distributed infrastructure is required. By design, this is not a general AGI research lab; it is a tightly constrained improvement loop in which humans set strategy and machines gather evidence. The simplicity is the point: it makes the loop auditable, restartable, and actually useful, rather than impressively open-ended but impossible to trust.
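
The "only train.py is editable" constraint is the kind of rule that can be checked mechanically before an experiment is accepted. The sketch below is one possible guard, not something the repository is known to ship: out_of_scope and changed_files are hypothetical helper names, and the allow-list is an assumption drawn from the description above. The only external call is the standard git diff --name-only command.

```python
import subprocess
from typing import FrozenSet, Iterable, List

# Assumed allow-list: the single file the agent may edit.
ALLOWED: FrozenSet[str] = frozenset({"train.py"})


def out_of_scope(changed: Iterable[str],
                 allowed: FrozenSet[str] = ALLOWED) -> List[str]:
    """Return the changed paths that fall outside the allow-list, sorted.

    Blank entries (e.g. from trailing newlines) are ignored.
    """
    paths = (p.strip() for p in changed)
    return sorted(p for p in paths if p and p not in allowed)


def changed_files(repo: str = ".") -> List[str]:
    """List files that differ from HEAD in the given git working tree."""
    proc = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return proc.stdout.splitlines()
```

A wrapper could call out_of_scope(changed_files()) after each agent edit and discard the experiment if the list is non-empty, which keeps every diff a reviewer sees confined to a single file.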

For anyone building agentic systems, this repository is a concrete proof-of-concept that the "meta-loop" (an agent that improves an agent) works in practice and at human-relevant timescales. The 21,000-star reception within days suggests the broader community recognized this immediately. Reading the README alongside the three-file architecture gives you a transferable mental model for designing any self-improving pipeline, regardless of domain.