Small Open Models Match Anthropic's Mythos on Vulnerability Discovery
Anthropic's Mythos announcement made waves — an AI autonomously finding thousands of zero-days, including a 27-year-old OpenBSD bug. But AISLE ran the same showcase vulnerabilities through small, cheap open-weights models. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6B active parameters costing $0.11/M tokens. A 5.1B open model recovered the core chain of the 27-year-old OpenBSD bug.
The capability frontier in cybersecurity is jagged — it doesn't scale smoothly with model size. Small models outperformed most frontier models on basic security reasoning tasks. Rankings reshuffled completely across tasks. There is no stable "best model."
Signal for you: This is the "good enough" thesis playing out in real time. If a $0.11/M token model can find the same bugs as Mythos, the moat isn't the model — it's the system, the pipeline, and maintainer trust. AISLE has 180+ validated CVEs with a model-agnostic approach. This applies broadly: in most specialized domains, the winning play is system-level integration, not raw model capability.