TokenCost
Industry · April 8, 2026 · 9 min read

Project Glasswing and Claude Mythos Preview: the AI that found a 27-year-old bug for under $50

On April 7, Anthropic officially announced Project Glasswing - a coalition of 52+ organizations getting exclusive access to Claude Mythos Preview for defensive security work. The model costs $25 per million input tokens, is not publicly available, and has already found zero-days in every major OS and browser. Here's what the numbers actually say.

[Image: official Project Glasswing announcement image showing a glasswing butterfly wing pattern. Source: Anthropic]

TL;DR

Claude Mythos Preview costs $25/1M input and $125/1M output - five times the price of Opus 4.6 - and access is restricted to the 52+ organizations in Project Glasswing. There is no public release date. On SWE-bench Verified it scores 93.9% vs Opus 4.6's 80.8%, and on the Firefox exploit task it produced working exploits 181 times vs 2 for Opus.

Anthropic has already used the model to find zero-days in every major OS and browser, including a 27-year-old OpenBSD TCP SACK vulnerability that was found in a run costing under $50 and patched on March 25. Partners received $100M in usage credits, and Anthropic donated $4M directly to open-source security organizations.

How the model went public before Anthropic planned

On March 26, Fortune broke the story after two researchers - Roy Paz from LayerX Security and Alexandre Pauwels from Cambridge - found nearly 3,000 unpublished Anthropic assets in an unprotected, publicly searchable data lake. Among them was a draft blog post about a model codenamed "Capybara," described internally as "a new name for a new tier of model: larger and more intelligent than our Opus models - which were, until now, our most powerful."

Anthropic confirmed the leak was real, attributed it to "human error" in CMS configuration, and said they were testing the model for cybersecurity applications. Two weeks later, on April 7, came the official announcement. The leaked draft and the final announcement tell basically the same story - the leak just forced the timeline.

What Project Glasswing actually is

The name comes from the glasswing butterfly (Greta oto), whose transparent wings let it hide in plain sight - a reference to vulnerabilities hiding in code that is constantly in view. Anthropic's stated logic: Mythos is too capable to release broadly, so instead of a public API they built a controlled coalition of organizations whose job is specifically to find and fix vulnerabilities.

Twelve named partners signed on at launch: Anthropic, Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Beyond those 12, another 40+ organizations responsible for critical software infrastructure also received access. Anthropic has not published the full list.

On the financial side, Anthropic distributed $100 million in Mythos Preview usage credits to Glasswing participants. It also donated $2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation and $1.5 million to the Apache Software Foundation, for $4 million total in direct donations.

The benchmark numbers

Most benchmark tables are noise. A few of these rows are not. The gap on SWE-bench Verified is 13.1 points over Opus 4.6, which is large for a model in the same family. The Firefox exploit number is the one that keeps coming up in discussions: 181 successful exploits vs 2 for Opus 4.6 is not a marginal improvement.

| Benchmark | Mythos Preview | Opus 4.6 | Gap |
|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | +13.1 pts |
| SWE-bench Pro | 77.8% | 53.4% | +24.4 pts |
| SWE-bench Multilingual | 87.3% | 77.8% | +9.5 pts |
| CyberGym | 83.1% | 66.6% | +16.5 pts |
| GPQA Diamond | 94.6% | 91.3% | +3.3 pts |
| HLE (no tools) | 56.8% | 40.0% | +16.8 pts |
| HLE (with tools) | 64.7% | 53.1% | +11.6 pts |
| Terminal-Bench 2.0 | 82.0% | 65.4% | +16.6 pts |
| BrowseComp | 86.9% | 83.7% | +3.2 pts |
| OSWorld-Verified | 79.6% | 72.7% | +6.9 pts |
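
If you want to re-derive the Gap column or rank the benchmarks by how far Mythos pulls ahead, a few lines of Python will do it. This is only a sketch: the scores are copied from the table, and the `SCORES` dictionary and `gaps` helper are names made up for illustration.

```python
# Scores copied from the benchmark table: (Mythos Preview %, Opus 4.6 %).
SCORES = {
    "SWE-bench Verified": (93.9, 80.8),
    "SWE-bench Pro": (77.8, 53.4),
    "SWE-bench Multilingual": (87.3, 77.8),
    "CyberGym": (83.1, 66.6),
    "GPQA Diamond": (94.6, 91.3),
    "HLE (no tools)": (56.8, 40.0),
    "HLE (with tools)": (64.7, 53.1),
    "Terminal-Bench 2.0": (82.0, 65.4),
    "BrowseComp": (86.9, 83.7),
    "OSWorld-Verified": (79.6, 72.7),
}


def gaps(scores):
    """Return (benchmark, gap in percentage points) pairs, largest gap first."""
    rows = [
        (name, round(mythos - opus, 1))  # round away float noise
        for name, (mythos, opus) in scores.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, gap in gaps(SCORES):
        print(f"{name:<24} +{gap:.1f} pts")
```

Sorted this way, SWE-bench Pro shows the widest gap and BrowseComp the narrowest, which matches the table.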

BenchLM.ai's third-party ranking as of April 7 gives the model an overall score of 82/100, 5th of 106 models, with the top coding score on the entire leaderboard (79.5). Its weakest category is instruction following, where it ranks 64th. That pattern suggests a model optimized for agentic code tasks, not for following elaborate multi-step formatting instructions.

The zero-days

This is where the announcement gets interesting. These are not synthetic benchmark results - they are actual vulnerabilities that are now patched.

The most-cited one: a bug in OpenBSD's TCP SACK implementation that had been in the codebase since 1998. Twenty-seven years. The bug let any attacker remotely crash any OpenBSD host with a crafted TCP packet. OpenBSD patched it on March 25, 2026 (errata #025) before the public announcement. Anthropic says the specific run that found it cost under $50. Total cost across 1,000 scaffold runs to explore the OpenBSD codebase was under $20,000.
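
The two cost figures are easy to conflate, so here is the arithmetic as a small Python sketch. Both dollar amounts are upper bounds taken from the paragraph above ("under $50", "under $20,000"); the helper names are invented for illustration, and the token estimate assumes, unrealistically, that the whole budget went to input tokens at the $25/1M rate.

```python
# Figures from the article; both dollar amounts are upper bounds.
FINDING_RUN_USD = 50            # the single run that found the SACK bug
CAMPAIGN_TOTAL_USD = 20_000     # all scaffold runs over the OpenBSD codebase
CAMPAIGN_RUNS = 1_000
INPUT_PRICE_PER_MILLION = 25.0  # Mythos Preview input price in USD


def average_cost_per_run(total_usd: float, runs: int) -> float:
    """Mean spend per run across the whole campaign."""
    return total_usd / runs


def max_input_tokens(budget_usd: float, price_per_million: float) -> int:
    """Most input tokens a budget can buy if nothing is spent on output."""
    return int(budget_usd / price_per_million * 1_000_000)


if __name__ == "__main__":
    avg = average_cost_per_run(CAMPAIGN_TOTAL_USD, CAMPAIGN_RUNS)
    print(f"Average per run: under ${avg:.2f}")
    tokens = max_input_tokens(FINDING_RUN_USD, INPUT_PRICE_PER_MILLION)
    print(f"Input-token ceiling for the finding run: {tokens:,}")
```

So the campaign averaged under $20 per run, and the run that found the bug, at under $50, could not have consumed more than about 2 million input tokens even before any output tokens are counted.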

The FFmpeg bug: 16 years old, and automated fuzzing tools had hit the vulnerable code path 5 million times without catching it. Mythos caught it.

Beyond those two: a Linux kernel privilege escalation chain (ordinary user to full machine control), a browser exploit chaining 4 vulnerabilities with a JIT heap spray that escaped both renderer and OS sandboxes, and a FreeBSD NFS server exploit that split a 20-gadget ROP chain across multiple packets to grant root access to unauthenticated users. Anthropic says Mythos has found vulnerabilities in every major operating system and every major web browser.

For comparison: Opus 4.6 produced working Firefox exploits 2 times across several hundred runs. Mythos produced 181, and achieved register control on 29 additional attempts. On OSS-Fuzz with fully patched targets, Sonnet 4.6 and Opus 4.6 each found 1 tier-5 crash (full control flow hijack). Mythos found 10.

What Claude Mythos Preview costs

The official pricing from Anthropic's Glasswing page: $25 per million input tokens, $125 per million output tokens. Available on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry - but in all cases access is gated to Glasswing participants.

| Model | Input / 1M | Output / 1M | Access |
|---|---|---|---|
| Claude Mythos Preview | $25.00 | $125.00 | Glasswing only |
| Claude Opus 4.6 | $5.00 | $25.00 | General |
| Claude Sonnet 4.6 | $3.00 | $15.00 | General |
| GPT-5.4 | $2.50 | $10.00 | General |
| Gemini 3.1 Pro | $2.00 | $12.00 | General |
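
To see what these rates mean for a concrete job, here is a minimal Python sketch that prices one workload on each model. The prices come from the table; the `job_cost` helper and the example workload (1 million input tokens, 200,000 output tokens) are hypothetical and chosen only to make the comparison easy to follow.

```python
# USD per 1M tokens, copied from the pricing table: (input, output).
PRICES = {
    "Claude Mythos Preview": (25.00, 125.00),
    "Claude Opus 4.6": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.4": (2.50, 10.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one job at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens, 200k output tokens.
    for model in PRICES:
        print(f"{model:<24} ${job_cost(model, 1_000_000, 200_000):.2f}")
```

On that workload Mythos Preview comes to $50.00 against $10.00 for Opus 4.6. Because both the input and output rates are five times higher, the 5x ratio holds for any mix of input and output tokens.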

The $100 million in usage credits changes the effective cost picture for Glasswing participants. Spread evenly across 52 organizations, it averages roughly $1.9 million per organization before anyone starts paying. At $25/1M input, that covers about 77 billion input tokens per organization. For context: the OpenBSD campaign that found the 27-year-old bug cost under $20,000 total across 1,000 runs - meaning a Glasswing partner could run that same experiment roughly 100 times before touching its real budget.
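
The back-of-envelope math here can be checked in a few lines of Python. Treat this as a sketch under loud assumptions: the source does not say credits are split evenly, 52 is a floor ("52+"), and the token figure pretends every credit dollar buys input tokens only. The function names are made up for illustration.

```python
CREDITS_USD = 100_000_000       # total Glasswing usage credits
ORGANIZATIONS = 52              # lower bound; assumes an even split
INPUT_PRICE_PER_MILLION = 25.0  # Mythos Preview input price in USD
CAMPAIGN_USD = 20_000           # upper bound on the OpenBSD campaign cost


def credits_per_org(total_usd: float, orgs: int) -> float:
    """Average credits per organization under an even split."""
    return total_usd / orgs


def input_tokens_covered(budget_usd: float, price_per_million: float) -> float:
    """Input tokens a budget buys if none of it is spent on output."""
    return budget_usd / price_per_million * 1_000_000


def campaigns_covered(budget_usd: float, campaign_usd: float) -> int:
    """Whole OpenBSD-sized campaigns that fit inside a budget."""
    return int(budget_usd // campaign_usd)


if __name__ == "__main__":
    per_org = credits_per_org(CREDITS_USD, ORGANIZATIONS)
    tokens = input_tokens_covered(per_org, INPUT_PRICE_PER_MILLION)
    print(f"Credits per organization: ${per_org:,.0f}")
    print(f"Input tokens covered: {tokens / 1e9:.1f} billion")
    print(f"OpenBSD-sized campaigns: {campaigns_covered(per_org, CAMPAIGN_USD)}")
```

Under those assumptions each organization gets about $1.92 million, enough for roughly 76.9 billion input tokens or 96 campaigns at the $20,000 ceiling. More participants or output-heavy workloads would pull those numbers down.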

The part nobody has a clean answer to

Mythos can chain 4 browser vulnerabilities into a working sandbox escape. It found a 27-year-old zero-day for under $50. The capabilities exist. Glasswing controls access right now, but Anthropic has already said they plan to introduce cybersecurity safeguards with an upcoming Opus model and eventually enable these capabilities at scale.

The math on the defensive side is genuinely good. Catching a 27-year-old zero-day before someone else finds it and sells it is real value. The FFmpeg bug sitting undetected after 5 million automated fuzzer hits makes a real argument for the approach.

What is harder to answer: whether a gated coalition of 52 organizations can move faster than everyone outside the coalition. The defenders got a head start. But the capability gap between Mythos and what is publicly available will shrink - open-weight models get more capable every few months. The window in which this gating strategy provides a real defensive advantage is probably limited.

There is also this: Anthropic describes Mythos as a "general-purpose model" whose cybersecurity capabilities "emerged as a downstream consequence of general improvements" - not the result of dedicated security training. That matters because it means the next generation of openly available models will probably pick up these capabilities without anyone explicitly training for them.
