Anthropic Unveils Project Glasswing: Claude Mythos Targets Software Vulnerabilities

Anthropic’s new initiative, Project Glasswing, brings together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Together, they aim to secure the world’s most critical software infrastructure.

How does it work?

According to the company, its new unreleased frontier model, Claude Mythos Preview, has reached a level of coding capability at which it outperforms nearly all humans, except for the most elite experts, at identifying software vulnerabilities.

Anthropic’s partners will use the model to detect vulnerabilities in their own systems, while the company will share insights across the industry. In addition to the core partners, more than 40 organizations responsible for building and maintaining critical software infrastructure have been granted access. Anthropic has allocated $100 million in usage credits for Claude Mythos Preview and donated an additional $4 million to open-source security organizations.

The company notes that many vulnerabilities remain undiscovered for years because finding them requires highly skilled experts or specialized tooling. Modern AI models are rapidly changing this dynamic: over the past year, they have become highly effective at analyzing code and identifying vulnerabilities. Anthropic states that Claude Mythos Preview is capable of uncovering bugs that have gone unnoticed for decades and of generating increasingly sophisticated exploits.

Why does it matter?

Anthropic emphasizes that as AI advances rapidly, so do the capabilities of cyberattacks. This creates serious risks for the economy, public safety, and national security. Through Project Glasswing, the company aims to mitigate future threats while addressing existing ones.

In its statement, the company notes:

“Project Glasswing is a starting point. No one organization can solve these cybersecurity problems alone: frontier AI developers, other software companies, security researchers, open-source maintainers, and governments across the world all have essential roles to play. The work of defending the world’s cyber infrastructure might take years; frontier AI capabilities are likely to advance substantially over just the next few months. For cyber defenders to come out ahead, we need to act now.”

Anthropic estimates that global cybercrime losses reach approximately $500 billion annually. Cyberattacks are already affecting corporate networks, healthcare systems, energy infrastructure, transportation, and government institutions. While some vulnerabilities are minor, others can have severe consequences - risks the company aims to reduce.

Early results

In recent weeks, Mythos Preview has identified thousands of zero-day vulnerabilities, including critical issues across all major operating systems and browsers.

Among them:

A 27-year-old vulnerability in OpenBSD - an operating system known for its strong security and widely used in firewalls and critical infrastructure. The flaw allowed attackers to remotely crash any machine running the operating system simply by connecting to it;
A 16-year-old vulnerability in FFmpeg, a widely used multimedia framework. Notably, the issue was located in a line of code that had been executed by automated testing tools around five million times without detection;
A chain of vulnerabilities in the Linux kernel, the backbone of most global server infrastructure, which allowed attackers to escalate privileges from a regular user to full system control.

All of these vulnerabilities have now been patched.

Benchmark results highlight a significant performance gap between Mythos Preview and Claude Opus 4.6, previously considered the company’s most capable model. On CyberGym, which measures cybersecurity vulnerability reproduction, Mythos Preview scored 83.1%, compared with 66.6% for Opus 4.6.
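To put the two scores quoted above in perspective, the short Python sketch below separates the absolute gap (in percentage points) from the relative improvement. The function and variable names are illustrative only and do not come from Anthropic's materials.

```python
# Scores as quoted in the article, in percent.
mythos_preview_score = 83.1
opus_4_6_score = 66.6


def gap_in_points(new: float, old: float) -> float:
    """Absolute difference between two percentage scores, in percentage points."""
    return round(new - old, 1)


def relative_gain_percent(new: float, old: float) -> float:
    """Improvement of `new` over `old`, expressed as a percentage of `old`."""
    return round((new - old) / old * 100, 1)


print(gap_in_points(mythos_preview_score, opus_4_6_score))          # 16.5
print(relative_gain_percent(mythos_preview_score, opus_4_6_score))  # 24.8
```

In other words, the 16.5-point gap corresponds to a score roughly a quarter higher than the earlier model's.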

Industry reactions

Partners already testing Mythos Preview have shared early feedback:

Amy Herzog, Vice President and CISO at Amazon Web Services:

“At AWS, we build defenses before threats emerge, from our custom silicon up through the technology stack. …We’ve been testing Claude Mythos Preview in our own security operations, applying it to critical codebases, where it’s already helping us strengthen our code. We’re bringing deep security expertise to our partnership with Anthropic and are helping to harden Claude Mythos Preview so even more organizations can advance their most ambitious work with security that sets the standard.”

Igor Tsyganskiy, EVP of Cybersecurity and Microsoft Research at Microsoft:

“As we enter a phase where cybersecurity is no longer bound by purely human capacity, the opportunity to use AI responsibly to improve security and reduce risk at scale is unprecedented. Joining Project Glasswing, with access to Claude Mythos Preview, allows us to identify and mitigate risk early and augment our security and development solutions so we can better protect customers and Microsoft. When tested against CTI-REALM, our open-source security benchmark, Claude Mythos Preview showed substantial improvements compared to previous models. We look forward to partnering with Anthropic and the broader industry to improve security outcomes for all.”

Heather Adkins, VP of Security Engineering at Google:

“Google is pleased to see this cross-industry cybersecurity initiative coming together and to make Mythos Preview available to participants via Vertex AI. It’s always been critical that the industry work together on emerging security issues, whether it’s post-quantum cryptography, responsible zero-day disclosure, secure open source software, or defense against AI-based attacks…”

Comparison with previous models

In programming and so-called agentic coding - the ability to independently solve engineering tasks and execute workflows - Mythos Preview significantly outperforms previous models. Across different SWE-bench variants, its scores range from approximately 59% to nearly 94%, compared to roughly 27% to 81% for Opus 4.6. The gap is especially pronounced in complex and multimodal tasks.

In reasoning benchmarks, the gap is smaller but consistent. Mythos outperforms Opus by around 10-15 percentage points in advanced scientific and interdisciplinary tests such as GPQA and Humanity’s Last Exam, indicating improvements in both coding and reasoning capabilities.

In agentic search and computer use, Mythos also maintains an advantage, though a less dramatic one.

Cybersecurity benchmarks such as CyberGym show a clear leap forward, reflecting the model’s ability to discover and chain vulnerabilities - one of the core drivers behind Project Glasswing.

Access and rollout

Anthropic does not plan to make Claude Mythos Preview publicly available. Instead, the company’s long-term goal is to enable the safe deployment of Mythos-class models at scale. As part of this effort, new safeguards will be introduced with an upcoming Claude Opus model:

“We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.”

However, Glasswing partners will gain access to the model to identify and fix vulnerabilities in their core systems. The work is expected to focus on local vulnerability detection, black-box testing of binaries, endpoint security, and penetration testing.

Claude Mythos Preview will be available to participants at $25 per million input tokens and $125 per million output tokens via the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
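As a rough illustration of what that pricing means in practice, the Python sketch below estimates the charge for a single workload from its token counts. Only the two rates come from the article; the function name and the example token counts are hypothetical, and the calculation ignores any usage credits or other pricing terms.

```python
# Rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_MILLION = 25.0
OUTPUT_USD_PER_MILLION = 125.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated charge in USD for one workload."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical scan: 2M tokens of source code read, 200k tokens of findings written.
print(estimate_cost_usd(2_000_000, 200_000))  # 75.0
```

Because output tokens cost five times as much as input tokens, a workload that reads a lot of code but writes short reports is dominated by the input charge, as in the example above.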

In addition to the $100 million in usage credits, Anthropic has donated $2.5 million to Alpha-Omega and OpenSSF via the Linux Foundation, as well as $1.5 million to the Apache Software Foundation, to support open-source maintainers.

Anthropic also plans to publish a report within 90 days summarizing findings, disclosed vulnerabilities, and security improvements. The company will collaborate with security organizations to develop best practices for the AI era. It also confirmed ongoing discussions with U.S. government officials regarding the model’s offensive and defensive capabilities.

In the company’s words:

“We are ready to work with local, state, and federal representatives to assist in these tasks. …We are hopeful that Project Glasswing can seed a larger effort across industry and the public sector, with all parties helping to address the biggest questions around the impact of powerful models on security. We invite other AI industry members to join us in helping to set the standards for the industry. In the medium term, an independent, third-party body—one that can bring together private- and public-sector organizations—might be the ideal home for continued work on these large-scale cybersecurity projects.”
