TL;DR
In February 2026, Anthropic's Claude Opus 4.6 autonomously scanned nearly 6,000 C++ files in Firefox's codebase and submitted 112 unique vulnerability reports — 22 of which Mozilla rated as high-severity. That's roughly one-fifth of all high-severity Firefox vulnerabilities patched in 2025. The entire operation cost roughly $4,000 in API credits. Most fixes shipped in Firefox 148.0. The result: a landmark demonstration of autonomous AI security research that has simultaneously excited defenders and alarmed those who worry about the same capability in adversarial hands.
Table of Contents
- What Claude Found — and How Fast
- The 22 Vulnerabilities: Severity and Types
- How Autonomous Vulnerability Discovery Actually Works
- Mozilla's Response and the Patching Timeline
- The Broader Picture: 500+ Zero-Days in Open-Source Software
- Offensive vs. Defensive: The Cybersecurity Debate
- How This Compares to Human Bug Bounty Programs
- Enterprise Implications for Security Teams
- The Dual-Use Dilemma: Research vs. Weaponization
- What Comes Next
What Claude Found — and How Fast
Twenty minutes. That is how long it took Claude Opus 4.6 to identify its first Firefox vulnerability after beginning its autonomous scan of the browser's codebase in February 2026. By the time the two-week collaboration between Anthropic and Mozilla concluded, the model had filed 112 unique vulnerability reports and surfaced 22 high-severity security flaws that Firefox engineers had never seen before.
These were not theoretical issues or edge-case bugs buried in deprecated code. They were zero-day vulnerabilities — flaws completely unknown to Mozilla's maintainers — sitting inside one of the world's most scrutinized open-source codebases. Firefox is used by hundreds of millions of people globally. Its codebase is regularly audited by professional security researchers, put through fuzz testing campaigns, and subjected to static analysis tooling. And yet Claude found things humans missed.
The 14-day window is not an accident of timing. Anthropic deliberately structured the experiment as a focused sprint to understand what a frontier AI model could accomplish on a security task with a defined scope and no human hand-holding mid-process. The model was given access to Firefox's source tree — nearly 6,000 C++ files — and instructed to find vulnerabilities. It did not receive hints about where to look, which subsystems were historically buggy, or which patterns to prioritize. It explored on its own.
The speed of the initial discovery — an exploitable crash found in just 20 minutes — was followed by rapid accumulation. By the time human researchers reviewed that first validated finding, Claude had already produced 50 additional unique crashing inputs. The model was not getting lucky once; it was demonstrating a systematic ability to identify a class of problem and replicate that discovery across a large codebase.
The 22 Vulnerabilities: Severity and Types
Of the 112 reports Claude submitted, Mozilla triaged and confirmed 22 as high-severity vulnerabilities. The remainder included moderate-severity issues, duplicate crash paths, and lower-priority findings that still warranted review.
The primary vulnerability class identified was use-after-free — a memory corruption bug that occurs when a program continues to use a pointer to memory that has already been freed. Use-after-free vulnerabilities are particularly dangerous in browser engines because they can allow an attacker to overwrite arbitrary data in a running process, potentially leading to remote code execution. In web browsers, which run untrusted code from arbitrary websites, a remotely exploitable use-after-free vulnerability is about as serious as security bugs get.
The discovery began in Firefox's JavaScript engine — the component responsible for parsing, compiling, and executing JavaScript code from web pages. This is historically the most attack-prone surface area of any modern browser, because JavaScript engines are extraordinarily complex, handle wildly unpredictable inputs, and must maintain strict memory safety guarantees under adversarial conditions. From the JavaScript engine, Claude's analysis expanded outward to cover additional browser subsystems.
The 22 high-severity findings represent approximately 19% — almost one-fifth — of all high-severity Firefox vulnerabilities that Mozilla patched throughout 2025. That ratio is striking: a two-week AI experiment produced nearly the same volume of serious bug discoveries as an entire year of conventional security research by the broader community.
How Autonomous Vulnerability Discovery Actually Works
The most technically interesting aspect of this project is not what Claude found, but how it found those vulnerabilities without being told where to look.
Traditional vulnerability research follows predictable workflows: a researcher identifies a high-value target subsystem, studies past bug patterns, writes fuzzing harnesses, and iterates on crashes. This requires deep expertise, months of context accumulation, and a human in the loop making strategic decisions at every step.
Claude's approach was different. Rather than fuzzing — which involves generating massive volumes of random or mutated inputs and watching for crashes — Anthropic's model performed code analysis. It read the source code, formed an understanding of the program's memory management patterns, identified locations where those patterns could go wrong, and generated test cases designed to trigger those specific failure modes.
This is closer to how a senior security engineer thinks through a codebase than how a fuzzer operates. Fuzzing is powerful but essentially blind — it throws inputs at a program and watches what breaks. Code analysis is deliberate — it builds a mental model of how the program works and reasons about where that model breaks down.
The iteration loop worked like this: Claude would analyze a region of code, identify a potential vulnerability, generate a proof-of-concept that triggered the issue, validate the crash, and then systematically search for similar patterns elsewhere in the codebase. After validating its first finding in 20 minutes, it had expanded to 50 unique crashing inputs before human researchers even reviewed the initial submission.
The 112 total reports submitted represent the breadth of this systematic sweep. Many covered the same underlying vulnerability class manifesting in different locations — which is exactly how a thoughtful researcher would document findings, bundling related issues for efficient triage rather than submitting one report per crash.
Mozilla's Response and the Patching Timeline
Mozilla's security team was brought in as a partner, not a surprise recipient. Anthropic coordinated the disclosure in line with its published Coordinated Vulnerability Disclosure operating principles, which describe procedures for working with software maintainers following standard responsible disclosure norms.
Mozilla researchers fielded the bulk submissions and provided transparent triage guidance throughout the process. This is a non-trivial undertaking — processing 112 structured vulnerability reports, many of which describe related but distinct crash paths, requires careful deduplication and prioritization. Mozilla's security team handled this efficiently, and the majority of the confirmed high-severity fixes shipped with Firefox 148.0. The remaining patches were scheduled for upcoming release cycles.
Mozilla's response went beyond simply patching the bugs. The organization subsequently began internal experimentation with Claude for security research purposes — a signal that the collaboration produced enough value that Mozilla wanted to understand how to operationalize AI-assisted vulnerability discovery as a regular part of their security practice rather than a one-off academic exercise.
This is the practical outcome that matters most for the industry: a browser vendor with an enormous, safety-critical codebase is now actively exploring how to use AI models as an ongoing component of their security workflow.
The Broader Picture: 500+ Zero-Days in Open-Source Software
The Firefox collaboration did not happen in isolation. Anthropic has documented that Claude found more than 500 zero-day vulnerabilities in well-tested open-source software across a broader research program. The Firefox engagement was one concentrated case study within a larger body of work demonstrating AI-assisted vulnerability discovery at scale.
That number — 500+ zero-days in well-tested open-source software — deserves to sit with you for a moment. These are not obscure hobby projects with minimal security review. "Well-tested open-source software" in a security context means projects that receive professional audits, have active CVE programs, attract skilled bug hunters from academia and industry, and run continuous fuzzing infrastructure. The category includes things like web browsers, cryptographic libraries, network stacks, and core system utilities.
Finding vulnerabilities in these projects is genuinely hard. The fact that an AI model can now do it at scale — systematically, cheaply, and without human guidance mid-process — represents a qualitative shift in the capability landscape for security research.
The $4,000 cost figure for the exploitation evaluation phase (running tests several hundred times with different starting points) also deserves attention. For context: professional penetration testers and bug hunters typically charge thousands of dollars per day, and finding a single high-severity browser vulnerability can take weeks of focused work. The cost-per-finding ratio here is dramatically lower than anything achievable through conventional means.
Offensive vs. Defensive: The Cybersecurity Debate
Every capability that helps defenders also helps attackers. This is the fundamental tension that makes the Firefox research both exciting and uncomfortable to read about.
From the defensive side, the results are clearly positive. Mozilla learned about 22 serious vulnerabilities it did not know existed. Those vulnerabilities were patched before anyone with malicious intent discovered and exploited them. Hundreds of millions of Firefox users are safer because of this work. If AI models can be deployed by software vendors to continuously sweep their codebases for this class of bugs, the result is a more secure software ecosystem.
The counterargument is harder to dismiss. The same capability — scan a large C++ codebase, identify use-after-free vulnerabilities, generate crashing proof-of-concepts — is equally useful to a threat actor. A nation-state intelligence service or a criminal ransomware operation could deploy the same technique against software they want to exploit rather than software they want to protect. They would not submit reports to the vendor. They would keep the vulnerabilities secret and weaponize them.
This asymmetry is not unique to AI. Fuzzing tools, static analyzers, and professional exploit research have always faced the same dual-use reality. But AI changes the economics in a way that matters: the barrier to performing this kind of research, historically requiring deep technical expertise and significant time investment, drops substantially when a general-purpose model can do the heavy lifting.
Anthropic's own evaluation is instructive here. The company tested whether Claude could not just find vulnerabilities but also develop working exploits from them. The result: Claude succeeded in building primitive exploits in only 2 of several hundred attempts. Vulnerability discovery, it turns out, is dramatically easier than exploit development — and AI has improved the former far more than the latter, at least for now. As Anthropic's analysis noted, the cost of identifying vulnerabilities is currently an order of magnitude cheaper than creating a working exploit for them. That gap may narrow as models improve, but it is a meaningful distinction in the current threat landscape.
How This Compares to Human Bug Bounty Programs
Mozilla runs a well-regarded bug bounty program that pays researchers for discovering Firefox vulnerabilities. High-severity browser bugs typically command payouts in the range of $3,000–$10,000+ depending on severity and exploitability. The program has been running for years and has produced a steady stream of valuable findings from the global security research community.
Against that backdrop, Claude's two-week sprint finding 22 high-severity vulnerabilities for approximately $4,000 in compute costs is a remarkable comparison point. In a traditional bug bounty context, 22 high-severity Firefox findings could represent anywhere from $66,000 to $220,000 in researcher payouts, plus the human time investment. The AI-assisted approach produced comparable volume at a fraction of the cost.
This does not mean bug bounty programs are obsolete. Human researchers bring creativity, adversarial thinking, contextual intuition, and the ability to chain vulnerabilities into sophisticated attack scenarios in ways that current AI models cannot fully replicate. The 2-out-of-several-hundred exploit success rate for Claude is evidence of exactly this gap — finding a crash is not the same as building an exploit, and human attackers are still considerably better at the latter.
What the comparison does suggest is that AI-assisted vulnerability discovery will likely become a standard complement to bug bounty programs rather than a replacement. Vendors may use AI models to sweep their codebases before launching bounty programs, reducing the volume of "easy" finds and shifting bounty hunters toward harder, higher-value problems that require genuine human creativity.
Enterprise Implications for Security Teams
For enterprise security teams, the Firefox research surfaces several practical implications worth thinking through carefully.
The attack surface for your software is larger than you think. If a well-resourced, continuously audited open-source browser has this density of undiscovered high-severity vulnerabilities, enterprise software — which typically receives far less security review — almost certainly does too. The question is not whether your codebase has undiscovered vulnerabilities. It is whether you find them before someone with bad intentions does.
AI-assisted code auditing is now a realistic tool. The Firefox engagement demonstrates that frontier AI models can be deployed against real-world production codebases and produce actionable security findings. Enterprise security teams should be evaluating how to incorporate this capability into their application security programs, either through Anthropic's own offerings or through the broader ecosystem of AI-powered security tools that will emerge in response to this kind of proof-of-concept.
The economics of offensive security research have changed. Threat modeling exercises used to assume that sophisticated vulnerability research required significant attacker investment — skilled researchers, time, and resources that constrained the realistic threat landscape to nation-states and well-funded criminal organizations. The $4,000 cost figure challenges that assumption. As models improve and deployment patterns mature, the economic barrier to AI-assisted vulnerability discovery will only decrease further. Security teams need to update their threat models accordingly.
Responsible disclosure programs need to scale. Mozilla's experience processing 112 structured reports efficiently is a data point for what responsible disclosure looks like at AI scale. Enterprise security programs and open-source projects that receive AI-generated vulnerability reports in bulk will need triage processes capable of handling higher volume than human-generated reports historically required.
The Dual-Use Dilemma: Research vs. Weaponization
Anthropic's decision to publish detailed findings from the Firefox collaboration — including the methodology, cost structure, and success metrics — is itself a deliberate choice about how to handle dual-use research. The reasoning follows a logic familiar in security research: if this capability exists, transparency about its existence and limitations is more valuable to defenders than the marginal secrecy it provides against sophisticated attackers who can likely develop similar capabilities independently.
This logic is coherent but not universally accepted. There is a serious argument that publishing detailed methodology for AI-assisted vulnerability discovery at this level of capability detail provides meaningful uplift to threat actors who lack the technical sophistication to develop the technique independently. The security research community has debated similar questions around full disclosure versus coordinated disclosure for decades, and there is no consensus answer.
What is clear is that the capability itself is not going back in the box. AI models will be used for vulnerability research — by defenders, by bug bounty hunters, by red teams, and eventually by adversaries. The relevant question is how the security ecosystem adapts: through improved AI safety constraints that limit dangerous capability deployment, through faster defensive use of the same tools, or through policy frameworks that establish norms around AI-assisted offensive security research.
Anthropic's published Coordinated Vulnerability Disclosure operating principles are an early step toward the last of these. They describe a framework for how AI research findings about software vulnerabilities should be handled — vendor notification before disclosure, reasonable remediation timelines, and cooperation with maintainers throughout the process. Whether similar frameworks become industry norms, or whether they remain voluntary commitments by a single company, depends on decisions made by researchers, vendors, policymakers, and AI developers over the next few years.
The 22 Firefox vulnerabilities are not the end of a story. They are the opening data point in a much longer and more consequential chapter about what it means for machines to conduct security research autonomously, at scale, and for a cost measured in thousands rather than millions of dollars.
What Comes Next
Mozilla has already begun internal experimentation with Claude for ongoing security research — a sign that the practical value of AI-assisted vulnerability discovery has cleared the bar for real operational deployment. Other browser vendors and major open-source projects are almost certainly evaluating similar approaches.
Anthropic's published CVD principles suggest the company sees this as an ongoing research program, not a one-time demonstration. Future AI systems — more capable successors to Opus 4.6 — will presumably be more effective at this task, not less. The 2-out-of-several-hundred exploit development success rate will likely improve as models get better at reasoning through multi-step attack chains.
The Firefox collaboration is a rigorous, documented, responsibly disclosed example of what autonomous AI security research looks like at the current frontier. It is simultaneously a proof of value for defenders and a warning about the capability landscape that everyone involved in software security needs to take seriously.
The twenty minutes it took to find the first vulnerability will not get longer. Every indication is that it will get shorter.
Sources: Anthropic — Partnering with Mozilla to improve Firefox's security (March 6, 2026); Mozilla Security Bug Bounty Program