Web infrastructure company Cloudflare says Claude Mythos reasoning ‘looks like the work of a senior researcher’

It’s been difficult as a layman in cybersecurity to figure out how seriously we should take all the hype and fear over Anthropic’s Claude Mythos AI. But now I have something a little more concrete to grasp onto, as Cloudflare has been busy figuring out exactly what Mythos seems good for.

The AI model recently swept through the cybersecurity industry and caused a stir by showing the magnitude of AI’s potential threat to software security, for instance by finding thousands of vulnerabilities in every major OS and web browser. But as a mere plebeian sitting far from the techbro classes, and despite banks and other institutions rushing to reckon with it, I’d not been sure exactly how much stock to put in the stink the AI model was causing. Checking over Cloudflare’s analysis earlier today, however, has given me a slightly better idea.

The company has been part of Anthropic’s Project Glasswing. The idea behind this project (and presumably Mythos in general) seems to be to get ahead of any bad actors in the AI arms race. It essentially casts Anthropic as the ‘good guy’ that helps companies secure themselves against the latest AI threats to cybersecurity, by using AI to identify the same weaknesses a bad actor might.

Anthropic explains: “Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities… Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes.”

Glasswing gives selected important tech companies access to Mythos Preview to “scan and secure both first-party and open-source systems”, with Anthropic giving up to $100 million of credits for them to use. The companies include Amazon Web Services, Apple, Google, Microsoft, Nvidia, and more, including Cloudflare.

It might seem strange for a cybersecurity strategy to bring to light vulnerabilities that bad actors could, in theory, exploit, but that’s nothing new: companies often hire ‘red teams’ to do exactly this so the holes can be patched. This is essentially the same idea, but on a whole new scale, given the use of AI.

Overall, Cloudflare is impressed with Mythos, saying it’s a “real step forward… not just a refinement of what came before… what changed with Mythos Preview is that a model can now take those low-severity bugs (which would traditionally sit invisible in a backlog) and chain them into a single, more severe exploit.”

Two features that Cloudflare says stood out about Mythos during the company’s testing were its “exploit chain construction” (ie, its ability to chain vulnerabilities together intelligently into a single attack) and “proof generation” (ie, actually demonstrating that what it comes up with works).
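For fellow laymen, one way to picture "exploit chain construction" is as a path search: each small bug needs something the attacker already has and hands them something new. The toy Python sketch below is my own illustration, not Cloudflare's or Anthropic's method. The bug names, the capability labels and the `find_chain` function are all invented, and it assumes each bug needs exactly one capability, which real exploits rarely do.

```python
from collections import deque

# Hypothetical findings: (name, capability required, capability gained).
BUGS = [
    ("info leak in debug endpoint", "network access", "memory layout"),
    ("off-by-one in parser", "memory layout", "limited write"),
    ("unchecked config reload", "limited write", "code execution"),
    ("verbose error message", "network access", "version info"),
]


def find_chain(bugs, start, goal):
    """Breadth-first search for the shortest bug sequence from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, chain = queue.popleft()
        if capability == goal:
            return chain
        for name, needs, gains in bugs:
            # A bug is usable only if the attacker already holds what it needs.
            if needs == capability and gains not in seen:
                seen.add(gains)
                queue.append((gains, chain + [name]))
    return None  # no combination of these bugs reaches the goal


if __name__ == "__main__":
    print(find_chain(BUGS, "network access", "code execution"))
```

None of the three chained bugs grants code execution alone, which is the point Cloudflare makes about low-severity issues sitting in a backlog. The "proof generation" half, actually showing the chain works, is not modelled here.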

The model isn’t perfect, however—would one expect a “preview” to be so? Cloudflare, for instance, found it would sometimes pop up guardrails that didn’t make sense, preventing legitimate security research.

The company also seems to suggest that a lot of people have been thinking about Mythos somewhat incorrectly, focusing on how quickly it can find vulnerabilities for quick patching. And it discovered (the hard way) that it’s better to use the AI model in a more directed, split-up way rather than pointing a single Mythos agent at a big review and leaving it to run hands-off.

If you just set it to check out a giant codebase, it might struggle to maintain relevant context throughout the entire process in a way that a human researcher wouldn’t. “Using the model directly in a coding agent turns out to be fine for manual investigation when a researcher already has a lead and wants a second pair of eyes. However, it’s the wrong tool for achieving high coverage.”

A Cloudflare diagram of the 'vulnerability discovery harness' it uses for Claude Mythos.

(Image credit: Cloudflare)

The company ultimately found that using Mythos effectively means using a ‘harness’ that narrows its scope, relying on a second agent to separate signal from noise, and using multiple agents along the chain as well as in parallel. In other words, it seems like having lots of Mythos ‘workers’ with specific tasks works better than trying to have one super-worker Mythos taking on the entire codebase.
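To make that shape a bit more concrete, here is a minimal Python sketch of the pattern as I understand it: split the codebase into small scopes, hand each to its own worker in parallel, then pass everything through a second triage step. This is an assumption-laden illustration rather than Cloudflare's actual harness. `worker_agent` and `triage_agent` are hypothetical stand-ins that fake a model's answer so the example runs offline, and the file names and confidence scores are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    note: str
    confidence: float  # 0.0 to 1.0, as reported by the hypothetical worker


def split_into_scopes(files, size):
    """Break a large file list into small, focused batches."""
    return [files[i:i + size] for i in range(0, len(files), size)]


def worker_agent(scope):
    """Stand-in for one narrowly scoped model call (hypothetical).

    A real worker would send only these files to a model; here we simply
    score file names containing 'parse' higher so the example is runnable.
    """
    return [
        Finding(path, "possible unchecked input", 0.9 if "parse" in path else 0.2)
        for path in scope
    ]


def triage_agent(findings, threshold=0.5):
    """Stand-in for the second agent that separates signal from noise."""
    return [f for f in findings if f.confidence >= threshold]


def run_harness(files, scope_size=2, max_workers=4):
    scopes = split_into_scopes(files, scope_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(worker_agent, scopes))  # one worker per scope
    raw = [finding for batch in batches for finding in batch]
    return triage_agent(raw)


if __name__ == "__main__":
    files = ["http/parse_header.c", "http/log.c", "tls/parse_cert.c", "util/time.c"]
    for finding in run_harness(files):
        print(finding.path, finding.note)
```

Each worker only ever sees its own small batch, which is the sketch's answer to the context problem described earlier: no single agent has to hold the whole codebase in its head.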

Moving forward, rather than using Mythos solely with a focus on patching faster, Cloudflare thinks people’s focus should be on architecture:

“The harder question is what the architecture around the vulnerability should look like. The principle is to make exploitation harder for an attacker even when a bug exists, so that the gap between when a vulnerability is disclosed and when it is patched matters less.

That means defenses that sit in front of the application and block the bug from being reached. It means designing the application so that a flaw in one part of the code cannot give an attacker access to other parts. It means being able to roll out a fix to every place the code is running at the same moment, rather than waiting on individual teams to deploy it.”

Exactly how Cloudflare plans to use Mythos in this way is something the company is still keeping close to its chest, but it says it will “share more on what that means for customers in the weeks ahead.”

Cloudflare is far from unfamiliar with AI. The company has previously said that “increasingly the distinction between bots and humans is moot”, at least when it comes to how websites (which often run through Cloudflare servers) treat users. So I suppose it’s no surprise the company is diving into a heavily agentic approach to cybersecurity. Though if Mythos really is as much of a leap ahead as the company is suggesting, it might be a case of ‘get to it before your adversaries do’—the AI arms race churns on whether we want it or not.
