
thombles · 2026-05-22T22:30:48 1779489048

As one of those commenters on the previous post - yep, that theory appears to have been comprehensively trounced. Unless something comes to light showing that Mythos was applied poorly to curl, the evidence suggests that it’s not uniquely effective vs other AI-assisted approaches. I’ll be interested to see what’s reported in the next curl release.

thombles · 2026-05-11T10:08:55 1778494135

Curl simply isn't a good data point. It's one of the most picked-over codebases in existence with extensive security testing practices. All the researchers using not-quite-Mythos models have had plenty of time to report bugs up to this point. Daniel may be right that Mythos hasn't been a game changer for curl but the preconditions are different for virtually any other codebase. Perhaps the real marketing here is his own modesty about curl's maturity.

GuB-42 · 2026-05-11T10:40:34 1778496034

To me, it is a very good data point.

Curl uses all sorts of tools, including AI tools, to find bugs. According to the article, these tools found hundreds of bugs, including a dozen CVEs.

Mythos found one vulnerability. That means Mythos is just another tool, not the revolution it claims to be.

It is common for a newly introduced tool to find a bunch of bugs, with diminishing returns after that. Mythos finding one vulnerability is consistent with what I would expect from a major update to an existing tool, which is what Mythos is relative to existing LLM-based solutions.

atonse · 2026-05-11T14:02:47 1778508167

I had a totally different take. The fact that Mythos found only one vulnerability is testament to how solid curl is, not how bad Mythos is.

Look at the Firefox blog post where they found something like 400 (or more) findings.

I have no doubt Mythos is very good at this, but I also don't think it's something unattainable by other labs within the next few months, with focus.

skywhopper · 2026-05-11T16:06:49 1778515609

The point is that Anthropic claims it’s a huge leap over everything else. But it isn’t.

rohit89 · 2026-05-11T16:26:24 1778516784

This depends on the actual number of undiscovered bugs still in curl. If there is nothing to find then even a 10x better Mythos will find nothing. Also I think the quality of the codebase matters a lot when it comes to finding bugs. It's possible that curl is so well written that it is relatively straightforward for existing AI tools to find bugs.
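
To make the arithmetic concrete, here is a toy sketch. It assumes each remaining bug is found independently with a fixed probability; the function name and the rates are made up for illustration, not measurements of any real tool.

```python
def expected_findings(remaining_bugs: int, detection_rate: float) -> float:
    """Expected number of bugs a tool reports.

    Toy assumption: every remaining bug is found independently with
    probability `detection_rate` (hypothetical, not a measured value).
    """
    if remaining_bugs < 0:
        raise ValueError("remaining_bugs must be non-negative")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    return remaining_bugs * detection_rate


# Hypothetical rates: an older tool at 5%, a "10x better" tool at 50%.
old_rate, new_rate = 0.05, 0.5

for remaining in (0, 2, 400):
    old = expected_findings(remaining, old_rate)
    new = expected_findings(remaining, new_rate)
    print(f"remaining={remaining}: old tool ~{old:.1f}, new tool ~{new:.1f}")
```

With nothing left to find, both tools report zero, and with only a couple of bugs left the better tool's advantage is barely visible. A low count on its own can't separate "weak tool" from "clean codebase".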

atonse · 2026-05-11T18:05:30 1778522730

But both things can be true. It could be a huge leap (see Firefox’s example) but also find almost nothing in an already well maintained and audited codebase, and that could mean there isn’t much to find.

ethin · 2026-05-11T19:23:19 1778527399

Okay, but how do we know that all 400 plus hits were actual vulnerabilities? I didn't read too deeply into it so I might've missed something but did someone test and validate each of those vulns to confirm that they were actually vulns?

atonse · 2026-05-12T03:52:05 1778557925

You can see the details here: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

HDThoreaun · 2026-05-11T19:35:24 1778528124

There is no way to tell until we find examples of vulnerabilities that Mythos missed. For all we know, curl has 0 vulnerabilities right now.

empath75 · 2026-05-11T13:55:04 1778507704

It's not, really. Curl is an extraordinarily high value target that has already been picked over by well funded security researchers and state-sponsored groups using state of the art tooling for decades. That is not the target for which Mythos is a threat.

The threat isn't high value targets, which already had sophisticated folks picking over the code base using state of the art tools and tests, it's medium to low value targets which can now be picked over by random hackers who barely know anything about security themselves at a cost of a few dollars.

thombles · 2026-05-11T10:53:56 1778496836

The question is how many security vulnerabilities are actually left in the code after all the recent AI attention. Either Mythos is a nothingburger, or it's substantially more powerful but there's nothing left to do. Even a large amount of C can be correct eventually. Curl has the _potential_ to become a good data point maybe 6-12 months from now - if researchers and new tools find many more vulnerabilities then Mythos is proved to be hype. If they don't, then maybe Mythos is overkill for today's curl and its capabilities are better deployed elsewhere (like Firefox, apparently).

GuB-42 · 2026-05-11T11:35:19 1778499319

I have a hard time believing that Mythos found the only remaining Curl vulnerability. It is possible, but highly improbable.

And it is not overkill; the proof is that it found that vulnerability. It is like saying the new version of some static analyzer with some new rules is "overkill" because it found only one more bug than the previous version. Deciding whether it is overkill or not is more about context. Using a very expensive model like Mythos for some little-used, non-critical software is overkill, but for Curl, it absolutely isn't.

If Mythos found loads of vulnerabilities in Firefox but not in Curl, I wouldn't say that's because Mythos is so good, but rather that with the release of Mythos, they did some testing that could have been done before using the same tools Curl has used.

thombles · 2026-05-11T11:46:41 1778500001

We will see. As for "testing that could have been done before", Mozilla's posts indicate otherwise. Use of Opus 4.6 led to 22 security-sensitive bugs vs Mythos' 271 (https://blog.mozilla.org/en/privacy-security/ai-security-zer...). They already had the methodology in place when the more powerful model came along (https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...):

> Once the end-to-end pipeline is in place, it’s trivial to swap in different models when they become available. Building this pipeline early helped us find a number of serious bugs using publicly-available models, and it also helped us hit the ground running when we had the opportunity to evaluate Claude Mythos Preview. In our experience, model upgrades increase the effectiveness of the entire pipeline: the system gets simultaneously better at finding potential bugs, creating proof-of-concept test cases to demonstrate them, and articulating their pathology and impact.

sitkack · 2026-05-11T13:35:14 1778506514

False dichotomy

spongebobstoes · 2026-05-11T10:59:13 1778497153

that makes it a good data point, because it is better able to illustrate the incremental capabilities of Mythos compared to previous tooling

that helps us to understand how much of Mythos is hype and how much is real

20k · 2026-05-11T10:32:48 1778495568

We see this exact hype train every time a new model is released. Mythos simply hasn't lived up to the "we're all gunna die from the flood of vulnerabilities" hype even slightly. It's slightly better than previous models by all accounts, cool stuff.

I've seen this exact chain of events play out nearly word for word multiple times before.

thombles · 2026-05-05T02:33:27 1777948407

The answer is in the next sentence: "Bun owns its event loop and syscalls." They clearly want to manage their use of threads explicitly, which is not _unusual_ for systems programming but probably less common. Note that `rayon` is different from most of these in that it has nothing to do with async Rust - it's a tool for spreading computation over a thread pool, very popular in non-async projects, but it would also go against their goals here.

thombles · 2026-05-03T20:58:22 1777841902

Is the poster maybe confusing bandwidth (range of frequencies over which a single board can work) with bandwidth (data transfer speeds in bits per second)?

thombles · 2026-05-02T21:26:07 1777757167

I saw this the other day and was pretty confused - I prefer to write my own commit messages and wondered if I’d accidentally let the AI do it this time. Nope, just MS changing things behind my back. Sigh.

thombles · 2026-04-28T20:36:56 1777408616

I didn’t read this as a flex. More a rueful admission of his connection/addiction to GitHub.

i_think_so · 2026-04-29T03:53:52 1777434832

I saw it as a sad combination of the two.

thombles · 2026-04-15T23:30:26 1776295826

It's a meaningful difference for SaaS. Most likely an attacker doesn't have access to your running binary let alone source code, and if they probe it like a pentester would it will be noisy and blocked/flagged by your WAF.

thombles · 2026-04-10T06:56:44 1775804204

Microsoft could tone it down a bit (especially all the full-screen harassment after Windows updates) but I wonder how many casual users have had their bacon saved precisely because their documents and desktop got pushed to the cloud?

thombles · 2026-04-10T06:54:20 1775804060

It’s super hostile. I realised I was going to press it by accident eventually so I switched to Fossify Gallery before I did.

thombles · 2026-04-10T06:47:48 1775803668

It doesn’t? I use the OneDrive app for scanning documents all the time: + button, then “Capture”.

deckar01 · 2026-04-10T12:11:34 1775823094

Oh, I have to use their app to take the pic. I can’t use my existing photos anymore.