Hacker News | j2kun's comments

My website has search without a query string: https://www.jeremykun.com/

> Luckily AI can speed up defenders as well as attackers here, allowing embargoes that would previously have been uselessly short.

This is an important facet of the problem space: security risks turning into an arms race for who wants to spend more tokens.


One interesting thing is that this makes closed source code an even greater asset for defenders. Attackers cannot spend tokens on it, but defenders can spend tokens on hardening based on the source code, while attackers are stuck with black-box testing.

You would be surprised how adept SOTA models are at reverse engineering with IDA/Ghidra or even plain old objdump. Opus basically knows IDAPython like the back of its hand.

They can be, but the most interesting parts (backend code, deployment configs) are not usually available. Reversing clients can help you understand a bit, but not to the same degree.

On the other hand, any source code leak could be catastrophic.

Decompilation is quite good these days as well.

There are only some 700 open Erdős problems left, so when they're all solved you can finally rest.

The blog post has many links to papers and preprints discussing this exact question.

The CANOS arxiv link says absolutely nothing about AlphaEvolve, Gemini, or LLMs. It seems to use purely traditional ML models. If AE did in fact write a quick script to test different configurations in order to optimize the results, they don't seem to have bothered to write about it.

I can't read the Nature paper about DeepConsensus, but the summary doesn't really explain what role AE had in improving DC. It would be nice to be able to read about what role it actually played, and whether it used traditional or novel methods to do so.


I for one can't tell the difference between Claude and Gemini for coding. And the internal agent tooling is many times faster than Claude Code in my experience.

Lie? Gemini CLI is unusable. The IF of Gemini models is atrocious. Honestly, how often does your Gemini CLI go insane in thought loops and you have to stop it?

I use Jetski.

What does that have to do with Gemini coding?

Rather than "the book explains how bread is made," say "the book has a recipe for baking bread," and do not say "the book is my soul mate."

The people who know what a "child process" is are under no false pretenses about the humanity of the underlying system.

The people who are writing op eds in major news publications about how their favorite chatbot is an "astonishing creature" and how it truly understands them are the ones who need this sort of law.


Python programming with 156 MHz and 3.5 MiB of RAM? Can a Python REPL even start up with that profile?

3.5 MB is pretty generous, actually! Some older TI-84 models had MicroPython running on a secondary ATSAMD21 processor with 32 KB of RAM - that was effectively unusable.

If you write an article shitting on a popular thing because it eclipses the popularity of your favorite thing (and essentially calling the people who use it ignorant sheep), chances are good you are part of a self-fulfilling prophecy, pushing people away from your community.

The article heavily quotes the "AI Security Institute" as a third-party analysis. It was the first I heard of them, so I looked up their about page, and it appears to be primarily people from the AI industry (former Deepmind/OpenAI staff, etc.), with no folks from the security industry mentioned. So while the security landscape is clearly evolving (cf. also Big Sleep and Project Zero), the conclusion of "to harden a system we need to spend more tokens" sounds like yet more AI boosting from a different angle. It raises the question of why no other alternatives (like formal verification) are mentioned in the article or the AISI report.

I wouldn't be surprised if NVIDIA picked up this talking point to sell more GPUs.


I would be interested in which notable security researchers you can find to take the other side of this argument. I don't know anything about the "AI Security Institute", but they're saying something broadly mirrored by security researchers. From what I can see, the "debate" in the actual practitioner community is whether frontier models are merely as big a deal as fuzzing was, or something significantly bigger. Fuzzing was a profound shift in vulnerability research.

(Fan of your writing, btw.)


It's less that I think they would take the other side of the argument than that they would lend some credence to the content of the analysis. For example, I would not particularly trust a bunch of AI researchers to come up with a representative set of CTF tasks, which seems to be the basis of this analysis.


Yeah, you might be right about this particular analysis! The sense I have from talking to people at the labs is that they're really just picking deliberately diverse and high-profile targets to see what the models are capable of.


> but they're saying something broadly mirrored by security researchers.

You might well be right; it is not an area I know much about or work in. But I'm a fan of reliable sources for claims. It is far too easy to make general statements on the internet that appear authoritative.


They are a UK government unit: "The AI Security Institute is a research organisation within the Department of Science, Innovation and Technology."

Unfortunately, they fit straight lines to graphs whose y-axis runs from 0 to 100% and whose x-axis is time, which is not great: a straight line will eventually leave the 0-100% range. They should fit a logistic curve instead.
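
To illustrate the point about bounded axes, here is a rough sketch in plain Python. The data points are made up for illustration and are not taken from the AISI report, and the helper names (`fit_line`, `predict_linear`, `predict_logistic`) are my own. It fits a straight line directly to the success rates, and separately fits a line in logit space, which is one simple way to get a logistic curve (not necessarily the best one).

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def logit(p):
    """Map a fraction in (0, 1) to the whole real line."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of logit; always returns a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical success rates (fractions, not percentages) at time steps 0..5.
ts = [0, 1, 2, 3, 4, 5]
rates = [0.05, 0.10, 0.22, 0.40, 0.62, 0.80]

# Straight line fitted directly to the rates.
a_lin, b_lin = fit_line(ts, rates)

# Logistic curve: fit a line to the logit-transformed rates, then map back.
a_log, b_log = fit_line(ts, [logit(p) for p in rates])

def predict_linear(t):
    return a_lin + b_lin * t

def predict_logistic(t):
    return sigmoid(a_log + b_log * t)

if __name__ == "__main__":
    # Extrapolate past the observed range and compare the two fits.
    for t in (6, 8, 10):
        print(t, round(predict_linear(t), 3), round(predict_logistic(t), 3))
```

Extrapolated far enough, the straight line predicts a success rate above 100%, while the logistic fit saturates below it. That is the whole objection in miniature.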


If true, that's naked, shameless and brutal capitalism.

Seems much like those secretly tobacco industry funded reports about tobacco being safe and such.

