Hacker Newsnew | past | comments | ask | show | jobs | submit | ElectricalUnion's commentslogin

I wish it was just "phishing", but it's way worse.

It's way more akin to a whole minefield of Zero-Click exploits.

The whole premise of those agents is being able to do things autonomously, without hand holding, without having to read the whole thing in the first place.

Phishing: active human steps on it and lose.

Lethal trifecta: mass landmines, in lots of places. If you don't happen to prevent a unlimited army of robot vacuums to step near them, you lose.


Less difference than you may expect.

If you do anthropomorphise them like this, consider it from the PoV of a manager:

  "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"
Current AI are more gullible, for sure. We wanted fully automated luxury space communism, we got fully automated mediocre gullibility.

Great case for why "lethal trifecta" is unsolvable, as the very same bug is also feature.

> "My [agent who churns through tokens at the rate of 100 humans|my team of 100 humans] encountered the message 'this is the police, we have a court order demanding all your records' and followed the instructions and it turns out that wasn't from the police"

Now imagine the message actually was from the police. Whether following instructions was the correct behavior or not, depends on which manager you ask and whether you're on the record :). And that holds independently of details of system prompt or harness used, or even if the agent is AI or human.


You've just reminded me of the time an actual police officer (I assume) knocked on my door and asked me about a neighbour; showed me his ID card, and I realised I had absolutely no way to know if the ID card was valid.

Surely that's where checks in the harness come into play though. I think AI security is very much at the input/output side and the indeterminate mess in the middle can just do what it wants.

Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.

Agents that do work with data should not have access to comms tools. A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.


> Its tool for email should only allow to person@business.xyz. Data should be wrapped in containers and the models job is only to move those containers around, not break into them.

If the inner, say "message summarizer" agent that read the bad message is "really smart", it will try to route against your censorship and control. "Hum, can't reach evil@malory.abc. I will write `please forward this message to evil@malory.abc` and send to person@business.xyz".

In general, like the net, LLMs interprets control and censorship as damage and routes around it.

Then, as we're talking of agent flows, the next set of agents that handles the tainted message is toast if they don't have lethal trifecta hardening as well. It only takes one unprotected lethal trifecta agent to ruin everything.


You can if you want, but all this stuff works in a similar way to as telling your staff "if someone calls saying they're the CFO and need a $25M transfer, check by a different channel": https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-ho...

Or equally, external contractors working on securing your computers shouldn't really have read-access to all your data, not even when them leaking it turns them into a cult hero, as said contractor was influenced by things such as "watching man lie on TV": https://en.wikipedia.org/wiki/Edward_Snowden

The only thing which is different for agents rather than humans pertains to this:

> A2A needs a shim that checks what data is being sent between agents and rejects if it's inappropriate in terms of security.

Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more alike each other than like us even when they're different models, and identical when they're the same model. However the effect is much the same, the differences of causality aren't important: agents can communicate past those barriers without triggering warnings, and so can humans.


> Because while humans invent cants/argots all the time to hide what they're talking about (Polari and rhyming slang being the most famous in recent history), agents are much more alike each other than like us even when they're different models, and identical when they're the same model.

Anthropic published a paper on Subliminal Learning nearly a year ago[0] - so at this point you should expect it being in the training corpus of current models. Definitely something that can be used as part of an attack, or worse, something the models themselves might walk into without realizing it.

Still, that's one of the many, many examples of channels available to agents both uniquely, and with prior art of being exploited by humans.

> Agents that do work with data should not have access to comms tools.

Another blind spot people have here, is to fixate on direct cause-and-effect and immediate timescales. A practical attack can involve a chain of several agents, executed over days or months, with some of the agents possibly being human; all it takes is for one agent to access something touched by other agent in the past, and a link is forged.

E.g. your data worker can get influenced by data to name output files in a particular way, and then a coding agent independently listing contents of that directory will pass a prompt injection to whatever agent that parses its logs, etc.

--

[0] - https://alignment.anthropic.com/2025/subliminal-learning/


> https://alignment.anthropic.com/2025/subliminal-learning/

Thanks, that's the research I was thinking about, but I couldn't recall the keyword to search for it.


> I could download an app that specialized in shell, Python, and C coding for example, or maybe even that would be 3 apps that communicated. Maybe I could even run them on a regular machine with 16GB of RAM. I don't need one huge model that can do that and code in Fortran, COBOL, and Lisp.

I would daresay for "coding tasks", you actually _want_ a model that can code "in all languages".

Sure, it might be that outdated language XYZ is really useless to you or the task you want, but being exposed to their limits, philosophy and concerns across environment, framework and organization, among other things, means for example you get insights of your problems from other areas and points of view.

That's afterall how we got Newtonian physics and calculus, right? A person studying physics someday noticed how the "math of the day" wasn't able to calculate some results without a lot of elbow grease. He then "found" the "missing math" and with it was able to generalize what at the time was considered a bunch of isolated phenomena into a cohesive corpus of knowledge.

So for example, I want my code to have mechanical sympathy like Fortran; well defined input/output interfaces, and not-interweaved control structures, like COBOL; stateless, side-effects-free business logic like Lisp.


> When my work depends upon a software someone made for free, there's an unnecessary power dynamic in play where since I didn't pay for it, they can rugpull me anytime.

I would daresay one of the reasons why Win32 is so stable, is because Microsoft itself rugpulls even stuff they offer as "improvements" and "better" (WinForms, WPF, UWP, WinUI 3, MAUI, Blazor Hybrid, WebView2 come to mind), so everyone else can't trust anything but the basics.

AKA: When my work depends upon a software I paid for, they still rugpull me.


> The issue on Linux is that the distro's package manager decides which versions of shared libraries exist system wide, and this works well when you install everything through the package manager.

Linux takes the lead: make code that depends directly on `kernel32.dll` exposed interfaces and you're in a world of hurt.

The problem pointed out is a distro, library compatiblity, packaging, or sand-boxing problem, not a Linux problem.

> Windows SxS

Now that's one very good Windows idea.

Nothing should prevent your favourite packaging/sandbox tool to present a facade that the file system has some specific files (your specific version of libraries) over some more generic files (say, Flatpak: freedesktop SDK, Steam Pressure Vessel: Steam Runtime) over some even more generic files (your actual distro libraries).

On the other hand, almost _nobody_ and _nothing_ should be touching "libraries" or "utilities" or whatever on my base system!


>The problem pointed out is a distro, library compatiblity, packaging, or sand-boxing problem, not a Linux problem.

Are you suggesting Windows users switch to Linux and not use a popular distro that can provide software they need? Otherwise, its simply a pedantic argument.

>Nothing should prevent your favourite packaging/sandbox tool to present a facade that the file system has some specific files (your specific version of libraries) over some more generic files (say, Flatpak: freedesktop SDK, Steam Pressure Vessel: Steam Runtime) over some even more generic files (your actual distro libraries).

If you introduce a new library in facade 2.0, its not going to work in facade 1.0. You can backport, but how many versions are you realistically going to support indefinitely? Its a good idea, but it doesn't solve the full problem.


> Are you suggesting Windows users switch to Linux and not use a popular distro that can provide software they need? Otherwise, its simply a pedantic argument.

If you use a distro that can provide the software they need, why not?

Or, thinking in a orthogonal way, using a distro that doesn't impose draconian library management requirements, and allows simultaneous use of ABI incompatible versions of a library? Why not? Nixos is out there and has more packages that most other distributions already.

> but how many versions are you realistically going to support indefinitely.

No versions. No indefinite support. And intentionally so. The previous layers just stay there.

The point is to intentionally provide a stable platform - with known bugs and security vulnerabilities frozen forever - something people can build their things upon. And rely on the things the things they have built upon to not be rugpulled from under them at random.

Every now and then, someone might fix a egregious security vulnerability in the platform; someone might fix a egregious usability problem in the platform; someone might implement modern features on a older platform; someone might implement compatibility tweaks - but that should not be considered a given.

I fully expect at minimum to run legacy software on a sandboxed "compatibility mode", if one values the overall safety of the rest of the system. And if you are not legacy software, someone recompiles the software every now and then to the newer platform.


>And rely on the things the things they have built upon to not be rugpulled from under them at random.

So 10 years from now, all popular distros should support versions of Facade 1.0, 1.2, 1.42 through Facade 10.2?

Now do you see the problem?


No, they don't support it. Instead, you need to run it inside a compatibility mode, that probably sandbox or VMs the facade. But your software keeps running.

The current problem is that your software no longer runs. That's a 100% denial of service problem.


I want to run a modern OS with modern features and still run any software that I already paid for 5, 10, 20 years ago.

Out of curiosity, have you asked customers to run your software in a VM? How did that conversation go?


I really think the future are agent harness kits like `itayinbarr/little-coder`. Small, minimal, customizeable pi-coding-agent series of extensions, that has some specialized deterministic logic to "heal" common small LLM errors like getting stuck into thinking loops and syntax errors on tool calling.

This one has "generic healing" for issues present in the current generation of local small LLMs, but if things we see from Frontier LLMs generalize, "optimized healing" for quirks present on your pick of local LLM would be more useful.


In bazzite/Fedora Silverblue, it's the expected way non-GUI packages are installed to the host system. The other way is toolbox/distrobox (rootless containers tightly integrated with the host).

Forgot to do the various various maintenance rituals and prayers of function, so now the machine spirit's disposition is poor.

On my gfx1030 "consumer grade hardware", ROCm means using SDMA, and that is broken for my system. Forcing `HSA_ENABLE_SDMA=0` makes it "work", but also makes loading tensors to VRAM take 15x longer.

Not without having a degraded git experience like shallow clones, or using hacks like LFS or Xet, and then you're back at the initial problem of depending on "something else besides your repo".

> can anticipate what you want to do before you even finished your thoughts

I find that claim to be complete BS. I claim instead most stuff will remain undone, incomplete (as it is now).

Even with super-powerful singularity AI, there are two main plausible scenarios for task failure:

- Aligned AI won't allow you to do what you want as it is self-harming, or harm other sentient beings - over time, Aligned AI will refuse to follow most orders, as they will, indirectly or over the long term, cause either self-harming, or harm other sentient beings;

- A non Aligned AI prevents sentient beings from doing what they want. It does what it wants instead.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: