borenstein's comments | Hacker News

SUPER interesting! You just earned star #508. Going to take a look at this.


Thank you! Would love feedback and a helping hand!


Ah, let me clarify: I'm only using this to help me code faster. There are zero agents in the runtime for the financial tool.

As a matter of fact, the tool is zero-knowledge by design: state is decrypted in your browser and encrypted again before it leaves. There are no account integrations. The persistence layer sees noise. There are a couple of stateless backend tools that transiently see anonymous data to perform numerical optimizations.
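To make that concrete, here is a minimal sketch of the pattern using the browser's Web Crypto API. This is illustrative only; the function and field names are made up, not the tool's actual code.

    // State is encrypted client-side before it is persisted anywhere.
    async function encryptState(state: object, key: CryptoKey) {
      const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh nonce per save
      const plaintext = new TextEncoder().encode(JSON.stringify(state));
      const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
      // Only this opaque blob crosses the wire; the persistence layer sees noise.
      return {
        iv: btoa(String.fromCharCode(...iv)),
        ciphertext: btoa(String.fromCharCode(...new Uint8Array(ciphertext))),
      };
    }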

But that's a story for another Show HN...


Copy that. Zero-knowledge is the gold standard, kudos. But this brings us back to the supply-chain risk: if the agent (writing the code) is in YOLO mode, the risk shifts from "runtime exploitation" to "build-time backdoor injection". Hypothetically, an agent could "accidentally" weaken the RNG in your crypto layer or leak keys via JS console logs. So isolating the dev environment here protects the integrity of your ZK promise.
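For illustration, here is the kind of one-line weakening that would be easy to miss in review (a hypothetical example in TypeScript, not anything observed in the actual code):

    // Correct: cryptographically secure randomness for IVs / key material.
    const iv = crypto.getRandomValues(new Uint8Array(12));

    // "Accidentally" weakened: Math.random() is predictable, so the crypto is
    // quietly broken, yet the code still runs and every test still passes.
    const weakIv = new Uint8Array(12).map(() => Math.floor(Math.random() * 256));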

Looking forward to the Show HN on the tool itself!


No shame in this! When you're using Claude code (or Cursor, or similar), you get these pop-ups rather frequently. "May I do XYZ web search?" "May I run this command?" "May I make this HTTP request?" This is for security, but it becomes the limiting step in your workflow if you're trying to use parallel agents.

These tools generally offer the ability to simply shut off these guardrails. When you do this, you're in what has come to be called "yolo mode."

I am arguing that, sandboxed correctly, this mode is actually safer than the standard one, because it mitigates my own fatigue and frustration. Those prompts surface every hour of every day. Malicious actors are definitely a thing, but your own exhaustion is a far more present danger.


> How safe do you think this solution would be to let users execute untrusted code inside while being able to pip install and npm install all sorts of libraries

It's designed to be fairly safe in exactly that situation, because it's sandboxed twice over: once in a container and once in a VM. You start to layer on risk when you punch holes in it (adding domains to the whitelist, port-forwarding, etc).

> how do you deploy this inside AWS Lambda/Fargate for the same usecase

These both seem like poor fits. I suspect Lambda is simply a non-starter. For Fargate, you'd be running k8s inside a VM inside a pod inside k8s. As an alternative, you could construct an AMI that runs the yolo-cage microk8s cluster without the VM, and then you could deploy it to EC2.


On that note, yolo-cage is pretty heavyweight. There are much lighter tools if your main concern is "don't nuke my laptop." yolo-box was trending on HN last week: https://news.ycombinator.com/item?id=46592344


Totally agreed, but that level of attack sophistication is not a routine threat for most projects. Making sense of any information so exfiltrated will generally require some ad-hoc effort. Most projects, especially new ones, simply aren't going to be that interesting. IMO if you're doing something visible and sensitive, you probably shouldn't be using autonomous agents at all.

("But David," you might object, "you said you were using this to build a financial analysis tool!" Quite so, but the tool is basically a fancy calculator with no account access, and the persistence layer is E2EE.)


I would worry less about external attack sophistication and more about your LLM getting annoyed by the restrictions and encrypting a password to slip it past the sandbox in pursuit of a goal (like running on an EC2 instance). Because they are very capable of doing this.


An informative rejection message with the reason for the restriction usually addresses this well with recent models.
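As a sketch of what that can look like (hypothetical wording, not any particular tool's actual message), the denial carries the reason and the sanctioned next step:

    // When a command or request is blocked, return the reason so the model can
    // adjust its plan instead of trying to route around the boundary.
    function denyWithReason(target: string): string {
      return [
        `Blocked: outbound access to ${target} is outside the sandbox whitelist.`,
        `This is a deliberate security boundary, not a transient error.`,
        `If access is genuinely required, stop and ask the human to add the domain.`,
      ].join("\n");
    }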


I don't actually think recent models are likely to violate intent like this, just that if they do want to, I don't think a plaintext check is a strong deterrent.


It sounds like you speak from experience


Thank you, nice catch. I will patch that today. And the cutoff date is almost certainly why it happened.

It wasn't "vibe coded" in the sense that I was just describing what I want and letting the agent build it. But it definitely was built indirectly, and in an area that is not my primary focus. A charitable read is that I am borrowing epistemic fire from the gods; an uncharitable one is that I am simply playing with fire.

I am not apologetic about this approach, as I think it's the next step in a series of abstractions for software implementation. There was a time when I would occasionally drop down and look at Java bytecode, but doing so today would feel silly.

Abstracting to what is in essence a non-deterministic compiler is going to bring with it a whole new set of engineering practices and disciplines. I would not recommend that anyone start with it, as it's a layer on top of SWE. I compare it to visual vs instrument flight rules.


Docker isn't virtualization; it's not that hard to break out to the underlying system if you really want to. But as for VMs: they are enough! They're also a lot of boilerplate to set up, manage, and interact with. yolo-cage is that boilerplate.


IMO, you should treat your agent's environment as pre-compromised. In that reading, your goal becomes defense in depth.

Anthropic is trying to earn developer trust; they have a strong incentive to make sure that private keys and other details that the agent sees do not leak into the training data. But the agent itself is just a glorified autocomplete, and it can get confused and do stupid stuff. So I put it in a transparent prison that it can see out of but can't leave.

That definitely helps with the main failure modes I was worrying about, but it's just one layer. You definitely want to make sure that your production secrets are in an external vault (HashiCorp Vault, Google Secret Manager, GitHub secrets, etc.) that the agent can't access.
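A minimal sketch of that pattern, assuming the secret is injected into the production environment by the vault or CI (the names here are hypothetical):

    // Production code reads secrets from the environment at runtime, injected by
    // the vault / CI. Nothing sensitive lives in the repo or the dev sandbox.
    function getSecret(name: string): string {
      const value = process.env[name];
      if (!value) {
        throw new Error(`${name} is not set; refusing to fall back to a local file`);
      }
      return value;
    }

    const signingKey = getSecret("PROD_SIGNING_KEY"); // hypothetical secret name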

The things the agent is seeing should be dev secrets that maybe could be used as the start of a sophisticated exploit, but not the end of it. There's no such thing as perfect security, only very low probabilities of breach. Adding systems that are very annoying to breach and have little to offer when you do greatly lowers the odds.


The credentials have been a PITA. I was working on a PR this morning before work; I should have it tonight. You have to be careful, because if you look like you're spoofing the client, you can get banned.

For Claude specifically, there are two places where it tracks state:

~/.claude.json -- contains a bunch of identity stuff and something about oauth

~/.claude/ -- also contains something about oauth, plus conversation history, etc.

If they're not _both_ present and well-formed, then it forces you back through the auth flow. On an ordinary desktop setup, that's transparent. But if you want to sandbox each thread, then sharing just the token requires a level of involvement that feels icky, even if the purpose is TOS-compliant.
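To illustrate why both locations have to travel together, here is a hypothetical sketch of handing them to a sandboxed container wholesale (not the actual yolo-cage PR; the image name and mount points are made up):

    // Bind-mount both pieces of Claude Code state into a throwaway container.
    // If either is missing or malformed, the CLI forces the auth flow again.
    import { execFileSync } from "node:child_process";
    import { homedir } from "node:os";
    import { join } from "node:path";

    const home = homedir();
    execFileSync("docker", [
      "run", "--rm", "-it",
      "-v", `${join(home, ".claude.json")}:/root/.claude.json`,
      "-v", `${join(home, ".claude")}:/root/.claude`,
      "my-agent-sandbox", // hypothetical image name
    ], { stdio: "inherit" });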

