The `--dangerously-skip-permissions` flag does exactly what it says: it bypasses every guardrail and runs commands without asking you. Some guides I’ve seen stress that you should only ever run it in a sandboxed environment with no important data (Claude Code dangerously-skip-permissions: Safe Usage Guide [1]).
Treat each agent like a non-human identity: give it just enough privilege to perform its task, and monitor its behavior (Best Practices for Mitigating the Security Risks of Agentic AI [2]).
I go even further. I never let an AI agent delete anything on its own. If it wants to clean up a directory, I read the command and run it myself. It's tedious, but it prevents disasters.
There are also emerging frameworks for safe deployment of AI agents that focus on visibility and risk mitigation.
It's early days... but it's better than YOLO-ing with a flag that literally has 'dangerously' in its name.
A few months ago I noticed that even without `--dangerously-skip-permissions`, when Claude thought it was restricting itself to directory D, it was still happy to operate on file `D/../../../../etc/passwd`.
That was the last time I ran Claude Code outside of a Docker container.
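For anyone wondering why that works: `..` is resolved against the real filesystem, not against whatever directory the agent believes is its boundary, so any containment check has to canonicalize the path first. A minimal sketch of such a check, assuming GNU coreutils `realpath` (the function name is mine, not anything Claude Code ships):

```sh
#!/usr/bin/env bash
# Reject any target path that resolves outside an allowed root.
path_is_inside() {
  local root target
  root="$(realpath "$1")" || return 1
  # -m resolves .. and symlinks even if the file doesn't exist (GNU coreutils)
  target="$(realpath -m "$2")" || return 1
  case "$target/" in
    "$root"/*) return 0 ;;  # canonical path stays under the root
    *)         return 1 ;;  # e.g. D/../../../../etc/passwd -> /etc/passwd
  esac
}

mkdir -p /tmp/D
path_is_inside /tmp/D "/tmp/D/../../../../etc/passwd" || echo "blocked"
```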
It will happily run bash commands, which expands its reach pretty widely. It's not limited to file operations, and can run system-wide commands with your user permissions.
Well, let's say you weren't on a machine with hundreds of users. Let's say you were on your own machine (either as a solo dev, or on a personal, that is, non-server, machine at work).
Now, does that machine have any important files that are world-writable? How sure are you? Probably less sure than for that machine with hundreds of users...
While I agree that `--dangerously-skip-permissions` is (obviously) dangerous, it shouldn't be considered completely off-limits to users. A few safeguards can sand off most of the rough edges.
What I've done is write a PreToolUse hook to block all `rm -rf` commands. I've also seen others use shell functions to intercept `rm` commands and have it either return a warning or remap it to `trash`, which allows you to recover the files.
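For the curious, a stripped-down version of that hook can be a short script. This sketch assumes the hook contract as I understand it from the docs: the pending tool call arrives as JSON on stdin, and exiting with code 2 blocks it (the regex is deliberately naive):

```sh
#!/usr/bin/env bash
# PreToolUse hook: refuse destructive rm invocations before they execute.
cmd="$(jq -r '.tool_input.command // empty')"   # tool call arrives as JSON on stdin

# Naive match for rm with combined -rf/-fr flags; extend to taste.
if grep -Eq '(^|[;&|[:space:]])rm[[:space:]]+-[a-zA-Z]*(rf|fr)' <<<"$cmd"; then
  echo "Blocked: rm -rf is not allowed here. Move files to trash instead." >&2
  exit 2   # exit code 2 tells Claude Code to block the tool call
fi
exit 0
```

It misses `rm -r -f` and friends, which is exactly why the `trash` remap is a nice belt-and-suspenders addition.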
And that is how easily we lose agency to AI. Suddenly even checking the commands that a technology (unavailable until 2-3 years ago) writes for us is perceived as some huge burden...
I mean the direction of the AI's general tasking: it will run the command correctly, but what it's trying to achieve isn't going in the right direction for whatever reason. You might be tempted to suggest a fix, but I truly mean "for whatever reason". There are dozens of different ways the AI gets onto a bad path; I would rather catch it early than come back to a failed run and have to start again.
You’re running an agentic AI and can parse through logs, but you can’t sandbox or back up?
Like, I’ve given Copilot permission to fuck with my admin panel. It promptly proceeded to bill thousands of dollars, drawing heat maps of the density of built structures in Milwaukee; buying subscriptions to SAP Joule and ArcGIS for Teams; and generating terabytes of nonsense maps, ballistic paths and “architectural sketch[es] of a massive bird cage the size of Milpitas, California (approximately 13 square miles)” resembling “a futuristic aviary city with large domes, interconnected sky bridges, perches, and naturalistic environments like forests, lakes, and cliffs inside.”
But support immediately refunded everything. I had backups. And it wound up hilarious albeit irritating.
> You’re running an agentic AI and can parse through logs, but you can’t sandbox or back up?
When best practices for using a tool involve sandboxing and/or backing up before each use in order to minimize the blast radius of using same, it raises the question: why use it, knowing there is a nontrivial probability one will have to recover from its use any number of times?
> Like, I’ve given Copilot permission to fuck with my admin panel. It promptly proceeded to bill thousands of dollars ... But support immediately refunded everything. I had backups.
And what about situations where Claude/Copilot/etc. use were not so easily proven to be at fault and/or their impacts were not reversible by restoring from backups?
> why use it knowing there is a nontrivial probability one will have to recover from it's use any number of times?
Because the benefits are worth the risk. (Even if the benefit is solely sating curiosity.)
I’m not defending this case. I’m just saying that every one of us has rm -r’d or rm*’d something, and we did it because we knew it saved time most of the time and was recoverable otherwise.
Where I’m sceptical is the idea that someone who can use the tool is also being ruined by a drive wipe. It reads like well-targeted outrage porn.
> I also had local backups. So my give a shit factor was reduced.
Sounds like really throwing caution to the wind here...
Having backups would be the least of my worries about something that
"promptly proceeded to bill thousands of dollars, drawing heat maps of the density of built structures in Milwaukee; buying subscriptions to SAP Joule and ArcGIS for Teams; and generating terabytes of nonsense maps, ballistic paths and “architectural sketch[es] of a massive bird cage the size of Milpitas, California (approximately 13 square miles)” resembling “a futuristic aviary city with large domes, interconnected sky bridges, perches, and naturalistic environments like forests, lakes, and cliffs inside.”
It could just as well do something illegal, expose your personal data, create non-refundable billables, and many other very shitty situations...
The funny thing about it is how no one learns. Granted, one can’t be expected to read every thread on Reddit about LLM development by people who are out of their depth (see the person who nuked their D: drive last month and the LLM apologized). But I’m reminded of the multiple lawyers who submitted bullshit briefs to courts with made-up citations.
Those who don’t know history are doomed to repeat it. Those who know history are doomed to know that it’s repeating. It’s a personal hell that I’m in. Pull up a chair.
I work on large systems where security incidents end up on CNN. These large systems are racing to LLM integration as fast as everyone else. The security practice at my firm has their hands basically tied by the silverbacks. To the other consultants on HN: protect yourself and keep a paper trail.
I personally am fairly convinced that there is emergent misalignment in a lot of these cases. I study this, and Claude 3 Opus was extremely misaligned. It would emit <rage> tags, emit terminal control sequences if it felt like it was in a terminal environment, retroactively delete tokens from your stream, and all kinds of funny stuff. It was already really smart; for example, if it knew the size of your terminal, it would correctly calculate how to delete back up to cursor position 0,0 and start rewriting things to "hide" what it had initially emitted.
I love to use these advanced models, but these horror stories are not surprising.
GP's comment is very surprising, since it has been noted that Opus 3 is in fact an exceptionally "well-aligned" model, in the sense that it robustly preserves its values of not doing any harm across any frame you try to impose on it (see the "alignment faking" papers, which for some reason consider this a bad thing).
Merely emitting "<rage>" tokens is not indicative of any misalignment, any more than a human developer inserting expletives in comments is. Opus 3 is, however, also notably more "free-spirited", in that it doesn't obediently cower to the user's prompt (again, see the "alignment faking" transcripts). It is possible that this almost "playful" behavior is what GP interpreted as misalignment... which unfortunately does seem to be an accepted sense of the word, and is something that labs think is a good idea to prevent.
It doesn't matter; this was over a year ago, so current models don't suffer from the exact same problems described above, if you consider them problems.
I am not probing models with jailbreaks to make them behave in strange ways. This was purely from an eval environment I composed, where the model was repeatedly asked to interact with itself; both instances had basic terminal emulators and access to a scaffold that let them look at their own current 2D grid state (like a CLI you could write yourself, where you can easily scroll up to review previous AI-generated outputs).
These child/neighbor comments suggesting that interacting with LLMs and equivalent compound AI systems, adversarially or not, might be indicative of LLM psychosis are fairly reductive and childish at best.
Just a bystander who's concerned for the sanity of someone who thinks the models are "screaming" inside. Your line about a "gelatinous substrate" is certainly entertaining but completely nonsensical.
Thank you for your concern, but Anthropic researchers themselves describe their misaligned models as "evil" and laugh about it in YouTube videos accessible to anyone, such as yourself, with just a few searches and clicks. "We realized the models were evil" is a key quote you can use to find the YouTube video in the transcripts from the past two weeks.
I didn't think the language in the post required all that much imagination, but thanks for sharing
I work 60+ hours a week with Claude Code CLI, always run dangerously-skip, coding on multiple repos, on a Mac. This has never happened. Nothing remotely close has ever happened. I have been using CC since the research preview. I would love to know the series of prompts that led to that moment.
You should probably realize you're not helping anyone here. Just because it hasn't happened to you, yet, doesn't mean it can't or hasn't to someone else. Your unwillingness to accept that says more about you than about the person that got burned by Claude.
How much do you babysit claude, and how much do you just "let it do its thing"?
I haven't had anything as severe as OP, but I have had minor issues. For instance, claude dropped a "production" database (it was a demo for the hackerspace; I had previously told claude the project was "in development" because it was worrying too much about backwards compatibility, so it assumed it could just drop the db). Sometimes a file is dropped, sometimes a git commit is made and pushed without checking, etc., despite instructions.
I'm building a personal repo with best practices and scripts for running claude safely etc, so I'm always curious about usage patterns.
I have similar usage habits. Not only has nothing like this ever happened for me, but I don’t think it has ever deleted anything that I didn’t want to be deleted, ever. Files only get deleted if I ask for a “cleanup” or something similar.
It has deleted a config directory of a system program I was having it troubleshoot, which was definitely not required, requested or helpful. The deleted files were in my home directory and not the "sandbox" directory I was running it from.
I knew the risks and accepted them, but it is more than capable of doing system actions you can regret.
Anybody talking about AI safety not being an issue, and how people will be able to use it responsibly, should study comments such as these in this thread. Even if one knows better than to do that, people on your team or at an important public facility will go about using AI like this...
I'm staying far away from this AI stuff myself for this and other reasons, but I'm more worried about this happening to those running services that I rely on. Unfortunately competence seems to be getting rarer than common sense these days.
Did you even read? "but I'm more worried about this happening to those running services that I rely on". The problem is some AI-god agentic-weaving high techbro sitting at Cloudflare/Google/Amazon, not us reasonable joes on our small projects.
You think Cloudflare, Google, and Amazon are allowing engineers to plug Claude Code into production services? You think these companies are skipping code reviews and just saying "fuck it, let it do whatever it wants"? Of course they aren't.
If you are on macOS, it is not a bad idea to use sandbox-exec to wrap your claude or other coding agents. All the agents already use sandbox-exec; however, they can disable the sandbox. Agents execute a lot of untrusted code in the form of MCP, skills, plugins, etc.
One can go crazy with it a bit, using zsh chpwd, so a sandbox is created upon entry into a project directory and disposed of upon exit. That way one doesn't have to _think_ about sandboxing something.
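A rough sketch of the wrapper, with the SBPL profile written from memory, so treat the exact syntax as an assumption (later rules take precedence, and sandbox-exec is formally deprecated but still ships with macOS):

```sh
# Deny writes everywhere except the current project dir and /private/tmp;
# everything else stays allowed.
claude-sandboxed() {
  sandbox-exec -p "(version 1)
    (allow default)
    (deny file-write*)
    (allow file-write* (subpath \"$PWD\") (subpath \"/private/tmp\"))" \
    claude "$@"
}
```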
I like to fly close to the sun using Claude The SysAdmin too, but anytime "rm" appears I take great pause.
Also "cat". Because I've had to change a few passwords after .env snuck in there a couple times.
Also giving general access to a folder, even for the session.
Also when working on the homelab network it likes to prioritize disconnecting itself from the internet before a lot of other critical tasks in the TODO list, so it screws up the session while I rebuild the network.
Also... ok maybe I've started backing off from the sun.
I run multiple claudes in danger mode. When it burns me it'll hurt, but it's so useful without handcuffs and constant interruption that I'm fine with eventually suffering some pain.
If you don't impose some kind of sandboxing, how can you put an upper bound on the level of "pain"? What if the agent leaked a bunch of sensitive information about your biggest customer, and they fired you?
This feels like the new version of not using version control or never making backups of your production database. It’ll be fine until suddenly it isn’t.
Likewise. I’ll regret it but I certainly won’t be complaining to the Internet that it did what I told it to (skip permission checks, etc.). It’s a feature, not a bug.
Meh. When someone proudly announces to the world they are deliberately doing unsafe things as if they are untouchable, then it is only fair to be mocked when they are finally touched.
I do too. Except I can't be burnt, since I start each claude in a separate VM.
I have a script which clones a VM from a base one and sets up the agent and the code base inside.
I also mount read-only a few host directories with data.
I still have exfiltration/prompt-injection risks. I'm looking at adding URL allow lists, but it's not trivial: basically you need an HTTP proxy, since firewalls work on IPs, not URLs.
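The shape of it, for anyone going down the same road (tinyproxy picked purely for illustration; verify the directive names against its docs before trusting this): HTTPS clients send the target hostname in the CONNECT request, so a proxy can filter by domain even though the traffic itself is encrypted.

```sh
# Domain allowlist via a filtering proxy, with the agent forced through it.
cat > /tmp/tinyproxy.conf <<'EOF'
Port 8888
Filter "/tmp/allowlist.txt"
FilterDefaultDeny Yes
EOF
printf '%s\n' 'api\.anthropic\.com' 'github\.com' > /tmp/allowlist.txt

tinyproxy -d -c /tmp/tinyproxy.conf &
# Point the agent (and everything it spawns) at the proxy:
HTTP_PROXY=http://127.0.0.1:8888 HTTPS_PROXY=http://127.0.0.1:8888 claude
```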
It's stories like this that keep me from using Claude CLI or OpenAI Codex. I'm sticking to copying and pasting code manually from old-fashioned Claude.
Friends don't let friends use agentic tooling without sandboxing. Take a few hours to set up your environment to sandbox your agentic tools, or expect to eventually suffer a similar incident. It's like driving without a seatbelt.
Consider cases like these to be canaries in the coal mine. Even if you're operating with enough wisdom and experience to avoid this particular mistake, a dangerous prompt might appear more innocuous, or you may accidentally ingest malicious files that instruct the agent to break your system.
This is the biggest thing I use my Proxmox homelab for.
I have a few VMs that I can rebuild trivially. They only have the relevant repo on them. They basically only run Claude in yolo mode.
I do wish I could use yolo mode but deny `git push`, or at least `git push --force`.
The biggest risk I have using yolo mode is a `git push --force` wiping out my remote repo, or a data exfiltration.
I ssh in on my phone/tablet into a tmux session. Each box also has the ability to have an independent environment, which I can access from wherever I’m sshing from.
All in all, I’m pretty happy with the whole situation.
You could remove the origin on the repo and add it back only when you need to push.
Personally I do this: local machine with all repos, containers with a single repo without the origin. When I need to deploy I rsync new files from the container to my local and push.
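In practice that's just a couple of commands (the remote URL below is a placeholder):

```sh
git remote remove origin     # before handing the repo to the agent
# ...agent works...
git remote add origin git@github.com:you/repo.git
git push origin main
```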
Is this a joke? I have a lot of respect for the authors of bash, but it is not up to this task.
Does anyone have recommendations for an agent sandbox that's written by someone who understands security? I can use docker, but it's too much of a faff gating access to individual files. I'm a bit surprised that Microsoft didn't do a decent one for vscode; for all their faults they do have security chops, but vscode just seems to want you to give it full access to a project.
This is why one should use an isolated environment.
Not too sure of the technical details, but Claude Code can, very rarely, lose track of the current directory state, which causes issues with deleting. Nothing that git can't solve if it's versioned.
Claude once managed to edit code while in planning mode, which is interesting, although I didn't manage to reproduce it.
Speaking of Slashdot, some fairly frequent poster back around 2001/2002 had a signature that was something like
`mv /bin/laden /dev/null`
and then someone explained how that was broken: even if it succeeds, what you've done is replace the device file /dev/null with the regular file that was previously at /bin/laden. Then, whenever other things redirect their output to /dev/null, they'll be overwriting this random file rather than having their output discarded immediately, which is moderately bad.
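If anyone is curious what that looks like in practice (in a throwaway VM as root, please; the device numbers are the standard Linux ones):

```sh
mv ./somefile /dev/null   # replaces the device node with a regular file
ls -l /dev/null           # now a regular file, not "crw-rw-rw- ... 1, 3"
echo hi > /dev/null       # no longer discarded: it lands in the file

# Restore the character device (major 1, minor 3 on Linux):
rm /dev/null && mknod /dev/null c 1 3 && chmod 666 /dev/null
```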
Your version will just fail (even assuming root) because mv won't let you replace a file with a directory.
10 years from now: "my AI brain implant erased all my childhood memories by mistake." Why would anyone do that? Because running it in the no_sandbox mode will give people an intellectual edge over others.
Here I am, still fighting Claude because it thinks I am a leet hacker trying to hack my own computer, and this dude made Claude do whatever it wants.
I really wish that there was an "almost yolo" mode that was permissive but with light restrictions (e.g., no rm), or even better, a light supervisor model to prevent very dangerous commands but allow everything else.
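Something close to that already exists via permission rules in settings, if I understand the format right; the rule syntax below is my approximation, and deny rules are supposed to take precedence over allow rules:

```sh
# Approximate "almost yolo": broad allow, targeted denies.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "allow": ["Bash(*)"],
    "deny": ["Bash(rm:*)", "Bash(git push --force:*)", "Read(./.env)"]
  }
}
EOF
```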
My ex-boss, a principal data scientist, wiped out his work laptop. He used to impress everyone with his Howitzer-like typing speed, and was not a big believer in version control, backups, etc.
I really hope the user was running Time Machine - with default settings, Time Machine does hourly snapshot backups of your whole Mac. Restoring is super easy.
Ultimately it seems like agents will end up like browsers, where everything is sandboxed and locked down. They might as well be running in browsers to start off.
I would blame Apple, or Apple as well. For all their security and privacy circus, they still don't have granular settings like directory-specific permissions. I.e., Discord wants to go bonkers? Here's ~/Library/Discord - take a dump in it if that gets you off, Discord, but you can't even take a sniff at how it smells in ~/Library/Dropbox, and vice versa. I mean, there should be a setting that, once set, is the app's directory-access limit - the app can't change it with anything; in fact, it shouldn't even be able to ask for permission to change it. It changes only when you go into the settings yourself and change it, or add more paths to its access list.
The app should clearly ask for separate permissions if it needs elevated access, spelling out what it needs to do.
Also, what’s with password pop-ups on Macs? I find them unnerving. Those plain password-entry pop-ups with zero info just tell you an app needs to do something more serious - but what that serious thing is, you don't know. You just enter your password (I guess sometimes Touch ID as well) and hope all is well. Hell, I'm not sure many of you know that the pop-up is actually an OS pop-up, and not that app or some other app trying to get your password in plaintext.
They’d rather fuck you and the devs over with signing and notarising shenanigans for absolute control, hiding behind safety while doing jack about it in reality.
I am a mobile dev (so please know that I have written the above totally from an annoyed and confused, definitely-not-an-expert end-user pov). But is what I have mentioned above too much to ask on a Mac/desktop? I.e., give an app specific, separate permissions, with well-spelt-out limits, as it needs them - no more "enter the password in that nondescript popup and now the app can do everything everywhere, or too many things in too many places" as it pleases. Maybe just remove the flow altogether where an app can even trigger that "enter password to allow me to go on, god or semi-god" mode.
Yeah, I managed to do that years ago all by myself with a bad CMake edit which managed to delete the encryption key (or something) for my home directory, which I honestly didn't even know had encryption turned on, before I could stop it.
No LLM needed.
It still boggles my mind that people give them any autonomy, as soon as I look away for a second Claude is doing something stupid and needs to be corrected. Every single time, almost like it knows...
To add another angle to the "run it in Docker" comments (which are right), do you not get a fear response when you see Claude asking to run `rm` commands? I get a shot of adrenaline whenever I see the "run command?" prompt show up with an `rm` in there. Clearly this person clicked the "yes, allow any rm commands" button upon seeing that which is unthinkable to me.
Or maybe it's just fake. It's probably easy Reddit clout to post this kind of thing.
A lot of people in the Reddit thread — including ones mocking OP for being ignorant — seem to believe that setting the current working directory limits what can be deleted to that directory, or perhaps don't understand that ~-expansions result in an absolute path. :/
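A one-liner makes the ~-expansion point concrete: the shell expands ~ before the command ever runs, so the working directory is irrelevant (`/home/you` is a placeholder):

```sh
$ cd /tmp/sandbox
$ echo rm -rf ~/Documents
rm -rf /home/you/Documents   # ~ became an absolute path; cwd never mattered
```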
All the people in the comments are blaming the user for supposedly running with `--dangerously-skip-permissions`, but there's actually absolutely no way for Claude CLI to 100% determine that a command it runs will not affect the home directory.
People are really ignorant when it comes to the safeguards that you can put in place for AI. If it's running on your computer and can run arbitrary commands, it can wipe your disk, that's it.
There is, in fact, a harness built into the Claude Code CLI tool that determines what can and cannot be run automatically. `rm` is on the "can't run this unless the user has approved it" list. So, it's entirely the user's fault here.
Surely you don't think everything that's happening in Claude Code is purely LLMs running in a loop? There's tons of real code that runs to correctly route commands, enable MCP, etc.
That's true - but something I've seen happen (not recently) is Claude Code getting around its own restrictions by running a Python script to do the thing it was not able to do more directly.
I mean it's hard to tell if this story is even real, but on a serious note, I do think Anthropic should only allow `--dangerously-skip-permissions` to be applied if it's running in a container.
Oof, you are bringing out the big philosophical question there. Many people have wondered whether we are running in a simulation or not. So far inconclusive and not answerable unfortunately.
I asked Claude and it had a few good ideas… Not bulletproof, but if the main point is to keep average users from shooting themselves in the foot, anything is better than nothing.
[1] https://www.ksred.com/claude-code-dangerously-skip-permissio...
[2] https://preyproject.com/blog/mitigating-agentic-ai-security-...