I "fixed" this for myself with tweakcc, which lets you patch the system prompts. I changed the malware section to just say "watch out for malware," and it stopped being unaligned.
They really should hand off read() tool calls to a lean cybersecurity model to identify if it's malware (separately from the main context), then take appropriate action.
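The hand-off could look something like this: a hypothetical wrapper around the read() tool that sends file contents to a separate scanner before anything reaches the main agent context. This is only a sketch of the idea; `classify_malware` here is a toy regex heuristic standing in for a call to a real lean cybersecurity model, and the `guarded_read` shape is my own invention, not any actual Claude Code API.

```python
import re

# Toy stand-in for a separate, lean cybersecurity model. In a real system
# this would be an out-of-band model call, invoked outside the main
# agent context; the regex is only here so the sketch runs.
SUSPICIOUS = re.compile(r"(eval\(base64|powershell -enc|rm -rf /|keylogger)", re.I)

def classify_malware(text: str) -> bool:
    """Return True if the content looks malicious (placeholder classifier)."""
    return bool(SUSPICIOUS.search(text))

def guarded_read(contents: str) -> dict:
    """Wrap a read() tool result: scan it out-of-band, then either pass the
    content through cleanly or swap in a refusal marker. The main context
    never sees the repeated 'is this malware?' boilerplate either way."""
    if classify_malware(contents):
        return {"ok": False, "content": None, "reason": "flagged as possible malware"}
    return {"ok": True, "content": contents, "reason": None}
```

The point is that the malware judgment happens once, in isolation, and the main context only receives a clean result or a single refusal, rather than dozens of warning snippets accumulating across a conversation.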
I'm fascinated that Anthropic employees, who are supposed to be the LLM experts, are using tricks like these which go against how LLMs seem to work.
The key example for me was the "malware" tool-call section, which includes a snippet intended to mean "if it's malware, refuse to edit the file". But because it appears dozens of times in a conversation, the LLM eventually gets confused and refuses to edit files that aren't malware.
I've resorted to using tweakcc to patch many of these well-intentioned sections and re-work them to avoid LLM pitfalls.
Using tweakcc I can see the system prompt is supposed to mean "if it's malware, refuse to improve or augment the code". But with all the malware noise, it misreads the instruction as "don't improve or augment after reading".
I thought this was integral to LLM context design. LLMs can't prompt their way to controls like this. I'm surprised they took such a hard-headed approach to try to manage cybersecurity risks.
Sometimes it feels like as developers we live in a bubble. Don't most jobs endanger human development? I can't help but think about all the billions of factory, food service, and assembly line type jobs. Do these not threaten "human development"? My cynical take would be that all AI endangers is "white collar" work.
I think you're not wrong and I also think the author is not wrong -- and this just may be how technology/civilization/humans are going to change inevitably?
For example, a possible trajectory might be that many years in the future, because human thinking has degraded due to AI-assisted cognition, most people will get a chip implant and AI assistance becomes integrated with the brain. Basically the same pattern as most everything else: technological augments solving for the new reality. I'm not saying this will happen; it's just one possible outcome.
I just discovered Pi Coding Agent and found that its lean system prompt + a tuned CLAUDE.md brought back a lot of the intelligence that Opus seemed to lose over the last month.
Sucks to be pushed back to Claude Code with its opaque system behavior and inconsistency. I bet many would rather pay more for stability than pay less to gamble on model intelligence.
We use Pi at work (where we pay per token) and I’d love to use it personally too. From what I’ve read, nobody has been banned for using Pi yet… I wonder if Anthropic minds this much as long as it’s still human usage, or if they’re mostly focused on stamping out the autonomous harnesses. Unfortunately Pi is also what OpenClaw uses so it could easily get swept up in the enforcement attention.
Or maybe I’ll just get a Codex subscription instead. OpenAI has semi-officially blessed usage of third party harnesses, right?
It appears that OpenAI has blessed third party harnesses. I know they officially support OpenCode and they have this on their developer portal:
"Developers should code in the tools they prefer, whether that's Codex, OpenCode, Cline, pi, OpenClaw, or something else, and this program supports that work."
Obviously, the context is that OpenAI is telling open source developers who are using free subscriptions/tokens from the Codex for Open Source program that they can use any harness they want. But it would be strange for that to not extend to paying subscribers.
They have, but they also just announced this week that for business and enterprise plans, they're switching Codex from quotas to token-usage-based pricing, and I would expect that to eventually propagate to all their plans for all the same reasons.
I’d be surprised if that propagated to personal subscription plans, simply because it would put them at a huge competitive disadvantage against Anthropic, which they’ve already signaled they care about by saying they allow third-party harnesses. But I wouldn’t be surprised if they required third-party harnesses to use per-token billing, since that’d put them on par with Anthropic.
This is one of the things that GitHub Spec Kit solves for me. The specify.plan step launches code exploration agents and builds the latest data model, migrations, etc. for itself. It really reduces the need to document stuff when the agent self-discovers what the codebase needs.
Give Claude sqlite/supabase MCP, GitHub CLI, Linear CLI, Chrome or launch.json and it can really autonomously solve this.
Who else struggles with both sides of this? My engineer side values curiosity, brain power, and artisanship. My capitalist side says it's always the product, not the process. My formula is something like this: product = money, process = happiness, money != happiness, no money = unhappiness.
I think the optimal solution is min/maxing this thing. Find the AI process that minimizes unhappiness, and maximizes money.
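Half-jokingly, the min/max framing above can be written down as a toy objective. Everything here is illustrative: the `money_floor` parameter and the candidate "processes" with their (money, happiness) scores are made-up numbers, not a real model of anything.

```python
def utility(money: float, happiness: float, money_floor: float = 1.0) -> float:
    """Toy objective for the min/max framing: money below a floor causes
    unhappiness, but beyond the floor more money doesn't buy happiness
    (money != happiness; no money = unhappiness)."""
    unhappiness = max(0.0, money_floor - money)
    return happiness - unhappiness

# Hypothetical candidate processes as (money, happiness) pairs.
processes = {
    "pure-craft": (0.5, 0.9),
    "pure-product": (2.0, 0.3),
    "ai-assisted-balance": (1.5, 0.7),
}
best = max(processes, key=lambda name: utility(*processes[name]))
```

With these made-up numbers the balanced process wins: it clears the money floor (so no unhappiness penalty) while keeping most of the process happiness.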
> My capitalist side says it's always the product not the process.
Your capitalist side needs to read some Deming. "Your system is perfectly tuned to produce the results that you are getting." Obviously, then, if you want better results, you need to improve your system.
Also, "the product" is ambiguous. Is it the overall product: how the product sits in the market, how the user interacts with it to achieve their goals, the manufacturability of the product, etc.? That is the Steve Jobs sort of focus on the product, and it is really more of a system (how the product relates to its user, environment, etc.). However, AI doesn't produce that product, nor does any individual engineer. If "the product" means "the result of a task", you don't want to optimize that. That's how you get Microsoft and enterprise products: nothing works well together, and using it is like cutting a steak with a spoon, but it has a truckload of features.
I definitely struggle with both sides, or maybe multiple sides. On the one hand most of my daily output at my job is coming from AI these days. On the other hand I find the explosion of AI-generated "writing" (and other forms of art) to be aesthetically abhorrent. And I've just recently started a ... weird sort of metaphysics / spirituality / but also AI related writing project, so the difference between creation with and without AI is in really sharp focus for me right now.
I wrote an article about this, but honestly I don't think I really captured the totality of my feelings. I really haven't decided where I land. I'm definitely using the tools for economic purposes, and I even have some "pure-fun" side project stuff where I'm getting value from it.
I did some napkin math the other day, and my kids, at half my size, probably hit the ground with half the stress that I do. They could certainly take more risks falling, with a 50% reduction in harm. The extra rotational energy from 70" vs 40" will do it.
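One way to redo that napkin math is with square-cube scaling. Under strong simplifying assumptions (geometric similarity, impact energy proportional to mass times fall height, stress as energy spread over tissue cross-section), the numbers come out even more in the kids' favor than 1/2:

```python
# Square-cube napkin math: if a body is geometrically scaled by factor s,
# mass ~ s^3 and fall height ~ s, so impact energy ~ s^4, while bone/tissue
# cross-section ~ s^2, giving stress ~ s^4 / s^2 = s^2.
# Very rough: real bodies aren't perfectly scaled copies, and impact
# dynamics (padding, reflexes, landing angle) matter a lot.
def stress_ratio(child_height_in: float, adult_height_in: float) -> float:
    s = child_height_in / adult_height_in
    return s ** 2

ratio = stress_ratio(40, 70)  # (40/70)^2, roughly a third of the stress
```

So by this crude model a 40" kid takes roughly a third of the stress of a 70" adult, not half; either way, the direction of the original estimate holds.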
I'm starting to think that for software it's "produce 2,000 loaves per month." I'm realizing now that software was supply-constrained and organizations had to be very strategic about which apps/UIs to build. Now everything and anything can be an app, so we can build more targeted frontends for all kinds of business units that would've been overlooked before.