The interesting thing about this audit isn't the specific bugs. It's what they reveal about the nature of AI-generated code.
A human developer who ships a 0-byte AVIF to production is being careless. An AI that does it simply doesn't have the concept of "this failed." It produced a file. The file exists. Done.
Same with the test harnesses in production, the 78 unused Stimulus controllers, the logo downloaded 8 times. None of these are hard problems. Any mid-level developer would catch them in review. But that's the point: there was no review proportional to the output.
This is the real problem with measuring AI coding by LOC/day. Lines of code was always a bad metric. Making it 100x easier to produce didn't make it a good one. It made it 100x more dangerous.
What's actually happening at 37K LOC/day is you've mass-produced decisions nobody evaluated. Some percentage of those decisions are wrong in ways that work fine locally but fail in production. And you won't find them with tests, because the AI wrote those too.
Hi HN, we built Clawsec as a security layer for OpenClaw.ai (openclaw.ai).
The problem: AI agents are getting good enough to run shell commands, query databases, and manage infrastructure autonomously. But one hallucinated rm -rf / or a prompt injection that exfiltrates your .env can do real damage.
Clawsec intercepts agent actions before execution and blocks anything matching its rule engine. It covers destructive filesystem ops, database drops, credential access, network exfiltration, and privilege escalation. No sandbox, no VM. It runs inline as a plugin.
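The post doesn't describe Clawsec's actual rule format or plugin API, so what follows is only a hypothetical Python sketch of the general idea: a pre-execution hook checks each proposed shell command against pattern rules, one per threat category listed above, and refuses to run it on a match. The names `Rule`, `RULES`, `evaluate`, and `guarded_execute`, and the regexes themselves, are illustrative assumptions, not Clawsec code.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    category: str
    pattern: "re.Pattern[str]"


# Hypothetical, deliberately naive rules: one per threat category in the post.
RULES = [
    # rm with a recursive flag aimed at the filesystem root (e.g. "rm -rf /").
    Rule("destructive_fs", re.compile(r"\brm\s+-\S*r\S*\s+/(?:\s|\*|$)")),
    # SQL statements that drop whole tables, databases, or schemas.
    Rule("db_drop", re.compile(r"\bDROP\s+(?:TABLE|DATABASE|SCHEMA)\b", re.IGNORECASE)),
    # Reads of common secret locations.
    Rule("credential_access", re.compile(r"\.env\b|\.ssh/|id_rsa|\.aws/credentials")),
    # curl/wget invocations that send data out.
    Rule("network_exfil", re.compile(
        r"\b(?:curl|wget)\b.*(?:--data|--upload-file|--post-file|\s-d\s|\s-T\s)")),
    # sudo at the start of a command, or setting the setuid bit.
    Rule("privilege_escalation", re.compile(
        r"(?:^|[;&|]\s*)sudo\b|\bchmod\s+(?:u\+s|4[0-7]{3})\b")),
]


def evaluate(command: str) -> Optional[str]:
    """Return the category of the first rule the command matches, or None if allowed."""
    for rule in RULES:
        if rule.pattern.search(command):
            return rule.category
    return None


def guarded_execute(command: str, execute: Callable[[str], str]) -> str:
    """Call execute(command) only if no rule blocks it; otherwise raise PermissionError."""
    category = evaluate(command)
    if category is not None:
        raise PermissionError(f"blocked ({category}): {command}")
    return execute(command)
```

For example, `evaluate("rm -rf /tmp/build")` returns `None`, while `evaluate("sudo rm -rf /var/log")` returns `"privilege_escalation"`. Matching raw command strings like this is easy to evade with quoting, shell variables, or encoded payloads, so a real rule engine needs broader coverage than a handful of regexes.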
Install: openclaw plugins install clawsec
It's fully open source (MIT). We'd love feedback on the rule coverage and what threat categories we're missing.
Wow, to be honest I didn't realize this would be so amazing. I'm going to tell all my co-workers about it, and I upvoted you on Product Hunt too: https://www.producthunt.com/posts/devknox
It will certainly be non-blocking. But the primary libraries we use to scan the apps are written in Python. And we love Python. So we went with Python.
Thanks. Public beta is about 1 week away. We're smoothing out the rough edges right now and trying to add some more useful stuff on top of what you see in the video.
The bottleneck in software was never typing.