The framing assumes the agent can reliably represent its principal, and I'm not convinced that holds even if you get everything else right.
The problem is that the agent itself is the attack surface. An adversary who controls the communication channel can manipulate what the agent believes about who it's talking to, which means anything it holds, its list of authorized actions, a shared secret you gave it, whatever, can be exfiltrated in ways the agent can't detect because the manipulation happens below the layer where it can reason about trust.
Open harnesses and open standards help but they don't close this gap, because the thing you need to trust, the agent's own judgment about its principal, is exactly what gets compromised. The trust chain has to go below software entirely: hardware attestation, signed commands with keys the agent can verify but never access. That's really an OS problem dressed up as an agent architecture problem.
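To make "keys the agent can verify but never access" concrete, here is a minimal sketch rather than a real design. It uses a Lamport one-time signature only because that can be written with the Python standard library; a real system would more likely use something like Ed25519 with a hardware-backed key. All names (`keygen`, `sign`, `verify`, the example commands) are hypothetical. It shows only the split between signing and verifying, not hardware attestation or where the check would have to run.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list[int]:
    # 256 bits of the SHA-256 digest of the message, most significant bit first.
    digest = _h(message)
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def keygen():
    # Private key: 256 pairs of random secrets. It stays with the principal.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # Public key: the hashes of those secrets. This is all the agent ever holds.
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(private, command: bytes) -> list[bytes]:
    # Reveal one secret per digest bit. Assumption: each key signs one command only.
    return [private[i][bit] for i, bit in enumerate(_bits(command))]


def verify(public, command: bytes, signature) -> bool:
    # The verifier needs only the public hashes, never the private secrets.
    if len(signature) != 256:
        return False
    bits = _bits(command)
    return all(_h(sig) == public[i][bit] for i, (sig, bit) in enumerate(zip(signature, bits)))


principal_private, agent_public = keygen()
command = b"deploy:staging"
signature = sign(principal_private, command)

accepted = verify(agent_public, command, signature)
tampered = verify(agent_public, b"deploy:production", signature)
```

The agent side can check that a command came from the key holder, but there is nothing on the agent side to exfiltrate that would let an attacker mint new commands. It does not solve the deeper problem above: if the verification itself runs somewhere the adversary controls, the check can simply be skipped.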
Design does become the bottleneck. I think this points to a step that is implied but still worth naming explicitly: design isn't just planning up front. It is a loop where you see the output, check whether it is directionally right, and refine.
While the agents can generate, they can't exercise that judgment: they can't see nuance, and they can't really walk their actions back in a "that's not quite what I meant" sense.
Exercising judgment is where design actually happens. It is iterative, in response to something concrete. The bottleneck isn't just thinking ahead; it's the judgment call when you see the result. It's the walking back as well as the thinking forward.
If you have a few minutes, I invite you to check out what we're doing over at Open Horizon Labs; it's exactly the type of thinking we have around the current state of the world. Apologies, I feel like I'm stalking you in the comments, but what you're saying absolutely resonates with what I've been thinking and what I've been trying to build, and it's refreshing to finally feel that I'm not insane.
https://github.com/open-horizon-labs/superego is probably the most useful tool we have, but I'm hoping that we can package it and bring it to people, as it does make all these LLMs orders of magnitude more useful.
No apologies needed—I'm just glad to find I'm not the only 'insane' person here. It's easy to feel that way when obsessing over these problems, so knowing my ideas resonate with what you're building at superego is a huge relief.
I’m diving into your repo now. Please keep me posted on your progress or any new thoughts—I'd love to hear them.
As for "proving it statistically": you're looking for utility, but I'm defining legitimacy. A constitution isn't a tool designed to statistically improve a metric; it is a framework to ensure that the system remains aligned with human agency. I am not building an LLM optimization plugin; I am building a benchmark for human-AI co-evolution.
I am 100% in agreement. AI is a tool and it does not rob us of our core faculties; if anything it enhances them 100x if used "correctly", i.e. intentionally and with judgment.
I will borrow your argument for JTP since it deals with exactly the kind of superficial objections I'm used to seeing everywhere these days, and that don't move the discussion in any meaningful way.
I’m thrilled to hear the JTP framework resonates with you. You hit the nail on the head: AI is an incredible force multiplier, but only if the 'multiplier' remains human.
Please, by all means, use the JTP argument. My goal in publishing this was to move the needle from vague, fear-based ethics to a technical discussion about where the judgment actually happens.
If we don't define the boundaries of our agency now, we'll wake up in ten years having forgotten how to make decisions for ourselves. I’d love to see how you apply these principles in your own field. Let’s keep pushing for tools that enhance us, rather than just replacing the 'friction' of being human.
While I still have to figure out who watches the watchers, they are pretty reliable given the constrained mandate they have, and the base model actually (usually) pays attention to the feedback.
Thanks! I'm glad you feel the same. Unfortunately, the thread was just flagged, so I've messaged the mods to appeal it. I hope it gets restored so we can continue the debate. Let’s see what happens!
LLMs find the center of the distribution: the typical pattern, the median opinion. Tailwind was an edge bet. It required metis, the tacit competence to know the consensus (semantic classes, separation of concerns, the cascade) was a local maximum worth escaping. That judgment, knowing what the center is wrong about, doesn't emerge from interpolation. It emerges from the recognition loop where you try something, feel "that's not quite it," and refine.
The bottleneck was never typing. It was judgment. Tailwind is crystallized judgment. AI can consume it endlessly. Producing the next version requires the loop that creates metis, and that loop isn't in the training data.
Steve Yegge is building awesome things in this space, but I've found them too heavy. I started using bd when it was small, but now it's trying to do too much IMO, so I made a clone tailored to my use case: https://github.com/cloud-atlas-ai/ba
durch - just starred this repo! Looking forward to testing it out as I learn how to build with multiple agents.
I'm just starting out with building with Claude - after a friend made this post he sent me a Steve Yegge interview (https://m.youtube.com/watch?v=zuJyJP517Uw). Absolutely loved it. I come from an electrical/nuclear engineering background - Yegge reminds me of the cool senior engineer who's young at heart and open to change.
The gap I wanted to fill: when Claude is genuinely uncertain ("JWT or sessions?" "Breaking change or not?"), it either guesses wrong or punts to the PR description where you can't easily respond.
Built a Telegram bot that intercepts Claude's AskUserQuestion calls via a hook, sends me an inline keyboard, injects my answer back into the session. Claude keeps working, PR still happens—but I can unblock it from my phone in 5 seconds instead of rejecting a PR based on a wrong guess.
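For anyone curious what the Telegram side of this could look like, here is a rough sketch, not the actual bot code. It only builds the JSON body for the Bot API's `sendMessage` method with an `inline_keyboard` in `reply_markup`, and maps a button press back to an answer. The function names, the `question_id` scheme, and the chat id are made up for illustration; the hook wiring and the injection back into the session are not shown.

```python
def build_question_message(chat_id: int, question: str, options: list[str], question_id: str) -> dict:
    """Build a sendMessage payload with one inline button per option (hypothetical helper)."""
    keyboard = []
    for index, option in enumerate(options):
        # Assumed scheme: encode which question and which option were chosen.
        callback_data = f"{question_id}:{index}"
        # Telegram limits callback_data to 64 bytes.
        if len(callback_data.encode("utf-8")) > 64:
            raise ValueError("callback_data must fit in 64 bytes")
        keyboard.append([{"text": option, "callback_data": callback_data}])
    return {
        "chat_id": chat_id,
        "text": question,
        "reply_markup": {"inline_keyboard": keyboard},
    }


def parse_callback(callback_data: str, options: list[str]) -> str:
    """Map the callback_data from a button press back to the chosen option text."""
    _question_id, index = callback_data.rsplit(":", 1)
    return options[int(index)]


options = ["JWT", "Sessions"]
payload = build_question_message(12345, "JWT or sessions?", options, "q1")
answer = parse_callback("q1:1", options)
```

The payload would be POSTed to the bot's `sendMessage` endpoint, and `answer` is what would get handed back to the waiting session.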
Works in tandem with a bunch of other LLM enhancers I've built; they're linked in the README of that repo.
Cassie.fm, the newest and jankiest entry into the crowded uptime monitoring space, now offers status pages and a public status API in its already generous free tier.