
This is exactly right. The mental model gap is the real risk for AI-first builders. The code works until it doesn't, and when it breaks you have no framework for understanding why.

One thing that helped us: externalize the structure that experienced developers carry in their heads. Practices like test-driven development or wheel-and-spoke file-size limits are the distilled judgment of decades of software engineering, but if you've never written code traditionally, you don't know they exist.

We formalized these into enforced workflows. What I found most exciting, from an educational standpoint, was the side effect: new team members and vibe coders working within those constraints started absorbing the patterns themselves. They learn why tests matter because the system won't let them skip tests, and they learn why file size matters because the system blocks oversized files and forces decomposition.
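To make that concrete, here's a minimal sketch of the file-size half of such a gate (the limit, paths, and names are invented for the example; the test half is the same idea wrapped around the test runner):

    # Hypothetical sketch: block work from being marked done while any
    # source file exceeds a size limit, which forces decomposition.
    import sys
    from pathlib import Path

    MAX_LINES = 300  # illustrative limit, not a universal rule

    def oversized(root="src"):
        return [
            p for p in Path(root).rglob("*.py")
            if len(p.read_text().splitlines()) > MAX_LINES
        ]

    if __name__ == "__main__":
        offenders = oversized()
        for p in offenders:
            print(f"Blocked: {p} exceeds {MAX_LINES} lines, decompose it.")
        sys.exit(1 if offenders else 0)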


This isn't a surprise at all. I sat down with the dev team at OpenAI during dev day last year and the biggest shocker to me: these "kids" are over here vibe coding the whole damn thing.

This is exactly why enforcement needs to be architectural. The "challenges around maintainability and scalability" your clients hit exist because their AI workflows had zero structural constraints. The output quality problem isn't the model, it's the lack of workflow infrastructure around it.

Is this not just “build a better prompt” in more words?

At what point do we realize that the best way to prompt is with formal language? I.e. a programming language?


No, the linters, test suite, and documentation in your codebase cannot be equated to “a better prompt”, except in the sense that all feedback of any kind is part of what the model uses to decide how to act.

A properly set up and maintained codebase is the core duty of a software engineer. Sounds like the great-grandparent comment’s client needed a software engineer.

What if LLMs, at the end of the day, are machines, so for now generally dumber than humans, and the best they can provide are at most statistically median implementations (and if 80% of the code out there is crap, the median will be low)?

Now that's a scary thought that basically goes against "1 trillion dollars can't be wrong".

Now, LLMs are probably great range extenders, but they're not wonder weapons.


Also, who is to say what is actually crap? Writing great code is completely dependent on context. An AI could be trained exclusively on the most beautiful, cleanest code in the world, yet if it chooses the wrong paradigm in the wrong context, it doesn't matter how beautiful that code is - it's still gonna be totally broken code.

We run agent teams (Navigator/Driver/Reviewer roles) on a 71K-line codebase. The trust problem is solved by not trusting the agents at all. You enforce externally. Python gates that block task completion until tests pass, acceptance criteria are verified, and architecture limits are met. The agents can't bypass enforcement mechanisms they can't touch. It's not about better prompts or more capable models. It's about infrastructure that makes "going off the rails" structurally impossible.
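For a stripped-down picture of what one of those gates can look like (everything here is illustrative: the acceptance checks are plain callables written by humans, the limits are made up), the key property is that it lives outside the agent's reach and returns the reasons a task can't be closed:

    # Illustrative completion gate, run by the orchestrator, never by the agent.
    import subprocess
    from dataclasses import dataclass, field
    from pathlib import Path

    @dataclass
    class GateResult:
        passed: bool
        reasons: list = field(default_factory=list)

    def completion_gate(acceptance_checks, max_file_lines=400, src="src"):
        reasons = []
        # 1. Test suite must pass.
        if subprocess.run(["pytest", "-q"]).returncode != 0:
            reasons.append("test suite failing")
        # 2. Acceptance criteria are human-written callables, not prompt text.
        for name, check in acceptance_checks.items():
            if not check():
                reasons.append(f"acceptance criterion not met: {name}")
        # 3. Architecture limits (file size here) are enforced mechanically.
        for p in Path(src).rglob("*.py"):
            if len(p.read_text().splitlines()) > max_file_lines:
                reasons.append(f"{p} exceeds {max_file_lines} lines")
        return GateResult(passed=not reasons, reasons=reasons)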

Which service or application?

State management. The agents lose track of what they already did, re-implement things, or contradict decisions from 20 minutes ago. You need external state that survives compaction because the agent can't be trusted to maintain its own.
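As a minimal sketch of what external state can mean in practice (the filename and schema here are invented for the example): an append-only ledger that the orchestrator writes and the agent only ever sees as a rendered summary, re-injected after every compaction:

    # Hypothetical task ledger owned by the orchestrator, not the agent.
    import json
    from datetime import datetime, timezone
    from pathlib import Path

    LEDGER = Path("run_state.json")  # invented filename

    def record(event, detail):
        entries = json.loads(LEDGER.read_text()) if LEDGER.exists() else []
        entries.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": event,   # e.g. "task_completed", "decision"
            "detail": detail,
        })
        LEDGER.write_text(json.dumps(entries, indent=2))

    def summary():
        """Rendered into the agent's context after every compaction."""
        entries = json.loads(LEDGER.read_text()) if LEDGER.exists() else []
        return "\n".join(f"- {e['event']}: {e['detail']}" for e in entries)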

Constraint adherence degrades over long chains. You can put rules in system prompts, but agents follow them for the first few steps, then gradually drift. Instructions are suggestions. The longer the chain, the more they're ignored.

Cost unpredictability is real but solvable.

Ultimately, these systems need external enforcement rather than internal instruction. Markdown rules or Jinja templates that the agent can read (and ignore) don't work at production scale. We ended up building Python enforcement gates that block task completion until acceptance criteria are verified, tests pass, and architecture limits are met. The core lesson: agents can't bypass what they don't control.
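Wiring that in looks roughly like this sketch (run_agent is a stand-in for whatever actually drives the agent; gate could be something like the completion_gate above): the agent claims it's done, the gate decides, and on failure the reasons go straight back into the next prompt. The agent has no handle on the gate itself.

    # Rough sketch of the enforcement loop around an untrusted agent.
    def enforce(task, run_agent, gate, max_attempts=5):
        feedback = ""
        for _ in range(max_attempts):
            run_agent(task, feedback)   # agent does its work, untrusted
            result = gate()             # external check the agent can't touch
            if result.passed:
                return True
            # Failure reasons become the next prompt; nothing is negotiable.
            feedback = "Task not accepted:\n" + "\n".join(result.reasons)
        return False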


Exactly. Agents drift fast: internal state just can’t be trusted over long chains, and prompt rules degrade immediately.

Curious: have you seen drift follow a pattern, like step count or constraint complexity?

We’ve tried hybrid setups: ephemeral agent state plus external validation gates. Cuts down rollbacks while keeping control tight.

Would love to hear if anyone else has experimented with something similar.

