Why AI code fails differently: What I learned talking to 200 engineering teams

veunes · 2025-11-13T12:17:48 1763036268

The observation is very accurate, but the conclusion is incomplete. The "unwritten rules" problem is, first and foremost, a symptom of a weak engineering culture and a lack of documentation. If a rule is critical to system stability (like the async analytics), it shouldn't be "living in engineers' heads"

Instead of layering on another AI for validation, maybe code generation should be used as a catalyst to finally formalize these rules. Turn them into custom linting rules, architectural tests (like with ArchUnit), or just well-written documentation that a model can be fine-tuned on. Using AI as a crutch for bad processes is a dangerous path

GreenGames · 2025-11-12T15:45:37 1762962337

Super interesting take Paul. Curious btw, how are these teams actually encoding their “institutional knowledge” into constraints? Like is it some manual config or more like natural‑language rules that evolve with the codebase?

pomarie · 2025-11-12T15:50:28 1762962628

Good q! So it depends.

Some teams are using Claude or similar models in GitHub Actions, which automatically review PRs. The rules are basically natural language encoded in a YAML file that's committed in the codebase. Pretty lightweight to get started.

Other teams upgrade to dedicated tools like cubic. We have a feature where you can encode your rules either in our UI, or we're releasing a feature where you can write them directly in your codebase. We'll check them on every PR and leave comments when something violates a constraint.

The in-codebase approach is nice because the rules live next to the code they're protecting, so they evolve naturally as your system changes.

veunes · 2025-11-13T12:24:41 1763036681

The "in-codebase" approach is the right one, but a YAML file with plain text is a half-measure. The most reliable rule that "lives next to the code" is an architectural test. An ArchUnit test verifying that "all routes in /billing/* call requireAuth" is also code, it's versioned with the project, and it breaks the build deterministically That is a more robust engineering solution, unlike semantic text interpretation, which can fail

pomarie · 2025-11-12T15:41:49 1762962109

We're building something at cubic that helps with this. You write your constraints in plain English, and AI enforces them semantically on every PR.

If you're curious, you can check it out here: https://cubic.dev

Happy to answer any questions about what we've seen working (or not working) across different teams.