I do see agents pop out tests that look like this occasionally:
it { expect(classroom).to have_many(:students) }
If I catch them I tell them not to and they remove it again, but a few do end up slipping through.
I'm not sure that they're particularly harmful any more though. It used to be that they added extra weight to your test suite, meaning that when you made changes you had to update pointless tests.
But if the agent is updating the pointless tests for you, I can afford a little bit of unnecessary testing bloat.
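For contrast, here's a rough sketch of the kind of behavioral test I'd rather see than a declarative association check — plain Ruby rather than RSpec so it stands alone, and the `Classroom#enroll` method is hypothetical, not from any real codebase:

```ruby
# Instead of asserting that the association exists
# (it { expect(classroom).to have_many(:students) }),
# exercise the behavior that depends on it.
class Classroom
  attr_reader :students

  def initialize
    @students = []
  end

  def enroll(student)
    # Enrolling the same student twice should not duplicate them.
    @students << student unless @students.include?(student)
  end
end

classroom = Classroom.new
classroom.enroll("Ada")
classroom.enroll("Ada")

raise "expected exactly one student" unless classroom.students == ["Ada"]
```

A test like this still fails if the underlying association is broken, but it also documents what the relationship is actually for.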
I hadn't heard that term before, is it widely used?
https://agentexperience.ax/ describes it as "the holistic experience AI agents have when interacting with a product, platform, or system", which feels to me like a different concept from figuring out patterns for effectively using coding agents as a software engineer.
Yeah, I think that's one of the biggest anti-patterns right now: dumping thousands of lines of agent-generated code on your team to review, which effectively delegates the real work to other people.
This is genuinely one of the most interesting questions right now. I don't have solid answers yet, and I'm very keen to learn what people are finding works.
If you accelerate the pace of code creation it inevitably creates bottlenecks elsewhere. Code review is by far the biggest of those right now.
There may be an argument for leaning less on code review. When code is expensive to produce and is likely to stay in production for many years it's obviously important to review it very carefully. If code is cheap and can be inexpensively replaced maybe we can lower our review standards?
But I don't want to lower my standards! I want the code I'm producing with coding agents to be better than the code I would produce without them.
There are some aspects of code review that you cannot skimp on. Things like coding standards may not matter as much, but security review will never be optional.
I've recently been wondering what we can learn from security teams at large companies. Once you have dozens or hundreds of teams shipping features at the same time - teams with varying levels of experience - you can no longer trust those teams not to make mistakes. I expect that the same strategies used by security teams at Facebook/Google-scale organizations could now be relevant to smaller organizations where coding agents are responsible for increasing amounts of code.
Generally though I think this is very much an unsolved problem. I hope to document the effective patterns for this as they emerge.
I think Martin Fowler's "Refactoring" might give a bit of insight here. One of my take-aways after reading that book is that the specific implementation of a function is not very important if you have tests. He argues that it can sometimes be easier to completely re-write a function than to take the time to understand it - as long as you can validate that your re-write performs the same way. This mindset lines up pretty closely with how I've been using LLMs.
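That mindset can be sketched in a few lines: a characterization check pins down the behavior, so a rewrite is acceptable as long as it matches the original on the cases you care about. The function names and cases here are illustrative, not from the book:

```ruby
# Original, convoluted implementation we'd rather not spend time understanding.
def total_v1(prices)
  sum = 0
  prices.each { |p| sum += p }
  sum
end

# Proposed rewrite (e.g. agent-generated). We don't need to read v1 closely;
# we only need to confirm the two agree on representative inputs.
def total_v2(prices)
  prices.sum
end

cases = [[], [5], [1, 2, 3], [10, -4]]
cases.each do |c|
  raise "rewrite diverges on #{c.inspect}" unless total_v1(c) == total_v2(c)
end
```

The comparison loop is doing the job of a characterization test: it validates that the rewrite performs the same way without requiring anyone to understand the old implementation.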
If that's true, then I would think the emphasis in code review should be more on test quality and verifying that the spec is captured accurately, and as you suggest, the actual implementation is less important.
Counter-point: developers who get used to not caring about function implementations are culturally also going to care less about test implementations, making this proposed ideal impossible.
> There may be an argument for leaning less on code review. When code is expensive to produce and is likely to stay in production for many years it's obviously important to review it very carefully. If code is cheap and can be inexpensively replaced maybe we can lower our review standards?
Agree with everything else you said except this. In my opinion, this assumes code becomes more like a consumable as code-production costs fall. But I don't think that's the case. Incorrect, but not visibly incorrect, code will sit in place for years.
> Agree with everything else you said except this.
Yeah, I'm not sure I agree with what I said there myself!
> Incorrect, but not visibly incorrect, code will sit in place for years.
If you let incorrect code sit in place for years I think that suggests a gap in your wider process somewhere.
I'm still trying to figure out what closing those gaps looks like.
The StrongDM pattern is interesting - having an ongoing swarm of testing agents which hammer away at a staging cluster trying different things and noting stuff that breaks. Effectively an agent-driven QA team.
I'm not going to add that to the guide until I've heard it working for other teams and experienced it myself though!
They won't have a decent response; this is the Internet, after all. I really enjoyed it, thanks for writing it, and I'll take a lot of it on board. I think in the future everyone will have their own software stack and AIs designed perfectly for them to do their work.
Maybe Qwen3.5-35B-A3B is that model? This comment reports good results: https://news.ycombinator.com/item?id=47249343#47249782
I need to put that through its paces.