I had a prompt I used for this with just Claude Code:
Let's review <filepath or specific file> for architectural issues. Spawn 10 agents, create personas for them, have them review the _api.go and write their review to reviews/<persona>-review.md, then have each agent do a round robin response to 3 of the reviews of their choosing (based on the abstract at the beginning of each review) and write the response to response/<original file name>-<agent persona name>-response.md. Then we do rebuttals to the responses in rebuttals/<response file name>-rebuttal.md. Finally, each agent should launch agents to review the reviews, responses, and rebuttals to their review, and compile findings to findings/<original file name>-findings.md. Finally, have another agent compile the findings and write that to review-findings.md. Present a concise version of the findings here.
This works well with frontier models and even locally hosted models (last I used it was with Qwen 3.5).
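To make the file layout in that prompt easier to follow, here's a rough sketch of how the names chain from stage to stage. It assumes "<original file name>" means the review file's name without the .md extension, which is only one reading of the prompt. The function names and the two personas are made up for illustration.

```python
from pathlib import PurePosixPath


def review_path(persona: str) -> str:
    # Stage 1: each persona writes one review.
    return f"reviews/{persona}-review.md"


def response_path(review_file: str, responder: str) -> str:
    # Stage 2: a responder replies to someone else's review.
    # Assumption: "<original file name>" is the review file's stem.
    stem = PurePosixPath(review_file).stem
    return f"response/{stem}-{responder}-response.md"


def rebuttal_path(response_file: str) -> str:
    # Stage 3: one rebuttal per response, named after the response file.
    stem = PurePosixPath(response_file).stem
    return f"rebuttals/{stem}-rebuttal.md"


def findings_path(review_file: str) -> str:
    # Stage 4: findings compiled per original review.
    stem = PurePosixPath(review_file).stem
    return f"findings/{stem}-findings.md"


# Hypothetical personas, only to show the naming.
review = review_path("security-auditor")
response = response_path(review, "perf-engineer")
print(review)
print(response)
print(rebuttal_path(response))
print(findings_path(review))
```

The names get long quickly, but each file name carries its whole lineage, so you can tell from a rebuttal's name alone which review and which responder it belongs to.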
I'm new to using more than one agent in a flow, so forgive my ignorance here, but I have a few questions.
Do you review all the files that are generated to ensure there are no hallucinations? Or do you just review the last file of concise findings instead?
Is the intent here that the hallucinations will be countered by running through multiple agents, so that you end up with only the truth? Have you seen anything in the last version that was egregiously wrong?
I was worried about the cost, but if you are using locally hosted models, then I suppose you don't need to deal with that as much. Locally hosted models still have issues running commands locally and reaching out to the internet, right? So this is all just them running with the context of the file, without reference to the rest of the project?
> Do you review all the files that are generated to ensure there are no hallucinations? Or do you just review the last file of concise findings instead?
Sometimes, yes. Typically I'll read the final fusion doc, and then trace backwards if there is something that looks relevant. Sometimes I'll read all the abstracts as they come through.
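For a sense of scale, here's a back-of-the-envelope count of how many files one run of that prompt produces. It assumes exactly one rebuttal per response and one findings file per review, which is how I read the prompt, not a guarantee of what the agents actually write. The function name is just for illustration.

```python
def artifact_counts(agents: int = 10, responses_per_agent: int = 3) -> dict[str, int]:
    # One review per agent.
    reviews = agents
    # Each agent responds to a fixed number of other reviews.
    responses = agents * responses_per_agent
    # Assumption: exactly one rebuttal per response.
    rebuttals = responses
    # Assumption: one findings file per original review.
    findings = reviews
    counts = {
        "reviews": reviews,
        "responses": responses,
        "rebuttals": rebuttals,
        "findings": findings,
        "final": 1,  # the single compiled review-findings.md
    }
    counts["total"] = sum(counts.values())
    return counts


print(artifact_counts())
```

With the defaults from the prompt (10 agents, 3 responses each) that comes to 81 files, which is why starting from the final doc and tracing backwards is more practical than reading everything.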
> Is the intent here that the hallucinations will be countered by running through multiple agents, so that you end up with only the truth? Have you seen anything in the last version that was egregiously wrong?
The intent isn't so much to avoid hallucinations; rather, I was trying to get unique, domain-specific insight. My read-through of the final doc is a 'pick and choose', where I weed out what isn't relevant and keep what is. I haven't seen anything way out of whack, as agents will typically check each other and call each other out in the review/rebuttal stage.
> So this is all just them running with the context of the file, without reference to the rest of the project?
This is run in an agent harness locally, so each reviewer has access to the whole project. In the review/rebuttal stage agents rarely re-read files, though, unless one of them is particularly aggressive and is going deep on something (GPT 5 series will sometimes do this to make a point).
> Thanks for any responses to this.
No worries. This is pretty easy to try, even with something you don't own. You can run it in any harness that allows spawning sub-agents.
If there's a lot of interest, I could spin up a simple web app that does this just so folks can see it churn on a target git repo or project files or whatever.
The idea is pretty simple: identity drives the focus and outcomes of each of the 'agents'. The review/rebuttal stage filters out stuff that likely doesn't matter in the grand scheme (you'll find reviewers get pedantic and take hard lines on things that ultimately don't matter very much).
This is based on earlier work from late '25 where people were doing similar things. My added bit is the 'unique identities'. What I found was that the root agent typically picks identities relevant to the project, and so you get relevant 'data' or 'views' from a variety of angles.