The smaller and more focused the context, the higher the consistency of output, and the lower the chance of jank.
Fundamentally no different from giving instructions to a junior dev. Be specific -- point them to the right docs, distill the requirements, identify the relevant areas of the source -- and you get good output.
My last attempt at a workflow of agents was around the GPT-3.5-to-4 transition, and OpenAI's models at that point weren't good enough to produce consistently good output, and were slow to boot.
My team has taken the stance that getting consistently good output from LLMs is really an ETL exercise: acquire, aggregate, and transform the minimum relevant data needed for the output to reach the desired quality and depth, then let the LLM do its thing.
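For illustration only, here's a minimal sketch of what that ETL framing can look like in practice (the file paths, helper names, and keyword filter are hypothetical, not our actual pipeline):

```python
# Minimal sketch: treat context assembly as ETL before the LLM call.
# All paths and helpers below are illustrative placeholders.
from pathlib import Path

MAX_CONTEXT_CHARS = 12_000  # keep the prompt small and focused

def extract(paths: list[str]) -> list[str]:
    """Acquire only the source files and docs relevant to the task."""
    return [Path(p).read_text() for p in paths]

def transform(chunks: list[str], keywords: list[str]) -> str:
    """Distill: keep only paragraphs mentioning the task's keywords, trim to budget."""
    relevant = []
    for chunk in chunks:
        for para in chunk.split("\n\n"):
            if any(k.lower() in para.lower() for k in keywords):
                relevant.append(para)
    return "\n\n".join(relevant)[:MAX_CONTEXT_CHARS]

def load(context: str, task: str) -> str:
    """Assemble the final prompt; the actual LLM call is left abstract."""
    return f"Context:\n{context}\n\nTask:\n{task}"

# Usage: point the model at the distilled context, not the whole repo.
prompt = load(
    transform(
        extract(["docs/billing.md", "src/billing/invoice.py"]),  # example paths
        ["invoice", "proration"],
    ),
    "Add proration support to invoice generation; follow the patterns in the context.",
)
```

The point isn't the specific code, it's that the filtering and trimming happen deterministically before the model ever sees the prompt, so the LLM only operates on the minimum relevant slice.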