When they spit out these subtle bugs, are you prompting the LLM to watch out for that particular bug? I wonder if it just needs a bit more guidance in more explicit terms.
At a certain point it becomes more work to prompt the LLM with each and every edge case than it is to just write the dang code.
I work out what the edge cases are by writing and rewriting the code. It's in the process of shaping it that I see where things might go wrong. If an LLM can't do that on its own, it isn't of much value for anything complicated.