You can’t (practically) unit test LLM responses, at least not in the traditional sense. Instead, you do runtime validation with a technique called “LLM as judge.”
This involves having another prompt, and possibly another model, evaluate the quality of the first response. Your code then retries in a loop and raises an alert if the response keeps failing the check.
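Here's a minimal sketch of that loop. The `generate` and `judge` functions are hypothetical stand-ins for your actual model calls (any client and models will do), and the PASS/FAIL judge prompt is just one illustrative way to make the verdict easy to parse:

```python
MAX_ATTEMPTS = 3

JUDGE_PROMPT = (
    "You are a strict reviewer. Reply with exactly PASS or FAIL.\n"
    "Does the following answer fully and accurately address the question?\n\n"
    "Question: {question}\n\nAnswer: {answer}"
)

def generate(question: str) -> str:
    """Hypothetical call to the primary model. Wire up your own client here."""
    raise NotImplementedError

def judge(question: str, answer: str) -> bool:
    """Hypothetical call to the judge model; returns True on a PASS verdict."""
    # e.g. verdict = your_client(JUDGE_PROMPT.format(question=question, answer=answer))
    # return verdict.strip().upper().startswith("PASS")
    raise NotImplementedError

def answer_with_validation(question: str) -> str:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        answer = generate(question)
        if judge(question, answer):
            return answer
    # Repeated failures: raise so monitoring can alert a human,
    # rather than retrying forever or silently returning a bad answer.
    raise RuntimeError(f"Judge rejected all {MAX_ATTEMPTS} attempts for: {question!r}")
```

Using a separate prompt (or a stronger model) for the judge helps because verifying an answer is usually an easier task than producing one.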