Hey HN, I've been using GPT a lot lately in some side projects around data generation and benchmarking. During the course of prompt tuning I ended up with a pretty complicated request: the value I was looking for, plus an explanation, a criticism, etc. JSON was the most natural output format for this, but the results were often malformed, had wrong types, or were missing fields.
There's been some positive movement in this space, like with jsonformer (https://github.com/1rgs/jsonformer) the other day, but nothing that's plug and play with GPT.
This library consolidates the separate logic that I built across 5 different projects. It lets you prompt the model for how it should return fields, inject variable prompts, and handle common formatting errors, then cast the result to a pydantic model for type hinting and validation in your IDE. If you're able to play around with it, let me know what you think.
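The overall flow described above (parse the model's output, recover from common formatting noise, then validate fields and types) can be sketched with the standard library alone. The library itself casts to pydantic; this stand-in just checks fields by hand, and the `Judgment` schema and the fence-stripping step are illustrative assumptions, not the library's API:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Judgment:
    # Hypothetical schema: the value you asked for, plus commentary.
    value: str
    explanation: str
    criticism: str

def parse_response(raw: str) -> Judgment:
    # Strip a common formatting error: the model wrapping JSON
    # in a markdown code fence.
    raw = raw.strip()
    if raw.startswith("```"):
        raw = raw.strip("`").removeprefix("json").strip()
    data = json.loads(raw)
    # Validate presence and type of every expected field.
    for f in fields(Judgment):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        if not isinstance(data[f.name], f.type):
            raise TypeError(f"wrong type for field: {f.name}")
    return Judgment(**data)
```

With pydantic you'd get the same checks (and IDE type hints) for free by declaring `Judgment` as a `BaseModel`.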
What we've found useful in practice when dealing with similar problems:
- Use json5 instead of json when parsing. It allows trailing commas.
- Don't let it respond with true/false directly. Instead, ask it for a short sentence explaining whether the answer is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT reasons better this way, and the result is much more robust.
- For numerical scores, do a similar thing: ask GPT for a description, write a few example descriptions matching each point on your score scale, then use the small embedding model to find the best-matched example for each response and take its score. If you let GPT give you scores directly without explanation, 20% of the time it will give you nonsense.
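On the json5 point: the trailing-comma fix can be sketched in a few lines. This prefers the third-party `json5` package when it's installed, and otherwise falls back to stripping trailing commas with a regex (the regex is naive, so it would also touch commas inside string values, which is usually tolerable for short model outputs):

```python
import json
import re

def loads_tolerant(text: str):
    """Parse JSON that may contain trailing commas, as json5 allows."""
    try:
        import json5  # third-party: pip install json5
        return json5.loads(text)
    except ImportError:
        # Fallback: remove a comma that directly precedes } or ].
        cleaned = re.sub(r",\s*([}\]])", r"\1", text)
        return json.loads(cleaned)
```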
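The nearest-example trick for both the true/false and score bullets might look like this. A real setup would use sbert (`sentence-transformers`) for the embeddings; to keep the sketch self-contained, a toy bag-of-words cosine stands in for the embedding model, and the anchor sentences and 1/3/5 scale are made-up examples:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real sentence embedding (e.g. sbert's model.encode):
    # a bag-of-words count vector, enough to show the matching logic.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical anchor descriptions, one per score on the scale.
ANCHORS = {
    1: "the answer is completely wrong and irrelevant",
    3: "the answer is partially correct but misses details",
    5: "the answer is fully correct and well explained",
}

def score_from_description(description: str) -> int:
    # Take the score of the anchor example that best matches
    # the model's free-text description.
    vec = embed(description)
    return max(ANCHORS, key=lambda s: cosine(vec, embed(ANCHORS[s])))
```

The true/false case is the same idea with two anchors ("yes, this is true" / "no, this is false") instead of a numeric scale.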