Intuitively, a regex or JSON grammar has a much lower "semantic dimension" than the unconstrained output space today's LLMs allow. Maybe the observed performance gains result from that lower dimensionality.
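For what it's worth, here's a minimal, library-agnostic sketch of that intuition: constrained decoding masks the vocabulary at every step so only tokens that keep the partial output valid under the pattern can be sampled, which collapses the effective branching factor. The toy vocabulary, the `[0-9]+` pattern, and the `allowed_tokens` helper are all made up for illustration, not taken from any particular library.

```python
# Sketch: regex-constrained decoding as per-step vocabulary masking.
# Only tokens that keep the partial output a valid prefix of the
# pattern [0-9]+ remain sampleable; everything else is masked out.

VOCAB = ["0", "1", "7", "42", "cat", " the", "!", "9"]  # toy vocabulary

def is_valid_prefix(text: str) -> bool:
    # The empty string or any run of digits can still grow into a
    # match for [0-9]+; anything else is already a dead end.
    return text == "" or text.isdigit()

def allowed_tokens(prefix: str) -> list[str]:
    # Keep only tokens whose concatenation with the prefix stays valid.
    return [tok for tok in VOCAB if is_valid_prefix(prefix + tok)]

# The effective branching factor drops from 8 tokens to 5 at every step:
print(allowed_tokens(""))    # ['0', '1', '7', '42', '9']
print(allowed_tokens("4"))   # ['0', '1', '7', '42', '9']
```

Scaled up to a real tokenizer, that per-step shrinkage of the choice set is exactly the "lower dimensionality" I mean.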
That whole structured generation line of work looks promising. I hope someone else takes this and runs evaluations on other benchmarks. Curious to see if the results translate!
Agreed! While these results are very promising, there's still a lot to explore in this space.
In addition to the "prompt consistency" and "thought-control" ideas mentioned in the post, I'm definitely curious how well it performs on more complex structured data (things like codegen).