
Looks interesting! How would you say it compares to Microsoft's TypeChat (beyond the obvious Python/TypeScript difference)?

https://microsoft.github.io/TypeChat/blog/introducing-typech...




Thanks for bringing this library to my attention! From my understanding, TypeChat proceeds by (1) generating an output, (2) attempting validation, (3) if validation fails, calling the LLM again to fix the output, and so on.

Our method, on the other hand, guarantees that the output will follow the JSON schema's specification. No need to call the LLM several times.
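To make the contrast concrete, here is a toy sketch (not the library's actual API, and with a hand-rolled validity check in place of a real JSON-schema compiler): instead of generating freely and retrying on validation failure, invalid tokens are masked out at every step, so the finished output is valid by construction.

```python
import random

# Toy vocabulary; the spec here is "one or more digits, then <eos>".
vocab = ["0", "1", "7", "cat", "<eos>"]

def allowed(output, token):
    # A token is allowed only if appending it keeps the output a valid
    # prefix of the spec: digits anywhere, <eos> only after >= 1 digit.
    if token == "<eos>":
        return len(output) > 0
    return token.isdigit()

def generate(max_steps=5):
    random.seed(0)
    out = ""
    for _ in range(max_steps):
        mask = [t for t in vocab if allowed(out, t)]  # invalid tokens get zero probability
        tok = random.choice(mask)  # stands in for sampling the masked LLM distribution
        if tok == "<eos>":
            break
        out += tok
    return out

result = generate()
assert result.isdigit()  # valid by construction -- no retry loop needed
```

The guarantee comes from the mask, not from the model cooperating: even a model that prefers "cat" can never emit it.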


There's also https://lmql.ai/


LMQL (and guidance https://github.com/guidance-ai/guidance) are much less efficient. They loop over the entire vocabulary at each step; we only do it once at initialization.


Does looping over the vocabulary add much overhead to the tok/s? I imagine they're just checking if the input is in a set, and usually there are only ~30k tokens. That's somewhat intensive, but inference on the neural net feels like it'd take longer.


They’re checking regex partial matches for each possible completion, which is indeed intensive. You can look at Figure 2 in our paper (link in original post) for a simple comparison with MS guidance which shows the difference.
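A rough sketch of why that scan is costly (this is an illustration, not guidance's real implementation, and it uses `fullmatch` on a toy pattern as a stand-in for a true partial-match test): every decoding step tests every vocabulary token against the pattern, so the work is O(steps × vocab_size) regex checks.

```python
import re

# Toy pattern and vocabulary; real vocabularies have ~30k-50k tokens.
pattern = re.compile(r"[0-9]+")
vocab = ["0", "12", "9", "ab", "x", "<eos>"]

checks = 0

def valid_continuations(prefix):
    global checks
    ok = []
    for tok in vocab:          # full vocabulary scan, repeated every step
        checks += 1
        if pattern.fullmatch(prefix + tok):
            ok.append(tok)
    return ok

# Three decoding steps => 3 * len(vocab) pattern checks.
prefix = ""
for _ in range(3):
    prefix += valid_continuations(prefix)[0]

assert checks == 3 * len(vocab)
```

With a 30k-token vocabulary and a few hundred generated tokens, that is millions of pattern checks per completion, which is the overhead being compared in the paper's Figure 2.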


TypeChat: let's try really hard to convince the model to make the highest-scoring tokens follow the grammar we want.

Guidance (and this project?): Let's not even bother with trying to convince the model; instead, we'll only sample from the set of tokens that are guaranteed to be correct for the grammar we want to emit.


Yeah, and our addition to all that is to almost completely remove the cost of determining the next valid tokens on each step.
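A toy version of that idea as I understand it (not the paper's actual data structures): walk each vocabulary token through a small DFA once at initialization, building a map from DFA state to the tokens allowed in that state. Decoding then costs one dictionary lookup per step instead of a vocabulary scan.

```python
# Hand-written DFA for "one or more digits": state 0 = start, state 1 = accepting.
def step(state, char):
    return 1 if char.isdigit() else None  # None = dead state

def walk(state, token):
    # Run a whole (multi-character) token through the DFA.
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

vocab = ["0", "12", "9", "ab", "x"]

# One-time index built at initialization: for each DFA state,
# which tokens keep the match alive?
index = {
    s: [t for t in vocab if walk(s, t) is not None]
    for s in (0, 1)
}

# At decode time, the allowed-token set is a single lookup, independent
# of vocabulary size.
assert index[0] == ["0", "12", "9"]
assert index[1] == ["0", "12", "9"]
```

The vocabulary-wide work happens exactly once, when `index` is built; the per-step cost during generation is just `index[state]`.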



