
Looks interesting! How would you say it compares to Microsoft's TypeChat (beyond the obvious Python/TypeScript difference)?

https://microsoft.github.io/TypeChat/blog/introducing-typech...




Thanks for bringing this library to my attention! From my understanding, TypeChat proceeds by (1) generating an output, (2) attempting validation, (3) if validation fails, calling the LLM again to fix the output, and so on.

Our method, on the other hand, guarantees that the output will follow the JSON schema's specification. No need to call the LLM several times.
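To make the contrast concrete, here is a toy sketch (not the library's actual API, and with a hand-rolled validity check in place of a real JSON-schema compiler): instead of generating freely and retrying on validation failure, invalid tokens are masked out at every step, so the finished output is valid by construction.

```python
import random

# Toy vocabulary; the spec here is "one or more digits, then <eos>".
vocab = ["0", "1", "7", "cat", "<eos>"]

def allowed(output, token):
    # A token is allowed only if appending it keeps the output a valid
    # prefix of the spec: digits anywhere, <eos> only after >= 1 digit.
    if token == "<eos>":
        return len(output) > 0
    return token.isdigit()

def generate(max_steps=5):
    random.seed(0)
    out = ""
    for _ in range(max_steps):
        mask = [t for t in vocab if allowed(out, t)]  # invalid tokens get zero probability
        tok = random.choice(mask)  # stands in for sampling the masked LLM distribution
        if tok == "<eos>":
            break
        out += tok
    return out

result = generate()
assert result.isdigit()  # valid by construction -- no retry loop needed
```

The guarantee comes from the mask, not from the model cooperating: even a model that prefers "cat" can never emit it.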


There's also https://lmql.ai/


LMQL (and guidance https://github.com/guidance-ai/guidance) are much less efficient. They loop over the entire vocabulary at each step; we only do it once at initialization.


Does looping over the vocabulary add much overhead to the tok/s? I imagine they're just checking if the input is in a set, and usually there are only ~30k tokens. That's somewhat intensive, but inference on the neural net feels like it'd take longer.


They’re checking regex partial matches for each possible completion, which is indeed intensive. You can look at Figure 2 in our paper (link in original post) for a simple comparison with MS guidance which shows the difference.
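A rough sketch of why that scan is costly (this is an illustration, not guidance's real implementation, and it uses `fullmatch` on a toy pattern as a stand-in for a true partial-match test): every decoding step tests every vocabulary token against the pattern, so the work is O(steps × vocab_size) regex checks.

```python
import re

# Toy pattern and vocabulary; real vocabularies have ~30k-50k tokens.
pattern = re.compile(r"[0-9]+")
vocab = ["0", "12", "9", "ab", "x", "<eos>"]

checks = 0

def valid_continuations(prefix):
    global checks
    ok = []
    for tok in vocab:          # full vocabulary scan, repeated every step
        checks += 1
        if pattern.fullmatch(prefix + tok):
            ok.append(tok)
    return ok

# Three decoding steps => 3 * len(vocab) pattern checks.
prefix = ""
for _ in range(3):
    prefix += valid_continuations(prefix)[0]

assert checks == 3 * len(vocab)
```

With a 30k-token vocabulary and a few hundred generated tokens, that is millions of pattern checks per completion, which is the overhead being compared in the paper's Figure 2.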


TypeChat: let's try really hard to convince the model to make the highest-scoring tokens follow the grammar we want.

Guidance (and this project?): Let's not even bother with trying to convince the model; instead, we'll only sample from the set of tokens that are guaranteed to be correct for the grammar we want to emit.


Yeah, and our addition to all that is to almost completely remove the cost of determining the next valid tokens on each step.
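A toy version of that idea as I understand it (not the paper's actual data structures): walk each vocabulary token through a small DFA once at initialization, building a map from DFA state to the tokens allowed in that state. Decoding then costs one dictionary lookup per step instead of a vocabulary scan.

```python
# Hand-written DFA for "one or more digits": state 0 = start, state 1 = accepting.
def step(state, char):
    return 1 if char.isdigit() else None  # None = dead state

def walk(state, token):
    # Run a whole (multi-character) token through the DFA.
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

vocab = ["0", "12", "9", "ab", "x"]

# One-time index built at initialization: for each DFA state,
# which tokens keep the match alive?
index = {
    s: [t for t in vocab if walk(s, t) is not None]
    for s in (0, 1)
}

# At decode time, the allowed-token set is a single lookup, independent
# of vocabulary size.
assert index[0] == ["0", "12", "9"]
assert index[1] == ["0", "12", "9"]
```

The vocabulary-wide work happens exactly once, when `index` is built; the per-step cost during generation is just `index[state]`.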



