Hacker News new | past | comments | ask | show | jobs | submit login

LQML (and guidance https://github.com/guidance-ai/guidance) are much more inefficient. They loop over the entire vocabulary at each step, we only do it once at initialization.



Does looping over the vocabulary add much overhead to the tok/s? I imagine they're just checking if the input is in a set, and usually there's only ~30k tokens. That's somewhat intensive, but inference on the neural net feels like it'd take longer.


They’re checking regex partial matches for each possible completion, which is intensive indeed. You can look at the Figure 2 in our paper (link in original post) for a simple comparison with MS guidance which shows the difference.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: