AICI is a proposed common interface between LLM inference engines (llama.cpp, vLLM, HF Transformers, etc.) and "controllers" - programs that can constrain the LLM output with a regexp, a grammar, or custom logic, and can also steer the generation process itself (forking, backtracking, etc.).
AICI is based on Wasm, and is designed to be fast (controllers run on the CPU while the GPU is busy), secure (safe to run in multi-tenant cloud deployments), and flexible (allowing libraries like Guidance, LMQL, Outlines, etc. to be built on top of it).
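To give a flavor, here's a minimal sketch of a controller written against our Python bindings. It's simplified from the repo examples; treat the exact names (`pyaici.server`, `FixedTokens`, `gen_text`, `fork`) as illustrative rather than a stable API:

```python
import pyaici.server as aici

# A controller is an async program that interleaves forced text
# with constrained sampling. It runs inside the Wasm sandbox on
# the CPU while the GPU computes logits.
async def main():
    # Force these tokens into the stream (no sampling).
    await aici.FixedTokens("Here's a haiku about AI:\n")
    # Sample up to 30 tokens, constrained to three lines by a regex.
    haiku = await aici.gen_text(regex=r"(.*\n){3}", max_tokens=30)
    print("generated:", haiku)  # goes to the controller log

    # Fork generation into two branches; each continues independently
    # from the shared KV-cache prefix.
    branch = await aici.fork(2)
    if branch == 0:
        await aici.FixedTokens("Pros: ")
    else:
        await aici.FixedTokens("Cons: ")
    await aici.gen_text(max_tokens=20)

aici.start(main())
```

Because the controller is ordinary code rather than a fixed template language, the same mechanism covers regex constraints, grammars, and arbitrary per-token logic.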
We (Microsoft Research) released it recently, and would love feedback on the design of the interface, as well as on our Rust AICI runtime.
I'm the lead developer on this project and happy to answer any questions!
The most obvious use of this is forcing a model to output valid JSON - including JSON that exactly matches a given schema (like OpenAI Functions, if they were 100% reliable).
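For instance, a schema-following controller can interleave fixed JSON scaffolding with regex-constrained fields. This is a hand-rolled sketch using the same illustrative API as above; a real implementation would compile the constraints from a JSON Schema (the way Outlines does):

```python
import pyaici.server as aici

# Sketch: force output matching {"name": <string>, "age": <int>}.
# The keys and punctuation are forced verbatim; only the field
# values are sampled, each under its own regex constraint.
async def main():
    await aici.FixedTokens('User data as JSON: {"name": "')
    await aici.gen_text(regex=r"[A-Za-z ]{1,30}", max_tokens=10)
    await aici.FixedTokens('", "age": ')
    await aici.gen_text(regex=r"0|[1-9][0-9]{0,2}", max_tokens=4)
    await aici.FixedTokens("}")

aici.start(main())
```

Since the structure is forced rather than merely requested in the prompt, the output parses every time, no retries or post-hoc validation needed.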
That Python code is a really elegant piece of API design.