Sorry it's taking so long to review and for the radio silence on the PR. We have...

regen7253 · 2024-06-16T09:08:00 1718528880

Thank you!

I've been using llama.cpp for about a year now, mostly implementing some RAG and React related papers to stay up to date. I mostly used llama.cpp, but since a few months, I started to use both Ollama and Llama.cpp.

If you added grammars I wouldn't have to be running the two servers, I think you're doing an excellent job out of maintaining Ollama. Every update is like Christmas. They also don't seem to have the server as a priority (it's still literally just an example of how you'd use their C api).

So, I understand your position, since their server API has been quite unstable, and the grammar validation didn't work at all until February. I also still can't get their multiple model loading to work reliably right now.

Having said that, GBNF is a godsend for my daily use cases. I even prefer using phi3b with a grammar than deal with the hallucinations of a 70b without it. Fine tuning helps a lot, but can't solve the problem fully (you still need to validate the generation), and it's a lot less agile when implementing ideas. Crating some synthetic data sets is easier if you have support for grammars.

I think many like me are in the same spot. Thank you for being considerate about the stability and support that it would require. But please, take a look at the current state of their grammar validation, which is pretty good right now.

okwhateverdude · 2024-06-16T09:45:11 1718531111

Not to put too fine of a point on it, but why not merge one of the simpler PRs for this feature, gate the feature behind an opt-in env var (ie. OLLAMA_EXPERIMENTAL_GRAMMAR=1), and sprinkle these caveats you've mentioned into the documentation? That should be enough to ward off the casuals that would flood the issue queue. Add more hoops if you'd like.

There seems to be enough interest in this specific feature that you don't need to make it perfect or provide a complicated abstraction. I am very willing to accept/mitigate the side effects for the ability to arbitrarily constrain generation. Not sure about others, but given there are half a dozen different PRs specifically for this feature, I am pretty sure they, too, are willing to accept the side effects.

washadjeffmad · 2024-06-16T13:37:58 1718545078

Since it's trivial enough to run mainline features on actual llama.cpp, it seems redundant to ask ollama to implement and independently maintain branches or features that aren't fully working, if it's not something already in an available testing branch.

We're not relying on ollama for feature development and there are multiple open projects with implementations already, so no one is deprived of anything without this or a hundred other potential PRs not in ollama yet.