Hi HN
A few folks and I have been working on this project for a couple of weeks now. We previously worked on the Docker project for a number of years (on both the container runtime and image registry side), and the recent rise of open source language models made us think something similar needed to exist for large language models too.
While not exactly the same as running Linux containers, running LLMs shares quite a few of the same challenges. There are "base layers" (e.g. models like Llama 2) and specific configuration needed to run them correctly (parameters, temperature, context window size, etc.). There are also embeddings that a model can use at runtime to look up data – we don't support this yet, but it's something we're looking at doing soon.
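To make the analogy a bit more concrete, here's a rough sketch of what a Modelfile looks like today (the directive names and exact parameters may shift as the project evolves, and "my-assistant" is just an illustrative name):

    # start from the Llama 2 base "layer"
    FROM llama2

    # runtime configuration baked into the model
    PARAMETER temperature 0.7
    PARAMETER num_ctx 4096

    # system prompt applied to every conversation
    SYSTEM "You are a concise assistant for developer questions."

    # then, roughly:
    #   ollama create my-assistant -f ./Modelfile
    #   ollama run my-assistant

The intent is the same as a Dockerfile: capture the base model plus its configuration in one place so it can be built, shared, and run the same way everywhere.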
It's an early project, and there's still lots to do!
Ollama's promise of a simple setup was enough to get me out of that rut. Within a minute of seeing your README, I decided that this would be easy enough to experiment with, and if I didn't get it right quickly, I would give up and be no worse off.
I'm glad I gave it a shot. I now have llama2 working locally on my laptop and am already thinking about what I can do with this.
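For example, one of the first things I want to try is scripting against the local API – something like this, if I've read the README right (endpoint and fields as documented there, so apologies if I have a detail slightly off):

    curl http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "Why is the sky blue?"
    }'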
All this is to say: I hope everyone working in this area recognizes that not only is it important to make local LLMs as effective as possible, but it's also critical to make them approachable and easy to set up. By doing this, you'll get more eyes on this tech and more passion for local and open LLMs.