or you can just load up ollama, have it load a local model and point claude or o...

malkosta · 2026-06-12T19:35:14 1781292914

That was exactly my question. Then I finished reading the post. The reason is pretty clear, and it's written in the post: it is faster than ollama+mlx.

sleepybrett · 2026-06-12T20:01:22 1781294482

how much faster?

freerunnering · 2026-06-13T03:23:54 1781321034

I was benchmarking different models, engines, and draft models. I posted a video on Twitter, and people started asking about the setup in the final screen recording. So the blog post isn't so much "how a beginner should set something up" as "here's the setup I posted in the video".

Original video: https://x.com/Freerunnering/status/2065275403548168398

And in the blog post there is a table showing the different speeds I got from different engines.

Slowest combo was 38.1 tk/s, and the fastest was 72.2 tk/s. All from "the same" model.
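
To put those two numbers in perspective, here is a rough sketch of the arithmetic. The two tk/s figures are the ones quoted above; the 1000-token response length is just an assumed example, and the calculation only covers generation speed, not prompt processing or model load time.

```python
# Rough comparison of the slowest and fastest combos quoted above.
# Only the two tk/s figures come from the benchmark; everything else is illustrative.

def speedup(slow_tps: float, fast_tps: float) -> float:
    """How many times faster the fast combo generates tokens."""
    return fast_tps / slow_tps

def seconds_for(tokens: int, tps: float) -> float:
    """Generation time for a response of `tokens` tokens at `tps` tokens/second."""
    return tokens / tps

SLOWEST_TPS = 38.1      # slowest combo from the table
FASTEST_TPS = 72.2      # fastest combo from the table
RESPONSE_TOKENS = 1000  # assumed response length, not from the post

ratio = speedup(SLOWEST_TPS, FASTEST_TPS)
slow_s = seconds_for(RESPONSE_TOKENS, SLOWEST_TPS)
fast_s = seconds_for(RESPONSE_TOKENS, FASTEST_TPS)

print(f"speedup: {ratio:.2f}x")
print(f"{RESPONSE_TOKENS} tokens: {slow_s:.1f} s vs {fast_s:.1f} s")
```

So the fastest combo is roughly 1.9x the slowest, which is the difference between waiting about 26 seconds and about 14 seconds for a 1000-token answer.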

krzyk · 2026-06-13T08:47:47 1781340467

Ollama is a wrapper on top of llama.cpp, and it makes llama.cpp slower. Why use it?

Also, Ollama has other issues (like forgetting what it really is: a wrapper).