That was exactly my question too. Then I finished reading the post. The reason is pretty clear and stated in the post: it is faster than ollama+mlx.
I was benchmarking different models, different engines, and different draft models. I posted a video on Twitter, and people started asking about the setup in the final screen recording. So the blog post isn't so much "how a beginner should set something up" as "here's the setup I posted in the video".
Is this article old? It's not. I'm not sure why he went through all the bother of llama.cpp.