If you spend $5k on a MacBook Pro with an M4 Max and 128 GB of RAM, then install Ollama and pull Qwen2.5-72B, you have your own local LLM, free to run as much as you like.
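Getting there is only a few commands. A minimal sketch (the model tag is from Ollama's public library; the 72B weights are a large download, roughly 45 GB at the default quantization):

```shell
# Install Ollama on macOS (or grab the installer from ollama.com)
brew install ollama

# Start the local server in the background
ollama serve &

# Pull the quantized 72B weights, then chat with the model entirely offline
ollama pull qwen2.5:72b
ollama run qwen2.5:72b "Explain TCP slow start in two sentences."
```

After the initial pull, everything runs on-device; no API key, no per-token billing.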
At first glance that might seem expensive, but consider how remarkable it is that you can ask your laptop arbitrary questions and get genuinely cogent answers, on almost any topic you can think of, without relying on a massive rack of GPU machines behind an API. And it uses barely more power than an old incandescent bulb while doing it!