bhamm-lab's comments

Wrote up some notes from upgrading my local llama.cpp setup on AMD Strix Halo hardware and testing a batch of newer open models.

Main findings:

- Kimi Linear 48B works well as a generalist on my hardware (fast, consistent).

- Qwen3 Coder Next is my new default for coding tasks.

- Aggressive quants (Q2_K_XL) on massive models (200B+) can still be useful for long-running/background tasks, even if they aren't interactive.
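To give a rough sense of why aggressive quants make 200B+ models viable on this hardware, here's a back-of-the-envelope sketch of weight-memory footprint. The bits-per-weight figures are approximations I'm assuming for illustration (not exact llama.cpp numbers), and the 235B parameter count is just an example size:

```python
def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB for a given quant level."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A ~235B-parameter model at a Q2_K-class quant (~2.7 bpw, rough average)
# versus Q8_0 (~8.5 bpw):
print(round(quant_size_gb(235, 2.7), 1))  # ~79.3 GB -> fits in 128 GB unified memory
print(round(quant_size_gb(235, 8.5), 1))  # ~249.7 GB -> does not fit on one node
```

That's the whole trade: at ~2.7 bpw the weights fit alongside KV cache on a single 128 GB node, at the cost of quality that's fine for background tasks but noticeable interactively.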

Happy to answer questions about the dual AI Max+ 395 setup or how I run models in Kubernetes.
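For the curious, a minimal sketch of how one of these models can be served with llama.cpp's llama-server (model path, context size, and port here are placeholders, not my exact config):

```shell
# Illustrative llama-server launch; the GGUF path is a placeholder.
# -ngl 999 offloads all layers to the GPU; -c sets the context window.
llama-server \
  -m /models/qwen3-coder-Q4_K_M.gguf \
  -ngl 999 \
  -c 32768 \
  --host 0.0.0.0 --port 8080
```

In Kubernetes this becomes the container command for a Deployment, with the GGUF files mounted from a volume and the port exposed via a Service.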

