Quite a few local LLMs have been released in the past few months. Do you run them? If so, what are your use cases?
Personally, I find them good but very slow (RTX 3050, Mistral 7B), and hard to coax into a consistent output format (JSON, bullet points). GPT-3.5 makes it look like a pointless exercise from a speed and consistency perspective.
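(For the format problem specifically, grammar-constrained / JSON-mode decoding is supposed to help; below is a rough sketch of what I mean with llama-cpp-python, where the model path and the schema in the system prompt are just placeholders. It fixes the syntax, not the speed.)

```python
from llama_cpp import Llama

# Model path is a placeholder -- any instruct-tuned GGUF quant works.
llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf",
            n_gpu_layers=-1, n_ctx=4096, verbose=False)

result = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": 'Reply only with JSON of the form {"title": str, "bullets": [str]}.'},
        {"role": "user", "content": "Summarize these meeting notes: ..."},
    ],
    # Constrains sampling so the output is always syntactically valid JSON.
    response_format={"type": "json_object"},
    temperature=0,
)
print(result["choices"][0]["message"]["content"])
```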
Any use cases for local LLMs apart from the fact that they're local, so we can feed them sensitive documents?
It’s fun running models from HuggingFace on my computer. Finally, something that actually uses my 64GB of RAM and 24GB of VRAM. It’s neat seeing the immense performance difference between running on the CPU (Ryzen 7 5700X) and offloading to the GPU (RTX 3090).
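If anyone wants to see that difference for themselves, it's basically just flipping n_gpu_layers in llama-cpp-python; a rough sketch (model path is a placeholder):

```python
import time
from llama_cpp import Llama

PROMPT = "Explain how a hash table works in two paragraphs."

# n_gpu_layers=0 keeps everything on the CPU; -1 offloads all layers to the GPU.
# Model path is a placeholder -- swap in whatever GGUF file you downloaded.
for n_layers, label in [(0, "CPU only"), (-1, "fully offloaded to GPU")]:
    llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf",
                n_gpu_layers=n_layers, n_ctx=2048, verbose=False)
    start = time.time()
    out = llm(PROMPT, max_tokens=200)
    tok = out["usage"]["completion_tokens"]
    print(f"{label}: {tok / (time.time() - start):.1f} tokens/s")
```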
I think, as with most “Cloud vs on-prem” arguments, it comes down to cost vs convenience. Building an application on Azure or AWS is as easy as it gets, but if money is scarce, you can’t beat on-prem for raw resources.
I’m writing a program right now that will query GPT-4 with… A LOT of tokens. We project it will cost between $5k and $15k and probably run for around 2 days. *OR* I could feed that same data through a local model running on the RTX 3090, and it’ll cost like $20 in electricity and take maybe 6 days.
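Rough numbers behind the electricity estimate (the power draw and kWh prices are assumptions, not measurements):

```python
# Sanity check on the "$20 in electricity" figure.
gpu_watts = 350                      # assumed sustained draw for an RTX 3090
days = 6
kwh = gpu_watts / 1000 * 24 * days   # ~50 kWh over the whole run
for price in (0.15, 0.30, 0.45):     # $/kWh, varies a lot by region
    print(f"${price:.2f}/kWh -> about ${kwh * price:.0f}")
# Comes out around $8-$23 depending on local rates, so ~$20 is the right ballpark.
```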