Ask HN: Running LLMs Locally
1 point by FezzikTheGiant 13 days ago | 3 comments
What's the best way to build an app around local LLMs? How can I educate myself on this?





The “around” part of your question hints that you may want to survey the range of options possible with LangChain. For brainstorming, though, I’d instead recommend reading the description of what SymbolicAI aims to support:

https://github.com/ExtensityAI/symbolicai?tab=readme-ov-file...

I think what you’ll find is that some applications are very capable locally, like Whisper.
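For instance, here’s a minimal sketch of running Whisper locally with the openai-whisper Python package (the model size and the audio filename are placeholders for your own setup):

    # Local speech-to-text: weights are downloaded once, then everything runs offline.
    import whisper

    model = whisper.load_model("base")        # "base" is an assumption; pick any size
    result = model.transcribe("meeting.wav")  # placeholder path to your audio file
    print(result["text"])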

A lot of plugins expect to work with the llama.cpp family of local servers; nowadays HuggingFace TGI plays a similar role and exposes an OpenAI-compatible Messages API: https://huggingface.co/blog/tgi-messages-api

So your application could speak the OpenAI API, and you’d run HuggingFace TGI on your own hardware for testing and comparison.
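For example, here’s a rough sketch of that split, using the openai Python client pointed at a TGI server on localhost (the port and model id are assumptions about your local setup):

    # The app speaks the OpenAI chat API; requests go to a locally running TGI server.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",   # assumed local TGI endpoint
        api_key="not-needed-locally",          # TGI ignores the key
    )
    response = client.chat.completions.create(
        model="tgi",  # TGI serves whichever model the server was launched with
        messages=[{"role": "user", "content": "Why does local inference help privacy?"}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)

Swapping the base_url is the only change needed to point the same code at a hosted endpoint, which makes side-by-side comparison straightforward.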


How feasible would running a local GPT-2 fine-tune be on M1/M2 Macs? The use case I'm building for has privacy as a pretty big need, so I need to ensure the data never leaves the user's machine.

I use CUDA, not MPS, hence my comments about HuggingFace TGI [1]. GPT-2 is supported by transformers, but I would expect only about a 1.5x speedup [2] in fine-tuning from using the Mac's GPU instead of its CPU (a rough sketch follows the links below).

[1] https://huggingface.co/docs/text-generation-inference/en/sup...

[2] https://github.com/Ankur3107/transformers-on-macbook-m1-gpu
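To make the feasibility question concrete, here is a rough fine-tuning sketch with transformers and PyTorch that uses MPS when available and otherwise falls back to CPU; the training texts are placeholders for your private data, which never leaves the machine:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token   # GPT-2 has no pad token by default
    model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
    model.train()

    # Placeholder for the user's private corpus; nothing leaves the machine.
    texts = ["example document one", "example document two"]
    batch = tokenizer(texts, return_tensors="pt", padding=True,
                      truncation=True, max_length=512)
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    for epoch in range(3):
        # Causal LM fine-tuning: labels are the inputs themselves
        # (a real run would also mask pad positions with -100).
        outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                        labels=input_ids)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"epoch {epoch}: loss {outputs.loss.item():.3f}")

The 124M-parameter GPT-2 checkpoint fits comfortably in memory on an M1/M2, so the main cost of staying local is training time rather than feasibility.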



