Ask HN: What do you do with Local LLMs?
17 points by rkwz on April 10, 2024 | 19 comments
Quite a few local LLMs have been released in the past few months. Do you run them? If so, what are your use cases?

Personally, I find them good, but very slow (RTX 3050, Mistral 7B), and it's hard to make them output in a consistent format (JSON, bullet points). GPT 3.5 makes them look like a pointless exercise from a speed and consistency perspective.

Any use cases for local LLMs apart from them being local, so we can feed them sensitive documents?




Other than feeding sensitive documents? I’m not sure. OpenAI has certainly gone out of its way to eliminate the need for local LLMs; they offer fine-tuning of GPT 3.5 and 4, so it’s harder to argue you might get better results from a local model for a particular task with particular data. Though I don’t have personal experience with ChatGPT fine-tuning.

It’s fun running models from HuggingFace on my computer. Finally, something that utilizes my computer’s 64GB of RAM and 24GB of VRAM. It’s neat seeing the immense performance difference between CPU (Ryzen 7 5700X) and GPU (RTX 3090) offloading.

I think as with most “Cloud vs on-prem” arguments, it comes down to cost vs convenience. Building an application on Azure or AWS is as easy as it gets, but if money is scarce, you can’t beat on-prem for raw resources.

I’m writing a program right now that will query ChatGPT 4 with… A LOT of tokens. We project it will cost between $5k and $15k and probably run for around 2 days. *OR* I could feed that same data through a local model running on the RTX 3090 and it’ll cost like $20 in electricity, and take maybe 6 days.


>I’m writing a program right now that will query ChatGPT 4 with… A LOT of tokens.

What are you writing a program to do? That's more what I feel OP's question is about, and also what I'm interested in.


Document comprehension/extracting certain fields from the document. There’s some overlap with what can be achieved with regex, but some of the “questions” for ChatGPT require an “intelligent” understanding of the documents. Think “what is this document about?” It’s good at that, but less-good at reliably extracting properties… which is where regex comes in handy I suppose.
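A minimal sketch of that split (regex for rigid fields, the model for open-ended questions), assuming the OpenAI Python client; the field pattern, model name, and function names are illustrative, not the commenter's actual pipeline:

    import re
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def extract_invoice_number(text: str) -> str | None:
        # Rigid, well-formatted field: plain regex is cheaper and more reliable.
        m = re.search(r"Invoice\s*#?\s*(\d{4,})", text)
        return m.group(1) if m else None

    def summarize_topic(text: str) -> str:
        # Open-ended question: needs the model's "understanding" of the document.
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer in one sentence."},
                {"role": "user", "content": f"What is this document about?\n\n{text}"},
            ],
        )
        return resp.choices[0].message.content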


Thanks. I'm keen to hear how well it works for you.


So far it’s mixed. Prompt engineering is important and makes all the difference, but for what we pay to hit the OpenAI API, I’m not particularly pleased.


A local LLM won't be rate-limited, so if you are an AI company you may prefer to secure (as in purchase outright!) your own servers for your AI workload rather than be at the mercy of a third party who could rate-limit you. I guess that is not local per se.

For local it has to be privacy and having uncensored models (and maybe the combination of those things).


True, I forgot about the uncensored aspect. No “sorry, I can’t advise on that….” answers.


> Other than feeding sensitive documents?

Everything in our developer's portal is sensitive to a degree, so a local LLM is required if we want to be able to talk to it.


I use Mistral (7B v0.2 instruct, 6-bit quantized) to generate the title-text for messages that I send to myself via a Discord bot.

Right now, I'm prompting Mistral to generate these titles in "clickbait" style. I fold the topic of the message and other context into the prompt.

My intention is to shift my attention to the message, which shifts my attention to something else I need to do, because I tend to over-focus on whatever I'm doing at the moment.

It doesn't matter whether what I'm doing at the moment is "good" or "bad". Based on probability, I should almost always switch my attention when I receive such a message because I should have switched an hour ago.

To guarantee consistent JSON output, I use a llama.cpp grammar (converted from a JSON schema).
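A minimal sketch of that approach using the llama-cpp-python bindings; the grammar here is a hand-written, cut-down stand-in for what a JSON-schema-to-grammar converter would produce, and the model path is a placeholder:

    from llama_cpp import Llama, LlamaGrammar

    # One JSON object with a single string field named "title".
    TITLE_GRAMMAR = r'''
    root   ::= "{" ws "\"title\"" ws ":" ws string ws "}"
    string ::= "\"" [^"]* "\""
    ws     ::= [ \t\n]*
    '''

    llm = Llama(model_path="mistral-7b-instruct-v0.2.Q6_K.gguf", n_ctx=2048)  # placeholder path
    grammar = LlamaGrammar.from_string(TITLE_GRAMMAR)

    out = llm(
        "[INST] Write a clickbait title for a reminder to take a break. "
        "Answer as JSON. [/INST]",
        grammar=grammar,
        max_tokens=64,
    )
    print(out["choices"][0]["text"])  # constrained to the {"title": "..."} shape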

Generation is via CPU (Ryzen 5800) because it's an async background operation and also because my 1070 GPU is being used by Stable Diffusion XL Turbo to generate the image that goes along with the message.


If you send yourself messages via a Discord bot, you lose the privacy advantage of running an LLM locally.

Discord does not have end-to-end encryption for messages.


Indeed. Currently, my primary concerns are a) surprise b) accessibility c) efficiency and d) self-containment.

Surprise, because that draws my attention better. Not interested in guardrails here.

Accessibility, because I can involve my sons without friction.

Efficiency, because using Discord lets me skip building or finding a component (for now).

And I still get a degree of self-containment because Discord is the only piece I'll need to swap out. Bonus that it doesn't have a recurring cost until then.

Yet privacy still matters to me. Even though the messages themselves go through Discord, the detailed personal context within the prompts remains private. The data that determines the timing and topic of each message remains private as well.


The most profitable use case for local LLMs will be one where the end user doesn't even know a local LLM is running, in the same way that a user doesn't know what libraries Photoshop is running; to them it's just Photoshop.

For example, let's say some image editing software decided to use Stable Diffusion to fill in image data in one of their Content Aware tools or something. They would not tell the user to install and run Ollama or sdapi from their CLI; they would install the models when you install the app and talk to them when you use the app. The end user would never know an LLM is being run locally, any more than they know DirectX is running. (Some might.)

I like this use case because image/music/video editing software already requires a good CPU/GPU, and in the case of Photoshop, I'm used to my fans blaring when I run Filter Gallery (lol). As the end user, I would not need to know that LLMs are being invoked as I use the software.

I think this use case is a lot stronger than any cloud-based one as long as running GPUs in the cloud stays this expensive. And since the present cloud default is to use one of the Big 3, anyone looking for cloud AI will end up with OpenAI or another major provider, which in the end means something from Microsoft, Google, etc.


So after a day up, the only actual answers here are somebody using it to distract themselves from getting too engrossed in what they're doing by sending a clickbait Discord post to themself, and someone who wants an offline search engine that hallucinates.

AI seems totally not like a giant bubble.


Well, if you're looking for a real use, I ask a local LLM the setup for a child's joke several times a day, and post its answer to my tens of Mastodon followers: https://botsin.space/@jokeunderstander

So it's not all useless.


Well, not useless, but still not fully explaining to me why Nvidia/Intel/etc are investing billions (trillions?) into AI.


All bubbles pop eventually. MongoDB had a stupid bubble for a long time because “SQL databases are dead! Everyone will be on MongoDB!” It was trading at $570/share in 2021; it’s $355 now. Still too high for a company that loses more money each quarter than it makes.


Cut off the internet and use it as an offline search engine. That way you can really just lock yourself in a room and build something without the distraction of social media and memes.


I already do this more and more each day.

ollama run mistral

change this React code to Angular. Here is the code:

or chocolate chip cookies recipe

And get the actual recipe instead of countless blogs and ads

(just watch for the hallucinations - "Hm it calls for 2 gallons of milk")
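For anyone who would rather script this than type into the CLI, a minimal sketch against Ollama's local HTTP API (default port 11434), reusing the same mistral model as in the example above; the prompt is just the recipe question:

    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": "Give me a chocolate chip cookie recipe.",
            "stream": False,  # return one JSON object instead of a token stream
        },
        timeout=300,
    )
    print(resp.json()["response"])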


The irony is I would do something like this for the memes.



