
All you need is 2 3090s.



All you need is 64GB of RAM and a CPU, actually. Two 3090s is much faster but not strictly necessary.


All you need is a few thousand dollars lying around to spend solely on your inference fun?

I don’t think that many people really qualify as such (though it’s probably true that many of them are on HN).


Not just inference.

AFAIK, you are able to fine-tune the models with custom data[1], which does not seem to require anything but a GPU with enough VRAM to fit the model in question. I'm looking to get my hands on an RTX 4090 to ingest all of the repair manuals of a certain company and have a chatbot capable of guiding repairs, or at least try to do so. So far doing inference only as well.

[1] https://github.com/tloen/alpaca-lora
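
To make that concrete, here's roughly what the alpaca-lora style of LoRA fine-tuning looks like with PEFT and Transformers. The base model name, the "manuals.json" dataset file, and the hyperparameters are placeholders I picked for the sketch, not the repo's actual settings:

    # Sketch of LoRA fine-tuning (PEFT + Transformers), in the spirit of alpaca-lora.
    # Model name, dataset path, and hyperparameters are placeholders.
    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    base_model = "huggyllama/llama-7b"          # placeholder base model
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token   # LLaMA tokenizers ship without a pad token
    model = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map="auto")

    # LoRA trains small adapter matrices instead of the full weights,
    # which is what lets a 7B model fine-tune on a single consumer GPU.
    lora_config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora_config)

    # "manuals.json" is a hypothetical instruction-style dataset built from the manuals.
    data = load_dataset("json", data_files="manuals.json")
    data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512))

    trainer = Trainer(
        model=model,
        train_dataset=data["train"],
        args=TrainingArguments(
            output_dir="lora-manuals", per_device_train_batch_size=4,
            num_train_epochs=3, learning_rate=3e-4, fp16=True),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
    trainer.train()
    model.save_pretrained("lora-manuals")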


You might think about doing the training in the cloud; then you're back to needing only standard hardware for the bot.

Also, another thought: generate embeddings for each paragraph of the manual and index those with Faiss. Then you generate an embedding of the question, use Faiss to return the most relevant paragraphs, and feed those into the model with a prompt like "given the following: {paragraphs} \n\n {question}".

I'm sure there are better prompts but you get the idea.
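
A minimal sketch of that retrieval flow, assuming sentence-transformers for the embeddings; the embedding model name and the example paragraphs are just placeholders:

    # Embed each manual paragraph, index with Faiss, then prepend the best
    # matches to the question. Model name and paragraphs are placeholders.
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    paragraphs = ["To replace the filter, first ...",
                  "Error code E4 means ..."]        # extracted from the manuals

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = embedder.encode(paragraphs, normalize_embeddings=True)

    index = faiss.IndexFlatIP(vectors.shape[1])     # inner product == cosine on normalized vectors
    index.add(np.asarray(vectors, dtype="float32"))

    def build_prompt(question, k=3):
        q = embedder.encode([question], normalize_embeddings=True)
        _, ids = index.search(np.asarray(q, dtype="float32"), k)
        context = "\n\n".join(paragraphs[i] for i in ids[0])
        return f"given the following: {context}\n\n{question}"

    print(build_prompt("How do I clear error code E4?"))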


>All you need is a few thousand dollars lying around to spend solely on your inference fun? I don’t think that many people really qualify as such (though it’s probably true that many of them are on HN).

Can confirm. Did a new build just for inference fun. Expensive, and worth it.



