I would like to use stuff like this as a side project. Buy an Nvidia GeForce GPU, stick it into my 24/7 server, and play around with it in my free time to see what can be done.
The issue with all these AI models is that there's no information on which GPU is enough for which task. I'm absolutely clueless whether a single RTX 4000 SFF with its 20GB VRAM and only 70W of max power usage will be a waste of money, or really something great to do experiments on. Like do some ASR with Whisper, images with Stable Diffusion, or load an LLM onto it, or this project here from Facebook.
Renting a GPU in the cloud doesn't seem to be a solution for this use case, where you just want to let something run for a couple of days and see if it's useful for something.
Granted, it's talking about quantized models, which use less memory. But you can see the 30B models taking 36 GB at 8-bit, and at least 20 GB at 4-bit.
The page even lists the recommended cards.
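As a rough sanity check you can also estimate the weight memory yourself: parameters × bits per weight ÷ 8, plus some overhead for the context window and activations. A minimal sketch (the flat overhead figure is just a guess for illustration, not something from that page):

```python
def estimate_vram_gb(n_params_billion: float, bits_per_weight: int, overhead_gb: float = 2.0) -> float:
    """Back-of-envelope VRAM estimate: weight memory plus a flat
    overhead for context/activations (the overhead value is a rough guess)."""
    weight_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

# A 30B model: roughly 30 GB of weights at 8-bit, 15 GB at 4-bit, before overhead.
print(estimate_vram_gb(30, 8))  # ~32 GB
print(estimate_vram_gb(30, 4))  # ~17 GB
```

The exact numbers in the table above will differ a bit because of group size and context length, but the ballpark is the same.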
But as others have pointed out, you may get more bang for your buck "renting", as in purchasing cloud instance time able to run these workloads. Buying a system costs about as much as buying instance time for one year. Theoretically, if you only run sporadic workloads when you're playing around, it would cost less. If you're training... that's a different story.
The more VRAM the better if you'd like to run larger LLMs. Old Nvidia P40 (Pascal, 24GB) cards are easily available for $200 or less and would be an easy/cheap way to play. Here's a recent writeup on the LLM performance you can expect for inferencing (training speeds I assume would be similar): https://www.reddit.com/r/LocalLLaMA/comments/13n8bqh/my_resu...
This repo lists very specific VRAM usage for various LLaMA models (w/ group size, and accounting for the context window, which is often missing) - these are all 4-bit GPTQ quantized models: https://github.com/turboderp/exllama
Note the latest versions of llama.cpp now have decent GPU support, include a memory tester, and let you load partial models (n layers) onto your GPU. It inferences about 2X slower than exllama from my testing on an RTX 4090, but still about 6X faster than my CPU (Ryzen 5950X).
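For example, with the llama-cpp-python bindings you pick how many layers to offload to the GPU and keep the rest in system RAM; a minimal sketch where the model path and layer count are just placeholder values to tune for your card:

```python
from llama_cpp import Llama

# Offload part of the model to the GPU; the remaining layers stay on the CPU.
# model_path and n_gpu_layers are illustrative - adjust to what fits your VRAM.
llm = Llama(
    model_path="./models/llama-30b.q4_0.bin",
    n_ctx=2048,        # context window
    n_gpu_layers=40,   # number of layers to place on the GPU
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```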
Wait why is renting a GPU in the cloud not a solution? You can even try multiple options and see which ones are capable enough for your use case.
Look into some barebones cloud GPU services, for example Lambda Labs, which is significantly cheaper than AWS/GCP but offers basically nothing besides the machine with a GPU. You could even try something like Vast, in which people rent out their personal GPU machines for cheap. Not something I'd use for, uhhh... basically anything corporate, but for a personal project with no data security or uptime issues it would probably work great.
My annoyance was managing state. I'd have to spend hours installing tools, downloading data, and updating code; then, when I wanted to go to bed, I had to package it up and store as much as I could on S3 before shutting off the $$ server.
I've played a lot with Stable Diffusion using AWS spot instances, mostly because it is the platform with which I'm more familiar. The Terraform script[0] should be easy to adapt to any other project of this kind.
Let me know if you are interested, and maybe we can find time to work on it together :).
You should check out https://brev.dev. You can rent GPUs, pause instances, use your own AWS/GCP accounts to make use of your credits, and the CLI lets you use your GPU as if it’s on your local machine.
It handles storage, setup, etc. for machine learning workloads across several providers - which helps a lot if you need one of the instances that rarely have capacity, like 8x A100 pods.
No, you write a script that runs the sync and shuts down the instance. When the instance is stopped you don't pay for it. Resuming it is a simple API call. You don't even really need to do the sync; it's just to be certain you have a backup if the instance volume is lost.
The shutdown / stop on an instance is like closing the lid on your laptop. When you start it again it resumes where it left off. In the meantime the instance doesn't occupy a VM.
A caveat is you can't really do this with spot instances. You would need to do a sync and rebuild on start. But, again, easily scriptable.
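A minimal sketch of that kind of end-of-day script, assuming the AWS CLI and boto3 are configured; the instance ID, bucket, and path are placeholders:

```python
import subprocess
import boto3

INSTANCE_ID = "i-0123456789abcdef0"        # placeholder
BUCKET = "s3://my-experiments-backup"      # placeholder

# Push working state to S3 (only needed as insurance against losing the volume).
subprocess.run(["aws", "s3", "sync", "/home/ubuntu/work", BUCKET], check=True)

# Stop (not terminate) the instance: compute billing stops, the EBS volume persists.
ec2 = boto3.client("ec2")
ec2.stop_instances(InstanceIds=[INSTANCE_ID])

# Later, resuming is a single call:
# ec2.start_instances(InstanceIds=[INSTANCE_ID])
```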
Speaking as someone who has solved these difficulties hundreds of times, "draw the rest of the owl" doesn't tell you the specific things to google to get detailed examples and tutorials on how millions of others have sidestepped these repeated issues.
You "spend hours messing around" with everything you don't know or understand at first. One could say the same about writing the software itself. At its core Dockerfiles are just shell scripts with worse syntax, so it's not really that much more to learn. Once you get it done once, you don't have to screw around with it anymore, and you have it on any box you want in seconds.
In either case you have to spend hours screwing around with your environment. If those hours result in a Dockerfile, then it's the last time. If they don't, then it's every time you want it on a new host (which, as was correctly pointed out, is a pain in the ass).
Storing data in a database vs in files on disk is like application development 101 and is pretty much a required skill period. It's required that you learn how to do this because almost all applications revolve around storing some kind of state and, as was noted, you can't reasonably expect it to persist on the app server without additional ops headaches.
Many people will host dbs for you without you having to think about it. Schema is only required if you use a structured db (which is advisable) but it doesn't take that long.
I applaud your experience, but honestly I agree with parent: knowledge acquisition for a side project may not be the best use of their time, especially if it significantly impedes actually launching/finishing a first iteration.
It's a similar situation for most apps/services/startup ideas: you don't necessarily need a planet scale solution in the beginning. Containers are great and solve lots of problems, but they are not a panacea and come with their own drawbacks. Anecdotally, I personally wanted to make a small local 3 node Kubernetes cluster at one time on my beefy hypervisor. By the time I learned the ins and outs of Kubernetes networking, I lost momentum. It also didn't end up giving me what I wanted out of it. Educational, sure, but in the end not useful to me.
I'm having trouble imagining what data I would store in a database as opposed to a filesystem if my goal is to experiment with large models like Stable Diffusion.
I would take GP's kind of dogmatic jibber jabber with a grain of salt. There is an unspoken and timeless elegance to the simplicity of running a program from a folder with files as state.
Isn't the terminfo db famous for this filesystem-as-db approach? File vs DB: I say do whatever works for you. There is certainly more overhead in the DB route.
IMO tensors & other large binary blobs are fairly edge-casey. You might as well treat them like video files and video file servers also don't store large videos in databases either, and most devs don't have large binary blob management experience.
'shell scripts with worse syntax' lol I wish shell could emulate Alpine on a non-linux box.
A 'shell script with worse syntax' for configuring a VM may be closer to a QEMU cloud-init file.
Besides some great tooling out there if you wanted to roll your own, you can literally rent Windows/Linux computers with persistent disks. If you have good internet, you can even use it as a gaming PC, as I do.
Is there an easy way to off-board the persistent disk to cheaper machines when you don't need the gpus?
Like, imagine setting up and installing everything with the GPU attached, but when you're not using the GPU or all the CPU cores, you can disconnect them.
If you have docs on how to do this, please let me know.
With AWS (and probably most other cloud VPC services) the disk is remote from the hardware so you can halt the CPU and just pay for the storage until you restart.
AWS also provides accessible datasets of training data:
"but offers basically nothing besides the machine with a GPU"
They must offer distributed storage that can accommodate massive models, though? How else would you have multiple GPUs working on a single training run?
I have not seen many setups that wouldn't pay themselves back (including energy in my case) within a year (sometimes even 6 months) when buying vs renting. For something that pays itself back that fast, and that is without renting it out myself, just training with it, I cannot see why I would want to rent one.
Edit: on Lambda Labs, the only exception seems to be the H100; it would be 1.5 years or so, but even 2 years would still be fast enough. I have an A100 which has paid itself back; thinking of getting another one.
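If you want to sanity-check that kind of payback claim yourself, the arithmetic is simple; the rental and electricity rates below are just placeholders, so plug in whatever your provider and utility actually charge:

```python
def breakeven_hours(purchase_usd: float, rent_usd_per_hour: float,
                    power_watts: float = 0.0, electricity_usd_per_kwh: float = 0.0) -> float:
    """Hours of use after which buying beats renting.
    Electricity cost narrows the gap slightly; all rates here are assumptions."""
    own_cost_per_hour = power_watts / 1000 * electricity_usd_per_kwh
    return purchase_usd / (rent_usd_per_hour - own_cost_per_hour)

# Example: a used 3090 at ~$700 vs a hypothetical $0.30/hr rental,
# 350W card, $0.30/kWh electricity.
hours = breakeven_hours(700, 0.30, power_watts=350, electricity_usd_per_kwh=0.30)
print(f"Break-even after ~{hours:.0f} hours (~{hours / 24:.0f} days of 24/7 use)")
```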
I think the downside to buying hardware is that, compared to other tech, this LLM/ML stuff is moving very quickly; people are great at quantising now (whereas before I only really saw it done for the Coral edge TPUs etc.).
Someone could buy an H100 to run the biggest and bestest stuff right now, but we could find that a model gets shrunk down to run on a consumer card within a year or two with equivalent performance.
I suppose it makes sense if someone wants to be on the bleeding edge all the time.
Cloud has a variable price (up to the whim of whatever they decide the price to be that day), so it's uncertain, but typically it is far more expensive for this type of application.
So when faced with 1) probably far more expensive, or 2) a single price that will be cheaper, is always available, and has far more uses, I think most would choose 2) for self-hosting. Cloud is very rarely a good option.
Having to connect to a gpu over the internet seems extremely cumbersome.
Stuff like this should be as easy as running a local program with an accelerator.
You can finetune Whisper, Stable Diffusion, and LLMs up to about 15B parameters with 24GB VRAM.
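To give a sense of how a ~15B LLM fits into 24GB, the usual trick is to load the frozen base model quantized to 8-bit and only train small LoRA adapters on top. A minimal sketch using transformers/peft; the model name and LoRA settings are just illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "huggyllama/llama-13b"  # placeholder; any causal LM in the ~13-15B range

# Load the frozen base model in 8-bit so the weights fit comfortably in 24GB.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Train only small low-rank adapters instead of the full weight matrices.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a fraction of a percent of the full model
```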
Which leads you to what hardware to get. Best bang for the $ right now is definitely a used 3090 at ~$700. If you want more than 24GB VRAM, just rent the hardware as it will be cheaper.
If you're not willing to drop $700, don't buy anything, just rent. I have had decent luck with vast.ai.
No clue, but if you want to learn/finetune ML, use a Linux box; otherwise you will spend all your time fighting your machine. If you just want to run models, a Mac might work.
I would recommend a 3090. It can handle everything a 4000 series can albeit slightly slower, has enough VRAM to handle most things for fun, and can be bought for around $700.
Just do it. Spend a few hours doing research and you will find out. With that said, buy as much memory as you can. That makes the 4090 king if you have the server that can carry it, plus the budget. For me, I settled for a 3060; it's a nice compromise between cost and RAM. Cheap, 12GB and 170W TDP.
I think you just need to educate yourself a bit about the space.
These models are very small (the large version is only 1B parameters), so they should run on a 4GB gaming GPU.