I love the full stack deep learning crew, and took their course several years ago in Berkeley. I highly recommend it.
One thing that always blows my mind is how much it is just not worth it to train LLMs in the cloud if you're a startup (and probably even less so for really large companies). Compared to 36-month reserved pricing, the break-even point was 8 months if you bought the hardware and rented out some racks at a colo, and that includes the hands-on support. Having the dedicated hardware also meant that researchers were willing to experiment more when we weren't doing a planned training job, as it wouldn't pull from our budget. We spent a sizable chunk of our raise on that cluster but it was worth every penny.
I will say that I would not put customer-facing inference on prem at this point: the resiliency of the cloud normally offsets the pricing, and most inference can be done with cheaper hardware than training. For training, though, you can get away with a weaker SLA, and the cloud is always there if you really need to burst beyond what you've purchased.
> Compared to 36-month reserved pricing, the break-even point was 8 months if you bought the hardware and rented out some racks at a colo, and that includes the hands-on support. Having the dedicated hardware also meant that researchers were willing to experiment more when we weren't doing a planned training job, as it wouldn't pull from our budget.
That’s having it both ways, of course. You can’t both recoup the hardware cost in 8 months and have “free” downtime.
Under this pricing you need at least a ~25% duty cycle to break even (over 3 years), so it probably still favors buying, but for some people that math might not add up. Pricing also varies drastically between providers, so this may depend on your choice there.
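Back-of-the-envelope version of that in Python. The hardware cost and reserved rate below are illustrative assumptions picked to roughly match the 8-month figure above, not actual quotes:

    # Break-even for buying vs. 36-month reserved cloud pricing.
    # Both numbers below are illustrative assumptions, not vendor quotes.
    hardware_cost = 150_000      # assumed: 8x A100 cluster + colo + support, USD
    reserved_rate = 26.0         # assumed: comparable reserved capacity, USD/hour
    term_hours = 36 * 30 * 24    # 36-month term

    # Running flat out, buying pays for itself after roughly:
    breakeven_months = hardware_cost / (reserved_rate * 30 * 24)
    print(f"break-even at 100% utilization: {breakeven_months:.1f} months")

    # Equivalently, over the 36-month term you only need to actually use the
    # hardware this fraction of the time for buying to come out ahead of
    # paying the hourly rate for just the hours you use:
    breakeven_duty_cycle = hardware_cost / (reserved_rate * term_hours)
    print(f"required duty cycle over 36 months: {breakeven_duty_cycle:.0%}")

With those made-up numbers it lands at roughly 8 months at full utilization, or a ~22% duty cycle over the term, which is the same ballpark as the figures above.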
It’s the first widespread thing where build your own makes sense in a while. Most prior workflows either needed high SLAs, were bursty, or just didn’t add up to much.
My experience with many of these services renting mostly A100s:
LambdaLabs: For on-demand instances, they are the cheapest available option. Their offering is straightforward, and I've never had a problem. The downside is that their instance availability is spotty. It seems like things have gotten a little better in the last month, and 8x machines are available more often than not, but single A100s were rarely available for most of this year. Another downside is the lack of persistent storage, meaning you have to transfer your data every time you start a new instance. They have some persistent storage in beta, but it's effectively useless since it's only in one region and there are no instances in that region that I've seen.
Jarvis: Didn't work for me when I tried them a couple months ago. The instances would never finish booting. It's also a pre-paid system, so you have to fill up your "balance" before renting machines. But their customer service was friendly and gave me a full refund so shrug.
GCP: This is my go-to so far. A100s are $1.1/hr interruptible, and of course you get all the other Google offerings like persistent disks, object storage, managed SQL, container registry, etc. Availability of interruptible instances has been consistently quite good, if a bit confusing. I've had some machines up for a week solid without interruption, while other times I can tear down a stack of machines and immediately request a new one only to be told they are out of availability. The downsides are the usual GCP downsides: poor documentation, occasional weird glitches, and perhaps the worst billing system I've seen outside of the healthcare industry.
Vast.ai: They can be a good chunk cheaper, but at the cost of privacy, security, support, and reliability. Pre-load only. For certain workloads and if you're highly cost sensitive this is a good option to consider.
RunPod: Terrible performance issues. Pre-load only. Non-responsive customer support. I ended up having to get my credit card company involved.
Self-hosted: As a sibling comment points out, self hosting is a great option to consider. In particular "Having the dedicated hardware also meant that researchers were willing to experiment more". I've got a couple cards in my lab that I use for experimentation, and then throw to the cloud for big runs.
May I also suggest the suite of open-source DeepView tools, which are available on PyPI. Profile and predict your specific model's training performance on a variety of GPUs. I wrote a LinkedIn post with a usage GIF here: https://www.linkedin.com/posts/activity-7057419660312371200-...
"Those god damn AWS charges" -Silicon Valley.
Might as well build your own GPU farm. Some of these cards can probably be had used for around $6K (guesstimating).
That would imply that the current AI cycle could persist at its current level of frothiness indefinitely: in the lull periods in between, these GPU farms would be seen as something to sell off. This doesn't even take into account the eventual depreciation of the GPUs in question as better GPUs/accelerators come onto the market.
Most companies have an AWS account that they can throw more money at for 'AI research & implementation'. With such an account already existing, along with said depreciation, the company in question would have to be certain that they'll use those GPUs all the time to make up for the upfront costs they'd be taking on.
Often you are better off running certain workloads on lesser GPUs, but this requires some tricky compiler-level optimizations. For example, you can run certain LLM inference with comparable latency on cheaper A40s vs. A100s. You could also run on 3090s (sometimes even faster). This helps with operating costs but may also resolve availability constraints.
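As a rough sketch of why that pencils out: cost per token is hourly price divided by throughput. All throughputs and hourly prices below are placeholders, not benchmarks; measure your own model:

    # Rough cost per 1M generated tokens = $/hour / (tokens/sec * 3600) * 1e6.
    # Throughputs and prices are placeholders, not benchmarks.
    gpus = {
        #       ($/hour, tokens/sec for some hypothetical optimized model)
        "A100": (1.80, 95),
        "A40":  (0.80, 60),
        "3090": (0.45, 55),
    }

    for name, (price_per_hour, tokens_per_sec) in gpus.items():
        cost_per_million = price_per_hour / (tokens_per_sec * 3600) * 1_000_000
        print(f"{name}: ${cost_per_million:.2f} per 1M tokens")

Even at lower raw throughput, the cheaper cards can come out ahead per token as long as the latency stays acceptable.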
Given the heterogeneous nature of GPUs, RAM, tensor cores, etc. it would be nice to have a direct comparison of, say, number of teraflops-hour per dollar, or something like that.
If you look at the throughput/$ metric, the V100 16GB looks like a great deal, followed by the H100 80GB PCIe (Gen 5). For most benchmarks, the A100 looks worse in comparison.
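It's easy to sketch from spec-sheet numbers and the listed prices; the TFLOPS and $/hr figures below are rough assumptions you'd want to double-check against current spec sheets and price lists:

    # Rough teraflop-hours per dollar from dense FP16/BF16 tensor-core specs
    # and example hourly prices -- both are assumptions, check current lists.
    cards = {
        #                 (peak FP16 TFLOPS, $/hour)
        "V100 16GB":      (112, 0.55),
        "A100 80GB":      (312, 1.80),
        "H100 80GB PCIe": (756, 2.50),
    }

    for name, (tflops, price_per_hour) in cards.items():
        print(f"{name}: {tflops / price_per_hour:.0f} TFLOP-hours per dollar")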
Demand outstrips supply by such a wide margin it doesn’t matter one bit how they compare.
The cheapest 8x A100 (80GB) on the list is LambdaLabs @ $12/hour on demand, and I've only once seen any capacity become available in three months of using it. AWS, last I checked, was $40/hr on demand or $25/hr with a 1-year reserve, which costs more than a whole 8x A100 Hyperplane from Lambda.
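To put numbers on that (the purchase price below is a rough assumption, not a Lambda quote):

    # Rough annual cost of an 8x A100 (80GB) node at the quoted hourly rates,
    # vs. buying a machine outright (purchase price is an assumption).
    hours_per_year = 24 * 365

    options = {
        "Lambda on-demand, 1 yr":          12 * hours_per_year,
        "AWS on-demand, 1 yr":             40 * hours_per_year,
        "AWS 1-yr reserved":               25 * hours_per_year,
        "Buy an 8x A100 server (assumed)": 200_000,
    }
    for label, cost in options.items():
        print(f"{label}: ${cost:,}")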
I was very hopeful when I saw "AMD Support" listed on a few of those providers, but that appears to refer only to AMD CPUs. It is, unfortunately, very difficult to find public cloud providers for AMD GPUs.
I know those cards are second class citizens in the world of deep learning, but they have had (experimental) PyTorch support for a while now, so where are the offerings?
> I know those cards are second class citizens in the world of deep learning,
It's worse than that. AMD cards aren't second class citizens, they're not even on the same playing field. ROCm can't compete with CUDA and its ecosystem at all, the most popular deep learning frameworks are only experimentally supported, and Nvidia ships more dedicated tensor processing cores for AI acceleration on their cards. Nvidia has a near monopoly in AI not because they're particularly amazing, but because it seems like AMD is just uninterested in competing.
With Nvidia I can just buy any random GPU and expect it to work for everything I throw at it (as long as it has enough VRAM). With AMD it's a roulette, and only a handful of very expensive server/workstation GPUs (8 in total, if I'm counting right) are actually officially supported. It's a joke.
They need to better support their own products, and they need to officially support all of their consumer GPUs to expand their mindshare. They're not doing that. From what I can see they only seem to be interested in the traditional HPC space.
Why would anyone offer a strictly worse product unless it were lots cheaper to offer, which it isn't? (Even for non-AI use cases, I don't think AMD has much that's more attractive in servers?)
They are, simply put, not price competitive in instructions per dollar at the high end, though they are starting to catch up. But to me the biggest reason is that software-wise they are behind Nvidia. Nvidia might be considered a hardware company because of its GPUs, but it is underappreciated as a software company that builds the tools for others to utilise its GPUs.
They’re not price competitive anywhere in the line.
By the time you spend hours/days/weeks constantly dealing with random edge cases and issues with the poor software support of AMD you could have bought 2-3x the Nvidia hardware (minimum) and still come out ahead.
A comprehensive list of GPU options and pricing from cloud vendors. Very useful if you're looking to train or deploy large machine learning/deep learning models.
Looking to run a cloud instance of Stable Diffusion for personal experimentation. I'm considering the cloud mostly because I don't have a GPU or desktop hardware at home, and my Mac M1 is too slow. But it also means contending with switching the instance on/off several times a week to use it.
Wondering which vendors other HN'ers are using to achieve this?
NVIDIA's EULA specifically forbids using consumer GPUs for data-center-like computation purposes. Even if that weren't an issue, there are a number of issues with both providing power to multiple consumer GPUs and just fitting them into a case/chassis, as they vary in physical size, often being 1.5-2x the width of the comparable enterprise GPU.
Vast is a grey market with very few security measures in place. A host can snoop on your workloads very easily, for example, because they have full access to the Docker host.
You're also limited in that you can only deploy a single container, so if you just wanted some spot nodes to scale out your existing Dagster cluster, etc., you're not going to be able to cleanly plug Vast into your existing infrastructure deployment process.
But if you're just playing around or don't have major security/data restrictions and only want a jhub notebook or nvidia-glx-desktop then yeah it's a very hard deal to beat.
You can purchase a 3090 for $1,000. Assuming you're going to use it 24/7 at 450 W, you would use about 1,000 kWh over three months; at $0.10/kWh, it would pay for itself in about 3 months.
That's not a typical use case during development, though. People need fast feedback loops: they'd rather rent multiple GPUs and have the result the next morning than wait for days with no guarantee of success.
So unless you have stable tasks that need to run continuously, or enough users to keep your GPU clusters busy, your GPU usage will be quite bursty: some periods of high activity, then a lot of time idling.
I'd argue the opposite. Having to spin up and down instances for development is a huge PITA, the tooling sucks, and the instances might not even be available the next time you need them. It also stresses me out personally, because I worry about getting productive use out of every minute. Whereas my little GPU cluster (despite a big upfront cost) costs nothing but electricity and runs 24/7.
I agree that buying is a good choice, but your calculation is wrong: 3 months of renting comes to 0.2 * 24 * 30 * 3 = $432, which won't even cover half of the base price.
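For what it's worth, netting out electricity with the $0.20/hr rate used here and the 450 W / $0.10 per kWh figures from the parent comment:

    # Payback period for a $1,000 used 3090 vs. renting at $0.20/hr,
    # net of electricity at 450 W and $0.10/kWh (rates from the comments above).
    card_price  = 1000.0
    rent_rate   = 0.20    # $/hour
    power_kw    = 0.45    # 450 W under load
    power_price = 0.10    # $/kWh

    savings_per_hour = rent_rate - power_kw * power_price   # ~$0.155/hr
    payback_months = card_price / savings_per_hour / (24 * 30)
    print(f"payback at 100% utilization: {payback_months:.1f} months")

So closer to 9 months at full utilization, not 3, once electricity is accounted for.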
So you save $500 on cloud costs, spend $4k on developer time, and delay a $250k app launch by 2 weeks.
When you're just messing around, this is great. Homebrew GPU labs are fun and cool. As soon as you bring economics into it and start valuing time, it's a non-starter.
Out of curiosity: do you mostly use/want one GPU, or the full server with all GPUs (8x A100 80GB, or 16x A100 40GB, though I think only Google Cloud has those)? Or a mix?
TPU and other accelerator performance varies by application... and even in a hugely popular config (like finetuning LLaMA with JAX) it's hard to find a good benchmark. But generally speaking, Google charges a pretty penny for TPUs.
Accelerators outside TPUs are exotic. Off the top of my head... Cerebras only offers their CS-2 as a first-party "pay for a specific training job" kind of thing. Intel Gaudi 2 is supposedly good but is mysterious to me, and Ponte Vecchio has barely started shipping. Graphcore and Tenstorrent chips in the wild seem kinda long in the tooth for big training jobs. The AMD MI300 is not shipping yet, and the older AMD Instincts are difficult to find in cloud services (maybe because they got eaten up for HPC?).
Lots of other promising accelerators (with my personal favorite being the Centaur x86 "accelerated" CPUs, perfect for dirt-cheap LLM inference/LoRA training) died on the vine because of the CUDA moat, and I think more will share the same fate.
The only H100s available to get are the "little brother" PCIe versions with HBM2e memory instead of HBM3. Very, very few clouds, or people in general, have H100s deployed at any scale. You will see when the MLPerf benchmark results come out.