
How fungible is that compute, though? Having even a single H100 is different from having a bunch of 4090s, never mind a properly networked supercomputer of H100s.



That’s the point. You can run inference on a 4090, but training is better on an H100. If you use Llama, you don’t need to train on an H100, so you can free that supply up for Meta.


I haven't been following Llama closely, but I thought the latest model was too big for inference on 4090s, and that you can't fine-tune on 4090s either. Beyond that, the other question is whether the market is there for running inference on 4090s.
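As a rough sanity check on the sizing, the weights alone put a floor on VRAM before you even count the KV cache. A back-of-the-envelope sketch, assuming 16-bit and roughly 4-bit-quantized weights and the headline Llama parameter counts:

    # Back-of-the-envelope VRAM estimate for a model's weights alone
    # (ignores KV cache and activation memory, which add more on top).
    def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
        return params_billion * 1e9 * bytes_per_param / 1e9  # GB

    for name, params in [("8B", 8), ("70B", 70), ("405B", 405)]:
        fp16 = weight_vram_gb(params, 2)    # 16-bit weights
        q4 = weight_vram_gb(params, 0.5)    # ~4-bit quantized weights
        print(f"Llama {name}: ~{fp16:.0f} GB at FP16, ~{q4:.0f} GB at ~4-bit "
              f"(a 4090 has 24 GB)")

By that arithmetic even the 70B model misses a single 4090's 24 GB at 4-bit, never mind 405B, while 8B fits once quantized.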


Well, (1) there are a ton of GPUs out there of various specs, and you can also use an inference provider who can use an H100 or similar to serve multiple inference requests at once. (2) There are a ton of Llama sizes, from 1B and 3B up through 8B, 70B, and 405B. The smaller ones can even run on phone GPUs.
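On the small-model point, here is a minimal sketch of local inference with one of the smaller Llama variants via the Hugging Face transformers text-generation pipeline; the model ID, prompt, and generation settings are illustrative assumptions, not a recommendation:

    # Minimal sketch: local inference with a small Llama variant via the
    # transformers text-generation pipeline. Model ID and settings are
    # illustrative; any similarly sized checkpoint works the same way.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.2-1B-Instruct",  # small variant (assumption)
        device_map="auto",                         # use the local GPU if present
    )

    out = generator(
        "Explain why batching improves GPU inference throughput.",
        max_new_tokens=128,
    )
    print(out[0]["generated_text"])

Quantized, an 8B model fits comfortably within a 4090's 24 GB, and the 1B-class variants are the ones small enough for phone GPUs.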





