
How fungible is that compute, though? Having even a single H100 is different from having a bunch of 4090s, never mind a properly networked supercomputer of H100s.



That’s the point. You can run inference on a 4090, but training is better on an H100. If you use Llama, you don’t need to train on an H100, so you can free that supply up for Meta.


I haven't been following Llama closely, but I thought the latest model was too big for inference on 4090s, and that you can't fine-tune on 4090s either. Beyond that, the other question is whether the market is there for running inference on 4090s.
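As a rough sanity check on the sizing, the weights alone put a floor on VRAM before you even count the KV cache. A back-of-the-envelope sketch, assuming 16-bit and roughly 4-bit-quantized weights and the headline Llama parameter counts:

    # Back-of-the-envelope VRAM estimate for a model's weights alone
    # (ignores KV cache and activation memory, which add more on top).
    def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
        return params_billion * 1e9 * bytes_per_param / 1e9  # GB

    for name, params in [("8B", 8), ("70B", 70), ("405B", 405)]:
        fp16 = weight_vram_gb(params, 2)    # 16-bit weights
        q4 = weight_vram_gb(params, 0.5)    # ~4-bit quantized weights
        print(f"Llama {name}: ~{fp16:.0f} GB at FP16, ~{q4:.0f} GB at ~4-bit "
              f"(a 4090 has 24 GB)")

By that arithmetic even the 70B model misses a single 4090's 24 GB at 4-bit, never mind 405B, while 8B fits once quantized.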


Well, (1) there are a ton of GPUs out there of various specs, and you can also use an inference provider who can use an H100 or similar to serve multiple inference requests at once. (2) There are a ton of Llama sizes, from 1B and 3B up through 8B, 70B, and 405B. The smaller ones can even run on phone GPUs.
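On the small-model point, here is a minimal sketch of local inference with one of the smaller Llama variants via the Hugging Face transformers text-generation pipeline; the model ID, prompt, and generation settings are illustrative assumptions, not a recommendation:

    # Minimal sketch: local inference with a small Llama variant via the
    # transformers text-generation pipeline. Model ID and settings are
    # illustrative; any similarly sized checkpoint works the same way.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.2-1B-Instruct",  # small variant (assumption)
        device_map="auto",                         # use the local GPU if present
    )

    out = generator(
        "Explain why batching improves GPU inference throughput.",
        max_new_tokens=128,
    )
    print(out[0]["generated_text"])

Quantized, an 8B model fits comfortably within a 4090's 24 GB, and the 1B-class variants are the ones small enough for phone GPUs.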





