
It's an interesting option. My gut instinct is that if you need 128GB of memory for a giant model but you don't need much compute - like fine-tuning a very large model, maybe - you might as well just use a consumer high-core-count CPU and wait 10x as long.

5950X CPU ($500) with 128GB of memory ($400).



CPU-addressable RAM is not interchangeable with graphics-card RAM, so unfortunately this strategy won't get you where you want to go.

AFAIK it flat out won't work with the DL frameworks.

If I'm mistaken please do speak up.

Edit: Thank you JonathanFly, I didn't know this!


All the frameworks work on CPU. At the time I tried it, the 5950X was about 10x slower than my GPU, which was a 1080 Ti or 2080 Ti. That was a GAN, not a transformer, though.
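For what it's worth, in PyTorch the only thing that changes is the device string; the same training step runs on CPU if no GPU is found. A minimal sketch (the model shape and the dummy batch are made up purely for illustration):

    import torch
    import torch.nn as nn

    # Fall back to CPU when no CUDA device is available.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Sequential(
        nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)
    ).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    # Dummy batch; a real run would pull from a DataLoader.
    x = torch.randn(32, 512, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"device={device}, loss={loss.item():.4f}")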


I think they are saying train (or at least fine-tune) on a CPU.

This can work in some cases (years ago I certainly did it for CNNs - it was slow, but since you're fine-tuning, anything is an improvement), but I don't know how viable it would be for a transformer.
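Roughly what that fine-tune-on-CPU approach looks like in PyTorch: freeze the pretrained backbone and only train a new head, so the CPU only has to push gradients through a handful of parameters. The resnet18 backbone, the 5-class head, and the dummy batch below are just placeholders for illustration, not anything from the original comment:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Pretrained backbone (downloads ImageNet weights on first use).
    model = models.resnet18(weights="IMAGENET1K_V1")

    # Freeze everything, then swap in and train only the final layer.
    for p in model.parameters():
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, 5)  # 5 = hypothetical class count

    optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    # Dummy image batch, everything stays on the CPU.
    x = torch.randn(8, 3, 224, 224)
    y = torch.randint(0, 5, (8,))

    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()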



