A cluster of 6-year-old 24GB NVIDIA Teslas should do the trick... they run for about $100 apiece. Put 12 or so of them together and you have the VRAM for a GPT-3 clone.
Amazon has them listed at $200, but still, that's only $2,400 for 12 of them.
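Quick sanity check on the VRAM claim, assuming a GPT-3-sized ~175B-parameter model and counting only the weights (no activations, KV cache, or framework overhead):

```python
# Rough VRAM sanity check for a GPT-3-sized model on 12 x 24GB cards.
# Assumptions: ~175B parameters, weights only, and 1e9 bytes treated as 1 GB.
PARAMS_B = 175
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}
TOTAL_VRAM_GB = 12 * 24  # 288 GB across the cluster

for precision, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS_B * nbytes  # 1B params at N bytes each ~= N GB
    verdict = "fits" if weights_gb <= TOTAL_VRAM_GB else "does not fit"
    print(f"{precision}: ~{weights_gb:.0f} GB of weights -> {verdict} in {TOTAL_VRAM_GB} GB")

print(f"hardware: 12 cards x $200 = ${12 * 200:,}")
```

So on paper 12 cards covers the weights at 8-bit; full fp16 weights would want more like 15 of them.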
Still, it adds up once you get the hardware you'd need to NVLink 12 of them, and on top of that, the power/perf you get probably isn't great compared to modern compute.
Wonder what your volume would have to be before getting a box with 8 A100s from Lambda Labs would be the better tradeoff.
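Back of the envelope, the tradeoff looks something like the sketch below; every number in it (rental rate, power draw, electricity price, speedup) is a placeholder assumption, not a quote from Lambda Labs or a benchmark:

```python
# Toy break-even sketch: buying a pile of cheap used cards vs. renting a
# modern 8xA100 box by the hour. All numbers are placeholder assumptions.
used_cluster_capex = 12 * 200        # $ for 12 used 24GB cards
used_cluster_power_kw = 12 * 0.25    # assumed ~250 W per card
electricity_per_kwh = 0.15           # assumed $/kWh
rental_rate_per_hour = 10.0          # assumed $/hr for a rented 8xA100 box
assumed_speedup = 20                 # assumed: rented box does the same work 20x faster

def used_cluster_total(hours):
    """Capex plus electricity for `hours` of work at the old cluster's speed."""
    return used_cluster_capex + hours * used_cluster_power_kw * electricity_per_kwh

def rental_total(hours):
    """Renting the fast box for the same amount of work."""
    return (hours / assumed_speedup) * rental_rate_per_hour

for hours in (1_000, 10_000, 100_000):
    print(f"{hours:>7} hrs of work: used cluster ~${used_cluster_total(hours):,.0f}, "
          f"rental ~${rental_total(hours):,.0f}")

# The used cluster only wins once its capex is amortized.
marginal_used = used_cluster_power_kw * electricity_per_kwh
marginal_rent = rental_rate_per_hour / assumed_speedup
if marginal_rent > marginal_used:
    print(f"break-even around {used_cluster_capex / (marginal_rent - marginal_used):,.0f} hours")
```

With those made-up numbers the rented box wins until the old cluster's capex is amortized, which takes a lot of hours; plug in your own rates to see where it flips.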
If you have time to wait for results then sure, it could work in theory. In practice, though, they're so slow and power-inefficient (compared to newer nodes) that no one uses them for LLMs; that's why they cost ~$200 used on eBay.
I just checked eBay and they are shockingly cheap. I can't even get DDR3 memory for what they're selling 24GB of GDDR5 for... with a GPU thrown in for free.
Why is this? Did some large cloud vendor just upgrade?
Are there any deals like this on AMD hardware? Not having to deal with proprietary binary drivers is worth a lot of money and reduced performance to me. A lot.
These are pretty old, and all the companies are upgrading. But no one is upgrading away from AMD hardware, because basically no companies care whether they use proprietary drivers. They want a good price-to-performance ratio, so they use NVIDIA stuff.