Not even that: spot pricing on an 8-GPU instance (the p3.16xlarge, I believe; the larger one has the same number of GPUs but more memory per GPU) is something like $6/hr. I use this for personal projects sometimes: get all the data into S3, set up a good launch template, then spin up a spot instance and be super efficient about training quickly. I've even run evals on a separate, cheaper machine so the 16xl can spend all its time training. It's still not "cheap", but $50 for 8 hours of training on a machine like that, with $64k of GPUs on board, is really not bad.
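For what it's worth, the launch step can be a few lines of boto3. A minimal sketch, assuming the launch template already carries the AMI, key pair, IAM role, and user-data that kicks off training from S3 (the template name "gpu-training" and the region are hypothetical):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

    # Request a one-time spot instance from a pre-built launch template.
    resp = ec2.run_instances(
        MinCount=1,
        MaxCount=1,
        InstanceType="p3.16xlarge",  # 8x V100
        LaunchTemplate={"LaunchTemplateName": "gpu-training", "Version": "$Latest"},
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {
                "SpotInstanceType": "one-time",
                "InstanceInterruptionBehavior": "terminate",
            },
        },
    )
    print(resp["Instances"][0]["InstanceId"])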
Except that cloud-ified V100s are significantly less powerful than if you have direct access to the hardware. Last time I checked, in AWS they're actually external devices mapped in over GBit ethernet, which is far slower than the ~8 GB/s that PCIe 3.0 x8 provides.
I think you are confusing this with AWS Elastic Inference.
If you use AWS Elastic Inference, then you get network-attached devices. But these are Amazon's own (non-NVidia) devices and they're only used for inference, so it's not really comparable.
Presumably that depends on how much PCIe bandwidth your workload actually consumes before it bottlenecks elsewhere? A 2018 benchmark (https://www.pugetsystems.com/labs/hpc/PCIe-X16-vs-X8-with-4-...) seems to indicate that x8 isn't generally a bottleneck for common (at the time) workloads. x8 is a far cry from the claimed gigabit ethernet, though!
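Back-of-the-envelope, the gap is huge. Assuming PCIe 3.0 (8 GT/s per lane, 128b/130b encoding) and ignoring further protocol overhead:

    # Rough theoretical bandwidth: gigabit ethernet vs. PCIe 3.0 links.
    GBE_GBPS = 1 / 8                       # 1 Gbit/s ~= 0.125 GB/s
    PCIE3_LANE_GBPS = 8 * (128 / 130) / 8  # ~0.985 GB/s per lane

    for lanes in (4, 8, 16):
        bw = PCIE3_LANE_GBPS * lanes
        print(f"PCIe 3.0 x{lanes}: {bw:5.2f} GB/s (~{bw / GBE_GBPS:.0f}x GbE)")

    # x8 works out to ~7.9 GB/s, roughly 60x gigabit ethernet.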
AWS is tricky in terms of how storage is provisioned. I don't remember the details, but it's easy to put your datasets on storage that's connected to your GPU servers over a 1 Gb link, and that could easily become a bottleneck. Datasets should live on Elastic Block Store (EBS) or something like that, over high-speed links. It's been a while since I looked into this, though.
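An easy sanity check is to time a sequential read of a dataset shard from the mounted volume; throughput plateauing around ~110 MB/s smells like a 1 Gb link. A sketch (the path is a placeholder, and re-runs will hit the page cache, so use a file larger than RAM):

    import time

    def read_throughput(path, chunk_mb=8):
        """Sequentially read `path` and return throughput in MB/s."""
        chunk = chunk_mb * 1024 * 1024
        total = 0
        start = time.perf_counter()
        with open(path, "rb") as f:
            while buf := f.read(chunk):
                total += len(buf)
        elapsed = time.perf_counter() - start
        return total / (1024 * 1024) / elapsed

    # Hypothetical dataset shard on the volume under test.
    print(f"{read_throughput('/data/train-00000.tfrecord'):.0f} MB/s")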
The earlier comment claimed that the GPUs (!!!) were located elsewhere on the network; I suspect that the scenario you describe is what they intended to refer to.
(IIRC AWS offers compute-optimized instances with a volume that's guaranteed to be backed by blocks on a local NVMe drive.)
I think they are confusing it with AWS Elastic Inference. That is a different thing, which does have network-attached accelerators:
Amazon Elastic Inference accelerators are GPU-powered hardware devices that are designed to work with any EC2 instance, SageMaker instance, or ECS task to accelerate deep learning inference workloads at a low cost. When you launch an EC2 instance or an ECS task with Amazon Elastic Inference, an accelerator is provisioned and attached to the instance over the network.
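For reference, that attachment happens at launch time. A hedged boto3 sketch (the AMI ID is a placeholder, and the instance also needs a VPC endpoint for the Elastic Inference service, which is omitted here):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

    # Launch a cheap CPU instance with a network-attached EI accelerator.
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="c5.xlarge",
        MinCount=1,
        MaxCount=1,
        ElasticInferenceAccelerators=[{"Type": "eia2.medium", "Count": 1}],
    )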