Well, an m5ad.24xlarge is 96 threads and 384 GiB with your own 2x SSD (saves on EBS bandwidth costs). So fewer threads, but a bit more memory. (We'll guess that's a 48-core EPYC 7642 equivalent with 96 threads, since there is no 96-core version.)
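If you'd rather double-check those specs than trust my memory, the EC2 API publishes them. A minimal sketch using boto3's describe_instance_types (assumes you have AWS credentials configured; the response keys are as I recall them from the EC2 API):

    import boto3

    # Ask EC2 for the published specs of the instance type in question.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    resp = ec2.describe_instance_types(InstanceTypes=["m5ad.24xlarge"])
    info = resp["InstanceTypes"][0]

    print("vCPUs: ", info["VCpuInfo"]["DefaultVCpus"])
    print("Memory:", info["MemoryInfo"]["SizeInMiB"] // 1024, "GiB")
    # Instance-store disks are the local NVMe drives, as opposed to EBS volumes.
    for disk in info["InstanceStorageInfo"]["Disks"]:
        print("Disks: ", disk["Count"], "x", disk["SizeInGB"], "GB", disk["Type"])

The same call works for the p3 instance types mentioned below.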
How does having your own SSD save on bandwidth? You have to send the data to and from the cores where the computation is taking place.
There are certainly some workloads where it makes sense to own your own storage and rent computation, but you can't assume that by default for a "powerful AI" workload.
There is no exact cloud equivalent - the researchers used commodity consumer hardware for their GPU, which NVIDIA doesn't allow to be used in data centres.
The closest you can get on AWS (more like System #3 in the paper, with 4x GPUs) would be something like a p3.8xlarge instance [1] that'll cost you $12.24/hour (on-demand) or $3.65 to $5/hour (spot price, region-dependent) [2].
A single-GPU instance (p3.2xlarge - only 8 vCPUs, though) will cost you $3.06/hour on-demand or $0.90 to $1.20/hour (spot).
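To put those hourly rates in perspective, here's a quick break-even sketch; note the $7,000 hardware figure is a hypothetical placeholder for a comparable owned build, not a number from the paper:

    # Break-even point between renting the instances quoted above and
    # buying hardware outright, at the hourly prices from this thread.
    PRICES_PER_HOUR = {
        "p3.8xlarge on-demand": 12.24,
        "p3.8xlarge spot (low end)": 3.65,
        "p3.2xlarge on-demand": 3.06,
        "p3.2xlarge spot (low end)": 0.90,
    }

    HARDWARE_COST = 7000.0  # hypothetical purchase price; plug in your own

    for name, price in PRICES_PER_HOUR.items():
        hours = HARDWARE_COST / price
        print(f"{name}: ${price:.2f}/h -> break-even after "
              f"{hours:,.0f} h (~{hours / 24:.0f} days of 24/7 use)")

Even the cheapest spot price here breaks even in under a year of continuous use (with that placeholder hardware cost), which is why rent-vs-own comes down almost entirely to utilisation.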