The only reason you need all those GPUs is that each one holds a fraction of the RAM you can cram into a server.
With AMD focusing on memory channels and core counts, the rig above can do 6-8 tokens per second of inference.
The GPUs will be faster, but the point is that inference on the top DeepSeek model is possible for $6k with an AMD server rig. Eight H200s alone would cost $256,000 and gobble up way more power than the ~400 W envelope of that EPYC box.
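
The 6-8 tokens/s figure checks out on a back-of-envelope basis, since CPU inference at batch size 1 is roughly memory-bandwidth-bound. Here's a minimal sketch of that math; the specific numbers (12-channel DDR5-4800 EPYC, ~37B active MoE parameters per token, 4-bit weights, 25-35% sustained bandwidth) are my assumptions, not from the comment above:

    # Rough token-rate estimate for bandwidth-bound CPU inference.
    # Assumed hardware: single-socket EPYC with 12 channels of DDR5-4800.
    CHANNELS = 12
    DDR5_MT_S = 4800          # transfers per second (millions)
    BYTES_PER_TRANSFER = 8    # 64-bit channel width

    peak_gb_s = CHANNELS * DDR5_MT_S * BYTES_PER_TRANSFER / 1000  # ~460 GB/s

    # Assumed model: DeepSeek-class MoE, ~37B of 671B params active per token,
    # quantized to 4 bits (0.5 bytes) per weight.
    ACTIVE_PARAMS = 37e9
    BYTES_PER_PARAM = 0.5
    gb_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9  # ~18.5 GB read/token

    # Sustained bandwidth is well below peak on real workloads; 25-35% is a
    # guessed efficiency range, not a measurement.
    for eff in (0.25, 0.35):
        print(f"{eff:.0%} of peak -> ~{peak_gb_s * eff / gb_per_token:.1f} tokens/s")

That prints roughly 6.2 and 8.7 tokens/s, which brackets the claimed range. It also shows why the MoE architecture matters here: you only stream the active experts per token, not the full 671B weights.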