The way I understood it, they optimised the entire stack, from CUDA down to the networking interconnects, specifically for data centers, meaning you get roughly 30x more inference per dollar at datacenter scale. This is probably not fluff, but it's only relevant for a very specific use case, i.e. enterprises with the money to buy the whole stack to serve thousands of concurrent users with LLMs.
It doesn't matter for anyone who isn't Microsoft, AWS, OpenAI or similar.
It's a weird graph... The y-axis is specifically tokens per GPU, but the x-axis is "interactivity per second", so the y-axis already bakes in Blackwell being twice the size as well as the gain from FP8 -> FP4. Note that the FP4 change effectively gets counted more than once, since half as much data also needs to go over the network.
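Rough back-of-the-envelope of how those per-GPU factors could compound into a big headline multiplier (the numbers below are my illustrative assumptions, not NVIDIA's published breakdown):

```python
# Sketch: how the per-GPU gains mentioned above might compound multiplicatively.
# All factors are illustrative assumptions, not NVIDIA's actual breakdown.
die_size_factor = 2.0    # assumption: Blackwell "GPU" is roughly twice the silicon
fp4_vs_fp8_factor = 2.0  # assumption: halving precision roughly doubles math throughput
network_factor = 2.0     # assumption: FP4 also halves the bytes moved over the interconnect

raw_per_gpu_gain = die_size_factor * fp4_vs_fp8_factor * network_factor
print(f"raw per-GPU gain from these factors alone: {raw_per_gpu_gain:.0f}x")
# -> 8x here; the rest of a headline figure like 30x would have to come from
#    software, batching/scheduling and interconnect improvements at whatever
#    interactivity point on the x-axis they picked for the comparison.
```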