I'm pretty sure the insides of your two $1,000,000 servers are architecturally near-identical to a large cluster of off-the-shelf computers, with somewhat different bandwidth characteristics, but not different enough to matter. They'll have some advanced features to keep the $1,000,000 machine from falling over, but it's not like they're magically equipped with disks that are a hundred times faster. You're still dealing with lots of computing units hooked together by links with finite bandwidth; you can't just buy your way to 25 GHz CPUs with 0.025 ns RAM access times, etc.
Massively parallel single-machine supercomputers are still, essentially, distributed systems on the inside. You still have to use many of the same techniques to avoid all-to-all communication. If you treat such a system as though it had a flat, uniform memory, your application will fall down.
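A toy sketch (my illustration, not from the thread) of why avoiding all-to-all matters: if every one of n nodes talks directly to every other, message count grows quadratically, while a tree-structured reduction keeps it linear. This is why collective operations on big machines are built as trees rather than flat broadcasts.

```python
def all_to_all_messages(n):
    """Every node sends to every other node: n * (n - 1) messages."""
    return n * (n - 1)

def tree_reduction_messages(n):
    """Combine values up a binary tree: n - 1 messages total."""
    return n - 1

# Message counts at a few node counts; the gap widens quadratically.
for n in (8, 64, 1024):
    print(n, all_to_all_messages(n), tree_reduction_messages(n))
```

At 1024 nodes that's over a million messages versus about a thousand, which is the kind of pressure that forces you to architect for the hierarchy whether the box is one machine or many.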
True, they're still essentially distributed systems on the inside, but the typical bandwidth can be orders of magnitude higher, and the latencies drastically lower, when you need to cross those boundaries; for quite a few kinds of apps that makes all the difference in the world to how you'd architect them.
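A back-of-envelope model of that point, with made-up but plausible figures (my assumptions, not numbers from the thread): transfer time is roughly latency plus size over bandwidth, so both small and large messages pay heavily when you drop from an in-box interconnect to a commodity network.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model: fixed latency plus serialization time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Hypothetical figures for illustration only:
in_box  = dict(latency_s=1e-6,  bandwidth_bytes_per_s=100e9)  # ~1 us, ~100 GB/s
cluster = dict(latency_s=50e-6, bandwidth_bytes_per_s=1e9)    # ~50 us, ~1 GB/s

for size in (64, 1_000_000):  # a tiny message vs. a 1 MB buffer
    t_box = transfer_time(size, **in_box)
    t_net = transfer_time(size, **cluster)
    print(size, t_box, t_net, t_net / t_box)
```

Under these assumed numbers the cluster boundary is tens of times slower either way, which is exactly the kind of gap that changes how fine-grained your communication can afford to be.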