I always think this when people talk about scaling. You can now buy off-the-shelf boxes with 48 or 96 cores, 1-2 TB of RAM and internal bays for tens of TB of SSD, or connect one to an AFA (all-flash array) and get hundreds of TB. This is not even an exotic custom-built system, just commodity hardware, and has been for years. Running a recent version of a conventional database on a box like this goes a very, very long way with very little hassle, because you don't even need to think about "sharding", and you can always add a hot standby for offloading reads, for redundancy in another DC, or whatever. Systems like this can quite happily bottleneck on the network before the database starts breaking a sweat.
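A minimal sketch of what "add a hot standby for offloading reads" can look like at the application layer. The class name and host strings are hypothetical, and the rule that any statement starting with SELECT is a read is deliberately crude (it would misroute `SELECT ... FOR UPDATE`, for example). In practice this split is often done in a connection pooler or driver, and reads from an asynchronously replicated standby can be slightly stale.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate across hot standbys."""

    def __init__(self, primary, standbys):
        self.primary = primary
        # Round-robin over standbys; fall back to the primary if none are configured.
        self._readers = itertools.cycle(standbys or [primary])

    def route(self, sql):
        # Crude classification: anything beginning with SELECT is treated as a read.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._readers)
        return self.primary


router = ReadWriteRouter("primary:5432", ["standby-a:5432", "standby-b:5432"])
print(router.route("SELECT * FROM orders"))    # standby-a:5432
print(router.route("UPDATE orders SET x = 1"))  # primary:5432
print(router.route("select 1"))                 # standby-b:5432
```

The point is that read scaling here is just more replicas of the same database, with no change to the data model.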
Remember, sharding isn't scaling the database. Sharding is admitting your database can't scale, so you're offloading the problem to another layer.
It's also about data locality. With global infrastructure, a system in India or Singapore talking to a database server in a US data center region sees at least 150-200 ms of latency per query. That adds up quickly.
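To make "that adds up quickly" concrete, here is a back-of-the-envelope sketch. It assumes a hypothetical request that issues 20 dependent queries, each costing one network round trip, and ignores server-side execution time, so the result is a lower bound.

```python
def request_latency_ms(queries, rtt_ms, parallel=False):
    """Lower bound on time spent waiting on the database for one request.

    Assumes one round trip per query and zero server-side execution time.
    """
    if queries == 0:
        return 0
    # Dependent queries pay the round trip once each; fully parallel
    # queries overlap and pay roughly a single round trip in total.
    return rtt_ms if parallel else queries * rtt_ms


for rtt in (1, 150, 200):  # 1 ms is a hypothetical same-region round trip
    print(rtt, request_latency_ms(20, rtt))
```

With 20 sequential queries this prints 20 ms for the co-located case versus 3000 ms and 4000 ms across regions, which is why chatty access patterns hurt far more than raw server speed.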
That is an orthogonal concern to what is being discussed: if you run a geo-distributed Mongo cluster, you will either have slow queries or compromise on data integrity.
It's not orthogonal, because master-master cross-data-center setups are harder to run with relational databases, even in scenarios where eventual consistency is acceptable (such as one piece of our own infrastructure).
Sure, MongoDB may not be the best fit for it, but my point is that these are scenarios where horizontal scaling is an important consideration, and that makes more sense for some NoSQL solutions than for SQL solutions. It's not just about single-box performance.
Not a fan of AWS or the cloud in general, but I've accepted that this is the direction the world is going for now. The AWS instance type was purely for illustration purposes.
I think you should reconsider that dismissal: everyone needs n > 1 instances for reliability, while increasingly few tasks require more {CPU, RAM, IOPS} than a single server can provide. That means a growing percentage of problems will require clustering for reliability more than for performance, and that favors whichever option is easiest to manage, since in most cases every option will be fast enough.
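A rough illustration of why n > 1 is about reliability rather than speed: if each instance is independently up with probability p, the chance that at least one is up is 1 - (1 - p)^n. The 99% per-node figure below is a hypothetical input, and the independence assumption is optimistic (shared power, network, or a bad deploy can take out every replica at once).

```python
def cluster_availability(per_node, n):
    """Probability that at least one of n replicas is up.

    Assumes independent failures and that any single surviving node can
    serve traffic; both are simplifications.
    """
    return 1 - (1 - per_node) ** n


for n in (1, 2, 3):
    print(n, f"{cluster_availability(0.99, n):.6f}")
```

Going from one to two or three nodes moves availability from roughly 99% to 99.99% and 99.9999% under these assumptions, while adding no per-query speed at all, so operational simplicity ends up being the deciding factor.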