>When they scale horizontally across thousands of commodity machines, then knowledge of their problem domain becomes encoded in the scaling decisions they make and stays internal to the company.
Or to put it another way: "they create a massive maintenance nightmare for themselves like the one described in the article".
>When they scale vertically by buying bigger hardware, then they are trading profits in exchange for having someone else worry about the difficulties of building really big, fast supercomputers.
You are overestimating the cost of high end servers, or underestimating the cost of low end ones. Again, their existing redis cluster is less RAM, CPU power, and IO throughput than a single, relatively cheap server right now.
>Instead of having a proprietary competitive advantage, they are now the commodity application provider
Twitter is a commodity application provider. People don't use twitter because of how twitter made a mess of their back end. People don't care at all about the back end, it doesn't matter at all how they architect things from the users perspective.
>while if their server vendor wants to raise prices, it has them by the balls since the whole business is built on their architecture.
What do you think servers are? They aren't some magical dungeon that traps people who buy them. If oracle wants to fuck you, go talk to IBM. If IBM wants to fuck you, go talk to fujitsu, etc, etc.
When two of your two servers die, you ... um, well, you lose money and reputation. Quickly.
- Servers that phone home; sometimes the first you know of a potential problem is engineers at your door come to service your server.
- Hot swappable RAID'ed RAM.
- Hot swappable CPU's, with spares, and OS support for moving threads of CPU's that are showing risk factors for failure.
- Hot swappable storage where not just the disks are hot swappable, but whole disk bays, and even trays of hot swappable RAID controllers etc.
- Redundant fibre channel connections to those raid controllers from the rest of the system.
- Redundant network interfaces and power supplies (of course, even relatively entry level servers offers that these days).
In reality, once you go truly high end, you're talking about multiple racks full of kit that effectively does a lot of the redundancy we tend to try to engineer into software solutions either at the hardware level, or abstracted from you in software layers your application won't normally see (e.g. a typical high end IBM system will set aside a substantial percentage of CPU's as spares and/or for various offload and management purposes; IBM's "classic" "Shark" storage system used two highly redundant AIX servers as "just" storage controllers hidden behind SCSI or Fibre Channel interfaces, for example).
You don't get some server where a single component failure somewhere takes it down. Some of these vendors have decades of designing out single points of failure in their high end equipment.
Some of these systems have enough redundancy that you could probably fire a shotgun into a rack and still have decent odds that the server survives with "just" reduced capacity until your manufacturers engineers show up and asks you awkward questions about what you were up to.
In general you're better off looking at many of those systems as highly integrated clusters rather than individual servers, though some fairly high end systems actually offer "single system image" clustering (that is, your monster of a machine will still look like a single server from the application point of view even in the cases where the hardware looks more like a cluster, though it may have some unusual characteristics such as different access speeds to different parts of memory).