I think there is something about distributed storage that is not appreciated in this article:
1. Some systems do not support replication out of the box. Sure, your Cassandra cluster can replicate and MySQL can do master-slave replication, but lots of systems cannot.
2. Your life becomes much harder with NVMe storage in the cloud, as you need to respect maintenance intervals and cloud-initiated drains. If you do not hook into those systems and drain your data to a different node (a rough sketch of such a hook follows below), the data goes poof. Separating storage from compute lets the cloud operator drain and move compute around as needed; since the data is independent of the compute, and the operator manages that data system and its draining as well, the operator can manage workload placement without the customer needing to be involved.
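To make the "hook into those systems" point concrete, here's a minimal sketch (my own, not from the article, and only assuming standard EC2 behavior): poll the instance metadata service (IMDSv2) for scheduled maintenance events and kick off a drain before the window opens. `drain_to_peer` is a hypothetical placeholder for whatever re-replication your storage system actually does.

```python
# Sketch: watch EC2 instance metadata for scheduled maintenance events
# and trigger a data drain before the event window. Not production code.
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"

def imds_token() -> str:
    # IMDSv2 requires a session token before reading metadata.
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()

def scheduled_events(token: str) -> list:
    # Returns the JSON list of upcoming maintenance events for this instance;
    # the path may be empty or 404 when nothing is scheduled.
    req = urllib.request.Request(
        f"{IMDS}/meta-data/events/maintenance/scheduled",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            body = resp.read().decode()
            return json.loads(body) if body.strip() else []
    except urllib.error.HTTPError:
        return []

def drain_to_peer(event: dict) -> None:
    # Hypothetical hook: re-replicate local NVMe data onto another node.
    print(f"draining ahead of {event.get('Code')}, not before {event.get('NotBefore')}")

if __name__ == "__main__":
    while True:
        token = imds_token()
        for event in scheduled_events(token):
            if event.get("State") == "active":
                drain_to_peer(event)
        time.sleep(60)
```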
Good points. PlanetScale's durability and reliability are built on replication - MySQL replication - and all the operational software we've written to maintain replication in the face of servers coming and going, network partitions, and all the rest of the weather one faces in the cloud.
Replicated network-attached storage that presents a "local" filesystem API is a powerful way to add durability to a system that doesn't build it in the way we have.
Agreed. If you are a mature enough and well-funded organization, you probably should be using NVMe and running distributed systems on top of it to manage replication yourself.
AWS, for one example, provides a feed of upcoming "events" in EC2 in which certain instances will need to be rebooted or terminated entirely due to whatever maintenance they're doing on the physical infrastructure.
If you miss a termination event, you miss your chance to copy that data elsewhere. Of course, if you're _always_ copying the data elsewhere, you can rest easy.
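For the control-plane side of that feed, here's a small sketch (my own, assuming boto3 with normal EC2 credentials) that pages through `describe_instance_status` and lists pending scheduled events across a fleet, so a drain can be triggered well before the deadline; the alerting is just a print.

```python
# Sketch: list every EC2 instance with a pending scheduled event
# (reboot, retirement, etc.) so you can drain it before the NotBefore time.
import boto3

ec2 = boto3.client("ec2")

def instances_with_scheduled_events():
    paginator = ec2.get_paginator("describe_instance_status")
    for page in paginator.paginate(IncludeAllInstances=True):
        for status in page["InstanceStatuses"]:
            for event in status.get("Events", []):
                # Finished events are marked "[Completed]" in the description;
                # surface only the ones that still need action.
                if "[Completed]" not in event.get("Description", ""):
                    yield status["InstanceId"], event

if __name__ == "__main__":
    for instance_id, event in instances_with_scheduled_events():
        print(f"{instance_id}: {event['Code']} scheduled, not before {event.get('NotBefore')}")
```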