Most applications are not so IOPS-limited that they depend on the difference between PCIe request latency and going out over the network to get data that's actually stored on a nearby rack. In that case what Kubernetes offers (with the help of cloud providers) is fine. You make a storage class. You make a persistent volume claim. Your pod mounts that. Not all that hard. If the performance isn't good enough, though, then you have to build something yourself.
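For the record, the happy path is roughly three small manifests like the following (names are made up, and the provisioner string shown is the AWS EBS CSI driver — other clouds use their own):

```yaml
# A storage class describing the kind of disk you want.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # cloud-specific; illustrative here
parameters:
  type: gp3
---
# A claim against that class; the cloud provisions a disk to satisfy it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
---
# A pod that mounts the claim like any other volume.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data
```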
I am used to a completely different model that we had at Google. You could not get durable storage in your job allocation. Everything went through some other controller that did not give you a block device or even POSIX semantics, and you designed your app around that. If you needed more IOPS you talked to more backends and duplicated more data.
Meanwhile in the public cloud world, you get to have a physical block device with an ext4 filesystem that can magically appear in any of your 5 availability zones, provisioned on the type of disk you specify with a guaranteed number of IOPS. It's honestly pretty good for 90% or even 99% of the things people are using disks for. (In my last production environment I actually ran stuff like InfluxDB against EFS, the fully-managed POSIX filesystem that Amazon provides. It did fine.)
Only if your (storage) network is slow (eg 1/10/25 GbE).
With a decent network (eg modern infiniband, 40+GbE, etc) for the storage, the latency and throughput to the storage shouldn't make a difference.
For example (years ago), I used to set up SSD arrays - SATA at the time, as M.2 wasn't a thing - and have them served over a 20Gb/s InfiniBand network to hosts in the same data centre.
The access times from an OS perspective to hit that storage over the network were the same as for hitting local disk. But the networked storage had higher bandwidth (multiple SSDs instead of a single one per host).
Worked really well. :)
Looks like it's still an active project too, as there's a new release listed from November 2018.
EBS is a compromise: it allows your data to run anywhere in a region, on any machine. That means lots of hops and lots of interconnects.
The latest SAS stuff runs at 12 gigs (most likely 4 lanes per cable, and dual-linked too), _but_ that's dedicated to local traffic. The performance difference between having a SAS disk inside a box or in the next rack is negligible. A decent SAN that exports over a 56-gig interconnect (InfiniBand et al) will far exceed a local PCIe device in terms of IOPS and sustained bandwidth (at the expense of latency).
Crucially, somewhere that runs its own datacenter models the storage for a specific workload. In VFX land we had 32 file servers of 60 disks each, every one capable of saturating two 10-gig ethernet links. But it'd be appalling for mixed VM hosting.
EBS is a hedged bet: it has a _boatload_ of caching to get any sort of performance, and a huge amount of QoS to stop selfish loads stealing all the IOPS.
To get the best performance from EBS you have to have large volumes (5 TB+) and large latest-generation instances (c5, r5, m5) in 4xlarge or 8/9xlarge sizes. This costs money to run.
Of course, there are a lot of systems where one MySQL DB instance is needed and sufficient for the foreseeable future. That's a very static resource allocation: one VM on a hypervisor with local disk (and backups every day) makes a lot of smaller sites very happy, and if they get locally provisioned performance instead of the EBS variant they will be happier for longer.
of course, EBS is a lot more fault tolerant.
that should read AZ
EBS is not your typical SAN, which usually offers much more throughput, IOPS, and reliability in exchange for more complexity and latency. You probably won't even notice that latency, though, if your SAN is close enough and using high-end links.
Furthermore, you'll never get close to saturating a 40 Gbps link with a database workload. Your limitation is IOPS, not throughput, and you don't have anywhere near enough CPU to max out those links in a loaded server, much less a NUC.
Your usual run-of-the-mill enterprise uses most of this horsepower to heat air and run large clusters of Oracle DBs powering even more expensive SAP modules.
There's a reason no serious IaaS provider builds infra this way.
I have one example, actually an old customer of the vendor that used to employ me, that had probably well over 1k tenants, all with VMs and DBs backed by a humongous SAN (replicated, ofc!).
Not nice when corrupt data was mirrored due to a software bug in said SAN system.
This went to a national level, really, due to the nature of many of the customers.
Sure you can build a reasonable architecture using the products and vendors mentioned, but, it doesn’t scale and it really locks you in.
I’ve seen and worked with this stuff at numerous enterprises.
This only makes it my experience and one datapoint, so please bear that in mind! =D
The reason there is a standing joke with your coworkers is because you don't work on important things that run the world. When 1 minute of downtime costs you over $1mil, your "ripped off" Oracle costs, your SAN costs, etc, are lost in the rounding errors. I can have a SAN have hundreds of arrays, across multiple datacenters, and dynamically grow and shrink my storage needs. It is the definition of scalable. I can have one VM farm vmotion to another VM farm 30 miles away transparently, which will sit on a different array attached to the SAN. What happens when the storage needs of your server outgrow the drives you can shove in there? There are servers and databases a petabyte in size, pushing a million IOPS. They're in charge of money. If there is corruption on the SAN, well it infrequently happens, just like with servers and memory. Twice in my vast experience. You can roll back all writes on the replication software, and usually keep an undo journal a few days long, and something like hourly snapshots. This also protects against cryptoviruses and other types of corruption.
This is why people like you work at small companies, fiddling with your cute little projects, while the world moves forward with you on the sidelines. I've been doing this for 20+ years, and have been at most fortune100 companies, in 80 countries. But yeah, your opinion, while being dismissive, is cute.
The CIO is the guy in charge of getting this stuff at enterprises. It's clear you don't understand the impact of design. I am guessing you are not at an architect level. There's a reason for that, and a reason other people - the ones you put down in your post, are making the big decisions. People like you would cost the company millions of dollars in loss per year.
Again, if you read what I said you'll notice that I pointed out that you actually can build decent architectures using enterprisey stuff; it will however never scale horizontally in a reasonable way.
What happens when you have filled all your shelves in the array box? Ah, you'll need a new $250K box...
Mentioning vmotion and all... yeah so... VSAN et al... brrr... I'll put my trust in the open source world rather than shoddy software from 3rd-party vendors trying to appeal to the latest fad. Hyperconverge my *ss! =)
KISS is what rules but vendors are busy feeding channels and partners with impossible to penetrate acronyms leaving you in a mess sooner or later.
The focus from these vendors is to keep integration at a minimum, which means APIs are usually crappy and achieving a reasonable level of automation is often a chore.
Now there is probably a small percentage of "enterprises" that use tech in a sane way, but my bet is that the CIO you mention is a blockchain expert, as well as putting AI at the top of strategic actions to "implement" this year.
He has probably commissioned a pre-study from Accenture that outlines these strategic imperatives.
This is of course a little rant, but also true for the 90% of the 90% mentioned.
Don't be defensive and close minded. That's what's leaving most enterprises in the dust.
There's just a lot of inertia inherent within certain businesses that will let them continue to burn through cash on pointless tech for yet a while.
Keep it old-school!
VSAN, btw, is VMware's attempt to actually achieve _horizontal_ scalability.
I just don’t trust VMware with anything except for that bytecode VM. It used to rock, but time moved on, and even as they IPO’d I thought they would end up dead in the water eventually.
I had just been introduced to the wonderful zones in Solaris, and they made hypervisor VMs seem silly.
And here we are with cgroups and friends...
Have a look at ceph if you’re interested: http://docs.ceph.com/docs/master/architecture/
Anyway, I’m out.
If your horizon extends to what vendors are prepared to sell you, take that 50% discount and run with it!
One final thought: is that disk not ”internal” to the array? This is what makes block storage notoriously difficult to scale horizontally.
You’ll need very clever software!
What "box" - the 42U Rack? I don't think so. The rack is always free. You then have DAs connected to the disk, and FAs connected to the SAN, which are on a pair of directors. You literally cannot get those w/o disk.
You don't know what a SAN is, you've never seen an itemized quote for an array. Thanks for your link. It's like sending a hooked on phonics link to an English professor. Cute. Keep it cute! People like you are the reason people like me get paid a lot.
I was certain I had nothing more for you, but this is too much fun!
You clearly don’t grasp the difference between vertical and horizontal scalability which means you have never been subjected to a bunch of scenarios requiring the latter.
Dinosaurs taking the p*ss are the reason many enterprises opt to off-shore and out-source.
there are, literally, zero enterprises off-shoring their hard IO hitting datacenters. In fact, having been in 80 different countries for these enterprises, they usually have many datacenters all over the world.
VSAN is for hyperconverged systems only. racks with a thousand 1u nodes that all have disk, connected to a fat ethernet backplane. Things on a SAN are not hyperconverged - they are on a SAN. VSAN is for tier 2 stuff that goes on hyperconverged - like web servers and DMZ things. The fastest processing is done on solid databases like Oracle or UDB on clusters of large servers, connected to a SAN. VSAN is even positioned by sales for tier 2 from all major vendors.
You literally picked up some technical words you heard around the office, googled a few things, and now consider yourself an expert, so you give authoritative opinions on here about things you have never worked with. I bet you are deskside support or a code monkey, and have never architected a solution. When someone gives you a budget of 20mil and says you can average 1 min of application unavailability per year or it impacts billions in the company's bottom line and your whole team gets fired, do let me know. I'm sure your suggestion would be to cluster together a bunch of old Dell laptops over wifi.
So, you’re saying VSAN for horizontal scalability and your regular SAN for a single point of vertical scalability.
Yup, sounds about right!
I’ve only consumed vsan as a dev, and that turned out... not so good.
Scaling performant storage horizontally is a science and no fun to troubleshoot.
Implying you get stuff for free from EMC is just a bad joke, or you have no clue how high the markups are.
It does not matter what the invoice tells you, bottom line counts.
Last time I was involved, admittedly many years ago, the enclosure actually came with a price tag.
I was there when the company I used to work for was first through the gates in EMEA to SAN-boot their physical x86 boxes. What a ride, and very little gain. Cost a fortune tho!
I’ve worked at a hardware/software vendor (499 of the 500) as an architect within professional services, which means I have way too much exposure to the madness going on at the 500s.
I’ve built all flash boxes running zfs as well as xfs for specific scalability needs for customers, usually on-prem clouds but other use cases as well.
Commodity hardware. Blazing performance.
If I need proper storage experts I turn to the ZFS dev mailing list, for example.
If you could lower your guard for a sec you’ll see that I have pointed out no less than twice that you can build decent infrastructure with precious enterprise gear.
You’ll pay through the nose tho, and personally I’d rather hire 5-10 super-generalist FTEs than a vendor-labeled expert and a bunch of hardware.
I’ll leave you with this oldie but goldie: https://www.backblaze.com/blog/petabytes-on-a-budget-how-to-...
I know: “that’s not SAN”, but the point I’m making is the mark-up. If you are working at a place that allows you to play the lone ranger and blow everyone away with three letter vendor acronyms, good for ya’!
Lone rangers are the biggest blocker for most enterprises from a technical perspective, second only to politics.
People with a clear agenda spewing fake crap gets upvoted here now. Disagree with a mod - your comments are shadow removed, and possibly your account is shadow banned, with no warning, so the insecure mod can feel better.
I give actual valuable information from 20 years of experience at almost all the fortune100 firms. Snarkiness? Yes. Much better and funnier than the guy I replied to.
I know you disagree. That's specifically why I include sentences like the last - to draw attention to the issue. comment karma has become useless here. Stuff is greyed out, stuff is shadow deleted - good things, filtered by idiots. I don't want my content filtered by insecure idiots.
I've fought all my career to get "enterprise" folks to understand that spending $100s of mill on hardware vendors and 3rd party software is madness, if what you are doing is actually mission critical.
Money should be spent on acquiring talent and adapting to actual business needs.
The business you support could not care less about your shiny toys; they want to transform, and preferably yesterday!
A digression here, from one of your other posts: if your business applications rely on live migration of vm's, you're doing it wrong.
You should take this up with your CIO or even better the CTO, but he's probably busy looking at Gartner magic quadrants and contemplating the dire need of a blockchain.
Keep an open mind, and keep it simple. Best advice ever in this crazy business!
no one contemplates blockchain, but for someone CIO level, gartner quadrants present a quick high-level summary of where the industry is going. and $100mil+ for hardware is completely normal to run things that process billions of dollars. The US treasury department is a great example. Mastercard is another.
But if it floats your boat — awesome!
It is not just IOPS but latency. Some of my production servers are connected one-to-one by crossover cable, in clusters with as many extra NICs as needed (say 3 machines: 2 extra NICs per machine), just to shave off that little extra overhead of going through a router. Yes, it matters in some applications!
I would be curious if the SFP for 1G using fiber also has lower latency than copper 1000baseT.
In general, a distributed database can be really fast. DynamoDB claims 3ms (AWS re:Invent 2014). Current technology can probably get close to 1ms, which is on par with memcache. With such performance, you don't really need local storage.
You can get much faster IO if you use many local SSDs. The downside is utilization. It is very rare for a single machine to have a workload that fully utilizes local disk. You end up greatly over-provisioning fleet-wide to get high performance. A managed database over a network is more likely to utilize disk/SSD throughput.
This is one of the things that gets me. If you get the last .1% that can ever be gotten, what do you do for an encore?
When your userbase grows another 5% or 10% or 20%, it won't be enough. You'd be better off trying to figure out ways to give your users something they want that doesn't require most exotic thing that can be procured. It's expensive to begin with, and it's the end of the road. You don't want to be at the end of a road trying to figure out what to do next.
They have ridiculous numbers of servers. Rather than fiddling to reduce server count, they're improving their ability to scale out. It's cheaper and it's repeatable. Doing crazy things like building custom hardware and power distribution buses (Facebook) to reduce heat in the data centers.
There are services where latency is important: they launch multiple requests to different replicas, and to optimize this, when one of the replicas is serving the response it sends a cancel to the others.
Horizontal scaling and vertical scaling are both very important.
Both G and FB pour many engineering hours into maximizing per-server "ROI". Just think about how much they work on scheduling work/tasks so they can do more with the same number of servers.
https://rook.io (built on ceph)
https://www.openebs.io (built on Jiva/cStor)
The distributed storage problem is difficult on its own; this isn't a Kubernetes-specific issue. IMO Kubernetes is improving on the state of the art by dealing with both the distributed storage problem and dynamic provisioning, and that's why it might seem somewhat flustered.
What isn't shown here is how effortless it feels when you do have a solution like rook or openEBS in place -- we've never seen ergonomics like this before for deploying applications. Also, the CSI (Container Storage Interface) that they're building and refining as they go will be extremely valuable to the community going forward.
BTW, if you want to do things in a static provisioning sense, support for local volumes (and hostPaths) has been around for a very long time -- just use those and handle your storage how you would have handled it before Kubernetes existed.
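A statically provisioned local volume is just a PV you declare yourself — something roughly like this (the path, capacity, and node name are placeholders; local PVs require the node affinity so the scheduler knows where the disk lives):

```yaml
# A pre-created PV pointing at a disk you manage yourself.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-data
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd0          # a disk you formatted/mounted yourself
  nodeAffinity:                    # pins the volume to the node that has it
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node-1"]
```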
Shameless plug I've also written about this on a fair number of occasions:
I've gone from a hostPath -> Rook (Ceph) -> hostPath (undoing RAID had issues) -> OpenEBS, and now I have easy to spin up, dynamic storage & resiliency on my tiny kubernetes cluster.
That all of this is doable by someone who is not a sysadmin by trade should not be understated. The bar is being lowered -- I learned enough about Ceph to be dangerous, enough about OpenEBS to be dangerous, and got resiliency and ease of use/integration from Kubernetes.
Starred the GitHub repo... and a few minutes later received email spam from them to my personal email (it's in my GitHub profile). :(
Completely lost interest in their project at that point.
Probably best to skip it, as rewarding spammers doesn't lead to good things. :(
Also, I wouldn't be so quick to write them off -- their solution is based on Container Attached Storage (CAS), and is the only relatively mature solution so far I've seen (I haven't seen any others that do CAS) that sort of take the Ceph model and turn it inside out -- pods talk to volumes over iSCSI via "controller" pods, and writes are replicated amongst these controller pods (controller pods have anti-affinity to ensure they end up on separate machines).
I have yet to do any performance testing on ceph vs openebs but I can tell you it was easier to wrap my head around than Ceph (though of course ceph is a pretty robust system), and way easier to debug/trace through the system.
Reaching out to the occasional person who stars a project, if there's some strong overlap of stuff then maybe sure. An automatic spam approach though... that's not on. That makes starring projects a "dangerous" thing for end users, as they'd have to be open to emails for every one. :/
And yeah, it did look useful up until this point. Let's see what their response to the GH issue is like. :)
> Static provisioning also goes against the mindset of Kubernetes
Then the mindset of Kubernetes is wrong. Or at least incomplete. Persistent data is an essential part of computing. In some ways it's the most important part. You could swap out every compute element in your system and be running exactly the way you were very quickly. Now try it with your storage elements. Oops, screwed. The data is the identity of your system.
The problem is that storage is not trivially relocatable like compute is, and yet every single orchestration system I've seen seems to assume otherwise. The people who write them develop models to represent the easy case, then come back to the harder one as an afterthought. A car that's really a boat with wheels bolted on isn't going to have great handling, but that's pretty much where we are with storage in Kubernetes.
The Borg/Omega model kind of assumes you have a Google-like storage tier interconnected with 10GbE links that is automatically replicated and accessible from potentially anywhere. Once you have that, then storage on k8s is "easy".
Presumably that means that on every Google machine there is a directory that is shared for the entire cluster BUT:
1) It is per-cluster, not global (given Google's cluster sizes one imagine's that's not that much of a limitation, but then again, it seems like that is a clear limit)
2) you can only create or append to files (or delete them I guess) (and this is therefore non-posix, and does not support things like mysql or postgres)
3) this is very different from what GlusterFS, NFS, persistent volumes, etc provide. Therefore disks on google cloud are presumably very much not just files on this GFS/colossus thing.
4) it was single master at least until 2004. Maybe until 2010. Apparently that can work.
For the most part, google only has custom applications.
Just about everything is written in-house, and takes advantage of things like being able to open a file from local disk just as easily as opening one in Colossus or their equivalent of Zookeeper.
Global might mean you're accessing blocks stored in Europe from a container running in Australia. If your workload is doing that, you might want to consider network costs and latency, and then reconsider whether you really want to architect things that way.
This "solves" the persistent storage layer in a way Kubernetes cannot. Inside Google you might deploy your "Spanner" database container which knows how to interface with Colossus and doesn't require any special setup or colocation (below the cluster level). You cannot deploy MySQL on K8s and expect the same.
The most obvious example of this is the need to prematurely scale horizontally and deal with ephemerality (even of compute).
Also, whatever unnecessary scaling solution you come up with today for the problems you have today is not likely to be applicable to the problems you have years down the road; features will have changed, apps will be rewritten.
I think people underestimate how much a handful of properly specd servers can achieve. My best experience with "at-scale" was 25+ million users, heavy API usage (and the API hits weren't trivial / hard to cache), and we could handle all the load on 2 beefy servers (but had more for HA and lower latency (they were distributed)).
Quick search and I found a dual EPYC 7281 with 256GB DDR4, 2TB nvme, 2TB ssd, and 12TB HDD for $500/m in LA. Now it obviously depends on what you're doing, but for most startups to max this out, they're either extremely successful or extremely bad software engineers.
And if you're willing to actually get a physical server, Dell happily sells single nodes with 6 terabytes of RAM for a few tens of thousands. Other vendors go much higher.
> Quick search
Even AWS (i.e. the very pricey one) has an x1e.2xlarge with 4.5 physical (9 hyperthreaded) Xeon E7 8880 v3 processors, 244GB DDR4, 240GB SSD, and a 25 Gbps network for $1200/month paid monthly or $700/month paid annually.
And that scales linearly 16x.
You've really got to be something amazing (maybe a ton of video?) to scale past what a single commodity box can handle.
In the past, I've supported a few million daily active users on a single MySQL database.
Use cases obviously vary, your numbers may not line up.
But I'm pretty sure there are lot of startups that would be fine starting with an architecture that can scale to 2 million users per day.
I remember watching a big xbox/ps3 game launch crash and burn when 16 of the biggest MySQL servers we could buy couldn’t keep up. That was two jobs ago and embarrassing (and probably expensive). That was “at scale” for us.
Last job we started at about 60 m4 EC2 instances and were well into the thousands when I left. I suspect they’re approaching 10k instances now. And they’re a pre-IPO startup, and I think that was at scale.
Current job measures in the hundreds of thousands of database instances, and I only count one specific database engine. Probably counts as at scale.
Why is it so hard to hammer a nail with a screwdriver?
Now, if you simply must have stateful storage, I've had a pretty good time with PVCs pointing to Plain Jane NFS volumes combined with node tainting and performing local replication of high need data on the pods as needed.
If your pods are scaling up and down or dying so quickly that this seems untenable, you have other fish to fry.
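A hedged sketch of that plain-NFS setup (server, export path, and sizes are placeholders; the empty storageClassName makes the claim bind to the static PV rather than trigger dynamic provisioning):

```yaml
# A PV whose source is an existing NFS export.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs
spec:
  capacity:
    storage: 1Ti
  accessModes: ["ReadWriteMany"]
  nfs:
    server: nfs.internal.example.com   # your NFS server
    path: /exports/shared
---
# The claim pods actually reference.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""                 # bind statically, don't provision
  resources:
    requests:
      storage: 1Ti
```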
For applications deployed in cloud infrastructure, most folks are using persistent disk simply for the ease of management. Some folks end up going to local disk for scale and performance reasons, at which point they end up having exactly the same problem one would have trying to do the same with Kubernetes.
Service Fabric powers many Microsoft services today, including Azure SQL Database, Azure Cosmos DB, Cortana, Microsoft Power BI, Microsoft Intune, Azure Event Hubs, Azure IoT Hub, Dynamics 365, Skype for Business, and many core Azure services.
The only thing that at least seemed possible without additional software was NFS mounts, but I was thinking it does not help the high-availability cause to see the one and only storage node go down. Nor could I see any benefit to a cluster that used the same network for file access as for external network requests. Say I wanted to build a horizontally scalable video sharing site: I don’t see how I could test its real-world performance before it went live.
Not sure if this is in a datacenter or not. In a datacenter, you could use something like 3PAR to have network-attached storage.
But storage is hard. This is one of the advantages of using cloud providers, they have this part figured out for you. AWS's EBS volumes are network-attached storage.
Then there's another layer of abstraction when you are using containers. Ok, so you have this volume accessible from the network now, in whatever form. How do you attach it to running containers? That's what this article is about, mostly. Not with the underlying storage mechanisms.
Thanks for the lead though.
FYI, Digital Ocean Kubernetes opened to everyone on December 11th.
The other thing I was quite interested in at one point was flocker, a volume manager which is responsible for replicating/migrating data along with the container it is connected to. The company shut down quite a while ago but code is still available: https://github.com/ClusterHQ/flocker
o managing state is hard. Kubernetes makes it look simple because we have been moving the state from apps to databases/messaging queues
o storing data durably at scale, with speed and consistency, is a CAP problem
o distributed block storage is slow, complex and resource-hungry
o distributed storage with metadata almost always has a metadata speed problem (again CAP)
o managing your own high-performance storage system is bankruptingly hard
Ceph/Rook is almost certainly not the answer. Ceph has been pushed with OpenStack, which is a horrid mess of complexity. Ceph is slow, hard to manage, and eats resources. If you are running k8s on real steel, then use a SAN. If you're on AWS/Google, use the block primitives provided.
Firstly, there is no general storage backend that is a good fit for all workloads. Some things need low latency (either direct-attached, or local SAN), some can cope with single instances of small EBS volumes. Some need shared block storage, some need shared POSIX.
With AWS, there is no shared block storage publicly available. Mapping volumes to random containers is pretty simple. Failing that there is EFS, which despite terrible metadata performance kicks out a boatload of data. If your app can store all its state in one large file, this might be for you.
Google has the same, although I've not tried their new EFS/NFS service.
But as the OP writes, the standard case it should support is secure access to existing storage, and Kubernetes fails there on all levels. Simple things should be simple, hard things should be possible.
In its API design, it hides storage under abstractions like PVC, PV, StorageClass that most users have not seen before in their career (where do they come from btw?) and which do not re-use general system architecture concepts. Worse, these abstractions do not have exact definitions as their semantics can be massaged with access control rules.
Consider the simplest case: mounting an existing file system and authenticating with a Kubernetes secret. You should be able to do this in 3-4 lines in a pod definition. Instead you need to create several yaml files whose content is strongly dependent on how your Kubernetes was set up, so no general tutorial will help you (but others are, see above).
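For illustration, even the old in-tree CephFS volume source — about the closest k8s gets to the "3-4 lines in the pod definition" ideal — still needs a separate Secret object plus cluster-specific monitor addresses (every value below is a placeholder):

```yaml
# The credential lives in its own object...
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: cGxhY2Vob2xkZXIta2V5   # base64 of the Ceph keyring key (placeholder)
---
# ...which the pod's volume then references.
apiVersion: v1
kind: Pod
metadata:
  name: consumer
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      volumeMounts:
        - name: shared
          mountPath: /mnt/shared
  volumes:
    - name: shared
      cephfs:
        monitors: ["10.0.0.1:6789"]   # your cluster's monitors
        user: admin
        secretRef:
          name: ceph-secret
```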
And this is just for getting the basics going. Secure access / user authentication is still unsolved after two years (https://www.quobyte.com/blog/2017/03/17/the-state-of-secure-...), and does not seem to be high on the agenda neither in Kubernetes nor CSI.
There are other basics missing (like file systems do not have necessarily a "size"), but let's not go into detail there.
K8s grew organically here, and thus the seams show. It’s getting better every release, particularly being able to dynamically provision and schedule storage to pods in a “zone aware” manner, which has been tricky.
The deeper issue is that we are spoiled for choice on storage engines. Using Ceph for random-access r/w low-latency block storage I wouldn’t wish on my worst enemy, for example. But it’s hard to find hard numbers for comparative purposes.
The rest of the stateless compute cores can run on kubernetes.
EDIT K8s makes storage very easy. No “seems.”
Or, using storage in an environment where the real storage is completely managed for you makes storage seem easy...
Within a zone or a cluster, the latency is about 1ms, which is faster than most hard disks. The network bandwidth is on par with disk throughput. What we really need is a faster database and a faster object storage that can match the network performance (1ms and 10Gbps), then all workloads can be stateless.
If one uses a VM on GCP, the VM has no local storage besides the small local SSDs. Practically even the VM is stateless besides some cache.
Yes, and most storage you have access to, in cloud environments, is network attached. GCP disks, AWS EBS volumes, etc. All network and outside the hypervisors. You may have some local storage, but that's ephemeral, by design.
However, since we are talking about Kubernetes: not only VMs are ephemeral, but your containers are ephemeral too! And they can move around. So now you (or rather, K8s) need to figure out which worker node has the pod, and which storage is assigned to it, and then attach/detach accordingly.
This is what persistent volumes and persistent volume claims give you. They actually work fine already for StatefulSets.
Now, if you are in a cloud environment you should look into the possibility of using the hosted database offerings. If you can (even at a price premium), that's a great deal of complexity you are going to avoid.
Addressing this requires caching data in memory while making sure those caches are also disjoint, so that you fully utilize your cluster memory. This has driven Google (and others) to make some services semi-stateful and build dynamic sharding infrastructure to make this easier.
A good disk subsystem had less write latency than that in the early 90’s.
On the other hand, a single machine has limited reliability. If one wants high availability, they need to dual-write to another machine, which also incurs network latency.
what was the latency to the controller with ram cache?
Using seek time as a measure is also somewhat worst-case - controllers/filesystems also queue(d) according to drive geometry.