64 cores and 24 NVMe drives in 2U of rack space is just insane compared to what we used to have to do to get a beefy database server. And it's not some exotic thing, just a popular mainstream Dell SKU.
If you price it out on Dell's site, you get a retail price north of $200k. That is really what made it clear for me: that you could fit $200k+ worth of DIMMs, drives, and CPUs into a 2U spot :)
If you buy through a VAR and/or Dell reps you don't pay the price on the website. What you actually pay is typically significantly lower. I don't think anyone actually buys servers like these by just ordering from the website. We (Let's Encrypt) certainly don't.
These are expensive servers, crossing into six digits, but not $200k.
I'm looking at buying some FortiGate firewalls to do some NATing. It looks like 200Fs will be fine, but even the FortiGate sales guys won't give me a price; I have to go to resellers, who also refuse to give prices, which adds friction. Cisco is exactly the same.
When looking at options, price is at the forefront of my mind, but sales guys want me to choose their company, and even commit to the specific device and numbers, before I even see the price.
Almost walked away from the FortiGate option until I found avfirewalls, which gave me a ballpark idea of what I could afford to implement and what the trade-offs were. The benefits of the FortiGate over a MikroTik were worth it at that price, but it was painful getting the price out of them, and they nearly lost the sale as I had assumed it would be 10 times more.
eBay has them for $3.2k and CDW has them new for $3.6k.
- Hospitals have an interest in massively inflated sticker prices to make as much money as possible from those who pay in cash (e.g. foreigners), and to make the debts look better for collection agencies. Basically, assume the true cost of a procedure is $1,000: if the hospital bills $1,000 and the debt goes to a collections agency that buys it for 10% of face value, the hospital gets $100; if the hospital bills $10,000 and the agency pays 10%, the hospital gets $1,000, i.e. the actual cost.
- Insurers have an interest in high sticker prices because they will negotiate with the hospital to pay true cost plus some markup anyway; with a higher sticker price, they can claim that their insurance saves the buyers a higher percentage.
- Employers have an interest in high sticker prices because they can market themselves as providing better health insurance than the competition.
The people losing out in this gamble are those who cannot afford insurance and have to declare medical bankruptcy.
And in the case of hardware or even some "contact sales for a quote" SaaS it's the same end result: the ones who lose out are small businesses (who can't achieve the sales amount to qualify for cheap-ish rates), and the big companies with dedicated account managers have a nice life.
Past threads: https://www.reddit.com/user/bad0seed/submitted/
Probably in the range of $2100 to $2200 per unit from a x86-64 component distributor in moderate quantities.
"42% off list price: use code SERVER42"
Doesn't make the price reasonable exactly, but it's kind of funny.
In a way you could say the ones not calling a rep are subsidizing your cost for the server.
Such as a fairly low-cost 4U Threadripper box running VyOS (ultimately based on a very normal Debian Linux) with multiple Intel 100GbE NICs in PCIe slots.
Intel's Linux kernel (and FreeBSD) driver support for those is generally excellent and robust. Intel has had full-time developers working on the Linux drivers for that series of cards since the earliest days of their 64-bit/66MHz PCI-X 10Gbps NICs, about 17 years ago.
Also worth mentioning that FRR is now an official Linux Foundation supported project.
Works great. The PCIe card itself has 2 more slots for SSDs.
The GPU is on the 2.0 x8 slot because it doesn't really transfer that much data over the lanes.
I honestly didn't realize PCIe was up to 4.0 now, and I am pushing up against the limits of PCIe 2.0, but it still works! And I'm "only" at the limits, and it's only a limit when I want faster than 3,000 megabytes per second, which is amazing.
Granted, this would have been considered a good enthusiast motherboard in 2012. Buying new but cheap is the mistake.
So each drive acts like it has its own slower (but fast enough) PCIe slot, and then the RAID-0 combines the bits back together to double the performance.
Could be wrong but I get 2,900 megabytes per second transfers from RAM to disk and back.
And this is PCIe 2.0 x16
So maybe if you want 3,000, 4,000, or 6,500 megabytes per second, then I have nothing to brag about. I'm pretty amazed, though, and will be content for all my use cases.
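For anyone checking that math, a quick back-of-the-envelope sketch (the per-lane figure is the standard PCIe 2.0 number, not something measured on this machine):

    # PCIe 2.0: 5 GT/s per lane with 8b/10b encoding, i.e. ~500 MB/s usable per lane
    per_lane=500
    echo "x16 slot ceiling:  $((16 * per_lane)) MB/s"   # 8000 MB/s
    echo "x8 per-drive path: $((8 * per_lane)) MB/s"    # 4000 MB/s

So the observed ~2,900 MB/s sits below both ceilings, consistent with two striped drives behind a 2.0 x16 uplink.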
4 TB RAM
Supermicro was cheaper by 20-30%, had a quantifiably better power profile (which basically meant it paid for itself), and just rocked on every dimension. And their sales reps were able to answer every question we had, including sending us an Excel model of their power usage. The Dell folks never got back to us on power usage.
I dunno why folks insist on Dell crapola.
Also "fun" is "The solid state storage I can buy for $100 is equivalent to all the world's computing storage in what year?"
I'm super fun at parties!
If we're talking about 90s or early 2000s Sun, probably. Even then, though, these systems probably had substantially better I/O performance.
I have a SPARC laptop lying around from 1995 that gets a whopping 20 MB/s in disk read/write speeds across its dual half height SCSI drives. That still beats all but the best SD cards.
The full size systems with SAS and other options would get even better disk performance.
I think the first servers I had in production had 8 MB RAM. No more, certainly. Soon we'll be at 1000x that. My dad's first "server" was 3 orders of magnitude smaller, with 8 KB of RAM (hand-wound wire core memory). In that time, the US population hasn't even doubled.
The listed servers are from a few years ago, so I guess there might be some bigger ones available these days.
Even the super slow ones, like on a Kindle, are _choices_ that have been made in favor of something else. A second to turn a page in a book isn't unbearable.
High-end Android phones have 120+ ms latency. That's easily noticeable and actually annoying (at least to me).
My personal pet peeve is input latency when I'm starting new applications in KDE. E.g. I'll start a new terminal with Super+Enter, followed by a Super+Right Arrow in order to tile it to the right. But the latency is big enough that often it's not the terminal that ends up tiled, but the application that had focus earlier, e.g. a web browser. It's really annoying.
I also don't understand how still in 2020 when I move a window with the mouse, the window can't keep up with the mouse.
30ms input lag is what we should work towards. But that's not what we have today. Today is actually crap.
I keep meaning to try out WindowMaker again as my Fedora window manager. I feel like it would be incredibly speedy on today's hardware.
The one thing that I get hung up on when it comes to RAID and SSDs is the wear pattern vs. HDDs. Take for example this quote from the README.md:
> We use RAID-1+0, in order to achieve the best possible performance without being vulnerable to a single-drive failure.
Failure on SSDs is predictable and usually expressed with Terabytes Written (TBW). Failure on spinning disk HDDs is comparatively random. In my mind, it makes sense to mirror SSD-based vdevs only for performance reasons and not for data integrity. The reason is that the mirrors are expected to fail after the same amount of TBW, and thus the availability/redundancy guarantee of mirroring is relatively unreliable.
Maybe someone with more experience in this area can change my mind, but if it were up to me, I would have configured the mirror drives as spares, and relied on a local HDD-based zpool for quick backup/restore capability. I imagine that would be a better solution, although it probably wouldn't have fit into tryingq's ideal 2U space.
That wasn't my experience with thousands of SSDs and spinning drives. Spinning drives failed more often, but usually with SMART sector counts increasing beforehand. Our SSDs never got close to media wearout, but that didn't stop them from dropping off the bus. Literally working fine, then boom: can't detect, all data gone.
Then there are the incidents where the power-on-hours value rolls over and kills the firmware. I believe these have happened on disks of all types, but there was a recent one on SSDs. Normally when building a big server, all the disks are installed and powered on at the same time, which risks catastrophic failure in case of a firmware bug like this. If you can, try to get drives from different batches, and stagger the power-on times.
Comparatively, yes, but when averaged out over a large number of hard drives, it definitely tends to follow the typical bathtub-curve failure model seen in any mechanical product with moving parts:
- early failures will be HDDs that die within a few months of being put into service
- in the middle of the curve, there will be a constant steady rate of random failures
- towards the end of the lifespan of the hard drives, as they've been spinning and seeking for many years, failures will increase
State-of-the-art systems keep ~1.2 copies (e.g. 10+2 RAID 6, where 12 drives hold 10 drives' worth of data) on SSD, plus an offsite backup or two. The bandwidth required for timely rebuilds is usually the bottleneck.
These systems can be ridiculously dense; a few petabytes easily fits in 10U. With that many NAND packages, drive failures are common.
The result for us was 2 drives that failed within the same month, of the same brand, and from there it seems to be single failures rather than clusters.
* I would probably go with ashift=12 to get a better compression ratio, or even go as far as ashift=9 if the disks can sustainably maintain the same performance. Benchmark first, of course.
* We came to the same conclusion regarding AIO recently, but just today I ran more benchmarks, and it looks like the ZFS shim does perform better than the InnoDB shim. So I think it's still fine to enable innodb_use_native_aio.
* We use zstd compression from ZFS 2.0. It's great, and we all deserve it after suffering through the PR dramas.
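A minimal sketch of those settings, with the pool layout and device names assumed (benchmark on your own hardware, as noted above):

    zpool create -o ashift=12 tank \
      mirror /dev/nvme0n1 /dev/nvme1n1 \
      mirror /dev/nvme2n1 /dev/nvme3n1
    zfs set compression=zstd tank   # zstd needs OpenZFS 2.0 or newer
    zfs get compressratio tank      # check the payoff after some real writes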
You could fix that by writing a bit more to one of the disks, e.g. run badblocks for different amounts of time before putting them in service.
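A hypothetical sketch of that pre-wear idea; the device names and durations are made up, and badblocks -w is destructive, so this is only for drives not yet in service:

    timeout 12h badblocks -w -t random -s /dev/nvme0n1
    timeout 24h badblocks -w -t random -s /dev/nvme1n1   # twice the pre-wear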
~ * * * * acme-client example.com && rcctl reload httpd
Now, sure, in principle you should have active monitoring so you'd know immediately if there's a problem status e.g. certificate with only 14 days left until expiry; revoked, expired or otherwise bad certificate presented by server; OCSP staples missing. But we know lots of people don't have monitoring, and I guarantee at least one person reading this HN thread is mistaken and isn't monitoring everything they thought they were.
Like Certbot, acme-client does a local check before taking any action, so if run once a day (or indeed once an hour) it will not normally call out to the Let's Encrypt service.
Unlike Certbot, acme-client doesn't do OCSP checks, so it won't even talk to the network to get an OCSP response. Even with Certbot, this (OCSP) is served through a CDN (so it's roughly the cost of fetching a static image for a popular web site), and so your check is negligible in terms of operating costs for Let's Encrypt.
I guess you still have a peak time of 00:00 UTC every day though unless people are using servers set to their local time.
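The `~` in the OpenBSD crontab above already randomizes the minute, which helps with exactly this. On systems without that syntax, a random sleep gets a similar spread (a sketch, assuming cron runs a shell with $RANDOM, e.g. ksh or bash):

    0 0 * * * sleep $((RANDOM % 3600)) && acme-client example.com && rcctl reload httpd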
Examples of reasons your certificate might have been revoked:
* You re-used a private key from somewhere, perhaps because you don't understand what "private" means, and other copies of that key leaked
* You didn't re-use the private key but your "clever" backup strategy involves putting the private key file in a public directory named /backup/server/ on your web server and somebody found it
* You use the same certificate for dozens of domain names in your SEO business, and yesterday you sold a name for $10k. Hooray. The new owner immediately revoked all certificates for that name, which they're entitled to do.
* Your tooling is busted and the "random" numbers it picked aren't very random. (e.g. Debian OpenSSL bug)
* A bug at Let's Encrypt means their Ten Blessed Methods implementation was inadequate for some subset of issuances and rather than cross their fingers the team decided to revoke all certificates issued with the inadequate control.
* Let's Encrypt discovers you're actually a country sanctioned by the US State Department, perhaps in some thin disguise such as an "independent" TV station in a country that doesn't have any independent media whatsoever. It is illegal for them to provide you with services and you were supposed to already know that.
So that is a network connection, but not to the Let's Encrypt servers described in this story.
In practice OCSP is done by a big CA by periodically computing OCSP responses for every single unexpired certificate (either saying it's still valid, or not), and then providing those to a CDN and the CDN acts as "OCSP server" returning the appropriate OCSP response when asked without itself having possession of any cryptographic materials.
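You can poke at this from the client side with the stock openssl tooling; a sketch, where the file names are placeholders and the URL is Let's Encrypt's R3 responder:

    openssl ocsp -issuer r3.pem -cert cert.pem \
        -url http://r3.o.lencr.org -resp_text -noverify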
Further, I was looking at those new server specs. There's an error, I think? The server config on the Dell site shows 2x 8 GB DIMMs, for 16 GB RAM per server, whereas the article says 2 TB!
With only 16GB of RAM, but 153.6 TB of NVMe storage, the real issue here is memory limitation for a general-purpose SQL database or a typical high-availability NoSQL database.
Check my math: 153600 GB storage / 16 GB memory = 9600:1 ratio
Consider, by comparison, that a high-data-volume AWS i3en.24xlarge has 60 TB of NVMe storage but 768 GB of RAM: a 78:1 ratio.
If the article is correct, and the error is in the config on the Dell page (not the blog), and this server is actually 2 TB RAM, then that's another story. That'd make it a ratio of 153600 / 2000 = ~77:1.
Quite in line with the AWS i3en.
But then it would baffle me why you would only get 40 TPS out of such a beast.
Check my logic. Did I miss something?
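The arithmetic above, runnable (figures copied from the comment; integer division rounds down):

    echo $((153600 / 16))    # 9600 -> the 9600:1 ratio as configured
    echo $((60000 / 768))    # 78   -> i3en.24xlarge at 78:1
    echo $((153600 / 2000))  # 76   -> ~77:1 if the server has 2 TB of RAM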
The reality is that they are storing information during challenges, implementing rate limiting per-account, supporting OCSP validation and a few other things.
You can investigate further if you really want to see the queries that they make against the database, since their software (Boulder) is open source. Most queries are in the files in the "sa" (storage authority) folder.
They have capacity for much, much more with that hardware.
* Signature by their RSA Intermediate (currently R3, with R4 on hot standby) - which will be a dedicated piece of hardware - to issue a subscriber's certificate. In practice this happens twice, as a poisoned pre-certificate to obtain proof of logging from public log servers, and then the real certificate with the proofs baked inside it.
* Signatures by their OCSP signer periodically on an OCSP response for each certificate saying it's still trustworthy for a fixed period. Again this will be inside an HSM.
* Signature verification on a subscriber's CSR. To "complete the circuit" it's helpful that Let's Encrypt actually confirms you know the private key corresponding to the public key you wanted a certificate for, the signature on your CSR does this. Some people don't think this is necessary, but I believe Let's Encrypt do it anyway.
You're correct that none of this happens on the database servers. I guess it's possible their servers use TLS to secure the MariaDB connections, in which case a small amount of either ECDSA or RSA computation happens each time such a connection is set up or torn down like at any outfit using TLS, but those database connections are cached in a sane system so that wouldn't be very often.
You can see the blockchain size was growing exponentially but then switched to linear as we hit the transaction cap; it now sits at about 350 GB.
24 NVMe drives should have a lot of write throughput, though.
- 32-core AMD EPYC
- 512GB ECC memory
- 8x 3.84TB NVMe datacenter drives
- Unmetered 1 Gbps bandwidth
Checking out the AWS side, the closest I think you'd get is the x1.32xlarge, which would translate to 128 vCPUs (which on Intel generally means 64 physical cores) and close to 2 TB of RAM. NVMe storage is only a paltry 4 TB, so you'd have to make up the rest with EBS volumes. You'd also get a lower clock speed than they are getting out of the EPYCs.
I mean, yeah, I guess you can. But a lot depends on your use case and SLA. If you need to keep p99s ultra-low (single digits), then EBS is not a real option.
But if you don't mind latencies, then yeah, fine.
Don't get me wrong: EBS is great. But it's not a panacea and strikes me as a mismatch for a high performance monster system. If you need NVMe, you need NVMe.
448 vCPUs, 24 TiB of RAM, $70 an hour: ~$52k per month.
This strikes me as odd. In my experience, traditional OLTP row stores are I/O bound due to contention (locking and latching). Does anyone have an explanation for this?
> Once you have a server full of NVMe drives, you have to decide how to manage them. Our previous generation of database servers used hardware RAID in a RAID-10 configuration, but there is no effective hardware RAID for NVMe, so we needed another solution... we got several recommendations for OpenZFS and decided to give it a shot.
Again, traditional OLTP row stores have long included a mechanism for recovering from media failure: place the WAL on a separate device from the DB. Early MySQL used a proprietary backup add-on as a revenue model, so maybe this technique is now obfuscated and/or missing. You may still need/want a mechanism to federate the DB devices, and incremental volume snapshots are far superior to full DB backups, but placing the WAL on a separate device is a fantastic technique for both performance and availability.
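For illustration, the classic separated-WAL layout expressed as MySQL/MariaDB config; the paths here are assumptions for the sketch, not anything from the Let's Encrypt setup:

    cat >> /etc/my.cnf.d/storage.cnf <<'EOF'
    [mysqld]
    datadir                   = /data/mysql   # data files on the main pool
    innodb_log_group_home_dir = /wal/mysql    # redo log on its own device
    EOF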
The Let's Encrypt post does not describe how they implement off-machine and off-site backup-and-recovery. I'd like to know if and how they do this.
> There wasn’t a lot of information out there about how best to set up and optimize OpenZFS for a pool of NVMe drives and a database workload, so we want to share what we learned. You can find detailed information about our setup in this GitHub repository.
> Our primary database server rapidly replicates to two others, including two locations, and is backed up daily. The most business- and compliance-critical data is also logged separately, outside of our database stack. As long as we can maintain durability for long enough to evacuate the primary (write) role to a healthier database server, that is enough.
Which sounds like traditional master/slave setup, with fail over?
Yes, thank you. I assumed that the emphasis on the speed of NVMe drives meant that master/slave synchronous replication was avoided and asynchronous replication could not keep up. In my mind, this leaves room for interesting future efficiency/performance gains, especially surrounding the "...and is backed up daily" approach mentioned in your quote.
The bottom line is that the old RPO (Recovery Point Objective) and RTO (Recovery Time Objective) are as important as ever.
Yes. My CTO, Avi Kivity did a great talk about this at Core C++ 2019: https://www.scylladb.com/2020/03/26/avi-kivity-at-core-c-201...
Let me boil it down to a few points; some beyond Avi's talk:
• Traditional RDBMS with strong consistency and ACID guarantees are always going to exhibit delays. That's what you want them for. Slow, but solid.
• Even many NoSQL databases written (supposedly) for High Availability still use highly synchronous mechanisms internally.
• You need to think about a multi-processor, multi-core server as its own network internally. You need to consider rewriting everything with the fundamental consideration of async processing, even within the same node. Scylla uses C++ futures/promises, shared-nothing shard-per-core architecture, as well as new async methods like io_uring.
• Between nodes, you also have to consider highly async mechanisms. For example, the tunable eventual consistency model you'd find in Cassandra or Scylla. While we also support Paxos for LWT if you need strong linearizability (read-before-write conditional updates), that comes at a cost. Many classes of transactions will treat that as overkill.
• And yes, backups are also a huge issue for those sorts of data volumes. Scylla, for example, has implemented different priority classes for certain types of activities. It handles all the scheduling between OLTP transactions as highest priority, while allowing the system to plug away at, say, backups or repairs.
More on where we're going with all this is written in a blog about our new Project Circe:
But the main point is that you have to really think about how to re-architect your software to take advantage of huge multi-processor machines. If you invest in all this hardware, but your software is limiting your utility of it, you're not getting the full bang you spent your buck on.
I appreciate the response but it doesn't address my question: given that Let's Encrypt's MySQL-family RDBMS does not implement any of the multi-core/multi-socket/cpu-affinity/lock-free/asyncIO techniques used by databases like ScyllaDB, MemSQL, and VoltDB, why were they seeing 90% CPU utilization on their old Intel servers while the upgraded AMD servers were 25% (the expected range)?
I think mike_d's suggestion is most plausible: they probably included custom functions/procedures that invoke CPU-expensive code. I also thought this was a single-node scale-up architecture but since they are using a three-or-more node master/slave architecture, network I/O could somehow be involved.
As for why they suddenly dropped? I'll leave that to someone who knows this particular system far better than I do.
Optimizing for CPU efficiency in a system that is I/O bound will not save significant money. Memory and NVMe are the critical factors and with the master/slave replication used, network I/O significantly undercuts the peak performance this single server is capable of.
I have seen CPU bound database servers when developers push application logic in to the database. Everything from using server-side functions like MD5() to needless triggers and stored procedures that could have been done application side.
innodb_thread_concurrency and innodb_concurrency_tickets would be a good starting point, and optimal values depend on your r/w balance and number of rows touched per type of query.
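For example (the values are illustrative starting points only; both variables are dynamic, so you can experiment without a restart):

    mysql -e "SET GLOBAL innodb_thread_concurrency = 64;"    # 0 (the default) means unlimited
    mysql -e "SET GLOBAL innodb_concurrency_tickets = 5000;" # 5000 is the default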
This is usually done to avoid blocking the thread before the end of its time slice if there is a chance the lock will become available.
If instead they implement a pure spin loop with sleep I can see why it does not perform well.
That causes the thread not to spin, giving the pending CPU cycles back to the OS if the lock is not available.
They've tweaked a few other settings as well.
I'd be curious to see more benchmarks and latency data (especially as they're utilizing compression, and of course checksums are computed over all data not just metadata like some other filesystems).
Ah, I miss actual hardware.
The productivity enabled by having one master RDBMS is a big deal, and if they can buy commodity servers that satisfy their requirement, this seems like a fine way to operate.
That is, full table replication, but individual servers maintaining differing sets of indexes. OLAP and single request transactions could be routed to specialized replicas based on query planning, sending requests to machines that have appropriate indexes, and preferably ones where those indexes are hot.
The trend has been away from complex specialization of data structures, secondary indexing, etc and toward more general and expressive internal structures (but with more difficult theory and implementation) that can efficiently handle a wider range of data models and workloads. Designers started moving on from btrees and hash tables quite a while ago, mostly for the write performance.
Write performance is critical even for read-only analytical systems due to the size of modern data models. The initial data loading can literally take several months with many popular systems, even for data models that are not particularly large. Loading and indexing 100k records per second is a problem if you have 10T records.
Part of it is short memories, but part of it is how the cost inequalities in our hardware shifts back and forth as memory or storage or network speeds fall behind or sprint ahead.
You may need clusters, duplicated systems, replication, etc for resiliency reasons of course, but a single modern machine with lots of memory channels per CPU and PCIe 4.0 can achieve ridiculous throughput...
edit: Here's an example of doing 11M IOPS with 10x Samsung Pro 980 PCIe 4.0 SSDs (it's from an upcoming blog entry):
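The post isn't quoted here, but a typical io_uring-based fio run for that kind of test looks roughly like this (standard fio flags; the device name is a placeholder):

    fio --name=randread --filename=/dev/nvme0n1 --direct=1 \
        --ioengine=io_uring --rw=randread --bs=4k \
        --iodepth=64 --numjobs=16 --group_reporting \
        --time_based --runtime=60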
It is more interesting whether the actual CPU can handle such traffic in the context of DB load: encoding/decoding records, sorting, searching, merging, etc.
And the CPU problem for OLTP databases is largely a memory access latency problem. For columnar analytics and complex calculations, it's more about the CPU itself.
When doing 1 MB sized I/Os for scanning, my 16c/32t CPU (AMD Ryzen Threadripper Pro WX) was just about 10% busy. So with a 64-core single-socket Threadripper workstation (or a 128-core dual-socket EPYC server), there should be plenty of horsepower left.
The Samsung 980 PROs I have are TLC for main storage, but apparently use some of that TLC as a faster write buffer (the TurboWrite buffer). I'm not an expert, but apparently the controller can decide to program the TLC NAND with only 1-bit "depth"; they call it "simulated SLC" or something like that. On the 1 TB SSD, the TurboWrite buffer can dynamically extend to ~100 GB if there's unused NAND space on the disk.
Btw, 3D XPoint storage (Intel Optane SSDs & Micron X100) should be able to sustain crazy write rates too.
With something like a NoSQL style, it's kind of built in that it will be distributed. But that pushes the compute cost back onto the clients. Each node is 'crap', but you have hundreds, so it does not matter.
With something like SQL Server, it comes down to how fast you can get the data out of the machine to clone it somewhere else (sharding/hashing, live/live backups, etc.). This is disk, network, and CPU, usually in that order.
In most of the ones I ever did, it was almost always the network that was the bottleneck. With something like a 10Gb network card (state-of-the-art neato at the time; I'm sure you can buy better now), you were looking at saturation of 1 GB per second (if you were lucky). That is a big number. But depending on your input transaction rate and how the data is stored, it can drop off dramatically. Put it local to the server and you can 10x that easily. Going out of node costs a huge amount of latency. Add in a requirement of, say, 'offsite hot backup' and it slows down quickly.
In the 'streaming' world like Kafka, you end up with a different style: lots of small processes/threads that live on 'meh' machines, but you hash the data and dump it out to other layers for storage of the results. This comes at a cost of more hardware and network, though. Things like 'does the rack have enough power', 'do we have open ports', 'do we have enough licenses to run at the 10Gb rate on this router', 'how do we configure 100 machines the same way', 'how do we upgrade 100 machines in our allotted time'. You can fling that out to something like AWS, but that comes at a monetary cost. And even virtual, there is a management cost. Fewer boxes is less cost.
In addition, you really care about the integrity of your data, so you probably want serializability, to avoid concurrency and potential write/update conflicts, and to do the writes on only a single server.
For this reason, it sounds to me like partitioning/sharding is the only way to really scale this: have different write servers that care about different primary keys.
> If this database isn’t performing well enough, it can cause API errors and timeouts for our subscribers.
What are the SLOs? How were they being met (or not) before vs. after the hardware upgrade? There's a lot of additional context that could have been added to this post. It's not a bad post, but it simply reduces down to "this new hardware is faster than our old hardware."
The majority of OCSP traffic will probably be for end-entity certificates; most OCSP validation (in browsers and cryptographic libraries) is end-entity validation, not leaf-and-chain.
Removal of intermediate CA's OCSP is probably not really relevant to their overall OCSP performance numbers (and if it was, it was likely cached already).
Suppose you promise to issue OCSP revocations within 48 hours if it's urgent, and your OCSP responses are valid for 48 hours. That means after a problem happens OCSP revocation takes up to 96 hours to be effective.
If you only issue certificates with lifetimes of 96 hours then OCSP didn't add anything valuable - the certificates expire before they can effectively be revoked anyway.
Let's Encrypt is much closer to this idea (90 days) than many issuers were when it started (offering typically 1-3 years) but not quite close enough to argue revocation isn't valuable. However, the automation Let's Encrypt strongly encourages makes shortening lifetimes practical. Many of us have Let's Encrypt certs automated enough that if they renewed every 48 hours instead of every 60 days we'd barely care.
The solution to excessive OCSP traffic and privacy risk is supposed to be OCSP stapling instead, but TLS servers that can't get stapling right are still ridiculously popular so that hasn't gone so well.
To do this they automatically generate and sign OCSP responses (the vast majority of which will just say the certificate is still good) on a periodic cycle, and then they deliver them in bulk to a CDN. The CDN is who your client (or server if you do OCSP stapling, which you ideally should) talks to when checking OCSP.
To generate those responses they need a way (hey, a database) to get the set of all certificates which have not yet expired and whether those certificates are revoked or not.
Interesting that they decided to put in PCIe 3.0 NVMe SSDs instead of PCIe 4.0.
Imagine having 24x Intel Optane. PCIe 5.0 is actually just around the corner. I would imagine next time Let's Encrypt could upgrade again and continue to serve from a single DB machine.
Dual 32-core chips give us plenty of cores while keeping clocks higher for single-threaded performance.
You are correct that the price of the CPUs is almost irrelevant to the overall cost of a system with this much memory and storage. We were picking the ideal CPU, not selecting on CPU price.
> CPU usage (from /proc/stat) averaged over 90%
Leaves me wondering exactly which metric from /proc/stat they are referring to. It's presumably user time, but I just dislike attempts to distill systems performance into a few graph comparisons. In reality, the realized performance of a system is often better described by a narrative of what bottlenecks the system.
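For reference, the counters in question; presumably "over 90%" means 100% minus the idle and iowait share between two samples:

    # fields after "cpu": user nice system idle iowait irq softirq steal ...
    head -1 /proc/stat; sleep 1; head -1 /proc/stat
    # utilization = 1 - (delta idle + delta iowait) / (delta total)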
It shouldn't be too difficult, given the current use of MariaDB, to start using something like Galera to create a multi-master cluster and improve the redundancy of the service, unless there are some non-obvious reasons why they aren't doing this (a minimal sketch of the config follows below).
I think I also see redundant PSUs, would be neat to know if they're connected to different PDUs and if the networking is also redundant.
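A minimal sketch of what turning on Galera involves; the node names and provider path are assumptions:

    cat >> /etc/my.cnf.d/galera.cnf <<'EOF'
    [galera]
    wsrep_on              = ON
    wsrep_provider        = /usr/lib64/galera-4/libgalera_smm.so
    wsrep_cluster_address = gcomm://db1,db2,db3
    binlog_format         = ROW   # Galera requires row-based replication
    EOF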
Galera is great, but you lose some functionality with transactions and locking that could be a deal breaker. And up until MySQL 8, there were some fairly significant barriers to automation and clustering that could be a turn off for some people.
Everything has its pros and cons.
Asynchronous streaming to a truly redundant second site often makes more sense.
It is kind of funny how long InnoDB has been the most reliable storage engine. I am not sure if MyISAM is still trying to catch up; it used to be much worse than InnoDB. With the emergence of RocksDB, there are multiple options today.
And on MySQL 8, MyISAM is mostly gone; not even the 'mysql' schema uses it.
(Edit: Originally linked only to default_tmp_storage_engine.)
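Easy to check on any given server; the second query shows the defaults the edit above refers to:

    mysql -e "SHOW ENGINES;"   # MyISAM is still listed, but no longer the default
    mysql -e "SELECT @@default_storage_engine, @@default_tmp_storage_engine;"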
The latter is always interesting to me: building things is easy, but building rock-solid, reliable things is seriously hard work.
Would make a great blog post....
Then it would be 90 ms -> 81 ms, not 90 ms -> 9 ms. The way I see it, at least. With proper decimation, 90% of what was there remains ("removal of a tenth", as Wikipedia puts it).
What a stellar bit of incompetence.
These are two distinct axes that are not incompatible with each other.
In all seriousness, however, the decision (likely) has very little to do with that. They're most likely not hosting in the cloud because the current CA/Browser Forum rules around the operation of public CAs effectively don't permit cloud hosting. That's a work in progress, but for the time being, the actual CA infrastructure can't be hosted in the cloud due to security and auditability requirements.
I can foresee letsencrypt in the future building their own cloud (on their own physical infrastructure), but speaking as a letsencrypt user of their free certificate program, I would lose respect for and interest in their service if they went with an AWS or GCP or Azure approach.
The independence from other major players (and the ability of their team to change and move everything about their service, as needed) is one of the reasons I use letsencrypt.
So long as they don't have a viable independent revenue stream they're arguably less independent than commercial CAs.
For example : https://www.scaleway.com/en/dedibox/dedirack/
I'm not sure you can say it's the cloud though.
They are not hosting their database in a cloud like Amazon or Azure because no cloud provider offers such high performance at a comparable price. Actually, I'm not even sure you can get a cloud VM with that much I/O, even if you don't mind the pricing.
It can help if someone wants their data "in the cloud".