With relatively economical 64 core / 128 thread options with nearly unlimited RAM capacity appearing, a lot more workloads will "fit". Needless to say, single-node systems are much easier/cheaper/faster to get right.
If things follow the previous patterns that will open up the market to a single/dual core, large memory, machine with a high speed networking and storage ports that is lower cost and lower power.
A lot of this also centers around how apps are designed. If you can rewrite it you can pick your architecture. But it's hard to take a pre-existing one and slap it on a new architecture and make it just scale.
Mosix attempted to have a magical SSI that just scaled single pre-existing apps (even threaded shared-memory ones) across machines with no extra work, but it's got such a hard set of problems it never caught on. Now we just add intermediate abstractions until we can jam an app into whatever architecture we have.
From a systems architecture point of view it is a really interesting exploration of Amdahl's law. So many things that people said "You'll never do that on a networked cluster of machines." have fallen (data bases being one of the larger ones). And while it used to be mainframes won on I/O channel capacity, Google and others have shown that when you can parallelize the I/O channels effectively that advantage goes away as well.
Google's implementation is not very helpful to all those companies, individuals, NGOs, and governments that have to follow privacy laws, HIPAA, etc though, because Google's implementation isn't open source, and these entities can't use Google Cloud. Or don't want to.
Until we get an open source solution for this, SMP machines will be useful.
And even then, you save money by having less and larger machines in your cluster than just having tiny ones. Larger machines means your overhead is reduced.
Google, and Amazon, and MS, undergo extensive third-party audits and people can, and do, run HIPAA workloads there.
I’m not sure what distinction you are trying to draw.
So either we need an on-prem version of Cloud Spanner, Cloud Spanner needs to be HIPAA certified, certified to match German privacy laws, etc, or Cloud Spanner can’t serve these situations.
Maybe in a year, or two. But not today.
That’s not always the case, though.
There was an update to that 6 months later: https://www.cockroachlabs.com/blog/better-sql-joins-in-cockr...
I'd assume they've made even more improvements since, so they really should update their docs.
> The cost of two-phase commit is particularly high in Spanner because the protocol involves three forced writes to a log that cannot be overlapped with each other. In Spanner, every write to a log involves a cross-region Paxos agreement, so the latency of two-phase commit in Spanner is at least equal to three times the latency of cross-region Paxos.
I suspect that Google's innovation in making these cross-region or cross-cluster transactions viable is partly in their network: having an extremely reliable network with consistent low latency, a lot of which allegedly uses first-party network fiber between disparate locations. This is touched on in a paper that Google published :
> Many assume that Spanner somehow gets around CAP via its use of TrueTime, which is a service that enables the use of globally synchronized clocks. Although remarkable, TrueTime does not significantly help achieve CA; its actual value is covered below. To the extent there is anything special, it is really Google’s wide-area network, plus many years of operational improvements, that greatly limit partitions in practice, and thus enable high availability.
I'm also not sure that technologies like Spanner necessarily address the same use-case as SMP machines. Aren't those use-cases concerned with achieving the highest possible transaction throughput for transactions that are potentially highly contentious and require low latency? Systems that solve that problem today still largely look like Oracle DBs from 2000 -- or a fleet of replicas reading executing the same state machine on a transaction log.
Transactions against Spanner likely have high latency compared to traditional DBs:
> On the downside to Spanner, Zweben noted, that the price of Spanner’s high availability is latency at levels unacceptable in some applications. If you’re executing banking transactions, like in airline reservations, and you can afford a minute of delay while you make sure they’re committed in multiple data centers, Spanner’s probably a good database for that, he said.
> Zweben says, however, if you’ve got a customer-service app that requires you to be extraordinarily responsive over millions of transactions and at the same time analyze profitability of a customer and maybe train a machine learning model, those simultaneous transactional and analytical requirements are better for a database [...]
One benchmark I found suggests that read and update latencies can be quite high . The graphs are missing units, but it seems to suggest update latency is higher than with traditional DBs. I've had difficulty finding other benchmarks just now, but the one other I found reported the same finding :
> An interesting pattern above is that queries 14-18, which are all updates, perform with higher latency on Spanner than the easy selects and non-bulk inserts.
It's definitely innovative technology in that it allows you to scale read-oriented SQL workloads to a far greater degree, with better availability, but from what I've read it does not tackle all of the same problems and use-cases that traditional relational DBs are best at solving: high transaction throughput and low latency for contentious data sets.
I think this understates it. Production distributed systems suck balls with unreliable infrastructure, and anyone who has ever tried to do realtime replication of all of their data (and lots of it) across the internet knows how completely unrealistic it is. It's like running a power cord across a highway to power a refrigerator.
??? Color me baffled.
Did you mean Numa vs SMP? Or something else maybe? How can a machine be a NUMA SMP? NUMA and SMP are fundamentally different architectures.
Today on a 24/48 core dual socket server you'll see the same sorts of thing, having a core using memory on the 'other' physical chip's memory bus will impact the overall performance significantly.
In the 2000's, before multi-core chips were a thing, there were two major camps, the 'super computer' camp, and the 'cluster' camp.
The 'super computer' camp insisted on cache coherent memory between all of the cores or threads. You got these very expensive fabrics from people like cray that would snoop access to memory from the cores and send coherency messages around to insure that if someone wrote something in to memory somewhere, everyone else's L1 or L2 cache got the message to invalidate what they were holding (shoot downs). These machines are very expensive and take months to build.
The 'cluster' camp said, "We can use a network fabric and just parameterize shared memory usage." So they put together independent machines connected by a network fabric and no cache coherency protocol. If you wanted to use shared memory you could build something like memcached and wrap your access with network calls. With that architecture even if it took twice as many cores to do what you wanted to do, the price of the machine was one tenth what it was for the big SMP machine.
For something that was trivially parallellizable like internet search or serving up web sites, that was a much more cost effective way to go. When people started doing stuff that they previously used 'super computers' and big SMP machines for on these Linux clusters it became a sort of race to pull apart these problems into "shared nothing" clusters.
Oh interesting. Might you have any links or suggested reading on this discussion and eventual compromise?
>"In the 2000's, before multi-core chips were a thing, there were two major camps, the 'super computer' camp, and the 'cluster' camp."
Is the cluster camp Beowulf then basically?
>"You got these very expensive fabrics from people like cray that would snoop access to memory from the cores and send coherency messages around to insure that if someone wrote something in to memory somewhere, everyone else's L1 or L2 cache got the message to invalidate what they were holding (shoot downs)."
Is this the MESI protocol then?
EDIT I just saw the link below about CCNUMA.
However, SMP systems suffer disadvantages in that system bandwidth and scalability are limited. Although multiprocessor systems may be capable of executing many millions of instructions per second, the shared memory resources and the system bus connecting the multiprocessors to the memory presents a bottleneck as complex processing loads are spread among more processors, each needing access to the global memory. As the complexity of software running on SMP's increases, resulting in a need for more processors in a system to perform complex tasks or portions thereof, the demand for memory access increases accordingly. Thus more processors does not necessarily translate into faster processing, i.e. typical SMP systems are not scalable. That is, processing performance actually decreases at some point as more processors are added to the system to process more complex tasks. The decrease in performance is due to the bottleneck created by the increased number of processors needing access to the memory and the transport mechanism, e.g. bus, to and from memory.
Alternative architectures are known which seek to relieve the bandwidth bottleneck. Computer architectures based on Cache Coherent Non-Uniform Memory Access (CCNUMA) are known in the art as an extension of SMP that supplants SMP's "shared memory architecture." CCNUMA architectures are typically characterized as having distributed global memory.
The historical opposite of SMP used to be asymmetric multiprocessors in the heterogenous sense - different kinds of processors, for example scalar/vector/io processors. Or for a modern day take, ARM SoCs with little low-power cores and faster & more power hungry cores, and GPUs thrown in for good measure.
I think in the context that OP was referring to "early 2000s" which presumably means the introduction of Opteron and HyperTransport then yes I believe the NUMA vs SMP distinction would be pretty clear.
I think VMWare deserves a mention here? And the terms OS Virtualization vs Hardware virtualization do as well (Ctrl-F doesn't find them.)
For awhile hardware virtualization (VMWare) was more prevalent, but it's more complex and has more overhead than OS virtualization (containers). That is how people solved the problem of having powerful machines and small workloads (or workloads with a lot of variance).
Although historically it might have been that hardware virtualization actually came first, in IBM mainframes. In the Unix world I guess OS virtualization came first.
I didn't follow the latest development, but for a while, Docker was not even that great in term of robustness and stability. I never was a big fan of the userland proxy for example. And from what I've read, upgrading from one docker version to another could be quite painful (disclaimer: I toyed with docker a little, I haven't used it in production yet).
IMHO, Docker succeeded because it brought a complete ecosystem around containers:
* Ways to generate the container image through Dockerfile
* Ways to compose an image on top of another (itself on top of another, etc...)
* Ways to share and distribute these images with Docker Registry
Except jails already existed on FreeBSD, and CP/CMS on mainfraimes in the 70s existed long before this...
For the past decade or so, mainframe-level virtualization on x86 has been possible, at least, starting on high-end chips and working its way downwards to the commodity world. And, yes, it exists in ARM as well.