- "Context Switching"
You can find 4 cores in even a modest computer these days (8 hardware threads with hyperthreading). This means you can run 8 processes without any context switching at all, and it also means that if you only run a single process, you can reach only about 10-20% of the capacity of a multi-core machine.
- "Per-process overhead"
It's true there is overhead, but you don't have to create thousands of processes; we have lightweight concurrency patterns (event loops, coroutines, pools) to use within each thread/process.
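To make the "light concurrency within one process" point concrete, here's a minimal sketch using Python's stdlib asyncio. The task names and delays are made up for illustration; the point is that a thousand concurrent tasks run inside a single OS process, interleaved by the event loop rather than by OS context switches:

```python
# Sketch: "light" concurrency inside a single process using asyncio.
# A thousand concurrent tasks, one OS process, no thread context switches.
import asyncio

async def fetch(task_id: int) -> int:
    # Simulate I/O-bound work (network call, disk read, ...).
    await asyncio.sleep(0.001)
    return task_id * 2

async def main() -> list:
    # 1000 concurrent "requests"; the event loop interleaves them
    # cooperatively instead of the kernel preempting threads.
    return await asyncio.gather(*(fetch(i) for i in range(1000)))

results = asyncio.run(main())
print(len(results), results[:3])   # 1000 [0, 2, 4]
```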
But even then, you don't need to run more than one primary process per core. You have multiple cores, and they can't cooperate on running a single sequential process.
- "We built for a single-core world" and "We lack the tools"
Those are outright PEBKAC errors, not architectural problems. We're way past the stage when multi-process architectures were a poorly understood, confusing problem. We have the right tools, for those who look for them.
Cache misses and overly eager context switching are bad, and you can read a lot about this from Martin Thompson on his Mechanical Sympathy blog: http://mechanical-sympathy.blogspot.com/ There are ways to design multi-process systems that take best advantage of all your cores and your machine's caches.
But even Thompson's Disruptor architecture is based around message passing and multiple processes. Because, again, in a multi-core world, suggesting anything else is laughable.
Plus, forget computers, multitasking is a fact of life. Animals do it, humans do it, so do computers. We have to check email and talk on the phone at the same time sometimes. We need to walk and chew gum. We constantly argue and write about how much we should multitask and how much we should focus on a single task, because people have the same context switching overhead and so on. Well, for both people and computers, there's a balance, and either extreme is counterproductive. It's as simple as that.
No, you can use threads.
> Plus, forget computers, multitasking is a fact of life. Animals do it, humans do it, so do computers. [...]
That's a really bad comparison, and isn't applicable here at all. Computers are very different from humans or animals, both in how they function and in how they're "programmed". Well, at least it wasn't a car analogy.
I thought that was assumed in the sentence I wrote, as threads are even harder to deal with properly due to their shared memory space. But they still context switch, still have cache misses, still require (far more complex) synchronization, and so on.
If you can get away with separate processes, you're way better off. Threads are a last resort and come with a "use with caution" label.
> That's a really bad comparison, and isn't applicable here at all. Computers are very different from humans or animals, both in how they function and in how they're "programmed". Well, at least it wasn't a car analogy.
No, I didn't just use a random analogy. How much do you know about how our brain works, anyway?
Did you know our memory types are layered like CPU caches, RAM, and disk, going from fast short-term to slow long-term memory? Interesting, right? We literally have "cache misses" when context switching, just like our computer brethren.
And as computer designs keep evolving, they creep closer and closer to how our brain works, and the tradeoffs are startlingly similar. I was very deliberate in my comparison. Think of it like dolphins adopting much the same body design and locomotion as fish, despite their completely different origins.
Next step is the always-around-the-corner clockless CPU design. It'll come, eventually (either that, or more likely we'll have hundreds of tiny, simple, specialized, locally clocked cores with some memory on-board).
And saying that "there's a balance to be struck" in human multitasking manages to be both trite and wrong; the evidence is not that some people multitask too much and some people multitask too little. Multitasking makes you more stressed and less effective; the overwhelming majority of people should be trying to reduce or eliminate it. Which says very little about how much multitasking our CPUs should be doing.
Um, you do realize that the human brain has ten to a hundred billion neurons that all run in parallel, right?
And yet here you sit, typing and breathing and seeing and thinking about what you're having for dinner tonight while your digestive system continues to work on lunch and your heart continues to beat.
You might have assumed it, but your post didn't mention or even imply it.
> Computers keep creeping closer and closer to how our brain works and the tradeoffs are startlingly similar.
This definitely needs an explanation. In what way are computers and brains "creeping closer and closer"?
> Think about it like dolphins adopting a quite similar body design and locomotion as fish, despite their completely different origins.
This is another bad analogy. CPUs and human brains do not live in the same environment, and CPUs are not subjected to evolution (they could be considered instances of Intelligent Design, though ;-) ).
The story changes if those processes have to communicate.
Cache misses and overly eager context switching are bad,...There are ways to design multi-process systems to take best advantage of all cores, and your machine caches.
Yeah, but frankly, the situation sucks! It's still quite difficult to detect such problems with certainty, and the solutions often look like hacks.
Plus, forget computers, multitasking is a fact of life...We have to check email and talk on the phone at the same time sometimes
This kind of misses the point. We can do efficient multitasking of largely independent processes, and have been doing that for a long while. It's using multiple cores on heavily interrelated and interconnected processes where things get quite hairy. This is where current architectures suck.
In a typical user machine there can be many useful independent single-threaded processes running, all serving the user's needs (and thus using the machine's computational capacity). For servers that have to do only one thing, on the other hand -- yes, it's a problem that requires actual engineering.
There are still plenty of tablets and phones out there which don't. (And the author is targeting those devices with the software he's writing.)
"It's true it has overhead, but you don't have to create thousands of processes,"
Ah, but that might be a natural consequence of your design. I think part of the problem is that we're currently in the middle of a really awkward transition period in computer architectures.
As you point out, almost no computers are single core nowadays. So having a single-process application is obviously not taking advantage of all the computer power you have at your disposal. It's blatantly sub-optimal.
However, if you're going to design a multi-process application framework, you probably want it to kick off new processes whenever you're about to do something non-trivial that could introduce significant latency. But depending on what the user/application is actually doing, that might end up starting dozens, hundreds, or even thousands of processes.
And our computer architecture is not yet at the point where we have "enough" cores in our CPUs that we can just do that and have it work. We're getting there, but it's likely a decade or maybe two away.
So, we want to create a multi-process architecture to stop wasting the computing power that exists; but we have to be careful and write extra code to manage creating these processes, because we can't yet afford to create them "on a whim".
It seems similar in some ways to segmented 16-bit memory models. In an early flat 16-bit memory model, you were very constrained (64k) but at least the environment you were working in was simple. The move to a segmented 16-bit model was in some ways a lot less constrained, but taking advantage of it meant dealing with a bunch of extra complexity. (boo!) It was the next step to flat 32-bit systems which made working with memory painless and simple again, while further lessening the constraints.
When low-end phones and tablets have >=256 cores, then we'll be able to take advantage of multi-process frameworks properly.
Having seen Alan Kay's talk a few weeks ago, I was fascinated by the idea of "spending money to get ahead of Moore's law": putting together a computer that would allow you to write the sort of software that takes advantage of computers a decade from now. I couldn't figure out what that would mean if you wanted to do it today, but it occurs to me that putting together a >=256 core system might be a good start.
Furthermore, if one observes the hardware evolution of the PC, one notices that it took the direction of heterogeneous multicore architecture (CPU, GPU, etc.) rather than a homogeneous one: there are more "cores" of different types in your PC outside your i7 than inside. The same goes for tablets and phones, which typically feature an ARM design with CPU, GPU, and DSP integrated into one chip for footprint and power consumption reasons. Architectures seem to be evolving into modular hardware, featuring a base CPU backbone to which specialized chips are added. There are a gazillion ARM-based designs, depending on which set of functions you need.
This makes sense for the software and consumer electronics industry. Switching to completely different solutions like many-core chips (Mill CPU, GreenArray), would have a huge cost. Picking a common, general purpose "few-cores" CPU and adding in specialized chips as needed is much more affordable.
And 640k will be enough for everyone.
Hey, you might even be right about not needing them, but that doesn't mean those devices won't get them. You don't really need a 32-bit CPU to run your microwave or washing machine, but very few of the ones you buy today are running 8-bit microcontrollers with hand-coded assembler.
Though clock speeds stopped increasing some time ago, Moore's law marches on, and we're still getting more and more transistors per buck. Sure, some of them will go to more L1/L2 cache, but at some point the bandwidth to flush them to main memory becomes a bottleneck, so I think we're going to see more and more cores/chip. As that marches on, I think it's going to trickle down to even the low end of CPUs. For instance, you might even be able to buy a bunch of 64-core chips which have a bunch of cores disabled really cheap, because the disabled cores failed factory testing and were switched off. It can't be sold for full price any more, but that doesn't mean it's useless.
And once your $5 CPUs have 32 cores, well, providing the tooling is there, you might as well use a framework that makes use of them.
"many of them just run a browser and Excel."
Bad choice of examples there, I think. Browsers can be pretty multi-process heavy these days, with one main process, plus one per tab, plus extra processes for e.g. decoding streaming video in the tab, or running your (spit) EME plugins (spit) in a sandbox.
Similarly, with Excel, sure most of the time it doesn't do much. But, if you've got a spreadsheet with a bunch of dependent cells/formulas, then if someone updates the right cell, being massively parallel could really speed up value recalculations and propagation throughout the sheet. Some spreadsheets translate really well to map/reduce, and using all the cores could really help there.
If a natural consequence of your design is a need to do something that fits the target environment so poorly, then it's a bad design & should be rethought. Software isn't a Platonic ideal, it needs to account for these things if it's to run well.
This is exactly the problem process/thread pools solve. Most if not all modern programming languages include such thread pools either in their stdlib or as battle-tested external libraries. Do you have a specific complaint here?
The really hard part of multi-(thread/machine)process architectures is synchronization, not resource allocation.
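The classic illustration of why synchronization is the hard part: two threads doing a read-modify-write on shared state. Without the lock, increments can be lost; with it, the result is deterministic. (A toy sketch in Python; the counts are arbitrary.)

```python
# Sketch: synchronizing a shared counter. "counter += 1" is a
# read-modify-write; the lock serializes it so no increment is lost.
import threading

counter = 0
lock = threading.Lock()

def add(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:               # remove this and the total may come up short
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000
```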