

Will Parallel Code Ever Be Embraced? - noidi
http://www.drdobbs.com/parallel/will-parallel-code-ever-be-embraced/240003926

======
danieldk
_But the shiny object that has my attention at present consists of low-voltage
ARM-type chips running on tiny inexpensive systems that can be stacked
together to do all kinds of interesting things for a fraction of the power my
Intel Xeon uses_

This sentiment is repeated very often, but has anyone actually done the math
(for the case where you do temporarily need a lot of processing power)? E.g.,
the following post estimates the power use of a Raspberry Pi at around 2W:

http://www.raspberrypi.org/phpBB3/viewtopic.php?f=2&t=6050

A recent Xeon or Core i7 is _many_ times faster, and has the advantage of
providing shared-memory parallelism (as opposed to a cluster of Pis, where
you have to distribute work over a 100 Mbit network).
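
To actually sketch that math, here is a back-of-envelope in Python. The
Pi's ~2W is the figure from the thread above; the ~95W Xeon draw and ~50x
speedup are assumptions for illustration, not measurements:

    # Back-of-envelope energy per job (illustrative numbers, not measurements):
    # assume the Pi draws 2 W, the Xeon box draws 95 W under load, and the
    # Xeon finishes the same job 50x faster.
    pi_watts, xeon_watts, speedup = 2.0, 95.0, 50.0

    job_s_pi = 3600.0                # say the job takes an hour on the Pi
    job_s_xeon = job_s_pi / speedup  # 72 seconds on the Xeon

    print(f"Pi:   {pi_watts * job_s_pi / 1000:.1f} kJ")      # 7.2 kJ
    print(f"Xeon: {xeon_watts * job_s_xeon / 1000:.1f} kJ")  # 6.8 kJ

Under those (made-up) numbers the Xeon actually comes out slightly ahead per
job: low idle power doesn't automatically mean low energy per unit of work.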

Also, if he wants to save power, he shouldn't use a Xeon. Intel Core mobile
CPUs draw relatively little power as well. E.g., the last time I measured my
Mac Mini, it used 12W during normal use. And it's actually a usable desktop
machine, in contrast to the Raspberry Pi.

~~~
bunderbunder
For power usage, a model like the one used by Parallax's Propeller
microcontroller might be interesting: the Propeller has 8 cores. The entire
thing can be clocked up and down like a modern x86 chip, but it's also
possible to put cores to sleep individually, which reduces power consumption
even further.
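
A toy power model of that (every constant here is invented, just to show the
shape of it):

    # Toy model of a Propeller-style chip: total draw scales with how many
    # cores are awake and with the shared clock. All constants are made up.
    def chip_power_mw(active_cores, mhz, idle_mw=1.0, core_mw_per_mhz=0.05):
        return idle_mw + active_cores * core_mw_per_mhz * mhz

    print(chip_power_mw(8, 80))  # all 8 cores at 80 MHz -> 33.0 mW
    print(chip_power_mw(1, 20))  # one core, clocked down -> 2.0 mW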

I'm not sure how good an example Raspberry Pi is, simply because it includes
features like a GPU. A computer probably only needs one of each of those
components, so it would make more sense to look at the power requirements for
a bare ARM chip than for a complete computer built around it.

-- change of subject --

What I've been running into (including hard and repeatedly over the last few
days) is that parallelizing workstation-end tasks without killing performance
in the process is _hard_. Shared-memory parallelism in particular is painful,
because with too many cooks in the kitchen they'll end up spending more time
trying not to pour boiling water on each other than making food. For example,
every time you hit a memory barrier, all the cores that are working against
that memory need to stop and consult the L3 cache or, worse yet, main memory.
That introduces an enormous stall (of course, if you're in a situation where
non-trivial parallelization is worth the effort, any stall feels enormous),
so it needs to be avoided as much as possible... which tends not to be an
easy thing to do if you're doing shared-memory parallelism. Because if
avoiding shared state were trivial, you'd probably have been able to get away
with shared-nothing in the first place.
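
Here's a minimal Python stand-in for that kitchen (worker count and workload
are made up; a lock-protected shared counter plays the part of the contended
cache line):

    import time
    from multiprocessing import Process, Value

    N, WORKERS = 200_000, 4
    CHUNKS = [(w * N // WORKERS, (w + 1) * N // WORKERS)
              for w in range(WORKERS)]

    def contended(counter, lo, hi):
        for i in range(lo, hi):
            with counter.get_lock():  # every add fights for one lock
                counter.value += i

    def shared_nothing(partial, lo, hi):
        partial.value = sum(range(lo, hi))  # one shared write per worker

    def timed(target, cells):
        procs = [Process(target=target, args=(cell, lo, hi))
                 for cell, (lo, hi) in zip(cells, CHUNKS)]
        start = time.perf_counter()
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return time.perf_counter() - start

    if __name__ == "__main__":
        shared = Value("q", 0)
        t1 = timed(contended, [shared] * WORKERS)
        partials = [Value("q", 0) for _ in range(WORKERS)]
        t2 = timed(shared_nothing, partials)
        assert shared.value == sum(p.value for p in partials)
        print(f"contended: {t1:.2f}s  shared-nothing: {t2:.2f}s")

Both versions compute the same sum; the contended one just spends most of its
time queueing for the lock.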

Now the "lots of tiny cores" approach gets more interesting when you can get
away with a more shared-nothing approach like what the article suggests. But
it comes at a big cost, which is that you're going to take a massive hit on
the kind of performance you can get on tasks for which parallelism is
infeasible, or for which you don't have any programmers who are good enough at
parallelization to do it (effectively the same thing). In those situations,
you're going to be stuck watching one lone core play "Little Engine that
Could" while all the other cores are dozing off like the lazy bums they are.

Meanwhile, it solves a problem that I'm not convinced really exists. Time-
multiplexing relatively beefy CPUs is pretty much a solved problem. Less so
if you need real-time guarantees, but for everyday use there's really not
much need to segregate processes onto different cores when pre-emptive
multitasking has been around on consumer systems for decades.

------
iso-8859-1
I see this as two separate prophecies:

* One is the Intel MIC, which is due to arrive THIS YEAR (http://blogs.intel.com/technology/2012/06/intel-xeon-phi-coprocessors-accelerate-discovery-and-innovation/)

* One is a total revamp of operating systems, so that everything is virtualized.

The first is hardly a prophecy, because it's so close to being reality: the
article was written on the day of the Intel announcement.

The second is maybe, what, 20 years into the future? I'm not even sure
there's a need. Security problems are not technical anymore; they are caused
by breaches of trust. Integration between all those different VMs will still
be needed, and badware will use that interface too. People like integration;
separating everything into its own VM will hinder it, and customers will not
like that.

------
unwind
_Except for games and some build cycles, I'm almost never waiting because the
CPU has maxed out._

That's just... weird. What would "maxing out" even mean for a CPU? Going
"fast enough", somehow? Maybe it's because build cycles are exactly what I
spend a lot of my time in front of a computer on, but I really don't think
CPUs are ever going to be "fast enough". Even when just doing "ordinary
computing", I often find that e.g. browsers and office applications are
rather slow.

~~~
gaius
What I would like, instead of a compiler's -O1, -O2, -O3 flags, is for the
"optimization level" to be measured in seconds. So -O1 would run the
optimizer for 1 second and give me the best optimized code it could come up
with in that time; -O3600 would let the machine think about it for an hour,
using any and all heuristics and empirical tests, and then give me what it
had at the end. Pre-release, I might want to run the optimizer for a week.
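
A sketch of how that might be wired up as a wrapper script (the flag pool,
compiler invocation, and benchmark command are all placeholders, not a real
compiler interface):

    import random, subprocess, time

    # Hypothetical pool of optimizations to search over.
    FLAG_POOL = ["-funroll-loops", "-fomit-frame-pointer", "-ftree-vectorize"]

    def build_and_benchmark(flags):
        subprocess.run(["cc", "-O2", *flags, "prog.c", "-o", "prog"],
                       check=True)
        start = time.perf_counter()
        subprocess.run(["./prog"], check=True)  # empirical test run
        return time.perf_counter() - start

    def optimize_for(budget_seconds):
        deadline = time.monotonic() + budget_seconds
        best_flags, best_time = [], build_and_benchmark([])
        while time.monotonic() < deadline:  # search until -O<n> seconds are up
            trial = random.sample(FLAG_POOL,
                                  k=random.randint(1, len(FLAG_POOL)))
            elapsed = build_and_benchmark(trial)
            if elapsed < best_time:
                best_flags, best_time = trial, elapsed
        return best_flags

    print(optimize_for(3600))  # "-O3600": think about it for an hour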

~~~
Maro
> Pre-release, I might want to run the optimizer for a week.

That's a dangerous model, as certain classes of bugs would only come out in
the super-optimized version, but that version would presumably not get the
same amount of testing as the regular builds.

------
URSpider94
Well put.

There will always be some applications (games, compilers, Photoshop, 3D
rendering) that will benefit from fine-grained parallelism. For the rest,
being able to run your web browser, DVD ripper, music streamer and IDE on four
separate cores is good enough.

