
Intel: Why a 1,000-core chip is feasible - kingsidharth
http://www.zdnet.co.uk/news/emerging-tech/2010/12/25/intel-why-a-1000-core-chip-is-feasible-40090968/
======
jws
Summary: Intel guy says 1000 is feasible in a message passing architecture
without coherent caches (having done 48). The programming model will be
radically different from current shared memory models.

~~~
ekidd
Happily enough, a functional message-passing language like Erlang is most of
the way there. I don't know about STM (Software Transactional Memory),
however: How hard is it to implement efficient in-memory transactions without
coherent caches?

~~~
thesz
STM reduces many locks into one (or several). As I understand it, you read and
write into a log and then try to commit (by acquiring a commit lock). Instead of
many points of synchronization, you have one.
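
A toy sketch of that log-then-commit scheme, assuming a single process with threads and one global commit lock. The names `TVar`, `Transaction`, and `atomically` are made up for illustration, and this says nothing about how well the idea maps onto hardware without coherent caches:

```python
import threading

_commit_lock = threading.Lock()  # the single point of synchronization


class TVar:
    """A transactional variable holding an immutable (version, value) pair."""

    def __init__(self, value):
        self.state = (0, value)


class Transaction:
    def __init__(self):
        self.read_log = {}   # TVar -> version seen at first read
        self.write_log = {}  # TVar -> value to install at commit

    def read(self, tvar):
        if tvar in self.write_log:          # see our own pending write
            return self.write_log[tvar]
        version, value = tvar.state         # one reference read, no lock
        self.read_log.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.write_log[tvar] = value        # buffered until commit

    def commit(self):
        with _commit_lock:
            # Validate: everything we read must be unchanged since we read it.
            for tvar, seen in self.read_log.items():
                if tvar.state[0] != seen:
                    return False
            # Publish: install buffered writes and bump their versions.
            for tvar, value in self.write_log.items():
                tvar.state = (tvar.state[0] + 1, value)
            return True


def atomically(fn):
    """Run fn(tx) with a fresh log, retrying until the commit succeeds."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result


if __name__ == "__main__":
    a, b = TVar(100), TVar(0)

    def transfer(tx):
        tx.write(a, tx.read(a) - 10)
        tx.write(b, tx.read(b) + 10)

    atomically(transfer)
    print(a.state[1], b.state[1])
```

A transaction that loses a race simply fails validation and reruns, so the only lock any thread ever takes is the commit lock.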

I do not see any reason why message passing cannot be useful here.

Speaking of Erlang, I have often seen the view that Mnesia is an STM for Erlang,
extended with relational capabilities. So we already have at least one example
of STM over message passing.

------
ssp
Also, by 2011, microprocessors will be clocked at more than 10 GHz.

[http://web.archive.org/web/20000819011344/http://www5.zdnet....](http://web.archive.org/web/20000819011344/http://www5.zdnet.com/zdnn/stories/news/0,4586,2601717,00.html)

~~~
ssp
(To be fair, the 1000-core prediction is considerably less dumb than the 10GHz
one, and the article here is a good one).

~~~
anonymous246
How so? Can you elaborate?

~~~
ssp
As he says in the article, there is no theoretical reason you can't keep
adding cores if they will fit. So given that a 486 was about a million
transistors, and a modern Nehalem has about a billion, you could probably make
a chip with a thousand 486s on it today.

A clock frequency of 10 GHz is crazy talk, on the other hand. It would use way
too much power. The power dissipation of a circuit is something like

    P = C * V^2 * f

where C is the capacitance, V is the voltage, and f is the clock frequency.
The problem is that the necessary voltage also grows with f, which means power
dissipation grows much more than linearly with clock frequency.
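
A rough numeric sketch of that formula. It assumes, as a simplification for illustration only, that the required voltage scales linearly with frequency, which makes power grow with the cube of the clock. The 3 GHz baseline and the function names are made up:

```python
def dynamic_power(c, v, f):
    """Dynamic power P = C * V^2 * f (farads, volts, hertz -> watts)."""
    return c * v ** 2 * f


def relative_power(f_ratio, v_ratio=None):
    """Power relative to a baseline when the clock is scaled by f_ratio.

    Assumption (not a measured law): if v_ratio is omitted, voltage has to
    scale linearly with frequency, so v_ratio = f_ratio and the result is
    f_ratio ** 3. C is taken to be unchanged.
    """
    if v_ratio is None:
        v_ratio = f_ratio
    return v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    # Hypothetical baseline: a 3 GHz part pushed to 10 GHz.
    ratio = 10 / 3
    print(f"clock x{ratio:.2f} -> power x{relative_power(ratio):.1f}")
    # With voltage held fixed, power would only scale linearly:
    print(f"clock x{ratio:.2f}, same V -> power x{relative_power(ratio, 1.0):.2f}")
```

Under that assumption, a bit more than tripling the clock costs roughly 37 times the power, versus about 3.3 times if the voltage could stay put.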

There is an explanation here:

[http://perilsofparallel.blogspot.com/2010/02/parallel-power-...](http://perilsofparallel.blogspot.com/2010/02/parallel-power-law.html)

The comments on that post are also very good.

~~~
billswift
Thanks. That's an excellent blog, added to my bookmarks.

------
aufreak3
NVidia's Tesla C2050 has 448 CUDA cores. Is Intel still dabbling with 48
purely because they want each core to be totally general purpose?

~~~
jaen
Each CUDA "core" is actually a lane in a 16-wide SIMD processor, so it has 20
CPUs in the traditional sense. (Intel CPUs have 4-way SIMD with SSE.) On the
older GPUs, only one "program" (kernel) could run at a time; on newer GPUs you
can have all the "CPUs" running different programs, but instruction cache
space is quite limited.

------
kbob
I think this is more about physical packaging than actual new architecture.
SGI, among others, was building 1,000-node, cache-incoherent, shared-memory
systems 10 years ago. But those systems occupied a whole room, not a single
chip.

<http://en.wikipedia.org/wiki/ASCI_Blue_Mountain>

------
DanielBMarkham
Now somebody just needs to build a pure-functional, MPI-like OS

~~~
sparky
A couple of interesting projects along these lines are Barrelfish (
<http://www.barrelfish.org/> ) and Helios (
<http://research.microsoft.com/apps/pubs/?id=81154>
<http://research.microsoft.com/pubs/81154/helios.pdf> ). Both projects put a
lot of focus on being an operating system for a heterogeneous system (with
several types or architectures of CPU, a GPU, maybe a NIC that acts as a
first-class citizen rather than a slave, etc.), but the architecture they
developed to make that work also works well for making the operating system
functional on a CMP without cache coherence. As for "pure-functional," that is
difficult to achieve in many cases since one of the operating system's main
purposes is to help programs have side effects, but I think "shared-nothing"
and "distributed-system-like" are good aims for an OS that works on an
incoherent CMP.

------
jwcacces
Definitely feasible; check out Chuck Moore's GreenArrays, and his ArrayForth to
program them.

<http://greenarraychips.com/>

------
pjscott
Did they really put a page break in mid-sentence? I guess it would increase
the odds of someone clicking through to the next page, but still, WTF.

------
cma
<http://en.wikipedia.org/wiki/Network_On_Chip>

