
Open source 25-core chip can be stringed into a 200,000-core computer - jonbaer
http://www.pcworld.com/article/3111693/open-source-25-core-chip-can-be-stringed-into-a-200000-core-computer.html?href=
======
fmstephe
If, like me, you wondered how it maintains cache coherence with that many
cores: it uses something called 'directory-based coherence'.

[https://en.wikipedia.org/wiki/Cache_coherence](https://en.wikipedia.org/wiki/Cache_coherence)

From what I have read, it appears that the cores we typically run today use
snooping, constantly listening for interesting changes to other cores' caches.
Clearly this doesn't scale: you don't want to listen to the cache traffic of
200,000 cores. So instead they have a centralised 'directory' to manage cache
coherency. The directory approach has higher latency, since presumably every
load/store has to send a message to a potentially distant cache directory
manager.
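
To make that concrete, here is a toy sketch of the idea (my own illustration
of a generic MSI-style directory, not the actual Piton protocol): the
directory records which cores hold each cache line, so a write only has to
invalidate those tracked cores instead of broadcasting to everyone.

    # Toy MSI-style directory (illustrative only, not the OpenPiton protocol).
    # One directory entry per cache line records which cores hold a copy,
    # so a write only notifies those cores instead of snooping on everyone.
    class DirectoryEntry:
        def __init__(self):
            self.sharers = set()   # cores with a read-only copy
            self.owner = None      # core with a modified copy, if any

    class Directory:
        def __init__(self, num_cores):
            self.entries = {}                                 # addr -> DirectoryEntry
            self.caches = [dict() for _ in range(num_cores)]  # per-core caches
            self.memory = {}                                  # backing store

        def read(self, core, addr):
            e = self.entries.setdefault(addr, DirectoryEntry())
            if e.owner is not None and e.owner != core:
                # Pull the dirty line back from its owner, then share it.
                self.memory[addr] = self.caches[e.owner][addr]
                e.sharers.add(e.owner)
                e.owner = None
            e.sharers.add(core)
            self.caches[core][addr] = self.memory.get(addr, 0)
            return self.caches[core][addr]

        def write(self, core, addr, value):
            e = self.entries.setdefault(addr, DirectoryEntry())
            # Invalidate only the cores the directory knows about -- no global broadcast.
            holders = e.sharers | ({e.owner} if e.owner is not None else set())
            for other in holders - {core}:
                self.caches[other].pop(addr, None)
            e.sharers.clear()
            e.owner = core
            self.caches[core][addr] = value

    d = Directory(num_cores=4)
    d.write(0, 0x40, 123)
    print(d.read(1, 0x40))   # 123, fetched via the directory, not a broadcast

The point of the exercise: the cost of a write grows with the number of cores
that actually share the line, not with the total core count.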

Does anyone really know this stuff? I would be interested to hear a more
knowledgeable take on it :)

~~~
digi_owl
I find myself thinking that the pattern in computing is to first implement an
idea in discrete boxes, and then translate that into regions on an IC.

Just observe the path from mainframe, via minicomputer, to today's SoCs.

And as I was reading your comment I found myself thinking about how to get
multiple servers to talk to a common datastore, and how that looks eerily
similar to this cache coherence issue.

To wander into the weird for a bit, the phrase "as above, so below" keeps
going through my mind while contemplating all this.

------
trhway
So it is like the more-than-a-decade-old UltraSPARC T1, the so-called
CoolThreads. That architecture didn't do Sun any good back then (exactly as
all the normal engineers expected/predicted at the time): a bunch of weak
cores (8 or 16, if I remember correctly, at a time when Intel was only
dreaming about 2) choked by the memory bus, and without many business tasks
around that had widely parallelized software available to solve them (besides
those SSL benchmarks). Maybe it was just that far ahead of its time...

~~~
hobo_mark
Would the T2 (also open source) have been considerably better?

~~~
pawadu
How about MIPS (also recently open sourced)?

------
brudgers
Current discussion of Piton:
[http://parallel.princeton.edu/piton/#](http://parallel.princeton.edu/piton/#)

------
semi-extrinsic
> The goal was to design a chip that could be used in large data centers that
> handle social networking requests, search and cloud services. The response
> time in social networking and search is tied to the horsepower of servers in
> data centers.

So, if I understand this correctly, they are targeting embarrassingly parallel
tasks, although ones tied to a central database. Why should this super-NUMA
architecture be good for that? If it were, why are Facebook and Google running
on distributed Intel Xeon clusters, and not one of Fujitsu's or Oracle's big-
SPARC-machine offerings?

------
cmrdporcupine
> Some open-source processor designs are for fun. For example, Open Core
> Foundation ... design for the SH2 processor, which was in Sega’s 1994 Saturn
> gaming console.

Pretty sure that project is not for "fun." In the talk I watched they seemed
pretty damned serious about producing it for 'serious' uses.

------
fallingfrog
This looks quite similar to what Adapteva did with their Parallella platform -
though they were targeting hobbyists at the low end rather than big data
centers. Maybe that was their mistake...
[https://www.parallella.org/](https://www.parallella.org/)

------
milesf
How would one write code for such a monster?

~~~
59nadir
Using Erlang.

~~~
colanderman
Erlang is a great way to write code that's distributed across a network and
generally I/O bound. It's a terrible way to make effective use of the compute
on manycore systems.
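
For contrast, a purely illustrative sketch of the compute-bound,
shared-memory worker-pool style (not Piton-specific; the chunk sizes and the
cpu_heavy function are made up for the example):

    # Fan a compute-bound job out across the available cores, one worker per core.
    from multiprocessing import Pool

    def cpu_heavy(n):
        # stand-in for a compute-bound kernel
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool() as pool:                         # defaults to one worker per core
            results = pool.map(cpu_heavy, [10**6] * 32)
        print(sum(results))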

