
Silicon Brain: 100,000 ARM cores [video] - DiabloD3
https://www.youtube.com/watch?v=2e06C-yUwlc
======
rasz_pl
Custom 18-core ARM chips (too many cores for A7-9, too few for M0-4, a really
weird number of cores), and what look like 8-12 layer PCBs. This can't be cheap :o

I wonder how it compares at this level of integration (850 cores per board)
versus something like a Xeon + Phi/GPU. You certainly gain an asynchronous mode
of operation, and that might be the winning factor/secret sauce.

~~~
ris
"Custom 18 core arm chips (too much for A7-9, too little for M0-4, really
weird number of cores)"

Could just look at the website instead of speculating
[http://apt.cs.manchester.ac.uk/projects/SpiNNaker/SpiNNchip/](http://apt.cs.manchester.ac.uk/projects/SpiNNaker/SpiNNchip/)

~~~
rasz_pl
ARM9E, wow, that is ancient, 15-year-old Nintendo DS ancient. Not even an R4
(if they really, really need a realtime-oriented core).

~~~
m_mueller
AFAIK the Xeon Phi is based on something in between Pentium I and II cores. So
the idea is about the same. I'd put my money on ARM when it comes to efficient
chip architecture, since efficiency was their goal from the start (x86 was
always about backwards compatibility first).

~~~
DiabloD3
No, the Phi is a P6 core, but so are all the modern Core-family CPUs.

All Phis are along the lines of modern cores stripped down and simplified.
This is also how the modern Avoton and newer Atoms work, but the Phi is
stripped down even further.

~~~
m_mueller
That's not what I get from Intel's documentation. Intel seems to call it a
Pentium I with a P6 programming interface. But maybe you have other sources;
if so, please share.

> So why not use an older, smaller but still very capable core? And that is
> what they did. The designers went back generations, literally back to one of
> the first modern cores, the Intel® Pentium® processor. [1]

> The foundation for the Intel® Xeon Phi™ coprocessor core PMU is the PMU from
> the original Intel® Pentium® processor (aka P54C). Most of the forty-two
> performance events that were available in the original Intel® Pentium®
> processor are also available on the Intel® Xeon Phi™ coprocessor. The core
> PMU has been upgraded to an Intel® Pentium® Pro processor-like ("P6-style")
> programming interface.

[1] [https://software.intel.com/en-us/blogs/2013/03/22/the-intel-...](https://software.intel.com/en-us/blogs/2013/03/22/the-intel-xeon-phi-coprocessor-what-is-it-and-why-should-i-care-pt-1-fitting-it-all)

[2] [https://software.intel.com/sites/default/files/forum/278102/...](https://software.intel.com/sites/default/files/forum/278102/intelr-xeon-phitm-pmu-rev1.01.pdf) (section 1.2)

~~~
DiabloD3
A PMU is just a performance monitoring unit; it keeps track of CPU
utilization and performance counters.

This is an extremely small part of the CPU, and yes, I can imagine they
jettisoned a lot of stuff you find on modern cores because it takes up too
much room.

~~~
m_mueller
Good point, that source wasn't really what I thought it was. There are still
enough sources to support my argument, though.

> The cores at the heart of Intel’s first Xeon Phi are based on the P54C
> revision of the original Pentium and appear largely unchanged from the
> design Intel planned to use for Larrabee. [3]

> Many changes were made to the original 32-bit P54c architecture to make it
> into an Intel Xeon Phi 64-bit processor. [4]

So, still, they seem to have started with a Pentium I and added stuff to it
rather than stripping down a modern core. That was always the story they sold
about Larrabee, which AFAIK was the direct predecessor project that got
salvaged as Intel MIC.

[3] [http://www.extremetech.com/extreme/133541-intels-64-core-cha...](http://www.extremetech.com/extreme/133541-intels-64-core-champion-in-depth-on-xeon-phi)

[4] [https://software.intel.com/en-us/articles/intel-xeon-phi-cor...](https://software.intel.com/en-us/articles/intel-xeon-phi-core-micro-architecture)

~~~
DiabloD3
It's more like this: ARM, and all the companies that have their own ARM
designs, have a library of parts. They assemble those parts as needed to
complete designs for themselves and for customers.

What people don't understand is that Intel does the same. Look at how modern
E3s, E5s, E7s, i3/5/7s, modern Atoms, etc. all work: similarly designed parts,
each paired with the minimum required for that design to work and perform the
way they want.

Intel doesn't throw designs out, they keep them and periodically make sure
they still work on smaller fab sizes and newer fab techs.

A more striking example than the Phi is the Intel Quark, featured in the
Edison platform, which is Intel's equivalent of an ARM Cortex-M series part
(such as the M4s used in a lot of cell phones as a GPS/motion sub-processor,
among other things). The Quark really _is_ a modernized P54C (Pentium 1
pre-MMX) core, more so than the Xeon Phi is (although, obviously, there is
shared part design across both of them).

I think the thing with the Phi is, it's rapidly evolving. Larrabee was closer
to this design than the first-gen Phi was, and now they're shipping the
second-gen Phi, and it looks more like how some GPUs have historically been
designed than just x86 core spam (look at how the bus design is evolving,
they're getting closer and closer to how AMD and Nvidia design theirs, and
also how post-Skylake on-die GPU integration is evolving on multi-socket
platforms).

So, yeah. I don't agree that the Phi can be flat out called a P54C, but I
agree they have been reusing modernized parts from that era because it is
easier to do that than continually strip down existing designs to look like
that.

The Quark, however, looks a lot like how the embedded 286 and 486 families
have been kept alive for the embedded hardware sector, and now they're
positioning the Quark for the IoT era (which, hey, they have my interest with
that product, so they did something right); the Quark is more of a P54C than
the Phi is.

------
yigitdemirag
For those interested in hardware implementations of neural networks: this
project simulates neurons at the software level, whereas the neuromorphic
TrueNorth chip does the same job at the hardware level, which enables faster
and more efficient applications.

~~~
kevinchen
Side note: hardware-accelerated doesn't necessarily mean faster. For
example, Java chips never took off because an x86 with a really good JIT does
the job better.
[https://en.wikipedia.org/wiki/Java_processor](https://en.wikipedia.org/wiki/Java_processor)

------
melling
Sounds like something Paul Allen's Brain Institute is working on.

[http://youtu.be/Te-SDsb6sHM](http://youtu.be/Te-SDsb6sHM)

[http://youtu.be/EX7wzqAoeQA](http://youtu.be/EX7wzqAoeQA)

[https://en.m.wikipedia.org/wiki/Allen_Institute_for_Brain_Sc...](https://en.m.wikipedia.org/wiki/Allen_Institute_for_Brain_Science)

------
transfire
Doubling every two years, we will have brain power in a single rack in 13
years, and a brain-sized box in less than 20.

------
justin_
"With a million cores we only get to about 1% of the scale of the human brain"

Interesting. So in terms of raw computing power, the totality of computers out
there is much more powerful than a brain. I wonder when we first passed that
level as a whole. (I know that we wouldn't use all our computers to make a
"brain", but I think it's interesting to think about :)

~~~
xixixao
This is a very inaccurate comparison for many reasons, but to get the sense of
complexity of the human brain:

In the human brain, there are almost 100 billion neurons (the project plans to
simulate 1000 neurons on one core, hence 1% atm). Each has about 7000
connections; put another way, there are an estimated 10^15 connections or more.
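As a quick sanity check of those figures (the per-core count is the project's stated target; the brain-side numbers are the rough estimates above):

```python
# Back-of-the-envelope scale comparison, using the rough figures above:
# ~100 billion neurons in the brain, ~7000 connections each, and the
# project's target of 1000 simulated neurons per core on 1M cores.
NEURONS_PER_CORE = 1_000
TOTAL_CORES = 1_000_000          # the full 10-rack machine
BRAIN_NEURONS = 100e9            # rough estimate
CONNECTIONS_PER_NEURON = 7_000   # rough estimate

simulated = NEURONS_PER_CORE * TOTAL_CORES            # 1e9 simulated neurons
fraction = simulated / BRAIN_NEURONS                  # ~1% of the brain
connections = BRAIN_NEURONS * CONNECTIONS_PER_NEURON  # ~7e14, order 10^15

print(f"simulated neurons: {simulated:.0e}")
print(f"fraction of brain: {fraction:.0%}")
print(f"estimated connections: {connections:.0e}")
```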

The Internet has only about 15-20 billion nodes, only a few of them highly
connected; it is more of a tree than a complete mesh.

So in some way, you could say we are nowhere close to obtaining the computing
power of the brain, if we, of course, disregard the fact that the nodes,
computers, are very powerful on their own. But this is where the comparison
breaks. The human brain is powerful due to the complexity of its network, not
the sheer amount of "working units" (cores/transistors).

Comment on the project: it is exciting, but it should not be overlooked that
it is still a very gross approximation, since the neuron models used are
simple. It might be the case that interesting behavior will arise from this
simple model, but it also might be the case that the secret sauce is in the
fine behavior of each neuron. The hope is that interesting phenomena can be
observed with this simpler model, perhaps a bit like classical physics can
often be used without requiring the full model of quantum mechanics.

~~~
iandanforth
I would extend this to say that when you get down to the level of synapses,
you're now talking about memory and not CPU. If you actually want to implement
this, 100 trillion synapses are better mapped to 100 TB of RAM.

It's also very important to remember how incredibly _slow_ the human brain is.
We're talking 100s of ms to go from sensory input to motor output. A single
neuron might take between 1 and 10 ms to fire, and a single dendrite might
take 1/10th that time, so at best you're doing computation at 10 kHz. A CPU
has 5 orders of magnitude over biology.
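A rough sketch of that arithmetic (the millisecond figures are the loose assumptions above, not measurements):

```python
import math

# Rough timing comparison between biology and a CPU, using the loose
# figures above: ~1 ms for a fast neuron, ~1/10th of that for a dendrite.
dendrite_s = 1e-3 / 10            # ~0.1 ms per dendrite "operation"
bio_rate_hz = 1 / dendrite_s      # ~10 kHz at best for biology
cpu_rate_hz = 1e9                 # a conservative ~1 GHz CPU clock

advantage = math.log10(cpu_rate_hz / bio_rate_hz)
print(f"biology: ~{bio_rate_hz / 1e3:.0f} kHz")
print(f"CPU advantage: ~{advantage:.0f} orders of magnitude")  # ~5
```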

The problem is we have only a vague idea of how the network is connected and
don't really know the algorithm that's being implemented by that network. So
we fall back on things like simulating ion channels, which take _way_ more
compute resources than necessary. There is a _lot_ of cargo-culting going on
right now, but of course it's also insanely exciting and fun to find out what
does and doesn't work.

~~~
Asbostos
With 1000 neurons per core running in series, that would eat up a lot of the
speed advantage.
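A quick illustration with hypothetical numbers (the per-core update rate here is an assumption for illustration, not a SpiNNaker spec):

```python
# If one core time-multiplexes 1000 neurons serially, each neuron's
# effective update rate drops by that factor. All figures hypothetical.
core_updates_per_s = 1e6          # assume ~1M neuron updates/s per core
neurons_per_core = 1_000
per_neuron_hz = core_updates_per_s / neurons_per_core  # 1 kHz per neuron

bio_max_hz = 1e3                  # a fast biological neuron, ~1 kHz
remaining_speedup = per_neuron_hz / bio_max_hz
print(f"per-neuron rate: {per_neuron_hz:.0f} Hz "
      f"(~{remaining_speedup:.0f}x biology)")  # most of the advantage is gone
```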

------
tinco
It's cool that they compare it to mouse brains and fractions of human brains,
but how similar are they really? Do they operate at the same speed? I'd guess
biological neurons fire more slowly. Are they as connected? It's nice that
it's a hexagonal mapping of a toroid's surface, but neurons are connected in
true 3D; how much more connected is that?

~~~
trhway
>but neurons are connected in true 3d,

Not exactly. Most of the "thinking" neurons form the surface of the brain;
that surface is wrinkled into "gyri" to achieve more than 2D, yet it isn't
3D. In the language of Hausdorff dimension, it is 2.x something.

And the neuron connections form a pretty structured topology that is far from
all-to-all. If, for example, you look at this picture
[https://en.wikipedia.org/wiki/Fornix_%28neuroanatomy%29#/med...](https://en.wikipedia.org/wiki/Fornix_%28neuroanatomy%29#/media/File:Gray747.png)
you can see how the wires from the neurons of the hippocampal gyrus are
bundled together into the "fimbria" and routed through the "fornix".

~~~
redcalx
Here's a very short video showing white fibre bundles in a human brain.

[https://youtu.be/6YFG5OnDp-Y](https://youtu.be/6YFG5OnDp-Y)

From:

[http://heliosphan.org/pittsburghbraincomp.html](http://heliosphan.org/pittsburghbraincomp.html)

------
ropiku
There's also a video of wiring the rack (it took 4 hours) and of how the
wiring for the whole 10-rack cluster would look:
[http://jhnet.co.uk/projects/spinner](http://jhnet.co.uk/projects/spinner)

------
cpplinuxdude
"with a little bit of mathematics too" how wonderfully understated.

~~~
grondilu
He actually said "with quite a lot of mathematics too"

------
mentos
What technology do we have right now to map the connections of the brain? Is
there anything that currently exists with enough granularity to take a
snapshot of our neurons and their connections?

edit: thanks for the replies; did some research and found this TED talk that
visualizes the problem pretty well
[https://www.youtube.com/watch?v=HA7GwKXfJB0](https://www.youtube.com/watch?v=HA7GwKXfJB0)

~~~
yigitdemirag
fMRI, I suppose. As far as I know, even if you had a full mapping of the human
brain's neurons, you could not achieve basic intelligence, as the brain
constantly alters these connections and their strengths. And even if you knew
all the connections, the neuron firing types and the representation of
information over the network would still be a big open question.

~~~
mentos
Yeah, if you think about it, short-term memory is necessary for the movement
of consciousness from state to state. Otherwise it'd be like flipping a
toaster on.

------
nodesocket
How do the PCBs communicate among themselves? The cables don't look like
Ethernet (Cat 6).

~~~
lcr
From the SpiNNaker project page:

>The control interface is two 100Mbps Ethernet connections, one for the Board
Management Processor and the second for the SpiNNaker array. There are options
to use the six on-board 3.1Gbps high-speed serial interfaces (using SATA
cables, but not necessarily the SATA protocol) for I/O...

------
bch
I wonder what software they're using; they mention that the system is event-
driven, but I can't see any sign of what they're using to implement this,
whether custom or COTS software...

~~~
abersek
Mostly Python; their code is at
[https://github.com/SpiNNakerManchester](https://github.com/SpiNNakerManchester)

~~~
tomn
That's mostly front-end stuff for using the machines; the actual code running
on the custom hardware is generally a lot lower level.

------
Patient0
Title should say 1 million ARM cores

~~~
netfire
That is the video title on YouTube, and a million cores is the goal of the
project, but the video itself shows a 100K-core machine, not a million.

~~~
brixon
The YouTube title is 1M. They did not put in all the commas we normally
expect.

~~~
rasz_pl
Yes, the title is 1M, but they show ONE 100K-core rack; the plan is to have 10
racks.

------
dharma1
pretty cool dude.
[https://en.wikipedia.org/wiki/Steve_Furber](https://en.wikipedia.org/wiki/Steve_Furber)

