Silicon Brain: 100,000 ARM cores [video] (youtube.com)
119 points by DiabloD3 on Aug 25, 2015 | 54 comments

Custom 18-core ARM chips (too many for A7-9, too few for M0-4, a really weird number of cores), and what look like 8-12 layer PCBs. This can't be cheap :o

I wonder how it looks at this level of integration (850 cores per board) versus something like Xeon + PHI/GPU. You certainly do gain asynchronous mode of operation, and that might be the winning factor/secret sauce.

"Custom 18 core arm chips (too much for A7-9, too little for M0-4, really weird number of cores)"

Could just look at the website instead of speculating http://apt.cs.manchester.ac.uk/projects/SpiNNaker/SpiNNchip/

ARM9E, wow, that is ancient, 15-year-old Nintendo DS ancient. Not even an R4 (if they really, really need a realtime-oriented core).

AFAIK the Xeon Phi is based on something in between Pentium I and II cores. So the idea is about the same. I'd put my money on ARM when it comes to efficient chip architecture, since efficiency was their goal from the start (x86 was always about backwards compatibility first).

No, the Phi is a P6 core, but so are all the modern Core family CPUs.

All Phis are along the lines of modern cores stripped down and simplified. This is also how the modern Avoton and newer Atoms work, but the Phi is stripped down even further.

That's not what I get from Intel's documentation. Intel seems to call it a Pentium I with a P6 programming interface. But maybe you have other sources, if so please share.

> So why not use an older, smaller but still very capable core? And that is what they did. The designers went back generations, literally back to one of the first modern cores, the Intel® Pentium® processor. [1]

> The foundation for the Intel® Xeon Phi™ coprocessor core PMU is the PMU from the original Intel® Pentium® processor (aka P54C). Most of the forty-two performance events that were available in the original Intel® Pentium® processor are also available on the Intel® Xeon Phi™ coprocessor. The core PMU has been upgraded to an Intel® Pentium® Pro processor-like (“P6-style”) programming interface. [2]

[1] https://software.intel.com/en-us/blogs/2013/03/22/the-intel-...

[2] https://software.intel.com/sites/default/files/forum/278102/... (section 1.2)

All a PMU is, is a performance monitoring unit. It keeps track of CPU utilization and performance counters.

This is an extremely small part of the CPU, and yes, I can imagine they jettisoned a lot of stuff you find on modern cores because it takes up too much room.

Good point, that source wasn't really what I thought it was. There are still enough sources to support my argument though.

> The cores at the heart of Intel’s first Xeon Phi are based on the P54C revision of the original Pentium and appear largely unchanged from the design Intel planned to use for Larrabee. [3]

> Many changes were made to the original 32-bit P54c architecture to make it into an Intel Xeon Phi 64-bit processor. [4]

So, still, they seem to have started with a Pentium I and added stuff to it, rather than stripping down a modern core. That was always the story they sold about Larrabee, which AFAIK was the direct predecessor project that got salvaged as Intel MIC.

[3] http://www.extremetech.com/extreme/133541-intels-64-core-cha...

[4] https://software.intel.com/en-us/articles/intel-xeon-phi-cor...

It is closer to this: ARM, and all the companies that have their own ARM designs, have a library of parts. They assemble those parts as needed to complete designs for themselves and for customers.

What people don't understand is, Intel does the same. Look at how modern E3s, E5s, E7s, i3/5/7s, modern Atoms, etc, all work: similar designed parts, all paired with what is minimally required for that design to work and perform the way they want.

Intel doesn't throw designs out, they keep them and periodically make sure they still work on smaller fab sizes and newer fab techs.

A more striking example than the Phi is the Intel Quark, featured in the Edison platform, which is Intel's equivalent of an ARM Cortex-M series (such as the M4s used in a lot of cell phones as a GPS/motion sub-processor and other things). The Quark really is a modernized P54C (Pentium 1 pre-MMX) core, and more so than the Xeon Phi is (although, obviously, there is shared part design through both of them).

I think the thing with Phi is, it's rapidly evolving. Larrabee was closer to this design than first gen Phi was, and now they're shipping second gen Phi, and it looks more like how some GPUs have been historically designed than just x86 core spam (look at how the bus design is evolving, they're getting closer and closer to how AMD and Nvidia design theirs, and also how post-Skylake on-die GPU integration is evolving on multi-socket platforms).

So, yeah. I don't agree that the Phi can be flat out called a P54C, but I agree they have been reusing modernized parts from that era because it is easier to do that than continually strip down existing designs to look like that.

The Quark, however, looks a lot like how embedded family 286 and 486s have been kept alive for the embedded hardware sector, and now they're positioning the Quark for the IoT era (which, hey, they have my interest with that product, so they did something right); the Quark is more of a P54C than the Phi is.

Taped out in 2010, before the R4 was announced.

The human brain project did receive $1B from the EU...:


For those who are interested in hardware implementations of neural networks: this project simulates neurons at the software level, whereas the neuromorphic TrueNorth chip does the same job at the hardware level, which enables faster and more efficient applications.

Side note, hardware accelerated doesn't necessarily mean it's faster. For example, Java chips never took off because an x86 with a really good JIT does the job better. https://en.wikipedia.org/wiki/Java_processor

However, depending on how flexibly you can program it, it may not be quite as useful.

Doubling every two years, we will have brain power in a single rack in 13 years, and a brain sized box in less than 20.
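
A quick sanity check on the doubling arithmetic: a minimal sketch, assuming capacity per rack doubles steadily every two years (the function name `years_to_close_gap` is illustrative). A 13-year horizon corresponds to a gap of roughly 100x; with the 1000-rack estimate given elsewhere in the thread, the same rule gives closer to 20 years.

```python
import math

def years_to_close_gap(gap_factor, doubling_period_years=2.0):
    """Years until capacity grows by gap_factor, assuming steady doubling."""
    doublings = math.log2(gap_factor)
    return doublings * doubling_period_years

# Hypothetical gaps: 100x and 1000x more hardware than fits in one rack today.
for gap in (100, 1000):
    print(f"{gap}x gap: about {years_to_close_gap(gap):.1f} years")
# 100x gap: about 13.3 years
# 1000x gap: about 19.9 years
```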

"With a million cores we only get to about 1% of the scale of the human brain"

Interesting. So in terms of raw computing power, the totality of computers out there is much more powerful than a brain. I wonder when we first passed that level as a whole. (I know that we wouldn't use all our computers to make a "brain", but I think it's interesting to think about :)

This is a very inaccurate comparison for many reasons, but to get the sense of complexity of the human brain:

In the human brain, there are almost 100 billion neurons (the project plans to simulate 1000 neurons on one core, hence 1% atm). Each has about 7000 connections, so there are an estimated 10^15 connections or more.
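
The numbers above can be checked in a few lines. This is only back-of-the-envelope arithmetic using the figures quoted in this comment, plus the million-core target mentioned in the video; the constant names are illustrative.

```python
NEURONS_IN_BRAIN = 100e9        # "almost 100 billion", per the comment
CONNECTIONS_PER_NEURON = 7000   # per the comment
NEURONS_PER_CORE = 1000         # project plan, per the comment
CORES = 1_000_000               # the full million-core machine

simulated = NEURONS_PER_CORE * CORES
fraction = simulated / NEURONS_IN_BRAIN
total_connections = NEURONS_IN_BRAIN * CONNECTIONS_PER_NEURON

print(f"simulated neurons: {simulated:.0e}")       # 1e+09
print(f"fraction of brain: {fraction:.0%}")        # 1%
print(f"connections: {total_connections:.0e}")     # 7e+14
```

So 7000 connections per neuron gives about 7 x 10^14, which is the same order of magnitude as the 10^15 estimate.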

The Internet has only about 15-20 billion nodes, and only a few are highly connected; the Internet is more of a tree than a complete mesh.

So in some way, you could say we are nowhere close to obtaining the computing power of the brain, if we, of course, disregard the fact that the nodes, computers, are very powerful on their own. But this is where the comparison breaks. The human brain is powerful due to the complexity of its network, not the sheer amount of "working units" (cores/transistors).

Comment on the project: it is exciting, but it must not be understated that it is still a very gross approximation, since the neuron models used are simple. It might be the case that interesting behavior will arise using this simple model, but it also might be the case that the secret sauce is in the fine behavior of each neuron. The hope is that interesting phenomena can be observed with this simpler model, perhaps a bit like classical physics can often be used without requiring the full model of quantum mechanics.

I would extend this to say that when you get down to the level of synapses you're now talking about memory and not CPU. (If you actually want to implement this) 100 trillion synapses is better mapped to 100TB of RAM.
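
A rough sketch of that memory estimate. The 100 TB figure implies about one byte of state per synapse; the other storage widths below are hypothetical, just to show how the total scales.

```python
SYNAPSES = 100e12  # "100 trillion synapses", per the comment

def ram_terabytes(bytes_per_synapse):
    """RAM needed to hold one value per synapse, in decimal terabytes."""
    return SYNAPSES * bytes_per_synapse / 1e12

for width in (1, 2, 4):  # hypothetical storage widths in bytes
    print(f"{width} byte(s) per synapse -> {ram_terabytes(width):.0f} TB")
# 1 byte(s) per synapse -> 100 TB
# 2 byte(s) per synapse -> 200 TB
# 4 byte(s) per synapse -> 400 TB
```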

It's also very important to remember how incredibly slow the human brain is. We're talking 100s of ms slow to go from sensory input to motor output. A single neuron might take between 1 and 10 ms to fire, and a single dendrite might take 1/10th that time, so at best you're doing computation at 10 kHz. A CPU has 5 orders of magnitude over biology.

The problem is we have only a vague idea of how the network is connected and don't really know the algorithm that's being implemented by that network. So we fall back on things like simulating ion channels, which takes way more compute resources than necessary. There is a lot of cargo culting going on right now, but of course it's also insanely exciting and fun to find out what does and doesn't work.

With 1000 neurons per core running in series, that would eat up a lot of the speed advantage.
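
To put rough numbers on the speed comparison in this subthread: a sketch assuming a generic 1 GHz core and treating one clock cycle as one operation, which is crude. The 1 GHz clock is an assumption, not a figure from the thread or the project.

```python
import math

DENDRITE_EVENT_S = 1e-3 / 10   # 1/10th of a 1 ms spike, per the parent comment
CPU_CLOCK_HZ = 1e9             # assumption: a generic ~1 GHz core
NEURONS_PER_CORE = 1000        # simulated in series on one core

bio_rate_hz = 1 / DENDRITE_EVENT_S                        # about 10 kHz
raw_advantage = CPU_CLOCK_HZ / bio_rate_hz                # about 10^5
per_neuron_advantage = raw_advantage / NEURONS_PER_CORE   # about 100x

print(f"biological rate: {bio_rate_hz / 1e3:.0f} kHz")
print(f"raw advantage: 10^{math.log10(raw_advantage):.0f}")
print(f"per-neuron advantage: {per_neuron_advantage:.0f}x")
```

Under these assumptions, serializing 1000 neurons on one core uses up three of the five orders of magnitude, leaving roughly a 100x margin per neuron.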

[Troll] 100 bn neurons * 7k connections + 1 soul. Nait! :)

"With a million cores we only get to about 1% of the scale of the human brain" (or ten mouse brains), he says.

This, he says, requires about 10 racks. So a human brain would require 1000 racks, or about half a typical Amazon data center, of which Amazon has more than a hundred.
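
The rack arithmetic, spelled out as a sketch. It assumes linear scaling and uses the 100K-cores-per-rack figure mentioned elsewhere in the thread.

```python
CORES_PER_RACK = 100_000      # one rack as shown in the video, per the thread
RACKS_FOR_ONE_PERCENT = 10    # "about 10 racks" for 1% of a human brain

racks_for_full_brain = RACKS_FOR_ONE_PERCENT * 100
cores_for_full_brain = racks_for_full_brain * CORES_PER_RACK

print(racks_for_full_brain, "racks,", f"{cores_for_full_brain:,}", "cores")
# 1000 racks, 100,000,000 cores
```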

We'll have to see if he can do anything useful with all those CPUs.

> With a million cores we only get to about 1% of the scale of the human brain

That statement also requires an assumption that there's a logical one-to-one mapping between an organic neuron and an ANN neuron. In reality, ANNs are modeled using very loose approximations of organic neurons. For all we know, within each organic neuron, there are billions of parallel 'threads' of 'computation' or there are links between neurons that a traditional model doesn't capture. Or any number of things. We're simply too far from understanding what actually happens in brains to even start talking about computational equivalency.

> I know that we wouldn't use all our computers to make a "brain", but I think it's interesting to think about

Instead of SETI@home [0], SAI@home (search for artificial intelligence) ?

[0] setiathome.ssl.berkeley.edu/

But the capabilities don't scale that way and the interconnect and topology of how that computation is arranged is completely different between brains and existing networks.

It's cool that they compare it to mouse brains and fractions of human brains, but how similar are they really? Do they operate at the same speed? I'd guess biological neurons trigger slower. Are they as connected? It's nice that it's a hexagonal mapping of a toroid's surface, but neurons are connected in true 3D; how much more connected is that?

>but neurons are connected in true 3d,

Not exactly. Most of the "thinking" neurons form the surface of the brain. That surface is wrinkled into "gyri" to achieve more than 2D, yet it isn't 3D. In the language of Hausdorff dimension it is 2.x something.

And the neuron connections have a pretty structured topology that is far from each-to-each. If, for example, you look at this picture https://en.wikipedia.org/wiki/Fornix_%28neuroanatomy%29#/med... you can see how the wires from the neurons of the hippocampus gyrus are bundled together into the "fimbria" and routed through the "fornix".

Here's a very short video showing white fibre bundles in a human brain.




There are connections between all the cortical layers. See the description of cortical columns: http://brain.oxfordjournals.org/content/brain/120/4/701.full...

He mentioned that 1M cores only gets to about 1% of the human brain, but that is 10x a mouse brain. But as he also mentions, this is in raw power, whereas much of what makes our brains function is the topology or other aspects that are unknown and that they can't model.

The software still models a full network, the hardware has a different network topology.

I still think interconnectedness may well be the bottleneck in mimicking a true brain, and its behaviour.

There's also a video of wiring the rack (took 4h) and what the wiring for the whole 10-rack cluster would look like: http://jhnet.co.uk/projects/spinner

"with a little bit of mathematics too" how wonderfully understated.

He actually said "with quite a lot of mathematics too"

What technology do we have right now to map the connections of the brain? Is there anything that currently exists with enough granularity to take a snapshot of our neurons and their connections?

edit: thanks for the replies, did some research and found this TED talk that visualizes the problem pretty well https://www.youtube.com/watch?v=HA7GwKXfJB0

MIT and Harvard Medical School and CS had a joint project, called something like the Brain Initiative, to collect that by combining results from:

- inferences of neuron interactions from electronically wiring individual neurons in mouse brains;

- fMRI and tFMI (a mathematical extension that associates areas that are activated one after the other) on live humans;

- scanning ultra-thin slices (volunteer human brain donor) and using that for a complete neuron-to-synapse map of at least one brain. Both the slicing technology and the AI to re-combine images into not just a 3D model, but a complete connectome, are pretty mind-blowing achievements.

If you combine that with the possible impact on AI and the medical learnings about things like degeneration, you can imagine why it ends up feeling far more ambitious than the Manhattan Project.

fMRI, I suppose. As far as I know, even if you had a complete neuron mapping of a human brain, you could not achieve basic intelligence, as the brain constantly alters these connections and their strengths. Even if you knew all the connections, the neuron firing types and the representation of information over the network would remain big open questions.

Yea if you think about it short term memory is necessary for the movement of consciousness from state to state. Otherwise it'd be like flipping a toaster on.

How do the PCB boards communicate among themselves? The cables don't look like ethernet (CAT6).

From the SpiNNaker project page:

>The control interface is two 100Mbps Ethernet connections, one for the Board Management Processor and the second for the SpiNNaker array. There are options to use the six on-board 3.1Gbps high-speed serial interfaces (using SATA cables, but not necessarily the SATA protocol) for I/O...

They almost look like SATA cables... not sure though.

Could be infiniband

I wonder what software they're using; they mention that the system is event-driven, but I can't see any sign of what they're using to implement this, whether custom or COTS software...

Mostly Python; their code is at https://github.com/SpiNNakerManchester

That's mostly front-end stuff for using the machines; the actual code running on the custom hardware is generally a lot lower level.

Title should say 1 million ARM cores

That is the video title on YouTube and is the goal of the project, but the video itself shows a 100K-core ARM machine, not a million.

The YouTube title is 1M. They did not put in all the commas we normally expect.

Yes, the title is 1M, but they show ONE 100K-core rack; the plan is to have 10 racks.


Given that this is Hacker News, I think most people recognized it as an IP address that is not public.

Since when is 192.168.x.x public?

The entire 192.168.x.x 16-bit block is part of the private network.
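
For anyone who wants to check this programmatically, a small sketch using Python's standard `ipaddress` module. The helper name `in_private_block` and the sample addresses are illustrative.

```python
import ipaddress

# 192.168.0.0/16: a 16-bit prefix, leaving 16 host bits.
PRIVATE_BLOCK = ipaddress.ip_network("192.168.0.0/16")

def in_private_block(addr):
    """True if addr falls inside 192.168.0.0/16."""
    return ipaddress.ip_address(addr) in PRIVATE_BLOCK

print(PRIVATE_BLOCK.num_addresses)       # 65536
print(in_private_block("192.168.1.10"))  # True
print(in_private_block("8.8.8.8"))       # False

# The standard library also knows about the reserved private ranges directly.
print(ipaddress.ip_address("192.168.1.10").is_private)  # True
```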
