
HPE unveils The Machine, a single-memory computer with 160 terabytes of memory - aomega08
https://venturebeat.com/2017/05/16/hp-enterprise-unveils-single-memory-160-terabyte-computer-the-machine/
======
eveningcoffee
I am not sure if I should be impressed until they reveal more details.

160 TB over 40 nodes is 4 TB per node, which with 512 GB DIMMs requires only
8 DIMMs (4 per socket), or with 256 GB DIMMs, 16 DIMMs (8 per socket).
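
Back-of-envelope check (assuming 2 sockets per node, which is a guess on my
part):

    # Rough arithmetic for the figures above; 2 sockets per node is an assumption.
    per_node_gb = 160 * 1024 // 40                 # 4096 GB (4 TB) per node
    for dimm_gb in (512, 256):
        dimms = per_node_gb // dimm_gb             # 8 or 16 DIMMs per node
        print(dimm_gb, "GB DIMMs ->", dimms, "per node,", dimms // 2, "per socket")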

How fast is the interconnect? 100Gb/s?

So far it sounds like a common high-memory HPC cluster unit with largely
unknown technical parameters.

~~~
closeparen
Isn't the idea that a single process can address all 160TB of memory as if it
were local?

Of course you can cobble together that much memory when you're programming a
distributed system of communicating processes. The interesting part would be
programming it as if for one computer.
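
To illustrate the difference (a toy sketch, not HPE's actual API - an
anonymous mmap stands in for the fabric-attached pool):

    import mmap, struct

    # Single-address-space style: the whole pool is one flat region you index
    # directly, as if it were local memory. (Toy size for illustration.)
    pool = mmap.mmap(-1, 1 << 20)
    struct.pack_into("<Q", pool, 4096, 42)           # write a 64-bit value at an offset
    value, = struct.unpack_from("<Q", pool, 4096)    # read it back like ordinary memory

    # Distributed style: the same logical update becomes an explicit message to
    # whichever node owns that range (hypothetical helper, not a real API).
    def remote_write(node_id, offset, value):
        ...  # serialize the request, send it over the network, wait for an ack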

~~~
eveningcoffee
As far as I understand, this memory would not be local to any of the
processors - all of them access it through a shared bus (a fabric, as they
call it), and it appears to be handled more like a weird (their words) disk
device than like memory.

I did not find any information about the bandwidth or latency of such an
architecture.

------
lxtx
Wasn't the Machine supposed to be a memristor computer? Did they ditch the
project?

~~~
mugsie
AFAIK there were licensing issues with memristors - in the usual forward-thinking
HP way, they exclusively licensed the tech to someone else, who did no work on
it, so it was not ready for The Machine.

The IP is also a joint asset of HPE and HP Inc, so that sounds like something
I would avoid baking into my product if I were HPE.

------
jhiska
Neat press release. Bombastic, eye-catching, has people asking what it is.

But the product is just a high-memory HPC cluster unit, 4 TB per node. It's
not "revolutionary", it won't "change everything", and it's not a
"technological breakthrough".

 _TL;DR:_ _It's press-bait._

------
lossolo
> The new prototype has 160 TB of shared memory spread across 40 physical
> nodes, interconnected using a high-performance fabric protocol.

So basically something similar to RDMA scaled to 40 nodes, 4 TB RAM per node.

~~~
convolvatron
The article is pretty weak; people have built RDMA machines that size before,
and there have been architectures that allow for direct memory addressability
at that scale before.

So I have to assume the latter. That seems to be borne out by the little
information I can find... addressable persistent memory is clearly a theme,
but I haven't found any discussion of what kind of latency-hiding mechanisms
might be at play or what kind of consistency model is being used. Will keep
looking for anything detailed and authoritative.

(Edit - this seems to be pretty relevant:
[https://www.labs.hpe.com/publications...](https://www.labs.hpe.com/publications...)
but I don't know how much of it was speculative and how much was built...
after all the security papers, there are some about concurrency control.)

~~~
digikata
The Next Platform has had many articles with bits of detail on HP's Machine.
Unfortunately, they don't really have any tagging mechanism, but you can
search for the phrase "the machine" or look at related articles there.

[https://www.nextplatform.com/2017/01/09/hpe-powers-machine-a...](https://www.nextplatform.com/2017/01/09/hpe-powers-machine-architecture/)

~~~
drewg123
Great resource. Based on that, it looks like the fabric has a bandwidth of
600Gb/s (or 1.2Tb/s full duplex).

The interconnect seems to be the real innovation here.
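
Rough numbers, assuming a single 600 Gb/s link (topology details aren't
public, so treat this as a sanity check only):

    # Time for one full sweep of the 160 TB pool over a single 600 Gb/s link.
    # Decimal units; protocol overhead and latency ignored entirely.
    pool_bits = 160e12 * 8          # 160 TB in bits
    link_bps  = 600e9               # 600 Gb/s
    print(pool_bits / link_bps)     # ~2133 seconds, roughly 35 minutes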

------
anocendi
The Machine should be fine, but if Samaritan comes online, we are doomed!

------
bitL
How does the Machine recover from software errors? If it has only persistent
RAM, then when some important program goes bonkers (it always does), there is
no luxury of pressing the reset button to get back to a pristine state.

~~~
lawpoop
My guess is that there would be a microkernel service acting as the core OS,
like the BIOS, and it could be instructed to actively wipe the memory and
reload the boot files when the machine freezes.
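
Something along these lines, maybe (purely hypothetical names and paths -
just sketching the idea, not anything HPE has described):

    import os, shutil

    # Tiny supervisor sketch: if the previous run didn't shut down cleanly,
    # scrub the persistent state and restore a known-good image before booting.
    CLEAN_SHUTDOWN_FLAG = "/pmem/clean_shutdown"   # hypothetical flag file
    PRISTINE_IMAGE      = "/firmware/pristine.img" # hypothetical golden image

    def supervise_boot():
        if not os.path.exists(CLEAN_SHUTDOWN_FLAG):
            # Last run crashed or froze: wipe state and reload the boot files.
            shutil.rmtree("/pmem/state", ignore_errors=True)
            shutil.copy(PRISTINE_IMAGE, "/pmem/state.img")
        else:
            os.remove(CLEAN_SHUTDOWN_FLAG)         # re-arm the flag for this run
        # ...then hand control to the real OS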

~~~
bitL
There can be bug in a microkernel service (and as it is distributed, it is
actually guaranteed it will crash unexpectedly at some point)

~~~
lawpoop
Well, I guess it's a disposable computer then. Use it until it crashes.

------
ismail
This falls into a scale-up (vertical) type of architecture. In what scenarios
would you need tin like this, versus a more scale-out/distributed (horizontal)
architecture?

~~~
loeg
It's a 40 node mesh. Isn't that scale-out?

~~~
ismail
Let me rephrase: a scale-out architecture with commodity machines (using
Hadoop, Kubernetes, Mesos, etc.) versus an engineered hardware system. A
client purchased an HP Superdome (another highly engineered system); the cost
per CPU core, GB of RAM, and TB of storage was horrendous - more than 10x
comparable commodity hardware solutions. Eight months later they hit
performance problems and space issues, and adding capacity is a serious chunk
of change.

What I am trying to understand is: in what use cases would this make sense?

~~~
loeg
This is a scale-out solution; it's just higher performance / lower effort than
building your own solution from commodity hardware. Enterprises often want to
buy canned, supported solutions rather than expend the R&D to build their own.

What workload does this configuration make sense for? I don't know, especially
with those ARM cores. My guess would be the cluster performs well on workloads
similar to Apache Spark. But I don't know why you would choose one over the
other.

------
moftz
Interesting to see ARM starting to become datacenter-worthy now. It's cheap,
and there is a version geared towards any application you can think of. Now
let's see HPE use some FPGAs with embedded ARM CPUs. That would really be the
killer for all the smaller shops that are building similar exabyte systems.

~~~
happycube
A lot of it is that the big customers can go to a chip company and say "I want
this SoC, with X, Y, and Z" - and if they want enough of them, they will
actually be able to get them at a reasonable price.

I read this is why Qualcomm did their data center chip - some of the big
Chinese companies wanted it.

------
mirekrusin
Nice for Redis.

------
cdelsolar
This is funny, because in 100 years, computers with 100 yottabytes of memory
will be standard, and people will laugh at a headline like this one.

~~~
krylon
Or they will let out a nostalgic sigh as they think about those simpler days
where you could get useful work done on 4 GB of RAM... ;-)

------
breatheoften
"and also allows accelerators to get direct access to a massive memory-storage
footprint" \-- ehh?

------
moonbug22
I'm sure those SGIers now at HPE will be delighted to see so much being made
of the UV.

Oh.

------
logicallee
Not enough.

Because, you know, there happens to be something else that is driven by
memory. I'll give you a hint as to what it is: it consists of approximately a
hundred billion units. (And no, it's not a galaxy.)

Divide 160 terabytes by the number I just stated, and you'll find that you
only get 1.6 kilobytes per unit. Not really enough (or at least, too close
for comfort).

Step up your game, HP! :) But, this is a VERY good start.

-

Edit: I got downvoted to -4 but I have a right to state my requirements. If
your requirements aren't that high, use whatever equipment you want. (the
guesses below are correct, if you want "the answer to the puzzle.")

~~~
binarymax
Maybe just say what you mean, instead of riddles, as part of contributing to
the discussion?

~~~
flak48
~100 billion neurons in a human brain, I guess. Not sure what 1.6 KB per
neuron is too close to.

~~~
logicallee
This comment currently has a reply (
[http://i.imgur.com/N7BZJfu.png](http://i.imgur.com/N7BZJfu.png) ) which shows
why I wanted to leave my original post as a "riddle" rather than coming out
and saying it.

The following "crap" is not relevant to anything, only read it if you want the
answer to why 1.6 KB is insufficient for a certain science fiction fantasy
which will never, not in ten, a hundred, or a thousand years, become relevant
any more than magicians will start flying around on brooms. It has nothing to
do with anything. Don't get me wrong and think that I think it has relevance
to something.

Original version of this comment:

--

Well, it's clear that you could not come even close to perfectly describing
the state of a cell and all its connections in 1.6 KB. As just one example,
every cell has a different genome[1], which, if you encoded it in full, would
be 700 MB right off the bat. I'm not saying the fact that every neuron has a
different genome is _relevant_ to its computational functions, but 1.6 KB is
cutting it very close. Let's explore why this is so.

Suppose we simplified and said that each neuron may be connected to n other
neurons (we will calculate what n fits in our memory). Addressing 100 billion
neurons takes about 36 bits: you need roughly 36 bits to name a number between
1 and 100 billion, i.e. if all you're doing is naming which other neuron it's
connected to. So let's see how many "synapses" the 1.6 KB might be enough to
address - that is, how many connections could be encoded into 1.6 KB.

So if, as a simplification, you did nothing but refer to other neurons, then
1.6 KB = 1.6 * 1024 * 8 = 13107 bits; divide that by 36 and you get 364
addresses. Suppose you were then to encode each of those addresses with a
single byte (an 8-bit brain) representing the strength of that connection, and
you are down to encoding only 13107/44 = 297 neural connections with an 8-bit
connection value. [EDIT!! As pointed out in a reply, the original version of
this paragraph contained a math error - I originally divided by 8 after
dividing by 36, as though each bit of the address needed a connection
strength, rather than adding 8 to 36 to get the number of bits needed per
connection and only afterward dividing that into the 1.6 KB.]

Okay, so let's see how "297 neural connections" stacks up.

In fact "Each neuron may be connected to up to 10,000 other neurons", but
"Each of the neurons has on average 7,000 synaptic connections to other
neurons." This means we do not have 8 bits - we do not have even a single bit.
(Because 7,000 is more than 297 or even 364).

So you cannot even store the full _address_ of each synaptic connection (the
full address of the other neuron it's connected to), let alone a value for its
strength, or anything else computationally interesting that may be going on
chemically. Of course, one neuron will not be directly connected to another
far away, for example on the other hemisphere of the brain (meaning you can
almost certainly shave nearly 1 bit of addressing off the 36 bits from the
get-go, and you likely need a far smaller address space if you really look at
it).

But as you can see, however you slice it, you're cutting it extremely close.
If synaptic connections had even 10 bits of value of some kind, then you're
way over your budget here.

But as you can see we're close - the numbers just barely don't work.

HP are on the right track - they simply need to step their game up.

[1] Recently in the news -
[https://www.scientificamerican.com/article/scientists-surpri...](https://www.scientificamerican.com/article/scientists-surprised-to-find-no-two-neurons-are-genetically-alike/)

EDIT* - thanks for the correction, yes, there was a math error.
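
For anyone who wants to check the arithmetic (the corrected figures, using the
~36-bit approximation from above):

    # Redo the per-neuron budget arithmetic from the comment above.
    budget_bits = 1.6 * 1024 * 8            # 1.6 KB per neuron -> 13107.2 bits
    addr_bits   = 36                        # ~36 bits to name one of 100 billion neurons
    weight_bits = 8                         # one byte of connection strength
    print(budget_bits / addr_bits)                  # ~364 bare addresses
    print(budget_bits / (addr_bits + weight_bits))  # ~297 address+strength pairs
    # Both fall far short of the ~7,000 synapses per neuron cited above.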

~~~
SomeStupidPoint
You made a math error:

You can encode about 300 addresses, at 36+8 = 44 bits per item, not 45
addresses, at 36*8 = 288 bits per address.

The bits are additive, because we're storing an address and data, not a byte
of data per bit of address.

~~~
logicallee
thanks, corrected.

