
A Look at HP's “The Machine” - jsnell
http://lwn.net/SubscriberLink/655437/9a48cd3e7a8cbe8a/
======
static_noise
To me "The Machine" seems like a vision which is used to explore the possible
problems and possible solutions that surface when one has to rethink computing
with a really big address space.

If one looks at how current computer technology has developed, as is
especially evident in the x86 architecture, semiconductor processing, some
popular programming languages, and the popular operating systems, it becomes
obvious that we got here by many, many small incremental improvements. Rarely
did someone pull a rabbit out of the hat, build a new kind of hardware, write
a new kind of software, and address a new kind of market with it.

Why did these incremental improvements work? Because they solved the most
pressing problems at the time using solutions which could be tested and
implemented to work properly in the respective market scenario.

That said, there is value in testing new ideas and making revolutionary
experiments which never reach the market. The knowledge gained on the way
sometimes can be used to solve problems in other existing systems or introduce
new paradigms which prevent current developments from getting stuck.

------
StillBored
In the end, I suspect they will have reinvented an AS/400/IBM i attached via
Fibre Channel to a flash array...

Except that was back when they were writing their own OS to run on it. By
doing that, the possibility seemed to exist for all kinds of crazy ideas. Once
they try to conform it to a POSIX-like environment, I suspect that a lot of
the possible hardware advantages will start to evaporate. Either the external
RAM will be maintained in a (semi-)consistent manner (reinventing any number
of single-system-image clustering mechanisms), or they will end up using a
filesystem or MPI-type layer on top of it, thereby reducing its potential
advantages.
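
To make that concrete, here is a minimal sketch of the raw load/store style,
assuming a purely hypothetical /dev/fabric_mem device that exposes the shared
pool as mappable memory (nothing here is HP's actual interface):

```c
/* Minimal sketch, NOT HP's actual interface: /dev/fabric_mem is a
 * hypothetical device standing in for the shared fabric memory pool. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fabric_mem", O_RDWR);      /* hypothetical device */
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 1UL << 30;                        /* map 1 GiB of the pool */
    uint64_t *pool = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
    if (pool == MAP_FAILED) { perror("mmap"); return 1; }

    pool[0] = 42;               /* a plain store: no read()/write(), no MPI */
    printf("%llu\n", (unsigned long long)pool[0]);

    /* The catch: nothing here says when another node sees that store. Any
     * locks, flushes, or messages added to get coherence are exactly the
     * single-system-image machinery mentioned above. */
    munmap(pool, len);
    close(fd);
    return 0;
}
```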

~~~
nickpsecurity
I think something like this will have to be clean-slate. The trick used by
SGI and Cray, though, was to have some nodes run a full OS while the compute
nodes ran a tiny OS. Then there were storage or I/O nodes to handle that side
of things. Most of the parallelism was in the operation of the compute nodes,
the interconnect, and storage. This worked out pretty well in practice for
many MPP-style systems. Today's work is about making denser, more efficient
MPP's, so they could use techniques similar to what worked in the past.
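
As a rough illustration of that division of labor from user level (purely
illustrative; on real MPP systems the split is made by firmware and the job
scheduler, not application code), here rank 0 plays the full-OS/I-O node and
the other MPI ranks act as compute nodes:

```c
/* Rough MPI sketch of the MPP split described above: compute ranks do the
 * work, and only the designated I/O rank touches the outside world. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;        /* stand-in for real compute work */
    double total = 0.0;

    /* Compute ranks reduce their results to rank 0... */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* ...and only the I/O rank does output. */
    if (rank == 0)
        printf("sum over %d nodes: %f\n", size, total);

    MPI_Finalize();
    return 0;
}
```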

The use of ARM processors threw me, though. Usually problems needing this much
memory and I/O take similarly beefy, multi-core, multi-CPU nodes to handle it.
That wasn't good enough, so many added GPU's and FPGA's to the nodes. So I'm
kind of wondering whether their little ARM chips will cut it. Maybe something
like Cavium's ThunderX could do it...

------
sciencesama
The first official HPQ memristor announcement was apparently in 2008, and at
that time there was conjecture that it would be used as both an analog and a
digital recording medium:

[http://bit.ly/1fyBrWi](http://bit.ly/1fyBrWi)

HPQ said in 2011 that they would have memristor products within 18 months:

[http://bit.ly/rtVfno](http://bit.ly/rtVfno)

In 2013, HP's Martin Fink was saying they would have memristor-based HDD
replacements by 2018. At that time, they claimed those 'early' memristors
would be faster than flash, but not speed-competitive with RAM:

[http://bit.ly/1RO05ke](http://bit.ly/1RO05ke)

Then in 2014, HP's Martin Fink was telling the media that memristors would be
faster than DRAM. He produced a chart showing memristors at 10 ns (no range)
while DRAM technology ranged from 10 to 50 ns:

[http://bit.ly/1fyBrWk](http://bit.ly/1fyBrWk)

In October of 2014, HP's Martin Fink was saying they would deliver a 'clean
sheet' operating system in 'The Machine' with 157 petabytes of addressable
memory and 150 compute nodes by the end of 2016:

[http://bit.ly/1fyBtO6](http://bit.ly/1fyBtO6)

As late as December of last year, they were promising a 'Revolutionary' new
operating system in 2015.

[http://bit.ly/1fyBrWo](http://bit.ly/1fyBrWo)

Now there is supposed to be a show-and-tell prototype of 'The Machine'
demonstrated sometime in 2016. It will not have memristors, and will run a
slightly modified form of Linux. It is supposed to have 2,500 CPU cores and
320 TBytes of conventional RAM. That is about 500 times less memory than the
157 PBytes they were talking about last year. I am guessing, without
evidence, that the processing will be distributed across about 1,250 Xeons
with 20 cores each, but I really don't know. That amount of conventional
memory suggests it won't be at a trade show, but in a controlled environment
for invited guests, where HP can better control the message and defer all the
questions:

[http://bit.ly/1fyBtO6](http://bit.ly/1fyBtO6)

Essentially, the point of all this is that HPQ has a long history of failing
to deliver on announced products, lowering expectations for them, and pushing
out their delivery dates. There is no real reason that I can see to believe
that HP Enterprise, or whatever it becomes, will suddenly establish the
follow-through integrity that has clearly been lost at HPQ. Those
fast/slow/soon/late/high-yield/low-yield memristors might never be available
from any financial corporate descendant of HPQ.

I think there is a small fundamental flaw in the message HP is promoting:
spreading processors far apart from one another so that each processor sits
close to its local memory, while at the same time offering a seemingly
contradictory flat memory model. Most of the tasks one would want to execute
on such a machine would certainly be multithreaded, and in many multithreaded
tasks, parent threads need to know quickly when daughter threads have
completed their work so that they can use those interim results to continue.
The speed of light in glass fiber is typically about 2/3 that of light in a
vacuum, and it is reasonable to assume a typical thread will execute (at
maximum efficiency) at more than 6 billion instructions per second in the
year 2020. So let's say a daughter thread is running 50 feet away from its
parent (not in a straight line, but as the optical cable runs). The daughter
could have executed over 450 instructions in the time the "thread complete"
message is in transit to the parent (on top of the overhead in more
conventional architectures), and would then have to sit out more than 450
further instruction times waiting for the parent thread to issue something
new. Translated: 'The Machine' would lose its speed advantage in any
programming environment where per-thread instruction counts are relatively
low, such as the majority of in-memory database object manipulation.
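
Here is a back-of-the-envelope check of those numbers; every value comes from
the paragraph above, none are measured:

```c
/* Back-of-the-envelope latency check: light in fiber at ~2/3 c, a core
 * running 6 billion instructions/second, 50 feet of cable between threads. */
#include <stdio.h>

int main(void)
{
    const double c_vacuum = 299792458.0;          /* speed of light, m/s */
    const double v_fiber  = c_vacuum * 2.0 / 3.0; /* ~2/3 c in glass fiber */
    const double cable_m  = 50.0 * 0.3048;        /* 50 feet of cable, in m */
    const double ips      = 6e9;                  /* 6 billion instr/s */

    double one_way = cable_m / v_fiber;           /* seconds in transit */
    printf("one-way transit: %.1f ns\n", one_way * 1e9);        /* ~76 ns */
    printf("instructions lost one way:    %.0f\n", one_way * ips);  /* ~457 */
    printf("instructions lost round trip: %.0f\n", 2.0 * one_way * ips);
    return 0;
}
```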

In contrast to 'The Machine', I think successful architectures will
increasingly move their processing cores closer together, more in the
direction of the Nvidia Tesla, Intel Phi, and AMD FirePro, to name a few.
Speculating further, massive amounts of memory may well be arrayed
spherically outward, perhaps on a 3-D radiant structure with cooling fluid
(such as helium gas) running within the fingers. If each of these limbs comes
with threading, contacts, and a seal system so it can be removed and
replaced, then the mean-time-to-repair can be kept low.

My guess is that even though there has not been any disappointing news about
'The Machine' in over a month now, there will be a lot more as the dog-and-
pony show that is supposed to occur late next year nears, and there will be
more still prior to 2020.

~~~
tedunangst
It would be deliciously ironic if the final shipping machine used Itanium
processors.

~~~
cmiles74
...And it would be nice if HP finally hit some kind of payoff for investing so
much in that doomed processor.

------
white-flame
So each CPU has its own 256GB of local memory, with a many-TB shared pool that
any of them can load/store into, using a new interconnect.

From a bird's-eye view, how is this architecturally any different from
current builds where compute nodes talk to a shared RAM-backed datastore? Is
it fundamentally different to put it directly into the address space, with
the collision problems that presents?
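
As a toy illustration of that collision problem, here two POSIX threads stand
in for two compute nodes sharing one flat address space; whether a fabric
like this would even give you working atomics across nodes is exactly the
open question:

```c
/* Toy collision demo: two writers share one word of a flat address space.
 * Threads stand in for compute nodes; on a real fabric, cross-node atomics
 * may not exist at all, which is the whole problem. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic long shared_counter = 0;   /* one word in the "shared pool" */

static void *node(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        atomic_fetch_add(&shared_counter, 1);  /* safe only if atomics work */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, node, NULL);
    pthread_create(&b, NULL, node, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expect 2000000)\n", atomic_load(&shared_counter));
    return 0;
}
```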

~~~
jdnier
"to allow addressing up to 32 zettabytes (ZB)" \-- I think that's the first
time I've seen ZB in print.

Wikipedia link: "1 ZB = 1000^7 bytes = 10^21 bytes =
1,000,000,000,000,000,000,000 bytes = 1000 exabytes = 1 billion terabytes =
1 trillion gigabytes."
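
For scale, a quick calculation of the address width 32 ZB implies; the exact
2^75 figure assumes the binary reading (32 ZiB), which is my assumption, not
something the article states:

```c
/* How many address bits does 32 ZB need?  Pure arithmetic, nothing sourced
 * from HP: decimal 32 ZB = 32e21 bytes; binary 32 ZiB = 2^5 * 2^70 = 2^75. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("bits for 32 ZB  (decimal): %.2f\n", log2(32e21)); /* ~74.76 */
    printf("bits for 32 ZiB (binary):  %d\n", 5 + 70);        /* exactly 75 */
    return 0;
}
```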

~~~
bronson
ZFS was originally named the Zettabyte File System. Not sure why Sun changed
it. The acronym doesn't mean anything now.

------
DigitalJack
Sounds like they are reinventing the z/Architecture.

------
zxcvcxz
>There will not be a single OS (or even distribution or kernel) running on a
given instance of the The Machine—it is intended to support multiple different
environments.

What does this mean? Doesn't it need some kind of base system so it knows what
to do when someone loads a new OS?

~~~
static_noise
This sounds just like virtualization, where some hypervisor exists to guard
hardware access. That could probably be implemented in hardware or run on a
separate processor. So calling it an OS is a bit of an exaggeration.

------
luckydude
Am I the only person who thinks this looks sort of like the ETA-10 (circa
1987ish)? The difference being that it is load/store rather than bcopy, but
if I read it correctly, they both had the coherency problem.

------
nickpsecurity
I'm not trusting this one bit. It sounds like many descriptions of
developments that later ended in bankruptcy or acquisition. Just give me a
bunch of nodes with Octeon III's, TOMI's, and/or Achronix FPGA's in an SGI
UV-style system (especially the interconnect). That should meet almost any
need I have for quite a while. Hopefully, the exascale designs will be worked
out by the time that's no longer true. :)

Gotta leverage what we have a bit better even in designs pushing the envelope.
Can always add better tech as it gets proven.

------
Hardenedsoft
I remember someone telling me they already had all this built decades ago, and
were bringing people interested in computer science to see the system, and
perhaps see if they could have any input.

Most who were competent and not fooled by the attempted facade of this world
would simply show up, kick it (manually boot it), and leave laughing at the
stupidity...

That was decades ago...

~~~
dang
It sounds like you might have a good story here, but there isn't enough
information for the reader to tell. Your comment would be more informative if
you added the details, which people here would likely be interested in, and
took out the insult.

~~~
kragen
It's hard to tell, but it sounds to me like maybe the grandparent commenter is
suffering from paranoid delusions, like the Time Cube guy, and thinks we're
all "educated stupid", and that the only reason computers are still getting
faster gradually is some kind of conspiracy among computer companies. But it's
not a whole lot of text to base a full diagnosis on.

