
64 Terabyte RAM computer from SGI - auvi
http://www.sgi.com/products/servers/uv/
======
SwellJoe
SGI still exists? Somehow they'd been filed away in my head in the "really
cool tech companies that disappeared" bucket, like DEC and Hewlett-Packard (I
am aware there is a company called "HP", but it exhibits no evidence of being
what was once known as Hewlett-Packard). Maybe the fact that the Google campus
is the former SGI campus made me assume all of SGI was gone.

~~~
DiabloD3
Go read the Wikipedia article on SGI. You're actually correct, the SGI of old
is long dead.

Some other company (Rackable Systems) bought them just for the name and is
selling products that are nowhere near as cool as the old SGI products, but
are still considered supercomputers and big storage.

~~~
rdtsc
You are right.

I do miss SGI. They had really cool workstations. Expensive as hell. We had to
lease them instead of buying them.

Besides doing actual work, I remember playing with and compiling a bunch of
OpenGL demos on them. I even found the Jurassic Park file browser by accident,
and only years later connected the two when watching the movie the second or
third time.

I remember the "onslaught" of Windows NT and Windows 2000 workstations with
larger, beefier graphics cards, more memory, and faster processors. I could
tell it was the end for SGI. But I will always remember them fondly.

~~~
bashcoder
I also enjoyed their Indigo workstations, which included a 3D stereo goggle
viewport. In 1998 I spent 6 months working on a 64-cpu Origin 2000
supercomputer, which had some serious power for long-running Computational
Fluid Dynamics jobs.

------
aheilbut
If you're a researcher in the US, you can access a 32 TB version of one of
these boxes (PSC Blacklight) through NSF XSEDE. (And it's surprisingly easy to
apply and get an allocation! We used it to do some de novo transcriptome
assembly when we couldn't find a big enough machine locally.)

~~~
batbomb
Or you could apply to NERSC and hop on the Cray XC30. It was actually "free"
for a while to use.

------
habosa
How bad is memory latency? If this is really a 64 TB address space, you are
going to need an insane cache hierarchy to make it fast, unless they've made
some scientific breakthrough and not shared it with the world.

~~~
rwallace
It probably does have an insane cache hierarchy, but in any case think of it
this way: latency is going to be dramatically better to main memory on this
machine than to disk, which is what the alternative would be if you have a
highly nonlocal workload.

~~~
habosa
True, but gotta wonder how much better this latency is than a more reasonable
amount of RAM backed by a super-fast SSD.

~~~
frozenport
There isn't a bus with the characteristics you mention.

For example, SATA 3 saturates at about 600 MB/s (6 Gbit/s line rate).

A few years ago I measured something like 50 GB/s on these guys. The real
trick is to walk a graph with multiple processors so that you saturate the
bus. That being said, I liked the Yarc data architecture more.

~~~
andruby
That's why the fastest SSDs use PCI Express instead of SATA.

[1] [http://www.fusionio.com/products/iodrive-
octal/](http://www.fusionio.com/products/iodrive-octal/) (6GB/s)

~~~
frozenport
There is an interesting story here, take a look at the benchmarks
[http://regmedia.co.uk/2011/04/07/ssd_write_drop_off.jpg](http://regmedia.co.uk/2011/04/07/ssd_write_drop_off.jpg)

I did a bunch of work with SSDs trying to get high throughput. At the end of
the day, I could touch RAM at 10 megabytes per millisecond (10 GB/s) but only
do I/O at 2 megabytes per millisecond (2 GB/s).

------
pnewman2
Tangentially related story: On my first ever visit to Akihabara, back in the
autumn of 2000, I saw a used SGI workstation (I think an Octane, it was
rounded and kind of a teal blue) on sale in a really cool electronics shop.
They had all kinds of other great stuff -- a big mixing board from a studio,
television cameras, RGB monitors, stuff like that. The price was not that
crazy but I was a broke college student, so I had to be satisfied with just
the coolness of having seen it. So I moved on to the shops with robot parts
and old arcade boards and stuff.

Akihabara is still a fun place to visit, but it seems to have been taken over
entirely by the Otaku culture. Otakudom has always been a part of Akihabara,
but now it seems like that's all there is. I miss the old Akihabara and the
DIY/tinkerer spirit of it.

------
comex
Note that the size of the virtual address space on current x86-64 processors
is only 256 TB! (And half of that is usually reserved for the kernel.)

And inevitably, some programs take advantage of the other 16 bits to store
data, so even if you get new hardware to use the full 64 bits and kernel
support to match, you'll have to watch out for JavaScript engines and other
programs randomly failing :)

------
patrickg_zill
The interesting thing is that this is single system image - it just looks like
one very large desktop computer.

EDIT: add "single"

~~~
MichaelGG
Right, but that is probably deceptive, just like you can mount a network
drive and it "looks" like local storage. If you treat that memory naively
(even on 1 TB RAM NUMA systems), you're in for a bad time.

As a reference point, getting a cache line from one CPU to another on the Xeon
5600 takes ~300 cycles, IIRC. That's just in a two-socket cheapo machine.

It could be considerably worse in this system.

I'm not experienced enough, but so far from what I've dealt with, treating
NUMA systems as separate nodes and coding them as such is the best way to deal
with things. And it lets you scale out to multiple machines easily, too. But
there's probably some workloads that benefit from having what appears to be a
single memory space. SQL Server, for instance, is aware of the various memory
hierarchies and can optimize around it, so it might allow scale-up where
scale-out is simply not an option.

~~~
_delirium
> Just like you can mount a network drive and it "looks" like local storage.

People do that all the time, though! NFS-mounted network drives (often backed
by a NetApp-type box) that give you cluster-wide permanent storage are the
standard way of setting up a compute cluster. There are downsides, but it
greatly simplifies many things vs. not having the same home directories and
software on all the cluster machines. Or, to take a more cloudy example, it's
how Amazon EBS works.

These monster NUMA systems are usually intended for code that's difficult to
turn into cluster code, though, because of too much interaction needed between
parts of the computation. Usually the computation doesn't have to literally
access _all_ of the memory and cores simultaneously, so the fact that it's
NUMA isn't fatal, especially if you have a decent scheduler (improving NUMA-
aware schedulers is an active research topic). But it's often difficult to
partition in a clean way so you can just mapreduce the work onto cluster
machines. These SGI machines don't eliminate the problem, but by offloading
cache coherence to hardware they can both simplify code and improve efficiency
vs. trying to handle everything in software. If you have code that isn't
amenable to a simple map-reduce type architecture, and you don't have hardware
cache coherence, you end up rolling your own state maintenance over a network
protocol or MPI or something, performing explicit work migration via task
checkpointing and task queues, or using finer-grained MPI blocks that produce
smaller tasks not needing migration, etc. All of which is more bug-prone and
probably slower. Also if you have ancient legacy stuff you need to scale up,
the SGI box will be more likely to at least run it successfully without
porting.

~~~
MichaelGG
Yep, special cases.

My point about mounting network drives as local is that, all of a sudden, a
file move takes a non-trivial amount of time and may even time out. Opening a
"Windows Explorer"-type view and generating thumbnails becomes super
expensive.

Naive software, in personal experience, doesn't even work well with modern
multi-core, multi-cache CPUs. Even when it's multithreaded, if it wasn't
designed with all this in mind, you're better off running multiple processes
on a single machine, treating each core (or sometimes a pair) as a separate
computer.

------
userbinator
A 64TB flat address space? I wonder what the latency is like...

~~~
somethingnew
That's my question too... There's a reason there's such a thing as a cache
hierarchy.

------
axilmar
It may seem like a large amount of memory today, but it may not be tomorrow,
just like 4 GB of main RAM seemed like a huge amount of memory 30 years ago.

~~~
userbinator
When x86 started going over 4GB and getting the 64-bit extensions, it was
thought that a 64-bit address space would be so big (more than 4 _billion_
times bigger than a 32-bit one) that it would take many decades before we ran
into that limit; and that wasn't too long ago either - the first AMD64 CPU was
in 2003, only 11 years ago, and it supported "only" 52 bits of physical
address.

Now we have 64 TB, which is 2^46, which means there are only 18 "unused" bits of
address left - 256K. If you could connect only(!) 262,144 of these machines
together and present the memory on them as one big unit, you would have
_exhausted the 64-bit address space_. That is what I think is really
incredible. What's next, 128-bit addresses? Or maybe we'll realise that
segmented address spaces (e.g. something like 96-bit, split as 32:64) are
naturally more suited to the locality of NUMA than flat ones?

------
witty_username
I read that as 64 GB and thought that's not impressive for a server. Then I
saw it was TB...

------
samstave
Oh god! They found a way to get back into the ludicrously priced computer
market!

------
DiabloD3
Is it wrong that I want one?

~~~
simcop2387
I love the idea of running entire VMs from ram.

~~~
stonith
We started doing this when I was working at Cisco for a CI system - rack
servers with 768 GB of RAM and tmpfs as instance storage for OpenStack
servers. It worked pretty well.

------
jostmey
This machine would probably be really useful for analyzing genetic data :-)

------
sourcex
How much would this cost?

~~~
wmf
Let's see, 64 TB of RAM is probably at least $1M, 256 E5-4xxx will be another
million, then you add the SGI goodness... you might be lucky to get change
from $4M.

------
sytelus
Amazon should offer this as "Monster Instance" on EC2.

------
nighthawk24
Any idea how much scrypt hash rate this computer will provide?

~~~
wmf
It has 2048 Sandy Bridge cores, so maybe 20 MH/s.

