
Apparently, the Cray T90 in the 32-processor configuration had 360 GB/s of shared memory bandwidth - still many times more than any shared-memory configuration you can get on the desktop today. Of course, supercomputers have largely moved on from shared-memory systems to clusters, which have larger aggregate bandwidth - but shared memory still has its uses. Just a point for all the comparisons to smartphones and whatnot.

EDIT: just to expand on this a bit: it means there are workloads that these old-school supercomputers will run much faster than a modern high-end desktop. This particularly applies to workloads with a lot of shared-memory access at random locations - a very difficult case for modern systems, which depend on high cache hit rates. Also, the GFLOPS ratings of these supercomputer processors are in many ways more real than the ratings of commodity processors, which only reach their peaks when the pipelines are filled in very specific ways. So no, you can't replace this system (at least the 32-processor version) with a smartphone or even a desktop. Which is not to say it would be cost-effective at $39 million in 1996 dollars.
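To illustrate the cache-hit-rate point, here is a minimal sketch (all names are mine, not from any benchmark suite): it sums the same array twice, once in sequential order and once in a shuffled order. In a compiled language on commodity hardware the random walk is typically far slower because it defeats the cache, even though both loops do identical arithmetic; in CPython the gap is muted by interpreter overhead, so treat the timings as illustrative only. A flat, fast shared-memory machine cares much less about the access order.

```python
import random
import time

N = 1_000_000
data = list(range(N))

# A random permutation of the indices - the pathological access pattern.
order = list(range(N))
random.shuffle(order)

t0 = time.perf_counter()
seq_sum = sum(data[i] for i in range(N))       # sequential, cache-friendly
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
rand_sum = sum(data[i] for i in order)         # random, cache-hostile
t_rand = time.perf_counter() - t0

# Identical work, identical result; only the memory access pattern differs.
print(seq_sum == rand_sum)
```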

Could you expand upon the types of applications that benefit from what you describe? I'm guessing simulations, but would love to hear more details.

Most simulations can actually be parallelized in such a way that most memory accesses are local - you divide the object or system so that the various parts don't communicate with each other too much, and map that onto a cluster. But there are exceptions in which parts of a simulated system can affect each other remotely. For example, in a lot of nuclear simulations, particles produced in one part of the system can very quickly travel to other parts; it gets harder still when the particles in your system operate on different time scales, e.g. neutrons, heavy nuclei, and photons. This is a big reason why the DOE and the nuclear laboratories liked these supercomputer systems.
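The "divide the system so parts don't talk much" idea can be sketched like this (a toy 1-D diffusion step; the subdomain split and ghost-cell names are my own illustration, not any real MPI API). Each subdomain only needs one boundary value from each neighbor per step, so inter-node traffic stays tiny compared to local work - which is exactly what breaks down when particles can jump across the whole system:

```python
def diffuse_step(cells, left_ghost, right_ghost):
    """One explicit diffusion step on a subdomain, padded with ghost values."""
    padded = [left_ghost] + cells + [right_ghost]
    return [(padded[i - 1] + padded[i + 1]) / 2
            for i in range(1, len(padded) - 1)]

def step_all(domains):
    """Advance every subdomain; only one value crosses each boundary."""
    new = []
    for k, cells in enumerate(domains):
        left = domains[k - 1][-1] if k > 0 else cells[0]
        right = domains[k + 1][0] if k < len(domains) - 1 else cells[-1]
        new.append(diffuse_step(cells, left, right))
    return new

# A 12-cell bar, hot at one end, split into 3 subdomains of 4 cells each.
domains = [[100.0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
for _ in range(5):
    domains = step_all(domains)
flat = [c for d in domains for c in d]
print(flat)
```

On a cluster, each subdomain would live on its own node and the ghost values would be the only network traffic per step.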

Another case which I saw personally was a simulation of a part of the visual cortex of the brain; you had neurons which were connected to their neighbors, but you also had a bunch of connections to far-away neurons, and the bandwidth between processors which simulated different parts of the cortex became a limitation (and the huge supercomputers which had the bandwidth were (a) expensive, and (b) had relatively slow processors for the number crunching in each region).

Except that in this case, I found that the physical delay on a long connection between neurons allowed us to buffer the messages and send one notification describing the whole train of impulses, effectively compressing the data. Together with some other simple changes, this made the simulation run 10 to 100 times faster, so it could use clusters instead of supercomputers.
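The trick above can be sketched as follows (a hypothetical illustration; the function names and message format are mine). Since a spike takes `delay` timesteps to travel the long connection anyway, the sender can buffer up to `delay` timesteps of spikes and ship them as a single message without ever delivering one late - the saving comes from amortizing the per-message overhead over the whole train rather than paying it per spike:

```python
def encode_window(spike_times, window_start, delay):
    """Pack one delay-window of spikes into a single compact message."""
    offsets = [t - window_start for t in spike_times
               if window_start <= t < window_start + delay]
    return (window_start, offsets)  # one message for the whole train

def decode_window(message):
    """Receiver reconstructs exact spike times; delivery is still on time
    because the axonal delay already covers the buffering latency."""
    window_start, offsets = message
    return [window_start + off for off in offsets]

spikes = [0, 1, 2, 5, 7, 8, 9]   # a burst of impulses on one connection
delay = 10                        # axonal delay, in timesteps
msg = encode_window(spikes, 0, delay)

# One message instead of seven, with no information lost.
print(decode_window(msg) == spikes)
```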

In general, there are not that many cases in which you really can't get rid of the requirement of fast non-local memory access; if there were, these supercomputers wouldn't have died out. But they were useful in some cases, and were also good for freeing people from thinking about how to localize their memory accesses - this sped up development.
