
Nobody ever got fired for buying a cluster - jpmc
http://research.microsoft.com/apps/pubs/default.aspx?id=179615
======
marshray
> a single “big memory” (192 GB) server we are using has the performance
> capability of approximately 14 standard (12 GB) servers.

That's 168 GB RAM total, within a single upgrade unit of their 192 GB single
server, suggesting the problem is dominated by RAM.

If I had $6638 + 2640 = $9278 to spend on computing hardware from NewEgg, how
about:

    
    
        1 at $1000: HP ProLiant DL360e Gen8 Rack Server, Intel Xeon E5-2403 1.8GHz 4C/4T, 4GB (12 DIMM slots)
          http://www.newegg.com/Product/Product.aspx?Item=N82E16859107943
          http://h10010.www1.hp.com/wwpc/us/en/sm/WF06a/15351-15351-3328412-241644-241475-5249570.html?dnr=1
        4 at $70: Kingston 8GB 240-Pin DDR3 1333 ECC Registered Server Memory, KVR13LR9S4/8
          http://www.newegg.com/Product/Product.aspx?Item=N82E16820239540
        1 at $54: Seagate Barracuda ST250DM000 250GB 7200 RPM 16MB Cache SATA 6.0Gb/s 3.5" Internal Hard Drive
          http://www.newegg.com/Product/Product.aspx?Item=N82E16822148765
        $1334 per server; 7 servers = $9338
    

So we could get 7 of these low-end name-brand 32 GB servers for the same
money, giving us 224 GB of RAM.
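The back-of-envelope numbers above check out (a sketch using the quoted 2013 NewEgg prices; the 4 GB of stock RAM per server is ignored, as in the RAM total above):

```python
# Back-of-envelope cost check for the 7-node commodity build quoted above.
# Prices are the 2013 NewEgg figures from the parent comment.
server_base = 1000          # HP ProLiant DL360e Gen8 base system
ram = 4 * 70                # four Kingston 8 GB ECC DDR3 DIMMs
disk = 54                   # Seagate Barracuda 250 GB SATA

per_node = server_base + ram + disk
nodes = 7

print(per_node)             # 1334 per server
print(nodes * per_node)     # 9338 total
print(nodes * 4 * 8)        # 224 GB of RAM across the cluster
```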

> MR++ runs on 27 servers whereas the standalone configurations are a single
> server running a single-threaded implementation

Sure, nothing will beat a single system at message-passing algorithms when the
entire graph fits in main memory. But when the dataset outgrows that (and it
will), we can triple the RAM using the empty DIMM slots, or add more servers
in units of $1334, instead of having to rewrite the whole analysis.

~~~
dragontamer
A bit unfair, especially when Microsoft budgeted $2000+ on SSDs alone.

I think the paper has a good point: scale-up should definitely be considered.
I see systems that support 1.5 _TB_ of RAM now... do you really expect your
dataset to grow beyond that?

And as long as the dataset fits within 1.5 TB of RAM, scale-up will be more
scalable than scale-out.

~~~
marshray
Yeah, numbers aside, I agree that scale-up should be considered, and it's
worthwhile to read a paper where somebody reality-checks the conventional
wisdom.

> I see systems that support 1.5 TB of RAM now... do you really expect your
> dataset to grow beyond that?

According to Figure 1, about 20% of their jobs are already beyond that. If you
spend your entire hardware budget on a single server and all your analytic
processing jobs end up being coded with the (gasp) single-threaded assumption,
what are you going to do next?

Most of us have been there before, because it's where you end up by accident
("Do you think if we max out the RAM on the database server it will return to
acceptable performance?"). This is _why_ people love parallel map-reduce based
systems.

~~~
dragontamer
Certainly, I don't mean to contradict the conventional wisdom :-p. Scaling out
with ~$1500 2U servers makes sense a lot of the time.

I'm more concerned about another problem. There are people out there who think
they can scale out with a Hadoop cluster built on top of Atoms (or, more
recently, ARMs). And while yes... there are some applications where that is
a decent scaling strategy, I doubt it works in most cases.

But yeah, I know that wasn't your argument. I'm just musing a bit on random
stuff.

~~~
marshray
I admit to being ARM/Atom M/R curious. Sometimes I wonder about Hadoop for a
few seconds, until I remember it's heavily JVM-based. Poking around on the
web, you find folks who sound serious about porting it to Android.

~~~
dragontamer
Curious is good. Things definitely move quickly in the tech industry, and I'm
willing to bet that within 5 years things could change. And I'd support
building up the skills and infrastructure needed to hedge against the rise of
ARM.

But at the moment, the evidence is solidly in favor of the typical 2U Xeon
(if only because ARM servers barely exist right now; maybe next year ARM will
have something ready).

Atom looks mostly dead, however, with Intel beginning to advertise 16W Xeons.
Can two 8W Centerton Atoms compete against a 16W Xeon? I doubt it. Intel only
has the resources to back one "hero" chip in the server space, and it looks
like it will be the Xeon again.

~~~
marshray
Someday somebody is going to realize they are in possession of a trash barge
full of 5 million discarded smartphones and break into the top supercomputer
list with them.

~~~
seunosewa
That idea is a non-starter. Consider the cost of transporting them, connecting
all those ridiculously slow, incompatible CPUs together, powering them, and
programming for them. Building a supercomputer out of standard x86_64 CPUs
will probably be cheaper.

~~~
marshray
Here's an example: [http://www.androidauthority.com/samsung-galaxy-s3-sales-20-million-112790/](http://www.androidauthority.com/samsung-galaxy-s3-sales-20-million-112790/)
Within a few years there will be 20 million Samsung Galaxy S IIIs with dead
batteries and/or cracked screens. That's a dual-core 1 GHz ARM with 1 GB of
RAM, 8 GB of flash storage, _and_ a hardware GPU.

Right now, there are professional first-world developers writing apps for all
those "incompatible CPUs". So it's possible, and even cost-effective under the
right circumstances.

Yet someone is currently being _paid to haul away_ this stuff as electronic
recycling.

~~~
dragontamer
A great example of why Xeons are superior. :-)

The Snapdragon S4 completes the Linpack benchmark at 460 MegaFlops:
[http://www.androidauthority.com/snapdragon-s4-pro-vs-exynos-4412-benchmarks-127274/](http://www.androidauthority.com/snapdragon-s4-pro-vs-exynos-4412-benchmarks-127274/).
The Snapdragon is rumored to draw ~3 to 5 watts of power.

Consider a Xeon E5-2690, which gets you 347,000 MegaFlops of performance in
135 watts:
[http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-hpc/xeon-e5-hpc-matrix-multiplication.html](http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-hpc/xeon-e5-hpc-matrix-multiplication.html)

The Xeon just blows the ARM chip out of the water in performance per watt: a
modern Xeon gives you somewhere on the order of 20x more performance per watt
than a Snapdragon S4 Pro. All the electricity it takes to run a supercomputer
built out of scavenged chips makes it more costly to run in the long term.
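The ~20x figure follows roughly from the numbers quoted above (a sketch; the Snapdragon's 4 W draw is an assumed midpoint of the rumored 3 to 5 W):

```python
# Performance per watt from the figures quoted above.
snapdragon_mflops = 460
snapdragon_watts = 4.0        # assumed midpoint of the rumored 3-5 W
xeon_mflops = 347000
xeon_watts = 135.0

snap_ppw = snapdragon_mflops / snapdragon_watts   # ~115 MFLOPS/W
xeon_ppw = xeon_mflops / xeon_watts               # ~2570 MFLOPS/W

print(round(xeon_ppw / snap_ppw))                 # ~22x in the Xeon's favor
```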

~~~
marshray
Does Linpack even touch the memory bus?

~~~
dragontamer
I don't know, but the Snapdragon has 1 MB of cache, while the Xeon has 20 MB.
Furthermore, the Xeon has a multi-socket interconnect: two Xeons can talk to
each other at 25.6 GB/s through QPI (QuickPath Interconnect).

So my bet is that the Xeon wins on memory too.

On the multi-processor communications issue... I doubt the Snapdragon has PCIe
connections, let alone a high-bandwidth, low-latency link like the Xeon's QPI.

There's a reason supercomputers stick with Intel or AMD: they've got the
processing power, the performance per watt, and the I/O bandwidth to
communicate with other processors. A small embedded chip like the Snapdragon
just doesn't have the features to build a supercomputer out of.

~~~
marshray
OK, I'm sold.

~~~
dragontamer
ARM does have promise, though; it's just not in Snapdragons. As noted, the
primary issue in supercomputers is bandwidth (so that the processors can talk
to each other).

Calxeda, for example, has spent a lot of time trying to solve this
interconnect issue. Each Calxeda card is composed of 4 CPUs, and every CPU has
eight 10 Gb/s (~1.25 GB/s) Ethernet links, two 8x PCIe connections, and more.

It's a relatively young project, but they're moving in the right direction.
There are rumors that Facebook is deploying a Calxeda server.

------
gruseom
Off-topic, but since younger HNers may not know what the title is riffing on,
[https://www.google.com/search?q=nobody%20got%20fired%20for%2...](https://www.google.com/search?q=nobody%20got%20fired%20for%20buying%20ibm)

~~~
purephase
That one sentence sums up about a decade of professional contention. And I
mean specifically relating to IBM software and hardware.

I should have left that job sooner.

------
samspenc
Nice try, but a few thoughts from someone who now spends a lot of time in
Hadoop/MapReduce. I will admit it took me a while to warm up to the whole
concept, but I'm now so familiar with it that I can hardly remember what
compute was like before Hadoop.

I'm able to fire off pretty intensive MapReduce jobs on an Amazon Elastic
MapReduce cluster with many nodes for a fraction of the price mentioned in the
post (less than $100).

While I imagine I could repurpose all my MapReduce/Hadoop code to run on a
single box - especially since Amazon does offer several high-memory instances
today - I would be loath to.

MapReduce gives me a really nice framework for scaling compute out
horizontally rather than vertically, and that is really handy at
terabyte-data volumes (data warehousing and large-data analytics).

~~~
dude_abides
MapReduce is handy at terabyte-data volumes, but how often do we run jobs with
input size > 1 TB? According to the paper:

 _at least two analytics production clusters (at Microsoft and Yahoo) have
median job input sizes under 14 GB, and 90% of jobs on a Facebook cluster have
input sizes under 100 GB_

The other point the authors make is that DRAM costs are following Moore's law
and terabyte-data workloads should "soon" be cost-feasible on single servers
with DRAM.

~~~
chubot
The thing is that MapReduce is still a very good programming paradigm for a
single box.

Machines have 8-64 cores these days -- you don't want to write multi-threaded
code every time you want to do an analysis. So you can write MapReduce, but
use a multicore framework instead of a cluster framework (Hadoop).

The unfortunate thing is that there is no popular open source implementation
of a multicore mapreduce, so people use Hadoop, which is wasteful on small
data sets, as mentioned.

But the great part about it is that you will use the same application code for
both. I fully expect in 5 years or so that people will be running multicore
mapreduce jobs on 100 or 1000 core boxes.
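A single-box multicore map-reduce can be sketched in a few lines of plain Python `multiprocessing` (this is just the idea, not any particular framework's API):

```python
# Minimal single-box "MapReduce" word count: map in parallel across cores,
# then reduce the partial results in the parent process.
from multiprocessing import Pool
from collections import Counter
from functools import reduce

def map_phase(chunk):
    """Map: emit word counts for one chunk of input."""
    return Counter(chunk.split())

def reduce_phase(a, b):
    """Reduce: merge two partial counts."""
    a.update(b)
    return a

if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "the fox"]
    with Pool() as pool:                      # one worker per core by default
        partials = pool.map(map_phase, chunks)
    totals = reduce(reduce_phase, partials, Counter())
    print(totals["the"])                      # 3
```

The same `map_phase`/`reduce_phase` code would run unchanged against a cluster scheduler, which is chubot's point about reusing application code.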

~~~
kyzyl
There is Disco[1], an MR framework written in Python and Erlang. It's open
source and pretty awesome, and if I'm not mistaken it will leverage multi-core
processors with no need for a cluster.

[1] <http://discoproject.org/about>

~~~
marshray
Speaking of Erlang, I've got my eye on Riak Pipe. Now that the Basho folks
have a large object store, I wonder if more in that department is a natural
path.

------
zdw
Totally off-topic typographic comment - anyone else seeing the alt-font ﬁ
ligature in the title? It's quasi-bolded on my machine (Safari, OS X), so it
sticks out like a sore thumb.

~~~
mturmon
Yes. Chrome 27, OS X. I spent a couple of seconds wondering if it was some
typographical quirk inserted on purpose by someone.

------
marshray
"At 100 GB, scale-up still provides the best performance/$, but the 16-node
cluster is close at 88% of the performance/$ of scale-up."

If these message-passing graph algorithms are representative of a "bad fit" to
the parallel map-reduce model, I'd say a 12% penalty is not a bad price to pay
_at all_ in return for all the benefits of the parallel cluster in other
cases.

------
saosebastiao
I've done a handful of big-memory workloads on the JVM, and I have never seen
it not choke on allocation/GC for anything above 300 GB of memory. Does this
paper address this limitation?

~~~
wmf
I think they ran one JVM per core. If you're willing to spend money, large
heaps are supposed to be solved by Azul.

~~~
marshray
It sure sounds like parallel map/reduce tasks are begging for a 'just kill the
process and re-fork a fresh worker' solution instead of garbage collection.
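For what it's worth, Python's `multiprocessing` already exposes exactly this pattern via `maxtasksperchild`: each worker is torn down and re-forked after a fixed number of tasks, so memory comes back via process exit rather than garbage collection (a sketch, not tied to any map-reduce framework):

```python
# Fresh-worker-per-task: with maxtasksperchild=1 each process handles one
# task, then exits and is replaced, so its memory is reclaimed by the OS
# instead of a garbage collector.
from multiprocessing import Pool
import os

def task(n):
    junk = list(range(n))       # garbage that dies with the worker process
    return os.getpid(), sum(junk)

if __name__ == "__main__":
    with Pool(processes=2, maxtasksperchild=1) as pool:
        results = pool.map(task, [100, 200, 300, 400])
    # Each task ran in its own short-lived process; only the sums survive.
    print(sorted(s for _, s in results))      # [4950, 19900, 44850, 79800]
```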

~~~
wmf
One of the great ideas in Erlang.

------
sauravc
Reminds me of this article:
<http://blog.wavii.com/2011/12/29/your-mileage-may-vary/>

~~~
sauravc
Specifically this image:

[http://wavii.files.wordpress.com/2011/12/hadoop_too_big.jpg?...](http://wavii.files.wordpress.com/2011/12/hadoop_too_big.jpg?w=129&h=250)

------
rbanffy
Having just recovered from a 72+ hour outage (about a dozen Hyper-V based
hosts on our hosting provider) caused by a single (large) machine attached to
a single (large) storage appliance, I think I'll pass on this idea of just
scaling up instead of scaling horizontally.

edit: "pass on" (thanks, marshray)

~~~
corresation
False dichotomy. This paper is primarily focused on programming models, and
the _truth_ that single scale-up machines grossly exceed the needs of all but
the rarest edge cases (what people call "big data" is often very little
data). If you have a virtualization cluster, _of course_ you need redundancy
for all components; that is just basic fundamentals.

------
jamesaguilar
I'm surprised they didn't select any tasks that couldn't easily fit on a
single machine (the largest input set was <200 GB). That said, if scale-up
performance can be improved without compromising scale-out, why not?

------
Theory5
"... There IS the occasional savage beating, and more than their fair share of
suicides. But that has "statistical clustering" all over it. "

