

Sorting Petabytes with MapReduce - abraham
http://googleresearch.blogspot.com/2011/09/sorting-petabytes-with-mapreduce-next.html

======
abtinf
Google keeps bragging about all their internal capabilities because they want
to hire people. But so what? If I had a real reason to, I could fire up a
couple thousand machines on Amazon and analyze data akimbo.

In a sense, Google is worse than Microsoft - they really don't share any of
their hardcore CS innovations. At least MS is in the business of selling
technology. Google is in the business of hoarding it in order to derive
competitive advantage in advertising.

I just wrote up an entry about this at
<http://news.ycombinator.com/item?id=2972368>

~~~
someone13
I disagree with parts of this - Google does release some rather high-quality
source code (LevelDB, Protocol Buffers, Chrome/V8), and also provides some
well-done research papers.

The entire concept of MapReduce wouldn't exist without Google. And that's
inspired a whole host of products. In the same vein, I don't find it unusual
that they're NOT sharing the code for MapReduce et al., in that it's
proprietary technology. I don't expect Amazon to share the source code behind
AWS, and I don't expect Google to share the source code for MapReduce.

~~~
abtinf
Your points are totally valid.

But my point of view is, which company is more relevant to me as a developer
and as someone trying to build a startup? And it seems to me that I get a lot
more love from Amazon, whether it's some nifty new DNS feature or dropping
incoming bandwidth charges to 0.

~~~
DasIch
That doesn't change the fact that unlike Amazon, Google is not in the business
of selling tools to developers; with that in mind it is impressive what they
do for developers despite that.

------
moultano
One of my favorite parts of my job is the ability to grab thousands of
machines at low priority for the hell of it.

~~~
packetslave
...and without asking permission. "Gee, I wonder how much faster this would be
on 5000 machines instead of 2000?" clickicky-click ... "yep, quite a bit
faster."

~~~
jsnell
Not permission, but occasionally forgiveness. Task isolation isn't perfect,
and backends don't scale infinitely.

------
brandonb
Their benchmarks seem a little odd. They claim an 11x speed improvement,
compared to using half as many computers three years ago. But you'd expect
roughly an order of magnitude improvement anyway, just due to Moore's law.

I'd be really curious to see what part of the speedup is due to software
optimizations, i.e., compare the 2011 software with the 2008 software on
identical hardware.

~~~
mda
Moore's law does not suggest an order-of-magnitude performance increase in 3
years.

~~~
brandonb
Moore's law suggests a 4x performance increase over 3 years, and they doubled
the number of machines, so you end up with roughly an 8x performance
improvement from "just" hardware.
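
A rough sketch of that back-of-the-envelope arithmetic, assuming the popular
"performance doubles every 18 months" reading of Moore's law (an assumption,
not something the benchmark post states). The function name and parameters
are made up for illustration:

```python
# Back-of-the-envelope estimate only.
# Assumption: per-machine performance doubles every 1.5 years.
def hardware_speedup(years, doubling_period_years=1.5, machine_ratio=2.0):
    # Per-machine gain from the assumed doubling period.
    per_machine = 2 ** (years / doubling_period_years)
    # Scale by the change in machine count (2x here, per the comparison).
    return per_machine * machine_ratio

print(hardware_speedup(3))  # 8.0
```

Under those assumptions, roughly 8x of the claimed 11x would come from
hardware alone, leaving the remainder for software.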

~~~
mda
No, Moore's law suggests a 4x increase in transistor count over 3 years.

