

MapReduce II (followup to "MapReduce: A major step backwards") - neilc
http://www.databasecolumn.com/2008/01/mapreduce-continued.html

======
jimbokun
This is the article they should have written in the first place. Much more
detail, much better qualified arguments. And the point about each generation
of computer scientists reinventing the wheel. Lisp being 50 years old and all
that.

------
apathy
> efforts such as PigLatin and Sawzall appear to be promising steps in this
> direction.

Sawzall is a parallel logfile analyzer. It takes logfiles stored in GFS and
MapReduces them for reliable billing (which used to be a nightmarish,
fundamental revenue problem and is now a nonissue -- all you Nooglers live in
a relative utopia). A unixy tool with a unixy mindset (gee, maybe that's
because Rob Pike wrote it). It's a special-purpose tool that is incredibly
good at its job, not a general-purpose filigreed hammer that is supposed to
nail everything in sight. And the fact that the authors could possibly
overlook this speaks volumes about their myopia.
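For readers unfamiliar with the model, here is a minimal Python sketch of the Sawzall-style pattern described above. A per-record function only emits values into a sum table, each shard produces a partial table, and the partials are merged at the end. The log line format (`customer_id bytes`) and all function names are hypothetical. Real Sawzall programs are written in Sawzall's own language and run over GFS, which this does not model.

```python
from collections import Counter

def process_record(line, table):
    """Per-record step: parse one log line and emit into a sum table.
    Assumes a hypothetical 'customer_id bytes' format; malformed lines are skipped."""
    parts = line.split()
    if len(parts) != 2 or not parts[1].isdigit():
        return
    customer, nbytes = parts
    table[customer] += int(nbytes)

def process_shard(lines):
    """Run the per-record step over one shard, producing a partial table."""
    table = Counter()
    for line in lines:
        process_record(line, table)
    return table

def merge_tables(partials):
    """Sum aggregators are associative, so partial tables merge in any order."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

shards = [
    ["acme 100", "globex 50", "garbage"],
    ["acme 25", "initech 10"],
]
billing = merge_tables(process_shard(s) for s in shards)
print(dict(billing))  # {'acme': 125, 'globex': 50, 'initech': 10}
```

Because each record is handled independently and the aggregator is associative, shards can be processed on separate machines in any order.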

Stonebraker naturally refused to answer the most obvious criticism of all --
_HEY ASSHOLE, WHAT ABOUT BIGTABLE?_

But then, if he addressed that, he'd no longer have an essay. And if Google
took his advice they wouldn't have the revenues to have made it as a company.
But that's not really something that academics think about, is it?

------
neilk
This is worse than the first article.

They seem to be conflating the use of a high-level, SQL-like language with the
architecture of the system. Of course you could layer a SQL-like language on
top of a MapReduce-based storage and processing array, and for some queries
that would be very user-friendly. If that is their whole point, it is true but
trivial.
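To make the layering point concrete, here is a hedged Python sketch: a generic, in-process `run_mapreduce` driver, plus a hypothetical `group_by_sum` helper that stands in for a query like `SELECT key, SUM(value) FROM rows GROUP BY key` by generating the map and reduce functions for the user. Both names are made up for illustration. This is not the PigLatin or Sawzall API, and it runs on one machine rather than a cluster.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Minimal single-process MapReduce: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # the "shuffle": collect values per key
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

def group_by_sum(rows, key_col, value_col):
    """Hypothetical query layer: turns GROUP BY key_col / SUM(value_col)
    into mapper and reducer functions so the caller never writes them."""
    def mapper(row):
        yield row[key_col], row[value_col]

    def reducer(_key, values):
        return sum(values)

    return run_mapreduce(rows, mapper, reducer)

sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
print(group_by_sum(sales, "region", "amount"))  # {'east': 17, 'west': 5}
```

The query layer changes only what the user types; the execution underneath is still the same map/shuffle/reduce pipeline.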

I think BigTable does have a limited join capability now.

The real difference between a MapReduce-oriented system and typical SQL
storage options is something like this. MR gives you assured scalability at
the cost of limiting what kinds of queries you can do, and as the authors
correctly point out, some queries get more and more onerous to create.
Usually, SQL storage engines place ease of querying above all else, but have
to go through painful and expensive procedures to scale well.
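As an example of a query that gets onerous, consider a join. In SQL it is a single clause. In a plain MapReduce formulation, the programmer typically tags each record with its source table, shuffles on the join key, and pairs records up in the reducer (a reduce-side join). The sketch below is single-process Python with hypothetical table and field names, meant only to show the shape of the work.

```python
from collections import defaultdict

def reduce_side_join(left, right, key):
    """Inner join of two lists of dicts on `key`, written as map/shuffle/reduce."""
    # Map: tag every record with the table it came from.
    tagged = [(row[key], ("L", row)) for row in left]
    tagged += [(row[key], ("R", row)) for row in right]

    # Shuffle: group tagged records by join key.
    groups = defaultdict(list)
    for join_key, tagged_row in tagged:
        groups[join_key].append(tagged_row)

    # Reduce: within each key, pair every left record with every right record.
    output = []
    for join_key in sorted(groups):
        lefts = [row for tag, row in groups[join_key] if tag == "L"]
        rights = [row for tag, row in groups[join_key] if tag == "R"]
        for left_row in lefts:
            for right_row in rights:
                output.append({**left_row, **right_row})
    return output

customers = [{"cust": 1, "name": "Ann"}, {"cust": 2, "name": "Bob"}]
orders = [{"cust": 1, "item": "disk"}, {"cust": 1, "item": "cpu"}, {"cust": 3, "item": "ram"}]
for row in reduce_side_join(customers, orders, "cust"):
    print(row)
```

Buffering both sides per key is the simplest choice; it assumes each key's records fit in one reducer's memory.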

------
jimm
Upmodded, but I chuckled when I read their first item, where they take an
obviously relational data problem and then observe that you should use a
relational database instead of MapReduce for it. From that, they claim an RDBMS
is better.

~~~
ntoshev
I believe BigTable implements the kind of indexing they like.

Re the point about scalability: while we don't really know how well MapReduce
scales, usage at Google provides good hints that it does so well. Further, I
don't think relational databases scale linearly either, and it is notoriously
difficult to implement large DB clusters.

