
MapReduce: A major step backwards - iamelgringo
http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html
======
apathy
What an incredible piece of shit.

 _SCREWDRIVERS POORLY SUITED FOR DRIVING NAILS! FILM AT 11!_

Right tool for the job. The right tool for accessing half a petabyte of data
on unreliable Xeon-based servers in bakery racks is _NOT_ a distributed
database. MapReduce... is.

Oh, and one last thing -- he should be comparing BigTable with a standard
RDBMS if he wanted to have so much as a single shred of credibility. Google
uses traditional RDBMSes internally -- but not for the heavy lifting of
indexing and caching. Because _an RDBMS is a shitty tool for that job_. That's
one reason why the founders are rich as all get out -- they didn't worry about
which tourniquet to use while the patient bled out. Stupid fucking religious
wars.

~~~
yrashk
I second that "piece of shit" -- RDBMSes are bad tools for some jobs, and there
is still a lot of room for specialized kinds of databases; in particular, I
believe there is a need for a lightweight (and ^^ slow ^^ :) decentralized
document-oriented database. And yes, again, this kind of database will not be
suitable for every need, only for some domains.

------
bayareaguy
This is classic Stonebraker. It's a favorite strategy of his to say something
so offensive to the audience that they will be itching for a rebuttal. Then on
closer inspection you discover that it's much ado about nothing.

Here is one example: he says Teradata used MapReduce techniques 20 years ago
and he's right (Teradata's ability to handle arbitrarily large data sets with
hash partitioning helped WalMart get to where it is today). A Teradata system
makes additional assumptions regarding data placement that MapReduce doesn't.
In a Teradata system the data is actually stored at the processing nodes. If
you take the idea of MapReduce and "extend" it with an optimizer that knows
where the data is and how it is organized, you get pretty close to the
Teradata architecture.

Voilà: an RDBMS based on MapReduce.
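
To make the hash-partitioning point concrete, here is a minimal single-process sketch in Python. It is not Teradata's or Google's implementation; the names `partition_for`, `map_reduce`, `sales_map`, `sales_reduce` and the `SALES` data are all made up for illustration, and the "nodes" are just in-memory dictionaries.

```python
import zlib
from collections import defaultdict

def partition_for(key, num_nodes):
    # Deterministic hash: the same key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def map_reduce(records, map_fn, reduce_fn, num_nodes=4):
    # Shuffle step: route every mapped (key, value) pair to a node
    # chosen by hashing the key. Nodes are simulated as dicts here.
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for record in records:
        for key, value in map_fn(record):
            nodes[partition_for(key, num_nodes)][key].append(value)
    # Reduce step: each node reduces only the keys it owns.
    result = {}
    for node in nodes:
        for key, values in node.items():
            result[key] = reduce_fn(key, values)
    return result

def sales_map(record):
    # Hypothetical record layout: (store, amount).
    store, amount = record
    yield store, amount

def sales_reduce(store, amounts):
    return sum(amounts)

SALES = [("store_a", 10), ("store_b", 5), ("store_a", 7), ("store_c", 1)]
totals = map_reduce(SALES, sales_map, sales_reduce)
print(totals)
```

The difference described above is where the hashing happens: in this sketch the rows are routed at query time, whereas a system that already stores each row on the node its key hashes to could skip that shuffle whenever the query groups on the same key.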

Now look closely at what he's saying: in terms of _distributed database
research, MapReduce is a step backwards_.

Well, he may be right there. MapReduce is not likely to help you win any
database research grants - it was old news to that community 20 years ago.

------
weel
To be entirely honest with y'all, I haven't carefully read all of this
article. But I already have my opinion ready!

The fact that MapReduce is a step back, in a sense, is true. But it's a step
back from a dead-end alleyway, a step away from a particular very complex and
bloated programming model--to wit, the SQL-based RDBMS--that is good as far as
it goes, but requires so much effort to implement that it is unlikely to be a
great model to use as the basis for innovative implementations.

If you're going to do crazy shit of the sort that Google does, massively
distributed computations on very high volumes of data, then in order to do it
well you have to keep it simple. Such is life. There are plenty of RDBMSes that
can do the full whammy of RDBMS features, but try to take, say, one that does
not allow for replication and then implement replication. See you next decade.

Sometimes taking a step back can help put things into perspective, is all I'm
saying.

------
anonym
_[Note: Although the system attributes this post to a single author, it was
written by David J. DeWitt and Michael Stonebraker]_

Oh, the irony.

