
Relational Database Experts Jump The MapReduce Shark - iamelgringo
http://typicalprogrammer.com/programming/mapreduce/
======
boucher
This is a good refutation of the linked article. The authors of the original
article definitely seem to misunderstand both the purpose and the usefulness
of MapReduce.

The claim that MapReduce is a poor implementation, for example, is
particularly silly. Most large scale relational databases rely on very
powerful centralized machines, and replicas of those machines. MapReduce, by
contrast, relies on a large number of commodity systems. The decentralized
nature makes hardware failure much less likely to cause problems. The system
knows how to dynamically reallocate tasks, and new nodes can also be added
while the system is running.
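
To make the reallocation point concrete, here is a toy sketch of a scheduler that requeues a task when its node dies. It is an illustration only, not how Google's implementation works: `run_tasks`, the worker names, and the `failing` set (which stands in for real hardware faults) are all made up, and adding nodes at runtime is not modeled.

```python
from collections import deque


def run_tasks(tasks, workers, failing=frozenset()):
    """Assign tasks round-robin; requeue a task if its worker fails.

    `failing` is a hypothetical set of worker names that fail on their
    first assignment, standing in for real hardware faults.
    """
    pending = deque(tasks)
    healthy = list(workers)
    results = {}
    turn = 0
    while pending:
        if not healthy:
            raise RuntimeError("no healthy workers left")
        task = pending.popleft()
        worker = healthy[turn % len(healthy)]
        turn += 1
        if worker in failing:
            # Simulated failure: drop the node and put the task back in the queue.
            healthy.remove(worker)
            pending.append(task)
            continue
        results[task] = worker  # record which worker completed the task
    return results


done = run_tasks(["t1", "t2", "t3", "t4"], ["a", "b", "c"], failing={"b"})
```

Every task still finishes even though worker `b` fails; its task simply lands on one of the surviving workers.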

------
g00dn3ss
This title makes no sense. Was Fonzie jumping over a more popular show
disguised as a shark? Let's mix some metaphors!

~~~
schorndorfer
I absolutely agree -- the phrase "jump the shark" has now jumped the shark.
It seems to be egregiously misused and overused, especially by tech bloggers.

------
bayareaguy
You have to read the original article very closely or you'll miss their point:
_in terms of distributed database research, MapReduce is a step backwards_.

So what else is new? Things interesting to database researchers often take two
decades to become mainstream.

To be interesting to a database researcher you need to think way beyond
MapReduce. Stuff like distributing queries over 10^12 RFID sensor nodes each
storing different data in a different way, optimizing in real time and
adapting to arbitrary breakdowns in the communications network.

------
joshwa
I think perhaps the original authors were really thinking of BigTable:

<http://en.wikipedia.org/wiki/Bigtable>

~~~
neilc
What makes you think that? I think DeWitt and Stonebraker can distinguish
between BigTable and MapReduce (indeed, their blog post mentions BigTable and
Hadoop as well).

------
marcus
The beauty of MapReduce is its elegant simplicity of use (once properly
implemented). Think of Unix shell commands like grep, sort, and cut: simple
functional tools (as in functional languages, with zero side effects) that do
only one task but do it amazingly well. Google just added a new type of
screwdriver to the toolbox.
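
As a rough single-machine sketch of that idea, a word count can be written as side-effect-free map and reduce steps with a grouping step in between. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are invented for illustration and are not Google's API; a real system would spread these steps across many machines.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the shark jumped", "the fonz jumped the shark"])
```

Because neither step touches shared state, each line could be mapped and each key reduced on a different machine without changing the result.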

------
cridal
Why don't those guys descend from their ivory tower and build something novel,
interesting and useful?

I'm sure the solution to today's exponential data explosion is building one
giant relational database inside of Mount Kilimanjaro...

~~~
Shooter
RE: "ivory-tower", "something useful"

<http://en.wikipedia.org/wiki/Michael_Stonebraker>

~~~
jpbf
Sure, the guy is academically merited. But it's not the first time he has
compared apples to pears.

<http://dlweinreb.wordpress.com/2007/12/31/object-oriented-database-management-systems-succeeded/>

This everything-must-be-an-RDBMS viewpoint seems rather rigid.

~~~
Shooter
So...you read the Wikipedia link and the main thing you got out of it is that
the guy is "academically merited"?!

The part about him founding five DBMS companies and being the CTO of one of
the largest DBMS companies in the world doesn't suggest that maybe he might
have some practical experience? The fact that he made some major contributions
to the database field somehow leads you to believe he is now confused and
talking complete nonsense? No? Yes? Okay...I guess he is just an ivory tower
egghead. My bad.

I've read the Weinreb post...and I agree with Dan's points, but they're really
just disagreeing about the definition of success. I've tried to use object
databases in industry, and I can assure you that they ARE considered failures
by most people. (I'm not saying their bad reputation is entirely deserved,
however...) By the way, Stonebraker saying that object databases are a
'failure' is not the same as him saying that RDBMS should be used for
everything. Apples and Pears ;-) Actually, Stonebraker is one of the most
vocal proponents of using non-RDBMS technologies when appropriate. See the
"One Size Fits All" paper and its follow-up.

I'm not saying I agree with everything (or even anything) that Stonebraker
writes, but I think it is a mistake to discount his opinion out of hand
because you erroneously think he is just some confused egghead who isn't up on
the latest technologies and is clinging to the RDBMS technology he is
comfortable with. That does you both a disservice. People tend to write blog
posts that argue one side or the other...the 'truth' often lies somewhere
between the two sides.

P.S.

If my reply is unnecessarily nasty, it is partly due to the fact that I have
insomnia and I'm therefore pissed off (it's 4:30am here and I have to leave on
a cross-country flight in two hours.)

~~~
jpbf
Have you read the original article? They are really comparing some abstract
MapReduce with RDBMSs and saying: MapReduce sux, it's not relational. And from
my perspective they are making asses out of themselves. Some quotes:

"we have serious doubts about how well MapReduce applications can scale."

"All of the following features are routinely provided by modern DBMSs, and all
are missing from MapReduce:

* Bulk loader -- to transform input data in files into a desired format and load it into a DBMS

/../

* Updates -- to change the data in the data base"

If this is not criticizing MapReduce because it's not an RDBMS, what is? The
only thing worth reading is their summary which is quite good, until it hits
this point:

"Last, before MapReduce can measure up to modern DBMSs, there is a large
collection of unmet features and required tools that must be added."

So no, I'm not that impressed with him right now, since 2 out of 3 things I
have read about him or from him (I've read the One Size Fits All paper, which
I found interesting) are, to me, utter bull*.

That's why I stated that he's academically merited, which at least is a fact.

~~~
Shooter
Fair enough. If that is your evaluation of the article, then that's cool. I
usually try not to completely write off people that I know are smarter than I
am in a particular area (Stonebraker), just because I don't understand or
agree with all of their arguments.

I think the article is much less of a fanboy article and more of an academic
"thought provoking" article, though. I think their tone is a little too
adversarial, but I believe they are essentially arguing that people can learn
a great deal from older DBMS technologies and apply that knowledge to their
applications...without always reinventing the wheel. They're arguing that
people not forget the past when they look to the future, because combining
ideas from both 'camps' might lead to a better solution. A "Best of Both
Worlds" approach. Where would we be now if most of the Lisp discoveries and
lessons had been assimilated into the programming culture sooner, for example?

I personally think the original article is more in the vein of suggesting
deficiencies in using MapReduce for most applications than saying it is a
complete dead-end. They acknowledge that "MapReduce may be a good idea for
writing certain types of general-purpose computations." They say they are
excited by its fault tolerance, etc. They just think it is being misused in
many instances where other technologies are superior. I strongly agree with
them on that point: there are relatively few use cases where MapReduce is
ideal. And even when it is appropriate to use MapReduce, it should usually be
augmented by other technology as well. They cite several deficiencies in the
MapReduce approach that are completely valid, and I think your 30,000 foot
overview of their criticism is misleading.

I think too many people are missing the forest for the trees with the original
article (which is partly the fault of the authors, since they could have
worded some of their arguments better.) People seem to think that just because
Google is doing something that it is the optimal solution...but even Google
uses BigTable to get around some of the deficiencies of MapReduce. And what is
BigTable? A column-oriented DBMS! And who is one of the world's leading
experts on and proponents of those? Why, Michael Stonebraker, the ivory-tower
moron who wants to use RDBMS for everything!

As bayareaguy noted: "You have to read the original article very closely or
you'll miss their point: in terms of distributed database research, MapReduce
is a step backwards."

~~~
jpbf
You said: As bayareaguy noted: "You have to read the original article very
closely or you'll miss their point: in terms of distributed database research,
MapReduce is a step backwards."

This to me is really funny, since I don't think you can view MapReduce as
distributed database research. I haven't used it, and while I admit that I
might have gotten it all wrong, to me it's a library/DSL/paradigm/technique for
parallelization of certain data-processing tasks. So to me that statement
makes as much sense as "Lisp, viewed as a monitor, is a step backward
compared to LCDs".

Others, whom I consider smart, have noticed:
<http://bitworking.org/news/288/Stonebraker-on-MapReduce>

Also, while it has been a while since I viewed the tech talk on BigTable, and I
really agree BigTable is a column-based DBMS, I can't seem to remember that
they built it to get around deficiencies in MapReduce; I understood it as a
complement. But it might have been much more constructive if, as someone said
higher up, they had written the original article about BigTable, because I do
believe that their criticism is actually valid (as in apples vs. apples) in
some cases regarding BigTable.

~~~
bayareaguy
There is some good information on MapReduce and BigTable here:

- <http://research.google.com/people/jeff/index.html>

------
tlrobinson
Clearly these guys need to stick to commentary on RDBMSs, which MapReduce most
definitely is _not_.

MapReduce enables Google to do its thing, so I wouldn't call it a "major step
backwards".

~~~
neilc
The fact that Google uses MapReduce has no bearing on whether it is a "major
step backwards" to the database community. The original authors essentially
argue that MapReduce ignores 40 years of database technology, and that it is
the worse for it; whether MapReduce is essential at Google isn't the point.

~~~
Kaizyn
You're absolutely correct here. Wouldn't most users consider the fact that it
ignores database research and history a feature of MapReduce?

