Relational Database Experts Jump The MapReduce Shark (typicalprogrammer.com)
22 points by iamelgringo on Jan 18, 2008 | 18 comments

This is a good refutation of the linked article. The authors of the original article definitely seem to misunderstand both the purpose and the usefulness of MapReduce.

The claim that MapReduce is a poor implementation, for example, is particularly silly. Most large-scale relational databases rely on very powerful centralized machines, and replicas of those machines. MapReduce, by contrast, relies on a large number of commodity systems. The decentralized nature makes hardware failure much less likely to cause problems. The system knows how to dynamically reallocate tasks, and new nodes can also be added while the system is running.
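To make the "dynamically reallocate tasks" point concrete, here is a toy single-process sketch, not a description of Google's actual scheduler. The names `WorkerFailure`, `run_with_reassignment`, and `run_task` are hypothetical, and worker failure is simulated with an exception.

```python
class WorkerFailure(Exception):
    """Hypothetical signal that a worker machine died while running a task."""


def run_with_reassignment(tasks, workers, run_task):
    """Run every task; when a worker fails, drop it and retry the task elsewhere.

    Assumption: run_task(worker, task) returns a result or raises WorkerFailure.
    """
    healthy = list(workers)
    pending = list(tasks)
    results = {}
    while pending:
        if not healthy:
            raise RuntimeError("no healthy workers left")
        task = pending.pop(0)
        # Simple rotation over the workers that are still alive.
        worker = healthy[len(results) % len(healthy)]
        try:
            results[task] = run_task(worker, task)
        except WorkerFailure:
            healthy.remove(worker)   # stop scheduling onto the dead worker
            pending.append(task)     # requeue the task for another worker
    return results
```

The point of the sketch is only that a lost machine costs a retry of its tasks rather than the whole job; a real system also has to deal with stragglers, data locality, and failure of the coordinator itself.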

This title makes no sense. Was Fonzie jumping over a more popular show disguised as a shark? Let's mix some metaphors!

I absolutely agree -- the phrase "jump the shark" has now jumped the shark. It seems to be egregiously misused and overused, especially by tech bloggers.

You have to read the original article very closely or you'll miss their point: in terms of distributed database research, MapReduce is a step backwards.

So what else is new? Things interesting to database researchers often take two decades to become mainstream.

To be interesting to a database researcher you need to think way beyond MapReduce. Stuff like distributing queries over 10^12 RFID sensor nodes each storing different data in a different way, optimizing in real time and adapting to arbitrary breakdowns in the communications network.

I think perhaps the original authors were really thinking of BigTable:


What makes you think that? I think DeWitt and Stonebraker can distinguish between BigTable and MapReduce (indeed, their blog post mentions BigTable and Hadoop as well).

The beauty of MapReduce is in its elegant simplicity of use (once properly implemented). Think of Unix shell commands like grep, sort, cut: simple functional tools (as in functional languages, with zero side effects) that do only one task but do it amazingly well. Google just added a new type of screwdriver to the toolbox.
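For anyone who hasn't seen the programming model, a minimal sketch of the idea is word counting with side-effect-free functions. This is an illustration under my own assumptions, not Google's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up, and everything runs in one process instead of being spread over many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit (word, 1) pairs for one document; touches no shared state."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, the step a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster, each map_phase call could run on a different machine.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because `map_phase` and `reduce_phase` depend only on their inputs, the calls can be run in any order and re-run after a failure, which is much the same property that makes grep and sort easy to chain in a pipeline.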

Why don't those guys descend from their ivory tower and build something novel, interesting and useful?

I'm sure the solution to today's exponential data explosion is building one giant relational database inside of Mount Kilimanjaro...

RE: "ivory-tower", "something useful"


Sure the guy is academically merited. But it's not the first time he compares apples to pears.


This "everything must be an RDBMS" viewpoint seems rather rigid.

So...you read the Wikipedia link and the main thing you got out of it is that the guy is "academically merited"?!

The part about him founding five DBMS companies and being the CTO of one of the largest DBMS companies in the world doesn't suggest that maybe he might have some practical experience? The fact that he made some major contributions to the database field somehow leads you to believe he is now confused and talking complete nonsense? No? Yes? Okay...I guess he is just an ivory tower egghead. My bad.

I've read the Weinreb post...and I agree with Dan's points, but they're really just disagreeing about the definition of success. I've tried to use object databases in industry, and I can assure you that they ARE considered failures by most people. (I'm not saying their bad reputation is entirely deserved, however...) By the way, Stonebraker saying that object databases are a 'failure' is not the same as him saying that RDBMS should be used for everything. Apples and Pears ;-) Actually, Stonebraker is one of the most vocal proponents of using non-RDBMS technologies when appropriate. See the "One Size Fits All" paper and its follow-up.

I'm not saying I agree with everything (or even anything) that Stonebraker writes, but I think it is a mistake to discount his opinion out of hand because you erroneously think he is just some confused egghead who isn't up on the latest technologies and is clinging to the RDBMS technology he is comfortable with. That does you both a disservice. People tend to write blog posts that argue one side or the other...the 'truth' often lies somewhere between the two sides.


If my reply is unnecessarily nasty, it is partly due to the fact that I have insomnia and I'm therefore pissed off (it's 4:30am here and I have to leave on a cross-country flight in two hours.)

Have you read the original article? They are really comparing some abstract MapReduce with RDBMSs and saying: MapReduce sux, it's not relational. And from my perspective, making an ass out of themselves. Some quotes:

"we have serious doubts about how well MapReduce applications can scale."

"All of the following features are routinely provided by modern DBMSs, and all are missing from MapReduce:

* Bulk loader -- to transform input data in files into a desired format and load it into a DBMS


* Updates -- to change the data in the data base"

If this is not criticizing MapReduce because it's not an RDBMS, what is? The only thing worth reading is their summary, which is quite good until it hits this point:

"Last, before MapReduce can measure up to modern DBMSs, there is a large collection of unmet features and required tools that must be added."

So no, I'm not that impressed with him right now, since 2 out of 3 things I have read about or from him (I've read the One Size Fits All paper, which I found interesting) are, to me, utter bull*.

That's why I stated that he's academically merited, which at least is a fact.

Fair enough. If that is your evaluation of the article, then that's cool. I usually try not to completely write off people that I know are smarter than I am in a particular area (Stonebraker), just because I don't understand or agree with all of their arguments.

I think the article is much less of a fanboy article and more of an academic "thought provoking" article, though. I think their tone is a little too adversarial, but I believe they are essentially arguing that people can learn a great deal from older DBMS technologies and apply that knowledge to their applications...without always reinventing the wheel. They're arguing that people not forget the past when they look to the future, because combining ideas from both 'camps' might lead to a better solution. A "Best of Both Worlds" approach. Where would we be now if most of the Lisp discoveries and lessons had been assimilated into the programming culture sooner, for example?

I personally think the original article is more in the vein of suggesting deficiencies in using MapReduce for most applications than saying it is a complete dead-end. They acknowledge that "MapReduce may be a good idea for writing certain types of general-purpose computations." They say they are excited by its fault tolerance, etc. They just think it is being misused in many instances where other technologies are superior. I strongly agree with them on that point: there are relatively few use cases where MapReduce is ideal. And even when it is appropriate to use MapReduce, it should usually be augmented by other technology as well. They cite several deficiencies in the MapReduce approach that are completely valid, and I think your 30,000 foot overview of their criticism is misleading.

I think too many people are missing the forest for the trees with the original article (which is partly the fault of the authors, since they could have worded some of their arguments better.) People seem to think that just because Google is doing something that it is the optimal solution...but even Google uses BigTable to get around some of the deficiencies of MapReduce. And what is BigTable? A column-oriented DBMS! And who is one of the world's leading experts on and proponents of those? Why, Michael Stonebraker, the ivory-tower moron who wants to use RDBMS for everything!

As bayareaguy noted: "You have to read the original article very closely or you'll miss their point: in terms of distributed database research, MapReduce is a step backwards."

You said: As bayareaguy noted: "You have to read the original article very closely or you'll miss their point: in terms of distributed database research, MapReduce is a step backwards."

This to me is really funny, since I don't think you can view MapReduce as distributed database research. I haven't used it, and while I admit that I might have gotten it all wrong, to me it's a library/DSL/paradigm/technique for parallelization of certain data-processing tasks. So to me that statement makes as much sense as "Lisp, viewed as a monitor, is a step backward compared to LCDs".

Others whom I consider smart have noticed the same: http://bitworking.org/news/288/Stonebraker-on-MapReduce

Also, while it has been a while since I viewed the tech talk on BigTable, and I do agree that BigTable is a column-based DBMS, I can't remember them saying they built it to get around deficiencies in MapReduce; I understood it as a complement. BUT it might have been much more constructive if, as someone said higher up, they had written the original article about BigTable, because I do believe that their criticism is actually valid (as in apples vs. apples) in some cases regarding BigTable.

There is some good information on MapReduce and BigTable here:

- http://research.google.com/people/jeff/index.html

Clearly these guys need to stick to commentary on RDBMSs, which MapReduce most definitely is not.

MapReduce enables Google to do its thing, so I wouldn't call it a "major step backwards".

The fact that Google use MapReduce has no bearing on whether it is a "major step backwards" to the database community. The original authors essentially argue that MapReduce ignores 40 years of database technology, and that it is the worse for it; whether MapReduce is essential at Google isn't the point.

You're absolutely correct here. Wouldn't most users consider the fact that it ignores database research and history a feature of MapReduce?
