

What it really means when someone says ‘Hadoop’ - kathironline
http://www.linkedin.com/news?actionBar=&articleID=5572381123670966317&ids=dP4PdzoVc3sScP8NcjwPczsRdiMOd3kTdj4PczsVcPcUe38OdPkRb3sSejoNc3kTdPAPcjAUc3wSdjkIe3gOdzkVe3wPc30MejwOczsRdiMMejwMd3sOcPgRdP4SejwMdPkR&aag=true&freq=weekly&trk=eml-tod2-b-ttl-4&ut=1LV2o36nfCWR41

======
tysont
I still don't understand the value add of MapReduce (in its various
implementations, including Hadoop) versus clustered SQL. I see articles like
this one that seem to imply that there is a niche for MapReduce in tasks like
simple string search on very large data sets:
http://gigaom.com/2009/04/14/mapreduce-vs-sql-its-not-one-or-the-other/
It just seems odd that people are so quick to throw away 40+ years of
research on how to structure relational data and how to optimize queries
against large data sets.
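
The "simple string search" case the parent mentions is the classic example:
a distributed grep needs no relational structure at all, just independent
scans. A minimal sketch of that map/reduce split, simulated locally in plain
Python (the shard data and function names here are made up for illustration;
a real Hadoop job would run the mapper on each node's slice of the input):

```python
def map_phase(lines, needle):
    # Each mapper scans only its own shard and emits matching lines.
    # No shared state between mappers, so the work splits across
    # machines trivially -- this is what makes grep "embarrassingly parallel".
    return [line for line in lines if needle in line]

def reduce_phase(partial_results):
    # The reducer just concatenates the per-shard match lists.
    return [match for part in partial_results for match in part]

# Pretend these are three nodes' shards of one huge log file.
shards = [
    ["alpha error", "beta ok"],
    ["gamma error", "delta ok"],
    ["epsilon ok"],
]
matches = reduce_phase([map_phase(shard, "error") for shard in shards])
# -> ["alpha error", "gamma error"]
```

There is no join, index, or query planner anywhere in that pipeline, which is
the niche the linked article is pointing at: the 40 years of relational
research buys you little when the whole job is one full scan.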

~~~
ebiester
Hadoop is free for 400 (or n) computers. Let's say you need 100 computers
(full time) to process the equivalent clustered query in Oracle, given the
time needed to get the formatted data. How much do the licenses for that cost?
(And if it's a free system, you're still going to have to do an ETL job, so
why not ETL it to a free solution?)

You need to process a couple of terabytes of data, but you only need to do it
once a month. Even highly tuned SQL will bring the system to its knees, so
90% of the time those really expensive servers sit idle, kept around only for
that once-a-month peak processing job. (And Oracle isn't going to be
available on a cloud service.)

So, yeah, if someone had a free/cheap SQL solution for these kinds of ETL
datasets that was able to handle these embarrassingly parallel problems,
people might be interested. Perhaps you are aware of solutions I am not.
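
The economics above can be put as back-of-envelope arithmetic. All the
numbers below are hypothetical placeholders (not real Oracle or cloud
prices), just to make the shape of the comparison concrete: a permanently
licensed cluster sized for the peak versus renting nodes only for the
once-a-month job:

```python
# Hypothetical inputs -- none of these figures come from a real price list.
LICENSED_NODES = 100        # cluster kept running year-round
LICENSE_PER_NODE = 40_000   # assumed annual license + hardware cost per node
CLOUD_NODE_HOUR = 0.50      # assumed on-demand price per node-hour
JOB_HOURS = 24              # duration of the monthly batch job
CLOUD_NODES = 400           # rent more nodes, since they're only up briefly

# Fixed cluster: you pay for all nodes all year, even at 90% idle.
yearly_license = LICENSED_NODES * LICENSE_PER_NODE

# On-demand: you pay only for node-hours actually used, 12 runs a year.
yearly_cloud = CLOUD_NODES * CLOUD_NODE_HOUR * JOB_HOURS * 12
```

Under these made-up numbers the rented option comes out around two orders of
magnitude cheaper, which is the whole argument: for bursty, embarrassingly
parallel ETL work, paying per node-hour beats licensing a cluster sized for
the peak.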

------
Scaevolus
Can a mod change the url to the unframed version?

