I'd love to, but it would take about an hour to run through everything.

Here's the short version: there's a collective ecosystem problem of fragmented applications, not-quite-right command-line utilities, web interfaces that look like they were designed in 1995, noisy log files that people actually have to read constantly, and cross-coupled dependencies that make keeping a cluster live for production use a full-time job.

There's also the programming problem that nobody actually writes Hadoop MapReduce code, because it's impossibly complicated. Everybody uses Hive, Pig, and half a dozen other tools that compile down to pre-templated Java classes (which costs you 5% to 30% of the performance you'd get writing the same job by hand).
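
To make that concrete, below is the canonical word-count job in raw MapReduce, essentially the standard Hadoop tutorial example (a sketch; details vary by Hadoop version). Compare its length to the roughly equivalent Hive query, which is a single statement like SELECT word, count(1) FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w GROUP BY word.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // Mapper: emit (word, 1) for every token in the input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            ctx.write(word, ONE);
          }
        }
      }

      // Reducer (also used as combiner): sum the counts for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          result.set(sum);
          ctx.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }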

It hasn't grown because it's so amazing, performant, and company-saving. It grows because people jumped on the bandwagon and then got stuck with a few hundred TB in HDFS. No competing project has equal mindshare or battle-testing, so there's nothing pushing it forward. It's the MySQL of distributed processing systems: it works (mostly), but it breaks (in a few dozen known ways), so people keep adding features and building on top of it.




seiji pretty much nails it. Hadoop seems to have come out of a weird culture. It is a distributed system with a single point of failure (the NameNode) because its designers insisted on avoiding Paxos ("distributed systems are too hard, so we'll just make a broken-by-design protocol instead"). Another example: a lot of the database code built on top of Hadoop is designed around one Java HashMap per row, which really limits performance.
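
To illustrate the per-row overhead (a hypothetical sketch, not code from any actual Hadoop-based database): in the map-per-row layout every cell pays for an entry node, string-key hashing, and a boxed value, while a columnar layout is just primitive arrays.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class RowLayouts {
      public static void main(String[] args) {
        int rows = 1_000_000;

        // "One HashMap per row": each cell costs an entry node,
        // key hashing, and a boxed Long on top of the value itself.
        List<Map<String, Long>> mapRows = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
          Map<String, Long> row = new HashMap<>(4);
          row.put("user_id", (long) i);
          row.put("score", i * 2L);
          mapRows.add(row);
        }

        // Columnar alternative: two primitive arrays, no per-row objects.
        long[] userId = new long[rows];
        long[] score = new long[rows];
        for (int i = 0; i < rows; i++) { userId[i] = i; score[i] = i * 2L; }

        // Scanning a column chases ~1M pointers in the map version,
        // but reads contiguous memory in the array version.
        long a = 0, b = 0;
        for (Map<String, Long> row : mapRows) a += row.get("score");
        for (long s : score) b += s;
        System.out.println(a + " == " + b);
      }
    }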

There are all sorts of oddities, and you can mostly work around them, but it is... exhausting, and I spend a lot of time thinking "surely there must be a better way".


> surely there must be a better way

http://www.spark-project.org/
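
For the curious, here's the same word count from above under Spark's model (a sketch in the Java lambda API from later Spark releases; the Scala API at that link is similarly terse):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("wordcount").setMaster("local[*]"));
        sc.textFile(args[0])
          .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
          .mapToPair(word -> new Tuple2<>(word, 1))
          .reduceByKey(Integer::sum)
          .saveAsTextFile(args[1]);
        sc.stop();
      }
    }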


Wait, so ZooKeeper (the distributed consensus thingie, which implements a Paxos-like protocol) is a Hadoop project but not actually used in Hadoop MapReduce?


That's correct. I believe they are using it in some new "high availability" stuff coming down the road.
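
For a sense of what ZooKeeper gives you, the sketch below uses an ephemeral znode as a self-releasing "active" lock, the kind of primitive HA failover schemes build on (the connect string and path here are made up for illustration, not Hadoop's real HA layout):

    import java.io.IOException;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class WhoIsActive {
      public static void main(String[] args)
          throws IOException, InterruptedException, KeeperException {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {});
        try {
          // An ephemeral znode disappears when its session dies, so the
          // "active" lock releases itself if the holder crashes.
          zk.create("/demo-active-lock", "node-1".getBytes(),
              ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
          System.out.println("became active");
        } catch (KeeperException.NodeExistsException e) {
          System.out.println("another node is active; standing by");
        }
      }
    }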


Thanks for your insightful comments! I appreciate that you took the time to back up your opinion by distilling your thoughts into something quickly digestible.

Have you heard of any other projects, besides Disco, that are more performant than Hadoop for similar applications?


I'd also just like to say: NameNode = single point of failure.

A while back I worked on a contract for a large, very well-known social networking company that refused to consider Hadoop because of this.



