
I agree with your findings on Hadoop. It is a very complicated ecosystem. The core is tied to the JVM, and the map-reduce abstractions are heavy and difficult to use. There are multiple domain-specific languages/APIs that do the same thing: Pig, Cascading. Yet if you pick one, you can't use the others. Hadoop feels like a Tower of Babel.
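To make the "heavy abstractions" point concrete, here is a minimal sketch of just the map stage of a word count against the raw Hadoop MapReduce API (written in Scala; the class name TokenMapper is my own, and a real job would still need a Reducer plus a driver class):

    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // One class per stage, Writable wrapper types for every key/value,
    // and a path-dependent Context type just to emit (word, 1) pairs.
    class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one  = new IntWritable(1)
      private val word = new Text()

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
        for (token <- value.toString.split("\\s+") if token.nonEmpty) {
          word.set(token)
          context.write(word, one) // emit (token, 1) into the shuffle
        }
      }
    }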

But I am not in favour of creating another stack as a Hadoop replacement. Spark fills the role of a distributed computation framework very well. It is written in Scala, a functional language that provides neat abstractions for map, reduce, filter, and other operations. It introduces the RDD to abstract over data. All the components (GraphX, Spark SQL, BlinkDB) use Scala.
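For contrast, here is a minimal sketch of a complete word count as RDD transformations (assuming Spark on the classpath; the input path input.txt is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("wordcount").setMaster("local[*]")
        val sc   = new SparkContext(conf)

        // The whole pipeline is ordinary map/filter/reduce-style calls on an RDD.
        val counts = sc.textFile("input.txt")  // hypothetical input path
          .flatMap(_.split("\\s+"))
          .filter(_.nonEmpty)
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        sc.stop()
      }
    }

The same job that needs a class per stage in Hadoop is a handful of chained transformations here, which is exactly the point about Scala's abstractions.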

Also, building a new map-reduce engine is a difficult task. The computation engine sits in the middle of the stack, not on top as shown. Libraries sit on top of it, so if you build a new engine, you will have to rebuild everything above it as well.

See the Spark ecosystem: https://amplab.cs.berkeley.edu/software/



