Yahoo launching HortonWorks, Hadoop spinoff company (gigaom.com)
46 points by neilc on June 27, 2011 | 5 comments



Hadoop is targeted at big data.

For instance, if you are a financial institution that generates a lot of transactions, how do you data mine the transactions to find out what type of customers could purchase more services from you?

Another example is Facebook. How does it generate activity streams of your friends and your friends' friends? SQL probably isn't the best choice.

RDBMSs, whose forte is transaction processing, aren't as fast when it comes to answering questions like these. Hadoop and its competitors in this space are hoping to generate revenue from selling software and services for this.


Can someone please explain to me what Hadoop is and what the software does? I did some googling and read their page but I wasn't able to follow.


Did you read the Wikipedia page http://en.wikipedia.org/wiki/Hadoop ? It's a framework for processing large datasets in a distributed environment using the MapReduce algorithm from Google's paper http://labs.google.com/papers/mapreduce.html .
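
To make the MapReduce idea a bit more concrete, here is a toy, single-process sketch of the classic word count in Python. It is not Hadoop code and does not use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, run_job) are made up for illustration. A real framework would run the map and reduce calls on many machines and do the grouping step over the network.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle, and reduce in-process over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = run_job(["the quick brown fox", "the lazy dog", "the fox"])
print(counts["the"], counts["fox"])
```

The point is that you only write the map and reduce functions; everything in run_job is what a framework like Hadoop is supposed to take care of at cluster scale.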


Hadoop implements the MapReduce API. It is a full software stack that is typically taken to mean:

1 - hadoop proper: java code that implements the MR API;

1a - all the software to allow for job trackers, job retrying, job distribution, reporting, etc; across a cluster

1b - cascading / competitors that help you compose individual MR steps;

1c - task tracking and scheduling software such as the SNA projects from linkedin;

2 - a distributed file system called hdfs;

3 - binary file format code such as avro;

4 - various software that provides a SQL-like reporting API, such as hive, sawzall, pig, etc;

edit: and while you might think the MR API is trivial (which is in some sense true), getting it somewhat right is a lot of work. Building software that will run on your 1 node dev box for development and also run on a 6k node cluster is not a simple task. Neither is properly dealing with map/reduce task failure and retry while correctly removing the data that a partially complete task wrote.
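
As a rough illustration of the failure/retry point, here is a single-machine sketch in Python of one common approach: each attempt writes to its own temporary file, a failed attempt's file is deleted, and only a successful attempt is renamed into place. This is an assumption-laden toy, not how Hadoop itself is implemented; run_with_retry and make_flaky_task are hypothetical names invented for the example.

```python
import os
import tempfile


def run_with_retry(task, final_path, max_attempts=3):
    """Run task(out_file) until it succeeds; never expose partial output.

    Returns the number of the attempt that succeeded.
    """
    out_dir = os.path.dirname(os.path.abspath(final_path))
    for attempt in range(1, max_attempts + 1):
        # Each attempt gets its own scratch file next to the final output.
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=f".attempt{attempt}")
        try:
            with os.fdopen(fd, "w") as out:
                task(out)
            # Commit: rename the finished scratch file over the final path.
            os.replace(tmp_path, final_path)
            return attempt
        except Exception:
            # Remove whatever the failed attempt managed to write.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            if attempt == max_attempts:
                raise


def make_flaky_task(failures):
    """Build a task that writes partial data and fails `failures` times."""
    state = {"left": failures}

    def task(out):
        out.write("partial ")
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated task failure")
        out.write("complete\n")

    return task


with tempfile.TemporaryDirectory() as workdir:
    result_path = os.path.join(workdir, "part-00000")
    attempts_used = run_with_retry(make_flaky_task(1), result_path)
    with open(result_path) as f:
        committed = f.read()
    leftover_files = os.listdir(workdir)
```

Doing this on one box is easy. The hard part described above is doing the same thing when the attempts run on different nodes, some of which may die mid-write.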


Who?



