Yahoo launching HortonWorks, Hadoop spinoff company

teyc · on June 28, 2011

Hadoop is targeted at BigData.

For instance, if you are a financial institution that generates a lot of transactions, how do you data mine the transactions to find out what type of customers could purchase more services from you?

Another example is Facebook. How does it generate activity streams of your friends and your friends' friends? SQL probably isn't the best choice.

RDBMSs, whose forte is transaction processing, aren't as fast when it comes to answering questions like these. Hadoop and its competitors in this space are hoping to generate revenue from selling software and services for this.

quizbiz · on June 27, 2011

Can someone please explain to me what Hadoop is and what the software does? I did some googling and read their page but I wasn't able to follow.

amock · on June 27, 2011

Did you read the Wikipedia page http://en.wikipedia.org/wiki/Hadoop ? It's a framework for processing large datasets in a distributed environment using the MapReduce algorithm from Google's paper http://labs.google.com/papers/mapreduce.html .
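To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and uses no Hadoop API; the names `map_reduce`, `word_mapper`, and `sum_reducer` are made up for illustration. It only shows the three phases (map, group by key, reduce) that a framework like Hadoop would spread across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Call the reducer once per key with all values collected for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Run the three phases in order, in memory, on one machine."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Hypothetical job: count how often each word appears.
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, counts):
    return sum(counts)


counts = map_reduce(
    ["the quick fox", "the lazy dog", "The end"],
    word_mapper,
    sum_reducer,
)
```

Because the mapper looks at one record at a time and the reducer looks at one key at a time, both phases can in principle be split across machines, which is the part the real framework takes care of.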

earl · on June 27, 2011

Hadoop implements the map reduce api. It is a full software stack that typically is taken to mean:

1 - hadoop proper: java code that implements the MR API;

1a - all the software to allow for job trackers, job retrying, job distribution, reporting, etc; across a cluster

1b - cascading / competitors that help you compose individual MR steps;

1c - task tracking and scheduling software such as the SNA projects from linkedin;

2 - a distributed file system called hdfs;

3 - binary file format code such as avro;

4 - various software that provides a sql-like reporting api, such as hive, sawzall, pig, etc;

edit: and while you might think the MR api is trivial (which is in some sense true), getting it somewhat right is a lot of work. Building software that will run on your 1 node dev box for development and also run on a 6k node cluster is not a simple task. Neither is properly dealing with map/reduce task failure and retry while correctly removing the data that a partially complete task wrote.
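As a rough illustration of the failure-and-retry point, here is a single-machine Python sketch. It is an analogy under simplifying assumptions, not Hadoop's implementation: each attempt writes to a temporary file, a failed attempt's partial output is deleted, and only a successful attempt is renamed into place. The names `run_task_with_retry` and `flaky_task` are hypothetical.

```python
import os
import tempfile


def run_task_with_retry(task, output_path, max_attempts=3):
    """Run task(file, attempt) until it succeeds; commit output only on success.

    Returns the number of the attempt that succeeded.
    """
    out_dir = os.path.dirname(os.path.abspath(output_path))
    for attempt in range(1, max_attempts + 1):
        # Each attempt gets its own scratch file next to the final output.
        fd, tmp_path = tempfile.mkstemp(dir=out_dir, prefix=".attempt-")
        try:
            with os.fdopen(fd, "w") as tmp_file:
                task(tmp_file, attempt)
            # Rename into place only after the task finished without error.
            os.replace(tmp_path, output_path)
            return attempt
        except Exception:
            # Remove whatever the failed attempt managed to write.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            if attempt == max_attempts:
                raise


# Hypothetical task that dies halfway through its first attempt.
def flaky_task(out, attempt):
    out.write("partial\n")
    if attempt == 1:
        raise RuntimeError("simulated node failure")
    out.write("complete\n")


work_dir = tempfile.mkdtemp()
result_path = os.path.join(work_dir, "part-00000")
attempts_used = run_task_with_retry(flaky_task, result_path)
```

On one machine this is a few lines. Doing the same thing when the task, its output, and the process deciding to retry all live on different nodes that can fail independently is where the real work is.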

endisnigh · on June 28, 2011