
No one ever got fired for using Hadoop on a cluster (2012) [pdf] - CHY872
http://research.microsoft.com/pubs/163083/hotcbp12%20final.pdf
======
wmf
More recent related project where a laptop outperforms clusters:
[https://www.usenix.org/conference/hotos15/workshop-program/p...](https://www.usenix.org/conference/hotos15/workshop-program/presentation/mcsherry)

~~~
caminante
That's awesome.

On a meta-note, I felt like the abstract wasn't very clear and I had to skim
the rest of the doc to figure out what was going on.

~~~
Smerity
His blog posts are easier to follow and aren't forced to be in the academic
format :)

[http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...](http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)

[http://www.frankmcsherry.org/graph/scalability/cost/2015/02/...](http://www.frankmcsherry.org/graph/scalability/cost/2015/02/04/COST2.html)

~~~
caminante
Thanks!

No kidding. These arrangements are much more entertaining and informative.

------
istvan__
Actually, I know of several projects that burst into flames because of Hadoop.
There is a notion among some developers that Hadoop solves every problem, and
it is the only solution they consider (literally). Needless to say, real-time
systems with tight SLAs cannot really use Hadoop, and if you are trying to
sell it for this use case, you might get fired. :)

~~~
sixdimensional
I concur and I have seen the same, especially for folks with complex projects,
tight deadlines and budgets, and older distros of Hadoop that were not as
stable and/or production ready.

In a rush to get to the next big thing, many have taken a lot of risk with
Hadoop. Some with the resources to really support/make Hadoop their own have
had a lot of success. But I think there are fewer who have success stories.

~~~
istvan__
On the other hand, Hadoop is great for analytics. If it is down, there is no
customer-facing outage. Sometimes the failover works and only a few things are
lost; you are back in ~15 minutes if you want to check the fs image
consistency. For internal teams this is an acceptable SLA. If you don't care
about consistency, then you can probably do this faster.

I just don't see a customer-facing, SLA-sensitive workload fitting this well.
At least I would not sleep well being on call... :)

~~~
sixdimensional
I don't know that Hadoop is _always_ great for analytics in every case.
Sometimes the overhead of spinning up MapReduce jobs (for example) outweighs
the cost of running analytics using other methods. Especially when you have
demanding requirements for sub-second or millisecond response times to provide
such analytics - I don't think Hadoop is there yet, although it's getting
there.

Generally, I agree though, in an extremely demanding SLA environment... I
probably wouldn't sleep well in that case either!

I think that was kind of the point of the paper as presented - Hadoop is seen
as a panacea, when in reality there might be other, simpler approaches that
work just as well or better. It really does depend on the use case, the
volume/types of data, cost/requirements, etc.

For that matter, what the Hadoop ecosystem "is" (vs. just the Apache Hadoop
project itself) means so many things now. HDFS (storage), YARN (distributed
job/resource management), MapReduce, Hive, HBase, etc. vs. new engines, like
Apache Spark, for example, which can run inside or outside of Hadoop. Add to
that the different distros and the fragmenting Hadoop ecosystem, constantly
changing versions, etc. - I don't know about you, but it can be a nightmare
even for analytics (in some cases).

When properly supported by knowledgeable staff with a deep grasp of what it
can do as a platform, Hadoop can certainly be and do a lot of things for a lot
of use cases.

~~~
istvan__
I agree, I would not recommend that a startup install and maintain Hadoop
itself. There are several data-warehouse-as-a-service offerings out there. More
recently there has been a rise of good and cheap cloud-based systems, and
Redshift got a lot better as well. Much easier to integrate, probably lower
TCO too.

------
reality_czech
This is just a bunch of FUD (fear, uncertainty, and doubt) that Microsoft
spread back in 2012 when Hadoop didn't have Windows support. Get in our time
machine and go to 2015, and Hadoop has Windows support. Suddenly Microsoft
doesn't seem to mind it. Gee, I wonder why?

Hacker News is ridiculously susceptible to propaganda (MongoDB is WEBSCALE! JS
framework of the month is the bee's knees!). Please think for yourself and
don't read transparent propaganda that is years old. 3 years is an eternity in
the big data world.

~~~
pav7en
Seems like a valid point of view. Not sure why it got downvoted.

Though there are very big kinks to work out in getting Hadoop production
ready.

So the thesis of the parent article is valid too and not just propaganda.

------
shmerl
There are other systems which don't limit you to MapReduce patterns. For
example, HPCC.

~~~
threeseed
Modern day Hadoop doesn't limit you either.

YARN containers are pretty generic and I have used them to run all sorts of
things: Kafka, Elasticsearch, etc.

