

Linux lessons for Hadoop doubters - joxie
http://www.theregister.co.uk/2012/08/01/hadoop_will_only_get_bigger/

======
mattmanser
I'm not entirely sure what point the author is trying to make. As an honest
question from someone who's not even played with it, why would Hadoop ever get
past 1% or probably even 0.01%? Doesn't it have a very specific use case?

Is he only talking about penetration in the BigData market? If not, why would
it ever supplant RDBMS or NoSQL datastores?

~~~
lmm
Hadoop is the first NoSQL datastore to reach a level of maturity and big-name
support that many enterprises insist on. In these companies the only thing for
it to supplant is traditional relational databases, and the advantages there
should be obvious.

~~~
mattmanser
I don't see any obvious advantages of NoSQL in most enterprise apps. Although
it's such a fuzzy term these days.

------
dkhenry
I'll go on the record and say Hadoop won't ever make it past the 1%, but
rather we will take the lessons learned from Hadoop and create a better and
more useful system. You can see this already with Pig and Hive. People are
learning from the shortcomings of Hadoop and building on top of it to try and
make it useful. Once the right mix of frustration and technical skill gets
together they will evolve the system to make it useful. That is when you will
see it get enterprise traction.

~~~
monstrado
I work regularly with Hadoop users and there is rarely talk of
"shortcomings" of Hadoop. More often than not, rather than Hadoop not giving
enough to the user, the user is not giving enough to Hadoop. Once users start
understanding the power of the platform, they start to do some really
incredible things.

Hive and Pig are just applications that interface with Hadoop. Although they
could be classified as "SQL" or "PigLatin" to Java Map/Reduce translators, they are also
more than that. For example, Hive has a shared metastore so you can treat
Hadoop kind of like a shared RDBMS that allows you to map schemas after the
data has already been transformed.

Most of the people I talk to, especially on here, are still pretty wet behind the
ears when it comes to Hadoop. It's a very daunting technology, but I can
assure you there is nothing but progress and innovation in the community.

~~~
dkhenry
I agree that this is a new and daunting technology, but it's also technically
grounded at this point. I think the problems people have with it (extensive
knowledge needed to use it) will be fixed iteratively, but will need a major
change to the platform.

------
jandrewrogers
Linux and Hadoop are not analogous. I've used Linux since the pre-1.0 days and
been building distributed computing and database platforms since before Hadoop
existed; I've seen both of these ecosystems grow from the inside.

Linux won the server wars because it rapidly iterated and evolved in a time
when a free UNIX-like operating system was desperately needed. In the early
days of the Internet, no operating system was ready to be a heavy duty
Internet server OS. FreeBSD started from a stronger position than Linux, but
development was carefully controlled by academic purists. Eric Raymond's "The
Cathedral and The Bazaar" was about these two communities.

Operating systems, like Linux, tend toward natural monopolies. It is
extraordinarily expensive to build a competitor. Linux was the most agile and
rapidly evolving competitor in an early market that was poorly served by
incumbents. While it took the rest of the world years to notice, it was
obvious by the mid 1990s that Linux was on a trajectory to take over that
market, not because it was good (back then, it wasn't) but because it rapidly
evolved in response to user needs.

Hadoop is more like MySQL than Linux. It provided a very early option in a
market devoid of options. Technically the implementation is poor and the
details of these choices are exposed to users as features and interface. This
makes it difficult to evolve the core because it pisses off the early users
who have become used to the way it works. Linux had the advantage that it
copied a good design (UNIX), which allowed it to iterate the core without
breaking too much for users. Over the long term, these poor technical choices
early on limit the ability to grow because fixing them alienates the users.
MySQL had a strong run of dominance, particularly in the early days, but the
inability to easily evolve into a robust and full-featured system allowed
PostgreSQL to eventually relegate it to relatively niche use cases that can be
easily replaced by other bits of technology (like NoSQL data stores).

Hadoop has one other quality that makes it fundamentally different than Linux.
It is a relatively simple system that can easily be functionally copied.
Making a Hadoop-like clone using alternative tool sets is not a big task.
While very popular, it would not require an extraordinary investment to
produce an alternative that is obviously superior.

Some form of Hadoop will be with us for a long time. It will always have a use
case. But unlike Linux (and more like MySQL), it will ultimately be
marginalized by its many manifest technical weaknesses. Not only is it easy to
build a significantly better "Hadoop" than the free Hadoop -- many companies
already have -- but there are some very important distributed computing
technologies and features which Hadoop does not have and which would be
difficult to implement on that platform even if customers want them. The
larger NoSQL and distributed computing space is evolving very rapidly with
many potential alternatives.

The only lesson for Hadoop from the history of Linux and open source
generally is that it is in a ripe position to be marginalized over the long
term. I've seen this hype cycle many times for products in similar positions.
I remember, for example, when Perl was _the_ programming language of the Web.
Perl quickly dominated a void in the market at the time but where is it today?

