
Ask HN: Learning Hadoop - boniface316
I have started taking some basic-level data science courses on edx.org. Then I came across Hadoop, and I would really love to learn it. I have the following questions and would really appreciate any help:

1. What is the best source to start learning Hadoop? I was thinking of starting with Udacity or Big Data University.

2. Do I need Linux to run Hadoop? I am having wifi issues even after upgrading the driver.

3. In order to be employed, do I need to learn the entire system or just one portion of it, like Spark, Hive, or Pig?

Please advise.
======
brudgers
Caveat: This is random advice from the internet.

1. If it were me, I'd start by installing Hadoop on a laptop, since Googling
indicates it's doable... for some definition of 'doable.' Even if I could not
get it to work, reading the documentation and researching whatever problems I
encountered would deepen my practical knowledge. Getting Hadoop up and running
is also a facet of a practical working definition of 'knowing Hadoop.'
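To make that concrete, a first pass at running Hadoop in standalone (local) mode can be sketched like this. The version number and paths are illustrative, following the general shape of the Apache single-node setup guide, not a tested recipe:

```shell
# Download and unpack a Hadoop release (version is illustrative)
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
cd hadoop-3.3.6

# Hadoop needs JAVA_HOME pointed at a JDK install
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

# Run one of the bundled example MapReduce jobs in standalone mode
# (no HDFS or daemons involved): grep the config files for a pattern
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar \
    grep input output 'dfs[a-z.]+'
cat output/part-r-00000
```

Standalone mode runs everything in a single JVM with no HDFS, which is enough to poke at the example jobs and the documentation before attempting a pseudo-distributed setup.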

2. Linux wireless driver BLOBs have been a source of pain for me. My
workarounds have been:

a. Purchase well-supported hardware, e.g. a used ThinkPad and cards without
obscure Broadcom chips.

b. Use an external wireless router and an ethernet cable. That's how I connect
desktops and laptops around the office.

3. My gut is that the important knowledge for many positions requiring or
preferring Hadoop will be more related to data science than to technical
expertise. On the other hand, looping back to my earlier advice, positions
that are Hadoop-first rather than data-science-first will benefit from an
operational understanding.

Lastly, from what I've been hearing about the industry, 'embarrassingly
parallel' workloads that can take full advantage of Hadoop are not as common
as was thought a few years ago. The big useful innovation of Hadoop is looking
like the underlying Hadoop Distributed File System (HDFS), and other big data
search/query tools are being built over it.

That's not to say Hadoop is dead or not worth exploring, particularly at the
technical level of HDFS and in terms of applying data-science concepts.
Learning Pig or Hive makes sense in service of learning how to apply data
science concepts. Because Hive is based on SQL, it is probably the more
generalizable skill... and learning SQL is probably more useful than learning
either in terms of employment.
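For a sense of why the SQL connection matters: a HiveQL query reads almost exactly like standard SQL. The table and columns below are hypothetical, just to show the shape:

```sql
-- Hypothetical table over tab-delimited log files already sitting in HDFS
CREATE EXTERNAL TABLE logs (ip STRING, ts STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs';

-- Ordinary SQL aggregation; Hive compiles this into MapReduce jobs
SELECT url, COUNT(*) AS hits
FROM logs
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```

So time spent on Hive is largely time spent on SQL, which transfers well beyond the Hadoop ecosystem.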

Good luck.

~~~
boniface316
Thank you very much.

I am going to use ethernet to work on Linux. I am taking some courses in data
science, and in the meantime I am interested in getting to know the Hadoop
ecosystem and playing with it.

------
praneshp
I learned Hadoop in grad school in 2013. If you can spend a little bit of
cash, get some VMs on AWS and follow one of the many guides out there (for
example, Cloudera's) to install Hadoop. That should be enough to build
something like:
[http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/](http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/).

I started out trying my VMs in VirtualBox, then on a couple of different
laptops at home, etc., but AWS was the easiest setup in the end.

~~~
boniface316
Thanks for the link. I am just exploring what's out there before I commit to
a program.

------
mtmail
There's also
[http://www.cloudera.com/training.html](http://www.cloudera.com/training.html)

You can run Linux in a virtual machine (VirtualBox, VMware etc) where you
wouldn't have to deal with wifi drivers because it uses the existing network
connection from the host operating system.

~~~
boniface316
Cloudera and Hortonworks are expensive. Any alternate options?

~~~
mtmail
Sorry, I meant Cloudera's free training videos. The link is a bit hidden in
the bottom section of the page.

[http://www.cloudera.com/training/library.html](http://www.cloudera.com/training/library.html)

------
mtmail
There are a couple of book recommendations in
[https://news.ycombinator.com/item?id=12389595](https://news.ycombinator.com/item?id=12389595)

~~~
boniface316
Thanks for the link!

