

Ask HN: Any practical advice/tips for a new Machine Learning guy? - justanewmlguy

I just moved from a software engineer position to a machine learning team. I know most of ML theory from my comp sci classes, I've re-read the PRML book by Bishop and I'm working through Hastie as we speak. What I'm missing is practical knowledge. I've been using Python and Scikit-learn recently on the tutorial problems from Kaggle, and I know I should get familiar with a MapReduce implementation like Hadoop. What other skills/tools should I be familiar with?
======
w_t_payne
Get Webb's book:
http://www.amazon.com/Statistical-Pattern-Recognition-Andrew-Webb/dp/0470682280
Play with tools like GGobi.

Be aware that most stuff in industry is waaaaay simpler than what comes out of
academia. A simple linear classifier, if it works, beats a support vector
machine that takes 10 times longer to implement. Most of your day will be
spent wrangling data, not dealing with fancy math.
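
To make the "simple linear classifier" point concrete, here is a minimal sketch of
a perceptron in plain Python. The toy data and the names train_perceptron and
predict are made up for illustration; in practice you would more likely reach for
a library implementation, but the amount of code involved is about this small.

```python
# Made-up, linearly separable toy data: two features per point, label +1 or -1.
points = [(2.0, 1.0), (3.0, 2.5), (2.5, 0.5), (-1.0, -2.0), (-2.0, -0.5), (-3.0, -1.5)]
labels = [1, 1, 1, -1, -1, -1]


def train_perceptron(points, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge the weights whenever a point is misclassified."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            score = weights[0] * x1 + weights[1] * x2 + bias
            if y * score <= 0:  # wrong side of (or exactly on) the boundary
                weights[0] += lr * y * x1
                weights[1] += lr * y * x2
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: stop early
            break
    return weights, bias


def predict(weights, bias, point):
    """Return +1 or -1 depending on which side of the line the point falls."""
    score = weights[0] * point[0] + weights[1] * point[1] + bias
    return 1 if score > 0 else -1


weights, bias = train_perceptron(points, labels)
predictions = [predict(weights, bias, p) for p in points]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("training accuracy:", accuracy)
```

If a baseline like this is already good enough on held-out data, the fancier
model has to justify its extra implementation time.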

~~~
justanewmlguy
Thanks. You hit the proverbial nail on the head. Data wrangling is precisely
where my degree doesn't help. Does Webb's book have any pointers for data
manipulation or are there other resources?

~~~
w_t_payne
You don't need a book -- just follow the KISS principle and you'll be all
right.

IMHO a system that is able to keep its data in plain-text JSON or CSV files
stands a better chance of success than one that stuffs it into proprietary
databases and weird binary formats.
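
As a rough illustration of how far plain-text formats get you, here is a small
sketch using only the Python standard library: read a CSV, clean it up, and write
it back out as one JSON object per line. The column names, the sample rows, and
the load_rows helper are all hypothetical; real data will need its own cleaning
rules.

```python
import csv
import io
import json

# Hypothetical raw data; in practice this would be a file on disk.
RAW_CSV = """user_id,age,country
1,34,US
2,,DE
3,29,us
"""


def load_rows(text):
    """Parse CSV text into dicts, fixing types and normalising messy values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append({
            "user_id": int(row["user_id"]),
            # An empty field becomes None instead of crashing int().
            "age": int(row["age"]) if row["age"] else None,
            # Normalise inconsistent casing such as "us" vs "US".
            "country": row["country"].strip().upper(),
        })
    return rows


rows = load_rows(RAW_CSV)

# One JSON object per line: easy to grep, diff, and stream.
json_lines = "\n".join(json.dumps(r, sort_keys=True) for r in rows)
print(json_lines)

# Reading it back is a one-liner, which is most of the appeal.
round_tripped = [json.loads(line) for line in json_lines.splitlines()]
```

Because every intermediate step is a text file, you can inspect it with ordinary
command-line tools when something looks off.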

Mind you, it always depends on the nature of the data and the skillset of the
team. Like most things, what counts as the simplest path is not universal; it
depends heavily on circumstance.

Perhaps if you could provide more details of your particular problem, I could
provide more detailed help....

