
Mistakes Programmers Make when Starting in Machine Learning - robdoherty2
http://machinelearningmastery.com/mistakes-programmers-make-when-starting-in-machine-learning/
======
burntsushi
I have a love-hate relationship with the advice "Don’t reinvent solutions to
common problems." In one sense, it's obviously a good idea because it's
(generally) bad to repeat yourself. When you repeat yourself, the places where
you can make a mistake increase, you violate single point of truth, yadda,
yadda, yadda.

But in the same sense, _for me_ , reinventing things has included some of the
most enriching things I've ever done as a programmer. For example, the first
big project I ever tackled (more than 10 years ago) was an open source message
board heavily inspired by vBulletin. I really didn't solve any problem that
hadn't been solved before, and the last thing the world needed was another
message board. But holy hell, I really learned a lot! And it was maddeningly
fun. I would _never_ want to deprive someone of that experience just because I
think that DRY is a Good Thing.

On the flip side, just a short while ago, I tried writing a package that
handles a cluster of remote peers[1]. Want to know what I learned? That I knew
a lot less about networking than I thought I did, and I'd have to read a lot
more literature before I could get to where I wanted to go.

And yes, this can't always be the case, because you'd never get anything done
otherwise.

(It occurs to me that maybe I'm talking about personal enrichment while the OP
is talking about solving ML problems. Toe-may-toe, toe-mah-toe.)

[1] -
[https://github.com/BurntSushi/cluster](https://github.com/BurntSushi/cluster)

~~~
sp332
Alan Kay said, "To a first approximation, you should never write your own
software. To a second approximation, you should always write your own
software."

~~~
skybrian
Cite? Not finding it.

I did find: "People who are really serious about software should make their
own hardware."

------
RivieraKid
I'm surprised this is on HN; probably because of the lack of downvote buttons. It's
just banal, generic advice. There's zero useful information for any mildly
experienced programmer here.

1\. Don't put machine learning on a pedestal. – Do programmers really make this
mistake? And what exactly does that mean?

2\. Don't write machine learning code. – A classic programming principle, not
specific to machine learning at all.

3\. Don't do things manually. – Oh really? Thanks, I didn't know that.

4\. Don't reinvent solutions to common problems. – Obviously, the same
principle as 2.

5\. Don't ignore the math. – OK, good point, but if you want to get serious
with ML, it's difficult to avoid the math anyway.

~~~
gone35
Agreed. I found this post by John Langford more informative, in a similar vein:

[http://hunch.net/?p=2562](http://hunch.net/?p=2562)

------
joe_the_user
Interesting,

Having done a small amount of machine learning, I can see how the advice here
is "true". And by "true", I mean appropriate for the way that machine learning
exists and operates in present-day space. Algorithms are difficult,
temperamental, and require expert "tuning".

The sequence seems to be:

\- First you learn the formal theory, the math and statistics.

\- Then you learn the "squinting", the ad-hoc rules for how to apply which
algorithm.

\- Then you implement the thing.

This works better than just starting your editor and piecing together code.
However, I would claim that this doesn't actually work _well_, in the sense
that this is kind of where AI/ML has bogged down. I mean, there are only 5 main
approaches, 20 main algorithms, and whatever subsidiaries and random stuff.
They don't work great, and the only progress is incremental (though there is
progress, and throwing more computing power at the problem enhances results
while masking the low amount of conceptual progress).

What's lacking is any modularity in combining algorithms. The power of
ordinary programming is, essentially, using function calls to put together
what you want. ML doesn't do that, and for all the magic, that makes it weak
and fragile - when one magic algorithm doesn't work well, rather than
improving it, it really is better at present, in the interest of getting
stuff done, to start with a different magic algorithm. This is true; I'm a
realist in the sense of accepting the present, but an idealist in the sense
of saying "that kind of sucks, we should be able to fix problems, not
surrender and regroup".

Yes, I'm happy to denigrate the good and proper in my quest for the best.
But I'm an idealist; I suppose it's a matter of taste.

~~~
agibsonccc
I would seriously look into deep learning.
[http://deeplearning.net/](http://deeplearning.net/) I am doing everything
with it now. This includes principal component analysis/compression, face
detection, handwriting recognition, named entity recognition, clustering,
topic modeling, semantic role labeling, among other things.

There is a very common structure to this. Despite neural nets having their own
baggage, they're worth understanding.

The structure you're looking for is definitely in there. Edit: Yes, a bit of
self-promotion here. Just making a point with patterns I've found as I've built
this out.

See:

[https://github.com/agibsonccc/java-
deeplearning/blob/master/...](https://github.com/agibsonccc/java-
deeplearning/blob/master/deeplearning4j-parent/deeplearning4j-core/src/main/java/com/ccc/deeplearning/nn/BaseNeuralNetwork.java)

Half the battle is understanding the linear algebra going on here. Beyond that
you can pretty much do everything with one set of algorithms and terminology.

For those who go "WTF, Java, are you insane?": the core idea I'm linking to
here is the fact that deep nets are composed of single neural networks with
slight variations, having a very common structure for both the single layer
as well as the deep nets themselves.
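
To make that concrete, here's a throwaway pure-Python sketch of the single-layer structure I mean (illustrative only, not code from the repo above; real implementations vectorize this with a matrix library):

```python
import math

def sigmoid(x):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: activations = sigmoid(W * x + b), using plain lists."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def net_forward(inputs, layers):
    """A "deep" net is just these identical layers composed."""
    for weights, biases in layers:
        inputs = layer_forward(inputs, weights, biases)
    return inputs

x = [1.0, 0.5]
layers = [
    ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]),  # hidden layer: 2 inputs -> 2 units
    ([[0.7, -0.5]], [0.2]),                   # output layer: 2 inputs -> 1 unit
]
out = net_forward(x, layers)
```

The variations between net types (RBMs, autoencoders, etc.) mostly live in how the weights are trained, not in this forward structure.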

~~~
nl
_Very_ interesting!

Is that Word2Vec implementation you have roughly equivalent to the Google
version[1]?

Any examples of how to use deeplearning4j generally?

[1] [https://code.google.com/p/word2vec/](https://code.google.com/p/word2vec/)

~~~
agibsonccc
Binary compatible, yes. Star the repo and watch it in the next few days.
Example apps are on the way. I plan on implementing a full "easy to use"
machine learning lib around this.

Edit: Poke around in the tests. Here's an example of it learning a compressed
version of MNIST: [https://github.com/agibsonccc/java-
deeplearning/blob/master/...](https://github.com/agibsonccc/java-
deeplearning/blob/master/deeplearning4j-parent/deeplearning4j-core/src/test/java/com/ccc/deeplearning/rbm/matrix/jblas/mnist/RBMMnistTest.java)

I have a lot more example usage in each of the tests. Test coverage was a
higher priority above the documentation, but example usage is there. I'm more
than happy to answer emails around the usage of the library as well. I also
take feature requests.

I plan on implementing convolutional nets, recursive neural nets, and some
other ones based around that same structure. That includes the scale-out
versions with Akka for easy multithreading or clustering (I have built-in
service discovery with ZooKeeper, among other things, in there).

------
gwern
And here I was expecting things like 'overfitting' and 'not having a holdout
set or at least crossvalidating'.
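
Both are cheap to do; even in pure Python, a holdout split and k-fold index generation are a dozen lines (an illustrative sketch, obviously - real code would use a library):

```python
import random

def train_test_split(data, test_frac=0.2, seed=0):
    """Hold out a fraction of the data; never evaluate on training rows."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

train, test = train_test_split(range(100))
folds = list(kfold_indices(100, k=5))
```

The point isn't the code; it's that skipping this step means your accuracy numbers measure memorization, not generalization.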

~~~
aet
Yes, I didn't find this helpful at all. Also, it needs editing.

~~~
frozenport
Hacker News is moving towards Yahoo News.

------
bottombutton
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://machinelearningmastery.com/mistakes-
programmers-make-when-starting-in-machine-learning/)

------
tel
I feel like the first mistake someone makes is to try to practice "machine
learning". If you're going in with that as your goal, you're likely to fail.

Instead, tell a story. Motivate what you're doing with real questions and real
data and you'll be driven to do all 5 of these lessons (and many more smart
things).

Every time someone comes in with a pet algorithm I cringe a bit. There's
certainly an air of everything being "just marbles", and thus of every
algorithm applying to every problem, but the real question is rarely about
the algorithm - it's about the setup, the cleaning, the story. Even when it's
about the algorithm, you're actually just trying to tell a better story.

So focus on that.

Figure out what you want to "do ML" for before you get too excited about what
ML is. It's often really painful and annoying, with bug-fixing turnaround
clocking in at hours or days. It's also some of the prettiest math around, and
a collection of neat hacks for getting great answers to nigh-unanswerable
questions.

But it's always about answering a question. Start there.

------
drhodes
BTW, edx.org is offering what looks like a relatively rigorous intro to
probability with calculus.
[[https://www.edx.org/course/mitx/mitx-6-041x-introduction-
pro...](https://www.edx.org/course/mitx/mitx-6-041x-introduction-
probability-1296)]. They say it closely follows this course on OCW
[[http://ocw.mit.edu/courses/electrical-engineering-and-
comput...](http://ocw.mit.edu/courses/electrical-engineering-and-computer-
science/6-041-probabilistic-systems-analysis-and-applied-probability-
fall-2010/lecture-notes/)]

It starts soon, Feb 4th.

------
eCa
Using hostgator?

Ads on a 500 error page are perhaps not the most confidence-building touch.

~~~
yeukhon
I suppose they hit a DB connection max...

Anyhow, the first thing that came to mind when I saw that was "oh, so the
greatest mistake for a new machine learning student is encountering a 500
error..."

------
alexhutcheson
Google's cached version:
[http://webcache.googleusercontent.com/search?q=cache:Oq8_jkw...](http://webcache.googleusercontent.com/search?q=cache:Oq8_jkwVGAAJ:machinelearningmastery.com/mistakes-
programmers-make-when-starting-in-machine-learning/)

------
fnl
This is missing the universally true #1 mistake that probably (nearly) _anyone_
commits when starting in ML: lacking an excellent understanding of the
problem/domain (unless you happen to be a domain expert for the problem you
are working on, but that is a rarity).

If you do not know which features to choose and why, what the labels mean,
which background data you should use, and, even more importantly, what the
_actual_ problem is that needs to be solved, you will be wasting lots of time
- and not just yours...

------
eghad
Please go back and spell check. The simple errors all over the place are a
little embarrassing.

------
trillium
The site looks interesting, but I keep getting a 500 error. I'd advise the
owner to switch off HostGator soon; that web host has gone significantly
downhill.

~~~
jasonb05
Author here. Time for new hosting... a good problem to have I guess.

------
fretless
Here are some things with more substance. (I don't think the blog author is
doing any SEO machinations, but there's just not much to learn from his last
few posts. Essentially he's written data science advice similar to many other
authors', but substituted the words "Machine Learning". I could dig up other
ML gotchas/guidelines posts; I'd need to dig through bookmarks.)

[http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf](http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

[http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets04...](http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html)
(read Challenges in each dataset)

[http://alpinenow.com/blog/machine-learning-is-not-black-
box-...](http://alpinenow.com/blog/machine-learning-is-not-black-box-magic/)

[http://research.microsoft.com/en-
us/um/people/minka/papers/n...](http://research.microsoft.com/en-
us/um/people/minka/papers/nuances.html)

------
lowglow
Would anyone in SF be interested in a talk about Machine Learning for Hackers,
or Machine Learning 101?

------
Nicholas_C
> 1\. Put Machine Learning on a pedestal

Someone gave me this advice almost verbatim on HN 5 months ago:
[https://news.ycombinator.com/item?id=6335092](https://news.ycombinator.com/item?id=6335092)

~~~
agibsonccc
Appreciate being quoted >:)

That aside, you really have to take it in bits. Ignoring the math or
fundamentals behind it is by far the worst mistake you can make.

Once you get decent at understanding it, the points I emphasized (feature
vector building) become a lot less of a problem with deep learning
([http://deeplearning.net/](http://deeplearning.net/)).

Auto-learned feature vectors are going to be among the best ways to do things
in the coming years. More than happy to answer questions.

~~~
metrix
I got into machine learning through an article off of HN stating that random
forests would get you 80% of the way (I think they were right!). For my
purposes, rotation forests increased my accuracy considerably. I have a few
questions:

1\. I have found that data manipulation and feature creation from a SQL
database are harder than actually using an algorithm, and knowing how to
extract and aggregate data seemed to be more like "throw something at the wall
and see what sticks". Do you have any suggestions or information on how to
extract the best data?

2\. After getting a random forest going, I had a hard time figuring out which
algorithm to try next, or how to figure out what would work best for my
dataset. Any suggestions on how to take the next step?

~~~
agibsonccc
1\. Use what correlates best with the outcomes. Look into feature selection
and principal component analysis for this. Smaller feature vectors mean less
noise, and the outcomes are more digestible. I would also highly recommend
visualization. Weka is great if you want plug and play; otherwise there's the
more traditional R/Matlab. It really depends on what you're comfortable with.
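
As a rough illustration of "use what correlates" (a pure-Python sketch I'm making up for the example, not library code - real feature selection would go through Weka or sklearn):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(rows, outcomes):
    """Rank feature columns by |correlation| with the outcome."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        col = [r[j] for r in rows]
        scores.append((abs(pearson(col, outcomes)), j))
    return sorted(scores, reverse=True)

# Column 0 tracks the outcome exactly; column 1 is noise.
rows = [[1, 9], [2, 1], [3, 7], [4, 2], [5, 8]]
y = [1, 2, 3, 4, 5]
ranked = rank_features(rows, y)
```

Keeping only the top-ranked columns is the crudest form of feature selection; PCA goes further by building new, decorrelated columns out of the old ones.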

2\. It depends on what kind of learning you're doing. I would look into
multinomial logistic regression (more than one class) for most supervised
classification applications. Then there's also k-means if you're looking to
understand trends in your data. Keep in mind this is my off-the-shelf/simple
recommendation.
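
K-means itself is simple enough to sketch in pure Python (illustrative only; use a real library in practice, since this version doesn't even check for convergence):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the centroid with the smallest squared distance to p
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(vals) / len(cl) for vals in zip(*cl))
    return centroids, clusters

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, clusters = kmeans(points, k=2)
```

For trend-finding, you'd look at the centroids themselves: each one is a prototype of "a kind of row" in your data.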

I would love input on a plug-and-play machine learning CLI. I planned on
building out my current project into a full-blown command line app. Since it
can handle most features, including automatic visualization/debugging via
matplotlib, I figure that with some documentation it might be a neat tool for
people who don't want to deal with feature selection but still want things
simple. It's definitely a problem that there's really no clear way to build
simple models. Domain knowledge is also an expensive problem.

~~~
metrix
Do you have it on a website or github? I would be interested in taking a look
at it.

~~~
agibsonccc
[https://github.com/agibsonccc/java-
deeplearning/](https://github.com/agibsonccc/java-deeplearning/)

Keep in mind documentation is one of the things I need to work on the most
now. I have it built and ready to go for the most part.

------
arasmussen
Mistakes programmers make when putting their blog on HN: not anticipating the
traffic and sending 500s our way.

------
apexkid
Website went down

~~~
cynwoody
Google cache here:

[http://webcache.googleusercontent.com/search?q=cache:Oq8_jkw...](http://webcache.googleusercontent.com/search?q=cache:Oq8_jkwVGAAJ:machinelearningmastery.com/mistakes-
programmers-make-when-starting-in-machine-learning/+&cd=1&hl=en&ct=clnk&gl=us)

------
stcredzero
Also: publishing a blog that doesn't let you zoom.

------
urbanachiever
I don't think "reinventing solutions to common problems" is a bad thing. This
is how we all learn how to do something new. And sometimes the new solution is
better than any of the other solutions out there.

~~~
pfarrell
I agree that reinventing solutions to problems is definitely in the domain of
the hacker.

When learning a new skill, however, you should understand what the common
approach is before re-discovering the work of others. The article is about how
to be more efficient in learning machine learning, not how to be a hacker.

It's akin to why (imho) better musicians learn to play other people's styles
before developing their own.

