
Machine Learning for Developers - haifeng
http://xyclade.github.io/MachineLearning/
======
zxcvvcxz
Is anyone else at least a bit worried about a bunch of developers running
around doing "machine learning" without much understanding of mathematics and
probability? E.g. consider the creation of fragile models that overfit data
being used in finance, infrastructure, medicine, etc.
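The overfitting worry can be made concrete with a small sketch. It assumes hypothetical toy data drawn from y = 2x + 1 plus Gaussian noise, and uses a 1-nearest-neighbour "memorizer" as a deliberately extreme stand-in for an overfit model. All helper names (`make_data`, `fit_linear`, `fit_memorizer`, `mse`) are made up for illustration. The memorizer scores zero error on the data it was trained on, yet it typically does worse than a plain straight-line fit on fresh data. The exact numbers depend on the seed.

```python
import random

def make_data(n, rng, noise=1.0):
    """Hypothetical toy data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, noise) for x in xs]
    return xs, ys

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """1-nearest-neighbour 'model': returns the y of the closest training x,
    so it reproduces the training set (noise included) exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)  # held-out data the models never saw

linear = fit_linear(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

# (train MSE, test MSE) for each model
results = {
    "linear": (mse(linear, train_x, train_y), mse(linear, test_x, test_y)),
    "memorizer": (mse(memo, train_x, train_y), mse(memo, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE={train_err:.3f} test MSE={test_err:.3f}")
```

The gap between training and held-out error is the signal to watch. A high-degree polynomial or an unpruned decision tree would show the same pattern, just less starkly than a pure lookup table.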

~~~
aficionado
I'm a little bit worried, at least to the same degree as when I see a bunch of
developers compiling programs without much understanding of what an LL(k)
parser does, or how a pushdown automaton works, or what a Turing machine is. I
feel the same every time I see an elevator without a liftman, use a calculator
instead of computing a square root by hand, or hear about Google's self-driving
cars.

~~~
jorgemf
The difference is that the software or the elevator will work, but the
statistical model is wrong and doesn't work. It is as if the elevator only
lifted people above 120 or below 90, and for everyone else it just didn't work
or took you to the wrong floor.

~~~
chriswarbo
> The difference is that the software... will work

Lots of software _doesn't_ work. Is there a substantial difference between
putting an overfitting model in production, and putting a poorly tested
program in production?

~~~
photoJ
I think there might be. When ML fails, the only person capable of noticing
is someone who understands the math. When code breaks often the "lay" user
notices. The result is obvious to a novice. When ML fails, it looks like a
duck and quacks like a duck, but to someone with years of study it's
immediately recognizable as an antelope. Though to disagree with my own point,
security vulnerabilities have a similar profile. In essence, to all but the
highly trained, the difference is imperceptible.

~~~
forgetsusername
> _"When code breaks often the "lay" user notices. The result is obvious to a
> novice."_

That depends on how it breaks. As a novice coder myself, I've had things go
wrong that I didn't notice or couldn't identify, while my program looked like
it was running fine.

I think that's the parent's point: it might be stupid to put crappy machine
learning models into production, but it isn't worrisome. It's expected.

~~~
photoJ
I hear ya. I knew that assertion was going to draw some criticism, as it's a
judgement call about where we draw the line. Who's a novice and what's
obvious? However, I can't shake my nagging impression that statistical
validity is not inherently clear even to the very best practitioners.
Causality is the goal, and it's notoriously difficult, even for world-class
minds. In my experience, the only similarly elusive specter in software
development is security. Such circumstances seem to me to require great
humility and introspection about one's abilities, but I suppose a little of
that would go a long way in general too!

------
jwr
Please do not disable zooming. It makes life unnecessarily difficult for those
on iOS devices who do not have perfect eyesight.

------
Omnipresent
Is there a decent introduction to Scala with real-world ML? I've come across
various ML primers that go into detail on PCA, linear regression, etc., but
none that show real-world ML usage, e.g. recommendations (if a person listens
to music of type X, they'll also like Y), face detection, etc.

~~~
rjurney
You also don't see things about feature engineering, how to know what to do
next to improve things, etc. ML is an art, and nobody covers this much :(

~~~
Omnipresent
Exactly my thoughts. It's one thing to "know" about the curse of
dimensionality, but one doesn't really get to know these things until they are
applied to solve problems.
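The curse of dimensionality mentioned above can be illustrated with a minimal sketch. This is an illustration chosen here, not something from the thread, and `distance_contrast` is a hypothetical helper. For points drawn uniformly from a unit hypercube, the ratio between a query point's farthest and nearest neighbour shrinks towards 1 as the number of dimensions grows. That is one reason distance-based methods degrade when features pile up. The printed ratios depend on the seed.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the farthest to the nearest Euclidean distance from one random
    query point to n_points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))  # math.dist requires Python 3.8+
    return max(dists) / min(dists)

contrasts = {d: distance_contrast(d) for d in (2, 10, 100, 1000)}
for d, ratio in contrasts.items():
    print(f"dim={d:5d} farthest/nearest = {ratio:.2f}")
```

Using a ratio rather than raw distances keeps the comparison scale-free, since absolute distances grow with the square root of the dimension.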

------
dragonsh
I'd prefer it if the author pointed to the edX course on Data Science and
Machine Learning. Python is slowly becoming the gold standard for machine and
deep learning. Since Python has been very strong in the scientific and
artificial intelligence communities, there is a large corpus of knowledge. And
because it is so easy to go from an experiment to a live web service in
Python, you don't need to fiddle with hundreds of XML configuration files and
infrastructure just to get it working. Also, with Anaconda and Jupyter you can
share your knowledge very easily. Julia is catching up, which is good, but
it's still very far behind Python.

------
jtwebman
I would love to see something like this in Elixir :)

~~~
arca_vorago
That was my first thought as well, but as I understand it, Elixir tends to
falter when it comes to computationally heavy work. Perhaps it could make up
for that with its amazing concurrency and scalability?

~~~
hderms
Erlang has good interop with C and with OS processes, so the best route would
probably be to rely on something written in a faster language for the raw
processing and have Elixir coordinate resources and report/store results.

------
jszymborski
I wonder if there are similar articles on image classification for beginners.

------
vonnik
This is fantastic. For deep learning in Java or Scala to feed feature vectors
into Xyclade, we built [http://deeplearning4j.org](http://deeplearning4j.org)

------
p1esk
Java and Scala? Who uses that in ML? Python has long been the best language
for ML, with some competition from Matlab.

~~~
RyanZAG
Python isn't used for much ML in the field from my experience. It is heavily
used for teaching and learning about ML - but for actual production ML, I've
seen mostly compiled languages. The main reason is that ML is highly
parallelizable and Python isn't terribly good at that. Plus you need to crunch
large datasets and speed becomes important.

So, respectfully, lots of people use languages other than Python for ML, and I
doubt Python even has the largest deployment base for ML.

~~~
nostrademons
I've seen Python (and R) used all the time for exploratory ML. Do all of your
feature extraction, feature selection, parameter tweaking, and backtesting in
Python, and then once you have a model that works reasonably well, port the
feature extraction for _only the features that actually work well_ over to a
compiled language like Java or C++, train your models on lots of data, and do
your actual classification in the compiled language.

Most ML is an iterative process, and the final model that's used in production
is just the tip of the iceberg of the development work that went on. Python
works as well for exploratory programming there as it does for any other
domain.

