Hacker News

I feel I'm in a somewhat unique position to talk about the ease or difficulty of machine learning: I've been working for several months on a project with a machine learning aspect alongside a well-cited, respected scientist in the field, but I effectively "can't do" machine learning myself. I'm a primarily self-taught hacker; I started programming by writing 'proggies' for AOL in middle school, around 1996.

My math starts getting pretty shaky around Calculus; vector calculus is beyond me.

I did about half of Andrew Ng's machine learning class on Coursera. Machine learning is conceptually much simpler than one would guess, both the gradient-descent and the shallow-neural-network variety, and in fact it is pretty simple to get basic things working.
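As a small illustration of that conceptual simplicity, here is a sketch of plain batch gradient descent for a one-variable linear fit. This is a toy example of my own, not taken from the course; the learning rate and step count are arbitrary choices:

```python
# Minimal batch gradient descent for y ≈ w*x + b (one feature).
# Toy sketch: hyperparameters below are arbitrary, not tuned.
def gradient_descent(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        db = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= lr * dw
        b -= lr * db
    return w, b

# Data generated from the line y = 2x + 1, so w and b should land near 2 and 1.
w, b = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(w, b)
```

The whole algorithm is a loop that nudges two numbers downhill; the intimidating part is the notation, not the idea.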

I agree with the author that the notation, etc, can be quite intimidating vs what is "really going on".

However, applied machine learning is still friggin' hard, at least to me, and I consider myself a pretty decent programmer. Naive solutions are unusable in almost any real application; the author's use of loops and maps is great for teaching machine learning, but everything needs to be transformed into higher-level vector/matrix operations in order to be genuinely useful.

That isn't unattainable by any means, but the fact remains (IMHO) that without a strong base in vector calculus and the idiosyncratic techniques for transforming these problems into more efficient computations, usable machine learning is far from "easy".

This response, and the replies to it, actually come as a bit of a surprise to me. I am a graduate student in artificial intelligence with a comparatively weak background in mathematics, and I am still very capable of applying, implementing, and improving upon machine learning techniques. I have never taken a calculus or linear algebra class. (Don't ask how I managed that; it's an effect of the odd route I took to get where I am.) The point is, you shouldn't be telling yourself you can't do it because you don't know the math; you may surprise yourself.

That being said, there are plenty of barriers to a software engineer picking up some papers on machine learning and getting started. These are the result of some serious issues that I believe exist with academia at present, specifically the "publication game". I won't get into that here, as this comment would quickly turn into the longest paper I've ever written. It is a problem with the system, and as the reader it is not your fault.

Your issue with applied machine learning may be largely imagined. Take, for example, the k-means clustering algorithm in the article. It is quite useful in a number of places, be it data analysis or intelligent user interfaces; it's straightforward to implement, and many usable implementations exist online. The same is true for other models such as linear regression and naive Bayes; even things like support vector machines are pretty easy to use these days.
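For instance, using one of those off-the-shelf implementations, k-means via scikit-learn takes only a few lines (the data below is made up for illustration; the library choice is my own):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two obvious blobs (made-up numbers for illustration).
points = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
                   [8.0, 8.2], [7.9, 8.1], [8.3, 7.9]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
# labels_ assigns each point to a cluster; cluster_centers_ gives the means.
print(km.labels_)
print(km.cluster_centers_)
```

The two blobs come out in separate clusters without any parameter tuning, which is the whole point: for simple jobs the off-the-shelf tool just works.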

In any case, as selfish as this is, the sentiment you're expressing provides some comfort that I'll be in demand when I enter the job market. :P Though if you really want to implement and apply some simple yet useful algorithms, he recommends Programming Collective Intelligence at the end of the article, and I can definitely second that recommendation.

Just wanted to say thanks for the recommendation! I actually started writing the book because I felt that a lot of the existing textbooks were unnecessarily obtuse. The details are complicated but you can still get some interesting results and have some fun with a high-level understanding.

(I'll warn you that a lot of the book is very dated now.)

Hey Toby,

First off thanks for writing the book. It was the first book recommended to me for ML while I was doing my undergrad and it got me using ML as opposed to studying ML. The book opened my eyes to the fact that I really could do these things, and it definitely played a role in influencing me to pursue my MSc in AI. So again, thank you. :)

Secondly, I'll agree that the book is most definitely dated at this point, though it provides a survey of quite a broad selection of algorithms. It certainly isn't going to put the reader on the cutting edge or anywhere near it, but it will get them started, which appears to be the hardest part for some of those participating in this comment thread.

Have you considered a second edition?

I think you're glossing over the fact that for a non-trivial problem, you have to pick the right machine learning algorithm from a large set of possibilities, and typically you have to set a number of parameters to make it work. On top of that, you have to follow proper methodology to get meaningful results; e.g., to determine whether the accuracy you're getting generalizes to other data as well. I think all of this contributes to making applied machine learning a non-trivial skill. And indeed, AI is just the right background for that, but the point is that you need such a background.

That's true, though as my optimization friends put it, sometimes a close answer that is easy to arrive at is better than a precise answer that is difficult to arrive at. Selecting the right algorithm is a bit of an art, though some simple rules make it pretty easy: say, clustering for groups, linear regression for estimating continuous values, and naive Bayes or logistic regression for classification. Even without parameter tuning you can go from having a bunch of data to having a model that does something quite easily.
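A hedged sketch of the classification rule in action, with scikit-learn and made-up data (the library and the numbers are my own, purely for illustration):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Made-up single-feature data: class 0 clusters near 1, class 1 near 9.
X = [[0.5], [1.0], [1.5], [8.0], [9.0], [9.5]]
y = [0, 0, 0, 1, 1, 1]

# Fit both suggested classifiers with default parameters — no tuning at all.
predictions = {}
for model in (LogisticRegression(), GaussianNB()):
    model.fit(X, y)
    predictions[type(model).__name__] = list(model.predict([[1.2], [8.7]]))

print(predictions)
```

With defaults and well-separated data, both models classify the two new points the same way, which is exactly the "good answer easily" trade-off described above.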

When we're talking about applied machine learning, it is generally in the context of developing a product or a feature. The most important part is going from data to feature, not necessarily squeezing out the last few percentage points of performance. In this context, getting a good answer easily is more important than getting the best answer.


The combinatorial number of possibilities quickly grows beyond what can be tested automatically, so you have to start from good intuitions. That's actually what machine learning is _not_ great at; it finds patterns in data (maybe even previously unknown ones), but arguably doesn't come up with creative insights.

Another recommendation for Programming Collective Intelligence here. It really demystified the algorithms behind sites like Amazon for me.

I feel the exact opposite. ML is a huge part of my job (or was), and I topped out at Discrete Math 2 or so, finishing up a few courses of the usual Calculus but never anything as esoteric as Real Analysis or the like. (I think I finished up some Linear Algebra.)

Anyway, my math is not that great and I'm way out of practice when it comes to theoretical work. Not only that, I suck at algorithm design and I'm at best a moderately decent developer from a theory standpoint.

But applied machine learning came naturally, because I have a strong grasp of frequentist and Bayesian statistics, and because I work problems backwards, from the problem to the technique. If you try to apply ML or any other "exotic" solution by starting with "oh man, here is this awesome technique," you will definitely be frustrated; I know that feeling all too well.

ML is just another tool in the shed. With practice, you start to understand where simple linear regression techniques are useful and when more heavy-duty methods are applicable. Keep solving real world problems rather than trying to look for stuff to hit with your new hammer (ML in this case) and I bet you'll come around a lot quicker.

I was going to say something along those lines, but you saved me the effort.

Although, in my opinion, using ML and getting some decent results is not that hard. Using it and getting close to the state of the art (i.e., improving your recognition rate from 65% to 80%) is a completely different matter; that is hard, and it requires a full understanding of the details of the algorithm in use.

Still, that doesn't mean you can't do interesting things with an overall understanding of machine learning, but making a good cat detector is hard [0] [1].

[0] http://research.google.com/archive/unsupervised_icml2012.htm...

[1] http://techtalks.tv/talks/machine-learning-and-ai-via-brain-...

For a 65% to 80% jump, I'd say it depends on the subfield. With NLP (which is basically applied machine learning), it's not uncommon to see algorithms with >80% accuracy.

My take on this is somewhat different - my background is pure mathematics in my undergraduate degree, then applied machine learning at Facebook (ads optimization) and GS (algo trading), and currently at graduate school in ML.

My take is that 'applying' machine learning is much easier than one would think, but 'understanding' machine learning is _much_ harder than one would think.

I imagine most people start with an application - that's why they're interested in the first place. Applying an ML algorithm to a given dataset is trivial - only a few lines in R or Python with scikit-learn, etc. There's even a handy cheat-sheet [1] to tell you which algorithm to use. Your regression model spits out a few parameters, some estimates of quality of fit, maybe some pretty graphs, and you can call it a day.

The next stage is implementation - up until now, the black box has been spitting out parameters, and you want to understand what these numbers actually are and how they are determined. This means getting your hands dirty with the nitty-gritty of ML algorithms - understanding an algorithm's statistical basis, understanding its computational performance (as you alluded to), and reading textbooks and journal articles. Many of these algorithms have a simple 'kernel' of an idea that can be obscured by a lot of mathematical sophistication - for SVMs, it's finding a plane that separates your data as well as possible; logistic regression is just finding the line of best fit in logit space; k-means is just repeatedly assigning points to clusters and then updating the clusters; etc.
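That k-means 'kernel' really is just the assign-then-update loop. A minimal sketch in plain Python on made-up 1-D data (my own toy code, not from any of the linked implementations):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Repeatedly assign points to their nearest centre, then update centres."""
    random.seed(seed)
    centres = random.sample(points, k)  # initialise from the data itself
    for _ in range(iters):
        # Assignment step: each point joins its closest centre's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centres[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster
        # (kept in place if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

# Two obvious 1-D blobs around 1 and 9; the centres should find their means.
centres = kmeans([1.0, 1.2, 0.8, 9.0, 9.2, 8.8], k=2)
print(centres)
```

Everything beyond those two steps - smarter initialisation, vectorisation, convergence guarantees - is the mathematical sophistication layered on top.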

It's not _that_ difficult to get working implementations of these algorithms (although debugging an incorrect implementation can be frustrating). You can implement simple versions of gradient boosting machines - a theoretically sophisticated technique that often works very well in practice - in around 100 lines of Python [2] or Go [3]. You can implement Hopfield networks (a basic neural network for unsupervised learning) in Julia [4] and Haskell [5] without much trouble either.

I think a lot of the value from spending 3-5+ years in graduate school comes in when an off-the-shelf technique isn't cutting it - maybe you're overfitting, you want to play with hyperparameters, or your datasets are scaling faster than the off-the-shelf implementations can deal with. Solving these kinds of problems is where a lot of the mathematical/statistical sophistication comes into play - Bayesian hyperparameter optimization of existing ML techniques improved on the state of the art in image processing [6], and understanding _why_ the LASSO works (and when it doesn't) can mean having to understand the distribution of eigenvalues of random Gaussian matrices [7]. That's saying nothing of figuring out how to take an existing algorithm and scale it out in parallel across 1000+ machines [8], which can mean using some deep convex optimization theory [9] to justify your approximations and prove that the approximated model you learn is well-behaved.
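To make the hyperparameter point concrete: even the simplest non-Bayesian approach, brute-force grid search with cross-validation, is a few lines in scikit-learn (synthetic data and an arbitrary grid of my own choosing, purely for illustration; methods like [6] replace this exhaustive search with something smarter):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 2-feature data with a simple linear decision boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Exhaustively try every (C, gamma) pair, scoring each by 3-fold CV.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=3)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

The grid here has only 9 cells; the trouble starts when the search space is large or each fit is expensive, which is exactly where the graduate-level machinery earns its keep.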

I suspect that some of the gulf between ML in practice and in theory arises from the differing goals of the two parties. If you just want a parameter estimate, then a lot of the rigour will seem useless, and if you want to understand a procedure deeply, you can't do away with rigour. Take the k-means paper used in the article. In the screenshot, the paper's author is presenting results of the flavour

> Under weak conditions on the data, as the number of iterations increases, regardless of the random choices made, the k-means clustering tends towards a good clustering of the data.

Depending on your viewpoint, this result is either obvious, non-obvious, or irrelevant. To prove this needs (at a glance) theorems from topology, measure theory, martingale convergence, etc.

I suppose some people find that fascinating, and want to understand _why_ this is true, and what 'good', 'tends', and 'weak' _mean_ in the above statement - and some people just find this irrelevant to their goal of finding similar blog articles. Both viewpoints are reasonable, but I suspect this divergence is the cause of the articles like these.

Applying machine learning isn't that hard, but _understanding_ it is.

[1]: https://lh3.ggpht.com/-ME24ePzpzIM/UQLWTwurfXI/AAAAAAAAANw/W...

[2]: http://www.trungh.com/2013/04/a-short-introduction-to-gradie...

[3]: https://github.com/ajtulloch/decisiontrees

[4]: https://github.com/johnmyleswhite/HopfieldNets.jl

[5]: https://github.com/ajtulloch/hopfield-networks

[6]: http://arxiv.org/pdf/1206.2944.pdf

[7]: http://arxiv.org/pdf/math/0410542v3.pdf

[8]: http://books.nips.cc/papers/files/nips25/NIPS2012_0598.pdf

[9]: http://www.cs.berkeley.edu/~jduchi/projects/DuchiHaSi10.pdf

Totally agree with you. I'm usually on the applications side of things, and you benefit enormously if you can understand the math behind an algorithm (it's also pretty much required for implementing most papers). That being said, actually writing a new paper is a completely different beast. In fact, many people who can write machine learning papers aren't going to be that good at actually implementing a useful solution without lots of engineering resources backing them up. There are hax, then there are math skills. Overlap is rare.

Does there exist a tool that will translate mathematical symbols or an entire equation of them into plain English?

That sounds good, but what I think you're really asking for is an explanation of each of the symbols in their given context, plus an introduction to the mathematical "jargon" and idioms of the equation's branch of mathematics.

Such things do exist. Unfortunately, they are called "textbooks".

This sounds to me like a pointless exercise. There is a reason for using mathematical notation for non-trivial formulas, which is that it is more compact and succinct, allowing it to convey information efficiently and unambiguously. Think of a formula with a few levels of parentheses; you're not going to be able to express that clearly in a paragraph of text. It's not so much the symbols and notation themselves that are hard to grasp, but the mental model of the problem space; once you have that, the formula will usually make sense because you can relate it to this mental model.

I agree to some extent. However, I do think that there is a level of "ramping up" that requires a good bit of teaching and understanding.

It sort of is the members-only cigar club of academics. While I can read, understand, and subsequently implement most things in computer science land, when it comes to mathematical notation, there is a lot left to understand.

I think a LOT of people would benefit from a five or six video course simply showing a translation of complex notations to a working algorithm in a popular language, so the commonly used symbols start having meaning.
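As a tiny example of the kind of translation such a course could show, the common summation, product, and argmin notations map directly onto short loops (the pairing below is my own illustration):

```python
# Σ_{i=1}^{n} x_i — "the sum of x_i for i from 1 to n" — is just a loop:
def sigma(xs):
    total = 0
    for x in xs:
        total += x
    return total

# Π_{i=1}^{n} x_i — the product counterpart, same loop with *= :
def pi(xs):
    total = 1
    for x in xs:
        total *= x
    return total

# argmin_c |x - c| — "the candidate c that minimizes the distance to x":
def argmin(candidates, x):
    return min(candidates, key=lambda c: abs(x - c))

print(sigma([1, 2, 3]), pi([1, 2, 3]), argmin([0, 5, 10], 6))
```

Once Σ reads as "for loop with +=" and argmin as "min by key", a lot of ML pseudocode stops looking like a members-only club.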

>There is a reason for using mathematical notation for non-trivial formulas, which is that it is more compact and succinct, allowing it to convey information efficiently and unambiguously.

Maybe, but not always. Remember that Richard Feynman took great issue with how integration was taught in most math classes and devised his own method (inspired by the Calculus for the Practical Man texts).

You can always try to find an even better notation, but the only point I was making is that in certain cases anything is better than a wall of awkward text.

No. Feynman never took issue with integration notation or how integration is defined or taught. The story you're referring to is how he learned of a technique for computing integrals that was not covered in schools. The technique is called "differentiation under the integral", and is arguably even more involved.

Translating into a nice programming language might work out better, for coders at least.

Not in the context of a length-limited conference paper. An implementation adds details of memory allocation and use of data structures &c. This is a distraction in the exposition of an algorithm. It's useful if authors make a well-commented implementation available, but the actual paper should contain an abstract definition of the algorithm.

No, I was thinking of csmatt's suggestion: automatically translating the math in the paper into something more accessible to programmers.

Many mathematical concepts cannot be translated directly into programming, since computation is by definition discrete and finite. For example, even a simple concept like the irrational numbers cannot be completely captured in code.
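A concrete illustration of that gap, using nothing but standard floating-point arithmetic:

```python
import math

# sqrt(2) is irrational; a float can only hold a finite approximation,
# so squaring it does not get back exactly 2.
r = math.sqrt(2)
print(r * r == 2)  # False
print(r * r)       # 2.0000000000000004
```

The mathematical statement (√2)² = 2 is exact; its direct translation into code is not, because the code manipulates a 64-bit approximation rather than the real number.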

A better approach might be translating them into more uniform s-expressions. Mathematica does this. You can also check out a similar approach in Structure and Interpretation of Classical Mechanics where tensor expressions are reduced to scheme expressions.

Mathematical symbols are overloaded by context; this could be done, but you'd still need to tell the tool whether a formula came from "machine learning" or "topology".

Then you'd still fail to account for symbols invented/repurposed in a single paper or by a single author.

Basically, the tool itself is a machine learning problem.

Even the simple case of taking a PDF, finding the first use of any given symbol in that paper, and turning the PDF into an interactive document where hovering over any use of the symbol shows a tooltip with that first use - now that would be both useful and feasible.

But... the Venn diagram of people able to do such a thing, people motivated to do such a thing (i.e. people reading academic papers who aren't math whizzes), and people with the time to do such a task... seems not to have a sufficient intersection. Most of us in this thread fulfill categories 1 and 2, but not 3... come to think of it, I have work to do... ducks.

Some friends and I are trying to do something similar actually.

IMO this would be hard, if not impossible.

Mathematical equations don't have any meaning taken by themselves; they just express a certain relationship between quantities without stating anything specific about what those quantities are.

Any translation of an equation to English would be a certain (usually lossy) interpretation of the equation for some particular quantities. But the same equation can have multiple interpretations in wildly different contexts.

A striking example is quantum mechanics. Physicists in the first half of the 20th century figured out a mathematical model (e.g., the Schrödinger equation) that worked perfectly well for predicting the behaviour of fundamental particles. Interpreting what the equation actually means (let's say in English) proved to be hard, though, and spawned multiple, sometimes conflicting or paradoxical interpretations. In fact, to this very day physicists are split on their favourite interpretation of QM. On the positive side, the mathematics of QM just works no matter how the result of the calculation is interpreted.

Yes. If the symbols are in MathML, there are accessibility tools which will read them out to you - effectively translating them into plain spoken English.

eg http://www.dessci.com/en/products/mathplayer/tech/accessibil...

(yes, I know that's for IE. There are other systems though).

What you're suggesting is converting a universal, concise, formal language into the potentially ambiguous and/or redundant "plain English". What's the point of that? One has to learn some math before being able to read math.

Collection of equations to C would be a good start too.

Sounds like a problem for machine learning :)

I agree that the notation is tough. Especially for folks who need to learn to do this stuff as a one-off and not as a career. That's why I wrote a book that teaches the algorithms with nearly no math notation while trying to be very rigorous: http://www.amazon.com/Data-Smart-Science-Transform-Informati...

Good point. However, the author does recommend using open source ML libraries, so applying the techniques may not require some of the implementation knowledge you suggest. It's likely more relevant to understand the semantic constraints and pitfalls (eg, am I over-fitting right now?) since the work of a small number of talented implementers can be easily reused.
