My math starts getting pretty shaky around Calculus; vector calculus is beyond me.
I did about half of Andrew Ng's machine learning class on Coursera. Machine learning is conceptually much simpler than one would guess, both gradient descent and the shallow neural-network variety, and in fact it is pretty simple to get basic things working.
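For what it's worth, the core idea really is that small. Here's a toy sketch (my own illustration, not from the course) of gradient descent minimizing f(w) = (w - 3)^2:

    # Toy gradient descent on f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
    # Repeated small steps against the gradient converge to the minimum at w = 3.
    w = 0.0
    learning_rate = 0.1
    for _ in range(100):
        grad = 2 * (w - 3)
        w -= learning_rate * grad
    print(w)  # approximately 3.0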
I agree with the author that the notation and the like can be quite intimidating compared to what is "really going on".
However, applied machine learning is still friggin' hard, at least to me, and I consider myself a pretty decent programmer. Naive solutions are just unusable in almost any real application; the author's use of loops and maps is great for teaching machine learning, but everything needs to be transformed into higher-level vector/matrix problems in order to be genuinely useful, as in the sketch below.
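To make the loops-to-matrices point concrete, here's a hedged sketch (names and data are my own invention) of computing linear-model predictions two ways in Python:

    import numpy as np

    X = np.random.rand(10000, 100)  # 10k examples, 100 features
    w = np.random.rand(100)         # one weight per feature

    # Teaching version: explicit loops, easy to follow, far too slow at scale
    def predict_loops(X, w):
        preds = []
        for row in X:
            total = 0.0
            for x_j, w_j in zip(row, w):
                total += x_j * w_j
            preds.append(total)
        return preds

    # Vectorized version: the same dot products as one matrix-vector product
    def predict_vectorized(X, w):
        return X @ w

The second version pushes the work down into optimized linear-algebra routines, which is exactly the transformation I mean.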
That isn't unattainable by any means, but the fact remains (imho) that without a strong base in vector calculus and the idiosyncratic techniques for transforming these problems into more efficient computations, usable machine learning is far from "easy".
That being said, there are plenty of barriers to a software engineer picking up some papers on machine learning and getting started. These are the result of some serious issues that I believe exist in academia at present, specifically the "publication game". I won't get into that here, as this comment would quickly turn into the longest paper I've ever written. It is a problem with the system, and as the reader it is not your fault.
Your issue with applied machine learning may be largely imagined. Take, for example, the k-means clustering algorithm in the article. It is quite useful in a number of places, be it data analysis or intelligent user interfaces; it's straightforward to implement, and many usable implementations exist online. The same is true for other models such as linear regression and naive Bayes; even things like support vector machines are pretty easy to use these days.
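For instance, with scikit-learn (one of those usable implementations) the whole exercise is a few lines; the blob data here is invented purely for illustration:

    import numpy as np
    from sklearn.cluster import KMeans

    # Synthetic data: three Gaussian blobs in 2-D
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(center, 0.5, size=(100, 2)) for center in (0, 5, 10)])

    km = KMeans(n_clusters=3, n_init=10).fit(X)
    print(km.cluster_centers_)  # one centroid per blob
    print(km.labels_[:10])      # cluster assignment per point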
In any case, as selfish as this is, the sentiment you're expressing provides some comfort that I'll be in demand when I enter the job market. :P Though if you really want to implement and apply some simple yet useful algorithms, the author recommends Programming Collective Intelligence at the end of the article, and I can definitely second that recommendation.
(I'll warn you that a lot of the book is very dated now.)
First off thanks for writing the book. It was the first book recommended to me for ML while I was doing my undergrad and it got me using ML as opposed to studying ML. The book opened my eyes to the fact that I really could do these things, and it definitely played a role in influencing me to pursue my MSc in AI. So again, thank you. :)
Secondly, I'll agree that the book is definitely dated at this point, though it provides a survey of quite a broad selection of algorithms. It certainly isn't going to put the reader on the cutting edge, or anywhere near it, but it will get them started, which appears to be the hardest part for some of those participating in this comment thread.
Have you considered a second edition?
When we're talking about applied machine learning, it is generally in the context of developing a product or a feature. The most important part is going from data to feature, not necessarily squeezing out the last few percentage points of performance. In this context, getting a good answer easily is more important than getting the best answer.
Anyway, my math is not that great and I'm way out of practice when it comes to theoretical work. Not only that, I suck at algorithm design and I'm at best a moderately decent developer from a theory standpoint.
But applied machine learning came naturally, because I have a strong grasp of frequentist and Bayesian statistics, and because I solve these problems backwards, starting from the need rather than the technique. If you try to apply ML or any other "exotic" solution by starting with "oh man, here is this awesome technique," you will definitely be frustrated; I know that feeling all too well.
ML is just another tool in the shed. With practice, you start to understand where simple linear regression is useful and where more heavy-duty methods are applicable. Keep solving real-world problems rather than looking for stuff to hit with your new hammer (ML in this case) and I bet you'll come around a lot quicker.
In my opinion, though, using ML and getting some decent results is not that hard. Using it and getting close to the state of the art (i.e., improving your recognition rate from 65% to 80%) is a completely different matter; it is hard, and it requires a full understanding of the details of the algorithm in use.
Still, that doesn't mean you can't do interesting things with an overall understanding of machine learning, but making a good cat detector is hard.
My take is that 'applying' machine learning is much easier than one would think, but 'understanding' machine learning is _much_ harder than one would think.
I imagine most people start with an application - that's why they're interested in the first place. Applying an ML algorithm to a given dataset is trivial - only a few lines in R or Python with scikit-learn, etc. There's even a handy cheat-sheet to tell you which algorithm to use. Your regression model spits out a few parameters, some estimates of quality of fit, maybe some pretty graphs, and you can call it a day.
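That whole first stage can look something like this sketch (synthetic data with made-up coefficients, just to show the shape of the workflow):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Synthetic regression data with known coefficients, for illustration only
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

    model = LinearRegression().fit(X, y)
    print(model.coef_, model.intercept_)  # the parameters the black box spits out
    print(model.score(X, y))              # R^2, a quick quality-of-fit estimate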
The next stage is implementation - up until now, the black box has been spitting out parameters, and you want to understand what these numbers actually are and how they are determined. This means getting your hands dirty with the nitty-gritty of ML algorithms - understanding their statistical basis, understanding their computational performance (as you alluded to), and reading textbooks and journal articles. Many of these algorithms have a simple 'kernel' of an idea that can be obscured by a lot of mathematical sophistication - for SVMs, finding a plane that separates your data as widely as possible; for logistic regression, finding the line of best fit in logit space; for k-means, repeatedly assigning points to clusters and then updating the clusters; etc.
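That k-means kernel, for instance, fits in a dozen lines of NumPy - a minimal sketch of my own that ignores empty clusters and convergence checks:

    import numpy as np

    def kmeans(X, k, n_iters=20, seed=0):
        rng = np.random.default_rng(seed)
        # Initialize centroids at k randomly chosen data points
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iters):
            # Assignment step: each point goes to its nearest centroid
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
            labels = dists.argmin(axis=1)
            # Update step: each centroid moves to the mean of its points
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = X[labels == j].mean(axis=0)
        return centroids, labels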
It's not _that_ difficult to get working implementations of these algorithms (although debugging an incorrect implementation can be frustrating). You can implement simple versions of gradient boosting machines - a theoretically sophisticated technique that often works very well in practice - in around 100 lines of Python or Go. You can implement Hopfield networks (a basic neural network for unsupervised learning) in Julia and Haskell without much trouble either.
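To give a flavour of how small such an implementation can be, here's a hedged sketch of gradient boosting for squared-error regression, borrowing scikit-learn's trees as the weak learners (my own toy version, not the implementations linked above):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gbm_fit(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
        # Start from the mean prediction; each round fits a small tree to the
        # residuals (the negative gradient of squared loss) and adds a shrunken copy.
        base = y.mean()
        pred = np.full(len(y), base)
        trees = []
        for _ in range(n_rounds):
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, y - pred)
            pred += learning_rate * tree.predict(X)
            trees.append(tree)
        return base, trees

    def gbm_predict(X, base, trees, learning_rate=0.1):
        pred = np.full(X.shape[0], base)
        for tree in trees:
            pred += learning_rate * tree.predict(X)
        return pred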
I think a lot of the value from spending 3-5+ years in graduate school comes in when an off-the-shelf technique isn't cutting it - maybe you're overfitting, you want to play with hyperparameters, your datasets are scaling faster than the off-the-shelf implementations can deal with, etc. Solving these kinds of problems is where a lot of the mathematical/statistical sophistication comes into play - Bayesian hyperparameter optimization of existing ML techniques has improved on the state of the art in image processing, and understanding _why_ the LASSO works (and when it doesn't) can mean having to understand the distribution of eigenvalues of random Gaussian matrices. That's saying nothing of figuring out how to take an existing algorithm and scale it out in parallel across 1000+ machines, which can mean using some deep convex optimization theory to justify your approximations and prove that the approximated model you learn is well-behaved, etc.
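Even short of Bayesian optimization, the everyday version of that hyperparameter fiddling is a cross-validated grid search - a minimal sketch on synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Synthetic classification data, purely for illustration
    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    # Exhaustive search over a small hyperparameter grid, scored by 5-fold CV
    grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)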
I suspect that some of the gulf between ML in practice and in theory arises from the differing goals of the two parties. If you just want a parameter estimate, then a lot of the rigour will seem useless, and if you want to understand a procedure deeply, you can't do away with rigour. Take the k-means paper used in the article. In the screenshot, the paper's author is presenting results of the flavour:
> Under weak conditions on the data, as the number of iterations increases, regardless of the random choices made, the k-means clustering tends towards a good clustering of the data.
Depending on your viewpoint, this result is either obvious, non-obvious, or irrelevant. Proving it requires (at a glance) theorems from topology, measure theory, martingale convergence, etc.
I suppose some people find that fascinating and want to understand _why_ this is true, and what 'good', 'tends', and 'weak' _mean_ in the above statement - and some people just find this irrelevant to their goal of finding similar blog articles. Both viewpoints are reasonable, but I suspect this divergence is the cause of articles like these.
Applying machine learning isn't that hard, but _understanding_ it is.
Such things do exist. Unfortunately, they are called "textbooks".
It sort of is the members-only cigar club of academia. While I can read, understand, and subsequently implement most things in computer-science land, when it comes to mathematical notation, there is a lot left to understand.
I think a LOT of people would benefit from a five or six video course simply showing a translation of complex notations to a working algorithm in a popular language, so the commonly used symbols start having meaning.
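Even one worked translation helps. For example, the logistic-regression prediction ŷ = σ(wᵀx + b) reads almost symbol for symbol as code (my own toy rendering):

    import math

    # w'x is a dot product (a sum over paired terms), and sigma is the
    # sigmoid 1 / (1 + e^(-z)); the Greek just names these two steps.
    def predict(w, x, b):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        return 1 / (1 + math.exp(-z))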
Maybe, but not always. Remember that Richard Feynman took great issue with how integration was taught in most math classes and devised his own approach (inspired by the Calculus for the Practical Man texts).
Then you'd still fail to account for symbols invented/repurposed in a single paper or by a single author.
Basically, the tool itself is a machine learning problem.
But... the Venn diagram of people able to do such a thing, people motivated to do such a thing (i.e. people reading academic papers who aren't math whizzes), and people with the time to do such a task... seems not to have a sufficient intersection. Most of us in this thread fulfill categories 1 and 2, but not 3... come to think of it, I have work to do... ducks.
Mathematical equations don't have any meaning taken by themselves; they just express a certain relationship between quantities without stating anything specific about what those quantities are.
Any translation of an equation to English would be a certain (usually lossy) interpretation of the equation for some particular quantities. But the same equation can have multiple interpretations in wildly different contexts.
A striking example is quantum mechanics. Physicists in the first half of the 20th century figured out a mathematical model (e.g. the Schrödinger equation) that worked perfectly well for predicting the behaviour of fundamental particles. Interpreting what the equation actually means (let's say in English) proved to be hard, though, and spawned multiple, sometimes conflicting or paradoxical interpretations. In fact, to this very day physicists are split on their favorite interpretation of QM. On the positive side, the mathematics of QM just works no matter how the results of the calculation are interpreted.
(yes, I know that's for IE. There are other systems though).