That might just be the reality of it, but I'm hoping there might be a better introduction (even something super simple like a Codecademy equivalent).
I took a course on Linear Algebra (Bretscher's book up to chap 9) and a Probability course (Ross' book up to chap 6) and did a great many problems by hand on paper. I just finished an ML course (Bishop's book, and Jordan's book), mixed grad+undergrad, which was 80% problems w/ pen and paper and 20% coding up something algorithmically trivial but mathematically challenging. I don't think I would've been able to pick up the additional math along the way without these two great books and their many exercise problems behind me.
I read layman's explanations of ML concepts a year ago, and got nowhere in terms of my own implementation+debugging/improvement upon established techniques. Now I can solve problems I saw a year ago and thought "no one can do this."
My advice is to take the gateway drugs first: Probability (Ross <- I love this book!) and Lin Alg (I like Bretscher much better than Strang, but not everyone agrees with me :) ). Take a course in real life (for a grade and a transcript) at a competitive university if possible; nothing makes you study thoroughly like a gun to your head.
A First Course in Probability:
Linear Algebra with Applications:
Note that the Bretscher book has really terrible reviews, but I can't evaluate if the reviews are correct or not.
Linear Algebra with Applications: http://www.ebay.com/itm/Linear-Algebra-with-Applications-5ed...
International editions of textbooks are immensely cheaper.
Many of these things are described in the comments. I almost exclusively buy international textbooks for home reference if available. The price difference and the relatively small quality difference make it a no-brainer. If you are doing it for a class though, find a friend with the overpriced version for homework.
For machine learning, a good place to start is Andrew Ng's course on Coursera:
It's pretty light on math, while at the same time giving you experience in implementing and understanding these techniques.
From there, I might recommend Learning from Data and the associated video lectures:
It is a bit of a jump, but it is a great course in presenting the field of machine learning and explaining the mathematical and statistical underpinnings in a systematic way.
It is a very self-contained course that is quite easy to follow. You can skip the programming exercises if you don't have the time.
It's not easy to learn, especially if you are not strong in math, but if you want an intuitive understanding as to how machine learning works, I would recommend learning a combination of probability, linear algebra, and formal theories of computation (abstract machines):
This book was really what opened the gate of tying the content together:
You should also be strong in discrete mathematics.
I'm setting up a blog on Artificial Intelligence (which inherently includes Machine Learning) focused on the contents of the AIMA book (by Stuart Russell and Peter Norvig):
How would you like it to be? Code will be developed in Matlab, which makes it fairly convenient to go into the math in depth.
The age mainly affects its usefulness as an off-the-shelf guide to applied ML, because some of the currently best performing general-purpose algorithms aren't mentioned. It also spends quite a bit of time on algorithms now considered mainly of historical interest, like version spaces.
So imo its main current usefulness is as a foundational text, which it's quite good for. It helps that it's also well written and understandable.
A recent empirical analysis found that random forests and support vector machines seem to perform most consistently well at classification tasks, and neither is in this book. http://jmlr.org/papers/v15/delgado14a.html
It's actually not just a video, but a specially designed player that shows the lecture video and the slides in sync, along with a bunch of bookmarks so you can jump to specific slides.
Try it in Chrome. At least on the Mac, there's a dedicated but cut-down version, which does not require Silverlight. It's based on FlowPlayer and it also seems to work with Flash disabled.
Lecture 6 seems to be offline, unfortunately.
The video is displayed using Flash Flowplayer, so you can extract the mp4 from the page source.
For example, the first one is http://scs.hosted.panopto.com/Panopto/Podcast/Embed/257476ca...
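If you'd rather not dig through the page source by hand, something like the following might work. It's only a rough sketch: it assumes the mp4 URLs appear literally in the page source, and the helper name `find_mp4_urls` and the example.com URLs are made up for illustration, not taken from the actual player page.

```python
import re

# Assumption: the video URLs appear as plain http(s) links ending in .mp4,
# optionally followed by a query string, somewhere in the page source.
MP4_URL = re.compile(r"""https?://[^\s"'<>]+?\.mp4(?:\?[^\s"'<>]*)?""", re.IGNORECASE)


def find_mp4_urls(page_source):
    """Return the unique .mp4 URLs found in page_source, in order of first appearance."""
    urls = []
    for url in MP4_URL.findall(page_source):
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Made-up page fragment; in practice you would paste in the saved page source.
    sample = """
    <script>flowplayer("player", {clip: {url: "http://example.com/media/lecture01.mp4"}});</script>
    <a href="http://example.com/media/lecture01.mp4">download</a>
    <video src='https://example.com/media/lecture02.mp4?start=0'></video>
    """
    for url in find_mp4_urls(sample):
        print(url)
```

Save the page source to a file (or copy it from "view source"), pass the text to the function, and feed the resulting URLs to whatever downloader you like. If the player builds the URL in JavaScript instead of embedding it literally, this won't find anything.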