This book is great, but if your stats background isn't quite up to snuff, it can be an intimidating first read.

Personally, I studied Duda & Hart's pattern recognition text [1] and Casella & Berger's statistics text [2] simultaneously. This took roughly the equivalent of two semesters. Duda & Hart's text gets the main ideas across without being as heavy on probability theory and statistics.

Afterwards, I studied "Elements ..." by Hastie et al., which was far more readable after going through Casella & Berger's text. Now Hastie et al. is my go-to reference. I should also note that all of this assumes you have the requisite math background: up to calc 3, linear algebra, and maybe some exposure to numerical methods (in particular, optimization).

[1]: https://books.google.com/books?id=Br33IRC3PkQC&lpg=PP1&pg=PR...

[2]: https://books.google.com/books/about/Statistical_Inference.h...

