I thought the lack of references was a major failing of the book. The treatment of some of the algorithms barely scratches the surface, and it does the reader a disservice to provide nowhere to go next. Here are a few to get started: 1. Russell and Norvig's AI text, 2. Elements of Statistical Learning by Hastie et al., 3. Pattern Recognition and Machine Learning by Chris Bishop.
On the other hand, going right into code examples is useful, as is jumping straight into downloading real data and working on it.
I agree about #2. I have Russell and Norvig and agree that it is an excellent book, but I am not sure how much overlap there truly is here.
I also do not have PRML, but Neural Networks for Pattern Recognition by Bishop is excellent (and includes many non-NN-related items).
ESL is excellent and, to me, the best modern text in machine learning. It covers many of the topics in PCI, both at a reasonable level and in much more depth, and provides MANY references (hundreds?) to dig deeper.