From my look at Part 1, it has great coverage of the basics, all of which are important. Some of the fundamentals I see left out are rightly left out, since they require experience in real analysis to appreciate and maybe aren't very actionable. There are few proofs, but since the goal is a quick understanding, I can appreciate this too.
It looks to me like a great intro to statistics for CS people, as the author says.
Having studied both statistics and neural networks, I'm not sure if I completely agree with that quote. There are lots of neural network applications that have little to do with statistics (image recognition with convolutional neural networks for example).
I am pretty sure that the author means neural networks for statistical applications though.
There are certainly many areas of neural networks where statistics is important (more theoretical areas), but those don't form the core of the research field.
Also, calling (stochastic) gradient descent a form of statistical inference, while technically correct, is a ridiculous stretching of the term. No researcher considers SGD to be a statistical inference algorithm.
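(For what it's worth, the sense in which it is technically correct: running SGD on a negative log-likelihood is numerically computing a maximum-likelihood estimate. A minimal numpy sketch, using logistic regression on toy data:)

    import numpy as np

    # Toy data: binary labels drawn from a known logistic model.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    true_w = np.array([1.5, -2.0])
    y = (rng.random(1000) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

    # SGD on the negative log-likelihood of logistic regression.
    # Minimizing this loss is exactly maximum-likelihood estimation.
    w = np.zeros(2)
    lr = 0.1
    for epoch in range(20):
        for i in rng.permutation(len(X)):
            p = 1 / (1 + np.exp(-X[i] @ w))
            w -= lr * (p - y[i]) * X[i]  # per-sample gradient of the NLL

    print(w)  # close to true_w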
I have a pretty weak understanding of statistics, and from my perspective it was very common for grad students to also have a weak understanding (of course, those working in statistical learning theory had a strong understanding). This is a pretty ordinary occurrence - it shouldn't be surprising - it's sort of like pointing out that many theoretical statisticians have poor coding skills.
You're kidding, right? The most fundamental reasons that deep convnets work at all are statistical in nature.
Or like programming in C without also knowing assembly and compiler theory? Or flying a plane without having a degree in aerodynamics?
I think we can extract a lot of use from high-level frameworks that abstract away much of the gritty statistics and math. For most applications, all we need are well-behaved, well-tested libraries and some basic intuition about how they work.
Fortunately, in machine learning almost everything is a function from inputs X to outputs y, and we know how functions work from programming, so it's easy to integrate into apps. The devil is in hyperparameter tuning, but we can often get away with good initializations and a bit of web research.
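(Concretely, that's the shape most library APIs take -- a sketch using scikit-learn's fit/predict convention:)

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # The whole workflow is "learn a function from X to y, then call it".
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier().fit(X_train, y_train)  # learn f: X -> y
    print(model.predict(X_test[:5]))    # apply f to new inputs
    print(model.score(X_test, y_test))  # quick accuracy check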
In time people will just use precomputed neural nets for standard tasks, like SyntaxNet (text parsing) and Inception (image classification), or use web APIs to hosted services (less secure for sensitive data). We can take those, maybe fine-tune them to our needs, and get there 100x faster.
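(A sketch of what that reuse looks like, assuming Keras's bundled ImageNet weights for InceptionV3; the file name is just a placeholder:)

    import numpy as np
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.applications.inception_v3 import (
        decode_predictions, preprocess_input)
    from tensorflow.keras.preprocessing import image

    # Load a net someone else already trained on ImageNet.
    model = InceptionV3(weights="imagenet")

    # Classify one image with no training of our own.
    img = image.load_img("cat.jpg", target_size=(299, 299))  # placeholder file
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    print(decode_predictions(model.predict(x), top=3))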
There is also work on automated hyperparameter search; machine learning could become a black box once those methods get good enough.
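(Basic versions already exist in mainstream libraries -- a sketch with scikit-learn's GridSearchCV:)

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Try every hyperparameter combination, scored by cross-validation.
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)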
Look at how p-values are used in science journals for an example of poor stats knowledge affecting real-life outputs.
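(A quick simulation of one classic failure mode -- run enough tests on pure noise and "significant" results appear by chance alone:)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # 1000 experiments where the null hypothesis is true by construction.
    false_positives = 0
    for _ in range(1000):
        a, b = rng.normal(size=30), rng.normal(size=30)  # identical populations
        _, p = stats.ttest_ind(a, b)
        false_positives += p < 0.05

    print(false_positives)  # ~50: about 5% "significant" findings from noise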
There are some things in life that do require you to do the requisite reading. Things based on stats fall into that camp.
I'm going to just assume you're being literal here. It's really brightened up my day to imagine the moment such a person discovered you could close up wounds after surgery.
and Hogg, McKean and Craig:
They are both good books. The first one is more rigorous, but the second one covers more breadth and I think has better exercises.
This post links to the website supporting the book and provides links to errata, code and data. The links on the page to Springer and Amazon are broken; here are valid links:
Here is the Google Books link:
>In comments, it's ok to ask how to read an article and to help other users do so.
Depends on just how broad the article is.
Edit: Please stop the downvotes, just an electrical engineer here, with one basic course in Probability and Stat. :)
Another way of thinking about it (described in Wasserman's book) is that statistics is the inverse problem of probability.
Probability theory asks: given a process, what does its data look like? Statistics asks: given data, what process might have generated it?
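(In code, the two directions look like this -- a toy coin example:)

    import numpy as np

    rng = np.random.default_rng(0)

    # Probability (forward): given the process, generate data.
    p_true = 0.7
    flips = rng.random(1000) < p_true

    # Statistics (inverse): given the data, infer the process.
    p_hat = flips.mean()  # maximum-likelihood estimate of p
    print(p_hat)          # ~0.7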
Excellent summary, thank you.
Making inferences and predictions from data, in the presence of uncertainty.
Analysis of the properties of procedures for doing the above.
If you want examples that avoid the feel of just "curve fitting" (I assume you mean something like "inferring parameters given noisy observations of them") -- maybe look at models involving latent variables. Bayesian statistics has quite a few interesting examples.
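(A concrete one: a two-component Gaussian mixture, where each observation carries a hidden label saying which component generated it. We only observe the values; inference recovers the hidden structure. A sketch with scikit-learn:)

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Generative story: a hidden coin picks a component, then we sample from it.
    z = rng.random(500) < 0.3                                    # latent labels
    x = np.where(z, rng.normal(5, 1, 500), rng.normal(0, 1, 500)).reshape(-1, 1)

    # Fit on x alone; the model infers the latent structure we threw away.
    gmm = GaussianMixture(n_components=2).fit(x)
    print(gmm.means_.ravel(), gmm.weights_)  # ~0 and ~5, weights ~0.7/0.3 (order arbitrary)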
Probability is at the heart of the project: frequencies that summarize recurring data. Instead of storing a recurring pattern multiple times, we just store it once and record how often it has occurred.
Basically, you count things and compare that to how many things you think you should have counted given your assumptions.
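(Which is literally what a goodness-of-fit test does -- compare observed counts with the counts your assumptions predict. A die-rolling sketch with scipy:)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Count things: 600 rolls of a die we hope is fair.
    rolls = rng.integers(1, 7, size=600)
    observed = np.bincount(rolls)[1:]  # counts for faces 1..6
    expected = np.full(6, 100.0)       # what a fair die should give

    # Compare what you counted with what you think you should have counted.
    print(stats.chisquare(observed, expected))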
I think you can agree now that your original characterization of statistics as "glorified curve fitting" was a bit naive.
On the basic materials level, density functional theory is the current gold standard, and it's extremely statistics heavy.
At the systems and architecture levels, you may be right, though.
If you're new to statistics, try Allen Downey's http://greenteapress.com/thinkstats2/index.html or Brian Blais' http://web.bryant.edu/~bblais/statistical-inference-for-ever.... Both are free.
Then, go in depth on regression. Not just feeding in the numbers and getting back a fitted model, but actually knowing how everything works, what the common issues are, how to interpret the estimates and so on. Once you've got that down, read Regression Modeling Strategies by Harrell to go really in depth.
Or if you're really just interested in prediction, Hastie and Tibshirani is wonderful of course.
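(On the "how to interpret the estimates" point: statsmodels surfaces the inferential machinery that scikit-learn hides. A sketch on simulated data where we know the truth:)

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)

    # Simulated data with a known truth: y = 2 + 3*x + noise.
    x = rng.normal(size=200)
    y = 2 + 3 * x + rng.normal(size=200)

    model = sm.OLS(y, sm.add_constant(x)).fit()
    print(model.params)      # estimates, roughly [2, 3]
    print(model.bse)         # standard errors of those estimates
    print(model.conf_int())  # 95% confidence intervals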
For ML, Hastie and Tibshirani's ISLR is very good, but it's more for applications of machine learning: classification, regression, and prediction.
A while back I had to teach myself Fisher matrices and the Cramér–Rao bound to solve a problem I was working on. I quickly found that 90% of statistics textbooks and lecture notes on this subject are completely useless for people like me who want to arrive at a number, not some abstract expression involving angle brackets or measures or E[...] or whatever.
The Wikipedia article on Fisher information is one such example of a resource that is full of useless formal crap that crowds out an explanation for real people about how to use this statistical tool. This book appears to be of the same ilk. (Also, this book apparently does not discuss the Cramér–Rao bound. Ironic given the book's title.)
If anyone is curious, the single best explanation of the Fisher matrix and the Cramér–Rao bound that I have found is tucked away in an appendix of the Report of the Dark Energy Task Force. In one page they manage to concisely and clearly explain where the Fisher matrix comes from, how to compute it, and how to apply the Cramér–Rao bound.
I cannot tell you how frustrating this was for me. I wanted just the meat: the core mathematical concepts on which statistical models and inferences are built. Don't tell me a folksy story about gathering soil samples, show me the tools and what they can do, both their power and their limitations. I can think for myself about how to apply those concepts.
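(In the spirit of arriving at a number, a minimal sketch: for n Gaussian samples with known sigma, the Fisher information for the mean is n/sigma^2, so the Cramér–Rao bound says Var(mu_hat) >= sigma^2/n -- and the sample mean attains it:)

    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma, mu = 50, 2.0, 1.0

    # Fisher information for the mean of N(mu, sigma^2): I(mu) = n / sigma^2.
    fisher_info = n / sigma**2
    crb = 1 / fisher_info  # Cramér–Rao lower bound on Var(mu_hat)

    # Empirical check: the sample mean is unbiased and attains the bound.
    estimates = [rng.normal(mu, sigma, n).mean() for _ in range(100_000)]
    print(crb, np.var(estimates))  # both ~0.08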
I loved this book for being exceptionally clear and terse. I was hooked from the first sentence: "Probability is a mathematical language for quantifying uncertainty." That one sentence makes the concept clear in a way that the entire chapter on probability from "Statistics in a Nutshell" (http://www.amazon.com/Statistics-Nutshell-Sarah-Boslaugh/dp/...) did not.
I'm not someone who thrives on theorems and proofs; I thrive on concepts. And I found this book dense with clear explanations of the key concepts.
I don't have the book here at work so I can't quote the book's introduction, but in some sense the title is meant to be literal. It's an attempt to cram an entire 4-year undergraduate statistics program into a single book, and in my opinion it's mostly successful. This book is my go-to reference for those "Ahhhh, I remember hearing about [insert statistical test here] back in college, what was it again?" moments.
There are some links to problem set solutions there.