
Kevin Murphy has done an incredible service to the ML (and Stats) community by producing such an encyclopedic work of contemporary views on ML. These books are really a much needed update of the now outdated-feeling "The Elements of Statistical Learning" and the logical continuation of Bishop's nearly perfect "Pattern Recognition and Machine Learning".

One thing I do find a bit surprising is that in the nearly 2000 pages covered between these two books there is almost no mention of understanding parameter variance. I get that in machine learning we typically don't care, but this is such an essential part of basic statistics I'm surprised it's not covered at all.

The closest we get is the Inference section, which is mostly interested in prediction variance. It's also surprising that neither the section on the Laplace approximation nor the one on Fisher information calls out the Cramér-Rao lower bound, which seems like a vital piece of information regarding uncertainty estimates.
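
To make the Cramér-Rao point concrete (a rough sketch of my own with made-up numbers, not something from the book): the bound says any unbiased estimator satisfies Var(theta_hat) >= 1/I(theta), where I is the Fisher information. For n Bernoulli(p) trials, I(p) = n / (p * (1 - p)), and the MLE p_hat = x_bar attains the bound:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, reps = 50, 0.3, 100_000  # illustrative numbers only

    # The MLE of p for Bernoulli data is just the sample mean.
    p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

    # Fisher information for n Bernoulli(p) trials is I(p) = n / (p * (1 - p)),
    # so the Cramer-Rao lower bound for an unbiased estimator is 1 / I(p).
    crlb = p * (1 - p) / n

    print(p_hat.var())  # ~0.0042
    print(crlb)         # 0.0042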

This is of course a minor critique, since virtually no ML books touch on these topics; it's just unfortunate that in a volume this massive we still see ML ignoring what is arguably the most useful part of what statistics has to offer to machine learning.




Do you really expect this situation to ever change? The communities are vastly different in their goals despite some minor overlap in their theoretical foundations. Suppose you take an rnorm(100) sample and find its variance. Then you ask the crowd for the mean and variance of that sample variance. If your crowd is 100 professional statisticians with a degree in statistics, you should get the right answer at least 90% of the time. If instead you have 100 ML professionals with some sort of a degree in CS/vision/NLP, fewer than 10% would know how to go about computing the variance of the sample variance, let alone what distribution it belongs to. The worst case is 100 self-taught Valley bros: not only will you get the wrong answer 100% of the time, they'll pile on you for gatekeeping and for computing useless statistical quantities by hand when you should be focused on the latest and greatest libraries in numpy that will magically do all these sorts of things if you invoke the right API.

As a statistician, I feel quite sad. But classical stats has no place in what passes for ML these days. Folks can't Rao-Blackwellize for shit; how can you expect a Fisher information matrix from them?
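
For what it's worth, here is that exercise sketched in Python/numpy rather than R (a quick simulation of my own; the seed and replication count are arbitrary). For i.i.d. N(0, sigma^2) data, (n - 1) * s^2 / sigma^2 follows a chi-squared distribution with n - 1 degrees of freedom, so E[s^2] = sigma^2 and Var(s^2) = 2 * sigma^4 / (n - 1):

    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma2, reps = 100, 1.0, 100_000

    # Draw many samples of size n and compute each one's sample variance
    # (ddof=1 gives the unbiased estimator s^2).
    s2 = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).var(axis=1, ddof=1)

    # Theory: (n - 1) * s^2 / sigma^2 ~ chi-squared with n - 1 dof, hence
    #   E[s^2] = sigma^2 and Var(s^2) = 2 * sigma^4 / (n - 1).
    print(s2.mean(), sigma2)                   # ~1.0 vs 1.0
    print(s2.var(), 2 * sigma2**2 / (n - 1))   # ~0.0202 vs 0.0202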


I think Bishop et al.'s WIP book Model-Based Machine Learning[0] is a nice step in the right direction. Honestly, the most important thing missing from ML that stats has is the idea that your model is a model of something: how you construct a problem mathematically says something about how you believe the world works. Then we can ask all sorts of detailed questions about "how good is this model and what does it tell me?"
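
As a toy illustration of that mindset (the model and numbers are entirely made up, not an example from the book): commit to a generative story for the data, and inference then hands you both an estimate and its uncertainty.

    import numpy as np

    # Made-up generative story: each observation y_i = skill + noise,
    # with skill ~ N(0, tau^2) and noise ~ N(0, sigma^2).
    tau2, sigma2 = 1.0, 0.25
    y = np.array([0.9, 1.1, 0.7, 1.3])

    # Because we said what the model *is*, the posterior over skill is exact
    # (conjugate normal-normal update).
    n = len(y)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (y.sum() / sigma2)
    print(post_mean, post_var)  # a point estimate *and* how uncertain it is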

I'm not sure this will ever dominate. As much as I love Bayesian approaches, I sort of feel there is a push to make them ever more byzantine, recreating all of the original critiques of where frequentist stats had gone wrong. So essentially we're just seeing a different orthodoxy dominate thinking, with all of the same trappings of the previous orthodoxy.

0. https://www.mbmlbook.com/


What would you advise ML professionals to do to improve their knowledge of statistics? Any recommended books?


Wait, what's the problem with people not knowing things that they don't need to know? This just comes across as being bitter that self-taught people exist, or that other people are somehow encroaching on your field.


I think your comment does what the OP complains about, regarding gatekeeping etc.

I don't know about the OP, whose comment I find a little harsh, but personally I'm always a bit frustrated and a bit despairing when I realise how poor the background of the average machine learning researcher is today, i.e. of my generation. Sometimes it's like nothing matters other than the chance that Google or Facebook will want to hire someone with a certain skillset, and any knowledge that isn't absolutely essential to getting that skillset is irrelevant.

Who said "Those who do not know their history are doomed to repeat it"? In research, that means being oblivious to the trials and tribulations of previous generations of researchers and then falling into the same pits that they did. See, for example, how deep learning models today are criticised for being "brittle", a criticism that was last levelled against expert systems, and for similar, although superficially different, reasons. Why can't we ever learn?


> I think your comment does what the OP complains about, regarding gatekeeping etc.

Oh absolutely, that's how I intended it. I don't think that preemptively calling out people's reaction gives the parent comment a pass on gatekeeping.

Your concern about poor background... it's only a problem for people who are jumping into things without the prerequisite background and who aren't learning fast enough. But modern deep learning is much more empirical: there are a few building blocks, and people are trying out different things to see how they perform. I don't get why we need to look down on people for not knowing things that they don't need to know. If there were some magic that came from knowing much more statistics, then the researchers who do would be outperforming the rest of the field by a lot, but I don't think that's the case.


That certainly is the case. Not for statistics specifically, but all the people at the top of the field, Bengio, LeCun, Schmidhuber, Hinton, and so on, have deep backgrounds in computer science, maths, psychology, statistics, physics, AI, etc. You don't get to make progress in a field as saturated as deep learning when all you know how to do is throw stuff at the wall to see what sticks.

I never said anything about needing to look down on anyone. Where did that come from?

My concern is that without a solid background in AI, no innovation can happen, because innovation means doing something entirely new, and one cannot figure out what "entirely new" means without knowing what has been done before. The people who "are trying out different things to see how they perform", as you say, are forced to do that because that's all you can do when you don't understand what you're doing.


To get the prediction variance in a Bayesian treatment, you integrate over the posterior of the parameters - surely computing or approximating the posterior counts as considering parameter variance?
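
Roughly, p(y* | D) is the integral of p(y* | theta) p(theta | D) over theta. A minimal Monte Carlo sketch of that (the posterior samples and noise scale below are made up, standing in for whatever MCMC / Laplace / variational inference would give you):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical posterior samples of a mean parameter and a known noise scale.
    theta_samples = rng.normal(2.0, 0.3, size=10_000)  # stand-in for p(theta | D)
    sigma = 0.5                                         # observation noise

    # Posterior predictive draws: integrate over parameter uncertainty by
    # sampling theta, then sampling y given theta.
    y_pred = rng.normal(theta_samples, sigma)

    # Predictive variance = noise variance + parameter (posterior) variance.
    print(y_pred.var(), sigma**2 + theta_samples.var())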


Although this is technically true, in practice probabilistic machine learning makes use of "un-priored" parameters all the time.


Of course it does. You can put hyperpriors on the priors, and hyper hyperpriors on the hyperpriors, but the regress has to stop somewhere. What is your point?


I'm not sure I entirely follow your comment, however I was merely pointing out that reckoning with parameter uncertainty by "computing or approximating the posterior...", as you said, is not always applicable in probabilistic ML.


Yes, but that's true of all statistics. You have to make some assumptions to get off the ground. If you estimate parameter variance the frequentist way, you also make assumptions about the parameter distribution.


No, this is expressly untrue. In the frequentist paradigm parameters are fixed but unknown; they are not random variables and have no implicit probability distribution associated with them.

An estimator (of a parameter) is a random variable, as it is a function of random variables; however, this depends only on the data distribution. There is no other implicit distribution on which it depends.

For instance, the maximum likelihood estimator of the mean of a normal distribution is itself normally distributed, but this does not imply that the mean parameter has a normal prior. It has no prior, as it is a fixed quantity.
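
A quick simulation of that point (numbers made up): the data are redrawn repeatedly with the true mean held fixed, and it is the estimator, not the parameter, that ends up with a distribution, namely N(mu, sigma^2 / n):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 5.0, 2.0, 100, 100_000  # mu is fixed, not random

    # Repeatedly redraw the data and recompute the MLE of the mean.
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

    # The estimator is random only because the data are; its variance is
    # sigma^2 / n, and no prior on mu appears anywhere.
    print(xbar.mean(), mu)            # ~5.0 vs 5.0
    print(xbar.var(), sigma**2 / n)   # ~0.04 vs 0.04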


But you make the assumption that the data can be generated by your model, and your variance estimate only holds asymptotically.


> But you make the assumption that the data can be generated by your model

Yes

> you also make assumptions about the parameter distribution

No

> your variance estimate only holds asymptotically

Don't follow


Do you think this book is useful for someone just looking to get more into statistics and probability, sans machine learning? How would I go about that?

Currently I have lined up: Math for Programmers (No Starch Press), Practical Statistics for Data Scientists (O'Reilly, the crab book), and Discovering Statistics Using R.

Basically I'm trying to follow the theory from "Statistical Consequences of Fat Tails" by NNT.



