Empirical Bayes for multiple sample sizes | 104 points by csaid81 on May 4, 2017 | 17 comments

 Although tangentially linked to in the article, David Robinson's Introduction to Empirical Bayes[1] is also an excellent resource. It deals primarily with beta-binomial distributions.
 It's an excellent blog post, although it's worth emphasizing that it is designed for the binomial case, where you wish to compute the fraction of occurrences within some events, such as batting averages. For continuous variables, however, it makes more sense to use one of the methods described in the original post.

 TL;DR: one blog post is for Rotten Tomatoes and the other is for Metacritic.
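 For the binomial case, the beta-binomial approach from David Robinson's post can be sketched roughly like this (toy numbers and a method-of-moments prior fit of my own choosing, not code from either post):

```python
from statistics import mean, pvariance

def fit_beta_moments(rates):
    """Method-of-moments fit of a Beta(alpha, beta) prior to observed rates."""
    mu, var = mean(rates), pvariance(rates)
    common = mu * (1 - mu) / var - 1
    return mu * common, (1 - mu) * common

# Toy data: hits and at-bats for a few hypothetical batters.
hits = [45, 10, 300, 2]
trials = [150, 40, 1000, 10]
alpha, beta = fit_beta_moments([h / n for h, n in zip(hits, trials)])

# Empirical-Bayes estimate: shrink each raw rate toward the prior mean.
# Small samples (the 2-for-10 batter) move the most.
shrunk = [(h + alpha) / (n + alpha + beta) for h, n in zip(hits, trials)]
```

 The point is that the prior is fit from the data itself, which is what makes it "empirical" Bayes.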
 Absolutely, and thanks for better defining the distinction. I really just wanted to point out another solid Empirical Bayes resource, as there aren't that many about. Yours and David's make a good combination, covering different cases.
 Stan is great! Glad to see it on HN. Nice write-up, too.
 That definition of symbols! So good!
 I know! I love when math books and papers do that.
 All this technical machinery is overkill. Just use a Bayesian average: https://en.wikipedia.org/wiki/Bayesian_average
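 For anyone unfamiliar, the linked formula amounts to giving every item C pseudo-ratings at a prior mean m before averaging. A minimal sketch (the values of m and C here are arbitrary):

```python
def bayesian_average(ratings, m=3.0, C=5.0):
    """Bayesian average: C pseudo-ratings at the prior mean m, plus the real ratings."""
    return (C * m + sum(ratings)) / (C + len(ratings))

# An item with a single 5-star rating stays near the prior mean;
# many 5-star ratings pull the estimate toward 5.
few = bayesian_average([5.0])
many = bayesian_average([5.0] * 200)
```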
 Thank you. As a statistician, the fact that mixed effects models (e.g. does this rater tend to rate high?) are overlooked is, IMHO, a death sentence. Too much nomenclature, too early (link to the table within the text, please, and omit needless words), and too little attention paid to the value of an external citation.

 Also, MCMC for ratings? Surely you jest. If the author had touched on mixed models, then maybe it would make sense. But given the sample sizes involved here, and the noise in the variance estimates, I recommend that the author investigate mixed models tout de suite if they do in fact care about the sources of shared and unshared effects on variance. Because that is what mixed models do.
 Author here. Please see the section on mixed models in my post. As I mentioned there, I would love it if an expert could expand on the relationship between mixed effects and Empirical Bayes.

 Regarding MCMC, one of the things I try to emphasize throughout the post is that the best solution depends on your needs (for example, if you want a full posterior). In fact, most of the post is devoted to quick and simple methods -- not MCMC -- because they are good enough for most purposes. I welcome your feedback, though, on how I could make this point clearer.
 > Also, "empirical Bayes" is in modern parlance equivalent to "Bayes". What's the alternative? "Conjectural Bayes"?

 My understanding of the difference, as a frequent user of empirical Bayes methods (mainly limma[1]), is that in "empirical Bayes" the prior is derived empirically from the data itself, so that it's not really a "prior" in the strictest sense of being specified a priori. I don't know whether this is enough of a difference in practice to warrant a different name, but my guess is that whoever coined the term did so to head off criticisms to the effect of "this isn't really Bayesian".
 Do you have a webpage? I just helped my wife (a physician) with stats for a research presentation that sought to track infection spreading in hospitals (via room number, i.e. location-specific) through the movement of tagged equipment and staff. They then PCRed the strains to make sure it was the same one.

 The experimental design was good; the stats person they had to help them decipher the results left much to be desired.

 Can you please be so kind as to email me: jpolak{at} the email service of a company where a guy named Kalashnikov worked.
 Yup, I agree about throwing away rater information. The actual application at my company that motivated me to research this doesn't have rater information, which is why I didn't think to adjust for it. The movie case was just an example I used to motivate this post for which, yes, I agree, rater information would be quite useful.
 This seems different and a bit lacking in detail (although I don't dispute that it could be useful). How exactly does one choose m and C? And what are the conditions under which it would reduce to the James-Stein / Buhlmann / BLUP model?
 The choice of m and C need not be exact. It is enough to choose them so that:

 1. If there are no ratings, the Bayesian average is close to the overall mean, and

 2. If there are many ratings (how many depends on how big the site is), C and m do not affect the result much.

 You probably can do a little better if you have a lot of data and the ability to run A/B tests, but for the vast majority of cases pseudocounts work just fine.
 Got it. Thanks for the clarification. In that case I would think that James-Stein / Buhlmann / BLUP is a better approach, since it is just as easy to implement and the amount of shrinkage is optimally chosen based on the data rather than on guesswork. In fact it may be easier, because no guesswork is required.

 It would be interesting, though, to have people try to guess suitable values of m and C and then see how close their MSEs get to the James-Stein MSE. I suspect that some people's guesses would be meaningfully off target.
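 To make the contrast concrete, here is a rough sketch of a Buhlmann-style credibility estimator, where the shrinkage weight n / (n + k) is estimated from the data instead of guessed. It assumes a known, common within-group variance and uses a simple moment estimator; the numbers in the usage example are made up:

```python
from statistics import mean, pvariance

def buhlmann_shrinkage(group_means, group_sizes, within_var):
    """Shrink each group mean toward the grand mean with weight n / (n + k),
    where k = within-group variance / estimated between-group variance."""
    grand = mean(group_means)
    # The variance of observed group means overstates the true between-group
    # variance by roughly within_var / n per group; subtract that off.
    between_var = pvariance(group_means) - within_var * mean(1 / n for n in group_sizes)
    between_var = max(between_var, 1e-12)  # guard against a negative estimate
    k = within_var / between_var
    return [grand + n / (n + k) * (g - grand)
            for g, n in zip(group_means, group_sizes)]

# Three movies: a 4.5 average from only 3 ratings is shrunk hard toward the
# grand mean, while a 2.0 average from 300 ratings barely moves.
est = buhlmann_shrinkage([4.5, 2.0, 3.0], [3, 300, 50], within_var=1.0)
```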
 But that's not how you should measure it. Your goal is not to minimize MSE; your goal is to rank movies in a way that users like.

 So the test would be to randomly split users into test and control, show the ranking based on Bayesian averaging to control, show the ranking based on James-Stein or some other method to test, measure some metric of user happiness (a different hard problem -- click rate on top titles?), then do the comparison.
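 The random split itself is straightforward; a toy sketch (the hashing scheme here is my own assumption, and measuring the happiness metric is the hard part, as noted):

```python
import random

def ab_assign(user_id, salt="ranking-experiment"):
    """Deterministically assign a user to an arm by seeding a PRNG with their id."""
    return "test" if random.Random(f"{salt}:{user_id}").random() < 0.5 else "control"

# Control would see the Bayesian-average ranking, test the James-Stein ranking;
# the comparison then happens on a per-arm happiness metric (e.g. clicks on top titles).
arms = {"control": 0, "test": 0}
for user_id in range(10_000):
    arms[ab_assign(user_id)] += 1
```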
