I have a newer article (not mentioned here) that ranks 5-star items using the variance of the belief. It ends up yielding a relatively simple formula, or at least a formula that doesn't require special functions. Like the OP I use a Dirichlet prior, but then I approximate the variance of the utility in addition to the expected utility:
The weakness of the approach (as well as the OP) is that it doesn't really define a loss function for decision-making (i.e. doesn't properly account for the costs of an incorrect belief), which one might argue is the whole point of being a Bayesian in the first place. In practice it seems that using a percentile point on the belief ends up approximating a multi-linear loss function, but I haven't worked out why that is.
In the machine learning community the above problems are addressed with submodular loss functions, bandit algorithms, and no doubt other methods I don't know about. Now I don't value complexity for its own sake, so I wonder if the additional power these approaches bring is warranted.
Penalizing variance would be the opposite of my intuition. Given a boring low-variance item with ten 3-star votes, and a divisive item with five 1-star votes and five 5-star votes, I'd think you'd want the one at the top to be the one with a medium chance that people will "love" it, rather than a high chance they'll find it merely passable.
If you further assume that the average person is going to check out the top few results but only "buy" if they find something they really like, the risky approach seems even more appealing. A list topped by known mediocre choices has a low chance of "success". What's the scenario you are envisioning?
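To make the example above concrete (my numbers, just restating the two hypothetical items): both have the same mean rating, so only the variance separates them.

```python
# The two items from the example above: a "boring" item with ten 3-star votes
# and a "divisive" item with five 1-star and five 5-star votes. Same mean,
# very different variance, so a variance penalty ranks the boring one higher.
from statistics import mean, pvariance

boring = [3] * 10
divisive = [1] * 5 + [5] * 5

print(mean(boring), pvariance(boring))      # 3, 0
print(mean(divisive), pvariance(divisive))  # 3, 4
```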
Also related: http://planspace.org/2014/08/17/how-to-sort-by-average-ratin...
> Every upvote should increase the score, every downvote should decrease the score and the more votes there are the less an additional vote should matter. Only "adding pretend votes" satisfies this.
That really puts into words why "adding pretend votes" just felt right to me in practice.
Really? If users only ever look at the top 10 items, you'll never find out that item #33 would end up much higher if it got some attention from voters. This is not only a statistical problem, but also a policy/intervention problem. There is an explore/exploit trade-off to be solved.
A very popular policy for similar problems is to use Thompson sampling, e.g. don't sort items according to their expected score, instead draw a score at random and sort according to those. (At random from your current belief about the plausible true scores, e.g. the beta distribution you have learned.)
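A minimal sketch of that policy, assuming a Beta(1 + upvotes, 1 + downvotes) posterior per item (item names and counts are made up for illustration):

```python
# Thompson sampling sketch: instead of sorting items by expected score, draw
# one random score per item from its Beta posterior and sort by the draws.
# Uncertain items (few votes, wide posterior) occasionally land on top, which
# is exactly the "explore" side of the explore/exploit trade-off.
import random

def thompson_rank(items, rng=random):
    """items: list of (name, upvotes, downvotes) tuples.
    Returns names sorted by a random draw from each item's
    Beta(1 + up, 1 + down) posterior (uniform Beta(1, 1) prior)."""
    draws = [(rng.betavariate(1 + up, 1 + down), name)
             for name, up, down in items]
    return [name for _, name in sorted(draws, reverse=True)]

items = [("well-known", 300, 100), ("newcomer", 3, 1)]
print(thompson_rank(items))  # order varies between calls, by design
```

The "well-known" item usually wins, but the "newcomer" (wide posterior) gets sampled to the top often enough to collect the votes that would pin down its true score.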
Fascinating. Does this follow as a straightforward consequence of how the beta distribution is defined? Otherwise, is there a proof that someone could point me toward?
> The popularity of an item has a Beta(a,b) prior
Is there an optimal choice of a, b given, say, a specific utility function?
If you look up the definition of the beta distribution and then write down the formula for the posterior, it should be quite clear that the result is also a beta distribution, modulo the normalizing constant, which may be a little trickier to determine.
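To spell that conjugacy out numerically (a standard Beta-Binomial check, nothing specific to this thread): multiply a Beta(a, b) prior by a binomial likelihood for k upvotes out of n votes, normalize on a grid, and compare to the closed-form Beta(a + k, b + n - k) density.

```python
# Numeric check that the Beta prior times a binomial likelihood gives a Beta
# posterior with updated parameters. Prior and data values are arbitrary.
import math

def beta_pdf(p, a, b):
    # Beta density via log-gamma, to avoid overflow for large parameters.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

a, b, n, k = 2.0, 2.0, 10, 7          # prior Beta(2, 2); 7 upvotes in 10 votes
grid = [(i + 0.5) / 1000 for i in range(1000)]
step = 1.0 / 1000

# Unnormalized posterior: prior * likelihood (binomial coefficient cancels
# when we normalize, which is the "tricky constant" mentioned above).
unnorm = [beta_pdf(p, a, b) * p**k * (1 - p)**(n - k) for p in grid]
z = sum(u * step for u in unnorm)
posterior = [u / z for u in unnorm]

closed_form = [beta_pdf(p, a + k, b + n - k) for p in grid]
max_err = max(abs(x - y) for x, y in zip(posterior, closed_form))
print(max_err)  # tiny: the grid posterior matches Beta(a + k, b + n - k)
```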
Star ratings have problems with compression of scales at the top and bottom. You'll never know which item is someone's favorite (or least favorite) with star ratings, because typically there will be several items with the maximum or minimum number of stars.
Pair-wise comparisons are also more fun and easier for users. When I'm doing star ratings, I often find myself trying to remember what star ratings I've given to similar things that I liked a little more or a little less so that I can try to be consistent.
Pair-wise comparisons probably make more sense for items in similar categories, though. It makes a lot more sense to pick a preference between two novels than it does to pick a preference between an ice cube tray and a camp chair.
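One common way to turn those pair-wise preferences into a ranking (my suggestion, not something described in the thread) is an Elo-style update, as used for chess ratings:

```python
# Elo-style rating from pair-wise comparisons: an illustrative sketch, not a
# system anyone above described. Each comparison moves the winner up and the
# loser down, with larger moves for upsets.

K = 32  # step size: how strongly one comparison moves the ratings

def expected(r_a, r_b):
    """Predicted probability that the item rated r_a beats the item rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_winner, r_loser, k=K):
    """Zero-sum update after the first item beats the second."""
    e = expected(r_winner, r_loser)
    return r_winner + k * (1 - e), r_loser - k * (1 - e)

ratings = {"novel A": 1500.0, "novel B": 1500.0}
# A user prefers novel A in three head-to-head comparisons:
for _ in range(3):
    ratings["novel A"], ratings["novel B"] = update(ratings["novel A"],
                                                    ratings["novel B"])
print(ratings)  # novel A drifts above novel B
```

This also sidesteps the scale-compression problem: two items can both be "5 stars" yet still be separated by which one wins the direct comparison.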
I extend it, hackily, to allow categories to be declared to apply to objects with arbitrary confidence at any time, and to declare the same categorization multiple times.
I then consider both the confidence amounts and number of declarations in comparing the overall confidence in different categorizations.
I use a 0-1.0 scale for confidence, then compute the adjusted confidence for each potential categorization as (sum of confidences) / (number of confidence declarations + 3).
This is equivalent to assuming a prior of three declarations of zero confidence; it effectively rewards higher numbers of votes, such that a single declaration for category A with confidence 1.0 will tie three declarations for category B with confidence 0.5.
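Concretely, the adjusted-confidence rule works out like this (category names are made up):

```python
# The adjusted-confidence rule described above: score a categorization as
# (sum of confidences) / (number of declarations + 3), i.e. a pseudo-count
# prior of three zero-confidence declarations.

def adjusted_confidence(confidences):
    return sum(confidences) / (len(confidences) + 3)

cat_a = adjusted_confidence([1.0])            # one maximally confident vote
cat_b = adjusted_confidence([0.5, 0.5, 0.5])  # three half-confident votes
print(cat_a, cat_b)  # both 0.25: the single 1.0 ties the three 0.5s
```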
So, I wouldn't call it perfect.
And rewarding bumps usually means rewarding bickering flamewars.
300 / (300 + 100) = 3/4, and not equal to 1/4.