

Why the 5 Star Rating System Isn't Suited to Local Reviews - pmjoyce
http://www.thebuzz.at/blog/2009/05/why-yelp-have-it-all-wrong/
Yelp (or Qype/Citysearch/TrustedPlaces/Tipped etc.) are not deliberately misleading their users, but fundamentally they're all broken: the foundations on which they're built are unsound.
======
nimbix
I like Amazon's 5 star rating system because it also tells you how many people
gave the product each rating. There's a big difference between 20 people
giving a product 3 stars and 10 giving it 5 stars while another 10 give it 1
star. Amazon's system makes this difference visible, but most others don't.
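
To make that concrete, here's a minimal Python sketch (with made-up vote
lists) showing that the two distributions share a mean but differ wildly in
spread:

```python
from statistics import mean, pstdev

# Two hypothetical rating distributions with the same average.
unimodal = [3] * 20              # 20 people all gave 3 stars
bimodal  = [5] * 10 + [1] * 10   # 10 gave 5 stars, 10 gave 1 star

print(mean(unimodal), pstdev(unimodal))  # 3, 0.0 -- everyone agrees
print(mean(bimodal),  pstdev(bimodal))   # 3, 2.0 -- people strongly disagree
```

Showing only the mean collapses both cases to "3 stars"; the histogram is
what exposes the disagreement.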

~~~
zain
Is everyone looking at a different Yelp than I am? I see the same ratings
distribution as Amazon on Yelp, as well as a "trend" graph that tells you how
the ratings changed over time.

Here's a screenshot:

[http://img.skitch.com/20090504-n2jsr44y1py38pbd5wgcqs5uyn.pn...](http://img.skitch.com/20090504-n2jsr44y1py38pbd5wgcqs5uyn.png)

~~~
pmjoyce
I still think this lacks any real degree of clarity, although I admit that it
looks like it does. At best I might use the trend graphs as a starting point
from which to interrogate the qualitative data (comments).

One major problem is the need to click through to the detail page to see these
charts, which doesn't lend itself to easy decision making on the fly. However,
the real issue is that even when you do click through, it doesn't show what is
probably another important metric by which to judge: volume of votes over
time. Add that into the equation and suddenly the user has three charts per
venue to interrogate just to establish the reliability of the initial score
out of 5.

I think that's a reasonably big ask for someone who just wants to know where
to get a decent Kung Pao chicken around here.

------
apinstein
This article seems kinda troll-y to me.

Why? Because even Yelp understands the "The man who drowned in the river whose
average depth was 6 inches" problem, which is exactly why they offer both a
histogram view of ratings, and an average-over-time graph. I _frequently_ use
both of those tools.

Now don't get me wrong, there are all kinds of problems with 1-5 rating
schemes, and there are probably better schemes out there. But what Yelp does
is as good as anything I've seen towards combating the problem, which is why I
enjoy using it.

As someone who took 4 semesters of business statistics in college, my
personal favorite alternative scheme is paired comparison, or better yet,
the related (and newer) MaxDiff algorithm
[<http://en.wikipedia.org/wiki/MaxDiff>]. The only problem with these schemes
is that they require much more user input, which frankly is a lot to ask on
a site like Yelp.
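
For illustration, here's a toy Python sketch of the paired-comparison idea
(not MaxDiff itself): venues are ranked by the fraction of head-to-head
questions they won. The venue names and judgments are invented:

```python
from collections import defaultdict

# Hypothetical pairwise judgments: each tuple is (winner, loser) from one
# "which of these two do you prefer?" question put to a user.
judgments = [
    ("Cafe A", "Cafe B"),
    ("Cafe A", "Cafe C"),
    ("Cafe B", "Cafe C"),
    ("Cafe C", "Cafe B"),
]

wins = defaultdict(int)         # comparisons won per venue
appearances = defaultdict(int)  # comparisons participated in per venue
for winner, loser in judgments:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

# Rank venues by the fraction of their comparisons they won.
ranking = sorted(appearances, key=lambda v: wins[v] / appearances[v],
                 reverse=True)
print(ranking)  # "Cafe A" comes first: it won every comparison it was in
```

This shows why the input burden is high: a single rating is one question,
but pairwise data needs many questions per user to cover the candidates.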

~~~
pmjoyce
I agree that advanced statistical analysis is not suited to these types of
sites, as the demands on the user are such that the response rate would be
virtually nil.

But the point is that while the 5-star system initially looks and feels like
it's offering some real data with which to evaluate a bar/restaurant, the
reality is that the data need advanced interpretation to come even close to
being meaningful, never mind useful.

There's probably not a perfect system out there but I'm pretty sure that other
models would be more "honest" at least.

~~~
thwarted
The 5-star rating system is exactly the same as asking someone "would you
use/watch/go/buy-from this product/place/movie/business again?". You ask a few
people and consolidate their answers into an aggregate perception, then make
your own decision based on that. Then you go there (or not), make your own
judgement, and add your input to the pool. This is useful for exactly the
intended purpose and audience.

~~~
pmjoyce
I disagree strongly with your statement that the 5 star system is "exactly the
same as asking someone 'would you use/watch/go/buy-from this
product/place/movie/business again?'"

In fact, I think this is exactly where the complication arises. I agree that
that is probably a close approximation of the question one would like to have
answered, but asking people to give a rating out of 5 stars certainly doesn't
ask it. Not only that, but it actually muddies the water somewhat when
aggregate data is spat out as an answer.

~~~
thwarted
Agreed, that is a question that needs a boolean answer, and a rating scale is
a range of answers. But if you were to ask ten people that question and their
answers averaged out above the half-way point, would you think it was a good
place or not? If nine of your friends said they had a good experience and one
said they didn't, how is this place not considered to be good? That's all
these rating systems strive to do.

I also don't see how any of this leads to less or more "honesty" in the
ratings.

------
thwarted
While a simple scale rating system is just that, simple, anything more complex
quickly exceeds the amount of effort both raters and searchers are willing to
put into understanding it. "Rate this restaurant on a scale of 1-10" is much
simpler to understand than a massive questionnaire (even ratings on something
like 3 or 4 axes may be too much effort). It also has the advantage of being
ambiguous enough that ANY opinion can fit into it -- you would be hard pressed
to find the right kind of questions to fit all the different things that can
be rated (What if you liked a place but didn't experience any of the specific
things being asked about? Would you rate a movie theater on service if you
never used the concession stand?). A site that didn't allow a reviewer to put
in prose to explain their rating would be useless; I believe all the sites the
OP talked about do. The rating scale is just meant to be a summary: each one
is a summary of the thing being reviewed, and in aggregate it's a summary of
the community's feel for the thing being reviewed.

When flipping channels looking at movies that are on, the star rating system
is largely worthless because 1) it was most likely done by some paid movie
buff who has inherently different motivations and likes/dislikes than I do,
and 2) there is no explanation of the reasoning behind a given rating. It's
all subjective measurement, and it can't be objective. The whole point of
rating systems is opinion. It's just like asking a friend if a place is worth
going to.

Oftentimes, the use case is "find something good with a minimal amount of
hassle". This comes down to things like "places people liked that are within a
mile of where I am right now". I have five minutes to make a decision on this;
the single data point of a rating scale helps me make the decision quickly and
(perceptually) accurately (it may not actually be accurate, but I feel like
I'm making a good decision). These sites are also designed for repeat users
and users being contributors. You learn the way other people rate things on
the site over time, and become more able to decide for yourself what "1 star"
means vs what "5 stars" means given the context.

I think a larger problem is getting people to _want_ to expend the effort to
provide a review of a place they aren't extremely excited about (either
positively or negatively). I know I don't bother to review places where I had
a so-so experience, but if I had excellent service or really bad service, I
make it a point to rate them. In some ways this skews the results, but it
most likely isn't that big of a deal.

I don't know what the OP has in store for his next posting, but if it's truly
revolutionary, he should be starting his own competing review service.

~~~
pmjoyce
Good analysis, and it really gets to the nub of the article: the difference
between the perception of making a good decision based on the available data
and how it's presented, versus the reality of the decision making process.

What I'm proposing isn't new or revolutionary - it's simply that we need a
more honest appraisal system - something that everyone can understand not just
_think_ they understand because it's presented in a familiar and reassuring
format (5 stars).

~~~
thwarted
What is dishonest about the current systems? And what makes it look like an
opinion system could be honest? Does it seem like people are purposely
misrepresenting their position in these kinds of rating systems? Everyone
_does_ understand the simple rating system, which is why it works and is
popular. I think there is a perceptual accuracy, and not necessarily an
absolute accuracy, to the way people interpret the data, because they are
looking for absolutes in a system which cannot inherently contain absolutes,
as it is based on opinions.

Discussion of the Netflix Prize is a good place to mine data on the accuracy
of rating systems and improve them.

~~~
pmjoyce
What is dishonest about the current systems? The dishonesty (wilful or
otherwise) arises from allowing the 5 star system to stand in for something
meaningful on which users can base their decision making.

And what makes it look like an opinion system could be honest? I said an
opinion system could be made _more_ honest. To be clear, I'm not suggesting
that the people voting are misrepresenting their position (that the users are
dishonest), merely that the 5 star rating system, which purports to tell us
something of value, doesn't (the system is dishonest).

Honesty comes from simplicity, and it's my opinion that aligning online
recommendation systems more closely with "real world" models is not only
purer (in that we'll have a greater innate understanding) but probably more
accurate and more helpful as a decision making aid.

As for the Netflix Prize, I can't help feeling that it's not much more than
asking how many angels can dance on the head of a pin.

------
Caged
I've always thought 5 star rating systems were fundamentally flawed because
they're based on the assumption that the same experience, had by two different
people, will get the same star rating. For example, what might be a four star
experience for me could be a two star experience for you.

Maybe it's just me, but I'd prefer a probability system such as up and down
votes. If I see a restaurant has 100 "likes" and 10 "dislikes", it's probable
that I'll also enjoy the place (given that I already know they're serving
food and drinks I would normally enjoy).

~~~
chris11
I think that would be relatively easy to estimate from Amazon's ratings,
since they break down the votes. It's a safe assumption that 1 or 2 star
votes would be equivalent to a down vote, and 4 or 5 star votes would garner
an up vote.
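
A quick Python sketch of that mapping, using a hypothetical Amazon-style
histogram (treating 3-star votes as neutral is my assumption, not part of
the comment above):

```python
# Hypothetical histogram: star value -> number of votes.
histogram = {5: 40, 4: 25, 3: 10, 2: 5, 1: 20}

likes    = histogram[4] + histogram[5]   # 4-5 stars count as "up" votes
dislikes = histogram[1] + histogram[2]   # 1-2 stars count as "down" votes
# 3-star votes are dropped as neutral (an assumption for this sketch).

print(f"{likes} likes, {dislikes} dislikes "
      f"({likes / (likes + dislikes):.0%} positive)")
# -> 65 likes, 25 dislikes (72% positive)
```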

------
adamc
Nothing about his analysis is limited to local reviews -- the mean is a
limited measure regardless, and the limitation of reviewers reviewing at
different times is a function of the number of reviews per unit time, not
locality. (They're correlated, obviously, but far enough out in the "long
tail", even international reviews are going to be thin.)

------
jambalaya
What about the Wilson score confidence interval mentioned here?

[http://www.evanmiller.org/how-not-to-sort-by-average-rating.html](http://www.evanmiller.org/how-not-to-sort-by-average-rating.html)
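
For reference, the lower bound Miller describes can be sketched in Python
roughly like this (a standard formulation of the Wilson interval, not code
taken from the article; the example counts are invented):

```python
from math import sqrt

def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score confidence interval for the
    fraction of positive ratings (z=1.96 gives ~95% confidence)."""
    if total == 0:
        return 0.0
    phat = positive / total
    return ((phat + z * z / (2 * total)
             - z * sqrt((phat * (1 - phat) + z * z / (4 * total)) / total))
            / (1 + z * z / total))

# A venue with 100 up / 10 down outranks one with 3 up / 0 down,
# even though the latter's raw average (100% positive) is higher.
print(wilson_lower_bound(100, 110))  # ~0.84
print(wilson_lower_bound(3, 3))      # ~0.44
```

The appeal here is that a handful of perfect ratings no longer beats a large
body of mostly positive ones.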

~~~
benburkert
I recently implemented a ranking system which combined the two:
<http://pastie.org/467334>

Not sure how sound my math is, but it seems to produce the desired results.

------
derefr
There's nothing wrong with _asking_ users to rate things out of five. However,
just seeing an out-of-five score is, indeed, pointless when you talk about
products that can vary per-person and over time.

I'd suggest rendering the score as a sparkline of aggregate-score-over-time,
with a surrounding colored band whose width is the deviation for that sample
point. Thus you could see at a glance whether a five-star restaurant used to
be a three-star, and see how many people disagree with the current rating
with a simple visual geometric comparison.
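
One way to compute the data behind such a sparkline-plus-band, sketched in
Python with invented monthly rating buckets (the actual drawing is omitted):

```python
from statistics import mean, pstdev

# Hypothetical ratings bucketed by month; each bucket's mean would drive
# the sparkline and its standard deviation the width of the shaded band.
monthly_ratings = [
    [3, 3, 4, 2],        # older month: middling, fairly tight agreement
    [4, 4, 5, 3, 4],
    [5, 4, 5, 5, 1],     # recent month: high mean but wide disagreement
]

points = [(mean(bucket), pstdev(bucket)) for bucket in monthly_ratings]
for avg, spread in points:
    print(f"mean={avg:.2f}  band=+/-{spread:.2f}")
```

Note the last two months share a mean of 4.0 but very different band widths,
which is exactly the disagreement the plain average hides.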

------
pmjoyce
Right, but that's not the whole story. The 5 star system is probably a lot
better suited to the sorts of things Amazon sells (music, DVDs, electronics
etc.). Those ratings are not as time sensitive as, for example, bar or
restaurant ratings, which could change entirely in a very short period of
time.
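
One hedge against that time sensitivity (my own sketch, not something any of
these sites necessarily do) is to decay each vote's weight with age, e.g. in
Python with invented (rating, age) pairs:

```python
# Hypothetical (rating, age_in_days) pairs for a restaurant that has
# recently gone downhill.
ratings = [(5, 400), (5, 380), (5, 350), (2, 20), (1, 10), (2, 5)]

def decayed_average(ratings, half_life_days=90.0):
    """Average ratings, halving each vote's weight every half_life_days.
    The half-life is an arbitrary tuning knob for this sketch."""
    weights = [0.5 ** (age / half_life_days) for _, age in ratings]
    return sum(r * w for (r, _), w in zip(ratings, weights)) / sum(weights)

plain = sum(r for r, _ in ratings) / len(ratings)
print(f"plain mean: {plain:.2f}, decayed: {decayed_average(ratings):.2f}")
# -> plain mean: 3.33, decayed: 1.85
```

The plain mean still says "decent place"; the decayed average reflects what
a diner would find there this week.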

------
jonno99
This type of problem is very common when surveying people. A 6 or 4 star
system would be much better: people then have to decide if it was above or
below "average" because there is no middle star.

------
edw519
_That's_ why you read the comments.

The best rating systems I've seen looked something like this:

    
      # of stars (1-5): __
    
      If not 5, the #1 thing that would get more stars:
      _________
    

or

    
      If not 5, the top 3 things that would get more stars:
      _______________
      _______________
      _______________

