
YouTube Comes To A 5-Star Realization: Its Ratings Are Useless - vaksel
http://www.techcrunch.com/2009/09/22/youtube-comes-to-a-5-star-realization-its-ratings-are-useless/
======
Timothee
The official post from YouTube: [http://youtube-global.blogspot.com/2009/09/five-stars-domina...](http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html)

~~~
jmah
That chart is veering dangerously toward misleading; it should be a bar, not a
line.

------
acangiano
Having "xx% liked this video" under each video would be more useful and
meaningful.

BTW, they really should have used a histogram here.

------
icey
I'm shocked that such a big-data company has missed the mark on ratings by
this much. Especially given that they have a Flash player running in the
browser, which could report exactly how people actually watch.

If I were YouTube, I would start by automatically ranking videos by the
percentage of plays that ran to completion, while discounting plays where the
video finished but the page was just left open with no activity (to counteract
people who opened the page and ignored it afterwards, or left their machines
idle). Discard the 5% outliers on either end of the spectrum and you should
have a pretty decent idea of which videos are popular and which ones are
boring.
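
A rough sketch of that idea in Python; every field name and threshold here is
hypothetical, not anything YouTube actually exposes:

    def completion_rate(plays):
        """plays: per-view records for one video; 'completed' means the
        video played to the end, 'active' means the viewer showed some
        page activity (both hypothetical signals the player could report)."""
        if not plays:
            return 0.0
        # Count a complete play only when the viewer was active, to
        # discount pages left open on idle machines.
        engaged = sum(1 for p in plays if p["completed"] and p["active"])
        return engaged / len(plays)

    def rank_videos(videos):
        """videos: dict of video_id -> list of play records."""
        ranked = sorted(videos, key=lambda v: completion_rate(videos[v]),
                        reverse=True)
        # Discard the 5% outliers on either end of the spectrum.
        trim = int(len(ranked) * 0.05)
        return ranked[trim:len(ranked) - trim]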

After all, don't they care mostly about whether the videos are engaging?

It seems like there are a ton of techniques they should already have been
using to supplement the star ratings and fine-tune any algorithms they wanted
to play with.

~~~
stcredzero
_I'm shocked that such a big-data company has missed the mark on ratings by
this much_

Well, given how badly they messed up with _comments_...

<http://xkcd.com/202/>

------
InclinedPlane
At least they realize it. Amazon book ratings have a similar problem, to a
lesser degree. You really have to dig into the individual reviews to get any
reasonable rating info. I find, at least for technical books, that comparing
a book's used price to its new price is one of the more accurate ways of
determining whether it has legitimate value.

~~~
netsp
Actually, Amazon's review & rating system seems to be one of the few that
works well. Seeing the distribution and reading the most useful positive and
negative reviews works really well in my experience.

~~~
stcredzero
_Seeing the distribution_

This is key! Almost always, only good or great products will have a bunch of
ratings and a distribution like:

    ****************
    ********
    ****
    **
    **

_Someone_ is always going to dislike just about anything. But the above
distribution seems to indicate something that a lot of people genuinely like.

I also look at the most useful positive and negative reviews.

~~~
netsp
It's not just about good vs bad. It's also that I can see what people didn't
like. Often there are downsides that are a big deal to some (those who rate
low) but which may or may not apply to me.

------
tel
Though I don't have aggregate data, I don't think Amazon has this issue.
People seem able to differentiate within the 2-4 range when there's an actual
purchase involved.

Then again, Amazon doesn't account for the statistical uncertainty of votes,
so it's a little odd as well.
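
For example, one could rank by a lower confidence bound on the mean rating
instead of the raw mean; a minimal sketch, and certainly not Amazon's actual
method:

    import math

    def rating_lower_bound(ratings, z=1.96):
        """Mean minus z standard errors: an item with three 5-star votes
        no longer outranks one with a thousand ratings averaging 4.8."""
        n = len(ratings)
        mean = sum(ratings) / n
        sd = math.sqrt(sum((r - mean) ** 2 for r in ratings) / n)
        return mean - z * sd / math.sqrt(n)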

~~~
netsp
Maybe it's because Amazon asks (and receives) written reviews as well. They
make people think a little more. I assume there is still a bias towards the 1
& 5 stars though.

~~~
Anon84
Not quite. See <http://arxiv.org/pdf/0909.0237> Page 9 Table 4.

The closest I've ever seen to the "ideal" curve is the IMDB curve right next
to it.

~~~
netsp
I love that on HN a vague, half-arsed comment can get, as a response, a
reference and link to an academic paper.

Cheers.

------
DanielStraight
I've always thought smaller scales were better. I think asking people to rate
something on a 1-5 scale is optimistic. A 1-10 scale is simply insanity. No
one, without substantial effort, can reasonably rank something from 1-10...
even if they think they can. I once had someone try to convince me that they
could rank the attractiveness of girls accurately on a scale of 1-100,
differentiating between every point on the scale. I don't think anyone would
dispute that that's insane. I feel the same way about 1-10 scales. It simply
isn't possible. When a particular rating is really important, I say use 1-4. I
think preventing people from picking a middle answer will help get more honest
opinions. If it's an issue about which you can truly be apathetic, 1-3. Hacker
News seems to do quite well with a scale of 1-1 on submissions, or perhaps we
should call it a scale of apathy-opinion.

Maybe for YouTube they could count opinion votes when someone watches the same
video twice or uses the embed/share feature. If someone watches a video but
never shares it or watches it again, then I think it's fair to say they were
apathetic about it.
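
A minimal sketch of that implicit-vote idea, assuming a hypothetical event
log (none of these event names are a real YouTube API):

    from collections import Counter

    def implicit_votes(events):
        """events: iterable of (user, video, action) tuples, where action
        is 'watch', 'share', or 'embed' -- all hypothetical log entries."""
        watches = Counter()
        votes = set()
        for user, video, action in events:
            if action == "watch":
                watches[(user, video)] += 1
                if watches[(user, video)] >= 2:   # repeat view counts as a vote
                    votes.add((user, video))
            elif action in ("share", "embed"):    # sharing counts as a vote
                votes.add((user, video))
        return votes  # every other viewer is counted as apathetic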

~~~
derefr
Just to supply evidence to the contrary, I make full use of the five-star
scale in iTunes, with each additional star granting the song some privilege,
so the songs at each rating form a strict subset of the songs with lesser
ratings (e.g. 4+ = always on iPod). I frequently wish for half-star ratings
across the board, which would make the full, discrete scale 1-10 :) I imagine
I may indeed be in the extreme minority, though.

~~~
roundsquare
See, that's key to making the system worthwhile: your ratings come back to
haunt you in a qualitatively different way for each rating. Not that I can see
how YouTube could do this...

------
NathanKP
I commented on the article:

"Five star systems are supposed to work on the basic idea that the one and
five star votes will average out to a value somewhere in the middle. In this
way the two, three, and four star ratings are averages based on the ratio of
one star votes to five star votes."

However, that doesn't always work. People seem to be lazy and they don't want
to judge the comparative value of different items.
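
With hypothetical numbers, that averaging works out like this:

    # 800 five-star votes and 200 one-star votes -- made-up counts.
    ones, fives = 200, 800
    average = (1 * ones + 5 * fives) / (ones + fives)
    print(average)  # 4.2: an in-between rating from purely extreme votes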

------
IsaacL
I found this interesting, as I've just started working again on a web app for
rating learning resources - I made a previous post on HN asking about the
merits of a 5-point system versus a 2-point system (thumbs up / thumbs down).
It seemed the consensus favored the five-point system, as it's more
informative.

However, TC seems to believe that the 5-star system is too poorly defined.
Opinions? I've been wondering whether to include guidelines for the
different rankings (3 stars means this, 4 stars means this, etc.) or just to
let people use their own definitions (since, in all likelihood, the guidelines
would be ignored). I've always thought it strange when reviewers provide a
little box saying "5 stars - Excellent, 4 stars - Good..." and so on, since
most people should have got the idea by now.

I also want to segregate links by level - Getting
Started/Beginner/Intermediate/Advanced - and was trying to think of precise
definitions for each. Again, since people would likely ignore any such
definitions, it might be better to let them use their own. The theory being
that if 100 people thought a link should be in the 'beginner' category, then
others won't be surprised to find it there.

So, give definitions for each ranking, or let users work by their own
interpretations for each ranking?

2 ideas to improve things:

Idea 1: Change the weighting of a vote based on the voter's voting habits.
E.g., if a person only gives 1- and 5-star votes, decrease the weighting of
their votes (a sketch follows at the end of this comment). I doubt I'm the
first person to come up with this idea; does anyone know of any sites that
implement such a scheme?

Idea 2: Users with 'editor' privileges have the ability to move things around
into their 'correct' place. This could make it more useful to have predefined
definitions for each ranking and category.
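
To illustrate Idea 1, here's a sketch of one possible weighting; the
constants are made up, and I don't know of any site that actually does this:

    def voter_weight(history):
        """history: this voter's past star ratings (1-5). Voters who only
        ever hand out 1s and 5s are weighted down, to a floor of 0.2."""
        if len(history) < 5:
            return 1.0  # too little history to judge
        extremes = sum(1 for r in history if r in (1, 5))
        return 1.0 - 0.8 * (extremes / len(history))

    def weighted_rating(votes):
        """votes: list of (rating, voter_history) pairs."""
        total_weight = sum(voter_weight(h) for _, h in votes)
        if total_weight == 0:
            return 0.0
        return sum(voter_weight(h) * r for r, h in votes) / total_weight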

------
JCThoughtscream
I'd say that a rating based on the number of favorites makes more sense. It's
less arbitrary than the five-star system, in that it tracks the nominal level
of interest in the video. People don't generally favorite a video at random,
so any favoriting at all is generally a good sign as to the video's content.

------
brazzy
I really don't see why, as the article claims, the fact that opinions are
subjective and everyone has a different one makes the 1-5 star rating any more
useless than a thumbs up/thumbs down/do nothing or a favorite/not favorite
rating. Those are subjective and different for everyone as well, and the idea
is that it averages out over lots of people.

No, the problem is that people can't be bothered to spend time to think about
just _how much_ they like something, compared to everything else, and that it
can become uncomfortable to try and do that because you realize that your
relative liking may be neither constant nor consistent, and trying to make
it so could be a lot of work (I like A a lot, but B even more... hey, C is
_really_ cool! But some things about A are better than C... now what?).

~~~
potatolicious
_"I really don't see why, as the article claims, the fact that opinions are
subjective and everyone has a different one makes the 1-5 star rating any more
useless than a thumbs up/thumbs down"_

Because you're implementing a system that on paper has a lot more resolution
than what you're really getting. Imagine buying a 1080p HDTV and then showing
only solid colors on it - not only is that a waste of engineering effort, but
building subsequent systems around the validity of your 5-star ratings will
also be fundamentally broken.

Also, based on their data there's a very concrete reason why the 1-5 star
rating is worse than the thumbs up/thumbs down. With the thumbs up/down system
you have a single dimension of data ("likedness"), whereas with the 1-5 star
rating system they're only getting data from people who like the video (and
almost none from people who disliked it, look at the distribution). This makes
the data practically useless for determining the quality and user preference
for a video. Consequently, ranking algorithms just won't work on the star
system: the difference between video #1 and video #100,000 can be the
difference between an average rating of 4.92 and one of 4.8.

 _"Those are subjective and different for everyone as well"_

So are movie ratings - but it's still a very useful metric to a lot of people.
With a large enough sample size you get the lowest common denominator
preference measure - which may be what YouTube wants.

 _"the problem is that people can't be bothered"_

I object to the labeling of basic user behaviour as laziness or some type of
stupidity. Users will behave how they behave - assigning value judgments to
this behaviour just makes you a prick, and disconnects you from your users
(who also happen to be your customers, yay!). If your users aren't using your
system in the way you intended, _you_ need to fix it. Trying to pawn off your
responsibility in the equation as "lazy users can't be bothered" is simply a
cop-out.

~~~
brazzy
_Because you're implementing a system that on paper has a lot more resolution
than what you're really getting. Imagine buying a 1080p HDTV and then showing
only solid colors on it - not only is that a waste of engineering effort_

The difference is that the HDTV costs extra, while the rating resolution does
not.

 _but building subsequent systems around the validity of your 5-star ratings
will also be fundamentally broken._

It depends on how you interpret and use the data.

 _Also, based on their data there's a very concrete reason why the 1-5 star
rating is worse than the thumbs up/thumbs down. With the thumbs up/down system
you have a single dimension of data ("likedness"), whereas with the 1-5 star
rating system they're only getting data from people who like the video (and
almost none from people who disliked it, look at the distribution). This makes
the data practically useless for determining the quality and user preference
for a video._

Again: with people who like a video voting 5 stars and a few people who
dislike it voting 1 star, you have the _exact same_ information as with a
favorite-only voting system, or a thumbs up/down one where few people vote
down (which is very likely). So how is it any more useless? The only real
problem I see is that people probably interpret the ratings differently: to
some, 1 star is "the worst possible vote"; to others it is "one notch better
than no vote".

 _Consequently, ranking algorithms just won't work on the star system: the
difference between video #1 and video #100,000 can be the difference between
an average rating of 4.92 and one of 4.8._

Which is not necessarily a problem, if you weigh the average by the number of
votes. Sure, you don't want to rate a video with one 5-star rating higher than
one with 5000 5-star ratings and one 1-star rating. But you don't have to.
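
One common way to do that weighting (a sketch of the general damped-mean
technique; the constants are arbitrary, and this is not what YouTube does) is
to pull the average toward a prior until enough votes accumulate:

    def damped_average(ratings, prior_mean=3.0, prior_weight=25):
        """A video with a single 5-star vote scores near the prior,
        while thousands of votes dominate it."""
        return ((prior_weight * prior_mean + sum(ratings))
                / (prior_weight + len(ratings)))

    damped_average([5])               # ~3.08, not 5.0
    damped_average([5] * 5000 + [1])  # ~4.99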

 _So are movie ratings - but it's still a very useful metric to a lot of
people. With a large enough sample size you get the lowest common denominator
preference measure - which may be what YouTube wants._

Um, yeah. That's exactly what I said.

 _I object to the labeling of basic user behaviour as laziness or some type of
stupidity._

You're reading something into my comment that I did not say. Actually yeah,
it's lazy... and people have every right to be - they go to YouTube to be
entertained, not because they're being paid for it.

People do something because there are rewards. With user-generated content,
the rewards are generally immaterial: attention, status, self-expression. The
only reward for thoughtful ratings is self-expression, but very limited
because nobody can see your vote. Even a single-word comment is more rewarding
than that. So if you want people to spend the time and effort that thoughtful
fine-grained ratings require, you'd have to add artificial rewards - which is
almost impossible because you can't measure how thoughtful a rating is.

The alternative is to make rating easier, so that more people will do it at
the current reward level. And "easier" here means "requiring less thought".

------
JDigital
The more motivated you are to vote, the more likely you are to vote it a five.
A mediocre video is characterized by a lack of votes, not a preponderance of
three-star votes.

Back when YouTube rounded the average vote result down to the nearest star, I
joked that YouTube videos really only had three ratings:

Five stars: New video (eventually it will accrue a vote other than 'five' and
drop to four stars)

Four stars: Top notch (mainly fives, a few troll one-star votes)

Three stars: Disgusting trash (at least as many one-star votes as fives)

You don't see anything less than three-star since it will have been deleted by
the time you get there.

------
johnfn
The way to make ratings more valuable can be seen on a site like
www.rateyourmusic.com. You only get one vote per item, and voting again just
changes your previous vote. Furthermore, you can see your own voting
distribution (this usually cows people into not just voting 5 on everything,
because that makes you look like an idiot). On the other hand, YouTubers
probably aren't too concerned with looking like an idiot...

------
bdmac97
Kinda always had a feeling that's what happens, but it's nice to see graphical
proof! I feel good now about choosing a +/- only rating for launchly.

~~~
chrischen
You still have to be careful with a +/- rating system, because for it to work
you'd need a constant number of viewers for every item. The more the actual
number of viewers deviates from that ideal constant, the more unfair the
voting is. For example, a popular item gets more raw votes simply because a
larger sample of people were exposed to it. So a +/- system probably must be
expressed as a percentage to correct for this problem. However, I can see a
5-star system being improved by showing some sort of relative data too.
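
The standard way to express that correction is a confidence bound on the
positive-vote fraction rather than the raw percentage; a sketch of the
well-known Wilson lower bound (not anything launchly specifically uses):

    import math

    def wilson_lower_bound(ups, downs, z=1.96):
        """Lower bound of the 95% confidence interval for the true
        fraction of positive votes; small samples get pulled down."""
        n = ups + downs
        if n == 0:
            return 0.0
        p = ups / n
        centre = p + z * z / (2 * n)
        spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return (centre - spread) / (1 + z * z / n)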

So I guess a +/- system may not necessarily be more advantageous. It all
depends on how carefully and correctly you interpret the data.

------
rsheridan6
It really doesn't matter very much. The main thing the rating system does is
allow you to avoid utterly crappy or misrepresented videos. I don't see what
difference it makes whether you're avoiding one and two star videos or videos
with too many thumbs down.

I don't think changing to a thumbs up or down system would affect my
experience of YouTube one way or another.

------
chrischen
YouTube was slow to realize it... You don't need to know statistics to see
that 5-star ratings are inaccurate.

