Then trying to conclude that some convoluted scatter plot system makes more sense is laughable.
Not to mention, this system is still just a star rating system. It would be no different from having two histograms side by side... assuming, of course, that you'd even want to rate different aspects of the same thing.
I can't even imagine scatter plots on Amazon, or trying to convince the general public that "it makes more sense".
Their argument that histograms are just.awful. (I didn't care for the extra periods) seems to have two components:
1. asserting that histograms are bad
2. showing us 3 histograms and saying they tell you the same thing about all 3 movies, when in fact there is a very clear and important difference between the histograms.
It's completely obvious from inspection ("people are really good at seeing patterns") that Starship Troopers has a much lower percentage of 5-star ratings than the other two, and a much higher fraction of 0-star ratings. It also appears to me that The Fifth Element has a higher fraction of 4- or 5-star ratings, and is probably the most appreciated of the 3 films, although Blade Runner is fairly close.
If you are going to cherry-pick a set of 3 specific films to make your point, you should at least be sure to pick 3 films that support your point instead of refuting it.
We then learn about their hypothesis that while the 5-star rating system sucks, a system that relies on two correlated 5-star ratings is great. They demonstrate this by using the two-question system to draw the exact same conclusion I drew from the histograms of the one-question ratings.
I would've liked some sort of objective attempt to compare the two rating systems. Perhaps it would be possible to measure how frequently the two-question system leads people to make a better choice than the one-question system, or at least some sort of statistical wonkery that would purport to show me that the two-question system in practice draws more distinctions than the one-question system. Unfortunately we only get this one rather uninspired example ("watch this if you’re in the mood for something really good").
They also didn't address why "would you re-watch this film?" is a better choice than any other second question. There are attempts to justify it as a good question, but no real evidence that other questions were tried and performed worse.
Finally, the thing that really irked me was that this proposed system doesn't seem to do anything to address most of the actual problems with the regular 5-star system: people who feel really strongly about something are more likely to rate, so most ratings tend towards the extremes, and without context we have no idea why someone rated something a 5 instead of a 1. Those problems now exist along two dimensions instead of one.
I see this less as an article and more as an advertisement hitching a ride on an xkcd comic.
Speaking as a film buff: this is actually quite a good guide to the sort of movie it is (when combined with quality). If lots of people mark it good quality, but wouldn't watch it again, that implies that you have to be in the right mood for it.
If people mark rewatchability high, even if the quality rating varies, you know it is a much more easy-going film.
And so on. Combining data points is good :)
Oh well I'll answer your question anyway. Because 90% of everything is crap (probably more than that on P2P sites). Re-watching a movie can be like walking in the same park more than once, looking at pleasant things with a sense of recognition. You could put a movie in to suit your mood. Sometimes you'd rather watch a good "original" movie again than watch yet another crappy remake or rip-off.
This is not about picking a movie to watch again after you have already seen it once: this is about picking a movie you haven't seen but that is so good that other people like to watch it again and again.
So rewatchability roughly translates to "emotional connection + quality".
If you've got kids, you know they'll just watch the same flashy animated movies over and over again, even if their opinion of the content is "meh".
What would probably happen is that family films, especially animated ones, would have skewed results.
Plus, they do not normally use online rating systems...
Yes it would, and the article shows why and how. Scatter plots are easy to read (at least for computationally/mathematically educated people). Nobody finds it easy to read two histograms side by side and spot the correlations between them.
Also, side-by-side histograms aren't the only way to display two parameters: what about stacked histograms? They scale up to an arbitrary number of parameters, and everybody who can read a histogram can read a stacked histogram. Scatter plots seem like thermonuclear overkill for a problem which most movie sites seem to consider "solved".
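To make the stacked option concrete, here's a minimal matplotlib sketch; the counts are made up for illustration:

    import matplotlib.pyplot as plt
    import numpy as np

    # Hypothetical counts: how many ratings each star level received,
    # for two rated aspects of the same film.
    stars = np.arange(1, 6)
    quality = np.array([30, 45, 80, 120, 90])
    rewatch = np.array([50, 60, 70, 75, 40])

    plt.bar(stars, quality, label="quality")
    plt.bar(stars, rewatch, bottom=quality, label="rewatchability")
    plt.xlabel("stars")
    plt.ylabel("number of ratings")
    plt.legend()
    plt.show()

One bar per star level, one stripe per aspect; a third aspect is just one more bar() call with an updated bottom.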
Read the text of the article describing each movie. The correlations are absolutely meaningless. He doesn't hit on them at all. In fact, with comments like "almost nobody" he's specifically looking at averages. Then coming to a conclusion effectively based on the average of Score 1 and the average of Score 2 to determine what type of movie it is.
So on this Quality/Rewatchability grading system:
Starship Troopers: 3:4
The Fifth Element: 4:4
Blade Runner: 4.5:3.8
Drop that in place of the graphs and the conclusions would sound just as valid.
It would be nice if the article actually compared different visualizations of the same data, rather than showing histograms of 2 separate data sets and scatter plots of a 3rd data set.
A scatterplot conveys objectively more information.
Actually it is fundamentally different as the histograms show aggregate data and scatterplots show individual data points.
As far as reading the distribution goes, two separate histograms are definitely more readable and understandable. Adding the complexity of a scatter plot because it also shows a correlation that people aren't actually interested in makes things less understandable.
And when I am looking at user rankings for a movie, almost by definition I am only concerned with rankings for movies I haven't yet seen, since I already have an internal self-ranking for a movie I've seen already.
Histograms are extremely useful for knowing the spread of rankings, which the scatter plot also illuminates.
It looks like they would still give a good picture of the data, as the trends are mostly linear.
If you gave someone those 3 histograms and those 3 scatterplots, I bet they could match them up correctly.
Aside- The scatterplots are an awful user interface because of the cognitive effort to interpret them, but perhaps there's a way to present the same information in a usable way.
Star scores are an attempt to map a qualitative experience (enjoyment of the film) to some quantitative measure. Which is fine if you just want to get a sense of 'how much' somebody liked something. If I say I give scotch A a 5, scotch B a 3, and scotch C a 4, then you know that I like the scotches in A, C, B order. It's a shorthand way to express my personal ordering of a qualitative experience, just like we use the words 'good', 'better', 'best'.
The problem is that this data is not really numerical, so even basic mathematical operations don't make any sense. When we add 2 heights, 2 masses, 2 speeds, etc., the result makes sense. But not so with ratings. Even a basic difference doesn't make sense: is the difference between 5 and 4 stars the same as between 4 and 3 stars? There is no 'unit' distance in the scoring system, so doing any sort of averaging is just going to give you nearly meaningless results.
This implies that the arithmetic mean is a broken concept; the _median_, however, should still survive intact. I thought about ways to implement this in Software Center, but I'm still not quite sure what a good algorithm for ordinal rating data would look like.
Please feel free to post ideas on this stackexchange question: http://stats.stackexchange.com/questions/19115/how-do-i-sort...
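For the kind of histogram data a rating backend already has, the grouped median is one obvious candidate; here's a toy sketch using the textbook within-bin interpolation (my own illustration, not anything Software Center actually ships):

    def median_star(counts):
        """Median star level from a histogram: counts[i] is the number
        of (i+1)-star ratings. Interpolates within the median bin (the
        standard grouped-median formula) so rankings are less coarse."""
        total = sum(counts)
        if total == 0:
            return 0.0
        half = total / 2.0
        cumulative = 0
        for i, c in enumerate(counts):
            cumulative += c
            if cumulative >= half:
                below = cumulative - c
                # the bin for star i+1 spans [i+0.5, i+1.5]
                return (i + 0.5) + (half - below) / c
        return float(len(counts))

    print(median_star([10, 5, 20, 30, 35]))  # 4.0 -- skews high
    print(median_star([35, 30, 20, 5, 10]))  # 2.0 -- skews low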
A store that is also a movie theatre could do away with numeric representations by just watching what the users are doing with its content. Things like "did they finish watching the movie?" or "did they get through the whole thing in one sitting?" could be helpful. Not to mention you could actually see whether those titles were being watched again or not.
But there's still the problem of how to communicate the findings to the user, or formulate them.
The first time I really noticed the problem was when I published my own Flash game on Kongregate and started paying closer attention to the ratings. That led me to examine my own rating habits, and I conjectured that this is probably what happens with everyone else too.
The bias I'm talking about is caused by the fact that most people can't be bothered to rate something. Most people only rate something when there's a powerful impulse to do so, so most of the votes will be 5 stars or 1 star. The 4-star ratings come from people who liked something enough to be moved to rate it, but not enough to gush about it; note that the group of people who make that distinction is already substantially smaller than the 5- and 1-star reviewers. The rest come from a very small minority, most of whom are people who didn't have anything better to do at that moment and decided to spend some time rating, but don't do it on a regular basis.
By the way, I realize that this is just a conjecture, but from what I've seen so far, it seems to be pretty accurate.
I think that introducing an additional axis will only exacerbate this, by raising the bar for rating. If the act of rating starts demanding more effort, you'll get a distribution that is even more skewed than now.
The two improvements I would like to see are:
1. a system that infers ratings from users' actions
2. better mechanisms for gauging the relevance of someone's review/rating based on my preferences/tastes
The first would help reduce the bias and the second would help me extract more useful information from the biased dataset. (A toy sketch of the first idea is below.)
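Something like this toy function, say, where every signal name and weight is invented for illustration rather than taken from any real service:

    def implied_rating(fraction_consumed, revisits, abandoned_early):
        """Infer a 1-5 'rating' from behaviour instead of asking for one."""
        score = 1.0 + 4.0 * fraction_consumed    # map completion onto 1..5
        score += min(revisits, 3) * 0.25          # coming back is a good sign
        if abandoned_early:
            score -= 1.5                          # bailing out quickly is not
        return max(1.0, min(5.0, score))

    print(implied_rating(0.95, revisits=2, abandoned_early=False))  # 5.0
    print(implied_rating(0.20, revisits=0, abandoned_early=True))   # 1.0

Every user then contributes a data point, not just the ones with a powerful impulse to vote.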
As a bonus, I've noticed I can usually find the best reviews on Amazon by looking for 3-star ratings, and to a lesser extent 4 and 2 stars. People who rate something a 3 have looked at the pros and cons of the product, and generally compare it to similar goods. 3-star reviews usually provide FAR more information than glowing 5's or glowering 1's.
"Who watches the watchmen?" might in this context be paraphrased as "Who is rating the raters?" The hope in any online rating system is that enough people will come forward to rate something you care about that the people with crazy opinions will be mere outliers among the majority of raters who share your well-informed opinions. But how do you ever know that when you see an online rating of something you haven't personally experienced?
Amazon has had star ratings for a long time. I largely ignore them. I read the reviews. For mathematics books (the thing I shop for the most on Amazon), I look for people writing reviews who have read other good mathematics books and who compare the book I don't know to books I do know. If an undergraduate student whines, "This book is really hard, and does a poor job of explaining the subject" while a mathematics professor says, "This book is more rigorous than most other treatments of the subject," I am likely to conclude that the book is a good book, ESPECIALLY if I can find comments about it being a good treatment of the subject on websites that review several titles at once, as for example websites that advise self-learners on how to study mathematics.
The problem with any commercial website with ratings (Amazon, Yelp, etc., etc.) is that there is HUGE incentive to game the ratings. Authors post bad ratings for books by other authors. The mother and sister and cousins of a restaurant owner post great ratings for their relative's restaurant, and lousy ratings for competing restaurants. I usually have no idea what bias enters into an online rating. So I try to look for the written descriptions of the good or service being sold, and I try to look for signals that the rater isn't just making things up and really knows what the competing offerings are like. When I am shopping for something, I ask my friends (via Facebook, often enough) for their personal recommendations of whatever I am shopping for. Online ratings are hopelessly broken, because of lack of authentication of the basis of knowledge of the raters, so minor details of dimensions of rating or of data display are of little consequence for improving online ratings.
While I agree that this is a problem, I think a bigger problem is a simple matter of scale:
Amazon is huge, and many people buy things, but they don't split reviews and ratings by what kind of person is rating them. If they wanted to make an improvement, why not show me only ratings and reviews by people who are similar to me? They have tons of data about me and other people who use the service, so it should be possible for them to say "people like you rated this on average a 4, but everyone in the world rates it an average of 2.5."
That's much easier than having to read all the reviews and decide if the person is in my demographic or whether I agree with their review.
Netflix does. By cross referencing your likes and dislikes against those of your fellow Netflix members, the company is able to create a meta rating system, in which the score you see for a movie is your own. You see that score because that's how much Netflix thinks you'll like it, based on how similar people liked it.
This is the only good way of going about this method. The trick is, it's easy to do this with movies, but much more difficult with product ratings and the like. Maybe this is an opportunity for someone to build something on top of Facebook or Amazon.
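For the curious, the core of the "people like you rated this a 4" idea is plain user-based collaborative filtering. A toy sketch of the textbook technique, not Netflix's or Amazon's actual system:

    import numpy as np

    def predict(ratings, me, item, k=2):
        """Predict ratings[me][item] from the k most similar users.
        ratings is a users x items array where 0 means 'not rated'."""
        sims = []
        for u in range(ratings.shape[0]):
            if u == me or ratings[u, item] == 0:
                continue
            both = (ratings[me] > 0) & (ratings[u] > 0)  # co-rated items
            if both.sum() < 2:
                continue
            a, b = ratings[me, both], ratings[u, both]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            if denom > 0:
                sims.append((a.dot(b) / denom, ratings[u, item]))
        if not sims:
            return None  # in practice, fall back to the global average
        top = sorted(sims, reverse=True)[:k]
        return sum(s * r for s, r in top) / sum(s for s, _ in top)

    ratings = np.array([[5, 4, 0, 1],
                        [4, 5, 4, 1],
                        [1, 1, 2, 5]])
    print(predict(ratings, me=0, item=2))  # ~3.4, leaning on similar user 1

The hard part at Amazon scale isn't this math, it's doing it over hundreds of millions of sparse rows.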
It works exceptionally well. You just listen to the stream of incoming songs; you never pick songs yourself. After a good song you click like, after a mediocre song you keep listening, and during a bad song you click skip. After a few days you will only get good songs (with a few exceptions, of course)! It's like magic. I can't even count how many new bands I found through Pandora without any effort.
Too bad it doesn't work outside US anymore. :(
Once you realize that people have different tastes and you know someone's preferences that is the obvious solution. Or is the process of crawling through that much statistical data that expensive that it can only be offered to paying subscribers?
I think Amazon's "Was this rating helpful (Yes/No)?" provides a good filter for ratings. A lot of mindlessly negative reviews get filtered out by the users who come along afterwards and rate the rating in their own self-interest.
Indeed, my suspicion is that organizational politics have more to do with the lack of a better rating system than any technical limitation.
The approach I use is to read 3 star ratings first before biasing myself with the more extreme ratings. I also check to see what else the reviewer has rated and if there's nothing there then I immediately dismiss the review.
N people rated this better than X movie, but less than Y movie.
Ranking movies can be easy. Show 5 movie posters instead of 5 stars, or have an auto-complete field for "this movie is up there with: ___".
I've often thought about some start-up ideas around relative ratings, and this book was the reason:
 - http://danariely.com/
However, the type of rating mentioned in the OP and the type of rating on Netflix only seem to work in specific niches. I can't imagine how a website like Amazon would implement anything even close to what Goodfilms is doing.
When you first get on the website, you're asked to rate a number of products. The more you rate, the more accurate the solution becomes.
When it can couple your answers with users who gave the same answers, you now see those users' ratings first for new products. You can even ask them why they disliked/liked something if they didn't write a review, because their opinion matters to you now.
While words do express the point better, this rating system is a step in the right direction.
E.g. when I go to Amazon I don't buy some random product with a 4.5-star review -- I search for a specific product or a specific kind of product and then reject candidates which are lousy. How is that not INCREDIBLY useful? Similarly, who goes to a movie simply based on whether it's good or not?
In general, if you create any point rating system people who like a thing will tend to rate it towards the top of the scale, e.g. 4/5 or 9/10.
I actually did an informal experiment -- I used to run role-playing tournaments, and do exit surveys on participants. For the first few years we asked players to rate us on a 5-point scale and scored slightly over 4/5 on average. Then we switched to a 10-point scale and scored slightly over 9/10. Not scientific -- but I don't think we suddenly got better.
This finding is backed up by serious research (which is why when a psychologist creates a scale, the numerical ranges need to stay constant in follow-up studies or the results are not statistically comparable).
Netflix, which tries to give users customized ratings, actually subtracts value (in my opinion) from its scores because it tries to make ratings mean "how much will you enjoy this?" BZZZT. I pick stuff for me, my wife, my au pair, and my kids. We don't all like the same stuff, and we don't want to track ratings individually. My kids want good kid stuff. I want good me stuff. Don't try to guess what I like based on our collective tastes.
Early on in the Netflix Challenge, I was able to get myself (very briefly) a leaderboard score with nothing more than analyzing every user's ratings: re-centering them by their mean, and re-scaling them according to their standard deviation. Then, by remembering their translations and scales, I could put a globally predicted score back into their own language.
So just some very basic statistics is sufficient to erase much of the bias toward higher numbers, as well as halo effects and the like.
(I was pretty surprised that Netflix's own algorithm apparently wasn't doing anything this simple)
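That normalization is just per-user standardization. A minimal sketch with made-up ratings:

    import numpy as np

    def normalize(user_ratings):
        """Map one user's ratings to z-scores so a habitual 5-giver and
        a habitual 3-giver become comparable."""
        r = np.asarray(user_ratings, dtype=float)
        mu, sigma = r.mean(), r.std()
        if sigma == 0:
            sigma = 1.0  # user rates everything the same; avoid divide-by-zero
        return (r - mu) / sigma, mu, sigma

    def denormalize(z, mu, sigma):
        """Translate a globally predicted z-score back into the user's
        own rating 'language'."""
        return z * sigma + mu

    generous = [5, 5, 4, 5, 4]
    grumpy = [3, 2, 1, 2, 2]
    _, mu_g, sd_g = normalize(generous)
    _, mu_r, sd_r = normalize(grumpy)
    # The global model predicts "one sd above this user's norm" (z = 1):
    print(denormalize(1.0, mu_g, sd_g))  # ~5.1 for the generous rater
    print(denormalize(1.0, mu_r, sd_r))  # ~2.6 for the grumpy one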
Netflix does have really interesting blind spots. They claim to take ratings seriously, to the point of offering a million dollars for the best rating algorithm. Then, as the GP says, they implement the rating algorithm in a way that renders it completely worthless to any household with more than one viewer.
Netflix does offer us a good demonstration of the failings of absolute technocracy, but it leaves the question of how best to rate movies wide open.
Considering how many Amazon 1-star reviews I find that can be summed up as, "UPS sucked," averages are kinda useless.
If you actually have friends, why don't you ask them for recommendation in person. If your friend is really into arty movies and recommends you an arty movie as being very arty (and well done) you can consider it. Collapsing it into a single number doesn't make sense </rant>
EDIT: that's not to say that the scatter plot isn't an interesting idea, it's just not going to help much because people's background is important for rating
Criticker's rating system is out of 100 points but for each user it scales ratings into tiers (deciles) 1-10. So for someone like me who watches lots of movies that I sort of know I'm gonna like (thanks to Criticker!), most of my ratings end up in the 70 to 100 range, but I still have 5 tiers in that range. The wide range allows the system to adapt to a user's biased view of the scale. Also plenty of users simply keep their rankings from 0-10.
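If anyone is wondering how such tiering might work, here's a guess based on plain percentile cut-offs; Criticker's real formula may well differ:

    import numpy as np

    def to_tier(my_scores, new_score):
        """Place a 0-100 score into this user's own deciles (tiers 1-10)."""
        cutoffs = np.percentile(my_scores, np.arange(10, 100, 10))
        return int(np.searchsorted(cutoffs, new_score, side="right")) + 1

    # A rater whose scores cluster between 70 and 100:
    mine = [72, 75, 78, 80, 82, 85, 88, 90, 93, 97]
    print(to_tier(mine, 79))  # tier 3: a 79 is low praise from this user
    print(to_tier(mine, 96))  # tier 10: near the top even on this scale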
Criticker gives recommendations in two ways. First it predicts my ranking for a movie. So I can just browse unwatched movies and filter them however I like and then sort by how Criticker expects I will rate them. It is actually scary how predictable I am.
The other method of recommendations is to browse users who have very high correlation to my rankings and see what movies they've ranked highly which I have not seen. This might be the best way to find movies. It also seems to be the key to how the expected ratings I mentioned above are computed.
No doubt one of the things that keeps Criticker running so well is a community of serious film buffs. It makes it easy to find movies I would have never heard of otherwise (foreign, limited release, shorts).
A butterfly flaps its wings, xkcd puts up a comic on ratings, someone piggybacks on the comic, it makes the front page of HN, you wander by and mention criticker, a bunch of geeks pile onto the site to check it out... and it ends up crashy for a while.
Cool site, thanks for mentioning it. From what I saw before it went down (too many mysql connections?), it even looks like I can export my ratings.
The more a histogram resembles an exponential increase, the better it is. The higher the exponent, the better.
Having a two-dimensional graph might have more information, if the dimensions really matter. I'm doubtful that "stars" and "rewatchable" are really independent, and I'm unsure why I would care about it when I haven't seen the film. (If I have seen the film, I'll have my own opinion and not need the graph.)
I'm all for looking for improvements to the ratings game, though. What seems to work best for me is to actually read the reviews, but that's obviously time-intensive.
It's very accessible for an academic paper.
If you are looking at electronic devices or camera lenses there's the issue that a certain fraction of people get lemons. Some bad reviews are because of that.
Other people have unrealistic expectations of the product and give a bad review.
A histogram gives some immediate insight into this problem, and then looking at stratified samples of the reviews helps there on out.
Now, I will say the star ratings on eBay are weak because a less-than-perfect ranking gets people in trouble, even though "acceptable" performance on eBay covers a considerable range. (It's certainly a worse experience to have a long, confused exchange with somebody with poor English -- this person shouldn't be punished, but they shouldn't be rewarded either.)
Edited to add: "If you're going to get any useful info, you have to read the reviews," is so incredibly true. I'm surprised it's not getting more mentions in these comments. The scatter plot is kind of cool, but I'd so much rather have a histogram and actual reviews to check so I can find out why the product got those ratings.
"How not to sort by average rating" (2009): https://news.ycombinator.com/item?id=3792627 For thumbs-up/thumbs-down systems, suggests using the lower bound of a Wilson confidence interval for a Bernoulli distribution, which is what Reddit does now. Convincingly refuted by How to Count Thumb-Ups and Thumb-Downs: User-Rating based Ranking of Items from an Axiomatic Perspective, http://www.dcs.bbk.ac.uk/~dell/publications/dellzhang_ictir2... by Dell Zhang et al., which argues for simple smoothing with a Dirichlet prior (i.e. (upvotes + x) ÷ (upvotes + x + downvotes + y)), which was also suggested by several people in the comments.
In 2010, William Morgan wrote http://masanjin.net/blog/how-to-rank-products-based-on-user-... partly in response, applying Bayesian statistics to the problem of ranking things rated using 5-star rating systems.
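Both scoring rules are only a few lines each. A sketch, with z = 1.96 for the usual ~95% interval and x = y = 1 as one arbitrary choice of prior:

    import math

    def wilson_lower_bound(ups, downs, z=1.96):
        """Lower bound of the Wilson score interval: rank by the worst
        plausible 'true' upvote fraction given the sample size."""
        n = ups + downs
        if n == 0:
            return 0.0
        p = ups / n
        return ((p + z*z/(2*n)
                 - z * math.sqrt((p*(1-p) + z*z/(4*n)) / n))
                / (1 + z*z/n))

    def dirichlet_smoothed(ups, downs, x=1.0, y=1.0):
        """Zhang et al.'s alternative: add x pseudo up-votes and y
        pseudo down-votes, then take the smoothed fraction."""
        return (ups + x) / (ups + x + downs + y)

    for ups, downs in [(1, 0), (100, 1), (600, 400)]:
        print(ups, downs,
              round(wilson_lower_bound(ups, downs), 3),
              round(dirichlet_smoothed(ups, downs), 3))

Note how (1, 0) stops beating (100, 1) under either rule, which is the whole point versus a raw mean.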
Perhaps related: HotOrNot started out displaying the mean of the rankings as the rating of each photo (after you clicked on it). But they found that there was a gradual drift down in ratings: they started with around 1-5 (out of a theoretical max of 10), then ended up around 1-3, etc., with the predictable damaging effects on egos, people's willingness to post their photos, and the information content of the ratings. The solution they adopted was to display not the mean of ratings but the percentile: a photo rated higher than 76% of other photos would have its "average" displayed as "7.6", even if the mean was 4.5. This trained the users to flatten the histogram!
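That percentile display is trivial to compute. A tiny sketch with invented means:

    # The HotOrNot trick in miniature: show an item's percentile rank
    # among all items instead of its raw mean.
    means = [4.5, 3.1, 2.2, 4.9, 3.8, 2.9, 4.0, 3.3, 2.6, 4.4]

    def displayed_score(mean, all_means):
        rated_below = sum(m < mean for m in all_means)
        return 10.0 * rated_below / len(all_means)  # on a 0-10 scale

    print(displayed_score(4.5, means))  # a 4.5 mean displays as 8.0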
http://www.nashcoding.com/2011/10/28/hackernews-needs-honeyp... suggested that fake "products" to attract ratings could distinguish intelligent ratings from unintelligent ones. Although written about thumbs-up/down systems, it applies to multi-star systems as well.
IMO movie ratings should iterate on Amazon's powerful statement "People that bought this item also bought...". That is, one should look at people with similar tastes and see how those people have rated the movie.
Easier said than done as it needs a ton of data in order for it to work, but that's the only way you're going to get close to more personalized ratings.
There are very high quality, well made movies that I don't like, and there are some really crappy ones that I do. And that's a good distinction to see in a review system.
Because sometimes you just want to watch a good shitty movie, but it's really difficult to tell the good shitty movies from the bad shitty movies when The Brady Bunch movie (brilliant) has the same rating as any Adam Sandler movie (awful).
In my opinion, current ratings systems are 80% UX and 20% data.
For example, Newegg uses a pretty intuitive system of allowing you to sort a product page by Best Reviews and Most Reviews. In my opinion, this allows the user to make a more educated decision if they seek the information out.
If there is a written review component, make a note of the review but don't quantify the value of said review until a minimum threshold is reached.
Probably not very scientific though.
5 stars - OMG I LOVE EVERY PRODUCT
4 stars - Love this product, but I am withholding one star because of _____
3 stars - Everything to me is just meh.
2 stars - I hate everything but this product earned 1 star for ___ and another for ____.
1 star - UPS drop-kicked my item and it arrived late, so this product is trash!
I'd rather a boolean system than one where someone's 4-star rating is different than my 4-star rating. Whenever I see a multi-star rating system, I remember back to a prof I once had that said "The top grade is B+. A's are reserved for God." Albeit disgusting, it taught me that everyone has a different rating scale.
But ratings are a tricky issue and I think they require a more sophisticated mathematical treatment and modeling if one wants to get it right, not just a few histograms that treat all people equal.
There are a few modeling challenges that come to mind. For example, people disagree on the quality of movies based on their taste; this could be modeled as a latent variable that must be inferred for every person in some graphical model. Another relevant variable would be a person's rating habits: some people rate movies 5 or 1, some people have a Gaussian rating centered at some value. These should be explicitly modeled and normalized. Every rating could ideally be used to make a stochastic gradient update to the weights of the network, and since we are dealing with very sparse data, strong priors and a Bayesian treatment seem appropriate. Ratings could then be personalized through an inference process on the graph.
Has anyone heard of a more sophisticated model like this, or any efforts in this direction? I'd like to see more math, modeling and machine learning and less silly counting methods.
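The simplest member of that family is probably biased matrix factorization trained by stochastic gradient descent, roughly the workhorse of the Netflix Prize. A toy sketch with latent user tastes and item traits, but without the priors and full Bayesian treatment asked for above:

    import numpy as np

    # rating ~ global mean + user bias + item bias + user_taste . item_traits
    rng = np.random.default_rng(0)
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 4.0), (2, 2, 2.0)]  # (user, item, stars)
    n_users, n_items, k = 3, 3, 2
    mu = np.mean([r for _, _, r in ratings])
    bu, bi = np.zeros(n_users), np.zeros(n_items)
    P = rng.normal(0, 0.1, (n_users, k))   # latent user tastes
    Q = rng.normal(0, 0.1, (n_items, k))   # latent item traits

    lr, reg = 0.05, 0.02
    for epoch in range(200):
        for u, i, r in ratings:
            e = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (e - reg * bu[u])
            bi[i] += lr * (e - reg * bi[i])
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                          Q[i] + lr * (e * P[u] - reg * Q[i]))

    # Personalized prediction for a (user, item) pair the model never saw:
    print(mu + bu[0] + bi[2] + P[0] @ Q[2])

The graphical-model version swaps these point estimates for posterior inference, but the shape of the problem is the same.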
The classic is in "readers' choice" reviews of restaurants or eateries. Fast-food franchises dominate. Why? Because the philosophy of such sites is often "majority rules", and the establishment (or brand) with the most votes wins. But there are far more McDonald's or Taco Bells than Jack's Cook Shacks or Trader Vic's. Even when the quality of Jack's Cook Shack or Trader Vic's exceeds Taco Bell or McDonald's, it's not going to be reflected in the ratings.
Adjustments can help, such as taking a Likert scale (3-7 points) and adjusting reviews based on the number of reviewers, to give both the actual qualitative assessment and the probable maximal review. This is how sites such as Reddit have adjusted their comment/submission rankings.
The broader and more philosophical problem is that "quality" is not a one-dimensional attribute, interpretation of quality differs among individuals, and "fitness for purpose or task" should be considered when assessing quality as well. McDonalds may very well be appropriate when your goal is a quick, inexpensive meal on the run (a conclusion I'd differ with), while Trader Vics is where you'd head to impress the boss, date, in-laws, or client.
It's a tough problem. It's also one that sees a great many very poor proposed solutions.
The canonical example is the SICP ratings on Amazon: a 3.5 average over 177 ratings, with 96 five-star and 53 one-star.
Pick the MPEG1 versions. They are much larger than the MPEG4 versions, but the text on the projected computer screen is at least readable. IIRC, the MPEG4 files are re-encoded versions of the MPEG1 ones, which themselves were ripped from VHS.
But if you really want to dig into it, you have to consider all kinds of stuff like bimodal distributions of ratings (controversial items), rater quality/consistency, age of ratings, etc., etc.
It's really not as simple as you'd think!
Something I never see addressed: what you want to watch eventually rarely correlates with what you want to watch now. On the whole, you'll spend lots of time putting "good" movies on your queue, but when time comes to pick something and hit Play you'll pass over those and pick some recent release which engages your excitement now and will be long forgotten soon after.
The star rating system confuses this even more by relying entirely on people who will bother to rate something at all - a very different crowd than the "what's good?" and "thrill me now" mindsets.
Insofar as ratings exist, I focus on written 1-star reviews (movies, apps, products, whatever), looking for a subclass of "there really was a particular problem" comments.
I wonder if there's a measurable value for the system. I care about discovery, so I want a site that can recommend me movies that I wouldn't normally think about. How many more movies would this system recommend to me?
A lot of times, I don't care too much about accuracy as long as the system isn't too far off. This is simply because the cost of an inaccurate recommendation isn't too high when I can stream it on Netflix.
I like the idea of providing more data for people to make more accurate assessments, but I don't necessarily believe optimizing for accuracy optimizes the value provided to the user.
Or did I miss the point and rewatchability is just a placeholder for something more useful?
there's a lot of disagreement over whether it's high quality or not, but generally this scores high on rewatchability. So, maybe not the most intelligent movie, but good fun.
What I intuitively deduced from this example is that rewatchability is a metric of enjoyability.
With a single 5-star rating, some people will give 5 stars because they really, really enjoyed the movie, while others will give 5 stars because they thought the film was perfect from a cinematographic-quality point of view (i.e. screenplay, cinematography, acting, casting, etc.; insert some academy-award technical category here).
Take a random sample of the true population of the data set (everyone that has seen a movie) and not just the people that log on to rate it.
A 2-axis system seems like a good idea. But I'd like to see it with 3 options per axis - [UP] [INDIFFERENT] [DOWN].
I'm also interested to know how the system will cope with "controversial" films (Life of Brian, for example) where some people are going to downvote whether they've seen it or not. And they'll campaign and ask all their friends to downvote too.
In their example, Blade Runner has slightly more "meh" votes and Starship Troopers is mostly "meh".
I like the histograms as it reveals a little into the rating. After all, if a single person gave it 1 star and everyone else rated it 3 or more, the average is likely skewed because of that one person who is clearly "gaming" the system because they weren't happy.
I think ratings should take into account intent. If multiple people are rating it 1 star, then clearly it should be weighted downward. However, if a single person out of 100 people gave it 1 star, I don't think the average should be weighted evenly. It's a difficult problem to solve and XKCD is just making a joke.
I find it hard to rely on aggregated ratings for that reason.
When it came to picking movies to watch, I used to love watching Siskel and Ebert, because I knew their tastes.
If only Siskel (whose tastes were more like mine than Ebert's) gave a thumbs up, I knew there was a pretty good chance I'd at least think the movie was "ok". On the other hand, I'd be less likely to give a movie a chance if only Ebert gave a thumbs up.
These days, what I have to do is go to Rotten Tomatoes and take a sampling of four or five reviewers that I trust/like (which actually includes Roger Ebert and a few of the people he used to have as guest reviewers on Ebert & Roeper) and base my decision on that.
The goal of a recommendation system should be to expose me to things I wouldn't be likely to find by myself.
And frankly, who has ever mentally rated a film in terms of "re-watchableness"? People just think in terms of "good" or "bad", and current rating systems a la Amazon leverage that. It's simple, fast, and given the histogram presentation tells me everything I need to know about the number and distribution of votes in a flash. Plus, whether I want to rewatch a film or re-read a book is largely down to my mood at the time. But my opinion on whether it's "good" or "bad" is pretty static.
Maybe Amazon's system is not statistically bullet-proof, but who cares? We're talking movies here: a cheap, casual and discretionary purchase.
I think there was an article about that a while ago on HN.
I'm not sure if the success rate was any better or worse than the online star rating systems of today, but it seemed more fun. However, the barrier to trying something else was also a lot higher if you made a poor choice, which might have had a side effect of narrowing one's tastes.
One of the most interesting features in Pandora is the "Why was this track selected?" action. Imagine something similar where a list of movies and TV shows are presented to you, with sentences for each as to why.
Netflix's recommendations were close, but they still seemed to always focus on one facet at a time, be it a user-predicted rating or a single subcategory of related shows.
Edit: Goodfilms seems to be better in that it tracks two facets at the same time, which does end up creating a diagonal scale from super funny movies you can watch again and again to super serious ones you'll watch once, but that's still not quite like filtering down on tons of facets at the same time.
The closest thing I can think of is the metadata from TV Tropes.
That being said I'm not a fan of the goodfil.ms graph at all. Why do I care about how rewatchable someone else thinks a movie is? Do I want to watch it again? I would know because I'd have seen it.
I think rewatchable is the wrong term. Fun is what you're looking for. Major Payne is a fun movie. It is incredibly rewatchable but I don't need a website to tell me that. I could use a website to point me to it if I haven't seen it before.
I think the site is on the fringe of something important though. There are multiple ways to rate a movie, and mood plays a huge role in what you want to watch at any given moment.
Thus is born a system in which everyone is happy. The site will get more ratings because people want the algorithm to be better and it needs more data for that.
The movie industry will be happy because this points to legal ways to find movies.
We'll all be happy because it's a complex solution to a complex problem, and yet it can still be solved in an elegant and visually stimulating way (i.e. you could color-code the reviewers who are more likely to be similar to you).
"We rate movies on two criteria - ‘quality’ and ‘rewatchability’, so you can admit to your guilty pleasures and properly capture the feeling you get when a film leaves you exhausted."
You are using rewatchability to infer some potentially helpful labels ("guilty fun", "exhausting but worthy"). But there's no guarantee that those inferences are safe/generalisable across viewers/films/genres, or that people who see a rewatchable axis will know to interpret it like you do...
Yes. People don't have the attention span to independently analyze 5 different scatter graphs of 5 similar products. Sometimes the scatter plots can be actually more confusing and less informing than something simple like a histogram.
I firmly believe that people's attention spans are more captured by things like star ratings and histograms. If they get past the stage of their interest being captured, THEN they read the reviews to find out more in-depth information and opinions. I think it's a system that works well, as Amazon has shown.
Ratings are about post-choice satisfaction, not about pre-choice decision making.
Google Tech Talk: http://www.youtube.com/watch?v=Yn7e0J9m6rE
See also: "YouTube: Five Stars Dominate Ratings" (http://youtube-global.blogspot.com/2009/09/five-stars-domina...)
Examples of movies I would highly rate but would not want to rewatch: Schindler's List (1993), Hotel Rwanda (2004), Blindness (2008), Amistad (1997), etc...
(Translation of the emotions used: happy, relaxing, surprising, aggressive, sad, explosive)
I think the solution is knowing who is rating what you're looking at. In the old sense: "quality over quantity." I don't need to know what the whole world gave it, just a few people whom I've come to trust. We're trying to do this here: www.criticrania.com.
I think this is the only real way around this problem.
What's especially difficult with the scatter plot is that it requires you to assess density rather than a simple scalar value. The histograms have 5 numbers to indicate the "weight" of each star, and a bar next to each star visually indicates the proportion of ratings that star received. With the scatter plot, if there are 1000 ratings, how will it look different from a film with 100 ratings? The relative proportions of the ratings will only get muddied with the scatter plot approach.
The other thing about the scatter plot is that it still essentially maps to a 5-star rating, but makes it harder to assess which star. We are expected to visually assess: [1 star] greatest density in the lower left quadrant, [2 stars] greatest density in the middle of the graph, [3 stars] greatest density in the lower right quadrant, [4 stars] greatest density in the upper left quadrant, [5 stars] greatest density in the upper right quadrant. There are only 5 useful density assessments, which brings us back to the same categories as the 5-star system, except that in the scatter plot it's much, much harder to assess which quadrant (star) the ratings map to.

And really, what is the meaningful difference between the 2, 3 and 4 stars in my example? Those density groupings seem almost equivalent (or so some might argue). So in reality the scatter plot will only be meaningful if there is very little deviation between quality and re-watchability (which isn't true), since that would group the ratings and make density easier to assess. If the two diverge frequently, the plots will just be ignored by users, who would otherwise have to assess density in every plot to try to make sense of it. That's hard. That's work. Users don't like to do work.
Finally, re-watchability? The question on a user's mind is, "would I want to watch this movie?" not, "if I saw this movie, would I want to watch it again?" I rarely watch movies again. Even the ones I love and own. That seems to be true of most people. The reasons for wanting to re-watch a movie are unrelated to whether it would be worth watching the first time. I'd argue that re-watching a film is more of a personality type than anything related to a movie.
Good films are watched without pausing, skipping, or rewinding, from beginning to end.
If it's not true in your case then you're not a reliable rating source.
A rating based on this can be fully automated.
No need to depend entirely on this one rating; it could be used for weighting a user's subjective rating. E.g. if the film was watched over 3 evenings, then your 5-star rating is worth 1 star.
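A sketch of that weighting idea, with every threshold and weight invented for illustration:

    def weighted_rating(stars, sittings, fraction_watched, pauses):
        """Discount an explicit star rating when viewing behaviour
        contradicts it; returns (effective stars, weight used)."""
        weight = 1.0
        if sittings >= 3:
            weight *= 0.2            # "watched over 3 evenings": 5 stars -> 1
        if fraction_watched < 0.9:
            weight *= 0.5            # never finished it
        weight *= max(0.5, 1.0 - 0.05 * pauses)
        return stars * weight, weight

    print(weighted_rating(5, sittings=1, fraction_watched=1.0, pauses=0))
    print(weighted_rating(5, sittings=3, fraction_watched=1.0, pauses=2))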
We're curious to see how people respond to rating/giving feedback when it doesn't show up in their social media stream. Will it be more honest because it is direct feedback?
This post has sparked some new ideas for me on developing a new rating system beyond nero, stars, etc.
Ratings systems are worse - they are self-selective, so most adequately-satisfied folks will not take part.
A better system than both, would be to somehow extract ratings from folks behavior. Did they read the whole article? How long did they dwell on each part? Did they return to it more than once? Stuff like that.
They just made a nice blog post to buzz around.
But I looked past that and played a bit with their app, until I found out that everything you did was posted to your timeline by default.
I erased the application from my Facebook.
And the comic is also explained now. Thanks.
If the movie is good, I would like to watch it again. What is the difference?
The acting, directing, cinematography, pacing, and sound design are all excellent. However, the film is such a grueling emotional experience, I don't foresee myself sitting down with it again.
Contrast that to, say, Airplane. It's a good movie. Funny. If I'm bored in a hotel room, I'll watch it again. However, it is emphatically not a "great film."
> Why would I care about rewatchability?
Speaking as a film buff: this is actually quite a good guide to the sort of movie it is (when combined with quality). If lots of people mark it good quality, but wouldn't watch it again, that implies that you have to be in the right mood for it.
How do I glean from this whether or not I will like the movie?
A high-quality movie can be terrible and a low-quality movie can be great. What does quality mean? Does it mean they had good special effects and camera angles? Does it mean the color is true and the acting was great? How does this equate to me liking the movie?
Rewatchability? Many of my favorite movies I would not watch again (Lord of the Rings) because they were so long. There are also so many movies that I want to see in the future that I will choose to watch one of them instead of re-watching one that I have already watched. Re-watching is something that people do less and less as they get older; kids and teenagers maybe do it, but adults (your target audience) not so much.
Also those scatter charts mean the same to my brain as the histograms that you are blasting so I would stick with what people are already used to (the histograms). You are not doing anyone any favors by changing the presentation of the same data. People are not stupid. They will see both equally in most cases but they will prefer the familiarity of the histograms.
Trying to be different is not always the best thing to do. As many have already mentioned. Use machine learning to augment ratings (like Netflix).