
Is That Review a Fake?  - cwan
http://www.nytimes.com/interactive/2011/08/20/business/20110820-is-that-review-a-fake.html?src=tp
======
nazgulnarsil
Fake reputation is an interesting problem space for libertarians to grapple
with, since reputation is their solution to so many problems. One of my ideas
in this space is a Bitcoin-like system that maintains a public ledger of
contracts you've entered into and completed, as verified by a variety of
sources. People could supply their public key, allowing you to check their
reputation history, and sign contracts with their private key.
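A minimal sketch of the idea (purely illustrative: a real system would use actual key pairs and digital signatures, e.g. ECDSA, where this toy version identifies parties by bare names and only hash-chains the entries):

```python
import hashlib
import json

class ReputationLedger:
    """Toy append-only ledger of contract outcomes, hash-chained so that
    tampering with any past entry is detectable. Not a real blockchain:
    there is no signing, no consensus, and no key management here."""

    def __init__(self):
        self.entries = []

    def record(self, party, contract, completed):
        """Append one contract outcome, linked to the previous entry's hash."""
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"party": party, "contract": contract,
                "completed": completed, "prev": prev}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})

    def verify_chain(self):
        """Recompute every hash; any edit to a past entry breaks the chain."""
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("party", "contract", "completed", "prev")}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != digest:
                return False
            prev = e["hash"]
        return True

    def history(self, party):
        """The reputation history a counterparty would inspect."""
        return [(e["contract"], e["completed"])
                for e in self.entries if e["party"] == party]
```

Anyone holding the ledger can replay it and check a party's completion record without trusting the party themselves.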

~~~
spottiness
We are experimenting with reputation derived from complete anonymity since we
believe that anonymity is required to maximize sincerity. In our current
stage, we're manually moderating all the content generated, rejecting what
looks fake based on common sense and intuition. If we ever generate traffic
above a certain threshold, we have a mechanism where the content is validated
by the creators themselves. Basically, the writers of opinions ("spots", in
our terminology) will have to evaluate other opinions before they can post
their own, and a decision is made based on the number of matching judgments
("coincidences") among all the reviewers of a specific posting. We got the
idea from previous work by CMU's Luis von Ahn, in particular his (now
Google's) ESP game.
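The coincidence-counting step could be sketched roughly like this (an illustrative guess at the mechanism described above, not spottiness's actual code; the vote minimum and agreement threshold are made up):

```python
from collections import Counter

def validate_spot(judgments, min_votes=3, agreement=0.75):
    """ESP-game-style validation of a pending review ("spot").
    `judgments` is a list of 'real'/'fake' labels submitted by other
    writers, who must evaluate existing opinions before posting their
    own. The spot is decided only once enough independent evaluators
    coincide on a label."""
    if len(judgments) < min_votes:
        return "pending"   # not enough evaluations collected yet
    label, count = Counter(judgments).most_common(1)[0]
    if count / len(judgments) >= agreement:
        return "accepted" if label == "real" else "rejected"
    return "pending"       # no consensus; keep collecting judgments
```

As in the ESP game, independent agreement is the signal: a lone evaluator can't accept or reject anything on their own.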

These are examples of anonymous reviews about hotels that we have on our site:

A positive one => <http://www.spottiness.com/spots/BHBZ8QJT>

A negative one => <http://www.spottiness.com/spots/RKTPXLJJ>

Interestingly, they don't have the strong deceptive indicators...

~~~
watmough
"We are experimenting with reputation derived from complete anonymity since we
believe that anonymity is required to maximize sincerity. "

Interesting. Where do "real name" Amazon.com reviews fit in? These make a
selling point of being attributable to real people, and to me, imply
sincerity, since often these reviewers seem to write reviews almost as a
hobby, and often make a point of covering both good and bad aspects of a
product.

There's an interesting dynamic here, since Amazon is vanishingly unlikely to
harass you on the web, unlike say an ebay seller, who might well come after
you if you leave anything other than a perfect review. In this case you are
likely to be anonymous as far as everyone but the seller is concerned, yet
being sincere may carry some risk to your own ebay account.

~~~
spottiness
Amazon is a good example where "real name" reviews cause a bias towards
positive reviews. I believe that most people are reluctant to write negative
reviews if their real names are associated with them, even when the fairness
of the review is not in question. Negative reviews always put a negative halo
on the reviewer, so good reviews are overwhelmingly more common. Curiously, on
our site, where anonymity is a requirement, positive reviews also dominate by
far, which indicates that there's much more good than bad in the world. Cool!

~~~
marze
"in our site, where anonymity is a requirement, also the positive reviews
dominate by far, which is indicating that there's much more good than bad in
the world"

Or it indicates a lot of fake reviews, or something in between.

~~~
spottiness
You mean a lot of "positive" fake reviews, in which case people tend to fake
positive opinions much more than negative ones. Definitely better than the
opposite...

------
JonnieCache
Direct link to the paper: <http://aclweb.org/anthology/P/P11/P11-1032.pdf>

~~~
tansey
Interesting. To summarize their methodology:

- Take the top 20 hotels from TripAdvisor.

- Filter out all non-five-star reviews, plus any non-English, excessively
short, or first-time reviews. Then sample (via a log-normal distribution on
review length) 20 reviews per hotel. This is their "real" dataset.

- Use Mechanical Turk to collect 400 reviews for these hotels. Turkers are
instructed to pretend they work for the marketing department of the hotel and
to write deceptively fake reviews. Again, quality filters on length, user
approval rating, and deduplication are applied. Turkers are paid $1 per
review. This creates their "fake" dataset.

I suppose one could still argue that there are selection bias issues here. The
sample size is also moderate. Nevertheless, it's a novel approach and you have
to start somewhere. Interesting work.
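The length-matched sampling step in that summary might look something like this (a sketch only; `mu` and `sigma` are illustrative placeholders, not the parameters the paper actually fitted):

```python
import math
import random

def sample_by_lognormal_length(reviews, k=20, mu=5.0, sigma=0.8, seed=0):
    """Draw k reviews, weighting each review by a log-normal density
    evaluated at its word count, so the sample's length distribution
    roughly follows the target log-normal rather than the raw pool's."""
    def lognormal_pdf(x):
        # density of LogNormal(mu, sigma) at x > 0
        return (math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2))
                / (x * sigma * math.sqrt(2 * math.pi)))

    rng = random.Random(seed)  # seeded for reproducibility
    weights = [lognormal_pdf(max(len(r.split()), 1)) for r in reviews]
    return rng.choices(reviews, weights=weights, k=k)
```

The point of the weighting is to keep the "real" set's review lengths comparable to the fake set's, so length alone can't separate the classes.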

~~~
patio11
It seems like writing a naive Bayesian classifier for "Was this written by a
Turker?" should be like taking candy from a baby who hates candy, has very
slippery fingers, and is unconscious.
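For the curious, the kind of bag-of-words classifier being joked about fits in a few lines (a toy sketch with made-up labels and training examples, nothing like the paper's actual features or data):

```python
import math
from collections import Counter

def train_nb(docs):
    """Train a tiny multinomial naive Bayes model on bag-of-words.
    `docs` is a list of (text, label) pairs."""
    priors, counts, totals = Counter(), {}, Counter()
    for text, label in docs:
        priors[label] += 1
        words = text.lower().split()
        counts.setdefault(label, Counter()).update(words)
        totals[label] += len(words)
    vocab = {w for c in counts.values() for w in c}
    return priors, counts, totals, vocab

def classify(model, text):
    """Pick the label maximizing log P(label) + sum log P(word|label)."""
    priors, counts, totals, vocab = model
    n = sum(priors.values())
    best, best_lp = None, float("-inf")
    for label in priors:
        lp = math.log(priors[label] / n)
        for w in text.lower().split():
            # add-one (Laplace) smoothing over the shared vocabulary
            lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Whether candy-from-a-baby accuracy follows is, of course, the empirical question the paper is about.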

~~~
crdoconnor
If it was that easy, why did their human judges fail at it?

~~~
socratic
Does the experimental setup for the human judges sound fair to you?

For example, the naive Bayes classifier knows the a priori distribution of
review spam (which appears to be held to 50%), but do the undergraduate human
judges? It would appear not, given that one judge only labeled 12% deceptive.

Likewise, were the human judges able to see examples of truthful and deceptive
reviews before beginning the task? (In other words, are the human judges
solving a different problem, e.g. "deception detection", than the classifier,
e.g. "similarity to prior deceptive reviews from Turkers"?)

If these are differences between the human and computer annotator setups, are
they major differences? Can you spot any other big differences between the two
experimental setups?

------
cpeterso
I usually find the low-star reviews more informative. What can the 5-star
reviews do other than affirm the quality claims of the reviewed
book/movie/hotel/restaurant? I read 1-star reviews for a laugh and 2- and
3-star reviews to be informed by reviewers who care enough to write a review
but whose opinions are not extreme.

~~~
hugh3
Indeed. While there are always _some_ one-star reviews no matter how good the
thing being reviewed might be, the best guide is just to read them and see
whether they sound like rational complaints from sensible people. If the one-
star reviews say things like "the checkin girl rolled her eyes at us the third
time we asked for our room to be changed to one with a better view" then it's
probably an alright hotel. If they say "filthy, noisy, unsanitary" then I'm
probably not gonna stay there.

------
tathagatadg
It might be interesting to note that at UIC, we did a data mining course
research project on this topic - <http://www.cs.uic.edu/~liub/#projects>. We
used resellerratings.com, and apart from a handpicked few, there was no way of
determining a pattern. There's no luxury of Mechanical Turk for a course
project, and we were strictly forbidden from writing any reviews. I had to use
the number of reviews, variance in similarity of the review text, time
interval between posts, "user since" date, helpful-review count, average
rating (user/review/store), etc., and calculated the Mahalanobis distance
[<http://en.wikipedia.org/wiki/Mahalanobis_distance>] to separate outliers as
the spammers. Then, with these as labeled spammers, I used graph-based
semi-supervised learning to classify the reviewers as spammers. It's a
wide-open problem - and very easy to argue both sides of any method :D
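The outlier step generalizes roughly as follows (a sketch, not the course project's code; the 3.0 cutoff and the toy feature vectors in the usage are illustrative):

```python
import numpy as np

def mahalanobis_outliers(features, threshold=3.0):
    """Flag rows whose Mahalanobis distance from the mean feature
    vector exceeds the threshold. Each row is one reviewer's feature
    vector (e.g. review count, rating variance, posting interval)."""
    X = np.asarray(features, dtype=float)
    diffs = X - X.mean(axis=0)
    # pseudo-inverse guards against a singular covariance matrix
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    # per-row sqrt(d^T @ inv_cov @ d)
    d = np.sqrt(np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs))
    return d > threshold
```

Unlike plain Euclidean distance, this accounts for correlations between the features, so a reviewer isn't flagged just for being extreme along a direction where everyone varies a lot.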

------
amcintyre
If I was writing a real review, it might look a lot like the one marked up
with all the "deceptive" indicators, with the exception that I'm not a big
exclamation mark user.

The grumpy old man in me wants to suggest that anyone who actually learned how
to write would be marked as a fake. Is well-written English so hard to come by
nowadays that it makes people suspicious when they see it online?

(Edited to add: apologies if I'm saying something that's covered in the
article--couldn't get past the paywall.)

~~~
billswift
That was similar to my impression: that is not the way I write, but most
normal people's reviews would trigger the "deceptive indicators" presented on
the linked page.

------
bmac27
As these kinds of issues continue to call into question the reputation of
(mostly) anonymous review sites, reliance on reviews from your social graph
(i.e., folks you presumably know & trust) could continue to rise,
particularly if the search engines remain unable to differentiate between
spammers and legit reviewers.

No wonder Google proactively sought out that kid who worked on the paper.

~~~
skimbrel
Yelp has had this for a while: when you visit a business page, the first
reviews they show you are ones from your friends, if available. It's a good
way to make sure you're not seeing spam reviews right off the bat, but it does
rely on you actually engaging with Yelp's social features.

~~~
jjb123
"while", "friends", "good", "you're", "you"... yelp spammer!

------
chaz
The related parent article is worth a read as well: "In a Race to Out-Rave,
5-Star Web Reviews Go for $5."
([http://www.nytimes.com/2011/08/20/technology/finding-fake-reviews-online.html](http://www.nytimes.com/2011/08/20/technology/finding-fake-reviews-online.html))

------
kleiba
The interesting point lies in how they construct their training data. In
addition to producing fake reviews via Mechanical Turk themselves, they use a
complex procedure to retrieve actual positive reviews from TripAdvisor.

...which is kind of funny if you think about it: to develop a classifier that
can identify real reviews, the first thing they do is _create a classifier
that can identify real reviews_ to produce a training corpus.

Then they try to approximate the output of their first (rule-based) classifier
with a machine learning classifier.

------
vaksel
Personally, I focus on negative reviews.

But then you have to grapple with whether or not the bad review is true
(i.e., a competitor posting it). In those cases it's best to focus on places
where the reviewer's reputation can be checked.

I.e., all those review sites that let anyone review are more or less
worthless. But a blog post by someone with hundreds of posts is a lot more
likely to be a real story.

------
radog
Shrewd (though likely just very lucky and timely) choice of topic. Looks like
they can either (i) take their talents to any one of 10-15 companies that
desperately need improvement in this area OR (ii) create a very promising
startup and perhaps get taken out in a talent acquisition.

------
marze
As businesses get more adept at gaming the online review process, the rewards
for a system capable of a high "honest review" fraction will increase.

------
kahawe
Sometimes I still cannot believe how prevalent fake reviews are.

Not too long ago a friend of mine contributed a chapter to a programming book
for one of the major publishers, and they made sure to tell him to ask his
friends to write reviews for the book on Amazon, down to detailed instructions
on how to fake it "properly" and in accordance with how they were trying to
position it on the market - like saying that "it is great for beginners to
learn X", but one should also add a negative point about how this-and-that
chapter needed more clarification or more examples or images.

------
donnyg107
As if the bots won't adjust their syntax when the software becomes more
comprehensive. Silly Cornell, you're up against spammers, not actual robots.

~~~
Vadoff
If you read the article, it's a system to detect fake reviews in general -
there's no mention of bots or spammers at all. By looking at their methodology
it's clear that it targets people who write fake reviews, not bots.

Also, it's not as if every spammer in the world is aware of this software and
will adjust to it accordingly. In fact, I'd say the vast majority wouldn't
even be aware.

