
Ratings inflation on Uber and elsewhere - imartin2k
https://qz.com/1244155/good-luck-leaving-your-uber-driver-less-than-five-stars/
======
manigandham
The problem is that a 5-star rating system is almost never a good fit. The
stars are meaningless and it ends up coalescing into some tiny fractional
scale.

My Uber ride either gets me to my destination safely, reliably, and on time, or
it doesn't. It's just a yes or no, and should be rated as such. This would
smooth out subjective reviews and give drivers a much fairer rating.
Support requests can take care of any specific complaints. Likewise tips can
do the same for positive signals.

~~~
vitorgrs
I guess this would be a nice solution. Yes or no, and then compute a
percentage, just like Netflix and some other services.

So let's say that if you have less than 30% or so, you're out of
Uber. And if you have more than 85%, you can be part of Uber X VIP. (At least
here in Brazil, if you use Uber a lot, you earn one month of Uber X VIP, which
only matches you with top-rated drivers. If you use it a lot again, you earn
another month, and so on.)
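A minimal sketch of the yes/no-plus-percentage idea, assuming the 30% and 85% cut-offs from the comment above. The `DriverVotes` class, the `tier` function and the `MIN_VOTES` guard are hypothetical illustrations, not anything Uber is known to run.

```python
from dataclasses import dataclass

# Hypothetical cut-offs taken from the comment above; a real service would tune them.
REMOVE_BELOW = 0.30  # "less than 30% ... you get out of Uber"
VIP_ABOVE = 0.85     # "more than 85% ... Uber X VIP"
MIN_VOTES = 20       # assumed guard so a handful of votes can't decide a tier


@dataclass
class DriverVotes:
    thumbs_up: int
    thumbs_down: int

    @property
    def total(self) -> int:
        return self.thumbs_up + self.thumbs_down

    def approval(self) -> float:
        """Share of 'yes' answers, between 0.0 and 1.0."""
        return self.thumbs_up / self.total if self.total else 0.0


def tier(votes: DriverVotes) -> str:
    """Map a driver's approval percentage to a (hypothetical) tier."""
    if votes.total < MIN_VOTES:
        return "new"
    p = votes.approval()
    if p < REMOVE_BELOW:
        return "removed"
    if p > VIP_ABOVE:
        return "vip"
    return "standard"


print(tier(DriverVotes(thumbs_up=90, thumbs_down=10)))  # vip
```

The minimum-vote guard is a design choice on top of the comment: without it, a brand-new driver with one "no" would sit at 0% and be removed immediately.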

~~~
polskibus
What if it didn't get you on time because of you or because of an external
event that couldn't have been planned for? Life is not black and white.

~~~
jdmichal
That's why it's not a no-tolerance policy. But, you will probably find that
some people tend to "attract" such external events. Because they don't leave
enough time to account for _any_ externalities. Or, in this particular case,
maybe they actually are a bad driver who does not know routes well enough to
avoid anticipated traffic patterns. And so on...

If Uber really cares about _why_ the service was bad, they can always break
out multiple yes-no questions. (They probably don't, though.)

------
V-2
The thing that annoys me with Uber is that one common (at least where I'm
located) misbehaviour doesn't get penalized with their rating system at all...
You find a ride, but the guy doesn't even get going for 10 minutes. Then
they simply cancel it, and now you're late.

Besides, I don't want to be connected again to someone I ranked negatively
once (even if other people gave them 5 stars). I don't think such blacklisting
- even in mild form, e.g. with some sort of expiration period - is implemented.
[EDIT: apparently it is, although the app never explicitly states that].

This is to say that, from my perspective, inflation isn't the only, or even
the main, inefficiency of Uber's system.

~~~
estsauver
This is an ENORMOUS problem in Kenya. Many drivers do this to "protest" Uber
as well. Some of the drivers will accept the fare and then just not come if
you're paying by credit card.

I would happily pay the minimum fare to give these drivers a 1-star review.

~~~
amelius
I think it is only fair. These drivers had reasonably good jobs until Uber
came along and took part of their income.

~~~
V-2
So it's like protesting against eBay by tricking ordinary people into placing
orders you have no intention of ever shipping?

Just because someone has a beef with Uber doesn't mean it's a good excuse for
making me late by trolling me into a transaction they are not willing to
fulfill to begin with. I'm not Uber. It's like getting a job at McDonald's and
sabotaging it from the inside (because they made your hot-dog stand go bust).

If that's the level of social ethics they demonstrated at their prior
reasonably good jobs, I'm not too surprised about their losing them.

~~~
amelius
I don't think that's reasonable. Stuff like this (delaying people) happens
with strikes all the time. Besides, Uber has a history of ignoring laws and
regulations themselves.

These people are being sent into a race to the bottom by an outsider. An
outsider who just happened to have attracted a big pile of money in a faraway
country.

~~~
V-2
> _Stuff like this (delaying people) happens with strikes all the time._

It's not the same thing at all. Strikes may delay people, and they do, but
they don't involve deception.

Strikes, especially, though not exclusively, transport strikes, are
_announced_ - and announced in advance.

Even if - say - your local surgery strikes, surely they won't book you an
appointment, then trick you into sitting in their waiting room while no one is
actually going to see you.

~~~
amelius
The comparison to surgeons is not adequate. We're talking here about people
with little savings, and little to no other job opportunities.

The rebellious acts of these people and the collateral inconveniences they
cause are rather small compared to the moral injustice that is imposed on
these people.

And yes, you, as a client of Uber are also guilty to some extent.

Perhaps ask yourself this question: what amount of money would be necessary to
put _me_ out of business? Imagine that someone stands up and brings together a
bunch of VCs and collects that money. Imagine that you have invested a lot in
your current business, and you have nowhere to go if you lost that business.
That's the situation we're talking about here.

~~~
V-2
> _The comparison to surgeons is not adequate_

I meant a surgery as in primary care physicians or dentists, not surgeons. It
could be a barber though, or whatever really. You're missing the point I laid
out explicitly - strikes aren't based on deception, and are _announced_.

And it's for a reason: to maximize the impact it has on the employer, while at
the same time minimizing it for the "guilty to some extent" average people.
Which is the exact opposite of the strategy you described.

~~~
toomuchtodo
Deception is required when you're fighting against an adversary (Uber) where
there is a drastic power imbalance.

Uber does not fight fair, so neither should those fighting against
them.

------
Legogris
Has anyone else observed that ratings inflation is even more extreme in the
USA than in Europe or Asia? This goes back to eBay, where "A+++ five stars"
basically means "I got what I ordered and it arrived on time" or "I didn't, but
the seller resolved it reasonably and in a timely manner".

As others have stated, a single-dimension scale is difficult for Uber, where it
conflates the basics (get to my destination in a safe and hassle-free way)
with the extra mile (some drivers are extra nice, offer you water and snacks,
and let you play your music, etc.).

If everyone who fulfills the basics is supposed to get five stars, how do we
signal the extra mile except for tips?

~~~
slivym
Ask a Brit how their day is and they'll say "Not bad" - meaning everything is
going fine and they're happy. Ask a Californian how their day is and they'll
say "Not bad" - meaning something is seriously wrong, and you need to ask
about it.

Some cultures in America naturally tend to 'Fantastic' as their base line of
descriptive language. This probably translates to ratings too.

~~~
tauntz
Ask an Estonian the same question and they'll answer with "it's OK", "meh" or
they'll just shrug. Which means the same - everything is going fine and
they're happy :)

~~~
mantas
As a Lithuanian, I feel an urge to give a full story of my day when asked "how
is it going". Don't want to hear the long story? Ask if everything is OK.

------
hitekker
I once took an Uber with a driver who was deaf and couldn't speak English. In
a normal situation, these two qualities were forgivable: but when he drove
away with my luggage, I struggled to even communicate, let alone chase him
down.

Being told in clipped, broken, heavily-accented sentences: "Go to Uber", "call
Uber", "it's with Uber". But Uber _where_, exactly?

While I was calling all the nearby Uber locations, I kept thinking, "If this
guy got 4.6 stars, do the stars mean anything? At all?"

I finally retrieved my stuff when his daughter took the phone and explained
that he had accidentally taken it and then dropped it off at his local
Uber... two or three cities away.

All in all, an unpleasant eye-opener.

~~~
tmalsburg2
What do you think we should learn from this anecdote (especially given that we
don't know what the driver's rating was)?

~~~
hitekker
>this guy got 4.6 stars, do the stars mean anything? At all?

~~~
tmalsburg2
Ah, I interpreted the "if" (missing in your quote) as a true conditional (not
implying that he actually did have 4.6). Thanks for clarifying. In my
experience, anything below 5 is potentially a red flag on Uber/Lyft.

------
raz32dust
This is extremely relevant and overlooked. Sharing economy relies heavily on
ratings for trust. That's literally the only thing that separates Uber from
taxi services and AirBnb from hotel services, and why it is considered OK that
they don't have to go through the same licensing and vetting process. I am
surprised that the companies themselves are not as concerned about this. This
whole gig economy is going to fall apart real quick if this isn't brought
under control.

Edit: I think I see why the companies don't need to be too concerned. They
only need to remove the worst actors. The ones that could get them into media
(e.g., someone ends up with an Uber driver showing criminal tendencies). For
that, this system probably works. Anyone who didn't make you feel like your
life is at risk is a 5-star. This situation is unlikely to change unless there
is competitive pressure on these companies. So it is very important to prevent
these companies from monopolizing the sharing economy. Once older cabs and
hotels are gone, and these companies have monopoly, we could be left with no
option but to take an Uber/AirBnb with unreliable, terrible and probably even
dangerous "service" and no legal recourse.

~~~
icebraining
While I'm in full agreement on the need for more competition in these
industries - which thankfully seems to be happening - I don't think it's
important to prevent this particular issue.

Uber has no incentive to keep bad drivers around; after all, it costs them
little to fire them and onboard a new one, and there are plenty of people
willing to drive a car.

Besides, it's not such a big problem. What ends up happening is that they
raise the bar; e.g. an average of 4.6 becomes the equivalent of a 3. A friend
of mine has a house on AirBNB with a rating of 3.9, and it's essentially
invisible - you'll only find it if you set very specific filters.

Specifically, they can use percentile ranks to do the "translation":
[https://en.wikipedia.org/wiki/Percentile_rank](https://en.wikipedia.org/wiki/Percentile_rank)
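A minimal sketch of the percentile-rank "translation", using the common definition PR = 100 × (count below + half of ties) / N. The sample city averages and the `to_stars` mapping back onto a 1-5 scale are made up purely for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(scores, x):
    """Percentile rank of x within scores:
    100 * (count strictly below x + 0.5 * count equal to x) / N."""
    ordered = sorted(scores)
    below = bisect_left(ordered, x)
    equal = bisect_right(ordered, x) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def to_stars(rank):
    """Hypothetical linear mapping of a 0-100 percentile rank onto 1-5 stars."""
    return 1 + 4 * rank / 100


# Made-up driver averages for one city, bunched up near the top of the scale.
city_averages = [4.2, 4.5, 4.6, 4.7, 4.8, 4.8, 4.9, 4.9, 5.0, 5.0]
print(percentile_rank(city_averages, 4.6))  # 25.0
```

Because the rank only depends on ordering, a 4.6 that looks "almost perfect" on the raw scale lands in the bottom quarter here, which is the kind of re-scaling the comment describes.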

------
aaronbrethorst
Much like how YouTube abandoned ratings in favor of a simple thumbs up/thumbs
down, I wonder if Uber and Lyft will do the same.

[https://daringfireball.net/linked/2017/03/18/youtube-thumbs-stars](https://daringfireball.net/linked/2017/03/18/youtube-thumbs-stars)

~~~
baxtr
Yeah, same with Netflix. To be honest, though, I don’t get it (I’ve read the
Gruber piece, btw). The reason I don’t get it: now, with all the thumbs up and
down, how am I as a customer supposed to make an informed decision? Before,
Netflix would suggest shows and I had a very strong (crowd based) filter
mechanism I could rely on. Now they take your feedback and feed their machine
and you have to trust them whenever choosing new stuff. It hasn’t improved my
experience yet. I start watching stuff and then quit frustrated after a short
time.

~~~
maxyme
In Netflix's case I think that's the point. A ton of their originals had
terrible ratings. The new system doesn't display ratings anymore so their
originals look better.

~~~
hobofan
> A ton of their originals had terrible ratings.

Maybe for you. The Netflix ratings were always a personalized rating (which is
explained in one popup on sign-up but never mentioned again). I think people
not understanding that was also one of the reasons they abandoned the star
ratings and changed the language to "90% match" in the transition.

------
V-2
It could be replaced with concrete questions. In case of Uber:

Was your driver late?

Was your driver polite?

Did you feel the way they drove was safe?

Did you have trouble communicating with the driver?

(Yes, no, kind of).

ASK FOR FACTS, NOT OPINIONS.

I understand customers can't be bothered to complete a full-blown
questionnaire, but the system could pick one or two questions at random after
every ride. It could even incentivize answering two instead of one, or three
instead of two (as in Google's "Local Guide").

Over time, data will accumulate.
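A minimal sketch of "pick one or two questions at random after every ride" and letting answers accumulate per driver. The question list and answer options come from the comment; `pick_questions` and `FeedbackLog` are hypothetical names, and storage is just an in-memory counter.

```python
import random
from collections import Counter, defaultdict

# Questions and answer options from the comment above.
QUESTIONS = [
    "Was your driver late?",
    "Was your driver polite?",
    "Did you feel the way they drove was safe?",
    "Did you have trouble communicating with the driver?",
]
ANSWERS = ("yes", "no", "kind of")


def pick_questions(rng: random.Random, k: int = 1) -> list:
    """Pick k distinct questions at random to show after a ride."""
    return rng.sample(QUESTIONS, k)


class FeedbackLog:
    """Accumulates answers per (driver, question) so data builds up over time."""

    def __init__(self) -> None:
        self._counts = defaultdict(Counter)  # (driver_id, question) -> answer counts

    def record(self, driver_id: str, question: str, answer: str) -> None:
        if question not in QUESTIONS or answer not in ANSWERS:
            raise ValueError("unknown question or answer")
        self._counts[(driver_id, question)][answer] += 1

    def tally(self, driver_id: str, question: str) -> dict:
        return dict(self._counts.get((driver_id, question), Counter()))


# Usage: after a ride, ask two random questions and log the answers.
rng = random.Random(42)
log = FeedbackLog()
for q in pick_questions(rng, k=2):
    log.record("driver-1", q, "yes")
```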

~~~
bpicolo
> Was your driver late

This would be a poor question. Any perceived lateness is because Uber
underestimates the time-to-pickup.

In Manhattan, it feels like they pretty consistently underestimate it by 50%.

~~~
pimlottc
On the contrary, it's definitely not a poor question, it's very valuable
feedback -- for Uber. I agree it's not something that should be levied on the
driver, though.

~~~
bpicolo
Maybe? They definitely have exact data on how long it takes drivers to arrive.
But yeah, I'm referring to using it as a judgment of the driver.

I wouldn't be the least bit surprised if they had already figured out that
people are more likely to take trips if they intentionally underestimate.

------
lsc
Even within an office, it's a general rule of office politics that when asked,
if you can say something good about someone without lying, you should do so,
and you should generally avoid saying bad things about other people.

I mean, there are exceptions, but at the individual contributor level, that's
a pretty good heuristic.

~~~
scrrr
IMHO this is a good comment. It explains why ratings tend to be inflated, but
I think it also indirectly asks in what kind of society we want to live.

(I think our profession is vulnerable to unadjusted people making the rules
for the rest.)

------
whack
This seems like something that can be corrected via normalization at the
per-user level. Find the average rating (and variation) for each individual
user, and use these two metrics to transform all user-supplied ratings into
normalized ratings. For example, if someone always gives a 5 rating, then
treat all their 5 ratings as 3s, and any sub-5 ratings as 1s. Conversely, if
someone mostly gives 2s, then any of their 3s/4s should be treated as 5s.
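One way to sketch this is a per-user z-score, assuming "one standard deviation = one star" around a neutral 3. Both that scaling and the `normalize_user_ratings` name are assumptions, and the output is deliberately not clamped to 1-5.

```python
from statistics import mean, pstdev

NEUTRAL = 3.0  # assumed: an "ordinary" rating maps to the middle of the 1-5 scale


def normalize_user_ratings(ratings):
    """Re-express one user's raw ratings relative to that user's own habits.

    Each rating becomes NEUTRAL + z, where z is how many standard deviations
    the rating sits from the user's personal mean (one deviation = one star).
    """
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        # Someone who gives every driver the same score tells us nothing
        # about any particular ride, so every rating becomes neutral.
        return [NEUTRAL] * len(ratings)
    return [NEUTRAL + (r - mu) / sigma for r in ratings]


always_five = normalize_user_ratings([5, 5, 5, 5])     # every 5 becomes a 3
mostly_five = normalize_user_ratings([5, 5, 5, 5, 4])  # the lone 4 drops to about 1
mostly_two = normalize_user_ratings([2, 2, 2, 3, 4])   # the 4 rises to about 4.75
```

This reproduces the "always 5 → 3, sub-5 → 1" behaviour from the comment; the "mostly 2s" case lands near, but not exactly on, 5.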

~~~
Steuard
I once had similar thoughts about, say, movie rating systems like IMDB. But
you can pretty quickly convince yourself that straightforward ways of
implementing this would open themselves up to horrific levels of system gaming
very easily: a user who gives everyone a 2 but occasionally gives their best
friend a 5 will wind up having that "5" normalized to some astronomically
great rating (which even by itself could heavily inflate the friend's
average). And every fix for that sort of thing that I came up with opened the
door to yet another exploitable flaw or failure to accurately reflect user
preferences.

It's a hard problem. And especially in cases like Uber, I don't think there's
a technical fix: as long as the company puts the cutoff for continued
employment above the second-highest rating, there's a high perceived moral
cost to giving a rating less than the maximum.

~~~
whack
_> a user who gives everyone a 2 but occasionally gives their best friend a 5
will wind up having that "5" normalized to some astronomically great rating_

You can just cap the max value at 5, to avoid someone getting an
_"astronomically great rating"_.

Rating normalization isn't meant to prevent rating-fraud, nor does it make the
problem significantly better/worse than it currently is. But it will solve the
problem of rating inflation.

Not to mention that once users know that their ratings are being normalized
anyway, they won't feel the need to give everyone a 5-star rating.
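To make Steuard's scenario and the proposed cap concrete, here is a small self-contained sketch. It assumes the same z-score-around-the-midpoint scaling as above, which is an assumption rather than anything either commenter specified. For a single outlier among n ratings, the z-score works out to sqrt(n - 1), so it grows without bound as the harsh rater rates more drivers.

```python
from statistics import mean, pstdev


def normalize(ratings, lo=1.0, hi=5.0, cap=True):
    """Per-user z-score normalisation centred on the scale's midpoint.

    With cap=True every normalised rating is clamped to [lo, hi],
    which is the fix suggested in the comment above.
    """
    centre = (lo + hi) / 2
    mu, sigma = mean(ratings), pstdev(ratings)
    if sigma == 0:
        return [centre] * len(ratings)
    scores = [centre + (r - mu) / sigma for r in ratings]
    return [min(max(s, lo), hi) for s in scores] if cap else scores


# Steuard's scenario: a harsh rater who gives 99 drivers a 2 and one friend a 5.
harsh = [2] * 99 + [5]
print(normalize(harsh, cap=False)[-1])  # roughly 12.95, far off a 5-star scale
print(normalize(harsh)[-1])             # 5.0 once the cap is applied
```

The cap bounds the damage of a single rating, but the friend still receives the maximum, consistent with the point that normalization is not an anti-fraud measure.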

------
Camillo
This is probably also related to what happens with the senseless custom of
tipping. Does anybody actually spend time evaluating the service they got? Do
people carefully adjust the tipped amount to correspond to quality? Unless the
service was a real disaster (happens maybe once a year), I just give 20% all
the time, because I don't even want to think about it. Same thing with Lyft or
Uber, five stars and move on.

And note that tipping also saw inflation. If you read old newspapers, the
standard amount used to be 15%, and 10% before that. At least with stars it's
capped at five.

------
boubiyeah
My experience with tripadvisor:

4.5: Should be fine if there are many votes and the top ones don't look forged

4: average

3.5: Avoid at all cost

People are so bad at giving objective ratings. I often see people mostly rate
their friends and the good time they had: "had a very good evening with my
friends, blabla". OK, but what about the food, or the service? Seriously.

~~~
dazc
Also bear in mind, it's almost impossible to give a rating lower than 3 on a
lot of these sites (not sure about TA specifically, though).

~~~
gruez
why?

~~~
dazc
Last time I tried there wasn't an option to give an overall review, it was
split into stuff like:

Cleanliness: 1-5; Reception: 1-5; Location: 1-5

There were about 6 or so of these and I gave a 1 for each (since zero wasn't
an option). My final score was 3.5.

------
gnicholas
Reminds me of Airbnb. I’ve stayed in several dumps that somehow garnered
amazing reviews.

~~~
smelendez
I've found the text reviews more useful.

If the reviews all say "Place was a bit messy but the price was right and the
location was great" versus "too bad this beautiful home is so far from the
train station," you'll know whether or not the place is right for you.

And people generally won't downvote for a negative they knew going in.

------
zakk
The solution is simple: ratings are inflated because they are an endless
resource. Let’s make them a scarce resource.

It’s what works in academia, where anyone can get arbitrarily close to a
4.0/4.0 GPA if grades are inflated enough, but clearly only 10% of
students can fit in the top 10% of the class, making the latter rating much
more significant.

A simple example (I am sure there are better ways of doing this): after ten
Uber rides you are given five stars, which you can distribute among the
drivers you liked the most.
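A minimal sketch of the scarce-star idea, assuming exactly the numbers in the comment (five stars per ten rides). `StarBudget` is a hypothetical name, and the sketch does not check that the rider actually rode with the driver they reward.

```python
from collections import Counter

# Numbers from the comment above; both are hypothetical tuning knobs.
RIDES_PER_GRANT = 10  # "after ten Uber rides..."
STARS_PER_GRANT = 5   # "...you are given five stars"


class StarBudget:
    """Tracks one rider's scarce stars: earned by riding, spent on drivers."""

    def __init__(self) -> None:
        self.rides = 0
        self.spent = 0
        self.awarded: Counter = Counter()  # driver_id -> stars given by this rider

    @property
    def available(self) -> int:
        earned = (self.rides // RIDES_PER_GRANT) * STARS_PER_GRANT
        return earned - self.spent

    def complete_ride(self) -> None:
        self.rides += 1

    def give(self, driver_id: str, stars: int = 1) -> None:
        if stars <= 0:
            raise ValueError("stars must be positive")
        if stars > self.available:
            raise ValueError("not enough stars left")
        self.spent += stars
        self.awarded[driver_id] += stars
```

Because stars are earned at a fixed rate and can only be spent once, a driver's total reflects how often riders chose them over everyone else, much like a class rank rather than a GPA.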

~~~
ACow_Adonis
Except it doesn't work in academia, because it's the same system that's
commonly present in corporate-land, and in practice it's horrible and has
multiple flaws.

~~~
zakk
It’s not perfect, but it’s much more meaningful than GPA.

~~~
ACow_Adonis
Is it? Understand I am in no way defending GPA, but I don't think it can be
taken as given that one works better than the other without some evidence.

GPA in practice has numerous problems on which I could write a whole other
book, but where it fails it is often for social reasons which would equally
apply to and be gamed by people under the "limited resource/grade to the
curve" model.

You will see people start to game away and politicise how the "stars" are
awarded and distributed in both. In the latter specifically, you run the real
risk of having to start to fail genuinely good people and pass genuinely bad
people. And there is no universal algorithm used by people in their subjective
distribution of stars that allows one to compare or interpret the distribution
of stars after the fact or between different "ecosystems".

------
lsc
seems to me like an easy fix would be to simply adjust the ratings based on
the person doing the rating... if someone almost always gives five stars, and
they gave a four? there was probably a reason. but that person who always
gives 3 stars? four stars from them is probably a good thing.

~~~
s73v3r_
Well, the other thing is that people know that the drivers have a ludicrously
high rating level they have to maintain. Basically, giving anything less than
a 5 means that they should be fired. Most people, while they may not have had
an outstanding ride, don't think the ride was bad enough to warrant someone
getting fired, so they just rate 5 stars.

~~~
lsc
right, I think most people give mostly fives because of this. I've probably
given fewer than five less-than-five-star ratings, and I've been using these
services almost daily for more than a year.

But, some people never give 5 star ratings. The way things are now? I think
that what a driver's rating is depends more on if they get me or if they get a
"nobody's perfect" rater than it depends on their actual driving ability.

If the ratings were adjusted by the rater, I think the end result would be a
rating system that better reflected the things they are trying to rate than
the luck of who they get rating them.

------
guyzero
I feel like this is an article that could only be written by someone who has
never owned a car. One, because it's clearly about Uber. But two because
rating inflation has been going on at car dealerships for over a decade.

For those who don't know, when you get your car serviced you're asked to fill
out a satisfaction survey. And they all, ALL, have this disclaimer at the top:
if you can't give us a 9 or 10, talk to us first. Because they're pretty
heavily penalized if they don't get an average rating of 9/10 or better.

It's this bizarre game - they could just say "we want to make you happy so
talk to us if you have a problem" but corporate HQ doesn't trust local dealers
to do this so they survey them. And of course local dealers game the surveys.

------
ThrustVectoring
The thing with ratings like these is that they're always effectively asking
"should this person continue to work for us?"

There are two answers to that question. If they should, you give them the
maximum on every numeric rating. If they shouldn't, you file a customer
service complaint.

------
lozenge
The article completely misses the reason Uber does this: it looks good when
your ride request gets picked up by a driver with a five star rating, even if
you know what it means. It is similar to how a takeaway aggregator company in
London asks customers to rate their restaurant on a six-star scale.

So, it's yet another dark pattern.

------
citizenpaul
This is because in general people have absolutely no idea how to properly rank
and review things. Another part is that people generally don't have time, so
they review only when they hate or love something. The last is that people
review for personal gain or retribution, with no basis in the product or
service.

I admit that until I was asked to judge an event and received a short training
session on judging, I also fell into this category. I now realize how bad I
was. There should be some sort of how-to-rank/judge/review class that
everyone takes in school.

------
oh-kumudo
But people readjust it anyway. In my area, 4.8 is like a baseline good
driver, anything around 4.6 indicates some problem, and 4.5 and below is a
no-go area.

------
4ad
I have the opposite problem. Here in Austria drivers give me, a passenger, bad
ratings for absolutely no reason whatsoever.

I had a perfect 5-star rating for many years, then I visited a 3rd world
country where I had some problem with the Uber driver who was supposed to
pick me up from the airport. He gave me a one-star rating. Then other Uber
drivers in the same country all gave me bad ratings. I have no idea why.
Perhaps because I didn't leave a tip? I don't know.

Then I came back to Austria, my rating was around 4.90, but then it continued
to drop slowly, and it's still dropping. Now it's 4.62. I'm having trouble
getting rides, presumably because drivers reject me. I used to be picked up by
4.9+ drivers, but now it takes up to 10 minutes to find a ride, and they are
all 4.4 drivers.

I don't understand why my rating keeps dropping and dropping, especially since
it used to be a constant 5, in the same country, for many years.

~~~
ThrustVectoring
My guess is that there's an informal "I'll rate the driver five stars if they
rate the passenger five stars" system that's emerged, and once you don't have
five-star history drivers suspect that you won't give a five-star rating back.

------
tempestn
I'm thinking Uber might be able to create a better driver quality score using
meaningful user inputs rather than ratings. For example, after a ride, it
could give the user the option to prefer the same driver in the future if
available, or even to wait up to X minutes extra for this same driver (or
something along those lines). The percentage of users who choose these options
would be a very clear signal of driver quality, and would be less likely to be
gamed because the user has a clear motivation to answer truthfully.

I expect there's other info that could be used to calculate the score as well.
Average tips compared to other drivers in the area comes to mind, although
that could have negative consequences (mainly increased pressure to tip).
Perhaps if it were combined with other inputs though. I'm sure there are other
options too.
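A minimal sketch of turning the proposed "prefer this driver again" and "wait up to X extra minutes" choices into a per-driver score. The `RideFeedback` fields and the `driver_signal` function are hypothetical, and combining the two shares into a single number is left open, as in the comment.

```python
from dataclasses import dataclass


@dataclass
class RideFeedback:
    driver_id: str
    prefer_again: bool   # rider asked to be matched with this driver again if available
    extra_wait_min: int  # extra minutes the rider agreed to wait for this driver (0 = none)


def driver_signal(feedback, driver_id):
    """Share of this driver's riders who opted in to each 'rematch' option."""
    rides = [f for f in feedback if f.driver_id == driver_id]
    if not rides:
        return None  # no data yet
    n = len(rides)
    return {
        "prefer_again": sum(f.prefer_again for f in rides) / n,
        "would_wait": sum(f.extra_wait_min > 0 for f in rides) / n,
    }
```

The appeal, as the comment argues, is that these answers change what the rider actually gets next time, so there is less reason to answer them out of politice or pity.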

~~~
dalore
That would be good as the driver would start to learn your pattern and be
ready for your ride (if you're a regular).

------
hackermailman
Homakov recently wrote about the uselessness of AirBnb ratings vs.
Booking/Agoda, too

[https://medium.com/@homakov/why-airbnb-reviews-are-flawed-b8a961e04c32](https://medium.com/@homakov/why-airbnb-reviews-are-flawed-b8a961e04c32)

------
dalore
It would be nice if the questions were factual (like Google's guides) but
that's too many questions. For a simple one input perhaps rating out of 110%
would be clearer.

Was your job/ride successful? 0-110% ? (and probably use intervals of 10).

Then most people for the average ride where it works will give 100% as the job
was done. If the driver went above and beyond, you can give them 110%. If you
pick 110%, an extra option asks, "What did the driver do that was above and
beyond?" This stresses that 110% is for exceptional service, and also
collects what makes it exceptional.

For rides where the driver didn't do the job properly, e.g. took a wrong turn
or was late, you can mark it accordingly: 90%, 80%, etc.

~~~
notahacker
I like this idea a lot, but can easily imagine Uber screwing it up by setting
100% as the "this driver is at risk of being dropped" threshold so everybody
feels obligated to give 110% if the driver's been basically polite and on time
anyway...

~~~
dalore
Yeah, it's a social problem. But by making a 110% rating slightly more
cumbersome to give than a 100% rating, you ensure that people giving 110% went
out of their way to do so when they could have just pressed OK.

Actually, offering even fewer options and removing the number system
altogether might make it simpler.

Scale would be:

- Horrible/bad/did not do the job at all/etc.: for when the job/ride was not
  completed or something truly awful happened (the wording can be played
  with). When choosing this, you get a big box to type in the issue, and
  customer support should reach out and see what's wrong.
- Did the job, but slight issues: when choosing this, you have to say what the
  slight issues were, so there is still some friction in choosing this option.
- Did the job, no issues: this will be the default, friction-free option. If
  they pick this, they don't have to type anything else. Quick and easy.
- Exceptional service: if they pick this, they are also forced to type in what
  the exceptional service was.
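A minimal sketch of the four-level scale with deliberate friction, assuming "friction" means a required free-text comment for every option except the default, and that only the worst option is escalated to support. The enum and function names are hypothetical.

```python
from enum import Enum


class RideOutcome(Enum):
    HORRIBLE = "did not do the job / something truly awful happened"
    SLIGHT_ISSUES = "did the job, but with slight issues"
    NO_ISSUES = "did the job, no issues"
    EXCEPTIONAL = "exceptional service"


# Every option except the default asks the rider to type an explanation,
# which is the deliberate "friction" described above.
NEEDS_COMMENT = {RideOutcome.HORRIBLE, RideOutcome.SLIGHT_ISSUES, RideOutcome.EXCEPTIONAL}


def submit_feedback(outcome=RideOutcome.NO_ISSUES, comment=""):
    """Validate one rider's feedback; NO_ISSUES is the friction-free default."""
    comment = comment.strip()
    if outcome in NEEDS_COMMENT and not comment:
        raise ValueError(f"please tell us more about: {outcome.value}")
    return {
        "outcome": outcome.name,
        "comment": comment,
        # "a customer support service should reach out and see what's wrong"
        "escalate_to_support": outcome is RideOutcome.HORRIBLE,
    }
```

Tapping through with the default costs nothing, while both the top and bottom choices cost the rider a sentence, which is the asymmetry the comment relies on.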

You could even socially tweak/test it by requiring people to tip if they pick
exceptional service (effectively putting their money where their mouth is:
if it's exceptional, you need to pay). In countries where tipping is
optional this is a good play (but in the USA maybe not so much).

The tipping part is just an idea. But the point is to add a bit more friction
to choosing exceptional than to just choosing "job done". Another question is
how to translate that culturally to other markets. Making users type more in
those situations should funnel only users willing to type into giving those
ratings.

And an option is for users to see not only the ratings of drivers but what
other people typed in the exceptional (or typed in the slight problems)
categories.

------
V-2
What I think about drivers' ratings:

4.5 - he (or she) is gonna be polite

4.6 - ooh, I (mildly) wonder why that is

4.7 - nothing

4.8 - nothing

4.9 - nothing

5.0 - oh gosh, they're still learning

------
makecheck
Meta-reviews might help; for instance, on Slashdot, periodically being shown
random comments from others, with ratings, where you can say whether or not
you agree with the original rating. At least that way, _some_ additional
scrutiny occurs (maybe not objective individually but probably an improvement
when aggregated).

I guess in Uber’s case this might be implemented as: randomly showing previous
riders a rating for a driver they had, and asking if they agree.

------
M_Bakhtiari
What's wrong with the drivers' own solution, but built into the app, displayed
to every customer, and relabeled with descriptions that are actually
reasonable, e.g. "exceptionally bad", "acceptable" and "exceptionally good",
with follow-up to make sure the median driver lands squarely on "acceptable"?

------
bradleyjg
Why is it that so many people are volunteering to act as free management
consultants to Uber in the first place? They have billions of dollars, they
can hire secret shoppers to evaluate their employees.

------
lvoudour
Eponymous reviews are a double-edged sword. On the one hand there's trust:
other customers know it's a real person assessing a service they actually
used. On the other hand that person can't be 100% honest because there are
social repercussions.

Making the reviews private (as the study suggests) may solve the social
pressure issue but it will damage trust (how do I know the 2 stars are
legitimate and not a company's way to oust specific
contractors/landlords/etc?)

------
lw
This is great news – it makes the service economy more ethical. If a negative
rating is left only when people were truly dissatisfied with the ride, and not
every missing smile or snippy response receives assessment like in the
dystopian Netflix show Black Mirror, we are actually treating our service
contractors as humans.

I also think it's completely fine that people are rating apps and toasters
more negatively, because most of the time, the source of dissatisfaction (e.g.
a bug) can be weeded out once and for all, while being an Uber driver is
emotional work 10 hours a day.

