
The Problem with Feedback - JeanMarcS
https://www.theatlantic.com/technology/archive/2018/11/why-ratings-and-feedback-forms-dont-work/575455/
======
orev
One issue I have is when feedback is clearly being collected with intent to
manipulate, like with a cable company. They ask you to specifically rate the
individual person, and you know that the person sitting in a call center all
day might lose their job because of a bad rating. You might give a good rating
even if you had bad service (or at least you’ll give a 5 instead of a 3). Then
the company turns around and advertises that they have the best customer
satisfaction ratings, etc., when in reality everyone hates them but doesn’t
want to blame the poor person stuck in the call center.

~~~
maroonblazer
>and you know that the person sitting in a call center all day might lose
their job because of a bad rating.

It's more likely they'd lose their job if they consistently received bad
ratings from several different customers. And they should.

Given how much it costs to bring on new employees, it's more likely that one
bad review results in a consultation, additional training, etc.

~~~
jschwartzi
My girlfriend works in retail, and every month there's someone who gives her a
bad survey because she was following company policy, or because the customer
didn't listen to her when she was advising them. For example, she worked with
a couple to port their phones over from AT&T despite advising them that their
phones might not work on her cell network. After all was said and done, she got
a bad survey because their phones ended up not working well on her network.
Her manager even caught them outright lying in the survey about how the
interaction went down. She was still punished for that survey.

Customer satisfaction surveys can be a useful tool, but they're often applied
as a blunt instrument against employees who have no control over the situation
the customer is dissatisfied with. It's basically a lazy cop-out as opposed to
actually fixing company policy. I tend to ignore them unless it's to give good
feedback.

A more humane system would take the feedback and then interview the employee
in question and their management to understand what the root problem is. But
the management organizations that use these surveys are almost never humane.
Plainly, they don't give a shit about any of their employees.

------
iambateman
Whenever I ask for feedback with friends, I say “on a scale of 1-7” which has
the effect of being both a tad off-putting and rather memorable.

Assuming I care how they respond, there are a few benefits:

- People are conditioned to rate at the extremities with a 10-point scale,
whereas they tend to use the whole spectrum more when the scale is reduced to
seven points.

- Most people don’t realize that “1-10” has no true middle number. On the
other hand, 4/7 is perfectly “in the middle”.

- If you believe psychologists, the average person’s working memory is “7,
+/-2” items. In my opinion, the brain seems to treat the “2” and “3” responses
on a 10-point scale as roughly indistinguishable. There is similar haziness
between “6” and “7”. The average person seems more capable of making
distinctions across seven points.

Anyway, when you’re asking for feedback (and not on a first date), consider
the seven-point scale!

~~~
brianmcc
Curious about why you ask for feedback with friends? Do you mean to get a
rating of a movie or restaurant or something - or are you looking for feedback
on your friendship performance! :-)

~~~
iambateman
Baha good question. Movies and restaurants is it.

I don’t frequently ask friends to rate their interactions with me. ;)

Perhaps I should.

------
a13n
I've been working on a startup in the user feedback space [1] full-time for
the past 3.5 years, and love to see the topic of user feedback on Hacker News.

Uber doesn't want your feedback because they care about your ideas/opinions.
They want it because it's critical to their business.

Put yourself in their shoes. They have 75 million users and 3 million drivers.
A critical part of growth is retention: not bleeding the existing user base
you have, by offering a quality service.

So how do they identify undesirable drivers to maintain the safety and quality
of their service? And in a scalable way...

You ask users to give drivers a review out of 5 stars. Chances are, if
multiple people give a driver 1 star, others will too. It's a pretty good
proxy for driver quality.
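A toy sketch of that kind of proxy (the function name, thresholds, and data here
are my own invention, not Uber's actual logic): a single 1-star review is noise,
but a low average over many rides gets treated as signal.

```python
def flag_low_quality_drivers(ratings_by_driver, min_ratings=20, threshold=4.6):
    """Flag drivers for review once enough riders agree.

    ratings_by_driver: dict mapping driver id -> list of 1-5 star ratings.
    Drivers with too few ratings are skipped; the rest are flagged if
    their average falls below the threshold.
    """
    flagged = []
    for driver, ratings in ratings_by_driver.items():
        if len(ratings) < min_ratings:
            continue  # not enough data to judge yet
        avg = sum(ratings) / len(ratings)
        if avg < threshold:
            flagged.append((driver, round(avg, 2)))
    return flagged

# Example: one driver consistently rated high, one consistently low.
data = {
    "driver_a": [5] * 18 + [4, 4],   # 20 ratings, avg 4.9 -> fine
    "driver_b": [3, 2, 4, 3] * 6,    # 24 ratings, avg 3.0 -> flagged
}
print(flag_low_quality_drivers(data))
```

The point is the scalability: this runs over millions of drivers with no human
in the loop, which is exactly what makes it feel like a cog system.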

I can definitely see how this feels "dehumanizing" and like you're a "cog in
the system". You are a cog in the system. You're a cog in their automated
system for keeping their service safe and high quality.

There are different kinds of user feedback, for different reasons. One kind is
user interviews, where you want to hear people's ideas and opinions for how
you can do better. This is pretty human. Reviews are pretty automated, and
just a different kind of "feedback".

If you run a consumer company (as opposed to B2B), it isn't practical to
conduct user interviews with even 1% of your user base. So it isn't
surprising that giving feedback to consumer companies tends to be a pretty
negative experience.

[1] [https://canny.io](https://canny.io)

[2] [https://www.businessofapps.com/data/uber-statistics/](https://www.businessofapps.com/data/uber-statistics/)

~~~
iaabtpbtpnn
I give Uber drivers 5 stars unless they did something egregiously horrible,
because I know that if I give 4 stars, and enough people do the same, that
driver will be banned from the platform, even though most people would say
that 4 out of 5 ain't bad. Similarly, if I ever actually take the time to fill
out a receipt survey from a retail store, I just give full marks across the
board, because I know that otherwise some poor underpaid single mom is gonna
have to "be accountable" for my responses to an uncaring corporate drone.

On the other hand, a person giving a 1 to an Uber driver or bad marks across
the board on a retail survey is probably unduly pissed off, and upon further
reflection (or exposure to the consequences of their score) even they might
not agree they suffered a truly 1-worthy experience. So my question to you is:
how do you distinguish signal from noise?

~~~
hugh-avherald
I'm surprised Uber doesn't add a remark like "Your rating for <Driver> will be
compared to your ratings of other drivers." beneath the rating. This could (a)
encourage people to give 1-5 star ratings more uniformly and (b) discourage
constantly giving 5-star ratings.

Depending on how much they wanted to get more uniformly distributed ratings,
they could (a) make the rider rating depend on the uniformity of the ratings
given or (b) just use a comparison rating ("Was this better or worse than your
previous trip?").

~~~
lozenge
They don't want uniform ratings. They can either show "average rating 4.7" for
a driver (assuming everybody rates 4 or 5 because they don't want the driver
to be fired), or "approval rating 70%". 4.7 of course sounds better.
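The arithmetic behind that framing, assuming every rider gives either 4 or 5
stars:

```python
# If 70% of riders give 5 stars and 30% give 4 (nobody goes lower,
# to avoid getting the driver fired), the same data yields two very
# different-sounding numbers:
five_star_share = 0.70
avg = five_star_share * 5 + (1 - five_star_share) * 4
print(f"average rating: {avg:.1f}")               # prints 4.7 -- sounds great
print(f"approval rating: {five_star_share:.0%}")  # prints 70% -- sounds mediocre
```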

------
jasonshen
This is a frustrating piece to read as a startup founder and product manager,
because I don't think the author understands how much companies already think
about the issues raised in the piece.

> "If thumbs-ups or ratings on a five-point scale are not automatically
> useful, what kind of feedback would be? Finely tuned feedback that targets
> the system it’s meant to regulate will always surpass a barrage of angry or
> ecstatic reviews. Rather than trumpeting the desirability of all feedback,
> apps and review sites should pursue only the information that is crucial for
> making the system work better."

It's hard to know which data points correlate with making the system work
better until after the fact — so developers generally have to "over ask" so
they can back test data to outcomes.

Probably the best criticism of the proliferation of feedback is that it
doesn't always make the customer's experience better; it just creates better
conditions for the company to make more money.

~~~
taeric
"back test data to outcomes" sounds scarily like p-hacking. That is, it seems
highly likely you will still not find anything actually meaningful. You are
just guaranteeing that you will be able to find something.

This is especially true if you already have retention numbers. Why bother
asking if I was happy with a purchase, if I never buy again anyway?

Note that I am sympathetic. There is every chance that I will buy something
again simply because the exit survey reminded me of your existence. There is
also the chance that having an exit survey will make me not want to come back. :)

~~~
stuartaxelowen
Not so - the business is looking for real ROI, and thus will not be satisfied
with topics that are "in the noise". Also, you create the measurement that
establishes a problem before you apply the treatment (create new feature X or
fix bug Y), and then verify after treatment that the desired effect was
achieved.

~~~
taeric
If you are scanning historic data looking for correlations, you will almost
certainly fall into some of the same traps that are common in p-hacking.

Now, can you mitigate some of that by carefully doing an experiment again?
Certainly. But you should also not constrain experiments to only the things
you have been measuring already.

------
stuartaxelowen
I think the author has it wrong: what the author sees as a "barrage of angry
or ecstatic reviews" is actually a rich array of personal experiences waiting
to be interpreted.

Traditionally, this interpretation has been done wholly by humans, and was a
long and laborious task. However, we can now do much of the manual work with
NLP [1] - discovering topics of concern and measuring their prevalence and
sentiment. This lets us ask the dataset useful questions, like "What issue is
angering customers most?" and "What features should we be talking more
about?"
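A toy sketch of that kind of aggregation (hand-written keyword lists stand in
for real topic modeling and sentiment analysis, which are far more
sophisticated; all names here are made up):

```python
from collections import Counter

# Hypothetical keyword lists standing in for learned topic/sentiment models.
TOPICS = {
    "billing": ["charge", "refund", "invoice", "price"],
    "delivery": ["late", "shipping", "arrived", "package"],
}
NEGATIVE = {"late", "broken", "refund", "terrible", "never"}

def summarize(reviews):
    """Count how often each topic is mentioned, and how many of those
    mentions read as negative."""
    mentions, negative = Counter(), Counter()
    for review in reviews:
        words = set(review.lower().split())
        for topic, keywords in TOPICS.items():
            if words & set(keywords):
                mentions[topic] += 1
                if words & NEGATIVE:
                    negative[topic] += 1
    return mentions, negative

reviews = [
    "package arrived late and broken",
    "shipping was fast",
    "still waiting on my refund",
]
print(summarize(reviews))
```

The output answers exactly the questions above: which topics come up most, and
which are angering customers.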

It's a move towards a much, much wider sampling of customer experiences
(compared to focus groups), and I think it should be celebrated for empowering
companies to make products/services that consumers really, really like.

[1] [https://taggit.io](https://taggit.io)

------
chiefalchemist
"Think about it: The proliferation of ratings systems doesn’t necessarily
produce a better restaurant or hotel experience. Instead, it homogenizes the
offerings, as people all go to the same top-rated establishments. Those places
garner ever more reviews, bouncing them even farther up the list of results.
Rather than a quality check, feedback here becomes a means to bland sameness."

The problem is, these ratings lack context. Consider a pop song with a
gazillion streams on YouTube. How many of those were "unemployed" 15-year-olds
pretending to be sick and staying home from school? Without some sense of
context, those gazillion streams have little meaning.

And of course, barring some drastically awful experience, there's
confirmation bias. The truth is most people aren't objective enough to admit
they might have made a bad choice. Of course it was the best ____ ever! It was
my ____. I bought it.

Even so, there are always idiots who will order pizza at a place that
specializes in curry, or curry from a place that does Mexican. Yes, it's on
the menu. But that doesn't mean you should order it and then complain about
it.

------
robocat
Some places want you to rate the employee (e.g. 1 to 5 stars on a touchpad at
checkout, or at the toilet exit in airports in some countries).

Just about every time, I want to rate the employee a 5 and give the company
(or management?) a low rating, because the business has obviously been the one
to screw up, not the employee.

Asking to rate an employee is a sign to me that the company service is poor.

~~~
tapland
Also, they hired the employee, and they should be around enough to evaluate
whether the employee is doing what they want.

------
MrQuincle
One feedback loop for the customer might be insight into what types of other
customers like a place.

I actually might visit a place if I know that a particular group of people
strongly dislikes it. People have different tastes and a review system might
account for that.

It would require people to fill in a few questions about themselves. Or it
would need to be cross-referenced with Facebook likes or something like that.

This information can belong to the people!

~~~
dsamarin
Can we identify groups of people by similar interests and personality type and
classify them with a label? It doesn't seem like it would promote inclusivity
and diversity among people with disparate interests since we are all "equal"
but it would be very helpful for this use case. We can create labels by having
people self identify who they think they are.

U.S. Patent Pending

------
mark-r
I have trouble completing those surveys - I want to be fair and accurate, but
it's simply not possible with subjective impressions and no guidelines. Life
is too short for such angst; when surveys are presented, I generally refuse
without a second thought.

------
gumby
This is why Netflix eliminated the star ratings -- it was too hard to figure
out the degree of difference inherent in four vs. five stars, plus it was a
manual step.

And to the author: kybernetes (κυβερνήτης) is the steersman of a boat, not a
governor.

~~~
Wowfunhappy
If you want a counterexample though, Goodreads seems to work really well with
a five star rating system. Goodreads reviewers in particular seem to take the
precise number of stars quite seriously, and many reviews include a line like
"I _really_ wish I could give this book 2.5 stars, but GoodReads won't let me,
so I had to round up/down."

I've wondered before if this says something about heavy book readers as a
population, since the five star rating system doesn't seem to work that well
anywhere else outside of professional critical reviews.

------
TeMPOraL
Cybernetics didn't fail expectations. It works astonishingly well. But it
seems people had some pretty weird and dumb expectations when lifting it from
the domain of hardware and into more "fuzzy" real-world systems. Feedback
loops are an incredibly powerful tool, but they get misused (often
purposefully), and then the blame for that misuse somehow falls on the
feedback system.

Consider:

1) As a user, you won't be satisfied with a feedback system if the company on
the other end is evil. Take Uber as an example. The tension between drivers
and passengers isn't caused by the 5-star rating itself. It has another
critical component - that Uber set the system up to squeeze as much value out
of drivers as possible. Since they essentially optimize for bad driver
treatment, drivers get defensive, and this screws up the user experience.
Hitting drivers with 1-stars until morale improves isn't the solution; not
exploiting people is. A similar thing happens in some customer service
interactions, where the rating is - again - not a signal of quality, but of
whether or not I hate the other person enough to want them to lose their job.

2) A rating system won't work for you if you connect it to a money-printing
machine. This one should have been obvious immediately, and definitely should
be by now. This is why Amazon, eBay, Yelp, et al. have problems with reviews.
As long as their presence or absence impacts immediate sales, they'll be gamed
to the point of uselessness.

3) Some companies like to deploy ratings to skip doing the work, and then they
cry foul when the results don't materialize. Again, Uber won't solve driver
problems with star ratings; that would require actually meeting and evaluating
people. I can hear the screams - "but that doesn't scale!". So what. I want to
go to the Moon by lifting myself by my own bootstraps, but I can't, because
that's not how physics works. I'm not going to cry that I can't get what I
want without doing the work that's necessary to get it.

4) Feedback loops must be actually closed for them to work. I.e. not just
collecting the data, but acting on it.

5) If you can't capture the entirety of your vision in directly measurable
metrics (and you probably can't), then don't follow them blindly. In
particular, be mindful of what you _actually_ measure, lest you end up
pissing off a lot of users and defending yourself with "data told me so". "You
make what you measure" works both ways.
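To illustrate point 4 with the textbook case from the hardware domain: a
closed loop doesn't just measure, it corrects. A minimal sketch (a simple
proportional controller; the gain and target values are arbitrary):

```python
# A feedback loop in the control-theory sense: measure the gap between
# where you are and where you want to be, then *act* on it. Collecting
# the error without the correction step (the last line of the loop body)
# is an open loop -- all the survey data in the world changes nothing.
def run_loop(target, value, gain=0.5, steps=20):
    for _ in range(steps):
        error = target - value   # measure
        value += gain * error    # act (this is what closes the loop)
    return value

print(round(run_loop(target=10.0, value=0.0), 3))  # prints 10.0
```

Delete the `value += gain * error` line and the system sits at its starting
point forever, no matter how diligently the error is measured -- which is
roughly what a survey nobody acts on amounts to.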

\--

TL;DR: Blame the people/companies deploying feedback systems for being
clueless or having malicious intent, not the feedback systems themselves for
simply working the way they work.

------
lotusko
adjust loop

