
AI Assistants Have Poor Usability: Study of Alexa, Google Assistant, Siri (2018) - spideymans
https://www.nngroup.com/articles/intelligent-assistant-usability/
======
sdenton4
This is a bit inevitable. The UI of your usual screen-based app has buttons
for the things you can do, which has the effect of limiting your expectations
to the features that have been implemented. A general voice query is open-
ended, though: there's no guidance about what exists, so it's very easy to
invent a query that's 'off book' in one way or another... things which haven't
been implemented at all, or which have, but are requested in a way the machine
does not expect.

~~~
chrisin2d
Indeed, an open-ended interface like a VUI invites unimplemented interactions.

I worked on a conversational agent and now realise that there are fundamental
usability barriers that are very difficult to overcome:

- Limited information bandwidth. Adults can visually read between 250–600
words per minute, and peripheral vision helps them scan for shapes, colors,
and images; information in a GUI also persists on screen for easy reference.
With voice, adults can only comfortably listen at 150–160 WPM and have to
hold information in memory for reference. This makes voice ecommerce
impractical for anything beyond familiar essentials.

- Lack of an editing layer. It's simple and straightforward to correct text-
box input, but it's difficult to correct a voice command, and doing so is
ambiguous and adds extra chaotic information. A lot of people think aloud, so
they frequently self-correct mid-sentence or tack on information in an ad-hoc
manner: "I'd like a cappucci—er, make it a latte. (pause) And oh, with soy
milk."

- High context-switching cost. GUIs keep contexts within frames, tabs, and
windows, and the user can switch between them at low cost. This is an Amazon
shopping-cart context, this is a Reminders context, and so on. A button press
is unambiguously contained within its context. In voice, contexts are not
parallel but sequential, and have to be built up over time.
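
The bandwidth gap in the first point is easy to make concrete with back-of-
the-envelope arithmetic, using the WPM figures quoted above (the 300-word
description length is an arbitrary example):

```python
# Time to convey a 300-word product description: reading vs. listening.
READ_WPM = 400      # mid-range of the 250-600 WPM reading speeds above
LISTEN_WPM = 155    # mid-range of the 150-160 WPM comfortable listening rate

def seconds_to_convey(words: int, wpm: float) -> float:
    """Seconds needed to get `words` words across at `wpm` words/minute."""
    return words / wpm * 60

words = 300
reading = seconds_to_convey(words, READ_WPM)      # 45 seconds
listening = seconds_to_convey(words, LISTEN_WPM)  # ~116 seconds

print(f"reading: {reading:.0f}s, listening: {listening:.0f}s")
```

Roughly a 2.5x slowdown, before accounting for having to hold the result in
working memory instead of glancing back at the screen.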

------
thelazydogsback
We've had better technology in NLP and problem-solving since the mid-1970s,
and many methods that were deemed intractable then are trivial now, given
that compute is on the order of a million times more powerful. Systems that
used partial-order hierarchical planning, abductive logic programming,
bi-directional search, constraint satisfaction, multi-modal interaction,
multi-agent conversational modeling based on planning applicable speech acts,
etc., have been replaced with simple state machines and, in "advanced" cases,
simple slot-filling -- and more recently with relatively shallow black-box
stochastic methods based on deep learning models.
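
For contrast, the "simple slot-filling" at the end of that list fits in a few
lines, which is part of why it won out commercially. A minimal sketch (the
intent, slot names, and prompts are invented for illustration):

```python
# Minimal slot-filling dialog manager: one intent, a fixed set of slots,
# and a canned prompt per missing slot -- no planning, no search, no
# reasoning about the user's goals.
PROMPTS = {"drink": "What would you like?", "size": "What size?"}

def next_action(slots: dict) -> str:
    """Prompt for the first unfilled slot, or fulfil the request."""
    for name, prompt in PROMPTS.items():
        if not slots.get(name):
            return prompt
    return f"Ordering a {slots['size']} {slots['drink']}."

state = {}
print(next_action(state))   # "What would you like?"
state["drink"] = "latte"
print(next_action(state))   # "What size?"
state["size"] = "large"
print(next_action(state))   # "Ordering a large latte."
```

Everything the richer architectures handled -- self-correction, ambiguity,
multi-turn goals -- falls outside this loop.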

Having worked at two companies that offered such systems, I can say this is
due partly to not understanding or appreciating what has been done in the
past, but also to companies treating these systems the way car companies
treat features: they know they can deliver rich features in the future, so
the "logic" is to wait to do so, in the hope of making more money in the
long term by dribbling out new features.

To be fair, there is also the issue of reproducibility and predictability --
companies expect everybody's agent to respond the same way given the same
input, rather than vary due to non-determinism, context, or learning -- but
we can't ask for flexible "human-like" AI on one hand and not expect the
variation (and occasional misunderstandings) that humans would also suffer.

------
jbarrs
>Even though it goes against the basic premise of human-centered design, users
have to train themselves to understand when an intelligent assistant will be
useful and when it’s better to avoid using it.

I find this one of the truest observations in the article. When you first get
a new voice assistant, it's very easy to get carried away with the features
it has available. Most are either very basic, struggle to take in much
additional context, or are simply gimmicks/novelties. The majority of tasks
can be performed faster through a GUI once you account for the time it takes
to brute-force the right "natural language" expression to do the thing you
need.

~~~
aaomidi
Well, the idea is that you get hands-free control of the device.

I'm lying in bed and I suddenly realize I need an alarm for tomorrow morning.
I don't want to turn my phone on because that can mess with my sleep.

I just tell Siri or Google to set an alarm and go to sleep.

Or I'm in the middle of cooking and want to listen to music, but don't want
to have to wash my hands first.

That's really where the benefit of these assistants lies.

------
29athrowaway
These days, a useful command has been "Alexa, what is the air quality index?".
It's a good way of knowing if I can open the window.

~~~
brokenmachine
That's /r/aboringdystopia material right there.

------
raobit
Who genuinely uses the assistant on their phone, and for what tasks? I just
want to know whether anyone is really making the most of it.

~~~
endanke
I use them almost daily to control the lights in my apartment; it's
surprisingly convenient and helps with multitasking when I'm in a hurry. The
same goes for weather and reminders. The best use case is when you're in the
middle of something, e.g. cooking, and just want to make a simple note or set
a timer.

~~~
gambiting
Does it actually work for you, though? I quickly abandoned Google Assistant
because it misunderstands what I say about two-thirds of the time. A friend
of mine who has all his lights voice-controlled complains about the same
issue: a lot of the time Google just doesn't understand the command, so by
the time you get it "right" you could have walked up to the switch three
times.

~~~
TheOtherHobbes
I had Alexa controlling the lights for a while, but I found talking requires
much more cognitive effort than picking up a remote, finding the right button
by touch, and pushing it.

I can literally do the latter three quarters asleep, but not so much the
former.

It would be far more useful to have the process almost completely automated.
Lights go on when someone enters a room and go off when everyone leaves, with
optional manual override.

This turns out to be a hard(ish) problem that needs better sensors and/or some
form of personal ID.
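
The "hard(ish)" part is the sensing and identification; the control logic
itself is simple. A toy sketch of the occupancy-driven lights with manual
override (the class, events, and names are all hypothetical):

```python
# Sketch of "lights follow occupancy": a per-room occupancy counter drives
# the light, and a manual override wins until it is cleared.
class RoomLights:
    def __init__(self):
        self.occupants = 0
        self.override = None  # None = automatic; True/False = forced on/off

    def enter(self):          # presence sensor: someone walked in
        self.occupants += 1

    def leave(self):          # presence sensor: someone walked out
        self.occupants = max(0, self.occupants - 1)

    def set_override(self, on: bool):  # manual switch press
        self.override = on

    def clear_override(self):
        self.override = None

    @property
    def light_on(self) -> bool:
        if self.override is not None:
            return self.override
        return self.occupants > 0

room = RoomLights()
room.enter()
print(room.light_on)   # True: someone is in the room
room.set_override(False)
print(room.light_on)   # False: manual override wins
room.leave()
room.clear_override()
print(room.light_on)   # False: empty room, back to automatic
```

The real difficulty is making `enter`/`leave` fire reliably -- cheap motion
sensors miss people sitting still, which is exactly the better-sensors/
personal-ID problem above.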

------
meheszjeno
These solutions aren't meant to help/assist their users, but to help their
vendors collect information about the users.

~~~
IshKebab
Why does HN have so many conspiracy theorists?

They're clearly designed to make money by keeping you in the Amazon / Google /
Apple ecosystem. I don't think any of them make much money from data
collection because they don't collect that much data.

I have Google Homes and you can go and look at all the data it collects - for
me it is 90% "hey Google play some music" or "hey Google what's the weather"
or "hey Google set a timer for 10 minutes".

Useful for improving their speech recognition, but not much else.

~~~
hyperman1
There is one more important piece of information: your voiceprint.

With it, Google can identify you by voice, put a mic in a public info panel,
and guess which commercial has the most impact.

If the NSA has access, they can match public recordings to the people at that
place, so I assume they are interested.

HN has so many conspiracy theorists because we know what is possible with
infotech on an industrial scale, the spooks have the money for it, and
Snowden basically provided the proof that they did it.

~~~
DEADBEEFC0FFEE
I don't really understand why people don't seem to want tailored ads. All ads
interrupt attention; better that they have a chance of being useful.

~~~
luplex
In today's world, ads are no longer about telling me about solutions to my
problems; they are about creating new needs. I'm fine with my current needs,
and I don't want smart people pulling my psychological levers to make me
spend money on things I don't already want.

As ads get more and more effective, people will be incentivized to put up
more and more of them. But there are already far too many ads for my liking.

~~~
harshitaneja
I am quite happy buying things I didn't need. I work to buy luxury (and I say
this as a nomad who lives out of a single suitcase with 6 t-shirts). Most of
what we buy today was not available a couple of centuries ago, and yet quite
a large share of the population would swear it's a "need".

There are too many ads, but it is possible to avoid a great percentage of
them if you're so inclined. My interaction with ads is pretty minimal and,
when it happens, often enjoyable. With uBlock Origin I don't see ads on most
sites. YouTube Premium means no ads there either. Spotify Premium, so no ads
for music. Video streaming services like Netflix don't have ads.

My only remaining ad exposure is Reddit (which offers premium) and the
in-content ads in podcasts or YouTube videos, where they are narrated by
creators whose content I am happy to support, and in some cases I find the
way they segue into the ads quite funny.

------
dafoex
This would potentially destroy any hands-free benefit a virtual assistant
would have. Back in the day of the secretary (the role these talking Pringles
tubes are trying to emulate), the bossman CEO or whoever would press a button
on the office intercom to talk to his secretary. I think virtual assistants
would benefit from a simple push-to-talk style input so they don't
misinterpret a pause as a stop.

------
gundmc
What's the point of posting an analysis that is over two years old in such a
young, rapidly iterating space?

~~~
outtatime
Have human voices changed that much in two years?

~~~
leafboi
No, but recognition and analysis have...

------
Catsandkites
Half on-topic: Is there an API on Android that allows you to write your own
assistant with your own keyword set without sending everything over the wire?

An app with microphone permissions would obviously work-ish, but is it
possible to listen in the background like the assistants do?

------
mlang23
My main criticism of Alexa, at least, boils down to bad usability. To me, it
feels like learning a bunch of magic spells by heart, which feels so awkward.
I am used to exact commands as a long-time CLI user, but when it comes to
speech, I somehow don't want to accept having to remember the other end of a
pattern matcher. All the skills I have tried so far had a similar feel. I am
overwhelmed when I realize I have to remember how to launch each skill. I
should probably give GA a try one day. But for now, I'll give the voice
assistant thing 5 more years; hopefully it will be something I want to use
then.

~~~
lolinder
I took a class in college on voice interface design, and the professor named
this as the biggest hurdle that's yet to be overcome. Screens and keyboards
are the computer's domain, and we're comfortable adapting to them. But speech
is our domain, and we expect the computer to adapt to us. The problem is that
the tech isn't quite there for real understanding of language.

