
Voice UI Is the Future. But When? - zdw
https://mondaynote.com/voice-ui-is-the-future-but-when-747fe1f74cf3
======
onion2k
Voice as a user interface doesn't scale. If you have a lot of people in a
small space (an office, a coffee shop, a metro train, etc.) then voice just
won't work. From a technical perspective it _could_ work with good directional
microphones to isolate the user, and excellent voice recognition to eliminate
accidental activations from other people, but the cost of those things would
be prohibitive, and it'd still fail if the background noise level was too
high. Where voice _really_ fails is the simple fact that a lot of people would
have issues dictating a sensitive or intimate message with their voice in
public.

There will always need to be a second screen- or keyboard-based UI. That
doesn't mean voice won't be hugely important, but it does mean it'll never be
the whole story. Consequently, I don't see it being 'the future' of UIs.

~~~
ocdtrekkie
Indeed, back in my Google Glass days, I could "ok glass" a voice command, but
heck if I ever actually _wanted_ to. It's also super awkward in public... and
I still look at people strangely today when they "ok google" something or the
like.

In a home, I do totally get it. If it weren't for the always-listening
microphones, I'd be up for it. But even then... I might ask my house to do
something, but I'm still going to want to type this comment manually.

~~~
ben_w
(This is unedited output of Apple’s voice input):

Voice recognition when I’m out and about don’t want to get my phone out it’s
for the best use for it for me but as this text shows voice recognition isn’t
very good you had even indoors

------
taneq
Anyone who's tried to explain to another person how to do something on a
computer (or what to do with a computer) and eventually, in exasperation, said
"here just let me drive", knows that voice UI is not the (only) UI of the
future.

~~~
clort
As I work with schoolkids, I've found the best way is to direct with a long
thin stick like a baton. You can keep your hand well out of their vision and
indicate where you want the mouse to go. If you drive for them, they will not
know what you did. (To be fair, when I point and tell, they don't always know
what they did either.)

------
simias
TFA briefly mentions it but I can't imagine how "Voice UI" can be even decent
without strong AI. In order to be more efficient than inputting commands with
a keyboard or mouse you want an "assistant" that will be able to infer what
you want from context. And of course you want to be reasonably certain that
it'll get it right without having to double check everything all the time.

Also, make it available offline. I don't want $bigdata to know everything I do
on my computer. Amazon Echo & friends are way too Orwellian for my taste. I
don't know if the general public will care about that however.

~~~
hailk
Wouldn't having a strong AI require lots of user data? And having the option
to make it work offline means letting go of all the possible precious tidbits
that could make your AI strong.

~~~
simias
Maybe, but I don't really care to use Big Brother AI. I mean, you and I are
pretty clever and yet we don't require an internet connection to function, do
we? Maybe it'll take longer to perfect but I hope we'll get there eventually
when it comes to artificial intelligence.

~~~
hailk
Certainly. The Internet is just a means to an end; a product is successful if
it solves the problem it intends to solve, with or without the net.

A tangential question: what steps can an organisation take to assuage your
worries about your data being misused?

------
laurieg
Voice UI has some use cases where it is great and some use cases where it is
terrible.

"Fire and forget" commands are great. "Hey Alexa, set an alarm for 8am". Once
you trust Alexa to do these tasks you don't need any feedback.

But tasks where you need constant feedback are not well suited to voice UI. If
you accidentally blast music through your hi-fi, asking Alexa to "turn it down
a bit" 4 times and then "turn it up" to finally get to a nice listening volume
takes much, much longer than turning a dial while listening to the auditory
feedback you get.

------
baxtr
This might be the beginning of the “Trough of Disillusionment” for voice UI:

[https://en.wikipedia.org/wiki/Hype_cycle](https://en.wikipedia.org/wiki/Hype_cycle)

------
CoolestBeans
I'm less interested in voice assistants and more interested in how speech
recognition will interact with existing interfaces. One advantage GUIs have over
CLIs is the "graphical" component. Graphics can output to the user so much
more information in so much less time than text (colloquially "a picture says
a thousand words"). Text-to-speech output has the bandwidth problem a CLI has
and it isn't persistent. If you didn't hear the voice assistant the first
time, you have to make it repeat the entire sentence.

What I think is really the killer feature of speech recognition is its role as
an input. Anecdotally, I find the only times I use any voice UI are for setting
alarms (because iOS's time input makes you scroll through every minute) and
asking a sufficiently long question to Google (because touchscreen keyboards
are still imprecise and I've got fat fingers). Even though both Android and
iOS have speech recognition built into their keyboards I think there's
something someone is missing on the input side.

~~~
kumrr
Couldn't agree with you more. So much so that we are actually building a
company to help developers do just that -
[http://slanglabs.in](http://slanglabs.in) :)
We provide tools for developers to build a voice interface on top of existing
mobile apps that, in our mind, would marry the best of voice (as an input
mechanism) with GUI (as an output mechanism).

------
guy98238710
No it isn't. Huge screens are, whether they come as standard monitors, virtual
reality headsets, or 3D displays. People consume orders of magnitude more
information than they produce. Compared to a large display, voice output is
just painful.

Furthermore, moving eyes is way faster than any kind of navigation input, even
voice. That's why displays contain way more information than what people are
able to process at any given moment.

And then there's this old principle of GUIs: making choices visible. A screen
can list all available options. A voice UI cannot. Even command-line UIs
support tab-complete for this reason.

If I were to be very cynical, I would say that voice UIs shine when people are
illiterate enough to avoid typing and reading. A less cynical answer is that
voice UI fits where a big screen doesn't.
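
A minimal sketch of the tab-complete point above, in Python. The command list
and the `complete` helper are hypothetical, made up purely for illustration;
the only idea is that a typed prefix is enough to make the remaining choices
visible, which a voice UI has no cheap equivalent for.

```python
# Hypothetical command set, for illustration only.
COMMANDS = ["commit", "checkout", "cherry-pick", "clone", "status"]


def complete(prefix, commands):
    """Return every known command that starts with `prefix`, sorted."""
    return sorted(c for c in commands if c.startswith(prefix))


# An empty prefix lists everything, like pressing Tab on an empty prompt.
print(complete("", COMMANDS))

# A partial prefix narrows the visible choices.
print(complete("ch", COMMANDS))
```

With a screen, the user reads the narrowed list at a glance; a voice UI would
have to speak each option aloud.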

------
coldtea
Voice UI without general AI is not the future. Forget about it.

Even for long-form text entry it's tiresome (ever tried speaking for 1 hour?)
-- never mind the noise (e.g. in an office), privacy issues, discoverability
of options, and all that.

~~~
ksec
I guess most people don't work in a sales job? Or they have such a decent
package that they work 9-5 on the dot in the office and get home very early?

Because most of my days when I am tired, the last thing I want to do is to
open my mouth.

------
zeofig
Yep, it should integrate nicely with virtual reality and driverless cars! Just
around the corner...!

~~~
coldtea
And all powered by nuclear fusion under the supervision of a singularity AI.

------
Mononokay
It'll happen when the average user cares even less about privacy, and when an
always-recording microphone doesn't drain the battery so heavily.

------
mnm1
It won't be for a while yet. Add to all the other issues mentioned here its
drastic reduction in accuracy when dealing with accents, let alone non-English
languages. We're far, far from it working universally. And that's before you
even get to semantics or whether it even makes sense as a UI.

------
ttflee
Bitrate of voice UI is miserable.

------
otabdeveloper1
> Voice UI Is the Future.

No it isn't.

~~~
pymai
It's the future in the same way that 'smart' watches were the future.

------
jlebrech
NOW

There should be a keyboard with integrated voice recognition, and a "voice
assist" button.

Remove any superfluous keys (anything you might have to look down for) and
just keep the typewriter keys.

And add a voico-correct to operating systems, not "auto": you mistype and you
can dictate the correct word for it to fix for you.

Once we have this paradigm the OS will evolve for it.

I think FULL voice has privacy issues, and is annoying too. I want to quietly
instruct my PC, not talk to it all day (and I don't want to have to hear
someone else do that either).

------
WillReplyfFood
Really. Just like the gesture UI Tom Cruise used in Minority Report. If it
looks good on TV, it is usually a strain in reality.

I have yet to see a classic GUI with a decent NN that learns your default
behaviour at every step and applies it on ctrl+left-click.

------
dingo_bat
Maybe not exactly voice, but natural language UI is definitely the future. It
would be awesome if I could talk to git and tell it to do exactly what I want
in English, instead of rote-learning arcane incantations.

~~~
andrewfong
It depends on the complexity and dangers of the command. Git actually seems
like a very poor candidate for natural language.

Consider: Legal documents initially began as natural language evaluated by
very effective natural language processors (other people). Over time, as the
edge cases and ambiguities built up, the language of law diverged to the point
that legalese is hardly considered "natural" today.

~~~
rimliu
Natural language is very poor for anything requiring precision.

