
The Rise of Voice Interfaces - pmcpinto
http://www.wired.com/2015/09/voice-interface-ios/
======
douche
Hate people typing on noisy mechanical keyboards? You're going to love
everyone babbling into their devices all the time.

I really don't want to talk to my computer/phone/coffee pot/house. I would
rather that software have its normal interface polished and made closer to
something that is usable, than have another UI paradigm bolted on that is
going to be badly implemented 80% of the time by application developers.

~~~
Labyrinth
To add on to that, most voice-enabled apps have trouble even recognizing
speech across different dialects of English. For example, in a Southern
dialect, "I am fixing to go to the bar, are you ready" would come out as "I
am fixing the bar are you there yet?".

~~~
kylebgorman
Plenty of technology exists for adaptation to speaker and/or dialect, it's
just not considered particularly sexy. To phrase it in economic terms, you can
spend your time doing algorithmic work to improve recognition slightly for
your existing English users, _or_ you could just deploy a new system for the
70 million or so affluent, tech-literate South Koreans or the maybe half a
billion upwardly mobile Hindi/Urdu speakers.

But, give it a little time and you can expect that authenticated users will be
getting speaker adaptation.
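
One concrete, long-standing example of that adaptation technology is cepstral mean and variance normalization (CMVN), which removes much of the per-speaker and per-channel bias from acoustic features before recognition. A minimal numpy sketch (the function name and toy data are illustrative, not from any particular recognizer):

```python
import numpy as np

def cmvn(features):
    """Per-speaker cepstral mean and variance normalization (CMVN).

    features: (num_frames, num_coeffs) array of cepstral features pooled
    from one speaker's audio. Normalizing each coefficient to zero mean and
    unit variance strips out constant speaker/channel offsets.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8  # avoid division by zero
    return (features - mean) / std

# Toy example: two "speakers" whose features differ only by a constant
# channel/speaker offset normalize to (nearly) the same thing.
rng = np.random.default_rng(0)
speaker_a = rng.normal(loc=0.0, scale=1.0, size=(100, 13))
speaker_b = speaker_a + 5.0
print(np.allclose(cmvn(speaker_a), cmvn(speaker_b)))  # True
```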

------
netcan
A UI is basically a trade tongue. We don't speak machine and machines don't
speak human.

A UI is a least common denominator that everyone can learn quickly. But it's
nowhere near native quality. You can't get much more than simple one liners in
trade talk. Good enough for negotiating a chest of cloves, but you'd be hard
pressed to explain the difference between Marxist-Leninist and labour-
socialist perspectives on international commerce in a trade tongue.

When people hear "voice interface" what they think is a machine that can
understand human so we can stop speaking goddamned pidgin all day. That is
essentially a Turing test, or some subspecies thereof.

I'm somewhat skeptical about this kind of thing. I'm confident that mechanical
issues like translating sounds to sentences can be solved (the voice part).
The translation of human to machine code… That's a different challenge.

It's an interesting case though, to have out in the wild. Siri and whatnot. An
application (as opposed to an average starship computer) has a limited raison
d'être, and therefore a much tighter search space.

Meanwhile, we _have_ been making consistent progress on turning our clunky
pidgins into nice creoles, more elegant and powerful. Machines can understand
higher level programming languages that people can get more fluent in. Average
humans are generally becoming more computer literate (or fluent in UI, to keep
my analogy straight). UIs are getting more intuitive, which basically means
closer to human language. The common ground is expanding, and a wider
language is being developed that accommodates us both better.

The open question, I guess, is what part of that common ground voice can
occupy. But I would wager that the common ground we find will not be human
sounding anytime soon. It will be a UI metaphor, like the desktop. We will
need to learn it.

~~~
cmaury
Absolutely right.

We refer to this as the Scarlett Johansson problem. Any system that is smart
enough to understand what we mean by the Scarlett Johansson problem, and the
reference to artificially intelligent virtual assistants, would itself need to
be a strong AI.

We've already solved the problem of translating human to machine code for
graphical user interfaces. It's the desktop with the keyboard and mouse. It's
the app and the touchscreen.

You're exactly right that the question now is how can we do the same
translation for voice-based systems.

So far the approach has been virtual assistants, which is really hard; see
again the Scarlett Johansson problem.

We've been working to find the right design/interaction metaphor, and the app
metaphor seems to work really well.

Not only is the search space limited, like you mentioned, but users have a
more intuitive sense of how to interact with the app. When you open a shopping
app, you know that you can likely search for something to buy ("I'm looking
for a new pair of shoes"). When you interact with a general virtual assistant,
that interaction is less clear.

The app model also works for scaling conversational interaction beyond a
single context. Individual apps can support their own language for interaction
which best suits their function. This of course necessitates design standards,
just like we have for GUIs.
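
A minimal sketch of what a per-app language might look like, using a toy regex intent grammar (every name and pattern here is a hypothetical illustration, not Conversant Labs' actual design):

```python
import re

# Hypothetical per-app intent grammar: each app registers patterns only
# for the utterances that make sense in its own context.
SHOPPING_INTENTS = [
    (re.compile(r"(?:i'm looking for|find me|search for) (?P<query>.+)", re.I),
     "search"),
    (re.compile(r"add (?P<item>.+) to (?:my )?cart", re.I), "add_to_cart"),
]

def parse(utterance, intents):
    """Match a transcribed utterance against an app's intent patterns."""
    for pattern, intent in intents:
        m = pattern.match(utterance)
        if m:
            return intent, m.groupdict()
    return None, {}

intent, slots = parse("I'm looking for a new pair of shoes", SHOPPING_INTENTS)
print(intent, slots)  # search {'query': 'a new pair of shoes'}
```

Because the grammar is scoped to one app, "add milk to my cart" is unambiguous here, while a general-purpose assistant would have to guess which of a thousand services the user meant.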

The limitations for fully voice-based applications aren't technical at this
point. They are design problems with known solutions. The breakout for voice
will happen when the right form factor comes about with the right UI design.

~~~
Isamu
>The breakout for voice will happen when the right form factor comes about
with the right UI design.

Chris Maury? If so, hoping to see your progress on this with Conversant Labs
in the coming year or so. After I read the article I was wondering if I'd see
you on HN.

Also my daughter (who also has Stargardt's) found it very inspiring to chat
with you at the accessibility meetup.

~~~
cmaury
You caught me :). It was great meeting your daughter! She seems to be doing
great and taking things on with the right attitude.

Feel free to email me anytime, if you ever need anything:
Chris@Conversantlabs.com. Happy to talk through things in a more private
setting than the comments section of HN.

------
roel_v
I've been thinking about voice interfaces for home automation lately as I've
been tinkering with my HA setup, so let me slightly hijack this topic: does
anyone have an idea of how to set up a system where you don't have to talk
directly into a microphone to get a computer to do stuff? Most current
applications (e.g. voxcommando) take care of the 'speech recognition' part
just fine, but I don't want to have to go hunt for my phone to turn on the
light - I just want to say 'computer turn on light' and have it 'hear' me all
the time. But I don't want mics in every corner either, of course.

So a sensitive mic per room, preferably one I could deploy in several ways
(just put it down, hang it on the wall, powered from mains or batteries,
...) that would send whatever it picks up to a central processing server seems
the way to go.

Do such things exist? And beyond the hardware, how does one tackle e.g. the
privacy problem of having everything you say recorded all the time? I'd love
to have a way to say 'computer lights off' when I'm in bed just before I go to
sleep, but for obvious reasons I'm hesitant to open myself up to the
possibility that any dude with a high-gain antenna would be able to listen in
on _any_ sound made there...

~~~
falcolas
> So a sensitive mic per room

Definitely exists, but it won't be inexpensive, particularly if you want only
one to cover a large room. You'll be looking for a directional microphone,
with a roughly 90 degree cone. Having more than one would be a boon: the
narrower the cone, the greater the range (in theory).

> how does one tackle e.g. the privacy problem

If you control the hardware doing the voice processing, it's quite simple:
don't log the transcripts or save the audio files. If you can have a set of
"this is a command" keywords it makes it even easier, since you can shut off
the recognition while a conversation is going on, so long as it didn't start
with the command keywords. If you don't control the hardware, you have to
place your trust in the person who does control the hardware.
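
A minimal sketch of that keyword-gating idea, assuming transcription already happens locally (the wake word and command table are made up):

```python
# Keyword gating: audio is transcribed locally, but nothing is kept or
# acted on unless the utterance begins with the wake keyword. Everything
# else is dropped immediately, never logged or stored.
WAKE_WORD = "computer"

def handle_utterance(transcript, dispatch):
    """Discard anything that isn't a command prefixed by the wake word."""
    words = transcript.strip().lower().split()
    if not words or words[0] != WAKE_WORD:
        return None  # ordinary conversation: drop it on the floor
    command = " ".join(words[1:])
    return dispatch(command)

commands = {"lights off": "OK, lights off", "lights on": "OK, lights on"}
print(handle_utterance("computer lights off", commands.get))  # OK, lights off
print(handle_utterance("let's discuss something private", commands.get))  # None
```

The real privacy work is making sure the raw audio path honors the same rule, i.e. buffers are discarded as soon as the gate decides the utterance isn't a command.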

~~~
kaffeemitsahne
Doesn't have to be that expensive, you can get one or two cheap stereo mics
per room and work on the data. Tascam's portable music/voice recorders support
a very wide range of input levels.

------
32bitkid
Until I can have a conversation with a machine–like I would a person–then I
feel like voice interfaces will languish in the uncanny valley of HCI. I
notice the problem the most when listening to voice commands from a GPS; there
are times when the computer gives me an instruction, and I really want to
say, "That's not going to happen."

It would be nice if it could pre-emptively start to recompute–like a human
navigator would–rather than wait until the GPS detects that I'm not on the
course that it thinks I'm going to be on to make a correction that will
essentially go the same way. But we can't have a _conversation_; just an
exchange of commands. The machine tells the meatbag to do something, the
meatbag tells the machine to do something.

Tangentially, I don't _really_ want voice interfaces like in Star Trek/Jarvis;
I want a digital doppelgänger. Something that can act on my behalf and
convincingly sound/act/respond like I would. I don't want to talk to
machines more; I want to talk to other humans less.

------
wahsd
There are so many things wrong with this article. Not even mentioning that,
unless I am totally screwing up as I am on 0% coffee, the Space Needle is
nowhere near Washington DC.

I have a feeling we are going to head into a new era of hype and bubble that
is far less defined by hardware, but more by constant promises and probably
frequent disappointment about the value of virtual assistants.

------
rasz_pl
Show me to buy milk at this opportunity!

