
Google voice search: faster and more accurate - gok
http://googleresearch.blogspot.com/2015/09/google-voice-search-faster-and-more.html
======
buss
Google voice search also handles anaphora, i.e. back-references to previous searches.
For example voice search for "Who was the first president of the United
States?" then after you get the result do another voice search for "Who was
his vice president?" and Google will infer you are still talking about George
Washington.

One step closer to conversational interfaces!

------
msoad
I'm doing a lot of voice searches recently. It's just easier and feels more
natural. It's amazing how accurate Google voice recognition is. I'm a non-native
English speaker and sometimes I'm not even sure if I pronounced something
correctly, but it gets it!

~~~
knodi123
I've used it to figure out the spelling of a word I couldn't understand.

There's a reporter on NPR who sounds like he always introduces himself as "han
zhi lu wong". I say that to google voice search, and it corrects it to "Hansi
Lo Wang".

So google voice recognition works _even if you don't know what words you're
saying!_

~~~
gok
Well, sure, that's precisely how speech transcription works. When you say the
words out loud you aren't speaking in letters, you're speaking phonetic
sequences. It's a recognizer's job to decode the best word sequence spelling
(from its list of known word spellings and their pronunciations) from the
input pronunciation.
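To make that decoding step concrete, here is a toy sketch, assuming a hypothetical four-word lexicon with ARPAbet-style pronunciations and made-up unigram probabilities in place of a real language model. It segments a clean phone sequence into the most probable sequence of known words by dynamic programming. A real recognizer scores noisy acoustic evidence rather than a clean phone string, and its lexicon and language model are vastly larger.

```python
import math

# Hypothetical toy lexicon: spelling -> pronunciation as ARPAbet-style phones.
LEXICON = {
    "I": ("AY",),
    "ice": ("AY", "S"),
    "scream": ("S", "K", "R", "IY", "M"),
    "cream": ("K", "R", "IY", "M"),
}

# Made-up unigram probabilities standing in for a real language model.
WORD_PROB = {"I": 0.03, "ice": 0.002, "scream": 0.0001, "cream": 0.003}


def decode(phones):
    """Return the most probable word sequence whose pronunciations spell out `phones`."""
    n = len(phones)
    # best[i] holds (log-probability, words) for the best parse of phones[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for word, pron in LEXICON.items():
            start = end - len(pron)
            if start < 0 or best[start][0] == -math.inf:
                continue  # this word can't end here, or nothing parses up to `start`
            if tuple(phones[start:end]) != pron:
                continue  # pronunciation doesn't match this stretch of phones
            score = best[start][0] + math.log(WORD_PROB[word])
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1] if best[n][0] > -math.inf else None


# "ice cream" and "I scream" share one phone sequence; the word prior picks between them.
print(decode(["AY", "S", "K", "R", "IY", "M"]))  # ['ice', 'cream']
print(decode(["Z"]))  # None: no known word matches
```

The same ambiguity is why the language model matters: the acoustics alone can't separate the two readings, so the word probabilities break the tie.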

~~~
knodi123
My point is, it's amazing that it's better at decoding phonetic sequences
(which are presumably garbled by passing through a human being (me) who
doesn't understand them), than a human being who has evolved to use language
and has over 30 years of fluent experience.

~~~
oska
Any language that uses the Latin alphabet has different rules for how
pronunciation is derived from or encoded with the letters.

e.g. with your example, the letter pair ‘si’ is pronounced differently in
Mandarin than it is in English. So it's not surprising that you couldn't write
it down properly in pinyin, as you don't know how to transcribe Mandarin into
pinyin. But Google does.

Another example - without knowing how French spelling works there is no way
that as an English speaker you could work out how to correctly spell ‘peut’
(can) just from hearing it.

~~~
minot
The last time I checked, it pronounced Minot as "minnow", whereas the North
Dakota town's name rhymes with "why not?", so you'd say it as "my knot".

Now Google recognizes that I'm talking about Minot, but it says "minnow" back to me.

------
nl
The paper describing this work:
[http://arxiv.org/pdf/1507.06947v1.pdf](http://arxiv.org/pdf/1507.06947v1.pdf)

They trained on 3 million "utterances" of average duration of 4 seconds. These
were distorted by noise to get 20 variations (so the training set was 60
million utterances total).
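The arithmetic is 3 million × 20 = 60 million. As a rough illustration of "distorted by noise", here is a minimal sketch that mixes a noise clip into a clean waveform at a chosen signal-to-noise ratio. It assumes plain additive noise, a 16 kHz sample rate, a sine tone as stand-in "speech", and SNRs of 0 to 19 dB; none of those choices come from the paper, whose actual distortion procedure may well differ.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    noise = noise[: len(clean)]  # assumes the noise clip is at least as long as the speech
    # SNR_dB = 10 * log10(P_clean / (gain**2 * P_noise)); solve for gain.
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def augment(clean, noise, snrs_db):
    """Make one noisy copy of the utterance per SNR value."""
    return [mix_at_snr(clean, noise, snr) for snr in snrs_db]


rate = 16000  # assumed sample rate
rng = random.Random(0)
# 4 s stand-in "utterance" (a 440 Hz tone) and Gaussian noise of the same length.
utterance = [math.sin(2 * math.pi * 440 * t / rate) for t in range(4 * rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4 * rate)]
copies = augment(utterance, noise, snrs_db=range(20))  # 20 variants, 0..19 dB
print(len(copies), 3_000_000 * len(copies))  # 20 60000000
```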

I can't tell whether these were labeled somehow. There's a section on
clustering into 9287 phones, but it isn't clear to me if these were used as
labels.

~~~
afsina
AFAIK current ASR systems use grapheme transcriptions for training their
acoustic models. So they have the speech and the transcription "Hello World",
and during training it is automatically converted to phones, something like
"hələʊ wɜ:ld", using extensive phonetic dictionaries and some algorithms. The
9287 are context-dependent phones, e.g. "a" with a "b" on the left and an "r"
on the right. Theoretically, for 40 phones you end up with 40^3 = 64,000
context-dependent phones, but in practice this number is much lower.
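A sketch of that expansion, assuming a hypothetical two-word pronunciation dictionary, a `sil` symbol at the utterance edges, and the common `left-center+right` notation for triphones. It converts the grapheme transcript to phones, labels each phone with its neighbours, and counts the 40^3 possible triphones for a 40-phone inventory. Real systems then cluster (tie) these into a few thousand units, consistent with the 9287 figure mentioned above.

```python
from itertools import product

# Hypothetical pronunciation dictionary (IPA-ish symbols, one phone per entry).
PRONUNCIATIONS = {
    "hello": ["h", "ə", "l", "əʊ"],
    "world": ["w", "ɜ:", "l", "d"],
}

SILENCE = "sil"  # assumed boundary symbol at the start and end of the utterance


def to_phones(transcript):
    """Grapheme transcript -> flat phone sequence via dictionary lookup."""
    phones = []
    for word in transcript.lower().split():
        phones.extend(PRONUNCIATIONS[word])
    return phones


def to_triphones(phones):
    """Label each phone with its left and right neighbours: left-center+right."""
    padded = [SILENCE] + phones + [SILENCE]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}" for i in range(1, len(padded) - 1)]


print(to_triphones(to_phones("Hello World")))
# ['sil-h+ə', 'h-ə+l', 'ə-l+əʊ', 'l-əʊ+w', 'əʊ-w+ɜ:', 'w-ɜ:+l', 'ɜ:-l+d', 'l-d+sil']

inventory = [f"p{i}" for i in range(40)]  # stand-in inventory of 40 phones
print(len(list(product(inventory, repeat=3))))  # 64000 possible triphones
```

Note that the same phone "l" gets two different labels (`ə-l+əʊ` and `ɜ:-l+d`), which is exactly what lets the acoustic model learn context-specific variants.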

------
jasonellis
This is great and all, because I love Google voice search, but the problem I
REALLY wish they'd solve is false triggers of "Okay, Google."

I often listen to podcasts on my car bluetooth and on a bluetooth speaker at
home. On my commute, I'll get at least 5-15 "Okay, Google" triggers in a 50-minute
drive just from people on the podcast saying things like "and" or
"okay" or phrases that sound nothing like "Okay Google". I have even done the
voice training so it's only supposed to listen for my voice. On the other side
of the coin, I'll sit in my car screaming "Okay Google!" over and over with no
response.

~~~
chrisfosterelli
You should consider retraining your voice model. When I first set up my
phone to only recognize my voice, there were people talking in the background,
and it actually made it behave _exactly_ how you're describing.

Shortly after, I retrained the voice model in a quiet room by myself, and now it
works flawlessly.

~~~
jasonellis
I've retrained it in a quiet room in my house and it didn't help, but it's
worth trying again. I'll give it a shot, thanks.

------
aruggirello
Wow! Now it would be great to have this speech recognition service available
as an API.

~~~
teraflop
Chrome provides it to in-browser applications via the Web Speech API. However,
Google doesn't allow any other browsers or services to use that endpoint
(except for Chromium development, and that only with an extremely limited
quota).

~~~
haldean
There's an API for it on Android as well that app developers can use.

~~~
Wingman4l7
I've seen Android Wear apps on the Play Store that utilize it as well.

------
happytrails
Google voice search is impressive and, anecdotally for me, more accurate than
Siri and Cortana at turning my voice into text. Is there any insight into the
hardware needed to run their neural network and store the learned model?

------
melling
Will these improvements be used when using voice input in Google Docs?

[http://gizmodo.com/you-can-now-type-with-your-voice-in-googl...](http://gizmodo.com/you-can-now-type-with-your-voice-in-google-docs-1728339420)

~~~
trevorstrohman
Yes.

~~~
nomel
While I appreciate the technological advancement that this signifies, I
absolutely think that I sound like an idiot if I talk the way I write.

~~~
trevorstrohman
It takes some practice to learn how to dictate a response the way that you
would type it. However, it has a lot of advantages. For instance, I was able to
dictate this response to my smartphone in a lot less time than it would have
taken to type it.

------
ryenus
> now used for voice searches and commands in the Google app (on Android and
> iOS)

Is this available offline, or must one be connected?

~~~
ianburrell
It looks like they added offline speech recognition to Android. After the
Google search app updated today, it downloaded speech files for the default
language. There is now a settings section for downloading languages. The voice
recognition works in airplane mode. Some voice actions, like opening
apps, work offline.

------
vram22
>It's amazing how accurate Google voice recognition is.

I can't say how accurate it is, since I've used it very little so far. But
adding my 2c:

I first tried it some time ago (2+ years) on my mid-range (at the time) Android
phone, and it was not really usable, so I set it aside for a while. Then I tried
it recently, on the same phone, mind you, which is now 2 or more years older, so
not recent at all. Surprisingly, it worked a lot better than before (based on
a small sample of tests, note). Going to experiment with it more.

Something that might be known to many readers here, but mentioning it:

Peter Norvig, Director of Research at Google, has said in the past that by
training the voice recognition software on huge amounts of data (at Google
scale), they have managed to improve it a lot, by using statistical
algorithms. (Similarly for spelling correction suggestions in Google Web
Search.)
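For the spelling-correction half of that remark, Norvig's well-known essay "How to Write a Spelling Corrector" shows the statistical idea in miniature: generate every string within one edit of the input and pick the candidate seen most often in a corpus. The sketch below follows that idea with a tiny made-up corpus standing in for web-scale data; it illustrates the general technique, not Google's production system.

```python
import re
from collections import Counter

# Tiny stand-in corpus; the statistical approach only shines with huge amounts of text.
CORPUS = """the speech recognizer decodes speech into words and the
search engine suggests the spelling that appears most often in the data"""

WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Keep a known word; otherwise return the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


print(correct("speach"), correct("serch"), correct("xyzzy"))  # speech search xyzzy
```

More data helps in two ways here: more correct words become "known", and the frequency counts that rank competing candidates become more reliable.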

Related: A couple of simple experiments by me with voice recognition (speech-
to-text) and speech synthesis (text-to-speech) using Python:

1:

[https://code.activestate.com/recipes/578839-python-text-to-s...](https://code.activestate.com/recipes/578839-python-text-to-speech-with-pyttsx/?in=user-4173351)

[http://jugad2.blogspot.in/2014/03/speech-synthesis-in-python...](http://jugad2.blogspot.in/2014/03/speech-synthesis-in-python-with-pyttsx.html)

2:

[http://jugad2.blogspot.in/2014/03/speech-recognition-with-py...](http://jugad2.blogspot.in/2014/03/speech-recognition-with-python-speech.html)

------
Animats
Google Voice Search: broken.

Some time in September, Google made some server-side change to Voice Search
which causes the Android Google Search client, at least some versions, to
crash. Android handsets get a pop-up with "Unfortunately, Google Search has
stopped."[1][2][3][4]. This also breaks voice dialing and texting. Some people
who had voice input as the default found they could no longer text at all,
until they disabled Google Voice Search. It's not a change on the client side;
it's happening even for phones that don't have over-the-air updates enabled.

The usual suggestions, involving clearing caches and resetting various
settings, have been made, and they're as useless as usual. The problem
appeared a few weeks ago, and has been reported for at least T-Mobile and
AT&T, and for at least ZTE and LG phones. So it's not carrier-specific or
handset-maker specific.

Did this "faster and more accurate" change involve a change to the wire
protocol? A recent change is clearly crashing the client side in the phone.

[1] [https://productforums.google.com/forum/#!topic/websearch/0ZM...](https://productforums.google.com/forum/#!topic/websearch/0ZM-g7v5Au8)

[2] [http://forums.androidcentral.com/general-help-how/582873-why...](http://forums.androidcentral.com/general-help-how/582873-why-does-my-phone-keep-displaying-sorry-unfortunately-google-search-has-stopped.html)

[3] [https://forums.att.com/t5/Android/Google-Search-has-stopped/...](https://forums.att.com/t5/Android/Google-Search-has-stopped/m-p/4315687#M63316)

[4] [https://support.t-mobile.com/message/518061#518061](https://support.t-mobile.com/message/518061#518061)

~~~
Animats
The rating on this posting has been going up and down every few minutes. It's
amusing to see what happens when you criticize Google or Apple. Apple seems to
have a response time of about an hour before criticisms get down-voted. Google
is faster.

------
felixgallo
Google Search unfortunately removed a great feature on Android: "okay google,
search <blah> on Spotify." Now instead of opening the native app with the
search intent, it goes to the web result. :/

~~~
Wingman4l7
When was this removed? I'm running Android v5.1.1, just tested this and it
works fine. Thanks for the tip!

------
pgrote
I wish there was an easier way to report outliers or wrong results. For
instance, I asked to see photos of Tony Cruz. It showed me photos of Toni
Croos. Understandable that a soccer player may be more popular than a baseball
player, so I restated and asked for photos of Tony Cruz of the St. Louis
Cardinals.

It took the query and showed the same photos. lol

Collecting this sort of result into a larger data set could help refine the
results.

~~~
ramblerman
Hah, that's a shame.

I had the same thing happen when I asked it to play Bulerias, which it kept
hearing as some common variation of that word: "Blue rays", etc.

As soon as I provided context and said "flamenco bulerias", that fixed it, though.

------
Pxtl
I've been trying to use the voice-search feature, but my phone is a Moto G 2,
and maybe the phone is too slow or my internet connection is too weak; either
way, I find the long delay after "OK Google" makes it just too clumsy to use
naturally.

------
gok
I'm curious how much of the accuracy gains come from only having to run the
decoder when the LSTM emits a phoneme rather than for each 10 ms frame, which
presumably allows the language model search to be much more aggressive.

~~~
nl
Where are you seeing that? In the paper[1] it says:

 _Acoustic features are generated every 10ms, but are concatenated and
downsampled for input to the network: 8 frames are stacked for unidirectional
(top) and 3 for bidirectional models (bottom)._

[1]
[http://arxiv.org/pdf/1507.06947v1.pdf](http://arxiv.org/pdf/1507.06947v1.pdf)
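The quoted input pipeline can be sketched like this: concatenate groups of consecutive 10 ms feature frames into one longer vector, then keep only every few stacked vectors so the network runs at a lower frame rate. The quote gives the stack sizes (8 and 3) but not the downsampling stride, so the stride of 3 below (one network input per 30 ms) is an assumption, as are the toy 2-dimensional features and the choice to stack the current frame with the following ones rather than the preceding ones.

```python
def stack_and_downsample(frames, stack=8, stride=3):
    """Concatenate `stack` consecutive frames, keeping every `stride`-th stacked vector.

    frames: list of per-10 ms feature vectors (lists of floats).
    Returns a list of vectors of length stack * feature_dim.
    """
    stacked = []
    for t in range(0, len(frames) - stack + 1, stride):
        window = frames[t:t + stack]
        stacked.append([x for frame in window for x in frame])
    return stacked


# 1 second of hypothetical 2-dimensional features, one frame per 10 ms.
frames = [[float(t), float(-t)] for t in range(100)]
uni = stack_and_downsample(frames, stack=8)  # unidirectional setting from the quote
bi = stack_and_downsample(frames, stack=3)   # bidirectional setting from the quote
print(len(uni), len(uni[0]), len(bi), len(bi[0]))  # 31 16 33 6
```

The trade-off is fewer, wider input vectors: the recurrent network takes roughly a third as many steps per second of audio, while each step still sees the full spectral detail of the stacked frames.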

~~~
gok
I'm not talking about the input, but the output:

"…predicted word sequence where the word with highest probability is taken
ignoring repetitions and the blank label with no language model or decoding."

Which I took to mean: when the acoustic model emits a blank symbol, they don't
run the decoder again until a non-blank symbol comes out.
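The quoted rule is the standard greedy (best-path) decoding used with CTC-trained models: take the highest-scoring label at each output step, merge consecutive repeats, and drop blanks. A minimal sketch, assuming `_` as the blank symbol and a hypothetical table of per-step word probabilities. Since CTC models typically emit blank on many steps, a decoder that only does work on non-blank outputs would skip much of the sequence, which is the effect being speculated about.

```python
BLANK = "_"  # assumed blank label


def collapse(labels):
    """Merge consecutive repeats, then drop blanks (a blank separates genuine repeats)."""
    out, previous = [], None
    for label in labels:
        if label != previous and label != BLANK:
            out.append(label)
        previous = label
    return out


def greedy_decode(step_scores):
    """Best-path decoding: argmax label per step, then collapse. No language model."""
    best = [max(scores, key=scores.get) for scores in step_scores]
    return collapse(best)


# Hypothetical per-step probabilities over a two-word vocabulary plus blank.
steps = [
    {"_": 0.9, "hello": 0.1, "world": 0.0},
    {"_": 0.2, "hello": 0.7, "world": 0.1},
    {"_": 0.3, "hello": 0.6, "world": 0.1},
    {"_": 0.8, "hello": 0.1, "world": 0.1},
    {"_": 0.1, "hello": 0.1, "world": 0.8},
    {"_": 0.3, "hello": 0.0, "world": 0.7},
    {"_": 0.9, "hello": 0.05, "world": 0.05},
]
print(greedy_decode(steps))  # ['hello', 'world']
print(collapse(["hello", "_", "hello"]))  # ['hello', 'hello']
```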

~~~
afsina
I think that is just a technology showcase: the technique is so good that,
for a small-to-mid-size pronunciation dictionary, it can work quite well
without a language model. Similar work has been reported recently (ASR
systems without language models, or with character models). But it seems the
real system still uses a graph generated with a language model and
context-dependent phones.

------
hagope
Please Google release an "Echo" and I'll buy it right away...

------
newmotors
Is there a way to run this without an internet connection?

~~~
nl
Yes, the model can be loaded onto your phone. Details are here:
[http://stackoverflow.com/a/21329845](http://stackoverflow.com/a/21329845)

------
hudixt
I think this is great; previously it took a long time to open after saying "OK,
Google".

------
amelius
Does this mean that voice recognition is a solved problem?

If not, what problems are still left to be solved?

~~~
nly
It's heavily optimised for quick contextual queries. I hit the microphone and
said "<supermarket name> <my town> opening hours" today and it simply replied
(aloud) "<supermarket> is open until 21:00". This is great stuff, but it still
feels like a voice interface to Google vs a personal assistant like Cortana or
Siri.

~~~
trevorstrohman
Feel free to say "Hello" to get a quick tutorial of the assistant features.
The Google app understands searches as well as assistant-like features (like
"send a text" or "open Facebook").

------
ausjke
How do I test this? Just speak to Android search, without any update needed?

~~~
notatoad
yes. the speech processing is done server-side, there's nothing to update in
the app.

------
deegles
It will be great once we can run the generated model locally. It would save a
bunch of latency and bandwidth, not to mention the privacy implications of not
having speech saved in the cloud.

~~~
thrownaway2424
What do you mean? Google Voice Search works for me with my phone in airplane
mode.

------
jrcii
Great, now can we please have Select All for our Google Voice inbox so those
of us who forward our texts and calls to Gmail don't have a Google Voice page
that says "Inbox (8821)"?

