

Google Voice Speech Recognition Not Really Working - nader
http://rashmash.com/2009/07/22/google-voice-speech-recognition-not-really-working/

======
andr
You have a bit of an accent (no offense, so do I), and that would normally
throw the voice recognition software off.

~~~
Torn
Yeah, listening closely you can hear how it gets "55" from his "bye bye". The
strong accent makes it quite hard for the software to guess the phonemes being
spoken.

You can't have a "one size fits all" approach in voice recognition if there
are people pronouncing exactly the same words in completely different ways.

~~~
nader
Thinking about the United States and the variety of cultures / accents
(Mexican, Chinese, etc.), I guess errors will happen very often.

~~~
joshfinnie
This brings up an interesting problem with the future coding of Google Voice.
How is it supposed to determine whether it's someone with an accent saying
"bye bye" or someone without an accent saying "55"? Even if the dialects are
taken into consideration, that is still going to be quite a difficult problem!

~~~
joeyo
Context. You could compare the log likelihood of "bye bye" given the previous
word against the log likelihood of "55" given the previous word, i.e. compute
their log likelihood ratio. If that's not good enough you can generalize this
to use the previous N words.

Google should have a large enough corpus of text at its disposal to be able
to approach the problem this way, and they very well may already be doing so.
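
A minimal sketch of that idea, assuming a toy bigram model with add-alpha
smoothing. The corpus, the function names and the smoothing constant are all
made up for illustration; nothing here reflects what Google actually runs.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
CORPUS = [
    "ok talk to you later bye bye",
    "thanks so much bye bye",
    "see you tomorrow bye bye",
    "call me at extension 55",
    "my number ends in 55",
    "i am at gate 55",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(word, prev, unigrams, bigrams, alpha=0.1):
    """Add-alpha smoothed estimate of log P(word | prev)."""
    vocab_size = len(unigrams)
    numerator = bigrams[(prev, word)] + alpha
    denominator = unigrams[prev] + alpha * vocab_size
    return math.log(numerator / denominator)

def sequence_log_prob(prev, candidate, unigrams, bigrams):
    """Chain bigram log probabilities over each word of the candidate."""
    total = 0.0
    for word in candidate.split():
        total += log_prob(word, prev, unigrams, bigrams)
        prev = word
    return total

def log_likelihood_ratio(prev, cand_a, cand_b, unigrams, bigrams):
    """Positive means cand_a fits the context better, negative means cand_b."""
    return (sequence_log_prob(prev, cand_a, unigrams, bigrams)
            - sequence_log_prob(prev, cand_b, unigrams, bigrams))

UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)

if __name__ == "__main__":
    for context in ("later", "extension"):
        llr = log_likelihood_ratio(context, "bye bye", "55", UNIGRAMS, BIGRAMS)
        print(context, "->", "bye bye" if llr > 0 else "55", round(llr, 2))
```

With this toy data the ratio comes out positive after "later" and negative
after "extension". Note that a two-word candidate multiplies two
probabilities, so it is penalized relative to a one-word candidate; a real
system would have to account for that, and would combine this score with the
acoustic score rather than use it alone.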

------
tumult
Anecdote: the speech recognition works unsettlingly well for me.

~~~
IsaacSchlueter
I use my Google Voice number on my resume, to prevent recruiters from calling
my cell phone directly. I've noticed that clear speakers with the classic "TV
American" dialect are transcribed pretty well. However, if they have even the
slightest bit of a different dialect, especially Indian or Southeastern US, or
if they mumble a bit, then it's hilariously bad.

Stuff like this:

    
    
        hi  zack  this  is  anyhow  calling  from  a  car  
        and  i  actually  i  was  looking  part  of  the
        apartment  but  i  might  come  across  your  profile
        on  the  google  search  in  hi  i  understand  you're
        looking  for  approximately  400  caveman  income
        which  is  definitely  i  can  talk  to  you  but  i
        was  just  wondering  that  if  you  know  anyone  and
        some  yesterday  and  so  and  relatives  so  of
        because  the  anyway  i  was  looking  for  a  job
        for  their  business  hey  can  you  please  a
        deficit  to  me  on  my  email  address  is  and  and
        it's  shannon  N  T  S  as  in  sam  but  i  am  E  T
        oddity  of  I  T  dot  com  i  repeat  it  and  and
        at  the  car  dot  com  on  if  you  can  divert  my
        call  at  84  so  i'm  fine  i  need  to  little
        photo  a  photo  extension  8  if  i'm  not  i  repeat
        841  598  double  total  of  4  extension  8811  that
        would  be  great  thanks  so  much  for  your  time

~~~
mbrubeck
My experience is that recruiters (and other people who leave a lot of voice
mail) seem to be relatively easy for Google Voice to transcribe:

 _"matt hey this is todd johnson i work over dates 10 capital hey i wanted to
see if you knew of anyone open for new projects we've got up i've done
contract role that just popped in if you have a chance feel free to give me a
call (206) 300-2120 to seattle based company python project 6 months see if
you might be interested in that again this is todd johnson over it's 10
capital thanks"_

Whereas casual callers talk faster and tend to get transcribed like this:

 _"hello it's sarah this is christal donna considering it's jamie at coming
thanks bye"_

What I really like is the feature in the new Android app that highlights each
word of the transcript while playing the audio, _and lets you skip to any
word_ just by touching it.

------
braindead_in
Does anyone know whose engine they use? There are only a couple of Speaker
Independent Speech Recognition Engines out there. Have they developed their
own or licensed it from someone?

~~~
e1ven
Google wrote their own engine after hiring away a bunch of the top guys from
Nuance.

They started it with Goog-411, training it on requests like "Pizza in Oregon"
where they'd know if they were right or wrong because people would request
another listing if they weren't quite correct.

This is the next step of that, so far as I understand. They're giving it a
broader base of phrases to work on. I imagine the ultimate goal is to throw it
up against YouTube, so you can search for a phrase, and Google will give you
YouTube videos where someone speaks it.

------
jaydub
As Voice matures and more people use it, Google will continually get a larger
dataset to draw information from. Perhaps it could feed into a better
probabilistic transcription approach in the vein of Google Translate.

~~~
TomOfTTB
You are probably right, but there's a certain enjoyable irony to Google
getting dinged for this, given that they, more than anyone else, are
responsible for devaluing the word "Beta".

Hopefully incidents like this will make them realize the word had an important
use after all and that maybe it isn't the best idea to tag every product with
it for a virtually indefinite period of time.

------
eli
The technology being used to power my G1's "Search by Voice" feature is really
surprisingly good. I would assume it's the same stuff. Maybe it was just the
unusual names that threw it.

~~~
nader
You mean everybody should have a name like "Bob" or "Bill"? :) I have a
feeling that the Search by Voice feature on the iPhone works better than on
the website, but that might just be by accident.

~~~
eli
Perhaps audio quality makes a difference? On the G1/iPhone it can sample audio
at whatever rate it needs, rather than just having to take what comes over the
line.
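
A rough sketch of why the sampling rate could matter, assuming the phone line
delivers standard 8 kHz narrowband audio and the handset app records at
16 kHz (the 16 kHz figure is a guess, not something known about the G1 or
iPhone apps). A sampled signal can only represent frequencies up to half its
sample rate, so the line throws away everything above 4 kHz. The frequencies
listed below are illustrative values, not measurements.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def surviving(frequencies_hz, sample_rate_hz):
    """Keep only the frequencies at or below the Nyquist limit."""
    limit = nyquist_hz(sample_rate_hz)
    return [f for f in frequencies_hz if f <= limit]

# Hypothetical frequency components of a speech sound, in Hz.
COMPONENTS_HZ = [300, 1200, 2500, 3800, 5500, 7000]

PHONE_LINE_HZ = 8000   # standard narrowband telephony rate
ON_DEVICE_HZ = 16000   # assumed rate for audio captured on the handset

if __name__ == "__main__":
    print("phone line:", surviving(COMPONENTS_HZ, PHONE_LINE_HZ))
    print("on device: ", surviving(COMPONENTS_HZ, ON_DEVICE_HZ))
```

Under these assumptions the two highest components are lost over the phone
line but kept on the device, which is the kind of detail a recognizer could
use to tell similar sounds apart.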

