
Google has developed speech-recognition technology that actually works. - pelf
http://www.slate.com/id/2290516
======
orijing
Sorry to be a killjoy, but the premise of this article--that _Google_
developed the speech-recognition technology--hurts my feelings (to say the
least) and underestimates the contributions of the NLP community.

Speech recognition, like machine translation, is academic in origin, and much
of the work is still carried out in academia. For example, Google did not
"invent" machine translation. No, Google Translate is an adapted version of
academic systems. Perhaps the phrase tables are sharded and so is the language
model, but the general algorithms are the same. Sure, one of Berkeley's NLP
grads is working there, but it's basically an adapted version of what's
available. They publish papers like "Stupid Backoff" [1], but that makes them
as much a contributor as any other member of the NLP community.
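For readers unfamiliar with the paper: "Stupid Backoff" is a deliberately simple scoring scheme for web-scale language models. Score an n-gram by its relative frequency; if it was never seen, back off to the shorter context with a fixed penalty (0.4 in the paper). A toy sketch, assuming a tiny illustrative corpus rather than the paper's web-scale setting:

```python
from collections import Counter

ALPHA = 0.4  # fixed back-off penalty, the value recommended in the paper


def count_ngrams(tokens, max_n):
    """Count all n-grams of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total):
    """Relative-frequency score of `word` after `context` (a tuple).
    Not a true probability: the scores are unnormalized by design."""
    if context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            return counts[ngram] / counts[context]
        # unseen n-gram: back off to a shorter context with a fixed penalty
        return ALPHA * stupid_backoff(word, context[1:], counts, total)
    return counts[(word,)] / total


tokens = "how much wood would a woodchuck chuck if a woodchuck could chuck wood".split()
counts = count_ngrams(tokens, 3)
print(stupid_backoff("would", ("much", "wood"), counts, len(tokens)))  # 1.0
```

The point of the paper is precisely that this crude scheme, fed enough data, approaches the quality of far more sophisticated smoothing.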

Speech recognition is the same thing. Google is the company that takes
existing research and adapts it.

To claim that Google developed the speech recognition technology is to
discredit the contributions of _everyone else_ in the NLP community. Google
has been generous in funding NLP research at the university level. Do you
consider the results of that research Google's?

 _Ultimately, the main difference is that Google has orders of magnitude more
data and the physical capacity to handle it, not that it solved some systems
or architectural bottleneck that has been limiting us._ Someone once said that
all you need is a crappy model and great data to build a good ML-based
algorithm...

[1] <http://acl.ldc.upenn.edu/D/D07/D07-1090.pdf>

~~~
joe_the_user
Having done only the smallest amount of work trying to apply academic
research, I have to say that the standard approach in a lot of AI is to
develop good ideas only about as far as needed to get a few papers done (though I
don't know NLP in particular).

The work of putting this stuff together into a system which works consistently
and at-scale is _hard_.

Basically, it is unfortunate that the standard in academia is publishing
_papers_ (or PDF files) rather than publishing _libraries_. With this
standard, academics can't even readily use each other's algorithms.

But when academics hold to this standard, it seems dumb to hear the complaint
"uh, no, you didn't do anything but apply our ideas..."

~~~
orijing
I am not an expert on speech recognition, but I am somewhat involved with
machine translation. For MT, it's very much architectural and systems issues
that limit us, first in trying different models and algorithms, but also in
general.

How do you know ideas are "good" enough to be publishable unless you do plenty
of experiments involving billion (or trillion) word corpora? I have a hard
time imagining that research in other fields doesn't require validation.

~~~
joe_the_user
I'm not saying that the papers that get published aren't good or valid. "Good
enough to publish a paper" is indeed good.

It's just that once the paper is published, it becomes a cul-de-sac, a nice
little city with no roads leading in or out, etc. Other researchers can only
use the result by reproducing the idea by-hand (or at best through crufty
Matlab code).

Yes, I'm sure the papers I've scanned involved considerable work and data (I
worked in computer vision). But that work is often, if not generally,
unavailable to the reader of the paper.

The point is that in creating a working system, Google has to do more than
_extend_ academic research, even if academic research involved good ideas that
had been given some thorough tests in isolation.

------
JesseAldridge
I tried it up. It doesn't work went well services dept.

Maybe this is my voice doesn't give me a couple of words wrong in all 50
states. Did welcome center for dental bar kansas city missouri.

\----

I tried it out. It doesn't work quite as well as the article suggests.

Maybe it's just my voice, but it gets at least a couple of words wrong in
almost every statement. "It", "welcome", and "thoughts", for example, are
consistently misheard.

~~~
jerf
You can really see the "search term bias" in that snippet.

~~~
nostrademons
I've found the Android speech recognition works great for voice search and
sucks for everything else. I've written an HN post or two with it; they either
needed heavy hand-editing or an outright translation to be intelligible.

------
IDisposableHero
It got " _how much wood would a woodchuck chuck if a wouldchuck would chuck
wood_ " right for me. I am impressed now.

But it couldn't handle " _Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn_
". Maybe I'm not pronouncing it quite right.

~~~
ppod
"how much wood would a woodchuck chuck if a wouldchuck would chuck wood"

That isn't really the best example, considering how these systems tend to
work. If their system has a giant bank of text it's using to predict the
statistical likelihood of your next word, once you've said "how much wood ..",
'would' is a very high probability candidate for the next word, and once you
get to 'woodchuck' the rest is statistically almost inevitable.

A better 'difficult' test would be something along the lines of 'colorless
green ideas sleep furiously', although we can't actually use that one since
that example is so famous it would likely turn up in a web-derived corpus many
times.
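The prediction mechanism described above can be sketched with a toy bigram model. The corpus here is just the tongue-twister itself, standing in for a web-scale text bank, so the numbers are only illustrative:

```python
from collections import Counter, defaultdict

corpus = "how much wood would a woodchuck chuck if a woodchuck could chuck wood".split()

# Count bigrams to estimate P(next word | previous word).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1


def predict_next(word):
    """Return the most likely next word and its conditional probability."""
    counts = bigrams[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())


print(predict_next("a"))  # ('woodchuck', 1.0)
```

After "a", "woodchuck" is the only continuation the model has ever seen, which is exactly why a famous tongue-twister is an easy case: the statistics make the rest of the sentence almost inevitable.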

~~~
Cyndre
How is that even a test for it? If you feed garbage into a system like that,
you're going to get garbage out. And for fun, walk up to a stranger, say
something like that, and see if the stranger can provide useful information.

------
antiterra
It seems strange to me that the article makes no attempt to survey other
current examples of speech-recognition technology in order to support the
unstated implication of its lede, namely: "speech-recognition technology
developed by those other than Google does not work."

I just downloaded the Bing app for iPad last night, and noticed it has a
pretty decent speech-recognition engine from a company they acquired a few
years back: TellMe. I tried all the examples given in the Slate article, and
they were recognized just fine.

This makes me curious: are there other current-generation speech-recognition
technologies that work at the level of Google's?

I should note that I didn't receive the desired behavior once my speech was
recognized. When I asked the math question, a link to Wolfram Alpha was given,
but I would have to click that link to get the answer. I had to go to maps to
get any kind of relevant answer for "Directions to McDonalds," and I had to
just say McDonalds and then click Directions to get the actual information.
This failing appears to be a trait of the iOS app itself. Hand typing the math
query into Bing on a proper browser did give me Google Calculator style
results.

~~~
ezy
That's because it's Google PR (<http://www.paulgraham.com/submarine.html>).
Nuance Communications[1] (who make Dragon and N other speech-rec systems) is
really the leader in this space. They've done server-based recognition
solutions ~forever, and theirs is the solution Siri uses for its app, for
example.

[1] Note: this is not the same as the Nuance Mike Cohen started, but a
conglomeration of speech companies purchased over the years which
inherited the name.

EDIT: Added some more detail

~~~
Tycho
Yeah, but... I've tried Google Voice Search and it works incredibly well,
whereas Dragon on the iPad did not work well enough to actually use for its
intended purpose (although it was fairly impressive; I had low expectations).

------
MatthewPhillips
I wonder at what point Apple is going to have to start building these kinds of
technologies. Purchasing Siri was supposed to be a step in that direction but
nothing has come of it yet.

I worry that it's not in Apple's DNA to build products they can't directly
charge for. Apple doesn't do freemium. The hope would be that third parties
would pick up the slack.

However, I'm not sure that startups can match Google in big data. So how will
Apple catch up in voice recognition? In mapping (which also uses Android data
to improve things like traffic and rerouting)?

~~~
jokermatt999
I honestly can't see Apple doing speech recognition. It's good, but it's only
99% good, and Apple is known for its perfectionism.

I'm not sure about mapping; it also doesn't seem like an Apple product to me,
but I can't really articulate why.

~~~
mdemare
That is a cliché, and an incorrect one. Take autocorrect, for instance:
hardly a perfect product. (<http://damnyouautocorrect.com/>)

------
micah63
Whatever anyone wants to say negatively about this article, it's bang on. I
just got my first android phone, it has 2.2 and the little microphone is my
new best friend. I talk out todos, write emails and texts, search YouTube, search
everything, it blows my mind. The only thing it's not good at is uncommon
names and places (like my name "Micah" and "Quebec"). Rock on Google.

------
martythemaniak
"It even works if you've got an accent."

I have an indeterminate accent and my voice is on the low side of bass, so I
trip up Google Voice pretty badly; it's mostly useless for me. FWIW, Rock Band
also can't make sense of what I say. I do wonder when it'll be good enough to
understand me.

~~~
sagarm
I hate to break it to you, but Rockband doesn't score you based on what you
say. It just scores you based on whether you're hitting the right notes. :P

------
kajecounterhack
Does anyone happen to know if there are significant companies in the speech-
recognition space besides Dragon and Google?

~~~
CWuestefeld
One assumes that the NSA is doing a lot.

~~~
asuth
When I took a speech rec class last semester, we had a guy from BBN
(subsidiary of Raytheon) give a talk about large-scale, extremely fast audio
transcription. As in, systems that could process audio 30-40,000 times faster
than real-time. They traded off recognition accuracy to get this, so their
accuracy was around 50-60%, as I recall. I asked why something like that would
be useful, and he said that if you're looking at a lot of data (which I heard
as: eavesdropping on an entire telephone network), then all you need is a
general
idea of what people are saying and a few keywords before you can zoom in on
specific clips for more thorough analysis.

So yes, I'm sure the NSA is interested.

------
jodrellblank
What are the chances that Google can pick up enough from our voices to
biometrically identify us from a crowd in the future?

E.g., on a Google phone conversation or when licensed to a surveillance
company.

------
PonyGumbo
Hopefully they'll be able to apply this to Google Voice. The voicemail
transcriptions are almost always hilariously wrong.

~~~
georgemcbay
Google Voice actually uses the same technology and same dataset, AFAIK, which
is why I was so confused reading this article.

Google's stuff is pretty good some of the time, but they've hardly solved this
problem to the degree the article suggests (as anyone who has actually used
this for more than 5 minutes could tell you).

~~~
asuth
Long-form transcription is a pretty different problem for language models
than parsing search queries. There's lots of audio-processing overlap, sure,
but parsing a voicemail definitely poses different, harder challenges.

