
Google opens access to its speech recognition API - jstoiko
http://social.techcrunch.com/2016/03/23/google-opens-access-to-its-speech-recognition-api-going-head-to-head-with-nuance/
======
blennon
This is HUGE in my opinion. Prior to this, in order to get near state-of-the-
art speech recognition in your system/application you either had to have/hire
expertise to build your own or pay Nuance a significant amount of money to use
theirs. Nuance has always been a "big bad" company in my mind. If I recall
correctly, they've sued many of their smaller competitors out of existence and
only do expensive enterprise deals. I'm glad their near monopoly is coming to
an end.

I think Google's API will usher in a lot of new innovative applications.

~~~
BorisEm
Other "state-of-the-art" speech recognition solutions already exist. For
example, Microsoft has been offering it through its Project Oxford service.
[https://www.projectoxford.ai/speech](https://www.projectoxford.ai/speech)

~~~
hardwaresofton
Also, CMUSphinx and Julius:

[http://cmusphinx.sourceforge.net/](http://cmusphinx.sourceforge.net/)

[http://julius.osdn.jp/en_index.php](http://julius.osdn.jp/en_index.php)

It is amazingly easy to add speech recognition without going out to any API
these days.

~~~
ytjohn
I first learned about CMUSphinx from the [Jasper
Project](https://jasperproject.github.io/).
While Jasper provided an image for the Pi, I decided to go ahead and make a
scripted install of CMUSphinx. I spent something like 2 frustrating days
attempting to get it installed by hand in a repeatable fashion before giving
up.

This was 2 years ago, so maybe it's simple now, but I didn't find it
"amazingly easy" back then.

I do have a number of projects where I could definitely use a local speech
recognition library. I have used [Python
SpeechRecognition](https://github.com/Uberi/speech_recognition/blob/master/examples/microphone_recognition.py)
to essentially record and transcribe from a scanner. I wanted to take it
further, but Google at the time limited the number of requests per day.
Today's announcement seems to indicate they will be expanding their free
usage, but a local setup would be much better. I'd like to deploy this in a
place that might not have reliable Internet.
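
The core of my setup was roughly this (a minimal sketch from memory; the
library's recognize_google call hits the free web endpoint, which is where
those daily limits came in):

    import speech_recognition as sr  # pip install SpeechRecognition

    r = sr.Recognizer()
    with sr.Microphone() as source:   # in my case, fed from the scanner audio
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)      # records until a pause

    try:
        print(r.recognize_google(audio))
    except sr.UnknownValueError:
        print("could not understand audio")
    except sr.RequestError as e:
        print("request failed: {}".format(e))  # e.g. over quota, no Internet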

~~~
uberi
In my experience, the issues with building CMU Sphinx are mainly unspecified
dependencies, undocumented version requirements, and forgetting to sacrifice
the goat when the MSVC redistributable installer pops up.

We've written detailed, up-to-date instructions [1] for installing CMU Sphinx,
and now also provide prebuilt binaries [2]!

If you're interested in not sending your audio to Google, CMU Sphinx and other
libraries (like Kaldi and Julius) are definitely worth a second look.

[1]
[https://github.com/Uberi/speech_recognition/blob/master/refe...](https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst)
[2]
[https://github.com/Uberi/speech_recognition/tree/master/thir...](https://github.com/Uberi/speech_recognition/tree/master/third-party)
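
For example, once PocketSphinx is installed, going offline is roughly a
one-method change through this library (a minimal sketch, assuming a local
WAV file):

    import speech_recognition as sr  # pip install SpeechRecognition pocketsphinx

    r = sr.Recognizer()
    with sr.AudioFile("recording.wav") as source:  # hypothetical local file
        audio = r.record(source)

    # Runs entirely on-device; nothing is sent over the network.
    print(r.recognize_sphinx(audio))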

~~~
dr_zoidberg
Yeah I'm gonna leave a reply here just in case I need to find this again
(already opened tabs, but you never know). This might be big for a stalled
project at work. If this can un-stall that, I'll sure owe you a beer ;)

------
CaveTech
> To attract developers, the app will be free at launch with pricing to be
> introduced at a later date.

Doesn't this mean you could spend time developing and building on the platform
without knowing if your application is economically feasible? Seems like a
huge risk to take for anything other than a hobby project.

~~~
sologoub
I was really hoping they would post at least a preview of what the pricing
will be.

The App Engine pricing change resulted in a nasty surprise for a lot of people
and a lot of vitriol towards Google. Just one article on this from many:
[http://readwrite.com/2011/09/02/google-app-engine-pricing-an...](http://readwrite.com/2011/09/02/google-app-engine-pricing-ange/)

While the platform has moved way ahead of where it was in 2011, the memory of
this is still in the back of people's minds, and it would be a good move on
Google's part to at least try to engender more trust by giving directional
pricing.

~~~
bduerst
Wasn't that five years ago? When it came out of preview (and prices were
expected to change)?

~~~
sologoub
It was - 2011 is mentioned in both the article and my comment.

I just wish Google wouldn't bring back memories of that by not disclosing
pricing of a very promising API.

GAE has moved quite far ahead since then, but many people still won't consider
it after the bad experience. Perceptions die hard...

~~~
dragonwriter
They're probably using usage data from the free period to drive pricing
decisions.

~~~
sologoub
Sure, I'd expect nothing less, but I'm also sure they have some sense of the
range they're looking for. Sharing that range could help developers establish
whether their use case is viable or not.

As it stands, the ambiguity is a deterrent to investing too much time.

~~~
bduerst
Or the data determines that they have to price outside the hypothetical early-
release range, and then you have people complaining that you lied.

------
zkirill
I came across the CMU Sphinx speech recognition library
([http://cmusphinx.sourceforge.net](http://cmusphinx.sourceforge.net)) that
has a BSD-style license and they just released a big update last month. It
supports embedded and remote speech recognition. Could be a nice alternative
for someone who may not need all of the bells and whistles and prefers to have
more control rather than relying on an API which may not be free for long.

Side note: if anyone is interested in helping with an embedded voice
recognition project please ping me.

~~~
mark_l_watson
There is a Raspberry Pi project that uses Sphinx to roll your own Amazon
Echo-like device. You might want to take a look at that.

~~~
depingus
Mycroft [https://mycroft.ai/](https://mycroft.ai/)

~~~
mark_l_watson
Thanks for the link, but this is the Raspberry Pi project:
[https://github.com/jasperproject](https://github.com/jasperproject)

------
hardik988
Tangentially related: does anyone remember the name of the startup/service
that was on HN (I believe) that lets you infer actions from plain text?

Eg: "Switch on the lights" becomes

{"action": "switch_on", "thing" : "lights" }

etc. I'm trying really hard to remember the name, but it escapes me.

Speech recognition and <above service> will go very well together.

~~~
ar7hur
Our service Wit.ai (YC W14) does just that.

Demo:
[https://labs.wit.ai/demo/index.html](https://labs.wit.ai/demo/index.html)
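
Integration is a single HTTP call: you send the text and get structured
intent back. A rough sketch in Python (the token is a placeholder and the
response shape is simplified; check the docs for the details):

    import requests

    WIT_TOKEN = "YOUR_SERVER_TOKEN"  # placeholder; from the Wit.ai console

    resp = requests.get(
        "https://api.wit.ai/message",
        params={"q": "Switch on the lights"},
        headers={"Authorization": "Bearer " + WIT_TOKEN},
    )
    # The parsed result carries the intent and entities, conceptually like
    # {"action": "switch_on", "thing": "lights"}
    print(resp.json())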

~~~
adyus
I've been meaning to ask someone at Wit.ai this for a while:

Since your service is completely free, how do you plan on surviving? Would you
open source any parts of Wit.ai should you go under?

I feel these are important questions to ask before investing time & energy
into using your otherwise awesome service...

~~~
ragebol
They've been acquired by Facebook; the main guys behind it are now building
Facebook M.

~~~
JustFinishedBSG
> They've been acquired by Facebook

Ah so it will probably be discontinued

------
hardwaresofton
In case you're not interested in having google run your speech recognition:

CMU Sphinx:
[http://cmusphinx.sourceforge.net/](http://cmusphinx.sourceforge.net/)

Julius:
[http://julius.osdn.jp/en_index.php](http://julius.osdn.jp/en_index.php)

------
melvinmt
If you're having trouble (like me) finding your "Google Cloud Platform user
account ID" to sign up for Limited Preview access, it's just the email address
for your Google Cloud account. Took me only 40 minutes to figure that one out.

------
josephcooney
I wrote a client library for this in C# by reverse engineering what Chrome did
at the time (totally not legit/unsupported by Google, possibly against their
TOS). I have never used it for anything serious, and I'm glad there is now an
endorsed way to do this.

[https://bitbucket.org/josephcooney/cloudspeech](https://bitbucket.org/josephcooney/cloudspeech)

------
theseatoms
Key sentence:

> The Google Cloud Speech API, which will cover over 80 languages and will
> work with any application in real-time streaming or batch mode, will offer
> full set of APIs for applications to “see, hear and translate,” Google says.

------
jaflo
Pretty impressive, from the limited look the website
([https://cloud.google.com/speech/](https://cloud.google.com/speech/)) gives:
the facts that Google will clean background noise out of the audio for you and
that it supports streamed input are particularly interesting.

I don't know how I should feel about Google taking even more data from me (and
other users). How would integrating this service work legally? Would you need
to alert users that Google will keep their recordings on file (probably
indefinitely, and without their being able to delete them)?

------
robohamburger
Unless I have gone crazy, Google has had an STT API available to tinker with
for a while. It's one of the options for Jasper [1]. Hopefully this means it
will be easier to set up now.

It would be nice if they just open sourced it, but I imagine that's at cross
purposes with their business.

[1]
[https://jasperproject.github.io/documentation/configuration/](https://jasperproject.github.io/documentation/configuration/)

------
jonah
SoundHound released Houndify[1], their voice API last year which goes deeper
than just speech recognition to include Speech-to-Meaning, Context and Follow-
up, and Complex and Compound Queries. It will be cool to see what people will
do with speech interfaces in the near future.

[1] [https://www.houndify.com/](https://www.houndify.com/)

------
amelius
Why isn't speech recognition just part of the OS? Like keyboard and mouse
input.

~~~
christianmann
Because speech recognition occurs on Google servers, not locally.

~~~
cryptoz
Speech recognition can occur either locally or on Google's servers. Since
about 2012 [1], Android has been able to do some types of speech recognition,
like dictation, on local devices. Additionally, Google Research has recently
expanded on this functionality and it seems like much more of the speech
recognition will be done locally [2].

[1] [http://www.androidcentral.com/jelly-bean-brings-offline-
voic...](http://www.androidcentral.com/jelly-bean-brings-offline-voice-typing)

[2] [http://www.zdnet.com/article/always-on-google-ai-gives-andro...](http://www.zdnet.com/article/always-on-google-ai-gives-android-voice-recognition-that-works-on-or-offline/)

~~~
amelius
> always-on-google-ai-gives-android-voice-recognition-that-works-on-or-offline

Since Android is open-source, would that mean that the voice recognition
software (and/or trained coefficients) could, in principle, be ported to
Linux?

~~~
kuschku
> since Android is Open Source

You haven't used Android since 2010, have you?

In the latest versions, there is no open-source anything anymore.

Calendar, Contacts, the home screen, the Phone app, and Search are all closed
source now.

(Btw, all of them, including the Google app, used to be open in Gingerbread.)

You can't do TLS without going through Google's apps (or packaging Spongy
Castle), you can't do OpenGL ES 3.2, and you can't use location anymore, nor
use WiFi for your own location implementation.

Since Marshmallow, you are also forced to use Google Cloud Messaging, or the
device will just prevent your app from receiving notifications.

To "save battery power" and "improve usability", Google monopolized all of
Android.

~~~
kuschku
To whoever downvoted the above post: please clarify why you think it isn't
relevant to the discussion, or provide counterarguments. All the points I made
can be easily sourced (if you wish, I can even post the sources here), and are
all verifiable.

~~~
amelius
It could also have been the tone of your first sentence.

Not that I mind personally.

~~~
kuschku
It wasn't meant aggressively, just as a question. It's quite possible that the
author of the comment I replied to hadn't used Android for a few years, or had
never cared, or had just missed the announcements that the official apps were
no longer supported.

Oh, wait, there were no announcements, they were dropped silently.

------
mobiledev88
Houndify launched last year and provides both speech recognition and natural
language understanding. They have a free plan that never expires and
transparent pricing. It can handle very complex queries that Google can't.

------
timbunce
FWIW, I've just finished a large blog post researching ways to automate
podcast transcription and subsequent NLP.

It includes lots of links to relevant research, tools, and services. Also
includes discussion of the pros and cons of various services
(Google/MS/Nuance/IBM/Vocapia etc.) and the value of vocabulary uploads and
speaker profiles.

[http://blog.timbunce.org/2016/03/22/semi-automated-podcast-t...](http://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/)

~~~
metastew
As a hard-of-hearing aspiring software developer, it would be a godsend for me
if someone came up with a reliable automated transcription service. I'm often
dismayed by the amount of valuable information locked up in podcasts and
non-transcribed videos, and I have to rely on the goodwill of volunteers to
get transcripts.

PyCon made an admirable effort to live-caption their talks last year, but some
of those transcripts never got uploaded along with the talks, which is
puzzling; I suppose it could be due to a lack of timecodes.

I've subscribed to your blog, and hopefully I can contribute whatever I can to
make this work out.

------
vram22
For anyone who wants to try these areas a bit:

My trial of a Python speech library on Windows:

Speech recognition with the Python "speech" module:

[http://jugad2.blogspot.in/2014/03/speech-recognition-with-py...](http://jugad2.blogspot.in/2014/03/speech-recognition-with-python-speech.html)

and also the opposite:

[http://code.activestate.com/recipes/578839-python-text-to-sp...](http://code.activestate.com/recipes/578839-python-text-to-speech-with-pyttsx/?in=user-4173351)

------
danso
FWIW, Google followed the same strategy with Cloud Vision (iirc)...they
released it in closed beta for a couple of months [0], then made it generally
available with a pricing structure [1].

I've never used Nuance, but I've played around with IBM Watson [2], which
gives you 1000 free minutes a month and then charges 2 cents a minute
afterwards. Watson allows you to upload audio in 100MB chunks (or is it
10-minute chunks? I forget), whereas Google currently allows 2 minutes per
request (edit: according to their signup page [5])...but both Watson and
Google allow streaming, so that's probably a non-issue for most developers.

From my non-scientific observation...Watson does pretty well, such that I
would consider using it for quick, first-pass transcription...it even gets a
surprising number of proper nouns correct, including "ProPublica" and "Ken
Auletta" -- though it fudges things in other cases...its vocab does not
include "Theranos", which is variously transcribed as "their in house" and
"their nose" [3]

It transcribed the "Trump Steaks" commercial nearly perfectly...even getting the
homophones in " _when it comes to great steaks I just raise the stakes the
sharper image is one of my favorite stores with fantastic products of all
kinds that 's why I'm thrilled they agree with me trump steaks are the world's
greatest steaks and I mean that in every sense of the word and the sharper
image is the only store where you can buy them_"...though later on, it messed
up "steak/stake" [4]

It didn't do as great a job on this Trump "Live Free or Die" commercial,
possibly because of the booming theme music...I actually did a spot check with
Google's API on this, and while Watson didn't get "New Hampshire" at the
beginning, Google _did_ [4]. Judging by how well YouTube manages to caption
videos of all sorts, I would say that Google probably has a strong lead in
overall accuracy when it comes to audio in the wild, just based on the data it
processes.

edit: fixed the Trump steaks transcription...Watson transcribed the first
sentence correctly, but not the other "steaks"

[0] [http://www.businessinsider.com/google-offers-computer-vision...](http://www.businessinsider.com/google-offers-computer-vision-tech-2015-12)

[1] [http://9to5google.com/2016/02/18/cloud-vision-api-beta-
prici...](http://9to5google.com/2016/02/18/cloud-vision-api-beta-pricing/)

[2] [https://github.com/dannguyen/watson-word-
watcher](https://github.com/dannguyen/watson-word-watcher)

[3]
[https://gist.github.com/dannguyen/71d49ff62e9f9eb51ac6](https://gist.github.com/dannguyen/71d49ff62e9f9eb51ac6)

[4]
[https://www.youtube.com/watch?v=EYRzpWiluGw](https://www.youtube.com/watch?v=EYRzpWiluGw)

[5] [https://services.google.com/fb/forms/speech-api-
alpha/](https://services.google.com/fb/forms/speech-api-alpha/)

~~~
nfriedly
It's 100MB chunks, and you can compress it with opus or flac to squeeze in
more audio per chunk :)

------
ocdtrekkie
"Google may choose to raise those prices over time, after it becomes the
dominant player in the industry."

...Isn't that specifically what antitrust laws were written to prevent?

~~~
KyleBrandt
As a developer, I might be more worried about it _not_ becoming at least one
of the dominant players, because then they might just drop it.

But maybe they only do that with consumer-facing products?

~~~
ocdtrekkie
Google kills off APIs often. Remember the whole Translate API fiasco? Though
that was for overuse, not underuse.

~~~
skj
This is a Google Cloud Platform service, and is subject to long deprecation
policies at the very least.

------
j1vms
I would say that Google's main goal here is expanding their training data set,
as opposed to creating a new revenue stream. If it hurts competitors (e.g.
Nuance), that's likely just a side effect of the main objective rather than an
intentional attempt to hurt the competition.

As others here have pointed out, the value now for GOOG is in building the
best training data set in the business, as opposed to just racing to find the
best algorithm.

~~~
jrv
Question from a machine learning noob: how would they use an unlabeled dataset
for training? Would Google employees listen to it all (if the ToS even allows
that) and transcribe it, then use the result for training? Or is there another
way to make it useful without it being labeled?

~~~
TuringNYC
[https://en.wikipedia.org/wiki/Unsupervised_learning](https://en.wikipedia.org/wiki/Unsupervised_learning)

~~~
jrv
While aware of unsupervised learning, I thought I had read somewhere that it
wasn't yet a solved problem to use it for this kind of learning. But that
might be wrong, and the Wikipedia article mentions some interesting examples.

~~~
IanCal
Quite a few methods of recognition work in two stages (as a very large
simplification).

First, you take the huge input (because something like sound has a huge amount
of data in it; similarly with images there are a _lot_ of pixels) and learn a
simpler representation of it.

The second problem, mapping these nice dense features to actual things, can be
solved in different ways; even simple classifiers can perform well.

This first step doesn't actually need any labelled data; I just want to learn
a smaller representation. For example, if we managed to learn a mapping from
bits of audio to the phonetic alphabet, then our speech recognition problem
becomes one of just learning the mapping from the phonetic alphabet to words,
which is a far nicer problem to have.

Some ways of "deep learning" solve this first problem (of learning neater
representations) through a step by step process of what I like to refer to as
laziness.

Instead of trying to learn a really, really high-level representation of your
input data, just learn a slightly smaller one. That's one layer. Then once
we've got that, we try and learn a smaller/denser representation on top of
that. Then again, and again, and again.

How can you learn a smaller representation? Well, a good way is to try and get
a single layer to be able to regenerate its input. "Push" the input up, get
the activations in the next layer, run the whole thing backwards, and see how
different the result is from your input. You can then use this information to
tweak the weights to make it slightly better the next time. Do this for
millions and millions of inputs and it gets pretty good. This technique has
been known about for a long time, but one of the triggers for the current big
explosion of use was Hinton working out that this back and forth only really
needs to be done once rather than 100 times (which was thought to be required
beforehand).
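
To make that concrete, here's a minimal numpy sketch of the push-up /
run-backwards / compare loop as a single tied-weight autoencoder layer (the
sizes, learning rate, and data are purely illustrative, and note that Hinton's
actual trick involved RBMs and contrastive divergence rather than the plain
backprop shown here):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: squash 256-dim "audio feature" vectors down to 64 dims.
    n_visible, n_hidden = 256, 64
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))  # tied weights, used both ways
    b_h = np.zeros(n_hidden)   # hidden-layer bias
    b_v = np.zeros(n_visible)  # reconstruction bias

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_step(v, lr=0.1):
        # One "push up, run backwards, compare, tweak" step on a batch v.
        global W, b_h, b_v
        h = sigmoid(v @ W + b_h)        # push the input up a layer
        v_rec = sigmoid(h @ W.T + b_v)  # run it backwards to regenerate the input
        err = v_rec - v                 # how different is the reconstruction?
        # Gradient of the squared reconstruction error through the tied weights:
        d_v = err * v_rec * (1 - v_rec)
        d_h = (d_v @ W) * h * (1 - h)
        W -= lr * (v.T @ d_h + d_v.T @ h) / len(v)
        b_h -= lr * d_h.mean(axis=0)
        b_v -= lr * d_v.mean(axis=0)
        return (err ** 2).mean()

    # Random stand-in data just to show the loop; real inputs would be audio
    # frames, and the learned h would feed the next layer up.
    for step in range(1000):
        loss = train_step(rng.random((32, n_visible)))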

Hinton says it made things 100,000 times faster: it was 1% of the computation
previously required, and it took him 17 years to realise it, in which time
computers got 1000 times faster. Along with this, GPUs got really _really_
fast and easier to program. I took the original Hinton work that took weeks to
train and had it running in hours back in 2008 on a cheap GPU. So before ~2006
this technique would have taken years of computer time; now it's down to
minutes. Of course, that's then resulted in people building significantly
larger networks that take much longer to train but would have been infeasible
to run before.

But you still need a lot of unlabelled data. While I doubt Google is doing
that with this setup, they have done something like it before: they set up a
question-answering service in the US that people could call (I think for free)
in order to collect voice data.

TL;DR

You need labelled data. But it turns out you can learn most of what you need
with unlabelled data, leaving you with a much simpler problem to solve. That's
great because labelled data is massively more expensive than unlabelled data.

------
zkhalique
Has anyone tried adding OpenEars to their app, to avoid having to send things
over the internet from e.g. a basement? Is it any good at recognizing basic
speech?

~~~
alexcaps
Pretty bad except for keyword spotting.

------
szimek
In the sign-up form they state: "Note that each audio request is limited to 2
minutes in length." Does anyone know what an "audio request" is? Does it mean
that real-time recognition is limited to 2 minutes, or just that longer
periods will count as more "audio requests" and result in a higher bill?

Do they provide a way to send audio via WebRTC or WebSocket from a browser?

------
amelius
Nice. But what I want is open-source speech recognition.

~~~
nostrademons
Like most machine-learning applications, the source code isn't the interesting
part, the _data_ is. Google started by training on millions of phrases from
Google 411, and then they've been able to continue training anytime someone
issues a voice command to an Android device. They have orders of magnitude
more data than you could fit into a GitHub repository.

~~~
tomp
Couldn't you just download subtitles for old movies and train using those?

~~~
oh_sigh
Doesn't even need to be old movies. Certain types of video content in the US
are legally required to have subtitles (e.g. a lot of YouTube content). You
could programmatically download them and use that as your training set. And
since it's a transformative work, you can freely train your models even on
copyrighted works.

~~~
nostrademons
Much of the YouTube content has auto-generated subtitles, i.e. Google is
running its speech-recognition software on the audio stream and then using
that to caption the video. If you used that as your training set, you'd
effectively be training on the output of an AI. Which is kind of a clever way
to get information from Google into your open-source library, but it will
necessarily be lower-fidelity than just using the Google API directly.

------
z3t4
At least offer a self-hosted version. Maybe it's just me, but I'm not
comfortable sending every spoken word to Google.

~~~
kuschku
Exactly this.

Especially because Google's version is trained with illegally obtained user
data (no, changing your ToS doesn't allow you to use previously collected data
for new purposes in the EU).

We, as a society, should discuss whether software trained on user data should
be required to be available to those who provided that data, and whether any
copyright can even exist for software developed by training neural networks,
or whether it's by definition public domain.

Software built by training neural networks provides an immense advantage for
existing monopolies, and makes it extremely hard for competitors to catch up.

If this trend continues, startups will become impossible for software that
depends on being trained with huge datasets.

------
yeukhon
I thought I read "open source", then I realized it's "open access". I believe
there was a similar API in the past, or maybe it was based on Google
Translate. But I swear at one point people were writing hackathon projects
using some voice APIs.

------
dominotw
Nice! Curious how it compares to Amazon's AVS, which went public this week.

[https://github.com/amzn/alexa-avs-raspberry-
pi](https://github.com/amzn/alexa-avs-raspberry-pi)

------
saurik
I think this more directly competes with the IBM Watson speech API, not
Nuance?

~~~
jonknee
Why? This is almost exactly what Nuance provides:

[https://developer.nuance.com/public/index.php?task=memberSer...](https://developer.nuance.com/public/index.php?task=memberServices)

~~~
swah
I wanted to see if there was good speech recognition and could Google about
Nuance... Now that its going to be "disrupted" I found out about it.. ironic.

Try Googling "speech recognition api"...

------
Negative1
I would be hesitant to build an entire application that relied on this API
only to have it removed in a few months or years when Google realizes it sucks
up time and resources and makes them no money.

~~~
visarga
It seems there are a few options. Devs could use an abstraction layer and
change providers with very little effort. Also, by the way things seem to be
going, voice will become more and more prevalent.
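
A minimal sketch of such an abstraction layer (all names here are
hypothetical):

    from abc import ABC, abstractmethod

    class Transcriber(ABC):
        # Provider-agnostic speech-to-text interface.
        @abstractmethod
        def transcribe(self, audio: bytes, language: str = "en-US") -> str: ...

    class GoogleCloudTranscriber(Transcriber):
        def transcribe(self, audio, language="en-US"):
            # Call Google's Cloud Speech API here and return the transcript.
            raise NotImplementedError

    class WatsonTranscriber(Transcriber):
        def transcribe(self, audio, language="en-US"):
            # Call IBM Watson's speech-to-text API here and return the transcript.
            raise NotImplementedError

    def transcribe_batch(clips, backend: Transcriber):
        # The app only ever sees Transcriber, so swapping providers (say, when
        # pricing is announced) is a one-line change at the call site.
        return [backend.transcribe(clip) for clip in clips]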

------
hans
Cool. Next up is a way to tweak the speech API to recognize patterns in stocks
and capex... wasn't that what Renaissance Technologies did?

Really, GOOG should democratize quant stuff next... DIY hedge fund algos.

------
alfonsodev
I see many libraries mentioned here; I wonder what the best open,
multi-platform software is for speech recognition to code with vim, Atom, etc.
I've only seen a hybrid system working with Dragon + Python on Windows. I
would like to train/customize my own system, since I'm starting to have pain
in my tendons and wrists. Do you think this Google API could do it? Not being
local looks like a limiting factor in terms of speed/lag.

~~~
lovelearning
Nothing out of the box as far as I know. You'll have to DIY.

Have a look at CMUSphinx/Pocketsphinx [1]. I wrote a comment about training it
for command recognition in a previous discussion[2].

It supports BNF grammar-based training too [3], so I have a vague idea that it
_may_ be possible to use your programming language's BNF to make it recognize
language tokens. I haven't tried this out, though.

Either way, be prepared to spend some time on the training. That's the hardest
part with Sphinx. (A rough sketch of the simpler keyword-spotting route
follows the links below.)

Also, have you seen this talk about doing it on Unix/Mac [4]? He does use
natlink and dragonfly, but perhaps some of the concepts can be transferred to
Sphinx too?

[1]: [http://cmusphinx.sourceforge.net/](http://cmusphinx.sourceforge.net/)
[2]:
[https://news.ycombinator.com/item?id=11174762](https://news.ycombinator.com/item?id=11174762)
[3]:
[http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/...](http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/jsgf/JSGFGrammar.html)
[4]:
[https://www.youtube.com/watch?v=8SkdfdXWYaI](https://www.youtube.com/watch?v=8SkdfdXWYaI)
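
Here's the keyword-spotting route as a rough sketch, via the
speech_recognition wrapper mentioned elsewhere in this thread (the phrases and
sensitivities are made up, and this assumes the wrapper's keyword_entries
support for PocketSphinx):

    import speech_recognition as sr  # pip install SpeechRecognition pocketsphinx

    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)

    # Restrict PocketSphinx to a tiny command vocabulary; each entry is
    # (phrase, sensitivity in 0..1). A small vocabulary is far more accurate
    # than free-form dictation for editor commands.
    commands = [("save file", 1.0), ("next buffer", 1.0), ("go to line", 0.9)]
    try:
        print(r.recognize_sphinx(audio, keyword_entries=commands))
    except sr.UnknownValueError:
        print("no command recognized")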

------
zelcon
Great, now when will Google let us use the OCR engine they crowdsourced from
us over the last decade with reCAPTCHA? Tesseract is mediocre.

~~~
scotth
[https://cloud.google.com/vision/](https://cloud.google.com/vision/)

------
willwill100
Will be interesting to compare with
[http://www.speechmatics.com](http://www.speechmatics.com)

~~~
spdustin
There's no hint of pricing that I could find on their page. Do you know about
their pricing model?

~~~
willwill100
If you sign up, you can scroll down and see the pricing at
[https://www.speechmatics.com/account/](https://www.speechmatics.com/account/)

------
chair-law
What is the difference between a speech recognition API and [NLP
libraries](https://opennlp.apache.org/)? This information was not easily found
with a few Google searches, so I figured others might have the same question.

~~~
jbkkd
NLP libraries process written text. This new API processes speech and extracts
text from it.

------
infocollector
What is the best speech recognition engine, assuming one has no internet?

~~~
sounds
At this time, CMU Sphinx or Julius are good choices: not great, but worth a
look.

------
vincent_s
Don't get too excited:
[https://www.google.com/search?q=google+shuts+down+api](https://www.google.com/search?q=google+shuts+down+api)

------
flanbiscuit
I hope this opens up some new app possibilities for the Pebble Time. I believe
they use Nuance right now, and it's limited to only responding to texts.

------
mysticmode
I'm not sure what will happen to Google's Web Speech API in the future, or
whether it will be continued as a free service.

------
mark_l_watson
I think they are pushing back against Amazon's Echo speech APIs, which I have
experimented with.

I just applied for early access.

------
omarforgotpwd
Fuck. Yes. IBM has a similar API as well, as part of their Watson APIs, but I
really wanted to use Google's.

------
sandra_saltlake
Sounds like this is bad news for Nuance.

------
E4life
Finally! This is something that will be the main way of communicating in the
future.

------
jupp0r
Anybody got the API docs yet? I wonder if I can stream from Chrome via WebRTC.

~~~
zmodem
Can't you just use the Web Speech API for that?
[https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.htm...](https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html)

~~~
jupp0r
I want to use the Cloud Speech API for various reasons.

------
braindead_in
How well does this work with conversational speech? Any benchmarks?

------
BinaryIdiot
So this was very, very exciting until I realized you have to be using Google
Cloud Platform to sign up for the preview. Unfortunately all of my stuff is in
AWS, and I _could_ move it over, but I'm not going to (far too much hassle to
preview an API I may not end up using, ultimately).

Regardless, this is still very exciting. I haven't found anything that's as
good as Google's voice recognition. I only hope this ends up being cheap and
accessible outside of their platform.

~~~
oaktowner
I don't think there's any requirement that you use Google Compute Engine in
order to use this API. Yes, you sign up for an account, but of _course_ you
have to sign up for an account to use it. This API is part of the platform.

Similarly, you can use the Google Translate API without using Compute Engine,
App Engine, etc.

Note: I work for Google (but not on any of these products).

~~~
BinaryIdiot
Ah that makes sense. It gave me the impression I had to run my services on
there. I don't know if I'm the only person to screw that up or not. Thanks for
clearing that up!

