
Bringing Google Live Transcribe's Speech Engine to Everyone - walterbell
https://opensource.googleblog.com/2019/08/bringing-live-transcribes-speech-engine.html
======
forgotmyhnacc
Title is misleading: they're not bringing the speech engine to everyone; you
still have to pay for Google Cloud API requests. They're merely open sourcing
the Android app that calls out to the API. See the GitHub link in the article:
https://github.com/google/live-transcribe-speech-engine/blob/master/README.md
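
To make the dependency concrete, here is a minimal sketch of the kind of client-side call such an app makes against Cloud Speech-to-Text. Assumptions: Python and the `google-cloud-speech` library (the actual app is an Android client, which may differ), and 16 kHz 16-bit mono PCM audio. A valid API key / service account is still required:

```python
from typing import Iterator


def pcm_chunks(pcm: bytes, sample_rate: int = 16000, ms: int = 100) -> Iterator[bytes]:
    """Frame raw 16-bit mono PCM into fixed-duration chunks (2 bytes per sample)."""
    frame = sample_rate * 2 * ms // 1000  # bytes per `ms` of audio
    for i in range(0, len(pcm), frame):
        yield pcm[i:i + frame]


def transcribe_stream(pcm: bytes) -> None:
    """Stream audio to Cloud Speech-to-Text; needs GOOGLE_APPLICATION_CREDENTIALS set."""
    from google.cloud import speech  # pip install google-cloud-speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in pcm_chunks(pcm)
    )
    for response in client.streaming_recognize(streaming_config, requests):
        for result in response.results:
            print(result.alternatives[0].transcript)
```

The point of the chunking is that the streaming API expects audio in small incremental requests rather than one blob; interim results are what make the "live" captioning possible.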

~~~
wlesieutre
That's disappointing; the one exciting feature on the Pixel 4 is its offline
transcription. From the headline I figured that's what this was about.

~~~
walterbell
Google published research papers on offline recognition, the feature is
shipping on Android phones (i.e. one can inspect a working device), there is
an active OSS community for TensorFlow and lots of public work on speech
recognition. Many building blocks are public for motivated researchers.

------
Someone1234
I'm still waiting for meeting transcripts that understand who is speaking.
Given how far we've come with speech recognition, I'm legitimately surprised
that this fairly common use case is still missing.

I'm not even saying it needs to name the people in the meeting. Just
understand, contextually, whether a given utterance came from "person 1" or
"person 2," and associate it with that label as it records.

Maybe this can help? Then again, Google's existing APIs might already be able
to do this.

~~~
rahimnathwani
Have you tried the 'speaker diarization' feature of any of the commercial
speech-to-text APIs (from Google, Microsoft, etc.)?
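
For anyone who wants to try it, a hedged sketch of how diarized output can be turned into "person 1"/"person 2" turns. Google's Cloud Speech-to-Text attaches a per-word `speaker_tag` when diarization is enabled; the `group_turns` helper below is my own illustration, not part of the API, and the speaker-count bounds are arbitrary:

```python
def group_turns(tagged_words):
    """Collapse a sequence of (word, speaker_tag) pairs into consecutive speaker turns."""
    turns = []  # list of (tag, [words])
    for word, tag in tagged_words:
        if turns and turns[-1][0] == tag:
            turns[-1][1].append(word)  # same speaker keeps talking
        else:
            turns.append((tag, [word]))  # speaker change starts a new turn
    return [(tag, " ".join(words)) for tag, words in turns]


def diarized_words(pcm: bytes):
    """Recognize with diarization enabled; needs GOOGLE_APPLICATION_CREDENTIALS set."""
    from google.cloud import speech  # pip install google-cloud-speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        diarization_config=speech.SpeakerDiarizationConfig(
            enable_speaker_diarization=True,
            min_speaker_count=2,
            max_speaker_count=4,
        ),
    )
    audio = speech.RecognitionAudio(content=pcm)
    response = client.recognize(config=config, audio=audio)
    # The final result carries every word with its speaker_tag.
    words = response.results[-1].alternatives[0].words
    return [(w.word, w.speaker_tag) for w in words]
```

With this, `group_turns(diarized_words(pcm))` yields something like `[(1, "..."), (2, "...")]`, which is exactly the anonymous "person 1"/"person 2" labeling described above.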

------
Amicius
While I appreciate the audio codec discussion and bandwidth-to-accuracy
tradeoffs, how much of the speech recognition could be done on-device rather
than shipping the audio off to the cloud? My understanding is that it's a
matter of installing pattern files for analyzing the audio without needing to
fail over to the cloud. How many GB are we talking about to cover normal daily
speech, assuming a minimum of jargon? For the hearing impaired, not having to
hit the cloud at all seems like the best option (and you don't need to
compress the audio or worry about round-trip bandwidth).

~~~
walterbell
Google published research [1][2] on offline recognition, and it rolled out
earlier in 2019. The model size for English is claimed to be under 100MB:
https://techcrunch.com/2019/03/12/googles-new-voice-recognition-system-works-instantly-and-offline-if-you-have-a-pixel/

[1] https://arxiv.org/pdf/1603.03185.pdf (2016)

[2] https://arxiv.org/pdf/1811.06621.pdf (2018)

------
hentrep
Can anyone attest to how accurate this transcription is for technical
subjects? I've attempted to integrate transcription into my work life
(pharma), but correcting errors related to tech jargon or
acronyms/abbreviations always outweighed any benefit.

~~~
myu701
I can't speak to Google's, but Dragon Professional supports legal and medical
jargon out of the box. Pricey, but for powerful offline speech recognition,
that's understandable.

~~~
MaupitiBlue
Dragon legal’s primary advantage is that it handles citation formatting. I
don’t think I’ve ever come across a legal term it didn’t know (maybe
dépeçage).

------
exikyut
This still requires an API key; it isn't local.

So I decided it would be a good idea to open an issue in the linked repo, to
find out what the costs would look like.

Turns out someone else already did that!
https://github.com/google/live-transcribe-speech-engine/issues/14

------
heyoni
This is cool but a bit worrisome... remember the days when it was too
expensive to log all audio transmissions on every platform and communication
device, so you thought you had some level of privacy? Projects like PRISM
might be able to do more than simply log metadata.

------
Schiphol
I've been looking for some way to transcribe my own talks---sometimes I find a
turn of phrase or example during the talk that strikes me as useful while I'm
giving it, but then forget it. Perhaps this can be coaxed into providing this
service for me.

~~~
ghaff
The ML transcription services work for giving you the gist of what's been said
if the recording is of decent quality. I should probably consider doing the
same thing. If it's a recording that I want to be "perfect" (e.g. a posted
podcast transcription), I still use human transcription; cleaning up the
machine transcription isn't worth my time. But if the transcription is mostly
to jog your own memory it's probably fine and much cheaper.

------
IshKebab
Presumably this requires a cloud API key of some kind? Google's speech
recognition API isn't free.

~~~
partiallypro
I'm not sure about pricing, but if it's like Azure's, it's free up to a
certain number of requests.

------
ptah
Is this used on YouTube?

