
Real-time Continuous Transcription with Live Transcribe - iron0013
https://ai.googleblog.com/2019/02/real-time-continuous-transcription-with.html
======
jawns
This app is cool and useful, but a major piece of the puzzle is missing.

Communication between a Deaf person and a hearing person is a two-way street,
and this tool really only addresses one of those streets.

The tool transcribes the audible speech of the hearing person, allowing the
Deaf person to read the transcription.

But if the Deaf person wants to sign a response, they're out of luck. Instead,
they need to type their response on the device.

That's OK, but in a "this'll make do in a pinch" kind of way. The ideal is
that both the Deaf person and the hearing person are able to communicate
without typing.

There is some pretty cool research around signing-to-text translation -- Matt
Huenerfauth [https://huenerfauth.ist.rit.edu](https://huenerfauth.ist.rit.edu)
is doing some really interesting stuff, for example -- but as far as I know
it's not ready for prime time.

~~~
chipchesterton
There is a keyboard button on the toolbar (portrait mode). When you press it,
a keyboard shows up so that the user can communicate back to the speaker.

~~~
adrianmonk
I read about it and tried it out, and I couldn't figure out how to use it.
After typing my text, I kept looking for the button to trigger text to speech.

But it just dawned on me that the idea is to type something and then show it
to the other person. Which works; it just wasn't what I was expecting.

~~~
chipchesterton
Ah yes. Text to speech would be cool, but I guess the synthesized speech
would get picked up by the app again.
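
One common workaround is half-duplex operation: stop listening while the
synthesized speech plays, then resume. A minimal sketch in Python, assuming
the SpeechRecognition and pyttsx3 packages (just an illustration of the idea,
not what Live Transcribe actually does):

    # Half-duplex sketch: don't capture audio while text-to-speech is
    # playing, so the app never transcribes its own output.
    import pyttsx3
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    tts = pyttsx3.init()

    with sr.Microphone() as mic:
        while True:
            audio = recognizer.listen(mic)        # capture one utterance
            try:
                heard = recognizer.recognize_google(audio)  # cloud decode
            except sr.UnknownValueError:
                continue
            print("Heard:", heard)
            # runAndWait() blocks, and listen() is not running meanwhile,
            # so the playback is never fed back into the recognizer.
            tts.say("You said " + heard)
            tts.runAndWait()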

------
lbacaj
Although there is a lot of hype around AI/ML, there are a ton of pragmatic
things coming out of the research, much like this Live Transcribe by Google,
that I think will be incredibly helpful to people around the world.

The thing I am most excited about is that most of this work is being done in
the open, or at the very least much of it is being open sourced by Google,
Facebook, and other giants. For all the heat they have been taking lately, I
do think they deserve to be applauded for this. Of course, this is mostly
happening in order to sell us cloud services, but it cannot be overstated how
helpful this will be to many around the world.

Another thing I’m excited about is what I can build with these things, and
even more what others will create with these models as building blocks in
their applications.

As an example (and complete self-promotion here), I was able to use some
open-source models by Google on TensorFlow to build a cross-platform app
that can read articles to you using these neural networks. The amazing thing
is I built it mostly on nights and weekends, which shows how easy some of
this is to work with now. You can check it out here if you like:
[https://articulu.com](https://articulu.com)

~~~
lucidrains
Great job with the app! Which open source model did you use to generate the
speech?

------
elicash
Really wish they let people save a transcript.

~~~
Someone1234
Agreed.

I love them doing accessibility stuff, and I can see this being useful for
specific people in my life. But when I initially read the announcement I
thought about using it to transcribe business meetings.

Obviously the accuracy of this won't be at "court reporter" levels, but for
casual note-taking it would be "good enough."

~~~
clintonb
Google Docs has a voice typing feature [1]. I used Soundflower [2] to route
audio playback into a virtual input device, and had Google Docs listen to
that device. This worked "okay" for transcribing a blog post from a voice
memo.

[1] [https://qz.com/work/1087765/how-to-transcribe-audio-fast-and-for-free-using-google-docs-voice-typing/](https://qz.com/work/1087765/how-to-transcribe-audio-fast-and-for-free-using-google-docs-voice-typing/)

[2] [https://github.com/mattingalls/Soundflower](https://github.com/mattingalls/Soundflower)
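
For anyone reproducing this, the listening side can also be scripted. A
minimal sketch in Python, assuming the sounddevice and soundfile packages
and Soundflower's default device name "Soundflower (2ch)" (names vary by
setup):

    # Record whatever the system is playing back via Soundflower's
    # virtual device, then save it for any transcriber to consume.
    import sounddevice as sd
    import soundfile as sf

    # Locate the virtual device by name.
    device = next(i for i, d in enumerate(sd.query_devices())
                  if "Soundflower (2ch)" in d["name"])

    seconds, rate = 60, 44100
    audio = sd.rec(int(seconds * rate), samplerate=rate,
                   channels=2, device=device)
    sd.wait()                                  # block until done
    sf.write("playback.wav", audio, rate)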

------
ChuckMcM
So GDPR limits having your Google Home device listen in all the time
because, well, that is creepy, but maybe you want a transcript, so "let us
listen in on your conversations, please."

Given that the service is free, you have to know that Google has a plan to
monetize that data with someone you don't know yet. When you find out who
that is, you won't be able to 'undo' the transcriptions you have done in the
past.

I also think it is ridiculous that this needs "the cloud". I had pretty good
speaker-dependent transcription working on an Intel 486 processor, so I find
it difficult to believe that, given literally 500x the compute power and
2000x the memory on a typical phone, you can't do this all locally.
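
For what it's worth, fully offline recognition is possible today, just less
accurate. A minimal sketch using CMU Sphinx through the SpeechRecognition
package (an assumed setup for illustration; nothing to do with Google's
models):

    # Fully local transcription with CMU Sphinx: no network, no cloud.
    # pip install SpeechRecognition pocketsphinx
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("conversation.wav") as source:
        audio = recognizer.record(source)      # load the whole file

    # Decoded entirely on-device; accuracy is well below Google's
    # cloud models, but it demonstrates the point.
    print(recognizer.recognize_sphinx(audio))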

------
glalonde
Now they just need to bolt it onto Glass or some other HUD: IRL subtitles.
They could put a mic array on the frame too, and then get an indication of
where the speaker is, so you can be alerted even if you aren't facing the
person.
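
Locating the speaker from a pair of mics on the frame is a classic
time-difference-of-arrival problem. A minimal sketch in Python/NumPy; the
sample rate and mic spacing are made-up example values:

    # Estimate which direction speech is coming from using the time
    # difference of arrival (TDOA) between two mics on the frame.
    import numpy as np

    RATE = 48000             # samples/sec (assumed)
    MIC_SPACING = 0.14       # metres between the two mics (assumed)
    SPEED_OF_SOUND = 343.0   # m/s

    def direction_of_arrival(left, right):
        """Return bearing in degrees: 0 = straight ahead."""
        # Cross-correlate the channels; the peak offset is the delay.
        corr = np.correlate(left, right, mode="full")
        delay = (np.argmax(corr) - (len(right) - 1)) / RATE
        # Far-field approximation: delay = spacing * sin(angle) / c.
        sin_angle = np.clip(delay * SPEED_OF_SOUND / MIC_SPACING,
                            -1.0, 1.0)
        return np.degrees(np.arcsin(sin_angle))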

~~~
VikingCoder
Plus, this:

[https://www.youtube.com/watch?v=zL6ltnSKf9k](https://www.youtube.com/watch?v=zL6ltnSKf9k)

Have the HUD show a red dot. When you're wearing the glasses, you aim the
dot at the speaker, and the transcription could come just from that speaker.
Just a thought.

------
jkravitz61
How long until we start seeing computer vision applications that translate
sign language? It's a pretty hard problem, but definitely something that's
feasible with the right methods.
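
For the restricted case of static fingerspelled letters, a small CNN already
gets you surprisingly far; the hard part is continuous signing with motion
and facial grammar. A toy sketch in Keras, assuming a labeled set of 28x28
grayscale hand images covering 24 static letter classes:

    # Toy CNN for classifying static fingerspelled letters from images.
    # Real sign language adds motion, facial expression, and grammar,
    # which this does not touch.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(24, activation="softmax"),  # 24 letters
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=10) on a labeled
    # dataset (train_images/train_labels are hypothetical names).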

~~~
dontreact
Curious: how do you imagine people using this?

~~~
mosselman
Wouldn't this be used in the same way as any other language? I.e., the
customer only knows English sign language, but I don't; luckily, my cash
register has a camera that can translate sign language into my language.
------
supermatt
What is the current state of open-source CSR/transcription?

------
xchip
Not sure why this requires a connection to the cloud... ah, probably to
bill people.

~~~
amelius
Also not sure why this requires an ad-ware laden OS.

~~~
jmathai
Probably because it wasn't cheap to develop.

