
Show HN: Lipreading with Deep Learning - irsina
https://github.com/astorfi/lip-reading-deeplearning
======
ConfusedDog
All you need is a couple of CNN layers to pick out the funniest possible
translations, make a YouTube channel like Bad Lip Reading, then profit!
Even now, [https://www.youtube.com/watch?v=5Krz-dyD-UQ](https://www.youtube.com/watch?v=5Krz-dyD-UQ) still cracks me up.

~~~
kakarot
Combine it with some parametric voice tech and you've got yourself a delicious
automated stew.

~~~
lwansbrough
Using AI to make fun of AI, in a nutshell.

------
samstave
How many secret efforts are there to accomplish this already for the MIC?

I can't imagine that there aren't already some Palantir-like efforts to
accomplish this.

Imagine a REALLY good zoom lens on a very small drone that cannot be
seen/heard by a target, and that drone is doing something like this to gain
info.

Imagine the same zooming through windows as well.

This will be the next big ML-military step taken towards Total Information
Awareness, if it's not already available in the wild.

~~~
asdfasgasdgasdg
For windows, they already have this, IIUC. You bounce a laser beam off the
window and measure the vibrations. Random guys can just do this in their
garage.

[https://www.youtube.com/watch?v=1MrudVza6mo](https://www.youtube.com/watch?v=1MrudVza6mo)

~~~
conistonwater
The Applied Science guy is most definitely _not_ a random guy in a garage,
though; he's incredibly skilled and talented. The rest of his YouTube channel
is pretty amazing too.

~~~
Shish2k
I am a random guy in a garage, and I made a functional laser microphone using
random bits of electronics I had in my spare-bits box ($1 laser pointer, old
pair of earphones, snip off the earphone and wire in a light dependent
resistor) -- admittedly the quality was awful (you could only just make out
voices if people in the room talked abnormally loud), but it was great for a
fun weekend science project :D

~~~
samstave
A write-up or vid of the components and build would be interesting..

------
gok
Maybe I'm misunderstanding the code, but it looks like it's matching audio to
video, not actually recognizing speech given a video. That is, it could answer
"does this audio line up with this video?" but not "what is being said in this
video?"
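
To make the distinction concrete, here is a minimal sketch of what a matching
setup outputs. It is not the repo's code: the embedding vectors, the
`is_match` helper, and the threshold of 1.0 are all made up for illustration,
assuming each stream has already been mapped to a fixed-size vector.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(audio_embedding, video_embedding, threshold=1.0):
    """Answers "does this audio line up with this video?" with True/False.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return euclidean_distance(audio_embedding, video_embedding) < threshold

# Hypothetical embeddings; a real system would get these from trained networks.
audio_vec = [0.9, 0.1, 0.3]
matching_video_vec = [0.8, 0.2, 0.3]
other_video_vec = [-0.5, 0.9, -0.7]

print(is_match(audio_vec, matching_video_vec))
print(is_match(audio_vec, other_video_vec))
```

Note the return type: a yes/no per audio-video pair. Nothing in this kind of
setup produces a transcript.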

~~~
derimagia
I didn't take a deep dive into the code, but in order to train, it's going to
need to be fed audio files along with the actual video/mouth shapes/etc.
Essentially it needs the audio to tell it what reward to give back (whether it
was right). Once it "learns", it wouldn't need the audio file.

------
sgt
Open the pod bay doors, HAL.

~~~
snakeboy
This scene would actually make a really cool test case!

------
meow_mix
This is fascinating. Has anyone considered repurposing this for something like
sign language?

------
Havoc
That's actually a really good application with some real potential for
improving lives. High five mate

~~~
PowerfulWizard
Yeah it is interesting, and it could also be a big boost to plain olde speech
to text in cases where you have video if the errors were non-correlated (which
I wasn't able to determine from skimming the readme.)

edit: now I see it is being used to match audio samples, not to generate text
so it wouldn't create an independent value from the audio in this arrangement.
Other than, e.g., speaker attribution, which they mentioned.
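
For the hypothetical speech-to-text case (not what this repo does), the
"non-correlated errors" idea can be sketched as below. Everything here is an
assumption for illustration: the `fuse` helper, the candidate words, and the
probabilities are made up, and multiplying scores is only reasonable if the
audio and video models really do make independent errors.

```python
def fuse(audio_probs, video_probs):
    """Combine two per-word probability tables by multiplying and renormalizing.

    This is naive Bayes-style fusion; it assumes the two models' errors are
    independent, which is exactly the open question for a real system.
    """
    words = audio_probs.keys() & video_probs.keys()
    scores = {w: audio_probs[w] * video_probs[w] for w in words}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Made-up numbers: the audio model can barely tell "might" from "night",
# while the lip model sees closed lips for "m" and is fairly sure.
audio_probs = {"might": 0.48, "night": 0.52}
video_probs = {"might": 0.90, "night": 0.10}

fused = fuse(audio_probs, video_probs)
best = max(fused, key=fused.get)
print(best, round(fused[best], 2))
```

On its own the audio model would pick "night" here; combined with the video
scores the pick flips to "might". If both models tended to fail on the same
inputs, the product would not help in this way.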

------
anotheryou
no demonstration video?

~~~
ehsankia
Not the same project, but here's one from Oxford + Deepmind:

[https://www.youtube.com/watch?v=fa5QGremQf8](https://www.youtube.com/watch?v=fa5QGremQf8)

------
orasis
OK. But WHY? All technology has moral implications. Did you create this to
actually help people? Do you care if it is weaponized? Think before you
create.

~~~
orasis
It reflects poorly on this community that any comment that questions the
ethics of technology gets downvoted.

