

Extracting audio from visual information - jeremynixon
http://newsoffice.mit.edu/2014/algorithm-recovers-speech-from-vibrations-0804

======
6stringmerc
Very interesting, and while the title isn't what I expected, I'm pleased to
have the chance to read through it. Will just share my notes:

1 - Initial thought process was that the experiment / development was
extracting audio from a static image. This is very different. The complexity
of the process and method fascinated me, and while I like to think I was
following along okay, I'm not bold enough to jump out and say I grasp it.

2 - I guess there's a bit of a language difference in the title here
(extracting) versus the dominant term used in the article (recovering). My
understanding is a simplified one but I grasp that it's a way of, well,
translating a visual disturbance (vibration) into its corresponding sonic
origination (from sound to vibration back to sound).

3 - As a joke for guitarists: Maybe now we can finally get some really
excellent guitar-to-MIDI translation stuff! (Disclosure: I haven't tried the
Fishman $400 thingy yet)

4 - This kind of reminds me of how neat Ableton Live's WAV-to-MIDI idea is,
but personally I haven't had very good luck mastering it.

5 - Potential applications from brief contemplation: A) Environmental support
for the blind, B) Potential oceanic application (reading waves from space?),
C) Clandestine communication system using something like Morse code in visual
form.

Anyway, had a lot of fun with this article.

------
infogulch
The limiting factor with this is that it requires 2,000-6,000 fps video. I
think this restriction can be lifted.

Sound travels 2.8m in 1/120th of a second, a frame rate that is available in
smartphone cameras today. Therefore, if you have a surface that can be
analyzed for sound which is at least 2.8m long in the direction the sound is
travelling, you have a snapshot through time of the effect the sound waves
have on that surface for the whole ~8ms of the frame's duration (if you ignore
the problem of progressive scanning, i.e. the rolling shutter).
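
As a rough check of the arithmetic above, here is a minimal Python sketch. It
assumes a speed of sound of 343 m/s (dry air at about 20 C), which is my
assumption and not a figure from the article; the function names are made up
for illustration.

```python
# Assumed speed of sound in dry air at ~20 C; not a figure from the article.
SPEED_OF_SOUND_M_S = 343.0


def distance_per_frame_m(fps, speed_m_s=SPEED_OF_SOUND_M_S):
    """Distance (metres) sound travels during one frame at the given frame rate."""
    return speed_m_s / fps


def frame_duration_ms(fps):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    # 120 fps is the smartphone case; 2,000 and 6,000 fps bracket the article's range.
    for fps in (120, 2000, 6000):
        print(f"{fps:>5} fps: {frame_duration_ms(fps):6.3f} ms/frame, "
              f"sound travels {distance_per_frame_m(fps):.3f} m")
```

At 120 fps this gives about 2.86 m per frame and about 8.33 ms per frame,
which matches the rough 2.8m / ~8ms figures I used.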

Does this sound reasonable? Maybe you'd need more than 2.8m. To get over the
progressive scanning problem perhaps you could orient the camera so the
direction of the scanning is orthogonal to the direction of sound propagation,
then the image can be skewed to counteract the effect of the scanning.
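
A minimal sketch of that skew correction in Python, under assumptions that
are mine and not from the article: rows are read out top to bottom at a
constant rate, sound travels along +x at 343 m/s, and the scale in metres per
pixel is known. All names and parameters are hypothetical.

```python
def row_shifts_px(n_rows, readout_time_s, metres_per_pixel, speed_m_s=343.0):
    """Pixels the sound has advanced along x by the time each row is read.

    Assumes row r is captured readout_time_s * r / n_rows after row 0.
    """
    shifts = []
    for r in range(n_rows):
        delay_s = readout_time_s * r / n_rows
        shifts.append(speed_m_s * delay_s / metres_per_pixel)
    return shifts


def deskew(image, shifts, fill=None):
    """Shift each row left by its (rounded) shift so all rows align to row 0's time.

    image is a list of equal-length rows; vacated pixels are set to `fill`.
    """
    out = []
    for row, shift in zip(image, shifts):
        s = int(round(shift))
        width = len(row)
        out.append([row[x + s] if 0 <= x + s < width else fill
                    for x in range(width)])
    return out


if __name__ == "__main__":
    frame = [[0, 1, 2, 3] for _ in range(4)]
    shifts = row_shifts_px(n_rows=4, readout_time_s=0.004, metres_per_pixel=0.5)
    for row in deskew(frame, shifts):
        print(row)
```

This only rounds to whole pixels and ignores everything else about how the
surface actually responds, so treat it as an illustration of the idea rather
than something that would work on real footage.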

