Extracting Audio from Visual Information (news.mit.edu)
43 points by sblank on Aug 17, 2022 | 15 comments


Related work has used visual processing to help with audio source separation, either by augmenting a separation AI [1] or by capturing the recording visually in the first place [2].

[1] https://ai.googleblog.com/2018/04/looking-to-listen-audio-vi...

[2] https://newatlas.com/music/optical-microphone-sound/


When this is eventually mainstream, I wonder how it will interact with laws around recording consent. In two-party consent states like MA you'd need my consent to record me talking, but not to record video of me. If video essentially encodes audio, however, then recording high-resolution video may also require consent.


I get that this was developed for use as a practical tool with lots of potential benefits for analyzing videos shot by amateurs, but I'm rather more interested in what kind of audio it might interpolate when presented with really mangled or artificial visual content. I wonder what it thinks is happening when looking at a cartoon or computer generated film, or footage captured from a distorted VHS tape or a modern FPS game...

How much has this project developed since 2014? Can regular people download the tools to play around with it yet? Would love to see it try to get glitchy for creative purposes.


Needs a [2014] in the title.


Interesting work, but it seems to come with scary applications.

One thing I was wondering about from the video: the narrator mentioned that some of the movement is hundreds of times smaller than a pixel. If that's the case, how are they detecting the movement in those cases? If something moves but stays within a pixel, how do you know how it moves, since the pixel is the smallest unit of information you have? Or is it that, although the physical movement in space is smaller than a pixel, the resulting "color" information for the given pixel changes in proportion to the movement, in ways large enough to be measured?


There is indeed research on exactly that -- "motion magnification":

visualizing vibrations in machinery: https://www.youtube.com/watch?v=rEoc0YoALt0&t=121s

overview of research, and discussions on HN: https://hn.algolia.com/?query=Eulerian%20Video%20Magnificati...

TED talk: https://www.youtube.com/watch?v=fHfhorJnAEI

edit: TFA:

> from the change of a single pixel’s color value over time, it’s possible to infer motions smaller than a pixel.

> the researchers borrowed a technique from earlier work on algorithms that amplify minuscule variations in video,
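To make that concrete, here's a minimal sketch (my own NumPy illustration, not the researchers' code) of how a shift far smaller than a pixel can be recovered from per-pixel intensity changes alone, brightness-constancy style:

    import numpy as np

    # A soft black-to-white edge sampled on a 1-D pixel grid.
    x = np.arange(64)
    edge = lambda shift: 1.0 / (1.0 + np.exp(-(x - 32 - shift)))

    I0 = edge(0.0)
    grad = np.gradient(I0)                      # spatial intensity gradient
    for shift in (0.01, 0.05, 0.10):            # motions of 1/100 to 1/10 of a pixel
        dI = edge(shift) - I0                   # per-pixel color-value change over time
        # First-order model: dI ~ -grad * shift, so solve for shift by least squares.
        est = -np.sum(dI * grad) / np.sum(grad ** 2)
        print(f"true shift {shift:.2f} px -> estimated {est:.4f} px")

The pixel never "sees" the sub-pixel position directly; the fractional coverage of the edge modulates its intensity in proportion to the motion, which is exactly the effect the parent comment guessed at.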


Thanks for the references! And yea, I watched the video, commented, and then read the article. Thanks for pointing to those quotes.



It seems like event cameras are going to rip this wide open, without requiring crazy cooled-down slow-mo cameras:

https://en.wikipedia.org/wiki/Event_camera

Could end up in a world where every cellphone can turn anyone's windows into a microphone or something.
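A rough sketch of why they fit here: event cameras emit asynchronous (timestamp, pixel, polarity) brightness-change events at microsecond resolution, so binning signed events over time already yields a vibration signal. The event stream below is synthesized, not real camera output:

    import numpy as np

    # Synthetic event stream for one pixel watching a surface vibrating at 440 Hz:
    # random fine-grained timestamps, polarity = direction of brightness change.
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0.0, 1.0, 200_000))
    polarity = np.sign(np.cos(2 * np.pi * 440.0 * t))

    # Sum signed events in fine time bins (event rate ~ brightness derivative),
    # then integrate to get a displacement-like waveform.
    fs = 8000
    bins = np.floor(t * fs).astype(int)
    rate = np.bincount(bins, weights=polarity, minlength=fs)
    audio = np.cumsum(rate)
    audio = audio - audio.mean()

    spectrum = np.abs(np.fft.rfft(audio * np.hanning(len(audio))))
    freqs = np.fft.rfftfreq(len(audio), 1.0 / fs)
    print("dominant frequency:", freqs[spectrum.argmax()], "Hz")   # ~440 Hz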


See also Radio2Speech [1], which uses a UNet to recover audio from an RF beam.

[1] https://zhaorunning.github.io/Radio2Speech/
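For anyone curious what "a UNet" means concretely here: an encoder-decoder with skip connections, mapping one spectrogram to another. This is a generic miniature sketch, not the Radio2Speech architecture, and the shapes are made up:

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        # One down step, one up step, one skip connection -- the UNet pattern in miniature.
        def __init__(self, ch=16):
            super().__init__()
            self.down1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
            self.down2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU())
            self.up1 = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 2, stride=2), nn.ReLU())
            self.out = nn.Conv2d(ch * 2, 1, 3, padding=1)   # cat(skip, up) doubles channels

        def forward(self, x):                    # x: (batch, 1, freq, time) RF spectrogram
            d1 = self.down1(x)
            d2 = self.down2(d1)
            u1 = self.up1(d2)
            return self.out(torch.cat([u1, d1], dim=1))     # predicted speech spectrogram

    rf = torch.randn(1, 1, 64, 64)               # dummy RF measurement
    print(TinyUNet()(rf).shape)                  # same (1, 1, 64, 64) shape out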


Wonder if intelligence agencies already utilize something similar…


Theremin (yes, that Theremin) invented a device that directed an infrared beam at windows and was able to capture speech from inside. [1] This is separate from his RF-powered "Thing" bug that was installed in the US Embassy in Moscow. [2]

Laser vibrometers have since been used to achieve a similar effect.

Both of these are perhaps not visual in the same way as the method in the article, but they do illustrate the long history of non-traditional eavesdropping techniques.

[1] Albert Glinsky (2000). Theremin: Ether Music and Espionage. University of Illinois Press. p. 10. ISBN 9780252025822.

[2] https://en.wikipedia.org/wiki/The_Thing_(listening_device)


Peter Wright's book Spycatcher, the story of his adventures reverse engineering the Thing, is a good read.

He had another trick of using simple phasing to make cocktail-party-type problems easier for analysts: just play the recording out of phase in each ear, and the brain works it out, apparently.
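It's easy to try yourself: the same mono recording in both ears with one channel phase-inverted, and binaural unmasking helps tease competing talkers apart. A quick sketch using the soundfile library (the filenames are hypothetical):

    import numpy as np
    import soundfile as sf

    mono, sr = sf.read("mixture.wav")            # hypothetical cocktail-party recording
    if mono.ndim > 1:
        mono = mono.mean(axis=1)                 # fold to mono first

    # Identical signal in each ear, but inverted on one side: the interaural
    # phase difference is what the brain exploits to separate sources.
    stereo = np.stack([mono, -mono], axis=1)
    sf.write("antiphase.wav", stereo, sr)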


If you haven't already seen Tom Scott's video on background noise giving away the location of a recording, due to power line frequencies being subtly different, I recommend it: https://youtu.be/e0elNU0iOMY
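The technique (ENF, electrical network frequency analysis) is simple to sketch: track how the mains-hum peak drifts over time, then match that drift curve against the grid's logged frequency. A rough illustration, assuming a 60 Hz grid and a hypothetical filename:

    import numpy as np
    import soundfile as sf
    from scipy.signal import stft

    audio, fs = sf.read("recording.wav")         # hypothetical recording with mains hum
    if audio.ndim > 1:
        audio = audio.mean(axis=1)

    # 8-second windows give ~0.125 Hz resolution around the 60 Hz hum.
    f, t, Z = stft(audio, fs=fs, nperseg=8 * fs)
    band = (f > 59.0) & (f < 61.0)
    enf = f[band][np.abs(Z[band]).argmax(axis=0)]   # hum frequency per time frame

    # Matching this drift curve against utility logs timestamps the recording.
    print(np.column_stack([t, enf]))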




