Related work has used visual processing to help with audio source separation, either by augmenting a separation AI [1] or by capturing the recording visually in the first place [2].
When this eventually goes mainstream, I wonder how it will interact with laws around recording consent. In two-party consent states like MA, you'd need my consent to record me talking, but not to record video. If video essentially encodes audio, however, then recording high-resolution video may also require consent.
I get that this was developed for use as a practical tool with lots of potential benefits for analyzing videos shot by amateurs, but I'm rather more interested in what kind of audio it might interpolate when presented with really mangled or artificial visual content. I wonder what it thinks is happening when looking at a cartoon or computer generated film, or footage captured from a distorted VHS tape or a modern FPS game...
How much has this project developed since 2014? Can regular people download the tools to play around with it yet? Would love to see it try to get glitchy for creative purposes.
Interesting work but seems to come with scary applications.
One thing I was wondering from the video: the narrator mentioned that some of the movement is hundreds of times smaller than a pixel. If that's the case, how are they detecting the movement? If something moves but stays within a pixel, how do you know how it moved, since the pixel is the smallest bit of information you have? Or is it that, although the physical movement is smaller than a pixel, the resulting "color" value of each affected pixel changes in proportion to the movement by enough to be measured?
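To make that last guess concrete, here is a minimal sketch of how a sub-pixel shift can be read out of brightness changes alone. It is a toy 1D model with a plain first-order, gradient-based least-squares estimate, not the method used in the work under discussion. The function names, the soft tanh edge, and the 0.01-pixel shift are all assumptions made up for illustration, and pixel values are kept as noise-free floats, which a real camera would not give you.

```python
import math


def render_edge(edge_pos, n_pixels=32, blur=2.0):
    """Brightness of a soft (blurred) edge sampled at each pixel centre.

    Hypothetical toy scene: brightness rises smoothly from 0 to 1 around
    edge_pos, so the edge is spread over several pixels.
    """
    return [0.5 * (1.0 + math.tanh((x - edge_pos) / blur)) for x in range(n_pixels)]


def estimate_shift(frame_a, frame_b):
    """Estimate the shift (in pixels) between two frames.

    Uses the first-order model frame_b[i] ~= frame_a[i] - d * gradient[i]
    and solves for d by least squares over all interior pixels.
    """
    num = 0.0
    den = 0.0
    for i in range(1, len(frame_a) - 1):
        grad = 0.5 * (frame_a[i + 1] - frame_a[i - 1])  # spatial gradient
        diff = frame_b[i] - frame_a[i]                  # change between frames
        num += grad * diff
        den += grad * grad
    return -num / den


true_shift = 0.01  # one hundredth of a pixel (assumed value)
frame_a = render_edge(16.0)
frame_b = render_edge(16.0 + true_shift)

# Largest brightness change seen by any single pixel (full scale is 1.0).
max_change = max(abs(b - a) for a, b in zip(frame_a, frame_b))
estimated = estimate_shift(frame_a, frame_b)

print(f"largest per-pixel change: {max_change:.5f}")
print(f"estimated shift: {estimated:.5f} px (true: {true_shift} px)")
```

The point of the sketch is that no pixel needs to "see" the object cross a boundary: every pixel along the edge changes brightness slightly, and pooling those small changes over many pixels recovers the motion. With sensor noise and quantization, you would need many more pixels to average over than this toy uses.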
Theremin (yes, the same one) invented a device using an infrared beam directed at windows which was able to capture speech from inside. [1] This is separate from his RF-powered Thing bug that was installed in the US Embassy in Moscow. [2]
Laser vibrometers have been shown to achieve a similar effect.
Neither of these is perhaps visual in the same way as the method in the article, but they do illustrate the long history of non-traditional microphone eavesdropping techniques.
[1] Albert Glinsky (2000). Theremin: Ether Music and Espionage. University of Illinois Press. p. 10. ISBN 9780252025822. Retrieved 2013-12-28.
Peter Wright's book Spycatcher, which tells the story of his adventures reverse engineering the Thing, is a good read.
He had another trick: using simple phasing to make cocktail-party-type problems easier for analysts. Just play it out of phase in each ear and the brain works it out, apparently.
If you haven't already seen Tom Scott's video on how background noise can give away the location of a recording, because power line frequencies are subtly different, I recommend it. https://youtu.be/e0elNU0iOMY
[1] https://ai.googleblog.com/2018/04/looking-to-listen-audio-vi...
[2] https://newatlas.com/music/optical-microphone-sound/