
Algorithm recovers speech from a potato-chip bag filmed through glass (2014) - MrJagil
http://news.mit.edu/2014/algorithm-recovers-speech-from-vibrations-0804
======
cma
It is (controversially) hypothesized that at some point we may be able to pull
imprinted recordings off of ancient artifacts:
[https://en.wikipedia.org/wiki/Archaeoacoustics#Past_interpre...](https://en.wikipedia.org/wiki/Archaeoacoustics#Past_interpretations_controversy)

~~~
MrQuincle
I also once thought of that, but maybe there are other ways in which large
vibrations have left fingerprints on materials. I don't mean gravitational
waves, but really acoustic phenomena. Like a comet impact or a volcanic
eruption. What's a delicate material which would be able to record sound but
that doesn't get destroyed by time? Clay that dries up is a logical one.

~~~
eganist
Old glass or porcelain might be the most viable candidates for a first look as
they would be cooling off from a heated, more malleable state, potentially
capturing sound from the immediate vicinity as they cooled off.

I have no science to back this up. It's just a hunch.

~~~
ianai
That may be complicated by old glass never fully hardening. But it sounds
plausible enough.

Possibly mortar in walls could record the workers talking. All sorts of pottery
start out malleable, so they might be candidates. Cave paintings might be an
option - wonder whether being finger painted would leave biological
fingerprints behind - heart rate, for instance.

Pretty wild idea. Here’s hoping it has some legs.

Maybe someday we’ll learn about places like Stonehenge this way.

~~~
ghusbands
Old glass doesn't flow and is solid. Google it to see plenty of debunking.

------
femto
If interested in this, you might also be interested in the lead author's other
research papers

[http://abedavis.com/research.html](http://abedavis.com/research.html)

and his 2016 PhD thesis, which includes the "chip bag" research.

[http://abedavis.com/thesis.pdf](http://abedavis.com/thesis.pdf)

I'm interested in this research, from the angle of "Can passive surface
measurements tell us what is happening inside the human body?"

------
rjplatte
Ah, yes, I saw this in 2010. They didn't have high enough resolution to detect
actual vibrations, so they used an algorithm based on per-pixel color shifts.
Really clever.

------
emilfihlman
[2014]
[https://news.ycombinator.com/item?id=8131785](https://news.ycombinator.com/item?id=8131785)

------
an4rchy
Definitely some interesting applications.

Sort of orthogonal, but is there a similar way to get sound from seeing
someone's lips move in a video -- would it then be possible to recover
audio/conversations from old video-only files?

Edit: Guess they can read lips via AI
[https://www.techemergence.com/machine-learning-that-learns-m...](https://www.techemergence.com/machine-learning-that-learns-more-like-humans-an-ai-lip-reading-machine-and-more-this-week-in-artificial-intelligence-11-11-16/)

Now getting the high quality sound from just video would be amazing.

~~~
spyder
What about lip-reading with reflected Wi-Fi signals :) :

[https://people.eecs.berkeley.edu/~guanhua/paper/WiHear_MobiC...](https://people.eecs.berkeley.edu/~guanhua/paper/WiHear_MobiCom14.pdf)

 _"WiHear aims to detect human speech by analyzing radio reflections from
mouth movements. It requires individual user to train the system extensively,
and can recognize only a limited number of words (6 words) with high
accuracy."_

------
Patient0
"If I'm sitting next to a swimming pool, and somebody dives in - and she's not
too pretty, so I can think of something else - I think of the waves and things
that have formed in the water. And, uh, when there's lots of people have dived
in the pool there's a very great choppiness of all these waves all over the
water and to think that it's possible, maybe, that in those waves there's a
clue as to what's happening in the pool. That some sort of insect or something
with sufficient cleverness could sit in the corner of the pool and just be
disturbed by the waves, and by the nature of the irregularities and bumping of
the waves have figured out who jumped in where and when and where what's
happening all over the pool. And that's what we're doing when we're looking at
something. Uh, the light that comes out is ... is waves, just like in the
swimming pool except in three dimensions instead of the two dimensions of the
pool it's they're going in all directions. And we have a eighth of an inch
black hole into which these things go ... which, uh, is particularly sensitive
to the parts of the waves that are coming in a particular direction it's not
particularly sensitive when they're coming in at the wrong angle which we say
is from the corner of our eye. And if we want to get more information from the
corner of our eye we swivel this ball about so that the hole moves from place
to place. Then ... uh, it's quite wonderful that we can see ... figure out so
easy. That's really because the light waves are easier than the ... the waves
in the water are a little bit more complicated it would have been harder for
the bug than for us but it's the same idea. Figure out what the thing is that
we're looking at at a distance."

Transcribed from footage included in the documentary "The Last Journey of a
Genius" (1989) by Christopher Sykes, a BBC TV production in association with
WGBH Boston and Coronet/MTI Film and Video.

[https://www.youtube.com/watch?v=1qQQXTMih1A](https://www.youtube.com/watch?v=1qQQXTMih1A)

------
booleandilemma
_Because of a quirk in the design of most cameras’ sensors, the researchers
were able to infer information about high-frequency vibrations even from video
recorded at a standard 60 frames per second._

That’s a pretty convenient quirk if you ask me.

~~~
zeta0134
It's known as the Rolling Shutter effect. It's most commonly observed when
attempting to take still imagery of very fast moving objects like rotors on
airplanes, but affects pretty much any image sensor that isn't designed
specifically to work around it. The following Wikipedia article goes into far
more depth, but the basic problem is that an image sensor doesn't sample every
pixel instantly, but instead reads them out serially, and thus there's a tiny,
tiny fraction of a second in-between each pixel being sampled. It's small
enough to not matter for most purposes, but it's still there, and can be
useful / detrimental depending on the application.

[https://en.wikipedia.org/wiki/Rolling_shutter](https://en.wikipedia.org/wiki/Rolling_shutter)
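
To make the timing concrete, here is a minimal sketch of why serial readout gives sub-frame time resolution. It assumes the sensor reads whole rows one after another at a constant pace and that one readout fills a fraction `readout_fraction` of the frame period; real sensors differ, and the 1080-row figure is only an illustrative choice. The function names are hypothetical and this is not the researchers' algorithm.

```python
def row_sample_times(frame_rate_hz, rows, readout_fraction=1.0, frames=1):
    """Return the capture time (seconds) of every row over `frames` frames.

    Assumption: rows are read strictly in order at a constant pace, and one
    full readout occupies `readout_fraction` of the frame period.
    """
    frame_period = 1.0 / frame_rate_hz
    row_delay = frame_period * readout_fraction / rows
    return [f * frame_period + r * row_delay
            for f in range(frames)
            for r in range(rows)]


def effective_row_rate_hz(frame_rate_hz, rows, readout_fraction=1.0):
    """Rate at which successive rows are sampled inside a single frame."""
    return frame_rate_hz * rows / readout_fraction


times = row_sample_times(frame_rate_hz=60, rows=1080)
row_delay_us = (times[1] - times[0]) * 1e6
row_rate = effective_row_rate_hz(60, 1080)

print(f"delay between neighbouring rows: {row_delay_us:.1f} microseconds")
print(f"row sampling rate within a frame: {row_rate:.0f} Hz")
```

Under these assumptions each row is a separate time sample, so a 60 fps video contains timing information far finer than 1/60 s, which is the "quirk" the article alludes to.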

~~~
Sharlin
And it is why mechanical shutters are still a thing in dedicated cameras. It
is still much faster to move two pieces of metal or carbon fiber over the
sensor than it is to read out a multi-megapixel CMOS sensor. When shooting
video, mechanical shutters cannot be used, so the rolling shutter effect can
cause artifacts such as fast-moving subjects appearing slanted.

~~~
namibj
Err, modern global shutters like those made by CMOSIS/ams are _much_ faster
than that.

Their fast one is an APS-C sensor with 4k*3k px, a shutter closing time of
about 1/120,000 s and a minimum shutter open time of about 1/50,000 s. The
shutter closing time might be even faster; that figure is just a
reconstruction from the frame overhead time values I remember, adjusted for
the share the row skew had. Check the datasheet if you like.

If you know a mechanical shutter that can do such, I'd like to know.

This, by chance, allows laser-flash illumination of objects that are behind a
close wall of fog, as you can keep the shutter closed while the light travels
to the further away object of interest. You will have the blur, but no longer
the massive contrast loss due to light pollution. If it's not as bad as fog,
and just e.g. normal rainfall or such, you lack the reflection artifacts that
would be common, and only retain the refraction artifacts from the light
passing through it by necessity.

These sensors do lack a little dynamic range, but you can compensate with some
slight trickery, see the datasheet, to get ~15 stops out of this particular
device.

Rolling shutter has applications, but videos for users of low knowledge and
high ambitions is not one of them.

~~~
nickparker
Could you expand on or link the bit about time of flight trickery in fog or
rain?

c * 1 s / 50,000 is still kilometers, so I’m not sure I understand how this
shutter can do what sounds like using time of flight to selectively illuminate
stuff at a specific depth
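
The arithmetic behind that objection can be checked in a few lines. This sketch only converts the shutter times quoted upthread (about 1/50,000 s open, about 1/120,000 s closing) into the distance light covers in that time; the function name is made up for illustration.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, exact by definition


def light_travel_m(duration_s):
    """Distance light covers in `duration_s` seconds, in metres."""
    return C_M_PER_S * duration_s


open_time_s = 1 / 50_000     # minimum shutter open time quoted upthread
close_time_s = 1 / 120_000   # shutter closing time quoted upthread

open_path_m = light_travel_m(open_time_s)
close_path_m = light_travel_m(close_time_s)

print(f"light path during minimum open time: {open_path_m / 1000:.2f} km")
print(f"light path during closing time:      {close_path_m / 1000:.2f} km")
# Light has to travel out and back, so the depth slice is half the path.
print(f"depth slice for the open time:       {open_path_m / 2000:.2f} km")
```

So the open time alone does span roughly six kilometres of light path, which is why it cannot pick out a specific depth by itself.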

~~~
namibj
No, the part that can solve this is the fast time from insensitive to
sensitive, which is shorter than the time from sensitive to no longer
sensitive. You will need a pulse with a short enough duration, e.g. a
Q-switched Nd:YAG with frequency doubling giving you up to about a dozen
Joules at up to a few hundred Hz with pulse durations of under 50 ns, and,
while limited in some ways by breakdown peak power in components, a lower
limit of about .5 ns.

Most of the shutter time is used to copy the data to the shadow pixel, but
just releasing the dark pull won't take long.
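
A rough sketch of the gating idea described here, under simplifying assumptions: the sensor becomes sensitive instantly at `gate_open_s`, light travels in a straight line at the vacuum speed of light, and the fog and target distances (10 m and 100 m) are invented for the example. Only the 50 ns and 0.5 ns pulse durations come from the comment above; the function names are hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def gate_delay_s(distance_m):
    """Round-trip time for light to reach `distance_m` and come back."""
    return 2.0 * distance_m / C_M_PER_S


def pulse_length_m(pulse_duration_s):
    """Physical length of a light pulse of the given duration."""
    return C_M_PER_S * pulse_duration_s


def is_rejected(scatterer_distance_m, pulse_duration_s, gate_open_s):
    """True if all light scattered at this distance returns before the
    sensor becomes sensitive (idealised, instantaneous gate opening)."""
    last_return_s = gate_delay_s(scatterer_distance_m) + pulse_duration_s
    return last_return_s <= gate_open_s


# Hypothetical scene: fog bank at 10 m, object of interest at 100 m.
pulse_s = 50e-9
gate_open = gate_delay_s(100.0)  # open exactly when the target return starts

print(f"gate opens after {gate_open * 1e9:.0f} ns")
print(f"fog at 10 m rejected:     {is_rejected(10.0, pulse_s, gate_open)}")
print(f"target at 100 m rejected: {is_rejected(100.0, pulse_s, gate_open)}")
print(f"50 ns pulse length:  {pulse_length_m(50e-9):.2f} m")
print(f"0.5 ns pulse length: {pulse_length_m(0.5e-9):.2f} m")
```

The point of the sketch is that what matters is how sharply the gate opens relative to the pulse length, not how long the shutter then stays open.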

------
joewee
Makes me wonder why this is even possible...video of Oval Office argument...
[https://m.youtube.com/watch?v=Ija-VZwcznE](https://m.youtube.com/watch?v=Ija-VZwcznE)

------
jwilk
Discussed 6 months ago:
[https://news.ycombinator.com/item?id=16201745](https://news.ycombinator.com/item?id=16201745)

------
wiz21c
Pushing the argument to its limit: what happens if we can recover any speech
from this or that material (moving leaves, bags, lip movements (AI would be
good at that), changes in magnetic fields, whatever)? Should we assume that
there is no such thing as a "safe place" anymore? One could argue there never
was (because, for example, police can eavesdrop), but this article implies it
can reach a whole new level... How is that kind of technology controlled by
those in power? Considering Apple is valued at gazillions of dollars, I
guess those technologies will be worth a lot and be available to some other
very powerful people... James Bond looks quite outdated to me now :-/

~~~
xapata
Microphones are already quite good. If you can be seen, you can be "heard".
That has been true for many years. I think there was a scene in the movie
Enemy of the State involving cone mics. The difference with this technology is
that someone can "hear" you even if you're around a corner.

------
Digit-Al
Forgive me, as someone with zero actual knowledge in this field. This may be a
naive question, but wouldn't this be fairly easily defeated by playing songs
in the background?

~~~
mvid
It might be possible to cancel that noise out. But something like a pink noise
generator or an office grey noise generator would probably fool it, the same
way it would fool a microphone at distance

~~~
joewee
This doesn’t work anymore, if it ever did. There is free software capable of
isolating audio like a single voice from background noise.

~~~
grkvlt
Not sure how this can't work, it's just audio jamming, same as for EM waves.

You just need to broadcast sufficient noise to cover the signal, i.e. the S:N
ratio is such that the receiver (crisp packet, microphone, etc.) can no longer
see the signal above the noise floor. Now, this might mean broadcasting a
_loud_ noise signal, which could overwhelm your (or other friendly) receivers
(ears, etc.) so optimal placement of the jamming noise source becomes an
issue.

Generally, you want it closer to the threat than you, so that distance
attenuation keeps it bearable for you but still jams possible recording
devices. Or, introduce high bandwidth noise vibrations into the surfaces of
the area you are in, such as windows or walls. Anyway, it's very possible and
in use today in secure facilities that must be protected from audio
eavesdropping.
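
The placement argument can be put into numbers with a small sketch. It assumes ideal free-field point sources, where level drops by 20*log10(distance) relative to the level at 1 m, and treats SNR as a simple difference of decibel levels. The 60 dB and 70 dB source levels and the 1 m / 8 m distances are invented for illustration, not measurements.

```python
import math


def level_at_distance_db(level_at_1m_db, distance_m):
    """Sound level of an ideal point source in free field.

    Assumption: inverse-square spreading only, so the level falls by
    20 * log10(distance) relative to the level measured at 1 m.
    """
    return level_at_1m_db - 20.0 * math.log10(distance_m)


def snr_db(signal_db, noise_db):
    """Signal-to-noise ratio as a difference of decibel levels."""
    return signal_db - noise_db


SPEECH_AT_1M_DB = 60.0  # hypothetical talker
JAMMER_AT_1M_DB = 70.0  # hypothetical noise source

# Friendly listener: 1 m from the talker, 8 m from the jammer.
snr_friend = snr_db(level_at_distance_db(SPEECH_AT_1M_DB, 1.0),
                    level_at_distance_db(JAMMER_AT_1M_DB, 8.0))

# Eavesdropping receiver: 8 m from the talker, 1 m from the jammer.
snr_threat = snr_db(level_at_distance_db(SPEECH_AT_1M_DB, 8.0),
                    level_at_distance_db(JAMMER_AT_1M_DB, 1.0))

print(f"SNR at friendly listener: {snr_friend:+.1f} dB")
print(f"SNR at eavesdropper:      {snr_threat:+.1f} dB")
```

With these made-up numbers the same jammer leaves the nearby listener with a positive SNR while pushing the distant receiver well below the noise floor, which is the reason for putting the noise source closer to the threat.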

------
xtrapolate
A different approach to decoding vibrations into speech, involving a laser
microphone:
[https://en.wikipedia.org/wiki/Laser_microphone](https://en.wikipedia.org/wiki/Laser_microphone)

------
test6554
So if I eat my sour cream cheddar Ruffles in a one-party consent state could I
get in trouble?

------
dbg31415
Are there any applications for this tech that aren't a massive violation of
privacy? Seriously... why develop this? It's super creepy.

------
Jaruzel
Tangential... but why do the US call them 'Chips' whereas in the UK we call
them 'Crisps'?

------
adrianhel
This would also be really cool for recording vocals. I liked the glitchiness
of potato chips!

------
mirimir
Rule of thumb: never have sensitive conversations where such exploits are
feasible.

~~~
toomuchtodo
With enough processing power, you can listen to anyone as long as you can get
an object in frame that’s vibrating with speech (the UK and Chinese CCTV
networks come to mind). Depending on resolution, I’d expect this to be the
case for any footage already stored. It’s just a matter of having a
distributed computing job kicked off to comb through video data and add the
additional audio metadata.

Will you only speak of sensitive subjects in rooms with no windows? Will we be
silent in public? These are issues where technology, politics, and human
rights intersect.

~~~
maxerickson
Don't sweat the CCTV or recordings:

 _Reconstructing audio from video requires that the frequency of the video
samples — the number of frames of video captured per second — be higher than
the frequency of the audio signal. In some of their experiments, the
researchers used a high-speed camera that captured 2,000 to 6,000 frames per
second. That’s much faster than the 60 frames per second possible with some
smartphones, but well below the frame rates of the best commercial high-speed
cameras, which can top 100,000 frames per second._
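
For a sense of scale, the sketch below applies the standard Nyquist criterion, which is stricter than the quoted wording: uniform sampling at a given frame rate can only represent frequencies up to half that rate, and higher tones fold back to a lower apparent frequency. The frame rates are the ones in the quote; the 440 Hz test tone is an arbitrary choice, and the sketch ignores the rolling-shutter workaround the article also describes.

```python
def max_recoverable_hz(frame_rate_fps):
    """Nyquist limit: highest frequency representable at this sample rate."""
    return frame_rate_fps / 2.0


def aliased_frequency_hz(tone_hz, frame_rate_fps):
    """Apparent frequency of a pure tone sampled at `frame_rate_fps`.

    Tones above the Nyquist limit fold back into the 0..fps/2 band.
    """
    folded = tone_hz % frame_rate_fps
    return min(folded, frame_rate_fps - folded)


for fps in (60, 2000, 6000):
    print(f"{fps:>5} fps: up to {max_recoverable_hz(fps):.0f} Hz, "
          f"a 440 Hz tone appears at {aliased_frequency_hz(440, fps):.0f} Hz")
```

At 60 fps the representable band ends at 30 Hz, well below speech, while the 2,000 to 6,000 fps range covers a useful share of it.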

~~~
RugnirViking
The article quite literally is about a technique to bypass that limitation and
do it with a regular 60fps camera.

~~~
maxerickson
It sounds like that technique doesn't quite recover the audio though, just a
portion of it:

 _While this audio reconstruction wasn’t as faithful as that with the high-
speed camera, it may still be good enough to identify the gender of a speaker
in a room; the number of speakers; and even, given accurate enough information
about the acoustic properties of speakers’ voices, their identities._

Saying _it may still be good enough_ pretty strongly implies that the quality
is quite low.

~~~
RugnirViking
There is a video on the webpage that includes the recovered audio. It is low
quality, but enough to understand the speaker most of the time, and definitely
enough to pose a security risk when analysed by a specialist.

~~~
maxerickson
The youtube clip in the article? The voice reconstruction there is from high
speed video. If you mean some other video, I'd appreciate a link.

------
anonu
I think this has shown up on HN previously... Can't find the link though.

------
StanislavPetrov
>In one set of experiments, they were able to recover intelligible speech from
the vibrations of a potato-chip bag photographed from 15 feet away through
soundproof glass.

It's hard enough to discern intelligible speech from many people who are
standing right in front of you.

------
jordache
TIL don't leave snack bags out, and the window shades open.

------
gh0zt
“I’m sure there will be applications that nobody will expect. I think the
hallmark of good science is when you do something just because it’s cool and
then somebody turns around and uses it for something you never imagined. It’s
really nice to have this type of creative stuff.”

Thinking atomic bombs :(

~~~
fooker
>Thinking atomic bombs :(

The same ones which have prevented large scale open military conflicts
involving superpowers for the last 60+ years? Atom bombs have likely saved
more lives than they have taken, compared with a world of conventional wars
fought with modern technology and without MAD.

~~~
freewilly1040
That’s one read on history. Another is that the world has been tremendously
lucky to not have suffered a nuclear holocaust since WWII. Many powerful
people have done everything they could to advocate for nuclear bombings in the
past 60 years, and we’ve had some extremely close calls with near accidental
launches.

~~~
fooker
You are right. It might be too soon to definitively say which take is correct.
A disaster (natural, economic, alien invasion, etc.) could still spark yet
another world war.

------
devoply
Now imagine doing this with a high res satellite anywhere on earth.

~~~
mynewtb
Satellites will never get such resolution due to optics and atmosphere. Drones
however...

