I've been doing pro audio for 25 years, and this is a landmark paper, the biggest breakthrough I've seen in years. I'm astonished at the quality of the extracted signals. Biggest thing I've seen since deconvolution became good enough for realtime or near-realtime adaptive noise reduction.
Laser microphones captured a single point. The advantage here is capturing a set of 2D pixels, which lets you see how the signal travels through space. It's essentially a large array of microphones sitting next to each other in perfect sync (which I hear is hard to achieve with multiple physical microphones).
It is hard due to phasing: when overlapping audio signals from the same source are summed during mixing, they can cancel out parts of the signal. Channels on some mixers have a phase inverter, and audio engineers will also move microphones around while watching a phase monitor.
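For anyone who hasn't run into phasing: a quick numpy sketch (sample rate and frequency are just illustrative) of how summing a phase-inverted copy cancels the signal:

```python
import numpy as np

# Two copies of the same 1 kHz tone, one phase-inverted (180 degrees out).
fs = 48_000                      # sample rate, Hz (illustrative)
t = np.arange(fs) / fs           # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)

mixed_in_phase = tone + tone     # constructive: doubles the amplitude
mixed_inverted = tone + (-tone)  # destructive: cancels completely

print(np.max(np.abs(mixed_in_phase)))  # ~2.0
print(np.max(np.abs(mixed_inverted)))  # 0.0
```

Real-world phasing is messier (partial delays cancel only some frequencies, which is the comb-filter sound), but this is the worst case the phase-invert switch exists to fix.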
That isn't how the system works. It basically gives you a small number of contact microphones (you need to aim a laser at each point you want to record).
Interesting interpretation of this as a high-density, high-resolution array! But does it need to be well isolated from movement? Sound wavelengths are long, so maybe not?
It has been, but the quality here is much better. That example might have been good enough for intelligence or legal purposes; this is good enough for commercial/entertainment purposes.
The main headache I see is that it still requires a somewhat expensive and complex camera setup, but I can see that coming within the realms of affordability/standardization quite soon.
'Real' audiophile equipment is the stuff that sells to recording studios like powered studio monitors and rackmount solid state transports. It costs a lot less than the consumer audiophile stuff, too. If you're buying 'consumer' audiophile products like huge floorstanding speakers made of whatever wood of the day is in favour then you're buying status symbols, not audio reproduction quality.
Hmm, from my perspective a pair of JBL LSR 308P MkIIs with a transport and a cheap (Topping/SMSL) amp/DAC matches the price of entry-level consumer audiophile speakers, yet will outperform them any day.
I thought that the sub was less necessary with the 8" monitors? Maybe I misread or misremember.
One of the things that's interesting to note with monitors is that being 'biased flat' is very much a thing, while consumer audiophile speakers are very much not always biased flat; I think the various companies each have a sound profile they aim to target.
I love the creative use of the rolling shutter: instead of seeing it as a downside, they turned the line-by-line nature of the sensor into a sample-rate multiplier.
The use of a rolling shutter to increase the effective sampling rate was present in the original SIGGRAPH 2014 Davis et al. paper from Bill Freeman’s group at MIT, “The Visual Microphone: Passive Recovery of Sound from Video” (https://dspace.mit.edu/handle/1721.1/100023).
The authors of the current paper cite this and other prior work. The key innovation is the use of both a rolling shutter and a global shutter reference.
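Back-of-envelope on why the rolling shutter helps (the row count here is my assumption, not a number from the paper): each sensor row is exposed at a slightly different time, so every row becomes its own time sample of the speckle pattern.

```python
# A rolling shutter reads rows one after another, so each row is a
# separate time sample rather than one sample per frame.
fps = 63              # camera frame rate (from the paper)
rows = 1000           # rows read out per frame -- illustrative assumption
effective_rate = fps * rows

print(effective_rate)  # 63000 row-samples per second, vs. 63 with a global shutter
```

The global-shutter camera then serves as the phase/amplitude reference that makes those row-samples usable as a coherent signal.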
Ooh, that's really cool that they're using the laser speckle pattern. I like that they exploit the rolling shutter too, something which https://people.csail.mit.edu/mrub/VisualMic/ also does.
There are devices called laser Doppler vibrometers, which might also be able to do this by pointing one at the strings/base of the guitar?
There do seem to be videos on YouTube of laser Doppler vibrometers being used with guitars, but I'm not sure whether the soundtrack that goes with them is just from a normal mic.
They're sensitive to very small vibrations. A friend of mine used them while working at a hard drive manufacturer to better understand head and platter vibrations.
LDVs are the gold standard for noncontact vibration measurement and are widely used in acoustics. The main problem is that they’re pretty expensive (on the order of a few tens of thousands of dollars, I think).
Combined with sophisticated noise cancellation and other relatively mature tech, this could make intentional focused listening possible, analogous to fixing your eyes on something, or closing your eyelids, but for hearing.
Imagine being able to shut off specific ambient noises (and sometimes.. people) without losing spatial awareness. Or tune in a source you're paying attention to (the cocktail party problem).
The issue with super-hearing would be to re-adjust expectations of who can reasonably hear us. Could be used for creepy things, obviously..
I'm imagining Q delivering this technology to 007, except that it would shatter the audience's suspension of disbelief. Truly astounding sorcery, yet at the same time completely straightforward.
Incredible separation, I don't think it's attainable by any other means. Should be super useful for speech in noisy environments.
I have a question, though: is capturing lateral movements of a single spot on the instrument enough to represent how it sounds to a human ear? I think it's equivalent to a polarizing filter, as it doesn't seem to capture depth-axis vibrations.
Judging by the example audio at the bottom, the answer seems to be no. But you could potentially use it as a means to isolate the sound from the recorded audio. You would have to synchronize the phases, though, because the optical signal effectively arrives at the speed of light while the acoustic one travels at the speed of sound.
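Rough sketch of the alignment you'd need before mixing the optical pickup with a conventional mic (the distance and sample rate are made-up numbers, not from the paper):

```python
# The optical channel arrives effectively instantly, while the acoustic
# channel reaches the mic at roughly the speed of sound. To mix the two,
# delay the optical channel by about distance / speed_of_sound.
speed_of_sound = 343.0   # m/s at room temperature
distance = 5.0           # metres from source to mic -- illustrative
fs = 48_000              # sample rate of the recording -- illustrative

delay_s = distance / speed_of_sound
delay_samples = round(delay_s * fs)

print(delay_s)        # ~0.0146 s
print(delay_samples)  # ~700 samples at 48 kHz
```

In practice you'd probably estimate the offset by cross-correlating the two channels rather than measuring the distance, but the scale of the correction is a handful of milliseconds.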
Be careful what you wish for; I can see a future where it's like EM today: the entire environment saturated with spread-spectrum (audio) noise, and everyone has to wear a filtering + array-gain device in their ear.
> it doesn't seem to be capturing depth axis vibrations
Good point, the paper mentions x-axis and y-axis, but doesn't mention z-axis. Maybe depth vibrations could be resolved as changes to the interference pattern?
Do human ears resolve sound in more than one dimension? I've always considered that, per ear, we only get a sequence of compressions and rarefactions on the ear drum, that other aspects of hearing are through combination or 'cheating' (skin sensitivity and such). So, would it matter?
There’s binaural localisation from phase and amplitude differences, but importantly also monaural cues from frequency response. Your pinnae act as a set of directional filters, which is part of why some binaural recordings can quite literally feel as though they’re in your head.
What’s interesting here is that while you can’t get those directional filters, you could use a system like this to provide three or more ‘sensors’ (e.g. the chip packet demo) within a scene and isolate signals the same way as an array mic.
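A minimal delay-and-sum sketch of that idea, with three made-up 'sensors' and the per-sensor delays assumed known (in a real array you'd estimate them from geometry or cross-correlation):

```python
import numpy as np

# Delay-and-sum: shift each sensor's signal by its known delay toward the
# target, then average. The target adds coherently; the noise averages down.
def delay_and_sum(signals, delays):
    aligned = [np.roll(s, -d) for s, d in zip(signals, delays)]
    return np.mean(aligned, axis=0)

fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)     # the source we want to isolate

delays = [0, 7, 13]                       # made-up arrival delays, in samples
rng = np.random.default_rng(0)
sensors = [np.roll(target, d) + 0.5 * rng.standard_normal(fs) for d in delays]

out = delay_and_sum(sensors, delays)
# Residual noise drops roughly by sqrt(number of sensors).
print(np.std(sensors[0] - target), np.std(out - target))
```

With three sensors the noise floor falls by about sqrt(3); with a wall full of laser points it could fall much further, which is the appeal of the array reading.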
I was just wishing for something like this yesterday.
I wanted to figure out how to detect some very very low infrasound reliably, and no conventional microphone technology seemed like it could do what I needed.
This feels like it could form the basis of a new wave of scientific vibration measurement systems.
Not only is the tech here astonishing, but full credit to the authors for producing such a clear, concise video, accessible to just about everyone, explaining what they did, how they did it, and why it matters. Excellent storytelling.
When you speak, even your chest vibrates with the sound. Our bones conduct the vibrations in our skulls. This system could eventually be tuned to read those vibrations.
Ok, this is bizarre. A few days ago, and with no connection at all to any of this work, I had a similar idea. Not the same idea, just similar, and mine is just at the “hmm” stage (which also means mine may turn out to not work), but this still feels really weird.
Anyway, my thought was: laser goes to semi-silvered mirror between camera CCD and camera lens, passes through lens to diverge outward into environment, reflects off environment in the same way as a normal laser microphone, separate return signal now exists for all pixels on CCD.
Point this at a wall, do the right transformation (is a Fourier transform sufficient?) and the entire wall can be used as a computational phased array of microphones to listen to a specific (possibly moving) target.
Possibly mix with the original laser light to get beat patterns due to red/blue shift, like radar speed guns, but that feels like a separate application entirely.
I’m really impressed by this result in particular:
> Combining 63-fps video from two cameras, one with a global shutter and one with a rolling shutter, allows the researchers to recover a sound signal at 63,000 Hz
Because shutter speed was my first concern about possible limitations when I had my other idea.
> a 63-fps limit on input data would seem to place a 63-Hz upper limit on the sound this device can "see."
Obviously not, because it's not taking one amplitude sample per frame. It's analogous to taking 63 sliding FFTs per second, each of which may be based on thousands of samples and can capture high-frequency content. The speckle pattern being sampled is some kind of FFT-like transform of the signal, containing lots of information.
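A toy numpy illustration of that point (window size and tone are arbitrary choices of mine): an analysis frame taken only ~63 times per second can still resolve kHz content, because each frame contains thousands of samples.

```python
import numpy as np

# 63 analysis frames per second, but each frame is built from thousands of
# samples, so a single frame resolves frequencies far above 63 Hz.
fs = 48_000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 5000 * t)        # a 5 kHz tone

win = 2048                                 # samples per analysis frame
frame = sig[:win] * np.hanning(win)        # one windowed frame
freqs = np.fft.rfftfreq(win, 1 / fs)
peak = freqs[np.argmax(np.abs(np.fft.rfft(frame)))]

print(peak)  # close to 5 kHz, even at only ~63 such frames per second
```

What the frame rate does limit is how often you get a fresh snapshot, which is exactly the transient-smearing concern raised below.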
But the 63 Hz sampling will have to show up as a limitation. I would expect it to be excellent for periodic signals, but to struggle with transients, like the attack of a percussion instrument such as a snare drum.
I just watched the video and am a little confused about the laser illumination.
Does the laser have to be "aimed" at each object of interest or is it just laser illumination of the whole scene?
The graphics suggest there's a laser point on each object, but that means it has to be aimed and thus follow the subject, right?
Moreover, many objects, like guitars, have complex oscillation modes, so if you're "listening" to just "one point" on the surface, you're not picking up the sound from the other parts of the guitar, which are oscillating differently.
The laser needs to be aimed at each point of interest. The system can track multiple laser points simultaneously with one pair of cameras, but there's a tradeoff with quality (because essentially the camera's field of view is being divided into slices, one for each laser point). So you've got a small number of virtual contact microphones you can point at a surface, not a video where you can see the vibration at each point.
Since childhood I have been waiting for vinyl turntables that use a laser instead of a needle to 'play' records, without any wear and tear. This may be the breakthrough.
It builds on the technology behind laser microphones. According to the paper, visual vibrometry has historically required expensive cameras, and their method removes this need and appears to have other advantages over using a high-speed camera. They say they contribute “a novel method for sensing vibrations at high speeds (up to 63kHz), for multiple scene sources at once, using sensors rated for only 130Hz operation. Our method relies on simultaneously capturing the scene with two cameras equipped with rolling and global shutter sensors, respectively.”