Optical microphone can separate multiple instruments from afar (newatlas.com)
222 points by sohkamyung on June 28, 2022 | hide | past | favorite | 52 comments



I've been doing pro audio stuff for 25 years and this is a landmark paper, the biggest breakthrough I've seen in years. I'm astonished at the quality of the extracted signals. Biggest thing I've seen since deconvolution became good enough for realtime or near-realtime adaptive noise reduction.


I thought something similar was done with the laser on window pane thing already


Laser stuff was a single point. The advantage here is capturing a 2D grid of pixels, which lets you see how the signal travels through space. This essentially means a large array of microphones which are next to each other and in perfect sync (which I hear is hard when you try to do this with multiple microphones).


It is hard due to phasing: overlapping audio signals from the same source, when summed during mixing, can cancel out parts of the signal. Channels on some mixers have a phase inverter, and audio engineers will also move microphones around while looking at a phase monitor.
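As a minimal numerical sketch of that cancellation (hypothetical 440 Hz tone, with the second channel fully inverted as the worst case):

```python
import numpy as np

fs = 48000                          # sample rate, Hz
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)

# Two microphones capturing the same source; assume the second sits half a
# wavelength farther away, so its signal arrives 180 degrees out of phase.
mic_a = tone
mic_b = -tone                       # worst case: fully inverted

mix = mic_a + mic_b                 # summing during mixdown
print(np.max(np.abs(mix)))          # ~0: the source cancels completely

# The mixer's phase-invert switch flips one channel's polarity, restoring it
fixed = mic_a - mic_b
print(np.max(np.abs(fixed)))        # ~2: constructive summing
```

Real mic placements produce partial, frequency-dependent cancellation (comb filtering) rather than this total null, which is why engineers watch a phase monitor while moving the mics.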



This isn't how the system works. It basically can give you a small number of contact microphones (you need to aim a laser at each point you want to record).


Interesting interpretation of this as a high-density, high-resolution array! But does it need to be well isolated from movement? Sound wavelengths are long, so maybe not?


This is better quality (see the comparisons in the video demo) and it's done with mostly normal cameras, not specialized high-framerate cameras.


It has been, but the quality here is much better. That example might have been good enough for intelligence or legal purposes; this is good enough for commercial/entertainment purposes.

The main headache I see is that it still requires a somewhat expensive and complex camera setup, but I can see that coming within the realms of affordability/standardization quite soon.


I hope this can be used to objectively distinguish between real audiophile equipment and audiophile snake oil.


Then please read the submitted article, the press release (https://www.cs.cmu.edu/news/2022/optical-microphone) or the paper itself (https://www.marksheinin.com/_files/ugd/a41a28_7d370603fafd41...) again, as it has nothing to do with audio quality.


'Real' audiophile equipment is the stuff that sells to recording studios like powered studio monitors and rackmount solid state transports. It costs a lot less than the consumer audiophile stuff, too. If you're buying 'consumer' audiophile products like huge floorstanding speakers made of whatever wood of the day is in favour then you're buying status symbols, not audio reproduction quality.


> It costs a lot less than the consumer audiophile stuff, too.

That's the first time I've heard anyone say that studio monitors cost _less_ than consumer audiophile stuff.

Saying that as a person who buys and uses studio monitors, they are not cheap in any way.

Even "affordable" entry level stuff (eg JBL Pro LSR 305P/310S) is going to set you back a good chunk of change compared to consumer gear.


Hmm, from my perspective a pair of JBL LSR 308P MkII and a transport matches the price of entry-level consumer audiophile speakers with a cheap (Topping/SMSL) amp/DAC pairing, yet will outperform it any day.


Heh Heh Heh. You also need to add the matching 310S sub (https://jblpro.com/products/lsr310s) for things to sound "right". :)

They're engineered to work together, and while you _can_ go without the sub for a while... it's a shit setup and only useful in the short term.

Wouldn't want to be doing production stuff (or even end user setup) without the sub. ;)


I thought that the sub was less necessary with the 8" monitors? Maybe I misread or misremember.

One of the things that's interesting to note with monitors is how 'biased flat' is very much a thing, and that consumer audiophile speakers are very much not always biased flat; I think each company has a sound profile they aim to target.


This is for espionage not hifi audio


I love the creative use of the rolling shutter: instead of seeing it as a downside, they turned the line-by-line nature of the sensor into a sample-rate multiplier.


The use of rolling shutter to increase effective sampling rate was present in the original SIGGRAPH 2014 Davis et al paper from Bill Freeman’s group at MIT, “The Visual Microphone: Passive Recovery of Sound from Video” (https://dspace.mit.edu/handle/1721.1/100023).

The authors of the current paper cite this and other prior work. The key innovation is the use of both a rolling shutter and a global shutter reference.
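The back-of-envelope arithmetic behind that sample-rate multiplier (the row count below is illustrative, not taken from the paper, though the reported 63 fps and 63 kHz figures imply roughly this many usable rows):

```python
# A rolling-shutter sensor exposes and reads out rows sequentially, so every
# row is a distinct point in time rather than one sample per whole frame.
fps = 63            # frame rate reported in the article
rows = 1000         # hypothetical number of usable sensor rows

naive_rate = fps            # one sample per frame: 63 Hz
rolling_rate = fps * rows   # one sample per row: 63,000 Hz
print(naive_rate, rolling_rate)
```

The global-shutter camera then provides a per-frame reference, which is what lets the row-wise samples be stitched into one continuous high-rate signal.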


That blew me away before I even finished the paper.


Ooh that's really cool that they're using the laser speckle pattern. I like the fact they exploit the rolling shutter too. Something which https://people.csail.mit.edu/mrub/VisualMic/ also does.

There are devices which are called laser doppler vibrometers, which might also be able to do this by pointing at the strings/base of the guitar?

There do seem to be videos of laser doppler vibrometers being used with guitars on youtube, but I'm not sure if the soundtrack that goes along with them is just from a normal mic.

I had a little play with laser speckle patterns to detect keypresses, as they can help find very subtle changes to a surface - https://www.anfractuosity.com/projects/fun-with-speckle-patt... (by 'diffing' the patterns)


I came here to post about LDVs but was beaten to it.

https://en.m.wikipedia.org/wiki/Laser_Doppler_vibrometer

They're sensitive to very small vibrations. A friend of mine used them while working at a hard drive manufacturer to better understand head and platter vibrations.


LDVs are the gold standard for noncontact vibration measurement and are widely used in acoustics. The main problem is that they’re pretty expensive (I think on the order of a few 10s of k$)


Combined with sophisticated noise cancellation and other relatively mature tech, this could make intentional focused listening possible, analogous to looking at something, or to closing your eyelids, but for hearing.

Imagine being able to shut off specific ambient noises (and sometimes.. people) without losing spatial awareness. Or tune in a source you're paying attention to (the cocktail party problem).

The issue with super-hearing would be to re-adjust expectations of who can reasonably hear us. Could be used for creepy things, obviously..


This seems like an interesting optical solution to the "cocktail party problem", which can be solved using Independent Component Analysis [0]

[0] https://en.wikipedia.org/wiki/Independent_component_analysis
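As a sketch of ICA on the cocktail-party problem, here is a minimal NumPy-only FastICA (symmetric decorrelation, tanh nonlinearity) separating two synthetic mixed sources; the source signals and mixing matrix are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 4000)
s1 = np.sin(2 * np.pi * 1.0 * t)           # "voice" 1: sinusoid
s2 = np.sign(np.sin(2 * np.pi * 0.7 * t))  # "voice" 2: square wave
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.5], [0.4, 1.0]])     # unknown mixing matrix
X = A @ S                                   # what two microphones record

# Center and whiten the observations
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / X.shape[1])
Xw = E @ np.diag(d ** -0.5) @ E.T @ X

# FastICA fixed-point iteration
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Xw)
    W_new = (G @ Xw.T) / Xw.shape[1] - np.diag((1 - G**2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W_new)
    W = U @ Vt                              # symmetric decorrelation

S_hat = W @ Xw   # recovered sources, up to permutation/sign/scale
```

Each recovered row should correlate strongly with one of the true sources, though order and sign are arbitrary (a fundamental ambiguity of ICA). In practice you would reach for a tested implementation such as `sklearn.decomposition.FastICA`; the optical system sidesteps the problem entirely by sensing each source separately.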


check your PI mail, syn9; the game is on


I'm imagining Q delivering this technology to 007, except that it would shatter the audience's suspension of disbelief. Truly astounding sorcery, yet at the same time completely straightforward.


Incredible separation, I don't think it's attainable by any other means. Should be super useful for speech in noisy environments.

I have a question though: is capturing lateral movement of a single spot on the instrument enough to represent how it sounds to a human ear? I think it's analogous to a polarizing filter, as it doesn't seem to be capturing depth-axis vibrations.


The example audio at the bottom suggests the answer is no. But you could potentially use it as a means to isolate the sound from recorded audio. You would have to synchronize the phases, though, because the optical signal travels at c while the sound does not.
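That synchronization reduces to a time-of-flight correction: the optical capture is effectively instantaneous, while a conventional microphone hears the source d/343 seconds late. A sketch with a hypothetical distance:

```python
# Aligning an "instantaneous" optical capture with a conventional microphone
# recording made some distance from the source.
c_sound = 343.0      # speed of sound in air, m/s (light's delay: negligible)
fs = 48000           # audio sample rate, Hz
mic_distance = 5.0   # hypothetical source-to-microphone distance, metres

delay_seconds = mic_distance / c_sound
delay_samples = round(delay_seconds * fs)
print(delay_samples)  # ~700 samples by which the mic lags the optical signal
```

In practice you'd estimate the distance (or the delay itself, via cross-correlation) rather than assume it.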


Be careful what you wish for. I can see a future where it's like EM today, with the entire environment saturated with spread-spectrum (audio) noise and everyone having to wear a filtering/array-gain device in their ear.


> it doesn't seem to be capturing depth axis vibrations

Good point, the paper mentions x-axis and y-axis, but doesn't mention z-axis. Maybe depth vibrations could be resolved as changes to the interference pattern?


Do human ears resolve sound in more than one dimension? I've always considered that, per ear, we only get a sequence of compressions and rarefactions on the eardrum, and that other aspects of hearing come through combination or 'cheating' (skin sensitivity and such). So, would it matter?


There’s binaural localisation from phase and amplitude differences, but importantly also monaural cues from frequency response. Your pinnae act as a set of directional filters, which is part of why some binaural recordings can quite literally feel as though they’re in your head.

What’s interesting with this is that while you can’t get those directional filters, you could use a system like this to provide 3 or more ‘sensors’ (eg the chip packet demo) within a scene and isolate signals the same way as an array mic.


No, but a thing vibrating in any dimension will still produce sound waves.


You can pinpoint with your two ears where sound is coming from, fairly well. Takes some training, of course.

That said, I get the impression this is more complicated than the autocorrelation of multiple sensors?


I was just wishing for something like this yesterday.

I wanted to figure out how to detect some very very low infrasound reliably, and no conventional microphone technology seemed like it could do what I needed.

This feels like it could form the basis of a new wave of scientific vibration measurement systems.


Not only is the tech here astonishing, but full credit to the authors for producing such a clear, concise video accessible to just about everyone explaining what they did, how they did it, and why it matters. Excellent storytelling.


Can this be used for crowd audio surveillance?

Lasers on light poles next to the already-present high-resolution cameras, and you could record what everybody is saying while walking down the street.


An interesting thought experiment. Are there vibrations on our throats, or does it all take place hidden inside our bodies?


Throat mics exist, so probably? (If I'm understanding your question correctly)


When you speak, even your chest vibrates with the sound. Our bones conduct the vibrations in our skulls. This system could eventually be tuned to read those vibrations.


Ok, this is bizarre. A few days ago, and with no connection at all to any of this work, I had a similar idea. Not the same idea, just similar, and mine is just at the “hmm” stage (which also means mine may turn out to not work), but this still feels really weird.

Anyway, my thought was: laser goes to semi-silvered mirror between camera CCD and camera lens, passes through lens to diverge outward into environment, reflects off environment in the same way as a normal laser microphone, separate return signal now exists for all pixels on CCD.

Point this at a wall, do the right transformation (is a Fourier transform sufficient?) and the entire wall can be used as a computational phased array of microphones to listen to a specific (possibly moving) target.

Possibly mix with the original laser light to get beat patterns due to red/blue shift, like radar speed guns, but that feels like a separate application entirely.
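The "wall as a computational phased array" idea above is, at its simplest, delay-and-sum beamforming. A toy sketch under invented geometry (whole-sample delays, a single tone, independent noise per sensor), where averaging N aligned channels cuts noise by roughly sqrt(N):

```python
import numpy as np

c = 343.0            # speed of sound, m/s
fs = 48000           # sample rate, Hz
rng = np.random.default_rng(1)

# Invented geometry: 16 "virtual microphones" along a wall, one target point
sensors = np.stack([np.linspace(-1.0, 1.0, 16), np.zeros(16)], axis=1)
target = np.array([0.3, 2.0])

t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)   # tone emitted at the target

# Arrival delay per sensor, rounded to whole samples for simplicity
dists = np.linalg.norm(sensors - target, axis=1)
delays = np.round(dists / c * fs).astype(int)

# Each sensor sees a delayed copy of the tone plus independent noise
obs = [np.roll(clean, d) + rng.standard_normal(clean.size) for d in delays]

# Delay-and-sum: undo each delay, then average; noise drops ~sqrt(N)
beam = np.mean([np.roll(x, -d) for x, d in zip(obs, delays)], axis=0)

err_single = np.std(obs[0] - np.roll(clean, delays[0]))   # ~1.0
err_beam = np.std(beam - clean)                           # ~0.25
print(err_single, err_beam)
```

Steering to a different (or moving) target just means recomputing the delay vector; the Fourier-transform question above corresponds to applying these shifts as per-frequency phase rotations instead of time-domain delays.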

I’m really impressed by this result in particular:

> Combining 63-fps video from two cameras, one with a global shutter and one with a rolling shutter, allows the researchers to recover a sound signal at 63,000 Hz

Because shutter speed was my first concern about possible limitations when I had my other idea.


> a 63-fps limit on input data would seem to place a 63-Hz upper limit on the sound this device can "see."

Obviously not, because it's not taking one amplitude sample per frame. It's analogous to taking 63 sliding FFTs per second, each of which may be based on thousands of samples and capture high-frequency content. The speckle pattern being sampled is some kind of FFT-like transform of the signal containing lots of information.

But the 63 fps frame rate will have to show up as a limitation somewhere. I would expect it to be excellent for periodic signals, but to struggle with transients, like the attack of a percussion instrument such as a snare drum.


This tech could be applicable to ticketing loud vehicles, as discussed here earlier this year: https://news.ycombinator.com/item?id=30364669


Could an optical microphone be used to identify materials using resonance frequencies?


I just watched the video and am a little confused about the laser illumination.

Does the laser have to be "aimed" at each object of interest or is it just laser illumination of the whole scene?

The graphics suggest there's a laser point on each object, but that means it has to be aimed and thus follow the subject, right?

Moreover many objects, like guitars, have complex oscillation modes so if you are "listening" to just "one point" on the surface, you're not picking up the sound from the other parts of the guitar which are oscillating differently.


The laser needs to be aimed at each point of interest. The system can track multiple laser points simultaneously with one pair of cameras, but there's a tradeoff with quality (because essentially the camera's field of view is being divided into slices, one for each laser point). So you've got a small number of virtual contact microphones you can point at a surface, not a video where you can see the vibration at each point.


Since childhood I have been waiting for vinyl turntables that use a laser instead of a needle to 'play' records, without any wear and tear. This may be the breakthrough.


Is this technique significantly different from the laser microphones [0] that intelligence services have been using for many decades?

[0] https://en.m.wikipedia.org/wiki/Laser_microphone


It builds on the technology behind laser microphones. According to the paper, visual vibrometry has historically required expensive cameras, and their method removes this need and appears to have other advantages over using a high-speed camera. They say they contribute “a novel method for sensing vibrations at high speeds (up to 63kHz), for multiple scene sources at once, using sensors rated for only 130Hz operation. Our method relies on simultaneously capturing the scene with two cameras equipped with rolling and global shutter sensors, respectively.“


Does the laser have to track the instrument in space?


They discuss this in their video:

https://youtu.be/_pq0d1oxtA0



