xMEMS announces the first monolithic MEMS speaker (anandtech.com)
148 points by rbanffy 35 days ago | 76 comments

Just to be clear, this seems to be the first monolithic MEMS speaker, which I take to mean that the speaker array is fabbed as one die, as opposed to each speaker being a separate piece of silicon. MEMS speakers were already available before this. Also, MEMS devices have been used in mobile devices for a long time now, specifically the microphone and the IMU (inertial measurement unit, consisting of accelerometer, gyroscope, and magnetic sensor).

MEMS speakers are not used in phones (in general; I'm sure someone will nitpick).

A speaker is very different to a microphone.

> A speaker is very different to a microphone.

Is it? I've definitely had fun little experiments where I turn a microphone into a speaker or use a speaker as a microphone. It's just a matter of whether I'm powering the membrane or whether the membrane is powering me, i.e. it's just a matter of which side of the mechanism I sit on. It's a conversion between electricity and mechanical vibration that can be driven from either side.

I'm sure there's a lot of engineering trade-offs that change if you're looking to optimize for recording fidelity vs output fidelity, so the details of their implementation are probably quite different, but at their core the two don't seem all that different.

Maybe they're totally different in the MEMS world?

I think the engineering trade-offs are significantly large enough to make microphones and speakers "totally different" in the conventional, non-MEMS world.

Sure, at the very core of it, it's just trading electricity for mechanical vibration, and this functions in both directions. But the engineering that has gone into specializing both speakers and microphones is staggering.

I'm with you on the speaker versus microphone. Both piezo and magnetic coil driven can be used both ways. Just like an electric motor versus an alternator. They're essentially the same thing, but optimized for different purposes.

Obviously depends on the type of speaker and microphone, but I have actually used headphones as a mic in a pinch before (in a lab setting, not in everyday use).

Just in case it's not clear, I stated that MEMS mics and IMUs are used in smartphones, and that MEMS speakers have been available for a while now. If it seemed like I said MEMS speakers are used in smartphones, that was unintentional.

I find it amazing that lithography is considered a "simplification of the manufacturing line" compared to century-old technology that essentially consists of winding a coil through a magnet. Granted, it's not a bleeding-edge fab process, but it's still a very strange view that considers this a simplification.

It's the same way that a photocopy machine is a simplified document reproduction system compared to arranging cast-metal printing blocks onto a plate, etc.

The photocopy machine is complex, but that's the photocopy machine manufacturer's problem, not the problem of someone reproducing a document.

I wonder if those who disagree with you have taken a look at the speaker drivers that this MEMS speaker targets (smartphones and earbuds).

These devices are many times more complex to manufacture than the large-scale dynamic drivers you might find in a pair of desktop speakers, and are definitely not possible to wind by hand. The armature is made with incredibly fine-gauge wire around a form of precise (non-circular) geometry, and needs to be coated with epoxy to stay together. The diaphragm, reed, and driver body are also precise parts with their own manufacturing difficulties.

In short, I absolutely agree with the article's assertion that replacing these manufacturing methods with lithography is indeed a simplification. Just because the concept of a wound electromagnet has been around for longer does not mean that it is simpler in practice.

If you read the article, there is not much lithography inside. It's mostly piezoceramics.

Why wouldn't it be? Winding a wire over a coil requires robotics. Making large, complicated 3D objects is finicky, slow, and expensive. Lithography is relatively simple --- it's a uniform batch process. Lithography nodes are complicated and expensive to research --- new nodes require brand new technology and even science --- but once you have a lithography line set up, you can run it over and over and over and over again, because the actual batch process is relatively simple and uniform.

Batch here is the key. One can make 100 to 10,000 elements at the same time, using the same equipment and processes used to fabricate other MEMS sensors: techniques that are proven to deliver sub-dollar end-user costs for high-performance microphones, accelerometers, temperature sensors, etc.

I would suggest hitting up "speaker coil winding" on YouTube if you think this requires robotics. Sometimes the simplest thing is "pay a human $8/hour". You don't even need to spend millions on a litho node to make it work.

This is about miniature speakers, like in ear-buds headphones, not Celestions for your guitar cabinet.

Yes but if you want a million an hour, that’s $8 million/hour. Suddenly photolithography seems incredibly cheap.

Does it take an hour to wind a coil? Probably not. At worst, a minute for an experienced worker. That gets you to roughly $133,000 per hour and $0.13 of human cost per device. Though a minute is very optimistic; you don't need that many windings. If you get it down to 10 seconds, which is still a lot of time to wind a coil, you pay about $22,000 per hour and $0.02 per product.

That's fairly cheap, and whether a robot can do it for less is questionable, because robots cost a lot and need a lot of maintenance.
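For what it's worth, the arithmetic above fits in a few lines. The $8/hour wage and the 1,000,000-units/hour target are the thread's illustration numbers, not real manufacturing figures:

```python
# Back-of-envelope labor math for hand-wound coils, using the thread's
# illustration numbers (not real manufacturing figures).
WAGE_PER_HOUR = 8.0
TARGET_PER_HOUR = 1_000_000

def labor_cost(seconds_per_coil: float) -> tuple[float, float]:
    """Return (workers needed, labor cost per coil in dollars)."""
    coils_per_worker_hour = 3600 / seconds_per_coil
    workers = TARGET_PER_HOUR / coils_per_worker_hour
    return workers, workers * WAGE_PER_HOUR / TARGET_PER_HOUR

for secs in (60, 10):
    workers, per_coil = labor_cost(secs)
    print(f"{secs:>2}s per coil: {workers:>8,.0f} workers, ${per_coil:.3f} per coil")
```

Note the per-unit labor cost is just wage times winding time, regardless of volume; the volume only sets how many workers you have to hire at once.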

What if you want 100 billion an hour?

Then you're screwed no matter what process you use. 100 billion 1 gramme devices is 100,000 tonnes. Per hour.

For reference, that's about half the discharge rate of the Rio Grande.

I think OP was referring to the fact that coil speakers are something that Isaac Newton could have made, if he knew how, whereas solid state is just not doable without modern tech.

So winding the coil requires robotics. And any lithography process requires robotics, cleanrooms, aggressive chemistry, vacuum chambers and whatnot.

Yes, but coiling stuff requires more complex robotics and a robot per coil, since you can wind a wire around only one coil at a time. With big lithography processes, your robot can be simpler (since it doesn't need to reach around or inside a complex shape) and it can process thousands of devices (on a big wafer) all at once instead of individually.

Winding a simple coil, like the ones inside speakers, requires just fixing a wire and rotating the core:


You can make the same machine with many parallel winding heads plus cutters, and make as many coils as you like in parallel.

This is absolutely not true for the miniaturized balanced armature drivers that this MEMS speaker design is competing with.

The coil must be wound in-situ and is usually epoxy potted due to fragility.


Hmm, that's quite a different kind of a beast.

It's clearly not a simplification of the manufacturing line, but I'd consider it a simplification of the final object.

From a personnel point of view it is a simplification. A robotic line that produces 10x the number of devices with 1/100th the personnel may well be worth the capital investment, and once it runs it's a black box, versus the complexities you get with traditional manufacturing, where a lot of the work is so intricate that it cannot yet be roboticized.

If the final product is more geared towards automatic manufacturing that is in fact a form of simplification. I'm quite curious how these sound in practice compared to similar weight/size regular devices.

Headphones and cellphones would be an obvious first application.

It’s an idea that’s older than photolithography.


At least for coil speakers, arrays of smaller speakers can perform better than one large speaker (or at least, better than the sum of their parts).

It looks like the ones from the article are meant for earbuds, but if these MEMS speakers someday become cheap enough to create a large phased array of them it might make for neat hi-fi system.

An array would allow, presumably, for beamforming. Aim the sound precisely at the listeners’ ears. Stick a camera on each speaker that identifies faces and beams sound accordingly, leaving those not in the room in near total silence.

Is that even possible, with a large part of the hearable spectrum having wavelengths measured in meters? How precise would that be?

It's absolutely possible, and has been implemented by companies like holosonics.com. The basic idea is that you do phase cancellation of ultrasonic frequencies to produce audio in the hearable range.

Could you transmit ultrasonic carrier waves that intermodulate at the right spot?

Yes you can, with an ultrasonic transmitter you can create sound in specific spots or that only a specific listener can hear.
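A toy sketch of the intermodulation idea: real parametric arrays rely on nonlinear propagation of ultrasound in air, but a crude quadratic nonlinearity is enough to show the difference tone appearing in the audible band. All frequencies here are made-up illustration values:

```python
import numpy as np

# Two ultrasonic carriers passed through a quadratic nonlinearity (a crude
# stand-in for nonlinear propagation in air) produce a tone at their
# difference frequency, which lands in the audible band.
fs = 200_000                       # sample rate, Hz
t = np.arange(0, 0.05, 1 / fs)    # 50 ms of signal
f1, f2 = 40_000, 41_000           # two ultrasonic carriers, 1 kHz apart
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

y = x ** 2                        # quadratic nonlinearity
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs)

# Find the strongest component inside the audible band (excluding DC).
band = (freqs > 20) & (freqs < 20_000)
peak = freqs[band][np.argmax(spectrum[band])]
print(f"dominant audible component: {peak:.0f} Hz")  # the 1 kHz difference tone
```

The spectrum also contains components at 2·f1, 2·f2, and f1+f2, but those all stay ultrasonic; only the f2−f1 term falls where an ear can hear it.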

Holosonics.com, or a lot less comfortably, [LRAD](https://en.wikipedia.org/wiki/Long_Range_Acoustic_Device)

Impractical with multipath. You would have better luck targeting people to not hear any sound than you would making it silent everywhere but one location. Unless you turned all walls into speakers, which would be a fun hobby project.

You could also skip all of these issues if you really want hi-fi and just use sealed IEMs.

Remember that Xbox patent that would charge video watchers per person instead of per rental?

Perhaps I misunderstand this, but aren't waveform arrays meant more for uniformity than output? At least in the case of lightbulbs, adding ten 1800-lumen lights in an array would not result in additional light at any given point unless the light was focused. Wouldn't the same be true of sound?

For light, you are mostly correct.

LED strips over a sheet with a diffuser produce a more useful light. But, as light is in the THz range, doing any meaningful beamforming is exceedingly difficult.

Because speakers operate well within the range of controllability, it's perfectly possible to alter the phase to steer the audio output. This works because changing the phase moves the peaks and troughs of the waves to "move" the sound to where it's wanted.
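A minimal sketch of that delay-and-steer idea for a uniform linear array; the element count, spacing, and steering angle are made-up illustration values:

```python
import math

# Delay-and-steer for a uniform linear speaker array: delaying element i by
# i * d * sin(theta) / c makes the individual wavefronts add up in phase
# along the steered direction.
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def steering_delays(n_elements: int, spacing_m: float, angle_deg: float) -> list[float]:
    """Per-element delays (seconds) that tilt the wavefront by angle_deg."""
    theta = math.radians(angle_deg)
    raw = [i * spacing_m * math.sin(theta) / SPEED_OF_SOUND
           for i in range(n_elements)]
    shift = min(raw)               # keep every delay non-negative
    return [d - shift for d in raw]

delays = steering_delays(n_elements=8, spacing_m=0.02, angle_deg=30)
print([f"{d * 1e6:.1f} us" for d in delays])
```

The catch is the wavelength objection raised above: beam width scales with wavelength over aperture, so a 14 cm array like this one is hopeless at 1 kHz (wavelength around 34 cm) but sharp for ultrasonic carriers, which is exactly why the Holosonics-style approach uses ultrasound.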

It claims 20Hz at 90 dB. How in the heck is that possible with such a tiny device?

When sealed to the ear-canal, the volume of air needed to be moved to make a large pressure change is very tiny.

Right. The metric only makes sense if a distance at which it is measured is also provided.

Oh, I hadn't thought of that: woofers are huge because they are quite a distance from the listener's ear and have to move all the air in the room, vs a tiny earhole. Makes sense, thanks.
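A back-of-envelope check of the sealed-canal point, treating the ear canal as a closed volume of air compressed adiabatically. The ~1.3 cm³ canal volume is an assumed round number, not a measured figure:

```python
import math

# How much air must a driver displace to hit 90 dB SPL in a sealed ear
# canal? Model the canal as a closed adiabatic volume (~1.3 cm^3 assumed).
GAMMA = 1.4             # adiabatic index of air
P0 = 101_325.0          # ambient pressure, Pa
P_REF = 20e-6           # SPL reference pressure, Pa
canal_volume_m3 = 1.3e-6

spl_db = 90.0
p_rms = P_REF * 10 ** (spl_db / 20)   # about 0.63 Pa
p_peak = p_rms * math.sqrt(2)

# Adiabatic compression: dP = GAMMA * P0 * (dV / V)
dv = p_peak * canal_volume_m3 / (GAMMA * P0)
print(f"peak volume displacement: {dv * 1e12:.1f} nanolitres")
```

The answer comes out in single-digit nanolitres, which is why a millimetre-scale membrane sealed to the canal can produce pressure levels that would take a large woofer to reach in open air.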

I'll wait for the 3rd party benchmarks.

Even the first-party benchmarks look pretty trashy. +20dB peaks at 2k and 10k? "A bit bright" might understate things. I suppose they intend to equalize this away with some kind of integrated DSP?


It's honestly very easy to counteract big peaks like those at such high frequencies, especially when you can virtually guarantee that every single driver will have essentially identical frequency response curves.

Sure, especially if you control the whole package (like a bluetooth headphone would) and double especially if you can rely on your source material having zero spectral content above 15k (many streaming music formats) or even lower (phone calls). But a 20dB cut is pretty dramatic. I use a 5-band parametric DSP with my headphones and it is only capable of 12dB cuts.

It sounds like "controlling the whole package" is already a foregone conclusion, given the requirement for a fairly non-standard amplifier.

I would expect it to be a straightforward signal processing problem if all that's needed is 20dB of suppression across a relatively narrow frequency range. If the frequency response is consistent from MEMS to MEMS, it should be straightforward to simply create a model, invert it, and apply it to the incoming waveform. DSPs these days can be pretty tiny and power efficient so I bet it wouldn't even take up too much power and area to do something like that.

-20 dB is a multiplication of the amplitude by 0.1. Not difficult to implement, but usually not available on manually controlled EQs unless you stack two bands.
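To make the "model it, invert it" idea concrete: a single peaking biquad in the RBJ Audio EQ Cookbook form can realise the whole -20 dB cut in one band. The centre frequency, Q, and sample rate here are illustration values, not figures from the article:

```python
import cmath
import math

# RBJ Audio EQ Cookbook peaking biquad: a -20 dB cut centred at 2 kHz.
fs = 48_000.0
f0, q, gain_db = 2_000.0, 2.0, -20.0

A = 10 ** (gain_db / 40)
w0 = 2 * math.pi * f0 / fs
alpha = math.sin(w0) / (2 * q)

# Peaking-EQ coefficients, normalised so a[0] == 1.
b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
b = [x / a[0] for x in b]
a = [x / a[0] for x in a]

def gain_at(freq_hz: float) -> float:
    """Filter magnitude response in dB at freq_hz."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)   # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))

print(f"{gain_at(2_000.0):+.1f} dB at 2 kHz")   # the full -20 dB cut
print(f"{gain_at(200.0):+.1f} dB at 200 Hz")    # far-away bands barely touched
```

So there is nothing numerically exotic about a 20 dB cut; the 12 dB ceiling on a consumer parametric EQ is a UI choice, not a DSP limit.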

I haven't actually confirmed this with testing yet, but won't those resonances still increase decay time at those frequencies after equalization? I'd like to see post-EQ waterfall plots.

Probably a combination of DSP and deliberate peaks and troughs in the chamber design.

Reminds me of electrostatic loudspeakers.

Without the high voltages, static buildup, failing power supplies, missing low end (ok, 200 Hz in this case) and so on. The main similarity is the driver surfaces are both flat.

The main similarity is that they are both voltage modulated capacitive devices.

What does “risk order” mean in terms of manufacturing? It’s on one of the graphics in the article but for some reason I’m having trouble getting a good hit for it via Google search.

I think it's meant to be "risk production." Risk production in a chip fabrication process is the step that produces a batch of chips after prototyping but before volume production. Basically, all the specs and designs have been nailed down and frozen; it's time to go through the manufacturing process once to see how things go before committing to mass production.

Sounds a lot like calling dibs on prototype production that may or may not yield functional units. The money paid would be financing a later, safer regular production start date. If they are lucky, the risk buyer can make a lot of money from the temporary exclusivity, and despite taking all the risk they can still rely on the producer trying their best, because the producer wouldn't want to deliver too close to the regular production start date, since contracts after that point surely define fines for being late.

You can opt in to an earlier delivery than the "mass delivery" target date, with the acknowledgement that it's significantly more likely to slip. And sometimes the prices are higher.

Starting at 200 Hz, so you'll need something larger next to it for bass unless you use clever auditory tricks to supply the missing fundamentals.

Hopefully these new MEMS speakers will enable wireless headphones with tens of hours of battery life.

Strange. I have a few sets of TWSs and even as a content-addict I've never come close to draining any of them. The charging cases just have too big a reserve. My personal wish is for a set without miserable lag so they're useful for gaming but so far no luck on that front.

When something is too good I will always ask, what are the trade offs?

On paper at least, it should be much more reliable than a coil speaker. And this is my second MacBook Pro with the speaker-crackling problem (although, judging from a Google search, it is more likely an Apple fault).

Everything in audio engineering is about trade-offs and making the right ones. There is no driver, coil, MEMS, or otherwise, which can efficiently and accurately reproduce the entire range of human hearing at reasonable SPL.

For applications below 200 Hz, there really isn't a much better choice than a coil-driven speaker. You are limited to very low frequencies, so the relatively coarse mechanical nature of coil speakers is more or less hidden by the fact that any higher-frequency harmonics/distortion are designed out (crossovers, cabinet design, etc.).

The best speaker is going to be a combination of various technologies. The current dream speaker for me would be:

- large array of plasma or MEMS-based tweeters (2 kHz+)

- large array of 2-3" drivers (120 Hz-2 kHz)

- a handful of 18" drivers (20-120 Hz)

- a rotary subwoofer in an infinite baffle (DC-20 Hz)

- DSP/equalizer/time-delay hardware+software for tying the whole thing together

Also note that the power required to produce the same perceived acoustic output goes up very quickly as you go down in frequency. 50 watts into an 18" subwoofer is going to sound fairly meek, but pipe that exact same electrical signal into a titanium dome tweeter and you will develop an instantaneous case of tinnitus.

For those like me who hadn't heard of rotary subwoofers: https://en.wikipedia.org/wiki/Rotary_woofer

I remember reading about an LCR speaker array that was built in the 1970s in someone's house, with horns that spanned the entire height of the room, made out of cinder blocks. Don't really know how I would dig that article back up, but it was a pretty interesting system.

A surprising amount of that content is still out there if you have the patience to dig for it. AVSForum used to be my primary reason for internet usage for about 2 years of my life. The DIY community is still extremely strong (probably stronger than it was a decade ago).

If you have a few weekends, basic electronic and woodworking skills, and a $2000 budget to burn through, you can easily put together a 2.1-channel audio reproduction solution far more impressive than you could ever hope to buy off the shelf at any cost. I built an 800-liter subwoofer that plays flat to 13 Hz back in 2008 for ~$600. Still works flawlessly to this day. The key is to be inventive with materials and design, and to always be aware of the space in which you will use the equipment. The room is always the most important part of the audio reproduction equation.

Curious why 2-3" for down to 120 Hz, as opposed to breaking your array down into a few more cone sizes for more bands: say, some 8" drivers for the low 100s of Hz, and a 3" for the high 100s of Hz into the low 1000s of Hz before the tweeters take over.

Or is your goal to minimize the complexity of the crossover network that would be needed to cleanly separate the bands with minimal distortion, and the "fun" of correctly lining up the physical phases of each of the different cones?

I was thinking more along the lines of a passive crossover network, which would be a pain to build. If the whole thing is digital with dedicated amps per band, going for as many bands as possible is more ideal, as you can just divide things up in DSP software and automate it to a large extent.

You can throw most of that conventional wisdom out for sealed IEMs.

IEMs basically eliminate the room from the equation, so I would have to agree with that angle.

But, I would also argue that the IEM experience pales in comparison to the experience of having thousands of watts of electricity converted into low pressure acoustic waves for all in the vicinity to enjoy.

Watch the bridge shootout scene from MI3 with some IEM on your smartphone vs at your local cinema with its commercial grade sound system. There is a certain level of physical immersion that is simply not possible with just an IEM/headphone setup. Sure, you could strap haptic feedback vests to yourself and put linear actuators in all your chairs, but for me that is getting to be too finicky to tolerate. I'd much rather tell my spotify app to cast to my HTPC and instantly be greeted with a wall of powerful full-range sound.

I agree with the tactile argument. If audio systems are trying to simulate the sound pressure experience in its entirety, then IEMs fall short. For the OCD-fueled desire for accurate sound pressure reproduction at the human-world audio interface, I think IEMs are where effort should be focused. 50-year-old audibility studies imply that the perceptible gains are marginal-to-nonexistent, but who says that's always the point? ;)

Here are the negatives:

- unusual amplifier requirement

- poor performance under 200Hz (mid-bass to sub-bass)

- good volume for a headphone driver, terrible volume for an open air speaker driver

It might go into your next AirPod, it won't go into your next MacBook.

It looked like the SPL levels at low frequencies were comparable to other IEMs, with a peak in the high frequencies surpassing the other IEMs, and at lower THD. The peaks could easily be equalized within an active system. These seem ideal for optimizing small Bluetooth audio devices like the AirPods or hearing aids, so I don't think any of your cons apply to this use case. I agree with the final statement, but it's not clear that the first three are actually issues in an AirPod scenario.

The peaks could also be tuned away acoustically with foam of highly specific thickness. That would be pretty easy by high-end IEM standards.

Almost all BA IEMs use in-line re-radiator “resistors” to smooth out peaks.

> When something is too good I will always ask, what are the trade offs?

Sometimes there aren't trade-offs. The thing about new technology is that it pushes out the "Pareto frontier", that configuration surface along which trade-offs happen. A classic example is the "fast, good, cheap" trilemma. You can imagine this trilemma as a geometric object (say, a hypersphere) and the particular "fast, good, cheap" trade-off you make as a point on the surface of that object. New technology makes the whole object bigger.

Does the v0 technology have important disadvantages? Sure. LEDs started out being red and dim. Now LEDs are the "too good to be true" option, beating most other lighting sources most of the time in most domains.

What does MEMS stand for? I mean, isn’t it Macro electromechanical system at this point? Isn’t that hilarious?

But the response curve of prev-gen looks just fine to me, I wonder if next-gen Etymotic Research IEM would use a device like this.
