Making high-fidelity audio sound like it came through the phone (2018) (jonlu.ca)
62 points by jonluca on March 9, 2020 | 28 comments



As an alternative to Fourier transforms, one can also use classic audio filters: a high-pass and a low-pass filter, or a band-pass filter (or similar EQ settings), with the aim of boosting the signal around 2 kHz and silencing it below about 1 kHz and above 4 kHz.

Caveat: I love doing this in a musical context for a specific part (e.g. vocals or drums or guitar) amongst many, and so I tend to fine-tune the exact frequencies in the context of the other parts so they evoke the memory and emotion of a telephone line more than the exact specifications of a POTS[0] line.

[0] POTS = plain old telephone service or plain ordinary telephone service


Simple audio filters will have a fairly gentle roll-off above/below their cutoff. Zeroing FFT bins will brickwall the frequencies that are being removed. They sound subtly different.
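A minimal sketch of the bin-zeroing approach, assuming NumPy, a mono signal held in an array, and the 300 Hz to 3300 Hz band edges quoted elsewhere in this thread. The function name `fft_brickwall` is made up for illustration and is not the article's code.

```python
import numpy as np

def fft_brickwall(samples, sample_rate, low_hz=300.0, high_hz=3300.0):
    """Zero every FFT bin outside [low_hz, high_hz] and transform back."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(samples))

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate          # one second of audio
in_band = np.sin(2 * np.pi * 1000 * t)            # 1 kHz tone, should survive
out_of_band = np.sin(2 * np.pi * 100 * t)         # 100 Hz tone, should vanish
filtered = fft_brickwall(in_band + out_of_band, sample_rate)
```

Because both test tones fall exactly on FFT bins here, the 100 Hz tone is removed completely and the 1 kHz tone is untouched. Real audio is not that tidy, and the hard edge is what gives the subtly different sound.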

You can build brickwall emulated filters but they're never perfect, and you have to make a tradeoff between high latency and phase distortions/ringing.

Personally I like FFT filtering because subjectively it can sound more "vintage" than filter emulations.

It's interesting that this is being used to emulate a mechanical limitation - the limited mechanical frequency range of carbon granule microphones and small speakers.

Modern microphones and speakers have an amazing ability to record and generate a much wider frequency range from much smaller transducers.


Depends on what you mean by “simple”. If you want a circuit you can build with one op-amp and some passives from some coefficients you have memorized, without simulating it first, you're probably going to go with a second order Butterworth or something and get that gentle roll-off you're talking about. But if you're using SciPy or Octave, you can synthesize a tenth-order elliptic IIR filter and apply it in three lines of code (to me that's “simple”), and it'll be pretty brickwallish.
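A rough sketch of that "few lines of SciPy" idea, assuming a 44.1 kHz source and a 300 Hz to 3300 Hz passband. The ripple and attenuation figures (0.5 dB, 60 dB) and the helper names are arbitrary choices for illustration.

```python
import numpy as np
from scipy import signal

sample_rate = 44100  # assumed rate of the "high-fidelity" source

# For band-pass designs SciPy doubles the prototype order, so N=5 yields a
# tenth-order filter. 0.5 dB passband ripple, 60 dB stopband attenuation.
sos = signal.ellip(5, 0.5, 60, [300, 3300], btype="bandpass",
                   output="sos", fs=sample_rate)

def telephone_filter(samples):
    """Apply the elliptic band-pass causally (this adds phase distortion)."""
    return signal.sosfilt(sos, samples)

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

t = np.arange(sample_rate) / sample_rate
in_band = telephone_filter(np.sin(2 * np.pi * 1000 * t))     # passes
above_band = telephone_filter(np.sin(2 * np.pi * 8000 * t))  # rejected
```

Second-order sections (`output="sos"`) are used because high-order IIR filters in plain `b, a` form can run into numerical trouble.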

It's true that you get more latency as you move to lower-phase-distortion IIR and zero-phase-distortion FIR filters. But that's not a reason to filter in the Fourier domain instead, because in the Fourier domain you get all the latency. If you can pay all the latency that way, you can eliminate the IIR phase distortion with filtfilt.
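To illustrate the filtfilt point, here is a small sketch assuming SciPy's `sosfiltfilt`, which runs the filter forward and then backward over the whole buffer. The fourth-order Butterworth low-pass and the 1 kHz test tone are arbitrary picks.

```python
import numpy as np
from scipy import signal

sample_rate = 8000
sos = signal.butter(4, 3300, btype="lowpass", output="sos", fs=sample_rate)

t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

causal = signal.sosfilt(sos, tone)          # one pass: output is phase-shifted
zero_phase = signal.sosfiltfilt(sos, tone)  # forward + backward: shift cancels

middle = slice(2000, 6000)  # ignore start-up and edge transients
causal_error = np.max(np.abs(causal[middle] - tone[middle]))
zero_phase_error = np.max(np.abs(zero_phase[middle] - tone[middle]))
```

The zero-phase output lines up with the input, while the single-pass output lags it. The price is the one mentioned above: the backward pass needs the whole signal, so it cannot run in real time.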

Similarly, ringing in your IIR filter is not a reason to go to the Fourier domain and start bashing bins to zero, because you get all the ringing that way: the Fourier-domain difference between your original signal and the transformed signal is a Fourier-domain impulse, whose inverse Fourier transform is a sinusoid that fills eternity.
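The ringing can be seen with a quick experiment, sketched here under the assumption of a one-second buffer at 8 kHz and the same 300 Hz to 3300 Hz band: feed a single click through the bin-zeroing filter and look at what comes out long before the click.

```python
import numpy as np

sample_rate = 8000
n = sample_rate                 # one-second buffer
impulse_index = n // 2

impulse = np.zeros(n)
impulse[impulse_index] = 1.0    # a single click in the middle of the buffer

spectrum = np.fft.rfft(impulse)
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
spectrum[(freqs < 300) | (freqs > 3300)] = 0.0
response = np.fft.irfft(spectrum, n=n)

# Largest sample more than 1000 samples (125 ms) before the click.
pre_ringing = np.max(np.abs(response[: impulse_index - 1000]))
```

The output is non-zero well ahead of the click, so the brickwall rings across the whole buffer in both directions rather than only after the event.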

Also, just to be clear, and you probably already know this, but ringing and phase distortion are two totally different phenomena.


There's a great plugin called "Engineer's Filter" which can do some more classical filtering (Chebychev, Cauer) to really get that vintage sound.

Zeroing an FFT bin is neat for the phase artifacts/time domain aliasing.

https://www.kvraudio.com/product/engineersfilter_by_rs_met


Good shout. That sounds roughly like a pre-emphasis filter, as is commonly used in mobile radio applications.

Rolloff for those on the bottom end is usually around 300Hz (the CTCSS block filter), and around 3kHz (channel bandwidth limit) on the top end.

Often those filters were implemented as analog designs, so a FIR with -10dB/dec rolloff would be closer to real-life. An FFT filter can be configured to act like a brickwall, which will sound subtly different.


> Caveat: I love doing this in a musical context for a specific part (e.g. vocals or drums or guitar) amongst many, and so I tend to fine-tune the exact frequencies in the context of the other parts so they evoke the memory and emotion of a telephone line more than the exact specifications of a POTS[0] line.

That is super interesting, now I'm intrigued. What's exactly the difference between the memory and emotion and the real sound of a telephone call? Do you have this kind of approach with other sounds as well?


> What's exactly the difference between the memory and emotion and the real sound of a telephone call?

When mixing a song, one of the many things to consider is, if a specific part is intended to blend in with other parts, effectively creating a merged sound vs. which parts are supposed to stand out. Since I use the telephone effect to make a part stand out a little extra, it becomes more important for it to be focused in a frequency range not already crowded with other parts. And therefore it’s less important if the highlighted frequency is a precise match to a telephone line.

> Do you have this kind of approach with other sounds as well?

Yes, very much so. For example, in a very full pop or electronica mix with many parts, a grand piano track can overwhelm and muddy too many frequencies used by other instruments. The very low end (left side) of a good grand piano occupies the same frequencies as a bass guitar, upright bass or synth bass. So typically I would make the piano track less prominent in the competing range by lowering the EQ settings in that range. And I might increase the EQ settings for the frequency range where I want the piano track to be most obvious to the listener.

— There are fancier versions of that mixing technique via variable multi-band compression with side chaining, but that might go beyond the interest level of the HN audience and take this thread too far off-topic :-)


Just a guess, but I imagine he's talking about how we've been taught certain context clues from media that aren't exactly accurate. For instance, that bubbly low passed sound when a character is underwater or the crunchy punch sounds that aren't what a real fight sounds like.


AMR is a very "cheap" codec to run. If you want something to sound like it's been through the GSM phone system, you could just run it through AMR and back again. Possible patent encumbrance, but implementations are available in ffmpeg.


The "too good" part is due to the missing compander, which is very similar or identical to the ADPCM compander for better analog lines. ADPCM itself, in the form of G.711, is still used in older digital phone exchanges.


μ-law and A-law are also worth considering, and give a slightly different effect to ADPCM.

The LPC and CVSD series vocoders also have a very unique(ly lossy) sound profile.
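For anyone wanting to hear the μ-law effect, here is a minimal sketch using the textbook continuous μ-law curve with μ = 255 and an 8-bit quantiser. It is an approximation for experimenting, not the segmented encoding that G.711 actually specifies, and the function names are made up for illustration.

```python
import numpy as np

MU = 255.0  # value used for 8-bit mu-law

def mulaw_compress(x, mu=MU):
    """Map samples in [-1, 1] onto the logarithmic mu-law curve."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def mulaw_roundtrip(x, bits=8):
    """Compress, quantise to roughly 2**bits levels, then expand again."""
    levels = 2 ** (bits - 1) - 1
    y = mulaw_compress(np.clip(x, -1.0, 1.0))
    y_quantised = np.round(y * levels) / levels
    return mulaw_expand(y_quantised)

x = np.linspace(-1.0, 1.0, 1001)
degraded = mulaw_roundtrip(x)
```

Quiet samples come back almost unchanged while loud ones pick up coarser quantisation error, which is the characteristic companded sound.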


A fun thing with LPC is the lpc10 codec built into Sox, making it trivially easy to play with and get a very pleasing distortion.


Coming soon, AI prediction of what untelephonic voices would have sounded like! Starting with a detelephonised One Night in Bangkok.

https://www.youtube.com/watch?v=rgc_LRjlbTU


Hmm... you may be on to something. In the recommended feed was Toto's "Africa" remastered with AI: https://www.youtube.com/watch?v=AqtBdKP_FPs

The lyrics do indeed sound crisp as if they were re-recorded, which is impossible. Anyone know if this was really done with AI, and what this field is called (music restoration?)

The video is uploaded on some random Spanish channel and I can't find any information about it by searching.


Now that I hear it, I have to admit, he sang through a telephone. Would really love to hear an AI prediction version.


POTS chops the frequency off below 300 Hz and above 3300 Hz. The simulation should be almost perfect, but it doesn't play.


It also happened to me. It was a Firefox compatibility issue, it played with Chrome.


Only the first sound file plays for me!? Using Firefox on desktop and mobile.


From the console:

> Media resource https://blog.jonlu.ca/assets/eightkhz_resampled.wav could not be decoded, error: Error Code: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)

You can download and play those audios directly: https://blog.jonlu.ca/assets/eightkhz_resampled_unfrequencie... and https://blog.jonlu.ca/assets/eightkhz_resampled_unfrequencie...


Looks like Firefox has issues with playing wav files in HTML5 audio blocks? Not sure actually.

Added an mp3 version, should work across browsers now.


The phone system has been mostly digital "forever." Pretty much by the 1990s, even if you had an analog copper phone line, it was digital by the time it made it to the switchboard.


> EIGHT_KHZ = 8096

Was this deliberate (vs 8192)?


Sampling rates don't need to be powers of two. Commonly used sampling rates are e.g. 22050 Hz or 44100 Hz. The maximum signal frequency is half of the sampling rate, and sampling higher frequencies causes aliasing if they're not cut off first. So, 8096 Hz can sample up to 4048 Hz without aliasing. If their cut-off filter is at 4000 Hz, they're on the safe side by sampling a bit more, because the filter isn't perfect.
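The aliasing can be checked numerically. This sketch assumes NumPy and an arbitrary 5000 Hz test tone, which is above the 4048 Hz limit for an 8096 Hz sampling rate. The helper `dominant_frequency` is made up for illustration.

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

sample_rate = 8096
t = np.arange(sample_rate) / sample_rate      # one second of samples
tone = np.sin(2 * np.pi * 5000 * t)           # above the Nyquist limit

# Sampled without a cut-off filter, the tone folds back to
# sample_rate - 5000 = 3096 Hz.
alias = dominant_frequency(tone, sample_rate)
```

A tone that was never in the audible telephone band ends up in the middle of it, which is why the low-pass filter has to come before the resampling step.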


Every now and then I hear someone call a radio station using a landline phone (in good repair). The difference in quality (despite the landline bandwidth filtering) compared to mobile-phone callers is often unmistakable. For one thing, the garbling artifacts and missing audio segments on mobile calls stand out.

So it's helpful to stipulate what kind of 'phone audio' you're trying to mangle into 'fidelity'.


Eventide’s classic DSP4000b has algorithms for phone call sonic simulation. The patches are highly modular (and thus can be inspected for analysis - at least down to the blocks used to construct the patch) and it would be interesting to see how that old school hardware DSP approach compares to the author’s design and implementation.


I remember landlines sounding quite different from what the author produced. To me they sound much better than cell phones. It sounds as if the other person is actually on the other end of the line. Cell phones sound very artificial, with compression artifacts and such.


He forgot to add in some 60 Hz (or 50 Hz) buzz. Maybe add in a bit of crackling as well since it's now spring and wet around here...
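A playful sketch of that idea, assuming NumPy and samples scaled to [-1, 1]. The function name, hum level, click rate and click level are all invented defaults, not measurements of any real line.

```python
import numpy as np

def add_line_noise(samples, sample_rate, hum_hz=60.0, hum_level=0.02,
                   crackle_rate=5.0, crackle_level=0.3, seed=0):
    """Mix mains hum and sparse random clicks into a signal."""
    rng = np.random.default_rng(seed)
    t = np.arange(len(samples)) / sample_rate
    hum = hum_level * np.sin(2 * np.pi * hum_hz * t)
    # Each sample becomes a click with probability crackle_rate / sample_rate,
    # i.e. about crackle_rate clicks per second on average.
    is_click = rng.random(len(samples)) < crackle_rate / sample_rate
    clicks = np.where(is_click,
                      crackle_level * rng.uniform(-1.0, 1.0, len(samples)),
                      0.0)
    return np.clip(samples + hum + clicks, -1.0, 1.0)

sample_rate = 8000
silence = np.zeros(sample_rate)               # one second of silence
noisy = add_line_noise(silence, sample_rate)
```

Since 60 Hz sits below the 300 Hz band edge, the hum would need to be mixed in after the band-limiting step, otherwise the filter removes it again.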


Works here: Firefox on Mac, version 74.



