NPR's Tiny Desk concerts (when they were actually at NPR) regularly blow me away with the production quality. They are so good at it despite all the wonky setups that come through. If you dig around in the comments you'll see my handle in there fawning over them regularly lol.
> If there really is only one mic someone has sold their soul. The sound stage is perfect.
“Note: The secret of the 418-S is that it's two mics in one; a cardioid "mid" condenser capsule facing front, and a bidirectional capsule focused on either "side." It's a mid/side stereo mic that needs to be decoded in post, which allows you to control the stereo width of the image, even after the recording is made.”
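For anyone wondering what "decoded in post" means in practice: the mid and side capsules land on two separate tracks, and the decode is just a sum and a difference. A minimal sketch (assuming the two capsule signals are already separate mono arrays; the `width` knob is my own illustrative parameter, not anything from the article):

```python
import numpy as np

def decode_mid_side(mid, side, width=1.0):
    """Decode a mid/side recording into left/right stereo.

    mid, side: equal-length 1-D arrays (front cardioid and sideways figure-8).
    width: 0.0 collapses to mono (mid only), 1.0 is the "natural" image,
           > 1.0 exaggerates the stereo width -- this is the knob you still
           have after the recording is made.
    """
    left = mid + width * side
    right = mid - width * side
    # Normalize so the summed signals don't clip.
    peak = max(np.max(np.abs(left)), np.max(np.abs(right)), 1e-9)
    return left / peak, right / peak
```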
> If there really is only one mic someone has sold their soul.
The one-mic thing is actually sort of a bluegrass tradition, going back to radio and how it was recorded. A lot of bluegrass players learn how to balance themselves around a single mic, moving toward or away from it for solos and leaning in over the instrument to sing.
Nice! You can tell they know exactly what they are doing; I wasn't aware of the relationship to bluegrass.
I used to produce a (very) little bit of live music, and the difference in ‘babysitting’ required between a professional performer and someone who may have an amazing or even superior talent but little experience was remarkable.
Chris Thile is such a genius. Goat Rodeo (Chris Thile, Edgar Meyer, Stuart Duncan, Yo-Yo Ma) is one of my favourite things in the world.
https://youtu.be/O7EcT5YzKhQ
Not exactly a Tiny Desk Concert, but close enough: my fave was an appearance by Stephin Merritt (of the Magnetic Fields). This NPR series was called "Project Song"; the challenge was to write and produce a song in two days.
The KEXP stuff also sounds as good as, if not better than, album recordings most of the time. Quite impressive stuff from both of them. Of course there's something magical about live sets that often gets lost on studio recordings.
It's great that we have sources like NPR Tiny Desk and KEXP, so we can discover and enjoy awesome music.
> Of course there's something magical about live sets that often get lost on studio recordings.
I went on an SRV bender a couple of years ago, comparing the studio cut of Lenny to pretty much any bootleg live recording I could find. There's no comparison to be had; the live ones are just better.
There are a million other examples of course but possibly a surprising one is Miley Cyrus. Her energy during shows just hits highs that seem unavailable in the studio. Her vocals have come into a new era these past few years, especially her covers.
These are all great but holy moly DakhaBrakha is the best thing I've seen in a long time. Just give this a few minutes. Headphones or good speakers please lol:
"We use a simple Neumann U87 microphone as the house-standard microphone at all of our facilities. They’re expensive, but that’s what we’ve used for years."
The problem isn't the mic, though; it's that in the earlier days of radio there was a trend toward boosting the bass artificially in the microphones to make the host sound more authoritative. Howard Stern is big-time guilty of this. NPR doesn't do this: it cuts the bass picked up by the full-range mics using a channel EQ (or the mic's built-in filter) to eliminate plosives, rather than employing lots of pop screens and boosting the bass. Using a full-range mic for vocals means the ultra low end is preserved, and that's not always desired for replicating the human voice accurately.

This is also how most vocals in music are treated: there's no reason for all that low-end mud, so they're high-pass filtered heavily as a matter of course before the rest of the vocal effects chain is applied. Pop vocals in particular are way thinner than people realize.
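The high-pass move is easy to try yourself. Here's a rough sketch of that kind of channel high-pass in Python (not NPR's actual chain; the filenames, cutoff, and filter order are just placeholder choices):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def high_pass_vocal(path_in, path_out, cutoff_hz=100.0, order=4):
    """Roll off the low end of a voice/vocal WAV file.

    cutoff_hz: roughly where the rolloff starts; ~80-120 Hz is a common
    starting point for spoken voice, but it's a taste- and voice-dependent choice.
    """
    rate, data = wavfile.read(path_in)
    data = data.astype(np.float64)
    sos = butter(order, cutoff_hz, btype="highpass", fs=rate, output="sos")
    filtered = sosfiltfilt(sos, data, axis=0)  # zero-phase, so no time smearing
    wavfile.write(path_out, rate, filtered.astype(np.float32))
    return path_out
```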
As far as mics go, if you don't want to pay thousands for a Neumann, the Austrian Audio OC18 is a fantastic mic with a similarly flat response and has a 3-way switch for different levels of high-pass filtering before the signal even leaves the mic. It's fast becoming my favorite mic to use in the studio.
To clarify a bit, I think that, by "artificial", you mean the boost does not correspond to how the human voice actually sounds, which is true.
But in another sense, it's not artificial. It's a natural side effect of the physics of how microphones work. In building microphones to be directional (favor sounds from, say, in front), they've also made it where the amount of bass picked up is heightened when the mic is very close.
So NPR is artificially (with a high-pass filter) removing a natural side-effect (of directional mics) to avoid getting artificial-sounding boomy bass.
Also, this is one of those accidental invention things where what was originally a side effect has turned into a valued, essential feature. Like guitar amp distortion is part of the electric guitar sound. Or like how resonator guitars (Dobro, National) were invented to be louder but now people like the tone.
There could also be a cultural aspect to this. In English, the lower you speak, the more respectable you sound. It seems borderline ridiculous when you listen as a foreigner (voices cutting off or rattling), at least until I learnt how to use it myself ;) Anyway, I speak with a much higher pitch in French, and perhaps the bassy mic matters for English speakers but wouldn't have had such an effect on European radio. In Spanish, maybe the high frequencies would matter more, because the faster you speak, the more interesting you sound; consonants are much more important in Latin languages.
I haven't noticed this so much in English, but it's egregious in Spain, where actors in shows speak an octave lower than their normal voice. It sounds very forced to me.
The basic vocal technique for a deep voice requires relaxation. Different parts of your body resonate in different registers, and the “deep” voice is usually a chest resonance. Trying to force it creates tension in your chest that interferes with your voice.
I just realized how true the "faster you speak, the more interesting you are" thing is in Spanish, French, Italian... Pretty terrible for a language learner, though.
> But in another sense, it's not artificial. It's a natural side effect of the physics of how microphones work. In building microphones to be directional (favor sounds from, say, in front), they've also made it where the amount of bass picked up is heightened when the mic is very close.
> So NPR is artificially (with a high-pass filter) removing a natural side-effect (of directional mics) to avoid getting artificial-sounding boomy bass.
He says later on in the article that they try to get people in the studio not to talk directly into the mic but across it. So in some ways they're trying to correct for the issues caused by strong directionality before they get to artificial things like signal filters.
So, this is not quite correct. Talking across it means that you are aiming your voice to the side of the microphone, but the microphone is still aimed directly at your mouth. It’s not the directionality of the microphone that is an issue, it’s the air coming out of your mouth.
Low frequencies carry an awful lot of energy, and you get maximum dynamic range out of a mic by close-miking but HPF'ing off the lows early in the chain. Many condensers have little preamps inside them, and the HPF may be placed before this pre, effectively giving it a lot more headroom.
It's because a hardware circuit is easier to engage, more reliable, higher fidelity, and requires less maintenance.
I don't know if you've worked with modern audio software, but it's truly a tangled stack of complexity, incompatibility, license management, etc. It can sound and work great once set up, but it's touchy stuff when it comes time to make changes, update the OS, and so on. As we all know, software is notoriously buggy, which is far from ideal in a live scenario.
Two contributing factors, I suspect. Most importantly, software can only deal with whatever hits the digital side; physics and electronics being what they are, there are lots of places to lose information on the analog side. If you filter out that range before you add gain, you probably get a much better SNR and don't have to rely on ridiculously high sampling rates and resolution to avoid losing things. Second, they have been doing this a lot longer than sophisticated software pipelines have been available.
Software pipelines only began to get into radio some 20-25 years ago. NPR started in 1971.
Also, software can't do magic (and you aren't processing each microphone digitally); you want your signal to be as good as it can be, as close to the source as you can make it.
Disagree, because the issue here is where the filter is. As others have noted, a condenser mic has an internal preamp, so it is most useful to filter inside the mic, before amplification, to preserve the S/N ratio. I suppose you could do that with software in the mic, but that seems like a lot of effort for maybe a worse result.

Further down the audio path it eventually makes sense to digitize, but if you didn't have the HPF in the mic, your noise floor would be worse.
I may be wrong (it's been a long time since I read it, and it may just be something that fits really well thematically with the book but wasn't actually in it), but I think that was Michael Crichton's Congo, not a Stephen King book.
I always figured it had more to do with the kind of systems we were using back in the day: flat and dull sounding by default. Most cars I remember being in during the '80s and '90s maybe had four 4" speakers with no tweeters, and a portable radio or Walkman was pretty lo-fi sounding as well. So the bass and the highs would be hyped up to make it sound better for the average listener?
You can also just go for the similar Neumanns that are lower down in their line. I have a TLM 103 and recommend it highly, although it doesn't have the high-pass switch, so you'd want to do that in software or at the preamp.
> Pop vocals in particular are way thinner than people realize.
Thanks to near constant use of auto-tune I think most people realize pop vocals are thin.
Edit: clarification to remove accidental contradiction. I initially ended with "... I think most people realize that," which would have essentially translated to, "Most people realize that pop vocals are thinner than most people realize."
Autotune and melodyne are just standard now. Good usage is not really detectable. What people forget to note is that you still need to know how to sing in the first place. Autotune plugins can only do so much...
These plugins really exist to save time for large studios, not make bad musicians better. Time is money for studios, so they don't want to waste it on multiple retakes when someone can be close enough to make small fixes with melodyne. For session work, market effects still pressure people to, well, not make mistakes like that. A great singer is still going to be in higher demand than a decent one, because then the studios don't have to spend much time at all fixing their vocals.
Also, -noticeable- autotune can be desired. It's a musical choice. In that sense it's no different than using a vocoder, etc. I personally do not like it but that's the beauty of music; there's something for everyone.
I used to not like autotune at all in music, but then I heard an interview with Grimes (?) who basically said (paraphrasing), "Oh, I love autotune. Yes it's artificial and detectable, BUT it brings the vocals even closer to the music, which makes a more powerful impression."

Ever since then, it hasn't bothered me nearly so much when the vocals are tuned. The track hits harder. Yes, it's true the voice loses some of its natural beauty, but in turn you get music and voice that follow each other perfectly.
I think the big difference there is trying to use it to just hide imperfections vs. consciously making it a conspicuous part of the music. For someone like Grimes, adding in blatantly artificial manipulation fits in perfectly with the rest of her aesthetic.
Autotune used as an artistic style is something I used to dislike out of... snobbery, I suppose. It's a perfectly valid choice for an artist.
Now, I really hate hearing it in kids' songs. Sung by kids, for kids, and sounding so flat and blah. Much like laments against modern "beauty" productions, I think excessive autotune presents kids with an unrealistic expectation for their own voices.
It is totally detectable, unless it truly is a one-off tweak. But that is almost never what happens. Maybe a great vocal will get a tweak to save an otherwise great take, and that is fine. Good thing.

But then the whole production sees similar things all over the place, and it gets cleaned right up technically. Time, levels, the works, right?
And the energy is diminished, could be lost.
Like fashion, this will all cycle in and out. Young people hear the humanity in music made prior to these and other tools and it appeals.
Little things, like a change in tempo, small vocal errors, inconsistency, all add up in a track.
I bet some time from now, could be as little as a decade, maybe two, we will look back at all this and chuckle.
Like you say, there is nothing technically wrong with any of this tech. And it could all be used very differently from how it is today too.
Recently, I have been going back through great live shows. Fantastic! And I still get that tingle from the realization someone delivered it live, to a crowd. And yeah, not so perfect, but oh so very human too.
Good application of it is not, no. When we hear obvious autotune vocals, it's a deliberate aesthetic choice.
I believe what you're talking about is how modern production is about producing "perfect" song recordings, and mapping everything to a click track/beat grid. Now that is totally noticeable compared to music made a few decades ago. I do agree that it makes music sound sterile. This is separate to autotune/melodyne being used.
"I bet some time from now, could be as little as a decade, maybe two, we will look back at all this and chuckle."
Maybe the main industry studios will, but music in general isn't determined by what those folks are doing. There are more indie publishers than ever, and so on.
For what it's worth, T-Pain has pretty conclusively proved that he didn't need autotune; he just used it as a gimmick to stand out: https://www.youtube.com/watch?v=CIjXUg1s5gc
A short, but hopefully relevant anecdote: I play the sax. A musician friend called me last summer to get me to do a part on a new song he’d produced. Since it was during the summer surge I said I’d do it at home and send him the part, but he mentioned he had a Neumann mic for me to record with. I was curious, so I packed up and went to his place which he’d set up largely outside. I played my parts, then went home. When he sent me the result I was floored! I’ve never sounded so good - seriously. I asked him what plugins he’d used and he’d just added a touch of reverb, but nothing else. It was all me and the Neumann mic. Those things really do have a magic quality about them. There’s a reason people are willing to pay more than the cost of my sax for one.
Neat, I play sax too and one of the best live mics I've used at jazz fests was the RE-20, the same one they talk about in the article as a next best. Now I'm even more curious to try the Warm Audio imitation U-67!
Thriller (and tons of music before and after) was recorded on the Shure SM7. You can buy that mic for $400. In fact, you've seen this mic used by podcasters everywhere.
The mic cost is almost irrelevant though. A good mic will last decades unless abused. Let's say you want a variety of sounds. You buy a bunch of instrument mics (probably $100 each) and a few matched pairs of all the most popular vocal mics (most of those will run 1-2k per pair). You'll probably not spend over 20k in total. Over 20 years, that's only $1,000 per year or less than $100 per month. In that same 20 years, you will have upgraded your digital equipment several times at an expense far greater than $1,000 per year (upgrading your $3,500 macbook every 4 years is the same amount of money).
If you make your money with those mics, that cost is hardly worth mentioning. It's like people complaining that ergonomic keyboards cost $300. The keyboard will easily last a decade or more (only $1-2 per month to save a lot of future pain). In that same time, you'll probably spend 10k+ on other equipment. Same thing with monitors where $1000 will far outlast that same amount of money put into the computer itself.
Drastic differences between the three! The Røde is certainly acceptable for most circumstances, but the Fifine sounds like absolute crap in my opinion.
The Yeti is popular because it's a USB microphone and it got in early. It's not a bad microphone at all (somebody telling you it is wants to sell you something), but it's generally misapplied in most settings where it finds itself.
For simple spoken-word stuff like conferences or streams or whatever, something like a Samson Q2U or an AT2005USB/ATR2100 are less sensitive to unwanted noise and easier for an untrained user to get a good sound out of, while moving into the XLR space gets you access to better dynamic microphones and also some pretty reasonably priced condensers that do quite well (though there's some up-front investment in the audio hardware, of course).
I own a yeti (and a yeti pro) among quite a few other mics.
It's actively bad for most people for one reason: condenser mics pick up everything.
If your room isn't soundproofed, it will be very hard to keep noise out of your recording. Dynamic mics are much less sensitive in this regard.
I would instead recommend a Samson Q2U or Audio-Technica ATR2100-USB on the low end ($70-100), or the Shure MV7 ($250) on the high end, for plug-and-play mics.

If you want to move to a cheap audio interface (e.g., Focusrite Scarlett Solo + Cloudlifter), I'd recommend either the Shure SM7B or the Electro-Voice RE20 on the higher end, and the Shure SM57 on the cheaper end (good enough for the president to use for the last 40-ish years).
> It's actively bad for most people for one reason: condenser mics pick up everything.
This is a myth that's popular with podcasters. If you get as close to a condenser mic as you must with a less-sensitive dynamic mic* and crank down the gain accordingly, you'll find that condenser mics don't magically capture more ambient noise than dynamic mics.
* Using a fist as a measure, your mouth should be between 1-2 fists away from the mic.
With a good preprocessor (I use a Symetrix I got from an old radio station), I can crank my dynamic mic (EV RE320) to levels that will pick up anything happening in my entire house, with my office door closed.
It's just that the levels from condenser mics tend to be hotter, so by default you hear more stuff in the room unless you get in close and turn it down.
There's no way to replicate the 'radio' sound if you're 2 or 3 ft away from the mic.
I have a Yeti, XLR mics, and lavalier mics. For ease of use without hassle the Yeti is good, but you must use the right setting and account for gain. It picks up a decent amount of ambient sound, and that extra noise will muddy your vocals. I've gone the route of a simple lavalier setup for most of my video calls and presentations.
For what it's worth I'm an audio engineering expert, produced albums, broadcast stuff, and used to review professional studio audio equipment for a living for a national magazine.
People like it because it's simple and it looks cool.
If you want something that has the same basic usability, ie plugs directly into USB and is really easy to use with computer audio, buy the Apogee Mic Plus.
I recently experimented with pretty much everything in this category and was very happy with this model, bought a dozen of them for use in a virtual conference series, where I wanted something I could send to non-technical people who'd never be able to navigate a pro audio interface. I've been very happy with it so far.
Sony C800, anyone? With parts for manufacture being hard to source for Sony, I’ve seen these for nearly 20k, list price (not sure if they actually get that much), second-hand.
But then, professional equipment never had economies of scale.
Almost nobody uses mics like C800Gs for podcasting and radio, though. The Sony C800G is one of the best vocal mics in the world; generally it will be found in high-end studios. A U87 is very expensive for the purpose of radio/podcasting. The RE20 and the Shure SM7B are very popular for podcasting/radio and are around $400. NPR certainly aren't the only people that use U87s for radio/podcasting, but they are in the minority. U87s are probably the most popular studio mic in the world for professional studios recording vocals, but for this application it is accurate to call a U87 expensive.
For podcasting yes, but it seems like an insignificant cost relative to the rent, talent, and transmission equipment in an analog radio station. Are $3000 mics actually rare in radio studios?
From my experience, yes, they are rare. So rare that I've never heard of a radio station that uses U87s. I've only ever worked on one radio show and we had an SM7B. You can watch a lot of radio stations' live video streams these days and you will probably not find a single U87; it will be all RE20s and SM7Bs. It isn't as if a radio station has a single mic, so the calculation is ($3000 - $400) x number of mics purchased per year. Radio stations use their mics for hours and hours every day; they get a lot of wear and tear. Additionally, the SM7B is a dynamic mic and is probably more forgiving in noisy environments than a U87. I know Joe Rogan is a podcast, not radio, but he makes millions of dollars off of his show and he uses SM7Bs.
I will also say, the NPR person interviewed seems to have a negative view of the RE20 and SM7b compared to the U87. Despite the low cost, SM7Bs are actually popular studio vocal mics. One was famously used to record Michael Jackson's vocals on Thriller (actually it was an SM7 but I believe the SM7B reissue is almost identical). When recording voice, there normally isn't a "silver bullet" mic that is the best for all voices.
From Sony, but check Reverb... Sony has been out of stock for over a year now. The diaphragms are hard to produce/source, so second hand just keeps going up without Sony putting new units out.
The bass roll-off is an important factor. If you listen to other top-40 radio stations, the DJs sound like they are pronouncing the hits from the top of Mount Olympus, with thundering bass and reverb designed to shake you awake and make you pay attention. It's frankly exhausting to listen to, and NPR's attention to this small thing makes it possible to listen to people talking for hours on end.
Top-40 audio engineers are not wrong. Just as you wouldn't format long walls of text and single short phrases the same way in typography, you wouldn't mix these two the same way.
> Just as you wouldn't format long walls of text and single short phrases the same way in typography
Topic drift: I hammer on my students that contracts are much more readable if done in short, single-subject paragraphs without long wall-of-words passages.
I always thought that legal language looks like C code that heavily relies on macros after it has been through a preprocessor.
Don't lawyers have effective ways to include and reference things, create standard definitions and procedures without pasting the same stuff everywhere?
> Don't lawyers have effective ways to include and reference things, create standard definitions and procedures without pasting the same stuff everywhere?
In some fields, yes — but as a class, lawyers: (A) notoriously prefer reinventing the wheel, and (B) sometimes could be suspected of hoping that the MEGO Factor — Mine Eyes Glaze Over — will cause the other side's contract-draft reviewer to overlook something that the drafter buried in a long, wall-of-words provision. I see that happen pretty regularly.
(In the 1990s I initiated and headed up a project for the American Bar Association Section of IP Law to try to standardize the wording of various building-block clauses for software license agreements. [0] The chief IP counsel of a Fortune X company [X being a very-low number], whom I knew pretty well from the Section, said he was opposed to having any kind of standardized language because, he said (paraphrasing), "I want to be free to be an asshole.")
>Don't lawyers have effective ways to include and reference things, create standard definitions and procedures without pasting the same stuff everywhere?
That could actually turn out to be worse. Take a look at a lot of federal bills. They're written like:
'In 8 USC 552(b)(ii) strike the word "foo" and insert "bar baz"'
You then have to go cross reference everything for every line. It's a nightmare. If the bill was written in a computer readable diff format instead, that could be better.
You wouldn't believe the lack of efficiency in law firms (much of the cost of which is passed to clients). When I tried selling SAAS to law firms, there was a degree of resistance because efficiency threatens the charge by the unit model.
> Aren't some contracts designed to not be readable with long drawn out passages?
From the oleaginous Francis Urquhart in the wonderful original (British) version of House of Cards: "You might think that. I couldn't possibly comment." [0]
I was always impressed with how crisp remote guests or hosts often sound. Rather than sounding like they've called in, they sound like they're in-studio. It's not terribly difficult to achieve: the remote person likely has a good audio setup and sends that recording to the engineer to mix together. Still, a nice touch.
This is one of the last, best uses of ISDN. Guaranteed latency, ultra low jitter, and plenty of high-quality hardware purpose-built for getting the best possible studio audio over 2 bearer channels worth of capacity.
ISDN must still be in heavy use in radio. Sirius XM has been using hardware IP codecs since the pandemic began, and once per three-hour show you might hear garbled audio for a few seconds.
Sample rates in audio hardware aren't like programming constants, where they're the same for everybody. Over 30 minutes, a 0.05% sample rate error gets you 1s of drift over the recording. As a reference, USB 2.0 has a 0.25% frequency tolerance (and is used to clock many audio devices).
Cheap quartz clocks in computers and some USB ADCs especially are prone to slightly changing their rates depending on temperature. So the sample rates can differ relative to each other.
The clock drifts. Something needs to count those seconds. Even when the drift is small, phasing distortions become pretty obvious on lengthy recordings.
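To put rough numbers on the drift (a toy sketch, not how pro gear handles word clock; the linear-interpolation fix at the end is crude compared to a proper resampler):

```python
import numpy as np

def drift_seconds(nominal_hz, actual_hz, duration_s):
    """How far a recording drifts when a device's clock is slightly off."""
    return duration_s * (actual_hz - nominal_hz) / nominal_hz

# A "48 kHz" device actually running 0.05% fast, over a 30-minute take:
print(drift_seconds(48_000, 48_000 * 1.0005, 30 * 60))  # ~0.9 s of drift

def stretch_to_match(samples, target_len):
    """Crudely resample one track so it lines up with a reference track,
    e.g. after measuring the offset between sync marks at start and end."""
    new_idx = np.linspace(0, len(samples) - 1, int(target_len))
    return np.interp(new_idx, np.arange(len(samples)), samples)
```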
There's some interesting work going on in the AES to support synchronised audio over wide area networks, either through better recovery of PTP clocks distributed through WANs or using PTP with GNSS.
Maybe actual clock differences? Not sure if that's the case, but in audio engineering, a separate clock may be used to keep all devices involved in-sync (many pro-level audio devices have a "clock" input for this very reason).
In RF engineering, it's typical to have all of your equipment referencing the same 10MHz clock (or a 1 pulse per second or IRIG-B). If I don't have a GPS receiver or a rubidium source, then I'll just pick the newest, most expensive piece of equipment with a built-in reference clock and fan it out to the rest of the equipment on the bench. Some portable spectrum analyzers have built-in GPS receivers so even out in the field you know you have a good reference.
Huh, I've thought about this in the past as an outsider and concluded that by now it's a common enough task that of course there must be an algorithm for doing it automatically.
As someone who worked as an audio engineer, solving problems before they can occur saves so much time and headache. There's no reason to faff about with software or complexity-inducing algorithms when the whole problem can be fixed by toggling one switch.
Technically you could accomplish the same thing by applying a parametric eq to the master buss, but then you're no longer software agnostic.
It's like photography; sure one can post-process photos in photoshop. But getting everything right before taking the picture, at a hardware level, simplifies things for everyone involved.
There are plugins for different scenarios, but it turns into one of those problems where hearing and correcting issues is much easier for humans than computers. The tools available make it easier to fix problems, but it still takes a recording engineer to spot-check.
Do you have any insights you can offer on how best to do this? I have to deal with drift issues on signal processing of .wav files, and I have always used a marker pulse every so often.
Interestingly, one of the most enduring shows on NPR is Fresh Air with Terry Gross, and she traditionally has not had her guests in the studio with her over her 40-plus years hosting the show. She has even spoken about how she's been able to use this to her advantage. The following is a quick read on this:
I should say, the shows mentioned in this article are actually produced by NPR but most of what you hear on a given public radio station isn't. And also, NPR doesn't control the broadcast.
I know some people love the NPR sound and find it intimate and comforting, but I often have to turn it off because the mouth and saliva noises drive me crazy. Misophonia is not fun!
This became worse during COVID as many of the presenters work from home and aren’t as savvy with their momentary mute. Lots of swallowing, coughing, and nose whistling. Particularly during Morning Edition.
If it's a female voice, my misophonia suggests it is Mary Louise Kelly. I've gotten used to it. There is also a male voice on one of the weekend shows that might be who you're describing.
I wrote in to a cable TV show a decade ago to call out the nose hair whistling and mouth sounds. They never replied to me, but they rolled off the highs for the remainder of the shows.
There was one NPR broadcaster who used to read the news on weekend mornings for my local station (WHYY), I forget her name, and I haven't heard her in a while thankfully, but she literally whispered the news.
It was like someone I don't know, whispering sweet, unsolicited nothings in my ear. Felt uncomfortably intimate in a way I hated. I was always like, "Lady, I don't know you like that, so cut it out."
Years ago when I moved for my first job out of school I decided to set my clock radio to the local NPR station (KERA Dallas) to wake up to the news. I had to switch to a hard rock station because I'd fall back asleep to their soft voices.
"And they want to talk about the crazy ways that young women are speaking. And the first thing they do is attribute it to young women, even though young men are doing it too. So it's a policing of young people, but I think most particularly young women."
Rather, once you've randomly decided that that's a horrible thing.
Men (and women) have spoken with vocal fry for the past several millennia, but I don't recall reading of anyone being annoyed by it until recently when everybody decided that millennial women speaking like that on the radio was anathema.
Back in the day, whenever Mr. Overby would come on the radio, I would have loads of fun imagining that he had just been rudely interrupted at the dinner table to do his segment. Then, I would guess at what kind of food he had in his mouth as he gave his hurried report. To me at least, the particular tone of his voice made it sound like he was eating mashed potatoes or a sandwich, and he wanted to get back to his dinner with his family. Yes, I have a colorful interior life inside my head. Don't judge me. But seriously, he has a very unique sound to his voice. I tried to look up his background and where he grew up, but couldn't find anything. It's surprising to me that he hasn't done voice work or animation, as he would be perfect for those roles.
It's a lot worse with tiny headset mics. Large diaphragm condensers like the Neumann will give you that level of detail, but some of the tiny headset mics HYPE that level of detail unbearably. The Neumanns will at least fail to exaggerate what is already unbearable for you :)
Speaking of signature sounds -- many local radio stations (including my local NPR station, KUT) seem to be using some new technology. I frequently hear audio artifacts which sound like a ~1.0-1.5s 'skip-back'. It's like what you imagine something might do if it were streaming the audio and hit a gap/buffer underrun.
This all started in the last 1-2 years. It's not extremely infrequent: I hear these during prime driving times, probably around once a week. I know for sure I have heard it on at least one non-NPR FM station. I wonder if anyone else has noticed the same in other markets?
That was my immediate guess as well. I almost always encounter this with music radio stations when I leave my parking garage and the signal flips to HD. With music it’s very easy to hear the quality improvement at the same time as the “skip,” and in my car there’s a little “HD” icon somewhere on the radio interface.
Oh, of course! Yes, that's a great explanation. In that case I wonder if there's latency in the signal -- like the embedded HD content is out of phase with the analog content?
So maybe the problem is really just a defect of my car's radio when toggling.
The digital version is delayed, the station is supposed to delay the analog feed an equal amount so that the transition isn't noticeable. In the early HD radios it was common for it to be way out of sync, but I haven't noticed it much in a long while.
I think there is a little bit. In my car the difference in audio quality is really noticeable but if you don't notice an improvement then you can usually turn the HD part off so you don't have to deal with the frequent switching if it is an issue.
Huh, I never knew that he was also a writer/journalist. Come to think of it, that explains why he includes so many interview segments in his video. Thanks for pointing that out!
I love the podcast. I have Pocket Casts set to skip the first 7 minutes or so, while they're opening trading cards they bought on eBay and talking about non-Star Trek things.
Then when they start talking about the episode it's fun and nostalgic, and they make astute observations that I haven't heard elsewhere.
For anyone even vaguely familiar with audio engineering and recording, these tactics are not profound. Not a bad thing because in the end, less is more.
Worth mentioning that a good mic is arguably the 20% input that contributes to 80%+ of the output/audio quality, as supported by the article.
#6 is really the only non-obvious point. Apparently this is a major subject of debate.
1.) If you can afford it, use the Neumann U87 mic (~$3.5k)
2.) High-pass filter (~250 Hz) on the vocal chain
3.) To avoid plosives, don't speak head-on into the mic. Speak off the side, on a diagonal. Use a pop filter.
4.) Design your studio to minimize reverberation. Make sure the recording space is isolated and there "aren't a lot of solid walls." Absorb sound with baffles, sound panels, etc. Counterintuitively, a larger room with more diffusion is better than the opposite.
5.) Minimize ambient sound. Your mic will pick up everything from fans to CPUs to electronic interference off computer screens. This noise will muddy up the recording.
6.) Minimize processing or compression of the signal before streaming, or in the case of radio, sending to the satellite.
BBC Radio 3 uses no dynamic range compression, so might be most comparable to NPR (although it's likely that each local station applies a ton of compression before the signal hits the air).
Most (other) radio stations apply copious amounts of multiband dynamic range compression on their output - with the nickname of "sausage-making", since the process turns waveforms that look like music into waveforms that look like sausages. In the FM days, louder sounding stations were associated with better signals, so got bigger market share...
A 250 Hz high-pass seems too high for male voices in the baritone or bass range. And depending on whether the female voice in question is more of an alto or a soprano, 250 Hz might still be too high.
The cheap Behringer mixer I use for voice chat, karaoke and so on has a selectable 80Hz high-pass filter, I can't remember ever switching it off on the vocal channels, except to parody that Howard Stern-esque huge bottom end with heavy compression radio host thing.
Using a decent microphone (AKG D5 in my case) and a little bit of tweaking (just a low cut and some compression is a good start) instantly puts your sound quality in voice chats so far above everyone else using cheap headsets or their laptops' built-in mics.
Anecdotally I've found that sounding more authoritative makes people listen a lot more to what you say, instead of zoning out.
It's not a hard cutoff; it's an attenuation. If you set the room up right, it should basically undo the proximity effect, so you get something closer to how you would actually hear that person's voice.

Of course, if you had it on and were further away from the mic, you'd thin out lower voices. Just goes to show that miking people (or instruments) isn't entirely straightforward.
It depends on how close the speaker is. Getting that close creates a large proximity effect. The rolloff filter actually starts at 1 kHz but is around -10 dB at 150 Hz [0]. I wouldn't use it unless one is close to eating the microphone.
Shout-out to the audio/mixing engineers who handle NPR Tiny Desk concerts. Every single concert I've ever listened to sounds phenomenal: well-mixed with a surprising clarity and very little noise, considering the venue and how crowded it gets "on stage" with larger ensembles.
Ex-radio engineer here. The secret sauce is certainly not the mic; all radio stations already have excellent mics.

The NPR trick is no solid walls. E.g., the Tiny Desk room is not a normal soundproofed studio box but a normal office room with lots of bookcases, cluttered tables, and not many bare solid surfaces like walls or screens. Having fewer mics and natural lighting also helps a lot.

He didn't talk about the light. Glass windows are terrible: a normal studio box has a large window, which causes more reflections than normal windows off to the side would. In their case I think they just put some plants there.
I chuckled at the headline because NPR’s audio is a long-running inside joke in my family, particularly that you can so often hear what we call the “mouth noises” of the host (lip smacking, etc).
I keep wondering why no one has trained a CNN to turn low-quality audio into crisp, NPR-sounding audio (say). Surely it's even fairly easy to create training data for such a thing?
With enough training data I’m sure it can be done... I’ve seen CNNs that fill in 3D scenes and animate them from two images. I would guess this was a simpler problem?
Well, one thing to note is that humans spend more time processing audio than video. Bad audio is immediately noticeable and aggravating, compared to spotty video with clear audio.
I -guess- CNNs could look at, e.g., reduced-frequency-range recordings (like phone calls) and attempt to reconstruct them. However, this seems like an arduous mountain to climb, as people's voices are unique, and so are their environments and signal chains. I really doubt that something that generalized would work very well at reconstructing a specific person's voice and recording.
This also gets into the problem that it would be constructing a new reality, not recreating it.
Yes, you're probably right. I hadn't considered that the voice plus the noise is going to be one thing; extracting just the noise will be difficult, unique, and maybe not trainable at all.
I have thought about the constructing a new reality thing - I wouldn't be surprised if models ended up being trained to misspeak words which could get confusing...
I think the first step would be being able to convert a lossy format like MP3 or Ogg to a lossless format like FLAC or WAV. Being able to recover something close to the data lost in lossy compression would be impressive.
MP3s generally have a shelf around 16-18 kHz. The rest of the audible spectrum data is impossible to retrieve if it doesn't exist. This is why transcoding from a lossy to a lossless format is a bad idea.
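If you want to see that shelf for yourself, a crude check is to look at where a file's averaged spectrum drops off. A rough sketch (my own hypothetical helper, assuming a PCM WAV readable by scipy; a real check would average over windows rather than one giant FFT):

```python
import numpy as np
from scipy.io import wavfile

def rough_bandwidth_hz(path, threshold_db=-60.0):
    """Estimate the highest frequency with meaningful energy in a WAV file.

    A track transcoded from MP3 will typically show a hard shelf around
    16-18 kHz even if it's now stored as FLAC/WAV.
    """
    rate, data = wavfile.read(path)
    if data.ndim > 1:
        data = data.mean(axis=1)              # fold to mono
    spectrum = np.abs(np.fft.rfft(data.astype(np.float64)))
    freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)
    level_db = 20 * np.log10(spectrum / (spectrum.max() + 1e-12) + 1e-12)
    above = freqs[level_db > threshold_db]
    return float(above.max()) if above.size else 0.0
```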
Yes, I know, and that's what would be more impressive to me: an ML model or algorithm that could recreate a close-enough data distribution going from lossy to lossless. Not an exact reconstruction, but close enough within the possible range (so below that 16-18 kHz shelf). It may or may not be possible; it's more akin to taking the inverse of MP3's degradation model (which could be totally impossible).
Yes, it can be. The reason it's built into the mic is to protect the mic's output transformer from distorting when recording louder sounds. Most newer/cheaper designs of condenser mic use solid-state outputs (unless they're deliberately aping a classic design), which typically are less easily saturated by loud bass sounds.
I'd guess NPR's view is along the lines that, well, the filter's already there, and we like the way it sounds, so we keep using it.
It could, but the results might not be what you expect.
Maybe you want to preserve the bass of interstitial music or program audio jingles or environmental effects or something. Doing the processing after the mixing means it affects the whole mix. Doing it at the input means you can tailor each element.
Worse, because bass has an outsize effect on the total energy in an audio signal, if there's any sort of dynamic range compression while the bass is still included, the presence of the bass triggers that compression to happen. Later on when the bass is removed, the remaining audio has inexplicable fluctuations in its volume, which can sound super uncomfortable.
This "program level bouncing around in response to a signal which is not part of the program audio" effect can also come from side-chain compression, and arguably filtering after compressing may be a form thereof. Once in a while it's done to great artistic effect in music, but in talk settings it's almost always horrible and disorienting.
I had just made a comment about how, yes, the studio engineer could do a high-pass at the console during a live session, but then I had a thought: perhaps they train everyone to use the HP switch so that when they're doing recordings without an engineer they still get the same sound? IDK... To me it sounded more like a "policy" than a true part of their "sound". In other words, it's a training function that makes the engineer's job a little bit easier.