If you still think it's a problem, applying good dither with ffmpeg's resampler options (aresample exposes a dither_method setting) will lower the perceived noise floor by another 6-10 dB.
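To make the dither idea concrete, here's a minimal numpy sketch of TPDF (triangular) dither applied at the 16-bit quantization step - this is the textbook technique, not ffmpeg's actual implementation, and the tone level and seed are my choices. Note that dither slightly *raises* total error power; the win is that the error becomes uncorrelated noise instead of signal-dependent distortion (the perceived-floor improvement comes from that plus noise shaping):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48000
t = np.arange(fs) / fs
# quiet 1 kHz tone at -80 dBFS, only a few LSBs above the 16-bit floor
x = 1e-4 * np.sin(2 * np.pi * 1000 * t)

scale = 2**15  # 16-bit full scale

# plain rounding: the quantization error is correlated with the signal
q_plain = np.round(x * scale) / scale

# TPDF dither: sum of two uniforms spanning +/-1 LSB, added before rounding
tpdf = rng.uniform(-0.5, 0.5, x.size) + rng.uniform(-0.5, 0.5, x.size)
q_dith = np.round(x * scale + tpdf) / scale

def err_db(q):
    """Total error power relative to full scale, in dB."""
    e = q - x
    return 10 * np.log10(np.mean(e**2))

print("plain rounding error:", err_db(q_plain), "dBFS")
print("TPDF dithered error:", err_db(q_dith), "dBFS")
```

Both error floors land in the mid -90s to around -100 dBFS; the dithered one is a few dB higher in raw power but spectrally flat and free of harmonic distortion.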
The article (and numerous other sources I've seen over the years) disagrees with you, so I'm curious why you're so certain.
"It's true that 16 bit linear PCM audio does not quite cover the entire theoretical dynamic range of the human ear in ideal conditions."
Now, 24-bit may be overkill, but it's the next step up from 16-bit among standard encoding formats, and as the article notes, the only drawback of 24-bit encoding is the extra disk space.
"[...] does not quite cover the entire theoretical dynamic range of the human ear in ideal conditions."
Note the words "theoretical" and "ideal".
In your post it sounds like you're claiming that you can regularly hear a difference under normal listening conditions - which contradicts my reading of that sentence.
My gut feeling is that the difference you're hearing is placebo.
To put it another way - either the article is making an inaccurate statement, you're mistaken, or you've got golden ears and only ever listen to music in specially prepared environments.
Monty's gotta monty, and this argument has been going on from the very earliest days of digital: back when people behaved exactly the same way over digital recordings that are now commonly accepted to be excruciatingly bad for a variety of reasons (generally having to do with bad process and wrong technical choices).
You can get a HELL of a lot out of 16/44.1 these days if you really work at it. I do that for a living and continue to push the boundaries of what's common practice (most recently, Alexey Lukin of iZotope and I hammered out a method of dithering the mantissa of 32-bit floating point, which equates to around 24-bit fixed for only the outer half of the sample range and gets progressively higher precision as loudness diminishes). Monty is not useful in these discussions, nor is anyone who just dismisses the whole concept of digital audio quality.
I believe it's a combination of imagined differences and barely perceptible differences elevated to implausible heights of significance.
Even if one can hear the difference between 16 and 24 bits, it will be almost imperceptible in most listening conditions, and when it is perceptible it will be on the threshold - certainly too subtle to affect the quality of the experience in any meaningful way.
96 dB is a lot more than you probably think: it's like the difference between an anechoic chamber (nominally ~0 dB) and someone jackhammering concrete right next to you (~90-100 dB). Add to this that even a quiet room has a noise floor around 20-30 dB SPL, so to even hear the noise floor of CD-quality audio, a full-scale peak would have to hit roughly 126 dB!
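The ~96 dB figure is just arithmetic on the number of quantization levels - a two-line check (the 1.76 dB term is the standard correction for a full-scale sine):

```python
import math

bits = 16
# dynamic range as the ratio of full scale to one quantization step:
dr_simple = 20 * math.log10(2**bits)   # "6 dB per bit" -> ~96.3 dB
# theoretical SNR of an n-bit quantizer driven by a full-scale sine:
dr_sine = 6.02 * bits + 1.76           # ~98.1 dB
print(dr_simple, dr_sine)
```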
Try generating a sound at 0 dBFS, then attenuate it in steps of 10 dB and note when you can't really hear it anymore. At -50 dB the sound is already extremely quiet and barely audible, and there would still be 46 dB of attenuation available before you hit the 16-bit noise floor.
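If you want to run that experiment yourself, here's a stdlib-only sketch that writes a 440 Hz test tone at each 10 dB step as a 16-bit WAV - the frequency, duration, and file naming are my choices, not anything from the article:

```python
import math
import struct
import wave

fs = 44100
dur = 1.0
freq = 440.0

for att_db in range(0, 100, 10):
    amp = 10 ** (-att_db / 20)  # linear amplitude for -att_db dBFS
    with wave.open(f"tone_-{att_db}dB.wav", "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)       # 16-bit samples
        w.setframerate(fs)
        frames = b"".join(
            struct.pack("<h", int(amp * 32767 * math.sin(2 * math.pi * freq * i / fs)))
            for i in range(int(fs * dur))
        )
        w.writeframes(frames)
```

Play the files back to back at a fixed, realistic volume setting; by -90 dB the tone is only about one LSB tall.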
In addition to this, noise-shaped dither can push the quantization noise towards frequencies where the human ear is less sensitive, giving a perceived noise floor of around -120 dBFS. In other words, 24-bit audio for distribution and listening is pointless: it has no audible advantage over 16-bit audio.
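The spectral redistribution is easy to demonstrate with a toy first-order error-feedback shaper - a far cruder curve than the psychoacoustically tuned shapes real dithers use, and the band edges below are just illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 44100
n = fs
x = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(n) / fs)

scale = 2**15  # 16-bit full scale
# TPDF dither, one value per sample
d = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)

y = np.empty(n)
err = 0.0
# first-order error feedback: subtracting the previous quantization error
# gives a (1 - z^-1) noise transfer function, pushing noise to high frequencies
for i in range(n):
    v = x[i] * scale - err
    q = np.round(v + d[i])
    err = q - v
    y[i] = q / scale

noise = y - x  # only the shaped quantization+dither noise remains
spec = np.abs(np.fft.rfft(noise)) ** 2
freqs = np.fft.rfftfreq(n, 1 / fs)
low = spec[freqs < 4000].sum()     # band where the ear is most sensitive
high = spec[freqs >= 10000].sum()  # band where it is far less sensitive
print(low, high)
```

The total noise power is unchanged or slightly higher, but almost all of it now sits above 10 kHz, which is exactly why the *perceived* floor drops well below the flat -96 dBFS figure.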