In my experience, the weakness of current conferencing systems isn't in the software - it's a hardware problem.
Specifically, their ability to capture and transmit usable speech is really poor. Between echo and poor noise gating, I'm not exaggerating when I say that I can make out fewer than 1/3 of the words in my team's conference calls.
I don't care how efficient the video codec is; if I can't clearly understand what the other people are saying, it's all useless.
Are you sure the problem isn’t the phone system? We use Polycoms with Skype and everything is crystal clear. Our AV guy says Polycom is the best you can buy at that price point, and he’s trying to sell me $15,000 systems.
We only ever get reports about problems when people are actually dialing into something over POTS.
It's likely a network problem causing large jitter buffers (see my other comment). Buying Polycom won't solve this problem, unless the jitter buffer is in adaptive mode and it's wildly overestimating the jitter (a problem I've seen on Polycoms and Grandstreams before).
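To make the jitter buffer point concrete, here's a rough sketch of how an endpoint could estimate jitter and turn it into a buffer depth. The running estimate follows the interarrival jitter formula from RFC 3550 (section 6.4.1), but in milliseconds rather than RTP timestamp units. The `suggested_buffer_ms` rule, its name, and its `multiplier` are my own assumptions for illustration, not what any Polycom or Grandstream firmware actually does.

```python
import math


def interarrival_jitter(send_ms, recv_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    send_ms / recv_ms are per-packet send and arrival times in milliseconds.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: how much the transit time changed between consecutive packets.
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        # Exponential smoothing with gain 1/16, as in the RFC.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def suggested_buffer_ms(jitter_ms, frame_ms=20.0, multiplier=3.0):
    """Hypothetical rule of thumb: buffer a few multiples of the jitter,
    rounded up to whole audio frames, and never less than one frame."""
    frames = max(1, math.ceil(multiplier * jitter_ms / frame_ms))
    return frames * frame_ms


if __name__ == "__main__":
    send = [0, 20, 40, 60, 80]
    recv = [50, 72, 89, 115, 130]  # made-up arrival times
    j = interarrival_jitter(send, recv)
    print(f"jitter ~ {j:.2f} ms, buffer ~ {suggested_buffer_ms(j):.0f} ms")
```

If the estimator runs hot (say it reacts to one burst and never decays), the buffer stays deep and every call carries that extra delay, which is the failure mode I was describing.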
I think the latency is the killer for our remote conference calls. So much extra delay and effort goes into not talking over each other that isn't an issue in a live meeting.
Sounds like your conferencing system has large jitter buffers. If your internet is stable, you should be able to reduce the buffer.
To test your network, go to https://www.dslreports.com/speedtest and see if you have significant jitter or bufferbloat. If your rating is an A, then you're good to go; otherwise it's time to clean up your local network :P
Echo cancellation, voice activity detection (VAD, which is what they use for gating), AGC, etc. are all done in software in modern VoIP (or firmware? It's on a dedicated processor). And it's quite good if you spend the money on it.
Those cheap Cisco conference phones do sound terrible, and I don't think people replace them very often.
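For anyone wondering what the gating mentioned above looks like in software, here's a deliberately crude sketch: an energy threshold with a short hangover so word endings aren't clipped. Real VADs use far more than frame energy, and the `threshold` and `hangover` values here are arbitrary assumptions, so treat it as an illustration of the idea only.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(frames, threshold=0.02, hangover=2):
    """Zero out frames whose level is below the threshold.

    After a loud frame, the gate stays open for `hangover` more frames
    so that quiet tails of words are not chopped off.
    """
    out = []
    hold = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            hold = hangover          # speech-like: open gate, reset hangover
            out.append(list(frame))
        elif hold > 0:
            hold -= 1                # quiet, but still inside the hangover
            out.append(list(frame))
        else:
            out.append([0.0] * len(frame))  # gate closed: emit silence
    return out
```

Set the threshold too high or the hangover too short and you get exactly the chopped-up speech people complain about on bad conference phones.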
Money isn't a fix-all for AEC; frankly, Google's WebRTC AEC is the best commercially available AEC. You can see it in action on a call over Signal Private Messenger: it knocks out background noise (e.g. loud car noises if you're walking down the street) like a champ.
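If you haven't looked at how echo cancellation works, the core idea is an adaptive filter that learns the speaker-to-microphone echo path and subtracts its prediction from the mic signal. Below is a textbook NLMS sketch of that idea, assuming a short, noise-free echo path. It is not the WebRTC implementation (production cancellers are far more elaborate), and the function name and parameters are mine.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    far_end: samples sent to the loudspeaker
    mic:     samples captured by the microphone (contains the echo)
    Returns the residual signal after echo subtraction.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Predicted echo given the current echo-path estimate.
        echo_hat = sum(w * h for w, h in zip(weights, history))
        e = d - echo_hat
        # Normalized LMS update: step size scaled by far-end signal power.
        power = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / power for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

This toy version has no double-talk detection, so it will misadapt as soon as both sides speak at once.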
Like as a standalone product or integrated into conferencing systems? Because there are plenty of solutions made by audio companies with more institutional knowledge and time invested than Google.
Google, Mozilla and all the organizations that have collaborated on WebRTC have spent 7+ years tuning this component. Much like Opus versus AMR-WB, I doubt there is anything notably better in the proprietary software/hardware world.
The orgs that develop the WebRTC standard have spent years implementing and experimenting with nearly every algorithm that is available for each of the various components (resulting in Opus, AV1, WebRTC AEC, etc), combing research papers daily and often working with researchers directly.
Personally, I've heard proprietary AEC (and been a test subject for it), and it is way better than anything using WebRTC. WebRTC's artifacting can be really bad, enough that it's worth disabling (the noise reduction is quite good; it's the echo cancellation that kinda sucks and harms voice quality, giving you the classic garbling sound).
Stuff like WebRTC/Opus is limited by the fact that it runs in the browser or on consumer hardware. It's not bad, it's just a different problem domain than commercial audio, with different constraints. High-quality conferencing equipment runs on a dedicated unit that your A/V integrator installs either in the room or in a server closet, connected over Dante to the analog front end in the room. It's usually a package deal with good microphones (which could be wireless, or the hot thing now is to use beamforming).