I don't know of a website that compares audio quality specifically, but I feel this is still the greatest impediment for ANY video or audio conference system. I get the impression that no system out there allows for decent 'full duplex' audio, equivalent to a normal conversation where both sides might interrupt each other at times. I think this comes down to:
a) background noise reduction
b) echo/feedback cancellation
c) input latency
I assume input latency may not be such a big issue anymore, with ultra-low-latency codecs (presumably) being widespread now.
Feedback cancellation is a problem if any side of the conversation is using speakers, rather than headphones. I guess chat services meet at the lowest common denominator, making fairly conservative assumptions and often cutting audio aggressively.
My personal list, best to worst, in respect to my requirements above:
I haven't tried Jitsi, and WhatsApp seems to work pretty well even for video calls, but I don't think it scales for normal use in a company. In my experience Zoom beats Hangouts and Skype every time.
Software aside, if some attendees are often in the same location, consider using https://www.owllabs.com/meeting-owl. It certainly improves the experience for the remote attendees.
After trying a number of such services, I think Zoom is the best for 1:1 and group chats. But there are still moments of dropped packets and confusion. You can't blame Zoom for every dropped packet, but what it doesn't do well is tell you when packets are dropped.
I wish video-chat services/clients would be totally up-front about the real-time quality of the connection. There are natural pauses in any conversation, and if you're always wondering whether a pause is natural or the result of a few dropped packets, it makes for a very unnatural conversation. This could be as simple as a "last transmission received N ms ago" indicator, but I'm sure there are more clever solutions.
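To make the "last transmission received N ms ago" idea concrete, here is a minimal sketch of how a client could track it. Everything in it is hypothetical: the `LastSeenIndicator` class, the 300 ms stall threshold, and the label wording are my own assumptions, not how Zoom or any other client works. A real client would also have to account for senders that deliberately stop transmitting during silence.

```python
import time
from typing import Callable, Optional


class LastSeenIndicator:
    """Tracks how long ago the last media packet arrived (hypothetical helper)."""

    def __init__(
        self,
        stall_after_ms: float = 300.0,  # assumed threshold, tune to taste
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._stall_after_ms = stall_after_ms
        self._clock = clock  # injectable so the logic can be tested
        self._last_arrival: Optional[float] = None

    def on_packet(self) -> None:
        """Call this whenever a media packet is received from the peer."""
        self._last_arrival = self._clock()

    def silence_ms(self) -> Optional[float]:
        """Milliseconds since the last packet, or None if none has arrived."""
        if self._last_arrival is None:
            return None
        return (self._clock() - self._last_arrival) * 1000.0

    def label(self) -> str:
        """Text a UI could show next to the remote participant."""
        gap = self.silence_ms()
        if gap is None:
            return "no audio received yet"
        if gap >= self._stall_after_ms:
            return f"network stall: last transmission received {gap:.0f} ms ago"
        return f"ok: last transmission received {gap:.0f} ms ago"
```

The receive loop would call `on_packet()` for every incoming packet, and the UI would poll `label()` a few times per second, so a pause caused by the network shows up as a stall rather than looking like the other person thinking.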
I don't think this is an "easy" problem to solve, but it's one that most video chat services seem to pretend doesn't exist. Or they implicitly blame outside factors ("we can't fix the network") rather than helping customers live with the realities of the internet ("we show you immediately and in real-time when the network isn't what you expect").
(I've not put Jitsi through its paces - would love to know how Jitsi handles the UX around dropped packets.)
At a very coarse level: simple tools like the Ethernet throughput graph in the Windows Task Manager's Performance tab can provide enough of a clue that a network connection is suffering.
There are many utilities that show this info in useful form in the system tray; perhaps some would be able to superimpose it on top of the video conferencing software (like OnTopReplica).
I often have such OS-level tooling open during video calls, but it's not natural to have to keep an eye on another tool, especially when you want the "last seen"/latency figure nearly instantaneously so you know the reason for the lack of signal (human vs machine).
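For what it's worth, the throughput check those tray utilities perform is simple enough to sketch. The names below (`CounterSample`, `throughput_kbps`, `looks_stalled`) and the 20 kbit/s floor are assumptions of mine for illustration. The cumulative byte counts would have to come from the OS, for example from something like `psutil.net_io_counters()`, which this sketch does not call.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    timestamp_s: float   # when the counter was read, in seconds
    bytes_received: int  # cumulative bytes received on the interface


def throughput_kbps(prev: CounterSample, curr: CounterSample) -> float:
    """Average receive rate between two samples, in kilobits per second."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    received_bits = (curr.bytes_received - prev.bytes_received) * 8
    return received_bits / 1000.0 / elapsed


def looks_stalled(
    prev: CounterSample,
    curr: CounterSample,
    min_kbps: float = 20.0,  # assumed floor for an active call
) -> bool:
    """True if the receive rate fell below the assumed floor."""
    return throughput_kbps(prev, curr) < min_kbps
```

Sampling once a second and flagging `looks_stalled(...)` would give roughly the same clue as watching the Task Manager graph, just without having to look at a second window. It measures the whole interface though, so other traffic on the machine can mask a stalled call.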