Honest question: why do we need to optimize for <10kbps? It's really impressive what they are able to achieve at 6kbps, but LTE already supports >32kbps, and there we have AMR-WB or Opus (Opus even has in-band FEC at these bitrates, so packet loss is not that catastrophic). Maybe it's useful in satellite direct-to-phone use cases?
There’s a section (“Our motivation for building a new codec”) in the article that directly addresses this. Assuming you have >32 kbps bandwidth available is a bad assumption.
The safest assumption would be that you either have a connection available or you don't.
Then, if a connection is available, what is the minimum data rate you can count on in general? If we did a statistical analysis of that, would it be lower than 32 kbps? By how much?
For some reason, I would assume that if you have a connection these days, it is faster than 2G.
The question isn't really the minimal bandwidth of the PHY rate; it's about the goodput at a given reliability. Regardless of your radio, there will always be some point where someone is at the edge of a connection and goodput is less than the minimal PHY bandwidth. The call then turns choppy, or into a time-stretched robot you get every other syllable from. The less data you need to transmit, and the more FEC you can fit into the goodput, the better that situation becomes.
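To put rough numbers on that (all figures here are illustrative assumptions, not from the article): the question at the cell edge is how much of the remaining goodput the codec leaves free for redundancy.

    # Sketch: how much of a marginal goodput budget each codec bitrate
    # leaves for FEC. Bitrates and goodput figures are assumptions.

    def fec_headroom(goodput_kbps, codec_kbps):
        """Fraction of the budget left for redundancy; 0 if the codec doesn't fit."""
        return max(0.0, 1.0 - codec_kbps / goodput_kbps)

    for codec in (64, 32, 6):        # uLaw-like, AMR-WB-like, and 6 kbps rates
        for goodput in (10, 20):     # kbps left once the link gets marginal
            print(f"{codec:2d} kbps codec over {goodput} kbps goodput: "
                  f"{fec_headroom(goodput, codec):.0%} free for FEC")

Only the 6 kbps codec leaves meaningful headroom for FEC once goodput drops to the low tens of kbps.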
Not to mention "just because I have some minimal baseline of $x kbps doesn't mean I want $y to use all of it the entire time I'm on a call if it doesn't have to".
Are there really many situations where a 10kbps connection would actually be stable enough to be usable? Usually when you get these kinds of speeds it means the underlying connection is well and truly compromised, and any kind of real-time audio would fail anyway because you're drowning in a sea of packet loss and retransmissions.
Even in cases where you do get a stable 10kbps connection from upstream, how are you going to get any usable traffic through it when everything nowadays wastes bandwidth and competes with you? (Just look at any iOS device's background network activity - and that's before running any apps, which usually embed dozens of malicious SDKs all competing for bandwidth.)
Yes; backhaul connections in telephony applications are often very stable and are already capacity-managed by tuning codec bandwidth. Say you are carrying 1000 calls with uLaw (64kbps * 1000) over a pair of links and one fails. Do you A) carry 500 calls on the remaining link, B) stuff all calls onto the same link and drop 50% of the packets, or C) change to a 32kbps codec?
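A back-of-the-envelope version of that choice (a sketch; it assumes the 64 Mbps of calls were split across two 32 Mbps links and counts payload bitrate only):

    # Failover arithmetic for the scenario above: one of two 32 Mbps
    # links fails, leaving 32 Mbps for all 1000 calls.

    calls, surviving_kbps = 1000, 32_000

    for name, codec_kbps in [("uLaw 64k", 64), ("32k codec", 32), ("6k codec", 6)]:
        fit = min(surviving_kbps // codec_kbps, calls)
        print(f"{name}: {fit} of {calls} calls survive the failover")

Dropping to a 32kbps codec keeps every call up; at 6kbps there is headroom to spare.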
It seems you may be imagining the failure case where your "ISP is slow" or something like that due to congestion or packet loss. As I posted elsewhere in the thread, bandwidth is only one aspect of how a "low bitrate" codec may be expected to perform in a real-world application. How such a codec degrades when faced with bit errors, or even further reduced channel capacity, is often more important in the real application. These issues are normally solved with things like FEC, which can be incorporated as part of the codec design itself or as part of the modem/encoding/modulation of the underlying transport.
Yes; but what is your point? A congested network like you describe isn't ever going to reliably carry realtime communications anyway due to latency and jitter. All you could reasonably do to 'punch through' that situation is use dirty tricks to give your client more than its fair share of network resources.
6kbps is 10x less data to transfer than 64kbps, so for all the async aspects of Messenger or WhatsApp there is still enormous benefit to smaller data.
> Are there really many situations where a 10kbps connection would actually be stable enough to be usable?
Yes there are. We ran on stable low-bandwidth connections for a very long time before we had stable high-bandwidth connections. A large part of the underdeveloped world has very low bandwidth, and uses 5-10 kbps voice channels.
Are you talking about the general "we" or your situation in particular? For the former: yes, sure, we started with dial-up, then DSL, etc., but back then software was built with these limitations in mind.
Constant background traffic for "product improvement" purposes would be completely unthinkable 20 years ago; now it's the norm. All this crap (and associated TLS handshakes) quickly adds up if all you've got is kilobits per second.
I assume the general-ish "we", where it is general to the likes of you and me (and that zeroxfe). There are likely many in the world stuck at the end of connections run over tech that this "general subset" would consider archaic; zeroxfe was implying their connections, while slow, may be similarly stable to ours back then.
Also, a low bandwidth stable connection could be one of many multiplexed through a higher bandwidth stable connection.
Let's not move the goalposts here :-) The context is an audio codec, not heavyweight web applications, in response to your question "Are there really many situations where a 10kbps connection would actually be stable enough to be usable?" And I'm saying yes, in that context, there are many situations, like VoIP, where 10kbps is usable.
Nobody here would argue that 10kbps is usable today for "typical" browser-based Internet use.
> Are there really many situations where a 10kbps connection would actually be stable enough to be usable?
Yes (most likely: that was an intuited "yes", not one born of actually checking facts!). There are many places still running things over POTS rather than anything like (A)DSL; line quality issues could push the rate down low, and even if you have a stable 28kbit/s you might want to do something else with it at the same time as the audio comms.
Also, you may be trying to cram multiple channels over a relatively slow (but stable) link. Given the quality of the audio when calling some support lines I suspect this is very common.
Furthermore, you might find a much faster unstable connection with a packet-loss "correcting" transport layered on top, effectively producing a stable connection at a much lower speed (though you might get periods of <10kbit here due to prolonged dropouts, and/or have to institute an artificial delay if the resend latency is high).
I live in a third-world country and, simplifying a bit, my cellphone plan gives me 55 megabytes a day. I get charged if I go over. That's 2 hours of 64kbps talk time on Jitsi, but would be 12 hours at 10kbps.
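That arithmetic checks out; a quick sanity check (payload bitrate only, ignoring per-packet overhead):

    # Talk time per day on a 55 MB cap at various codec bitrates.

    cap_bits = 55 * 8 * 1_000_000    # 55 megabytes per day, in bits

    for kbps in (64, 10, 6):
        hours = cap_bits / (kbps * 1000) / 3600
        print(f"{kbps:2d} kbps -> {hours:.1f} h of talk time per day")

64kbps gives about 1.9 hours and 10kbps about 12.2, matching the figures above.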
Meta's use cases are OTT applications on the Internet, which are usually billed per byte transmitted. Reducing the bitrate of the audio codec lets people talk longer per month on the same data plan.
That said, returns are diminishing in that space due to the overhead of RTP, UDP and IP; see my other comment for details on that.
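To put a rough number on that overhead (assuming standard IPv4 + UDP + RTP header sizes and 20 ms frames, i.e. 50 packets per second):

    # Per-packet header overhead vs. codec payload bitrate.
    # IPv4 (20 B) + UDP (8 B) + RTP (12 B) = 40 B per packet.

    HEADER_BYTES, PPS = 20 + 8 + 12, 50
    overhead_kbps = HEADER_BYTES * 8 * PPS / 1000    # 16 kbps of headers

    for codec_kbps in (64, 32, 6):
        total = codec_kbps + overhead_kbps
        print(f"{codec_kbps:2d} kbps codec -> {total:.0f} kbps on the wire "
              f"({overhead_kbps / total:.0%} headers)")

At 6 kbps the headers alone are more than twice the payload, which is exactly why the returns diminish.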
More than that: in developing countries, such as my own, Meta has peering agreements with telephony companies which allow those companies to offer basic plans where traffic to Meta applications (mostly WhatsApp) is not billed. This would certainly reduce Meta's costs immensely, considering that people use WhatsApp as THE communications service.
AMBE currently has a stranglehold in this area and by any and every measurable metric, AMBE is terrible and should be burned in the deepest fires of hell and obliterated from all of history.
Internet connectivity tends to have a throughput vs latency curve.
If you need reliable low latency, as you want for a phone call, you get very little throughput.
Examples of such connections are wifi near the end of the range, or LTE connections with only one signal bar.
In those cases, a speedtest might say you have multiple megabits available, but you probably only have kilobits of bandwidth if you want reliable low latency.
Load ratios of > 0.5 are definitely achievable without entering Bufferbloat territory, and even more is possible using standing queue aware schedulers such as CoDel.
Also, Bufferbloat is usually not (only) caused by you, but by people sharing the same chokepoint as you in either or both directions. But if you're lucky, the router owning the chokepoint has at least some rudimentary per-flow or per-IP fair scheduler, in which case sending less yourself can indeed help.
Still, to have that effect result in a usable data rate of kilobits on a connection that can otherwise push megabits (disregarding queueing delay), the chokepoint would have to be severely overprovisioned and/or extremely poorly scheduled.
Correct buffer sizing isn't a good solution for Bufferbloat: the ideal size corresponds to the end-to-end bandwidth-delay product, but since one buffer can handle multiple flows with greatly varying latencies/delays, there is no single number that fits them all.
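For reference, the sizing rule that breaks down here (link speed and RTTs are just example numbers):

    # Bandwidth-delay product: the classic "ideal" buffer size.
    # One shared buffer can't be right for flows with RTTs this different.

    link_mbps = 100
    for rtt_ms in (5, 50, 500):      # LAN, continental, satellite RTTs
        bdp_kb = link_mbps * 1000 * rtt_ms / 8 / 1000
        print(f"RTT {rtt_ms:3d} ms -> ideal buffer ~{bdp_kb:.0f} kB")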
Queueing-aware scheduling algorithms are much more effective, are readily available in Linux (tc's codel and fq_codel, among others), and are slowly making their way into even consumer routers (or at least I hope).
Perhaps you know this already (not really clear on what your comment is saying), but Dave Taht is one of the authors of FQ-CoDel, which is what the author of CoDel recommends using when available.
Maybe something like this would be helpful for Apple to implement voice messages over satellite. Also, a LOT of people in developing countries use WhatsApp voice messages with slow network speeds or expensive data. It's too easy to forget how big an audience Meta has outside the Western world.
If you mean for storage, real-time codecs are actually pretty inefficient for that use case because they don't make much use of temporal redundancy, although I'm not actually aware of a non-real-time audio codec specialised for voice. They probably exist in Cheltenham and Maryland, but for Meta this likely isn't a big enough part of their storage costs to bother.