Creating a low-latency high-availability network for voice calls

Scramblejams · on Jan 31, 2013

Not enough detail here, but I'm surprised when I see the terms "voice" and "TCP" together. Do they use TCP to set up the call, then UDP to handle the audio?

Edit: Yep, they use UDP for the audio: https://github.com/WhisperSystems/RedPhone/wiki/Signaling-Pr...

bradleyland · on Jan 31, 2013

With the traditional VoIP stacks, yes. There are two components to making phone calls: messaging and media. The messaging part sets up and tears down calls. The media part passes the audio.

I'd imagine they're doing the same.

dkhenry · on Feb 1, 2013

But typically you use something that has at least some intelligent behaviour, like RTP, not UDP. Why did they choose that when literally every other pure VoIP service uses RTP?

moxie · on Feb 1, 2013

We use RTP (actually SRTP and ZRTP), but the transport is still UDP.
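To make the layering concrete: RTP is a small header placed in front of each audio frame, and the result is sent as the payload of an ordinary UDP datagram. The sketch below packs and parses the 12-byte fixed RTP header from RFC 3550. It is a minimal illustration, not RedPhone's code: the function names are hypothetical, payload type 0 (PCMU) is only a placeholder, and the SRTP encryption and ZRTP key agreement mentioned above are not shown.

```python
import struct

RTP_VERSION = 2
RTP_HEADER_LEN = 12  # fixed header only: no CSRC list, no extension


def build_rtp_packet(payload: bytes, seq: int, timestamp: int,
                     ssrc: int, payload_type: int = 0) -> bytes:
    """Prefix an audio frame with a minimal RTP fixed header (RFC 3550)."""
    first_byte = RTP_VERSION << 6        # V=2, P=0, X=0, CC=0
    second_byte = payload_type & 0x7F    # M=0, 7-bit payload type
    header = struct.pack(
        "!BBHII",                        # network byte order
        first_byte,
        second_byte,
        seq & 0xFFFF,                    # 16-bit sequence number
        timestamp & 0xFFFFFFFF,          # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,               # 32-bit stream identifier
    )
    return header + payload


def parse_rtp_header(packet: bytes) -> dict:
    """Split a packet back into its header fields and payload."""
    b0, b1, seq, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:RTP_HEADER_LEN])
    return {
        "version": b0 >> 6,
        "payload_type": b1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER_LEN:],
    }
```

The bytes returned by `build_rtp_packet` would then be handed to a UDP socket, for example with `sock.sendto(packet, (host, port))` on a `socket.SOCK_DGRAM` socket. The sequence number and timestamp are what let the receiver detect loss and reordering, which UDP alone does not report.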

jpollock · on Jan 31, 2013

The next problem will be when they get to having large numbers of servers in a region.

Then two problems will manifest themselves.

1) The client will take long enough to work through the list of servers that the wrong server is chosen simply because the connection is initiated first.

2) This design has all servers seeing all calls. The load represented by all the TCP connections hitting all of the servers will consume more and more of a server.

Neither is a problem you can solve by adding more servers. In fact, they are made worse by adding servers!

However, if you're charging per call, that's what they call a "good problem to have".

Nifty solution to the problem though.
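The scaling concern in point 2 can be put into rough numbers. The sketch below assumes, as the comment describes, that every call attempt opens one TCP connection to every server in the region, and compares that with a hypothetical setup where each attempt reaches a single server. The function names and the even-spread assumption are illustrative only.

```python
def fan_out_load(call_attempts: int, servers: int) -> dict:
    """Connection counts when every attempt connects to every server."""
    return {
        "per_server": call_attempts,       # each server sees every attempt
        "per_client": servers,             # each client opens one per server
        "total": call_attempts * servers,
    }


def single_target_load(call_attempts: int, servers: int) -> dict:
    """Connection counts when each attempt reaches exactly one server,
    assuming attempts are spread evenly (an idealisation)."""
    return {
        "per_server": call_attempts / servers,
        "per_client": 1,
        "total": call_attempts,
    }
```

Under these assumptions, doubling `servers` in `fan_out_load` leaves `per_server` unchanged while doubling `per_client` and `total`, which is why adding servers does not relieve either problem.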

moxie · on Jan 31, 2013

Agreed. We haven't run into this yet (and aren't charging at all, this is an OSS project), but I think that would be the point where you have to do one of two things:

1) Architect your DNS response to include a small sample of the total result set for the region, where the sample includes at least one switch from each micro-region.

2) Break down and introduce load balancers, so that there's one load balancer per micro-region, which fronts all of the switches within that micro-region.

Fortunately an individual switch doesn't really do much (just shovel packets around), so the number of simultaneous calls a switch can handle is high enough that redundancy is more about availability than load.
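Option 1 above could be sketched roughly as follows: pick one switch from every micro-region first, then fill the rest of the response with a random selection from whatever is left. This is only an illustration of the sampling idea, not how the project's DNS is implemented. The function name, the dictionary layout, and the assumption that each switch address appears in exactly one micro-region are all hypothetical.

```python
import random
from typing import Dict, List, Optional


def sample_switches(switches_by_micro_region: Dict[str, List[str]],
                    sample_size: int,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Return up to sample_size addresses with every micro-region represented."""
    rng = rng or random.Random()
    # Ignore micro-regions that currently have no switches.
    regions = [addrs for addrs in switches_by_micro_region.values() if addrs]
    if sample_size < len(regions):
        raise ValueError("sample_size is smaller than the number of micro-regions")

    # Guarantee coverage: one randomly chosen switch per micro-region.
    chosen = [rng.choice(addrs) for addrs in regions]
    chosen_set = set(chosen)

    # Fill the remaining slots from the switches not yet chosen.
    remaining = [a for addrs in regions for a in addrs if a not in chosen_set]
    extra_count = min(sample_size - len(chosen), len(remaining))
    result = chosen + rng.sample(remaining, extra_count)

    # Shuffle so the record order does not favour any micro-region.
    rng.shuffle(result)
    return result
```

For example, calling it with three micro-regions and `sample_size=4` returns four addresses, at least one from each micro-region, so the client's connection race still covers the whole region while the response stays small.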

JshWright · on Feb 1, 2013

I don't see how the license the software is released under is related to how much you charge to use the service...?

moxie · on Feb 1, 2013

Fair enough, I should have been more specific.

I was trying to imply that this is not a for-profit project, but you're correct, that's not what a software license communicates.

jpollock · on Jan 31, 2013

If you're not charging per call, how are you funding the servers?

moxie · on Feb 1, 2013

They're grant funded, as is some ongoing development work.

gz5 · on Feb 1, 2013

The server architecture is nice but my favorite part is the simple signaling protocol, as opposed to SIP or any other overly complex (for this use case) telephony signaling protocol. Nice work.

andrewcooke · on Jan 31, 2013

i'm confused (which is not too surprising as i am no expert on this). why can't they hole-punch through nats? then they would avoid the extra bounce-through-server latency and would hugely reduce the load on servers. i thought that was how skype worked, for example.

(i realise this doesn't stop their "fastest first" idea being useful - i got kind of sidetracked by the explanation of how servers are used near the start of the post).

[ah, ok. thanks for the explanation.]

moxie · on Jan 31, 2013

Your NAT traversal strategy is limited by the type of NAT being employed. Traditionally, the worst case scenario for NAT traversal is "symmetric NAT."

Symmetric NAT means that each tuple of (source_ip, source_port, destination_ip, destination_port) gets its own unique (external_ip, external_port) tuple. This is bad because it means that STUN is ineffective: your STUN server will see a different external port than what your actual destination would see.

Mobile data networks are actually worse than symmetric NAT. Not only will your external port change based on your target, but your external IP likely will as well.

This makes NAT traversal, AFAIK, basically impossible.
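The STUN problem described above can be shown with a toy model. The sketch below contrasts a symmetric NAT, which keys its mapping on the full (source_ip, source_port, destination_ip, destination_port) tuple, with a NAT whose mapping depends only on the source. The class names, the sequential port allocation, and the example addresses are assumptions for illustration; real NATs allocate ports in their own ways, and the mobile-network case where the external IP also changes is not modelled.

```python
import itertools


class SymmetricNat:
    """Allocates a new external port for every distinct 4-tuple."""

    def __init__(self, external_ip: str, first_port: int = 40000):
        self.external_ip = external_ip
        self._ports = itertools.count(first_port)
        self._mappings = {}

    def translate(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self._mappings:
            self._mappings[key] = (self.external_ip, next(self._ports))
        return self._mappings[key]


class EndpointIndependentNat(SymmetricNat):
    """Reuses one external port per internal (ip, port), whatever the target."""

    def translate(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port)
        if key not in self._mappings:
            self._mappings[key] = (self.external_ip, next(self._ports))
        return self._mappings[key]


def stun_prediction_holds(nat) -> bool:
    """True if the address a STUN server observes is the one a peer would see."""
    client = ("10.0.0.5", 5004)
    seen_by_stun = nat.translate(*client, "198.51.100.1", 3478)
    seen_by_peer = nat.translate(*client, "192.0.2.20", 5004)
    return seen_by_stun == seen_by_peer
```

In this model `stun_prediction_holds` returns `False` for `SymmetricNat` and `True` for `EndpointIndependentNat`: behind the symmetric NAT, the address learned from the STUN server is not the address the peer needs to send to, so hole punching based on it fails.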

vy8vWJlco · on Feb 1, 2013

And there's no real reason to do it either, given the availability of global IPv6 addresses and any of the free tunnel brokers/Teredo/etc., other than the simple lack of adoption/momentum. Most ISPs will need a push, but from an infrastructure perspective it just boils down to a new gateway router or a firmware upgrade. It doesn't even require replacing the end-user equipment if ISPs simply set up a tunnel server for their own customers as a transitional measure.

mtrimpe · on Jan 31, 2013

So it's basically a very pragmatic tradeoff for getting 80% of the value of geo-based DNS with only 20% of the effort.

Nice work ...