Creating a low-latency calling network (whispersystems.org)
72 points by taf2 on July 1, 2015 | 20 comments



Very smart to avoid Asterisk, FreeSWITCH, and SIP in general for a security-oriented program. SIP is moronically complex. As an illustration, the spec non-ironically talks about using SIP to set up chess games. Like, that's how super-duper-abstract the SIP authors thought they were being. Of course, it's a bit strange to then have to explain why a mandatory header is "Call-ID", but hey, this is the IETF...
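For reference, here's roughly what a minimal INVITE looks like (paraphrasing the example in RFC 3261; the hosts and values are the RFC's made-up ones):

    INVITE sip:bob@biloxi.com SIP/2.0
    Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
    Max-Forwards: 70
    To: Bob <sip:bob@biloxi.com>
    From: Alice <sip:alice@atlanta.com>;tag=1928301774
    Call-ID: a84b4c76e66710@pc33.atlanta.com
    CSeq: 314159 INVITE
    Contact: <sip:alice@pc33.atlanta.com>
    Content-Type: application/sdp
    Content-Length: 142

All of that, plus an SDP body, just to say "call me".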

Also, all the major SIP platforms are thousands of lines of C or C++ just to parse this fucked-up format. One particularly popular project has almost 1M lines of C in its repo, and not a single CVE issued. Anyone taking bets on whether it's more likely that there has never been an issue, or that we just haven't noticed? I mean, with switch statements spanning 1000+ lines, we can be sure it's top quality everywhere, right? One popular system allowed you to put shell commands in SIP headers, and it would just go and execute them, due to ... creative[1] ... escaping and evaluation rules.

Not to mention that on every SIP network I've tested, even simple, trivial things like IP-based auth don't actually work. It's mind-boggling. It's just that there's even lower-hanging fruit, so no one bothers with remotely fun attacks.

1: Insane and terribly thought-out.


Do users attempting to call each other have to be on the same switch? Or do you route data explicitly from one switch to another?


I have the same question... If clients choose the closest entry point to the overlay network, are the entry points fully meshed internally?

It's an interesting problem any time you're routing in an overlay network. It's probably tricky enough just selecting the single optimal connection point for two clients so as to minimize the end-to-end latency between them.
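A sketch of what that selection could look like, assuming you could somehow measure RTTs from each client to each candidate relay (the relay names and numbers here are hypothetical):

    def pick_relay(rtt_a, rtt_b):
        """rtt_a / rtt_b: dicts mapping relay name -> measured RTT in ms,
        one dict per client."""
        candidates = rtt_a.keys() & rtt_b.keys()
        # end-to-end latency through a relay is roughly the sum of the two
        # one-way legs, i.e. (rtt_a[r] + rtt_b[r]) / 2, so minimize the sum
        return min(candidates, key=lambda r: rtt_a[r] + rtt_b[r])

    print(pick_relay({"us-east": 40, "eu-west": 120},
                     {"us-east": 90, "eu-west": 60}))   # -> us-east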


The load balancing they describe in the article is only used for inbound requests, similar to a traditional HTTP LB setup. If I understand it correctly, they don't implement the outgoing message delivery to the called user themselves, but instead use SMS or a Google service (GCM) to deliver push notifications.

So the problem of "how do I route request X to user Y" has to be solved by either Google or the provider that delivers the short message. Actually, with their current setup they don't even need to know where a user is located geographically, since they simply choose a proxy server by asking Route53 for a list of close servers (close to the calling user, I presume) and then use their connect-then-disconnect hack to choose the "best" server from that list.
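The connect-then-disconnect trick could be approximated like this; a rough sketch, assuming plain TCP and that handshake time is a decent latency proxy (the hosts and port are made up):

    import socket, time
    from concurrent.futures import ThreadPoolExecutor

    def fastest_server(hosts, port=443, timeout=3.0):
        """Connect to every candidate, time the TCP handshake,
        hang up, and keep whichever answered fastest."""
        def probe(host):
            start = time.monotonic()
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    return time.monotonic() - start, host
            except OSError:
                return float("inf"), host   # unreachable -> never wins
        with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
            return min(pool.map(probe, hosts))[1]

In reality you'd race the connects and take the first winner rather than waiting for all of them, but the idea is the same.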

    > Instead we decided to write our own minimal signaling protocol 
    > and use push notifications (at first SMS, then eventually GCM when 
    > it was introduced) in order to initiate calls.
So I imagine you could just stick the address of the switch that was chosen by the caller into the message that gets delivered to the called device when a call is initiated.
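Something like this, maybe (a completely made-up payload shape, not Signal's actual wire format):

    {
      "type": "incoming_call",
      "caller": "+15551234567",
      "relay": "relay3.us-east.example.org:3478",
      "session": "9f2c1ab0"
    }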

Obviously this doesn't solve the hard part of "how not to drop a call when your switch goes down mid-call" though.


The signaling is the first part of the article. The challenge there is avoiding having to run your own persistent connection or polling mechanism. (The solution is to use someone else's persistent connection or polling!)

In the second part they're describing TURN, a third-party packet relay you bounce your packets through when you can't route directly between two endpoints (usually because of NAT). As in: the call has been signaled, the keys exchanged, and now I just need to get packets of audio between Alice and Bob every 20ms or so, and how nice would it be if they could do that between themselves and my servers could stay out of it?!
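The relay itself is conceptually tiny. A toy version of the forwarding idea in Python, with none of TURN's allocations, auth, or channel mechanics:

    import socket

    # Toy two-party UDP relay: the first two distinct source addresses
    # to send us a packet become the call's endpoints; everything from
    # one is forwarded to the other.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 3478))
    peers = []
    while True:
        data, addr = sock.recvfrom(2048)
        if addr not in peers and len(peers) < 2:
            peers.append(addr)
        if len(peers) == 2 and addr in peers:
            other = peers[1] if addr == peers[0] else peers[0]
            sock.sendto(data, other)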

Broken NAT (the need for TURN) is probably the thing that frustrates me most about the Internet. If any two endpoints could always simply and easily connect through direct routing, it would make a lot of applications' lives much easier, and make possible many applications that otherwise end up centralized.

This is just one case in point. Just like with Skype, the IP isn't really in the actual "product" but in the hacks it takes to make the product work in the reality of pathologically NAT'd networks.
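For anyone unfamiliar, the classic workaround is UDP hole punching: both sides learn the peer's public ip:port from a rendezvous server, then fire packets at each other so each NAT sees outbound traffic first and admits the inbound replies. A sketch of the client side (assumes the rendezvous step already happened; this fails on the symmetric NATs the article calls pathological):

    import socket

    def punch(local_port, peer_addr, tries=10):
        """Repeatedly send to the peer's public address until we hear back."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", local_port))
        sock.settimeout(1.0)
        for _ in range(tries):
            sock.sendto(b"punch", peer_addr)   # opens a mapping in our NAT
            try:
                data, addr = sock.recvfrom(64)
                if addr == peer_addr:
                    return sock                # hole is open, reuse socket
            except socket.timeout:
                continue
        raise TimeoutError("hole punching failed (symmetric NAT?)")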


Yes, this seems like a bunch of work to keep up and running, and I agree that most of the meat of their solution is actually in Google's and Amazon's systems running the GeoDNS/push stuff. Hopefully IPv6 will fix it all™.

However, until then, GCM [1] seems like a really good workaround. And I believe it is actually free of charge and available for both iOS and Android.

[1] https://developers.google.com/cloud-messaging/


"Just centralize it" :-). This is exactly why I think IPv6 won't fix it: arbitrary inbound traffic will still be firewalled upstream, either as a ToS/AUP matter or as a "feature", because too much money is at stake for the centralized services.

I think solid decentralized service discovery and direct routing would be as pivotal as the blockchain itself. One actually helps solve the other, e.g. using an altcoin for anti-spam. But direct routing (a better NAT hole punch) apparently isn't possible without service-provider cooperation.

Tor is probably the biggest reliable semi-centralized overlay network; I'm not sure there are any better options for punching through NAT that don't involve running your own public relay or trusting a third party. But I assume Tor is much too slow to support realtime voice between an arbitrary client and a hidden service?

Bitcoin miners faced a similar problem: needing a higher-speed relay network to shuttle large blocks faster than the existing P2P relay network could. In that case the plan was a group of P2P nodes configured to accept a much larger number of peers, combined with dedicated fast paths between themselves, each located in a high-speed hub. I'm not sure if it was ever deployed.


Comcast Residential only filters 25, 135, 139, 445 and, for some reason, 1080 (the SOCKS proxy port).

If Comcast isn't applying an aggressive inbound firewall, pretty much no one will.



Regarding the TURN server: since they've already defined a custom protocol, they could try a hack where they first connect via TURN, then, while the call is in progress, test whether direct peer-to-peer media would work. If it does, transparently switch to that.
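A rough sketch of that upgrade path (this is more or less what ICE standardizes; next_frame is a hypothetical callback returning ~20ms of audio, and the probe format is invented):

    import socket, time

    def media_loop(sock, relay_addr, peer_addr, next_frame):
        """Ship media via the relay while probing the direct path on
        the side; cut over to peer-to-peer once a probe is answered."""
        sock.setblocking(False)
        direct = False
        last_probe = 0.0
        while True:
            sock.sendto(next_frame(), peer_addr if direct else relay_addr)
            now = time.monotonic()
            if not direct and now - last_probe > 2.0:
                sock.sendto(b"probe", peer_addr)         # test the p2p path
                last_probe = now
            try:
                while True:                              # drain incoming
                    data, addr = sock.recvfrom(2048)
                    if data == b"probe" and addr == peer_addr:
                        sock.sendto(b"probe-ack", peer_addr)
                    elif data == b"probe-ack" and addr == peer_addr:
                        direct = True                    # direct path works
            except BlockingIOError:
                pass
            time.sleep(0.02)                             # ~20ms cadence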

I do like the idea of load balancing on the client; that's pretty cool. I did something similar on a local LAN.

Apple uses Multipath TCP for cases where the connection might drop out; I wonder if that would be an option here as well.


> Apple uses Multipath TCP for cases where the connection might drop out; I wonder if that would be an option here as well.

As far as I've seen, that's for Siri only. I don't think any of the other mobile platforms allow MPTCP for apps either. (I would love to be wrong, though.)


Skype used to do that... until they joined PRISM.


> However, the NATs generally employed by mobile data networks are pathological, such that it’s virtually impossible for two clients on a mobile data network to establish a direct connection with each-other.

This problem, along with the entire resulting "solution¹" and the engineering complexity it drives in the article, is completely solved by IPv6, because it renders NAT obsolete.

¹not that they had a choice.


While I agree that IPv6 makes NAT obsolete, I unfortunately don't see NAT going away even after everyone is 100% on IPv6. Too many people and companies rely on NAT for "security" for it to die completely.


At worst, I see NAT being replaced with default-reject-unless-associated-with-an-existing-internally-initiated-connection border firewalls everywhere a sysadmin with half a clue works.
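i.e. something like this ip6tables sketch (eth0 as the WAN interface is an assumption):

    # default-deny at the border, IPv6 edition
    ip6tables -P FORWARD DROP
    # let replies to internally-initiated connections back in
    ip6tables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # let everything that didn't arrive on the WAN interface through
    ip6tables -A FORWARD ! -i eth0 -j ACCEPT
    # (in practice you'd also want to allow ICMPv6, or PMTU discovery breaks)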


And that's what I would do (basic stateful inspection), but there are too many organizations (especially SMBs with little or no IT staff on board) who will simply use NAT with IPv6 to get that same end result.


We'd probably be better off giving those people a default-deny firewall and calling it NAT to pacify them. But I doubt that will happen.


I could maybe see Cisco doing such a thing.


At least until we deplete all the IPv6 addresses. With some of the crazy applications (ways of doing things) being developed to service Kubernetes clusters on AWS, I can see that happening faster than IPv4 depletion did.


Given that there are enough addresses to assign around 1,000,000 of them to every bacterial cell on the planet, you can safely expect it to last longer than IPv4. It's not a matter of "oh, let's double it, that should be big enough": the IPv6 space is around 29 orders of magnitude larger, a factor of 2^96. Computers and our uses for them would have to change enormously for us to exhaust that. Not saying it won't ever happen, but it'll be a while.
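The arithmetic, for anyone checking:

    # IPv4 vs IPv6 address-space sizes
    print(2**32)            # IPv4: 4294967296 (~4.3e9)
    print(2**128)           # IPv6: ~3.4e38
    print(2**128 // 2**32)  # ratio: 2**96 ~= 7.9e28, about 29 orders of magnitude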

https://en.m.wikipedia.org/wiki/Orders_of_magnitude_(numbers... (not to be a smartass about what orders of magnitude means, just because the article has some really cool examples)



