
Creating a low-latency calling network - taf2
https://whispersystems.org/blog/low-latency-switching/
======
MichaelGG
Very smart to avoid Asterisk, FreeSWITCH and SIP in general for a security
oriented program. SIP is moronically complex. As an illustration, the spec
talks about using SIP to set up _chess games_, non-ironically. Like, that's
how super-duper-abstract the SIP authors thought they were being. Of course,
it's a bit strange to then explain why a mandatory header is "Call-ID", but
hey, this is the IETF...

Also, all the major SIP platforms are thousands of lines of C or C++ just to
parse this fucked up format. One particular popular project has almost 1M
lines of C in its repo. And not a single CVE issued. Anyone taking bets on
whether it's more likely that there's never been an issue, or that we just
don't notice?
I mean, with switch statements spanning 1000+ lines, we can be sure it's top
quality everywhere, right? One popular system allowed you to put shell
commands in SIP headers, and it'd just go and execute them, due to ...
creative[1] ... escaping and evaluation rules.

Not to mention that on every SIP network I've tested, even simple, trivial
things like IP-based auth _don't actually work_. It's mind-boggling. It's
just that there's even lower-hanging fruit, so no one bothers with remotely
fun attacks.

1: Insane and terribly thought-out.

------
mikebo
Do users attempting to call each other have to be on the same switch? Or do
you route data explicitly from one switch to another?

~~~
zaroth
I have the same question... If clients choose the closest entry point to the
overlay network, are they fully meshed internally?

It's an interesting problem any time you are routing in an overlay network.
It's probably tricky enough selecting a single optimal connection point for
two clients to connect to minimize the end-to-end latency between the clients.

~~~
paulasmuth
The load balancing they describe in their article is only used for inbound
requests, similar to a traditional HTTP LB setup. If I understand it correctly,
they don't implement the outgoing message delivery to the called user
themselves, but instead use SMS or a Google service to deliver push
notifications.

So the problem of "how do I route request X to user Y" has to be solved by
either Google or the provider that delivers the short message. Actually,
with their current setup they don't even need to know where a user is located
geographically, since they simply choose a proxy server by asking Route53 for
a list of close servers (I presume close to the calling user) and then use
their connect-then-disconnect hack to choose the "best" server from that list.
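A minimal sketch of that client-side selection, assuming the candidate list
has already come back from a GeoDNS lookup (Route53 in the article) as
(host, port) pairs. The function name, the timeout and any hostnames you feed
it are hypothetical, and unlike a real client this version closes every
connection, including the winner, because it only uses the TCP handshake as a
latency probe:

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def pick_fastest_server(candidates, timeout=2.0):
    """Race a TCP handshake to every (host, port) candidate and return the
    first one that completes, or None if every attempt fails."""

    def try_connect(addr):
        # Only the handshake matters here, so close the socket immediately.
        with socket.create_connection(addr, timeout=timeout):
            pass
        return addr

    pool = ThreadPoolExecutor(max_workers=max(1, len(candidates)))
    try:
        futures = [pool.submit(try_connect, addr) for addr in candidates]
        for fut in as_completed(futures):
            try:
                return fut.result()
            except OSError:
                continue  # refused, unreachable or timed out: next finisher
        return None
    finally:
        # Don't block on the slower attempts; they finish in the background.
        pool.shutdown(wait=False)
```

Something like `pick_fastest_server([("relay-1.example.net", 443),
("relay-2.example.net", 443)])` would return whichever relay answered first.
Racing handshakes measures the actual path from this client, rather than
relying on DNS's guess about which server is "close".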

> Instead we decided to write our own minimal signaling protocol
> and use push notifications (at first SMS, then eventually GCM when
> it was introduced) in order to initiate calls.

So I imagine you could just stick the address of the switch that was chosen by
the caller into the message that gets delivered to the called device when a
call is initiated.
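A sketch of what that push payload might carry, assuming a JSON body. Every
field name here is hypothetical (the article doesn't describe its message
format), and since push services cap payload size, the message only points the
callee at the switch instead of carrying anything bulky:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallOffer:
    call_id: str      # hypothetical: random ID generated by the caller
    caller: str       # hypothetical: caller's number or account ID
    switch_host: str  # the relay the caller already picked
    switch_port: int

    def to_push_payload(self) -> str:
        # Compact, deterministic JSON to stay well under push size limits.
        return json.dumps(asdict(self), separators=(",", ":"), sort_keys=True)

    @classmethod
    def from_push_payload(cls, payload: str) -> "CallOffer":
        data = json.loads(payload)
        # Missing fields raise KeyError; wrong types are coerced or rejected.
        return cls(
            call_id=str(data["call_id"]),
            caller=str(data["caller"]),
            switch_host=str(data["switch_host"]),
            switch_port=int(data["switch_port"]),
        )
```

On receipt, the callee's app would parse the payload and connect to
`switch_host:switch_port`, reusing the caller's switch choice without any extra
lookup.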

Obviously this doesn't solve the hard part of "how not to drop a call when
your switch goes down mid-call" though.

~~~
zaroth
The signaling is the first part of the article. The challenge there is
avoiding your own persisted connection or polling mechanism. (The solution is
to use someone else's persisted connection or polling!)

In the second part, they're describing TURN, which is a third-party packet relay
that you bounce your packets through when you can't directly route between
two endpoints (usually because of NAT). As in, the call has been signaled, the
keys exchanged, and now I just need to get packets of audio between Alice and
Bob every 20ms or so and how nice would it be if they could do that between
themselves and my servers could stay out of it?!
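The core of such a relay is just forwarding datagrams between the two parties
of a call. This is a toy sketch, not TURN itself: real TURN (RFC 5766) adds
allocations, permissions, authentication and channel framing. The function
name and the two-peer model are assumptions for illustration:

```python
import socket


def relay_once(relay_sock, peer_a, peer_b, bufsize=2048):
    """Forward one datagram between two known peers.

    Returns the address the datagram was sent to, or None if it came from an
    address that isn't part of this call (such packets are dropped)."""
    data, src = relay_sock.recvfrom(bufsize)
    if src == peer_a:
        dst = peer_b
    elif src == peer_b:
        dst = peer_a
    else:
        return None  # loosely analogous to TURN's permission check
    relay_sock.sendto(data, dst)
    return dst
```

With 20 ms audio frames, a call pushes about 50 packets per second in each
direction through a loop around `relay_once`, so every extra hop through the
relay shows up directly in end-to-end latency.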

Broken NAT (the need for TURN) is probably the thing that frustrates me most
about the Internet. If any two endpoints could always simply and easily
connect through direct routing, it would make a lot of applications' lives much
easier, and make many applications possible that otherwise end up centralized.

This is just one case in point. Just like with Skype, the IP isn't even really
in the actual "product", but in the hacks it takes to make the product work in
the reality of pathologically NAT'd networks.
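A toy model of why those NATs defeat hole punching, using RFC 4787 terms. With
endpoint-independent mapping, the public port a client learns from a STUN-like
server is the same one the NAT uses toward the peer, so the client can
advertise it. With address-and-port-dependent ("symmetric") mapping, each new
destination gets a fresh port, so the advertised address is wrong. Treating
mobile carrier NATs as the symmetric case is an assumption here, and the class
and addresses are hypothetical:

```python
import itertools


class ToyNat:
    """Hands out public (ip, port) mappings for outbound flows."""

    def __init__(self, public_ip, symmetric, first_port=40000):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._next_port = itertools.count(first_port)
        self._mappings = {}

    def outbound(self, internal, destination):
        # Endpoint-independent mapping keys only on the internal address.
        # Symmetric mapping also keys on the destination, so every new
        # destination gets a fresh public port.
        key = (internal, destination) if self.symmetric else internal
        if key not in self._mappings:
            self._mappings[key] = (self.public_ip, next(self._next_port))
        return self._mappings[key]


def advertised_address_works(nat, internal, stun_server, peer):
    """Learn our public address via a STUN-like server, then check whether
    the NAT actually uses that address when we send to the peer."""
    advertised = nat.outbound(internal, stun_server)
    actual = nat.outbound(internal, peer)
    return advertised == actual
```

This ignores filtering behavior, which real NATs also vary on; mapping alone is
enough to show why the peer can't be told where to send.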

~~~
paulasmuth
Yes, this seems like a bunch of work to keep up and running, and I agree that
most of the meat of their solution is actually in Google's or Amazon's systems
running the GeoDNS/push stuff. Hopefully IPv6 will fix it all ™.

However, until then, GCM [1] seems like a really good workaround. And I
believe it is actually free of charge and available for both iOS and Android.

[1] [https://developers.google.com/cloud-messaging/](https://developers.google.com/cloud-messaging/)

~~~
zaroth
Just centralize it :-). That's exactly why I think IPv6 won't fix it
(arbitrary inbound will still be firewalled by upstream either as a ToS/AUP or
as a "feature") because too much money is at stake for the centralized
services.

I think solid decentralized service discovery and direct routing would be as
pivotal as the blockchain itself. One actually helps solve the other, e.g.,
using an altcoin for anti-spam. But the direct routing (a better NAT hole
punch) apparently is not possible without service provider cooperation.

Tor is probably the biggest reliable semi-centralized overlay network. I'm not
sure if there are any better options for punching through NAT that don't
involve running your own public relay or trusting a 3rd party. But I assume
Tor is much too slow to support realtime voice between an arbitrary client and
hidden service?

Bitcoin miners faced a similar problem of needing to build a higher speed
relay network to shuttle large blocks faster than the existing P2P relay
network. In that case, there was a group of P2P nodes configured to allow a
much larger number of peers, combined with dedicated fast paths between
themselves, each located in high-speed hubs. I'm not sure if it was ever
deployed.

~~~
simoncion
Comcast Residential only filters ports 25, 135, 139, 445 and (for some reason)
1080.

If Comcast isn't applying an aggressive inbound firewall, pretty much no one
will.

------
rdtsc
Regarding the TURN server: since they've already defined a custom protocol,
they could try a hack where they first connect via TURN and then, while the
call is in progress, test whether peer-to-peer direct media would work. If it
does, transparently switch to that.
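A sketch of that relay-first, upgrade-later idea as a small state machine. ICE
(RFC 8445) standardizes connectivity checks along these lines; the class below
is a hypothetical simplification that leaves the actual sending of probes and
media to the caller, and takes an injectable clock so it can be tested:

```python
import time


class PathSelector:
    """Send media via the relay until a direct probe is acknowledged in time."""

    def __init__(self, relay_addr, direct_addr, ack_timeout=1.0, clock=time.monotonic):
        self.relay_addr = relay_addr
        self.direct_addr = direct_addr
        self.ack_timeout = ack_timeout
        self._clock = clock
        self._probe_sent_at = None
        # Always start on the relay so audio flows immediately.
        self.active = relay_addr

    def probe_sent(self):
        # Call this right after sending a probe packet to direct_addr.
        self._probe_sent_at = self._clock()

    def probe_acked(self):
        # Call this when the peer's reply to the probe arrives.
        sent, self._probe_sent_at = self._probe_sent_at, None
        if sent is not None and self._clock() - sent <= self.ack_timeout:
            self.active = self.direct_addr  # switch transparently

    def direct_path_failed(self):
        # E.g. the NAT mapping changed mid-call: fall back to the relay.
        self.active = self.relay_addr
```

Starting on the relay means audio flows from the first packet; the direct path
is purely an optimization, so a failed probe costs nothing but the probe.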

I do like the idea of load balancing on the client, that is pretty cool. I did
something similar on the local LAN.

Apple uses TCP multipath for cases where the connection might drop out; I
wonder if that would be an option here as well.

~~~
toast0
> Apple uses TCP multipath for cases where connection might drop out wonder if
> that would be an option here as well.

As far as I've seen, that's for Siri only. I don't think any of the other
mobile platforms allow MPTCP for apps, either. (I would love to be wrong,
though.)

------
deathanatos
> However, the NATs generally employed by mobile data networks are
> pathological, such that it’s virtually impossible for two clients on a
> mobile data network to establish a direct connection with each-other.

This problem, and the entirety of the resulting "solution¹" and the
engineering complexity that stems from it in the article, is completely solved
by IPv6, because it renders NAT obsolete.

¹not that they had a choice.

~~~
jlgaddis
While I agree that IPv6 makes NAT obsolete, I, unfortunately, don't see NAT
going away even after everyone is 100% on IPv6. Too many people/companies rely
on NAT for "security" for it to completely die.

~~~
simoncion
At worst, I see NAT being replaced with default-reject-unless-associated-with-
an-existing-internally-initiated-connection border firewalls everywhere a
sysadmin with half a clue works.
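A minimal sketch of such a connection-tracking firewall, assuming flows are
keyed by protocol plus the inside and outside (address, port) pairs, both
recorded from the inside host's point of view. The names and the 30-second
idle timeout are hypothetical, and real firewalls also track TCP state. Note
that no addresses are rewritten, which is the difference from NAT:

```python
from typing import NamedTuple


class Flow(NamedTuple):
    proto: str      # "tcp" or "udp"
    inside: tuple   # (address, port) of the host behind the firewall
    outside: tuple  # (address, port) on the Internet


class StatefulFirewall:
    """Default-reject inbound unless it matches a recent inside-initiated flow."""

    def __init__(self, idle_timeout=30.0):
        self.idle_timeout = idle_timeout
        self._last_seen = {}  # Flow -> time of last packet

    def outbound(self, flow, now):
        # Inside hosts may start anything; remember the flow.
        self._last_seen[flow] = now
        return True

    def inbound(self, flow, now):
        last = self._last_seen.get(flow)
        if last is None or now - last > self.idle_timeout:
            self._last_seen.pop(flow, None)  # expire stale state
            return False
        self._last_seen[flow] = now
        return True
```

Unsolicited inbound gets dropped just as it would behind NAT, but because
addresses aren't rewritten, two peers that each send first can in principle
reach each other without any port prediction.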

~~~
jlgaddis
And that's what I would do (basic stateful inspection), but there are too many
organizations (especially SMBs with little/no I.T. staff onboard) who will
simply use NAT with IPv6 to get that same end result.

~~~
wmf
We'd probably be better off giving those people a default-deny firewall and
_calling it_ NAT to pacify them. But I doubt that will happen.

~~~
simoncion
I could _maybe_ see Cisco doing such a thing.

