I see ZRTP, but I didn't see anything on the site about signaling encryption. As you may or may not know, the content is only one component of a secure communication. There needs to be signaling encryption as well. The signaling encryption is harder than the media encryption, because the media encryption only works if the signaling encryption was successful. Signaling across a network you don't trust is really the hard part, and it's a problem for all of these apps.
I don't know if Moxie implemented the certificate pinning stuff in RedPhone, but that's the sort of crypto you need to have foolproof call security.
Calls are vulnerable to MITM attacks because you have to trust the network you're riding over to some extent. RedPhone has intermediating crypto for the call setup that's nifty, and I'd be cautious about using any "secure" calling system that didn't provide setup protection.
Again, I'm not saying Ostel doesn't have these things, I just couldn't find them.
Edit: Also wtf is FreeSWITCH doing in there??
Regarding signalling encryption: the system uses SIP TLS with a certificate signed by a common CA. Check it out and walk up the chain manually if you're concerned. Certificate pinning is interesting, but in this case the signalling isn't really what we care about from a priority perspective. We care about the /content/ of the call, which in SIP land is a completely different protocol with a nice peer-to-peer key agreement system which is currently unbreakable according to public information. There's some classified NSA doc that allegedly says it could be done, read all about it...oh wait, you can't because it's classified. :(
So the reason there isn't a bunch of copy discussing the importance of the key agreement in the signalling layer is that I don't think it's very important. Now, one could middle the signalling in a clever way, and that could possibly result in a dropped call, which could be classified as a DoS vuln. But that's unlikely and there wouldn't be any content leaked. I would be interested in a proof of this attack vector.
Finally, my favorite part about middled signalling is that even if you do it right and a whole forged SIP dialog is built up and Mallory answers on the other end when Bob thinks he's talking to Alice, you still get to hear Mallory's voice! So unless Mallory is pro at doing voice impressions, like that dude on SNL can do Jay-Z and Dr. Dre real good, it's gonna be obvs that something is not right.
Really finally, I'm using FreeSWITCH to provide diagnostic services like an echo test. There are some crazy ideas about offering an IVR that does encrypted voicemail but I don't know much about that.
I don't know why you say you'd hear Mallory's voice; that isn't implied at all. I don't need you to speak to someone else in order to MITM the signed certificate you hold so sacred.
I'm not trying to be offensive, but I think ZRTP or SRTP don't matter one lick if the cert gets compromised. Without pinning, how do you know your certificate is actually the one you were expecting?
If the cert gets popped, I don't see how the call could possibly be secure. The entire key exchange is splayed open for the operator to see. Yes, media encryption is unbreakable, but what would be the point of breaking the encryption if you have the key?
Am I missing something?
For more information on what I'm saying: http://blog.cryptographyengineering.com/2012/11/lets-talk-ab...
Edit: In reading Moxie's input in the blog post above, I may be overestimating the vulnerability of the call setup. I still contend that you can't really trust certs, and the only semblance of trust is pinning, but I digress.
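For anyone wondering what pinning boils down to, here's a minimal sketch. It assumes the client ships with the SHA-256 fingerprint of the one certificate it expects and refuses anything else, no matter which CA signed it. The names fingerprint, matches_pin and PINNED are made up for illustration, and the byte strings are placeholders standing in for real DER-encoded certificates.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """Return the hex SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_fingerprint: str) -> bool:
    """True only if the presented certificate hashes to the pinned value."""
    return hmac.compare_digest(fingerprint(der_cert), pinned_fingerprint.lower())


# Placeholder bytes standing in for real DER certificates (hypothetical data).
expected_cert = b"placeholder: the server certificate the client was built with"
forged_cert = b"placeholder: a different certificate signed by some other CA"

# In a real client this value would be hard-coded at build time.
PINNED = fingerprint(expected_cert)

print(matches_pin(expected_cert, PINNED))  # True
print(matches_pin(forged_cert, PINNED))    # False
```

In a real client the DER bytes would come from the live TLS connection, for example ssl.SSLSocket.getpeercert(binary_form=True) in Python, and the check would run before any signalling is sent.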
Here's the trick: SIP has nothing to do with sound or video. It "establishes sessions". The typical SIP dialog flow has a hierarchy of many other protocols. In order, they read like this
That dude in the middle is the Session Description Protocol. This describes what will happen in the future regarding the media stream. When the clients agree on this (codecs, IP addresses, ports, etc), a full-duplex session is established between the two peers. The preceding TLS stuff, which depended on a CA is now over. We are ready for round two.
This is what you missed. We haven't even begun sending data over our media socket yet and the security stuff that depends on a central authority is finished.
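To make the SDP part concrete, here's a rough sketch of pulling the agreed address, port and codecs out of an SDP body. The offer below is invented for illustration (192.0.2.10 is a documentation address), parse_sdp is a hypothetical helper rather than anything from the Ostel stack, and it only handles the handful of line types shown.

```python
def parse_sdp(body: str) -> dict:
    """Pull the connection address, media port, and codec names out of an SDP body."""
    info = {"address": None, "media": None, "port": None, "codecs": []}
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            info["address"] = line[2:].split()[2]
        elif line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            info["media"], info["port"] = fields[0], int(fields[1])
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            info["codecs"].append(line.split(" ", 1)[1].split("/")[0])
    return info


# Invented example offer, not captured from a real call.
EXAMPLE_OFFER = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

print(parse_sdp(EXAMPLE_OFFER))
# {'address': '192.0.2.10', 'media': 'audio', 'port': 49170, 'codecs': ['PCMU', 'PCMA']}
```

Nothing in there is key material. It only says where the media will flow and in what format.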
Now that we can speak to each other, let's do that! But wait! My client has an alphanumeric string on the screen. This is called a Short Authentication String. When you read the SAS to me and I read mine to you, we click "OK" and now our conversation is encrypted. Because we agreed on a key with our words, not our fingers.
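Here's a simplified sketch of why reading the SAS aloud catches a middled key agreement. This is not the real ZRTP key derivation: it assumes each side already holds the secret from its own key agreement, hashes it with a made-up "SAS" label, and renders the leftmost 20 bits as four characters from the z-base-32 alphabet. If Mallory sits in the middle, Alice and Bob each share a different secret with her, so the strings they read out will almost certainly differ.

```python
import hashlib

# z-base-32 alphabet, 32 characters, one per 5-bit value.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def short_auth_string(shared_secret: bytes) -> str:
    """Render the leftmost 20 bits of a hash of the secret as 4 characters."""
    # The b"SAS" label is an assumption for this sketch, not the ZRTP KDF.
    digest = hashlib.sha256(b"SAS" + shared_secret).digest()
    bits = int.from_bytes(digest[:4], "big") >> 12  # keep the leftmost 20 bits
    return "".join(ZBASE32[(bits >> shift) & 0x1F] for shift in (15, 10, 5, 0))


# Honest call: both ends derived the same secret (placeholder value).
honest_secret = b"secret agreed directly between alice and bob"
alice_sas = short_auth_string(honest_secret)
bob_sas = short_auth_string(honest_secret)
print(alice_sas == bob_sas)  # True

# Middled call: each end unknowingly agreed on a key with Mallory instead.
alice_side = short_auth_string(b"secret between alice and mallory")
bob_side = short_auth_string(b"secret between mallory and bob")
print(alice_side, bob_side)
```

With only 20 bits there's roughly a one in a million chance that two unrelated secrets give the same string, which is why the comparison is done once per call by voice.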
If you would like to try this IRL, you can call firstname.lastname@example.org. I'm online right now.
1) ostel.me :: Like many of the Guardian projects, this is an experiment in combining existing OSS libraries to create an app. In this case, CSipSimple, pjsip, and ZORG. It's basically a standard SIP/RTP VoIP client with an Android UI. That means it needs to maintain a persistent connection to a SIP server at all times, which doesn't necessarily work well with the Android process model, could drain your device's battery life, and could be flaky in scenarios where you're going in and out of coverage or switching from data to wifi. However, it is ideal for the hacker crowd that wants ultimate control, maximum configurability, and enjoys occasionally dropping to the command line. You're correct that it's likely not possible to do things like certificate pinning in this context.
2) RedPhone :: While it remains to be seen whether we were correct or not, our development philosophy with RedPhone was to eschew the VoIP libraries and paradigms that were originally developed for the desktop environment in favor of OSS/Free code written from scratch for the mobile environment. Our belief is that the different network and platform characteristics of mobile devices require a mobile-oriented solution. This means that we use a lightweight mobile-oriented signaling protocol instead of SIP, push notifications instead of maintaining a persistent connection at all times, techniques for establishing low-latency routing for global calling (https://whispersystems.org/blog/low-latency-switching), a jitter buffer optimized for mobile data networks (https://whispersystems.org/blog/client-side-audio-quality), and your normal phone number for addressing rather than a new identifier. It also means that, yes, we can do things like certificate pinning for the signaling channel. This is all obviously oriented towards the average smartphone user, though, so it's sometimes less appealing to the hacker crowd who want to use a SIP identifier or connect through their own SIP server.
3) Silent Circle :: My sense is that Silent Circle is trying to do both. Their stack seems to be based on traditional VoIP protocols (SIP/RTP), and their server-side infrastructure appears to be a FreeSWITCH box in a single Canada datacenter (maybe with a single-DC European presence now or coming soon as well?). However, they are using those desktop protocols to try to create a packaged non-hacker-oriented experience. I'm not sure if that's possible or not, but I'm obviously biased. It does cost a non-trivial amount of money, though, and their client source isn't Free.
If you can encrypt the audio, and play that through the phone, that would be your best chance. Without the telco implementing encryption it's nigh impossible.
Ultimately, our focus with OStel (and the larger Open Secure Telephony Network) is to build best-practice-based secure voice systems on free software that work like the Internet does. This also means apps like Jitsi, Groundwire, SFLPhone and other compliant clients can work just as well with our system as the primary Android app we focus on.
I'd like to know what differentiates it from RedPhone, Silent Circle and other similar products.
Ostel is open source client and server side, you can easily set up your own server, and if I understand it correctly it's also federated.
Some reading: https://github.com/WhisperSystems/RedPhone/issues/63
$35 for secure calls on iOS? We can do better than that.
The problem I don't see this solving is the fact that I still need to trust a third party that routes my call not to store and hand over any data on those calls.
You can read about this and more on some of our posts on the Guardian Project blog: