Vuvuzela – Private messaging system that hides metadata (github.com/davidlazar)
243 points by speps on Dec 3, 2015 | 30 comments

The added noise reminds me of WASTE, which was infamously created by Justin Frankel at Nullsoft to protest the post-acquisition atmosphere at AOL...



IIRC release of WASTE was closely tied to the vesting of his AOL options and his departure from AOL that followed. So it was more of a farewell finger really than a "protest".

Author of WinAmp.

> In practice, the message latency would be around 20s to 40s, depending on security parameters and the number of users connected to the system.

Haven't had time to read the paper yet - is this an inherent property of the system, or a number that could be reduced by future work?

Author here:

The latency can be reduced by future work. For example, Section 9 in the paper mentions an idea for reducing the amount of noise (and thus reducing latency) by treating honest users as noise.

Vuvuzela can also be configured with a smaller "privacy budget" to reduce latency.
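
To give a feel for the noise/latency knob mentioned above, here is a minimal sketch that assumes each server adds a per-round number of dummy requests drawn from a Laplace distribution, clamped at zero and rounded up. The function name `noise_count` and the parameter values for `mu` and `b` are made up for illustration, and this is not the Vuvuzela code.

```python
import math
import random

def noise_count(mu, b, rng):
    """Number of dummy requests to add in one round (illustrative only).

    Assumption: noise ~ Laplace(mu, b), clamped at zero and rounded up.
    The difference of two Exp(1) samples is a Laplace(0, 1) sample.
    """
    laplace = mu + b * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return math.ceil(max(0.0, laplace))

rng = random.Random(2015)  # fixed seed so runs are repeatable

# Hypothetical settings: a lower mean means less cover traffic per round,
# so less work per round, at the price of weaker hiding.
for mu, b in [(300_000, 14_000), (30_000, 1_400)]:
    samples = [noise_count(mu, b, rng) for _ in range(5)]
    print(mu, b, samples)
```

Every dummy request has to be generated, mixed and forwarded like a real one, which is why turning the mean down shortens the rounds.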

It seems inherent: servers operate in rounds, and each round carries legitimate traffic plus enough noise to drown the traffic metadata. Each round is forwarded through a whole chain of servers, so that an adversary would have to control all of them to break it. As the system scales, the servers need more time to mix and forward the traffic.

Could someone much smarter than me do some napkin math and figure out roughly what kind of tradeoff in security you make by limiting the noise?

20-40 seconds is fine if you are releasing NSA secret files and want the chance of metadata discovery to be <0.1% (number made up)

But it's too much to use as a regular form of conversation between average parties who just want to set a precedent that all conversations by default should be un-monitorable.

However what if that number was down to ~5 seconds? Now it's tolerable. But the tradeoff is, what does the chance of detectability go up to? 1%? 5%? 50%?

> 20-40 seconds is […] too much to use as a regular form of conversation between average parties

With the current irc-ish UI the latency would clash with user expectations, but e-mail, message boards and quite a few other forms of communication regularly deal with much higher latencies. I suppose it depends on how you market it.

I'll create a GUI interface using Visual Basic that prints

    Tracking IP address ... [30 seconds] ... Failed! Message delivered!

You can send x "messages" per round, where a message is a packet (a line can be split over multiple packets); x must remain constant, and in the implementation it is 1.

Now the problem is that the adversary modelled here can drop a node out of the network and see whether any other node's message frequency drops.

That's why traffic, noise and latency are strictly correlated. If a conversation suddenly drops, you have to make sure that the noise level doesn't drop, and vice versa.

When you start a conversation you need to tune down the noise to maintain the frequency. Thus the more noise you generate, the more packets you can send when you need to, but the higher the load on the servers.

It's not about hiding the message in the noise (that's Tor's model); it's about making messages statistically indistinguishable. For that there is no confidence interval: you need to send random noise out, randomly, at a fixed randomized frequency.

Now I think they don't actually send a message every round, but generate noise and messages interleaved so that the frequency remains constant. That frequency must be lower than the round frequency, and you cannot really tune it in the way you're suggesting. There might be other ways to relax the confidence, but they might affect other clients on the network, even those unrelated to your conversation.

Also, under this round model, if a client drops after handshaking and before collecting the message, the message for that handshake is lost (no retransmission and no chance to pick it up later).
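
The "one fixed-size packet per round" rule from this comment can be sketched roughly as follows. This is only an illustration: `RoundClient`, `PACKET_SIZE` and the one-byte length prefix are hypothetical, and encryption is left out entirely, so in this toy version a dummy is recognisable by its content. A real system would encrypt every packet so that real and dummy packets look alike.

```python
from collections import deque

PACKET_SIZE = 16  # hypothetical fixed packet size in bytes

class RoundClient:
    """Emits exactly one fixed-size packet per round, real or dummy."""

    def __init__(self):
        self.queue = deque()

    def send(self, text: bytes) -> None:
        # Split a line into chunks that fit one packet
        # (one byte is reserved for the length prefix).
        step = PACKET_SIZE - 1
        for i in range(0, len(text), step):
            self.queue.append(text[i:i + step])

    def next_packet(self) -> bytes:
        # Called once per round: take a queued chunk if there is one,
        # otherwise emit a dummy (length 0). Both are padded to the same size.
        chunk = self.queue.popleft() if self.queue else b""
        return bytes([len(chunk)]) + chunk.ljust(PACKET_SIZE - 1, b"\x00")

client = RoundClient()
client.send(b"hello world, this is long")  # 25 bytes, so two packets
for round_no in range(4):
    packet = client.next_packet()
    print(round_no, len(packet), packet[0])  # size is always PACKET_SIZE
```

An observer counting packets sees the same rate whether or not a conversation is in progress; only the split between real chunks and dummies changes.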

The slides toward the end of my talk [1] try to give some intuition for Vuvuzela's security in terms of "jury certainty".

[1] https://davidlazar.org/slides/vuvuzela-sosp2015.pdf

I'm very happy to see work going into this topic. Hiding metadata is the major part of private communication that's not accounted for by any major chat system. I've recently started studying the matrix.org specification, which seems like the best bet for a next generation chat system, but it doesn't account for metadata privacy. It'd be very difficult to hide metadata while also offering many of the other features of a modern chat system that make it useful and convenient for people (e.g. message history that a new client can sync with later).

Here's a summary and commentary of the paper from Adrian Colyer's 'important/interesting CS papers' review blog: http://blog.acolyer.org/2015/10/23/vuvuzela-scalable-private...

"The cost of running a Vuvuzela server on AWS at current prices is about $10K/month, dominated by bandwidth usage."

Or probably about $50 for a VPS...

The amount of bandwidth used per month isn't mentioned but when the Snapchat database was leaked I hosted it on a server of mine and snapchatdb.info was pointed at it. Bandwidth over the first three days was just over 27 TB and didn't cost a dime extra (I work for an ISP and my servers are housed in our cages in datacenters but I pay very little for them).

What's 27 TB cost on AWS?

  First 1 GB / month     $0.00 per GB
  Up to 10 TB / month    $0.09 per GB
  Next 40 TB / month     $0.085 per GB

So around USD 2,345.
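
For anyone who wants to check the arithmetic, here is a small sketch of the tiered calculation using the prices quoted above. It assumes decimal units (1 TB = 1000 GB) and treats "up to 10 TB" as the 9,999 GB after the free first GB; the function name `egress_cost` is just for illustration.

```python
# Tier sizes in GB and prices in USD per GB, taken from the comment above.
# Decimal units (1 TB = 1000 GB) are an assumption.
TIERS = [
    (1, 0.00),        # first 1 GB / month
    (9_999, 0.09),    # up to 10 TB / month
    (40_000, 0.085),  # next 40 TB / month
]

def egress_cost(total_gb):
    """Walk the tiers in order, charging each slice at its own rate."""
    cost = 0.0
    remaining = total_gb
    for size_gb, price in TIERS:
        used = min(remaining, size_gb)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("usage exceeds the tiers quoted here")
    return cost

print(round(egress_cost(27_000), 2))  # 2344.91
```

With binary units (1 TB = 1024 GB) the total comes out a little higher, at roughly USD 2,400.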

Another approach at a more general messaging system that can be used to similar ends is Whisper, by the same community that makes Ethereum: https://www.youtube.com/watch?v=BrWlAtfqF6s

you win just for the project name.

I thought the example conversation was quite cute also :p

It's an actual conversation from Citizenfour.

Citizenfour is still on my "to-watch" list at the minute, but yeah, I had figured it was an excerpt of an actual conversation.

Interestingly, I have a private repo on Github that does something eerily similar. It's been stale for a couple of years, but I had to go check that repo and verify it was private. Same name, nearly identical purpose. It made me think.

One of the problems with Tor is that adding noise has too high a latency and bandwidth cost, so it doesn't add any. But an IM client has much less restrictive latency and bandwidth requirements, so it makes sense for it to add noise.

Out of curiosity, why did you call it Vuvuzela? I'm from Southern Africa btw

Perhaps because it generates a lot of noise (latency, active users) in the process, like the vuvuzela does :)

I was hoping to find the rationale for the name at the link, but didn't see anything.

Because it adds noise.

for the record: bitmessage also has this kind of metadata & data privacy

BitMessage has greater latency (~2 mins?) but is fully P2P. Given the cost estimates for a server upthread, I wonder if bandwidth concerns are the reason for a client-server architecture over P2P...

My understanding is that BitMessage achieves its non-content privacy guarantees by sending each message to all clients, and then the latency is the result of the Proof of Work and some other concepts borrowed from Bitcoin.

I'd love to hear more about it if you have time.

In particular, I'm interested in the P2P vs. client-server trade-off: is Vuvuzela workable in a fully P2P network, say over WebRTC?

This is cool! Need to read the paper, and I'm not that knowledgeable anyway, but it seems to be offering something genuinely new.

Edit: Should have more explicitly asked -- can anyone who knows more about this chime in? Is it as novel as it seems? Does it look secure?

It was published at one of the top systems conferences (SOSP), so I would presume it's at least somewhat novel. Link to full paper here: http://sigops.org/sosp/sosp15/current/2015-Monterey/printabl...
