
Ask HN: Generate random traffic for metadata obfuscation? - fratlas
I remember a scare some time ago about metadata (or something similar) being leaked and specific people's traffic becoming available (I don't have a security background, so sorry if that doesn't even make sense). I was just wondering whether there are any scripts that randomly send requests to accumulate random metadata and obfuscate your real traffic?
======
brodo
Scripts like these are easily circumvented because browsing habits are not
random. Using machine learning or statistical methods, an observer can filter
out the random noise. People can be identified pretty reliably by the unique
set of websites they use (see
[https://svs.informatik.uni-hamburg.de/publications/2013/2013...](https://svs.informatik.uni-hamburg.de/publications/2013/2013-04-10HBF_Behavior-based-tracking-with-dns-traffic.pdf)).
The only way to anonymize yourself using fake traffic is not to
generate random traffic but to specifically engineer requests that bring you
closer to the global average. How to do this is still an open research
question.
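
As a rough illustration of what "closer to the global average" could mean in
the simplest case, here is a minimal Python sketch. It assumes you already
know a population-wide share of traffic per site (the `global_share` table
and the `cover_requests` helper are hypothetical, made up for this example)
and computes the fewest dummy requests that make your per-site frequencies
match it. It only evens out visit counts; timing, ordering, and session
structure are ignored, which is part of why the general problem is still open.

```python
def cover_requests(user_counts, global_share):
    """Return dummy request counts per site (hypothetical helper).

    user_counts:  real requests per site, e.g. {"a.example": 3}
    global_share: assumed population-wide fraction of traffic per site
                  (values should sum to 1)
    """
    for site, count in user_counts.items():
        if count > 0 and global_share.get(site, 0) <= 0:
            # A site nobody else visits cannot be hidden by adding traffic.
            raise ValueError(f"{site} has no share in the global profile")

    # Smallest total volume at which every real count fits under its target.
    total = max(
        (user_counts.get(site, 0) / share
         for site, share in global_share.items() if share > 0),
        default=0,
    )
    # Dummy requests fill the gap between the target and the real count.
    return {
        site: max(0, round(total * share - user_counts.get(site, 0)))
        for site, share in global_share.items()
    }


real = {"a.example": 3, "b.example": 1}
average = {"a.example": 0.25, "b.example": 0.75}
print(cover_requests(real, average))  # {'a.example': 0, 'b.example': 8}
```

Note that the cost grows quickly: one visit to a site that is rare in the
global profile forces a large amount of cover traffic everywhere else.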

~~~
notduncansmith
What if you were to train an algorithm for a given user's usage patterns on a
website and then project those patterns onto other users? Use machine learning
to generate realistic noise seeded with random keyword topics that would span
multiple sites. It would appear to anyone watching as if you really did have a
certain number of interests that you don't actually hold. Then superimpose a
bunch of fake usage profiles over your actual browsing.

~~~
zodPod
That sounds awesome! "Tonight, I want Google to think that I'm a duck watcher
that is interested in skydiving and cooking at low altitudes!" and you just
combine your profile to be whatever you want.

You could even use that as a purchasing advantage. "Well, I'm interested in
knowing what a gamer would buy in this situation. So, I'm going to use a
23-year-old gamer geek profile." or "I wonder what you would buy a photographer
for Christmas..." Then you just seed it with photographer data, let it
suggest things to you, and pick something haha

------
jstayton
TrackMeNot ([http://cs.nyu.edu/trackmenot/](http://cs.nyu.edu/trackmenot/))
uses this technique to protect against web search tracking.

In general, privacy by obfuscation is an idea that privacy researchers have
been considering. Other than TrackMeNot, however, I haven't seen many real-
world applications.

~~~
DavideNL
Hah, awesome add-on, you can even add your own (language) websites/rss feeds.

------
feral
I've given this a good bit of thought; I'm sure it's easy to generate plausible
deniability, but actually masking relationships is not as simple as it sounds,
as an adversary will easily filter out random noise.

You would need chaff deliberately constructed to look a bit like non-random
social network/communication data; that's kind of a research project, and it's
going to be hard to guarantee it is working in the face of unknown statistical
attack.

------
Canada
Here are a couple of papers describing metadata-resistant anonymity systems that
may help you understand how "chaff" traffic can help obfuscate real traffic.
It's not as simple as generating random noise because real traffic patterns
can be identified with confirmation or intersection attacks.

Aqua:
[http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p...](http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p303.pdf)

Dissent: [http://bford.info/pub/net/panopticon-cacm.pdf](http://bford.info/pub/net/panopticon-cacm.pdf)

Of particular interest to your question are the threat models.
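
To make the intersection attack mentioned above concrete, here is a toy
Python sketch. It assumes an observer who can see which users were online
each time some target activity occurred; the function name and the sample
data are made up for illustration and are not taken from either paper.

```python
def intersection_attack(rounds):
    """Intersect the sets of users seen online in each observed round.

    rounds: iterable of sets, one per occasion on which the target
            activity (e.g. a post by a pseudonym) was observed.
    """
    candidates = None
    for active in rounds:
        active = set(active)
        # Anyone absent in even one round cannot be the target.
        candidates = active if candidates is None else candidates & active
    return candidates if candidates is not None else set()


observed = [
    {"alice", "bob", "carol"},
    {"alice", "carol", "dave"},
    {"alice", "erin"},
]
print(intersection_attack(observed))  # {'alice'}
```

Random dummy requests do not change who was online in each round, so they
do nothing against this kind of narrowing.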

------
bespoke_engnr
I don't know what's happening with this now, but I remember the Pond
([https://www.imperialviolet.org/2013/11/10/pond.html](https://www.imperialviolet.org/2013/11/10/pond.html))
project working on something like this. I believe they were trying to do
this with encrypted data as opposed to plaintext, but their work might still
be interesting. IIRC a big part of it was batching transmissions and
introducing a delay, sending 'dummy' data when there was no 'real' data, at
regular intervals (to prevent timing/traffic correlation attacks).

It kind of reminded me of the old Asynchronous Transfer Mode protocol, with
its cells.
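
The batching idea described above can be sketched roughly as follows. This
is not Pond's actual protocol; the cell size, the 2-byte length header, and
the class name are assumptions for illustration only. The point is that one
fixed-size cell goes out on every tick whether or not there is real data.
In a real system the cells would also be encrypted so that dummy and real
cells look the same on the wire.

```python
from collections import deque

CELL_SIZE = 64    # assumed fixed cell size in bytes
HEADER_SIZE = 2   # assumed length prefix; a length of 0 marks a dummy cell


class FixedRateSender:
    """Emits exactly one fixed-size cell per tick (hypothetical sketch)."""

    def __init__(self):
        self.queue = deque()

    def submit(self, message: bytes) -> None:
        """Queue a real message; it waits for the next free tick."""
        if len(message) > CELL_SIZE - HEADER_SIZE:
            raise ValueError("message does not fit in one cell")
        self.queue.append(message)

    def tick(self) -> bytes:
        """Call once per interval; returns a real cell or a dummy cell."""
        payload = self.queue.popleft() if self.queue else b""
        header = len(payload).to_bytes(HEADER_SIZE, "big")
        padding = bytes(CELL_SIZE - HEADER_SIZE - len(payload))
        return header + payload + padding


sender = FixedRateSender()
sender.submit(b"hi")
first, second = sender.tick(), sender.tick()
print(len(first), len(second))  # 64 64
```

A caller would invoke `tick()` from a timer at a fixed interval, so an
observer sees the same size and rate regardless of actual activity.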

------
evils
Set up a Tor Exit node.

------
kpdyer
fteproxy [1] can superficially mask the protocol that you're using (e.g.,
makes Tor look like HTTP) using regular expressions.

marionette [2] gives you control over which protocol it looks like you're
using, as well as the duration of the connections generated, the amount of
data sent per connection, etc.

[1] [https://fteproxy.org/](https://fteproxy.org/)

[2] [https://github.com/marionette-tg/marionette](https://github.com/marionette-tg/marionette)

------
Natanael_L
Tor, I2P

------
JoeAltmaier
(xkcd) [https://xkcd.com/576/](https://xkcd.com/576/)

