

Identifying botnets with Hadoop and Cassandra - helwr
http://gigaom.com/cloud/hadoop-kills-zombies-too-is-there-anything-it-cant-solve/

======
A1kmm
A centralised system like that, however, has a very high privacy cost - one
server knows all IP addresses that access every IP address monitored by the
system.

The article isn't clear on exactly how trust is computed, but if all that is
required is to detect the total number of connections between given IPs, it is
possible that a peer-to-peer algorithm could be used instead, where only
limited information is shared with neighbours about IPs that are specifically
queried.

If the aim is to obtain a list of IPs that at least m peers out of n have been
contacted by (only counting the last k incoming connections, per peer),
without disclosing the entire list to any party, that could be done by having
each peer broadcast the number of connections it has seen per /8 to all peers.
Each peer checks that the total reported by every other peer is no more than
k, and also records the number from each peer for each /8. Any /8s which have
seen a total of fewer than m connections are rejected, and the counts for each
/9 in the remaining /8s are broadcast (these must add up to the number that
peer previously reported for the enclosing /8, and be no more than the number
of IPs in the range). The same step repeats for longer prefixes until
individual addresses remain. This system means that at least (m-1) peers need
to collude to find out if someone has been contacted by an IP address in a
range that rarely contacts people - if some mechanism stopped (m-1) of the
peers from being controlled by the same person, this system could work.
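
To make the rounds concrete, here is a rough single-process simulation in
Python. It is only a sketch under my own assumptions: the function names
(prefix_of, shared_sources) are made up for illustration, each peer is assumed
to have reduced its last k connections to a set of distinct IPv4 source
addresses, and the "broadcast" is just a list of per-peer counters. There is no
networking and nothing here defends against a peer that lies consistently.

```python
from collections import Counter
from ipaddress import ip_address


def prefix_of(ip: int, length: int) -> int:
    """Return the top `length` bits of a 32-bit IPv4 address."""
    return ip >> (32 - length)


def shared_sources(peers: list[set[int]], m: int, k: int) -> set[int]:
    """Simulate the refinement rounds; return IPs reported by >= m peers.

    `peers` holds, per peer, the distinct source IPs among its last k
    incoming connections (the deduplication is an assumption of this sketch).
    """
    for seen in peers:
        if len(seen) > k:
            raise ValueError("peer reports more than k connections")

    parent_len = 0
    alive = {0}  # surviving prefixes at parent_len (the single /0 to start)
    prev = [Counter({0: len(seen)}) for seen in peers]

    for length in range(8, 33):  # /8, /9, ..., /32
        reports = []
        for seen, before in zip(peers, prev):
            # Each peer "broadcasts" counts only for ranges still alive.
            counts = Counter(
                prefix_of(ip, length)
                for ip in seen
                if prefix_of(ip, parent_len) in alive
            )
            # Checks that any receiving peer could run on the broadcast.
            sums = Counter()
            for prefix, n in counts.items():
                if n > (1 << (32 - length)):
                    raise ValueError("count exceeds size of range")
                sums[prefix >> (length - parent_len)] += n
            if any(sums[p] != before[p] for p in alive):
                raise ValueError("counts do not add up to parent range")
            reports.append(counts)

        # Reject ranges whose total across all peers is below m.
        totals = Counter()
        for counts in reports:
            totals.update(counts)
        alive = {p for p, total in totals.items() if total >= m}
        prev, parent_len = reports, length

    return alive  # at /32 each surviving prefix is a full address


def ip(text: str) -> int:
    """Convert dotted-quad text to an integer."""
    return int(ip_address(text))


example_peers = [
    {ip("203.0.113.7"), ip("198.51.100.1")},
    {ip("203.0.113.7"), ip("192.0.2.9")},
    {ip("203.0.113.7"), ip("198.51.100.1")},
]
common = shared_sources(example_peers, m=3, k=10)
```

In this toy run only 203.0.113.7 is reported by all three peers, so the other
ranges are dropped in the first round and their finer-grained counts are never
broadcast.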

