

Is the NSA Blinded by Big Data? - mixmax
https://medium.com/surveillance-state/e78cbf912907

======
hooande
The author might be underestimating the NSA's technical ability. The logic
here is that each person has ~300 friends, and three hops is 300 x 300 x 300 =
27 million. This is a big number, but the NSA doesn't have to give the same
scrutiny and technical resources to each person. It's well known that most of
the connections won't be interesting and there are methods to reduce the size
of the list.
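
For concreteness, the figure being argued over is simple fan-out arithmetic.
The sketch below assumes the worst case the article implies (every person has
exactly 300 contacts, nobody is ever counted twice, and no pruning is applied)
and reports the count at each hop plus the running total:

```python
def naive_hop_counts(contacts_per_person, hops):
    """People reached at each hop under the no-overlap, no-pruning assumption."""
    return [contacts_per_person ** h for h in range(1, hops + 1)]


counts = naive_hop_counts(300, 3)
print(counts)       # [300, 90000, 27000000]
print(sum(counts))  # 27090300
```

Any filtering step that drops uninteresting contacts before expanding them
shrinks every later hop multiplicatively, which is why the raw 27 million is
an upper bound rather than a workload.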

Which brings us to the author's dismissal of pattern recognition. Statements
like "A pattern recognition algorithm would come up with “young Saudi men” as
the closest heuristic for the 9/11 bombers" are equivalent to "I've designed a
bad classifier in my head and I'm telling you how wrong it would be". The NSA
has more than a few smart people on this and they are probably working all the
harder due to so much public attention. Algorithms can get very sophisticated
with a sufficient amount of data and processing cores.

The reality is that the NSA doesn't have the technical capability to find
meaning in billions of communications. But they aren't stupid either. The
scale of their data isn't going to hide us. Whether or not they'll use the
data responsibly is another question.

------
aylons
Do you know what mass surveillance and analysis is good at? Chilling political
movements and spotting emerging leaders.

I bet the NSA is not blind to that. Actually, they're not blind at all.

------
incongruity
There's one huge flaw in the exponential math here: duplicates. There's a good
chance that a large number of people your friends know are already included in
your network – the percentages will vary, but it definitely cuts down the pure
exponential growth.
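
The effect of duplicates can be checked with a breadth-first search that only
counts people not seen before. The graph below is a hypothetical toy model,
not real data: each person gets k distinct contacts, and with probability
`p_local` a contact is drawn from nearby ids (standing in for overlapping
friend circles). The names `build_contact_lists` and `unique_reach_by_hop`
are illustrative.

```python
import random


def build_contact_lists(n, k, p_local, window, seed=0):
    """Toy model (an assumption): each of n people has k distinct contacts.
    With probability p_local a contact comes from ids within +/- window
    (overlapping circles); otherwise it is drawn uniformly from everyone."""
    rng = random.Random(seed)
    contacts = []
    for v in range(n):
        chosen = set()
        while len(chosen) < k:
            if rng.random() < p_local:
                u = (v + rng.randint(-window, window)) % n
            else:
                u = rng.randrange(n)
            if u != v:  # nobody lists themselves as a contact
                chosen.add(u)
        contacts.append(chosen)
    return contacts


def unique_reach_by_hop(contacts, start, hops):
    """Breadth-first search that counts only people not already seen."""
    seen = {start}
    frontier = {start}
    counts = []
    for _ in range(hops):
        nxt = set()
        for v in frontier:
            nxt |= contacts[v] - seen  # drop duplicates already in the network
        seen |= nxt
        counts.append(len(nxt))
        frontier = nxt
    return counts


k = 20
naive = [k ** h for h in range(1, 4)]  # [20, 400, 8000]
results = {}
for p_local in (0.0, 0.9):
    graph = build_contact_lists(n=10_000, k=k, p_local=p_local, window=50)
    results[p_local] = unique_reach_by_hop(graph, start=0, hops=3)
    print(f"p_local={p_local}: unique per hop {results[p_local]}, naive {naive}")
```

With uniform wiring the unique counts track the naive ones closely; with
heavily overlapping circles the second and third hops fall well short of
k squared and k cubed. How large the reduction is in practice depends entirely
on how clustered real contact graphs are, which this toy model does not claim
to capture.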

~~~
olefoo
Several simplifying assumptions were made. In a real-world scenario you would
have a certain number of duplicates shared with any given contact, but the
overlap of redundancies between contacts is going to be low. You also need to
figure that some of your contacts have an order of magnitude more contacts
than your average contact, and that they are more likely to be connected to
other super-connectors. So your second-order contacts are likely to be richer
in people who have more contacts than the average networker...
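
The "richer in super-connectors" point is the friendship paradox. Assuming
contacts are wired at random given each person's degree, someone with d
contacts is d times as likely to be reached by following a link, so the
average degree of a reached contact is sum(d^2) / sum(d) rather than the plain
mean. The degree list below is hypothetical and should be read as a sample of
a much larger population's distribution:

```python
def mean_degree(degrees):
    """Average number of contacts per person."""
    return sum(degrees) / len(degrees)


def mean_contact_degree(degrees):
    """Average degree of a person reached by following a random link.
    High-degree people are over-represented in proportion to their degree."""
    return sum(d * d for d in degrees) / sum(degrees)


# Hypothetical sample: 99 people with 300 contacts, one super-connector with 5,000.
degrees = [300] * 99 + [5000]
print(mean_degree(degrees))          # 347.0
print(mean_contact_degree(degrees))  # about 977.2
```

A single super-connector per hundred people nearly triples the average degree
of the people you reach, while the plain mean barely moves.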

Even building a decent estimator for this sort of problem is hard, since you
need good estimates for a number of distributions (number of contacts, number
of duplicates between pairs of contacts, number of bilateral contacts between
super-connectors, etc.).
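
One crude way to combine those distributions is a branching-process estimate.
This is a swapped-in textbook approximation, not something from the thread:
it assumes random wiring given the degree list, and a single hypothetical
`overlap` fraction of onward links leading back to people already counted.
Each hop multiplies by the mean "excess degree" (a reached person's contacts
other than the link you arrived on), E[d^2]/E[d] - 1:

```python
def branching_estimate(degrees, overlap, hops):
    """Rough expected number of new people at each hop.

    Assumptions (illustrative): random wiring given the degree list, and a
    fixed fraction `overlap` of onward links hitting already-counted people.
    """
    mean_d = sum(degrees) / len(degrees)
    # Mean excess degree; size-biased, so super-connectors dominate it.
    excess = sum(d * d for d in degrees) / sum(degrees) - 1
    counts = [mean_d]  # first hop: no duplicates possible yet
    for _ in range(hops - 1):
        counts.append(counts[-1] * excess * (1 - overlap))
    return counts


uniform = branching_estimate([300] * 100, overlap=0.0, hops=3)
print(uniform)  # [300.0, 89700.0, 26820300.0]
skewed = branching_estimate([300] * 99 + [5000], overlap=0.2, hops=3)
print(skewed)
```

Under these toy assumptions, a 20% overlap is swamped by one super-connector
per hundred people: the skewed third-hop estimate comes out several times
larger than the uniform one. That is consistent with the hunch below, but only
as far as the assumptions hold.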

My general sense is that duplication of contacts has little effect on the
number of contacts of contacts of contacts, and that the number of
super-connectors you know would have a greater impact.

