This is exactly the argument I've been making to people when we discuss PRISM.
Think about the heterogeneity of the data, the lack of structure, and the unpredictable nature of its generation. Frankly, I have no doubt that the NSA is not monitoring phone chatter on a mass scale, probably not because they can't, but because if they did there would be no way in hell to parse, store, process and evaluate the data generated.
We (the scientific/big data community) can barely get recommendation engines working well - engines which have one set of data (what you watched) and do one other thing (suggest what else you might want to watch). Unless the NSA is decades ahead in a number of fields (like data warehousing, statistical analysis of massive datasets, machine learning), how are they getting useful information in a systematic way, considering the pressure from the data firehose involved?
My guess is they're probably not - instead the data are collected, and then used in conjunction with traditional approaches. E.g. little Johnny buys some fertilizer and a one-way plane ticket - so who's he been talking to, what's he been saying, etc.
Honestly, how the NSA is using/dealing with/storing/accessing these data is actually an incredibly interesting question, from an academic/systems perspective.
Natural language processing improves at a fast pace, and these records remain there to be processed at an increasingly large scale as technology allows.
I don't think most people are ready to comprehend what keeping a comprehensive digital record of private communications allows.
That includes after-the-fact attribution of motive for any number of actions, based on people's online comments.
The real question is whether the already-exposed two-way trade in information is going to be broadened into a comprehensive assessment service. Will the NSA provide a "DataVeillance report" on individuals considered for 'sensitive positions'?
Will your call records be used to assess your fitness for work? Will your spending habits be turned into behavioral alerts so that your HR manager is calling you in to ask if you've been drinking too much?
Who gets to access these records and for what purposes?
The technical portion of William Binney's presentation at HOPE 9 sheds a lot of light on this.[1]
If you go to 14:33 in the video, there's a nifty screenshot of an activity sequencing tool. He also talks a lot about latent semantic analysis and other methods.