This is an interesting post to see alongside There Is Too Much Stuff[1], which is also on the front page at time of writing. Many parallels between our collection and hoarding of junk, and our collection and hoarding of junk data.
Unfortunately individual executives have historically not been made to adequately feel the pain when companies lose control of user data. Nothing will change until that changes.
There is a semi-valid reason similar to, but I think distinct from, his first point.
Holding on to data provides an option on potential downstream gains (i.e., holding data has an "option value").
Will it be valuable? Who knows? Maybe there is some nugget which will unlock your business or service and bring you victory.
If you get rid of it, you'll never know, until perhaps too late.
If you couple the option value of holding data with a perhaps unrealistically low expectation of the cost of holding it (i.e., a breach), you get an expected-value calculation that says 'hold everything!'
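For illustration, a toy back-of-the-envelope version of that calculation in Python (all numbers are made up; the point is the shape of the reasoning, not the values):

    # Toy expected-value sketch of the hoarding incentive.
    option_value = 100_000   # hoped-for future revenue unlocked by the data
    p_breach     = 0.01      # perceived (often underestimated) breach probability
    breach_cost  = 500_000   # perceived cost of a breach
    storage_cost = 1_000     # annual cost of keeping the data around

    ev = option_value - p_breach * breach_cost - storage_cost
    print(ev)  # 94000.0 -> "hold everything!"

    # With more honest numbers (say p_breach = 0.3, breach_cost = 2_000_000),
    # the same arithmetic flips to a large negative number.

Underestimate p_breach or breach_cost and hoarding always looks like free money.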
The rationality of hoarding makes more sense if you have 'warehousing' capabilities. Not so much if you're just piling things up haphazardly. (And those capabilities have costs associated with them.)
> "The rationality of hoarding makes more sense if you have 'warehousing' capabilities"
Or if you believe you could (or will) organize the data in the future, perhaps with an unrealistic expectation that doing so will get cheaper over time.
Schneier is (as usual) right. It is far too easy (in fact, the default) to store every little detail about customers interacting with services you control. Only some of this data can be justified for business or technical reasons. The rest is dead weight and an actual liability.
A simple example relevant to readers here: every web server I know of logs information about each request and retains those logs for a long period of time (because, who cares? Disk space is cheap).
But those logs contain IP addresses and possibly other identifying information. Web servers get broken into all the time - if the logs leak, they can be used to build profiles of your visitors and correlated with other data sets to tease out information about specific visitors. On my website it would be easy to spot the admin IP address and target it for further attacks.
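One mitigation is to scrub identifying fields before long-term retention. A minimal sketch in Python, assuming the client address is the first space-separated field as in common log format (the truncation scheme here is just illustrative):

    import re

    def anonymize_line(line: str) -> str:
        # Assumes the client address is the first field; adapt to your format.
        parts = line.split(' ', 1)
        if len(parts) < 2:
            return line
        addr, rest = parts
        if re.fullmatch(r'\d{1,3}(\.\d{1,3}){3}', addr):
            # IPv4: zero the last octet (203.0.113.42 -> 203.0.113.0).
            addr = addr.rsplit('.', 1)[0] + '.0'
        elif ':' in addr:
            # IPv6: keep only the first two groups (2001:db8:... -> 2001:db8::).
            addr = ':'.join(addr.split(':')[:2]) + '::'
        return addr + ' ' + rest

    # Write an anonymized copy for retention and discard the raw log.
    with open('access.log') as src, open('access.anon.log', 'w') as dst:
        for line in src:
            dst.write(anonymize_line(line))

Better still, don't write the raw addresses in the first place, and cap retention to a few days.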
Paranoid? Perhaps. During WWII the slogan was "Loose Lips Sink Ships" - any slip of information could be the final piece of data that gives the enemy a fatal advantage. We are effectively at war with criminals and actual foreign states (not to mention corporate interests) who are continually looking for similar advantages. Even random script kiddies want your data.
One of the things I like about the GDPR is that it forces companies (and individuals) to recognize the danger and act accordingly.
The original article and the one you posted are not inconsistent - in your link, Schneier cites research showing that the stock market doesn't care about data breaches, probably because there is little direct financial damage. Hence there is little incentive for companies to stop stockpiling every scrap of data.
Schneier is arguing that the risk to individuals and society at large is greater than any benefit this data could provide.
One way to remove the incentives to gather and store data is through regulation and fines - then the stock market would care.
> Schneier is arguing that the risk to individuals and society at large is greater than any benefit this data could provide.
I don't think we read the same (2016) article.
In the one I read, he is claiming that holding onto personal data is about banking it for future revenue (he actually says profitability, but I think he is just not being savvy here). Where he specifically talks about risks, he talks about risks to the company, to PR. Nothing, not a chirp, about damage to the individuals or society at large.
Earlier in the article he actually cites Anthem and Target, the two biggest breaches to date (as of 2016). Ironic, because these breaches did no damage to the companies. It's also not a well-formed argument, because those companies weren't monetizing personal data in the way we think of as privacy-invasive ("user as the product"). Whether or not personal data is a toxic thing to be avoided, those companies needed to collect that data.
In the 2016 article I read, Schneier is claiming that the risk analysis is flawed and that the damage to a company from the eventual breach/loss of personal data outweighs the financial gain to be realized from that data. He's trying to appeal to market wisdom, not social good: "protect your bottom line", not "think of the children". Which is a fine thesis. But the two primary examples he gives in the same article prove him wrong, and a 2018 academic study further shows it to be wrong.
> One way to remove the incentives to gather and store data is through regulation and fines - then the stockmarket would care.
Yep, only then. It's a bit like a class action: each individual person caught up in a privacy breach suffers just a little bit and has no individual voice for redress at all. "The market" will thus do as it will with such data. We need a big hammer to fix it. GDPR has been a huge benefit for privacy, as partly evidenced by all the naysayers using implementation cost as their argument. In the US, when the dust finally settles around Equifax, we may move the needle.
To finally answer your first point, I'm saying that Schneier was wrong in 2016 and he corrected himself in 2018.
I think Schneier is trying to say in both cases that it is not worth the risks, but I'll concede that we are arguing a very fine point. In any case, I agree with Schneier in this latest post and have argued along these lines myself.
...collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes;
...adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed;
...kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed;
In the EU, Schneier's proposal isn't just a good idea - it's the law.
Now suppose your product is personalized search. You need every piece of data about someone, for as long as possible, so that you can do ML on it and personalize their search results. You could even argue that targeted advertising is part of the product, in the same way that vendors pay grocers for shelf space and yet the cereal they paid to put in front of you is a product you might actually choose (if it weren't, they'd be wasting their money on placement).
And never mind search: the same goes for anything personalized with ML.
Which of those requirements does that violate? You can specify the purpose ahead of time. Anything that isn't statistically independent of the outcome is relevant, but the only way to know what's relevant is to have the data to analyze, and that relation could change at any time. And there is no natural time limit on how long the data remains useful for that purpose.
My claim is not that that would necessarily be the outcome in an EU court. They're apparently quite against the idea in general -- prohibiting that sort of bulk data collection seems to be what everybody puts out as the justification of the rules. What I'm asking is, by what reasoning does that ostensibly undesired behavior actually violate those rules?
Meanwhile, on the other hand, are we sure we actually want it gone? If ML-driven personalized medicine ultimately gets good enough to extend your life by a decade, or to keep you from spending the second half of it bedridden, that benefit would be worth quite a lot of cost.
It seems to me we're going about this whole thing wrong. The problem is not the data, it's the centralization. Having your own data is valuable, but it should be yours, on your device, not Facebook's. And then you knock out a major category of mass data breach because all the data doesn't actually exist in any one place.
Then if you want to share it with your doctor, you should be able to do that. But if your boss wants it, or the government without a warrant, it's still yours -- nobody gets to demand it, and Facebook can't provide it to Cambridge Analytica without your consent, because they don't have it. You do. And the "personalized news feeds" most people don't actually want isn't enough to convince you to give it to them.
But that's a completely different thing. It's more of a technical solution -- a different architecture that protects privacy intrinsically, rather than a set of rules that corporations can try to weasel out of with lawyers and lobbyists and jurisdiction shopping and trade wars.
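To make the shape of that architecture concrete, here is a hypothetical sketch in Python: the user's record lives in a local encrypted store, and sharing is an explicit, per-recipient export. The class and method names are mine, not any real API:

    import json
    from cryptography.fernet import Fernet  # assumes the 'cryptography' package

    class LocalDataStore:
        """All personal data stays on the user's device, encrypted at rest."""
        def __init__(self, key: bytes):
            self._fernet = Fernet(key)
            self._blob = self._fernet.encrypt(b'{}')  # starts empty

        def update(self, record: dict) -> None:
            data = json.loads(self._fernet.decrypt(self._blob))
            data.update(record)
            self._blob = self._fernet.encrypt(json.dumps(data).encode())

        def export_for(self, recipient: str, fields: list) -> dict:
            # Sharing is an affirmative act: only the named fields leave
            # the device, and only because the user invoked this.
            data = json.loads(self._fernet.decrypt(self._blob))
            return {'recipient': recipient,
                    'payload': {k: data[k] for k in fields if k in data}}

    store = LocalDataStore(Fernet.generate_key())
    store.update({'blood_pressure': '120/80', 'browsing_history': ['news.example']})
    print(store.export_for('my_doctor', ['blood_pressure']))  # doctor: yes; boss: no

Nothing here prevents a service from computing on the data, but the computation has to happen on the device, or via an export the user chose to make.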
"By what reasoning does that ostensibly undesired behavior actually violate those rules?"
Under the GDPR, it violates the principle of consent (the described scenario wouldn't satisfy any of the other lawful bases for using that data). You're free to provide personalized search by collecting every piece of data about someone, for as long as possible, so that you can do ML on it and personalize their search results - as long as they freely give informed opt-in consent. Bulk-collecting data on everyone doesn't do that.
If you convince someone that they really want to allow you to collect all kinds of data (and they know what you're going to be collecting) because it will get them better personalized service, and they intentionally choose that this is what they desire, then that's okay. If not, then it's not okay.
If they're choosing not to give you that permission while knowing what it involves - well, that's their choice to make.
If they don't care enough to think about your offer and simply ignore it without opting in, then they obviously don't value the potential benefit enough to pay the price, so you can't collect and use their data. Tough luck.
An important additional component of the GDPR is that an individual can revoke their consent and require that all previously collected personal information be deleted.
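A minimal sketch of how that consent model constrains collection (hypothetical names, not a real compliance implementation):

    class ConsentRegistry:
        def __init__(self):
            self._consented = set()  # users with freely given, informed opt-in
            self._data = {}          # user id -> collected personal data

        def grant(self, user_id: str) -> None:
            # Record consent only after an explicit, informed opt-in.
            self._consented.add(user_id)

        def collect(self, user_id: str, record: dict) -> bool:
            if user_id not in self._consented:
                return False  # no lawful basis: drop the record, don't store it
            self._data.setdefault(user_id, []).append(record)
            return True

        def revoke(self, user_id: str) -> None:
            # Withdrawal of consent also triggers erasure of what was collected.
            self._consented.discard(user_id)
            self._data.pop(user_id, None)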
1: https://www.theatlantic.com/health/archive/2019/05/too-many-...