
Data Is a Toxic Asset, So Why Not Throw It Out? (2016) - mooreds
https://www.schneier.com/essays/archives/2016/03/data_is_a_toxic_asse.html
======
fredley
This is an interesting post to see alongside _There Is Too Much Stuff_ [1],
which is also on the front page at the time of writing. There are many
parallels between our collection and hoarding of physical junk and our
collection and hoarding of junk data.

1: https://www.theatlantic.com/health/archive/2019/05/too-many-options/590185/

------
darkpuma
Unfortunately, individual executives have historically not been made to
adequately feel the pain when their companies lose control of user data.
Nothing will change until that changes.

------
amelius
Why are information brokers even allowed to exist?

The Wikipedia entry discusses some past attempts to apply regulation, but this
all seems way too mild.

[https://en.wikipedia.org/wiki/Information_broker#Criticism](https://en.wikipedia.org/wiki/Information_broker#Criticism)

~~~
brighter2morrow
Powerful people like information brokers because their information can help
track down dissidents.

------
rrosen326
There is a semi-valid reason that is similar to, but I think distinct from,
his first point.

Holding on to data provides an option on potential downstream gains (i.e.,
holding data has an "option value").

Will it be valuable? Who knows? Maybe there is some nugget which will unlock
your business or service and bring you victory.

If you get rid of it, you'll never know, until perhaps too late.

If you couple the option value of holding data with a perhaps unrealistically
low expectation of the cost of holding it (i.e., a breach), you get an
expected-value calculation that says, 'hold everything!'

It may be totally rational, even if incorrect.
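
To make that concrete, here is a back-of-the-envelope sketch in Python. All
the numbers are made up; the point is only the shape of the calculation:

    def ev_of_hoarding(p_nugget, nugget_value, p_breach, breach_cost, storage_cost):
        """Naive expected value of retaining a data set for one more year."""
        return p_nugget * nugget_value - p_breach * breach_cost - storage_cost

    # If you assume breaches are rare and cheap, hoarding looks rational:
    print(ev_of_hoarding(0.05, 1_000_000, 0.01, 500_000, 10_000))   # 35000.0
    # With a more realistic breach probability and cost, the sign flips:
    print(ev_of_hoarding(0.05, 1_000_000, 0.10, 2_000_000, 10_000)) # -160000.0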

~~~
Retra
The rationality of hoarding makes more sense if you have 'warehousing'
capabilities. Not so much if you're just piling things up haphazardly. (And
those capabilities have costs associated with them.)

~~~
darkpuma
> _" The rationality of hoarding makes more sense if you have 'warehousing'
> capabilities"_

Or if you believe you could/will organize the data in the future, perhaps with
an unrealistic expectation that doing so will become cheaper in the future.

------
AndrewStephens
Schneier is (as usual) right. It is far too easy (in fact, the default) to
store every little detail about customers interacting with services you
control. Only some of this data can be justified for business or technical
reasons. The rest is dead weight and an actual liability.

A simple example relevant to readers here: every web server I know of logs
information about each request and retains those logs for a long period of
time (because who cares? disk space is cheap).

But those logs contain IP addresses and possibly other identifying
information. Web servers get broken into all the time - if those logs leak,
attackers can use them to build profiles of your visitors and possibly
correlate them with other data sets to tease out information about specific
visitors. On my website it would be easy to spot the admin IP address for
further attacks.
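
One mitigation, if you must keep the logs, is to scrub them before long-term
retention. A minimal sketch, assuming the common "combined" log format and
last-octet truncation as the (hypothetical) anonymization scheme:

    import re

    # Matches a leading IPv4 address; keeps the first three octets.
    IP_RE = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}')

    def scrub_line(line):
        """Zero the last octet of the client IP in one access-log line."""
        return IP_RE.sub(r'\1.0', line, count=1)

    line = '203.0.113.42 - - [12/May/2019:10:00:00 +0000] "GET / HTTP/1.1" 200 512'
    print(scrub_line(line))
    # 203.0.113.0 - - [12/May/2019:10:00:00 +0000] "GET / HTTP/1.1" 200 512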

Paranoid? Perhaps. During WWII the slogan was "Loose Lips Sink Ships" - any
slip of information could be the final piece of data that gives the enemy a
fatal advantage. We are effectively at war with criminals and actual foreign
states (not to mention corporate interests) who are continually looking for
similar advantages. Even random script kiddies want your data.

One of the things I like about the GDPR is that it forces companies (and
individuals) to recognize the danger and act accordingly.

I wrote a longer essay along the same lines.[0]

[0] https://sheep.horse/2018/6/the_eu_general_data_protection_regulation_and_me_%28.html

~~~
jiveturkey
> Schneier is (as usual) right.

It's easy to be right when you argue both sides!

https://www.schneier.com/blog/archives/2018/01/security_breach.html

~~~
AndrewStephens
Not sure what you are trying to say here.

The original article and the one you posted are not inconsistent - in your
link, Schneier quotes research showing that the stock market doesn't care
about data breaches, probably because there is little direct financial damage.
Hence there is little incentive for companies to stop stockpiling every scrap
of data.

Schneier is arguing that the risk to individuals and society at large is
greater than any benefit this data could provide.

One way to remove the incentive to gather and store data is through
regulation and fines - then the stock market would care.

~~~
jiveturkey
> Schneier is arguing that the risk to individuals and society at large is
> greater than any benefit this data could provide.

I don't think we read the same (2016) article.

In the one I read, he is claiming that holding onto personal data is about
banking it for future revenue (he actually says profitability, but I think he
is just not being savvy here). Where he specifically talks about risks, he
talks about risks to _the company_, to PR. Nothing, not a chirp, about damage
to the individuals or society at large.

Earlier in the article he actually cites Anthem and Target, the 2 biggest
breaches to date (as of 2016). Ironic, because these breaches _did no damage_
to the companies. It's also not a well-formed argument, because those
companies weren't monetizing personal data in the way that we think of as
privacy-invasive, or "user as the product". Whether or not personal data is a
toxic thing to be avoided, those companies needed to collect that data.

In the 2016 article I read, Schneier is claiming that the risk analysis is
flawed and that the damage to _a company_ from the eventual breach/loss of
personal data outweighs the financial gain to be realized from that data. He's
trying to appeal to market wisdom, not social good: "protect your bottom
line", not "think of the children". Which is a fine thesis. But the 2 primary
examples he gives in the same article prove him wrong, and an academic study
in 2018 further proves this to be wrong.

> One way to remove the incentives to gather and store data is through
> regulation and fines - then the stockmarket would care.

Yep, only then. It's a bit like a class action: each individual person caught
up in a privacy breach suffers just a little bit and has no individual voice
for redress at all. "The market" will thus do as it will with such data. We
need a big hammer to fix it. GDPR has been a huge benefit for privacy, as
partly evidenced by all the naysayers using implementation cost as their
argument. In the US, when the dust finally settles around Equifax, we may
move the needle.

To finally answer your first point, I'm saying that Schneier was wrong in 2016
and he corrected himself in 2018.

~~~
AndrewStephens
I think Schneier is trying to say in both cases that it is not worth the
risks, but I'll concede that we are arguing a very fine point. In any case, I
agree with Schneier in this latest post and have argued along these lines
myself.

------
paol
This is very much in line with the logic of the GDPR legislation, which,
while not perfect, is overall good.

It serves to reinforce the perspective that personal data is a liability, so
the natural thing is to hold only as much of it as you actually need.

------
jdietrich
Art. 5 GDPR:

 _Personal data shall be:

...collected for specified, explicit and legitimate purposes and not further
processed in a manner that is incompatible with those purposes;

...adequate, relevant and limited to what is necessary in relation to the
purposes for which they are processed;

...kept in a form which permits identification of data subjects for no longer
than is necessary for the purposes for which the personal data are processed;_

In the EU, Schneier's proposal isn't just a good idea - it's the law.

[https://gdpr-info.eu/art-5-gdpr/](https://gdpr-info.eu/art-5-gdpr/)
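
In engineering terms, the storage-limitation clause amounts to a retention
sweep. A minimal sketch (the record layout and the 90-day window are
hypothetical; the window should be whatever the stated purpose actually
justifies):

    from datetime import datetime, timedelta

    RETENTION = timedelta(days=90)  # hypothetical; set by the stated purpose

    def sweep(records, now):
        """Drop personal records older than the retention window."""
        return [r for r in records if now - r["collected_at"] <= RETENTION]

    records = [
        {"user": "alice", "collected_at": datetime(2019, 1, 1)},
        {"user": "bob",   "collected_at": datetime(2019, 5, 1)},
    ]
    print(sweep(records, now=datetime(2019, 5, 15)))  # only bob's record survives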

~~~
AnthonyMouse
Now suppose your product is personalized search. You need every piece of data
about someone, for as long as possible, so that you can do ML on it and
personalize their search results. You could even argue that targeted
advertising is part of the product in the same way that vendors pay grocers
for shelf space and yet the cereal they were paid to put in front of you is a
product you might actually choose (especially given that the contrary would
imply they're wasting their money paying for placement).

And never mind search, personalized _anything_ based on ML.

Which of those requirements does that violate? You can specify the purpose
ahead of time; anything that isn't statistically independent of the outcome is
relevant, but the only way to know that is to have the data to analyze, and
that relation could change at any time; there is no natural time limit on how
long the data remains useful for that purpose; etc.

My claim is not that that would necessarily be the outcome in an EU court.
They're apparently quite against the idea in general -- prohibiting that sort
of bulk data collection seems to be what everybody puts out as the
justification of the rules. What I'm asking is, by what reasoning does that
ostensibly undesired behavior actually violate those rules?

Meanwhile on the other hand, are we sure we actually want it gone? If ML-
driven personalized medicine ultimately gets good enough to extend your life
by a decade or make it so that you don't spend the second half of it
bedridden, it would be worth quite a lot of cost to have that benefit.

It seems to me we're going about this whole thing wrong. The problem is not
the data, it's the centralization. Having your own data is valuable, but it
should be yours, on your device, not Facebook's. And then you knock out a
major category of mass data breach because all the data doesn't actually exist
in any one place.

Then if you want to share it with your doctor, you should be able to do that.
But if your boss wants it, or the government without a warrant, it's still
yours -- nobody gets to demand it, and Facebook can't provide it to Cambridge
Analytica without your consent, because they don't _have_ it. You do. And the
"personalized news feeds" most people don't actually want isn't enough to
convince you to give it to them.

But that's a completely different thing. It's more of a technical solution --
a different architecture that protects privacy intrinsically, rather than a
set of rules that corporations can try to weasel out of with lawyers and
lobbyists and jurisdiction shopping and trade wars.
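
A toy sketch of that architecture, just to make the shape concrete (all names
are hypothetical; a real system would need encryption, authentication, and
sync):

    class LocalVault:
        """Personal data that lives on the user's device; every disclosure
        requires an explicit, revocable grant."""

        def __init__(self):
            self._data = {}       # never leaves the device by default
            self._grants = set()  # recipients the user has approved

        def put(self, key, value):
            self._data[key] = value

        def grant(self, recipient):
            self._grants.add(recipient)

        def revoke(self, recipient):
            self._grants.discard(recipient)

        def share(self, recipient, key):
            if recipient not in self._grants:
                raise PermissionError(f"{recipient} has no consent grant")
            return self._data[key]

    vault = LocalVault()
    vault.put("bloodwork", {"a1c": 5.4})
    vault.grant("my_doctor")
    print(vault.share("my_doctor", "bloodwork"))  # allowed: explicit grant
    try:
        vault.share("facebook", "bloodwork")      # no grant: refused
    except PermissionError as e:
        print(e)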

~~~
PeterisP
"By what reasoning does that ostensibly undesired behavior actually violate
those rules?"

Under GDPR, it violates the principle of consent (the described scenario
wouldn't meet any of the other lawful bases for processing that data) - you're
free to provide personalized search by collecting every piece of data about
someone, for as long as possible, so that you can do ML on it and personalize
their search results, as long as they freely give informed opt-in consent.
Bulk collecting data on everyone doesn't do that.

If you convince someone that they really want to allow you to collect all
kinds of data (and they know what you're going to be collecting) because it'll
allow them to receive better personalized service and they intentionally
choose that this is what they desire, then that's okay. If not, then it's not
okay.

If they're choosing not to give you that permission while knowing what it
involves - well, that's their choice to make.

If they don't want to think or care about your offer and ignore it and don't
opt in, then they obviously don't value the potential benefit enough to pay
the price, so you can't collect and use their data. Tough luck.

~~~
jcwilde
An important additional component of the GDPR is that an individual can
_revoke their consent_, and require that all previously collected personal
information be deleted.

https://www.gdpreu.org/the-regulation/key-concepts/consent/

https://www.gdpreu.org/the-regulation/list-of-data-rights/right-to-erasure/

------
jiveturkey
This article is pre-GDPR.

In January 2018, just 4 months prior to GDPR taking effect, Schneier argued,
or at least endorsed evidence supporting, the opposite.

https://www.schneier.com/blog/archives/2018/01/security_breach.html

This seems more consistent with my intuition. If personal data were in fact
toxic, companies would take pains to discard it.

We (HN crowd) like to think privacy matters, but there is ample evidence to
the contrary, with $FB as the exemplar.

https://money.usnews.com/investing/stock-market-news/articles/2018-04-25/facebook-inc-fb-stock

