
Data is a Toxic Asset - interweb
https://www.schneier.com/blog/archives/2016/03/data_is_a_toxic.html
======
jmaistre
_The Ashley Madison data breach was such a disaster for the company because it
saved its customers' real names and credit card numbers. It didn't have to do
it this way. It could have processed the credit card information, given the
user access, and then deleted all identifying information. To be sure, it
would have been a different company. It would have had less revenue, because
it couldn't charge users a monthly recurring fee._

This seems to me the wrong way to solve the problem. The crazy thing about
credit cards, social security numbers, and bank account numbers is that these
numbers are supposed to be kept secret and private, and yet you need to
constantly give them out to people. Everyone you write a check to gets your
bank account number, every place you buy from gets a credit card number. This
is insane.

The right way to solve this is for Visa and Mastercard to develop a
standard that makes it super easy to generate a unique payment number every time you
make an online purchase. Then that should be built in as a browser extension
or component. So I browse to a site, click to pay with my Visa card, and Visa
automatically generates a unique code for that site and fills it in on the
form.
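
To make the per-site payment number idea concrete, here is a minimal sketch in Python. It is not how Visa or Mastercard actually work; `merchant_token`, the `card_secret` held by the issuer, and the 16-digit format are all assumptions made for illustration. The point is only that a keyed hash of the merchant's domain gives a number that is stable for one site and useless at any other.

```python
import hashlib
import hmac


def merchant_token(card_secret: bytes, merchant_domain: str, digits: int = 16) -> str:
    """Derive a merchant-specific payment number (hypothetical scheme).

    card_secret is assumed to be known only to the cardholder's issuer.
    """
    # Key the hash with the card secret so a merchant cannot derive
    # the token that would be valid for a different merchant.
    mac = hmac.new(card_secret, merchant_domain.lower().encode("utf-8"), hashlib.sha256)
    # Reduce the MAC to a fixed number of decimal digits, zero-padded.
    value = int.from_bytes(mac.digest(), "big") % (10 ** digits)
    return f"{value:0{digits}d}"


secret = b"demo-secret-not-a-real-card"
token_a = merchant_token(secret, "shop.example")
token_b = merchant_token(secret, "other.example")
```

A leaked `token_a` would only be honored for `shop.example`, so a breach at one site would not expose a number that works anywhere else.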

Also it is insane that someone can steal my identity by simply knowing my
social security number. The right way to solve this would be to have an
identity provider that has a short 10 second video of myself on file. Then,
when I want to sign up for a credit card or bank account, I take a 10 second
video of myself using my cell phone, granting approval to open the account. A
staffer at the credit card company then compares the video with the video on
file with the identity provider, and verifies that it matches. The identity
provider also sends a message to an email address or mobile number on file, so
that I am alerted that someone is opening an account in my name. Using these
two simple safeguards, identity theft would be much, much harder. A video
recording of a person is very hard to fake, much harder to fake than a
signature.

A final key innovation would be if email providers would make it super-easy to
generate aliases per site. I do this myself manually with Fastmail, but if
there were a simple browser extension that would automatically create an alias
and fill in a form, that would be great, because I could have a unique address
that all funnels into one place, for everything I sign up to.
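
A rough sketch of what such an extension might compute, in Python. The function name `site_alias`, the secret, and the `example.com` mail domain are placeholders I made up; this is not Fastmail's API. It assumes the mail provider delivers every alias at the domain to one inbox.

```python
import hashlib
import hmac


def site_alias(secret: bytes, site: str, domain: str = "example.com") -> str:
    """Build a per-site email alias (hypothetical format: label-tag@domain)."""
    site = site.lower()
    # Readable part so I can tell at a glance which site an alias belongs to.
    label = "".join(c for c in site if c.isalnum())[:16]
    # Short keyed tag so a spammer cannot guess my alias for another site.
    tag = hmac.new(secret, site.encode("utf-8"), hashlib.sha256).hexdigest()[:8]
    return f"{label}-{tag}@{domain}"


alias = site_alias(b"demo-secret", "shop.example")
```

If an alias starts receiving spam, I know which site leaked or sold it, and I can block that one address without touching the rest.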

~~~
a-saleh
Is this a US specific thing? Why would you need to keep your SSN and bank
account number private?

Ok, I know US citizens are not automatically given ID cards, so if everybody
takes the SSN you give them at face value, I get that.

I don't understand the bank account especially. Like I have some automatically
deducted monthly payments, but I remember I needed to specifically authorize
the receiving account to be able to ask for the money with my bank.

With cards, the standards are starting to get there, i.e., I can arrange with my
bank that every time I use the card for an internet payment, I need to confirm my
identity with a code they send me by SMS. As far as I know, I could ask for a
different second factor of authentication; I know my dad has a standard RSA
token.

Unfortunately I had a problem using this with some foreign site (I think it was
Amazon?), so I had to disable it. I live in the Czech Republic.

~~~
toast0
> Is this a US specific thing? Why would you need to keep your SSN and bank
> account number private?

For SSN, if you have good credit, an SSN and a name are basically all that's
needed to open a new account connected to your general credit record. If the
account was opened in your name without your consent, it's a lot of work to
get it disassociated from you.

For bank account numbers, most payments are processed through the 'automated
clearing house', which is fancy check clearing. In the old days, maybe your
bank would look at the check presented and return it without payment if they
could tell it wasn't legitimate / your signature wasn't right. With an
electronic withdrawal, there's not really any information provided to them to
check anything.

~~~
eru
The poster you replied to knows the answer you've given. The question was
rather: "Why is the system set up in such a way that this is the case?"

------
makeitsuckless
Schneier is missing one major reason why companies keep data: regulation.

So many regulatory bodies and laws require companies to keep all kinds of
data for all kinds of reasons for a wide variety of periods, so that simply
having a policy to "store _all_ the things" is way, way simpler to implement
than to carefully study and adhere to each individual rule.

Nothing really new here, even before cheap storage and ubiquitous computers,
companies kept boxes and boxes of all the paperwork ever, just in case some
audit may require them to dig it up. Only physical limitations sometimes
caused them to throw away stuff labeled "a decade ago", and today there simply
is more data and zero incentive to destroy it.

~~~
aethertron
Good point, here's an example: EU VAT, which obliges companies selling digital
goods in Europe to store customer and transaction details for 10 years.

[https://www.gov.uk/guidance/register-and-use-the-vat-mini-on...](https://www.gov.uk/guidance/register-and-use-the-vat-mini-one-stop-shop#records-you-need-to-keep)

~~~
ross-life
How does this work with digital stores (Steam/App Store/Play Store)? Do you
even get that data from them as a developer?

~~~
aethertron
I think those stores take care of VAT and all the requirements around that, so
the developer doesn't need to worry about it. That's what the 30% cut is for.

------
whatnotests
Securing data is a daunting state to be in; it's not a task, but a lifestyle.

Out of 100 average web developers only a handful take security into account
during design and fewer still think and work through what's necessary to keep
anything safe at all.

It's no wonder that popping servers is so trivial and even high value targets
with dedicated security teams and constant proactive threat response get pw0nd
daily.

~~~
x5n1
It has nothing to do with web developers and everything to do with the insane
state of web development and software development in general. You want a
full-stack developer who also deals with security? As they say, good luck with that.
Knowledge in web development is always incomplete because there is simply not
enough time to learn everything. So web developers and software developers are
necessarily hackers. All of them. It is impossible for a software developer to
be an expert because the domain of knowledge is beyond an individual's
comprehension. So good luck adding security to the repertoire of already
overworked people who are doing the jobs of 3 different people and usually
earning a pittance for all the work they are doing.

You want security, then hire a security expert to oversee the development and
ensure security. Pay him or her 60,000-70,000 to ensure that. Otherwise forget
about it, not going to happen. Your developer is already too busy as he or she
is.

~~~
vbezhenar
Security is easy. Do not trust user input. In any framework or library just
identify user inputs and treat them carefully. Hard part is to remember that.
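
One concrete instance of treating input carefully, as a small Python sketch using the standard `sqlite3` module. The `users` table and the `find_user` helper are made up for the example. The user-supplied value is passed as a bound parameter and never pasted into the SQL string.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up users by name with a parameterized query."""
    # The "?" placeholder makes sqlite3 treat `name` strictly as data,
    # so quotes inside it cannot change the structure of the query.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Throwaway in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

honest = find_user(conn, "alice")
injected = find_user(conn, "' OR '1'='1")  # classic injection attempt
```

Here `honest` contains the single row for alice, while `injected` is empty because the attack string is simply compared against the `name` column as text. This covers only injection, which is one category among several.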

~~~
voltagex_
No, no it's really not.

[https://www.owasp.org/index.php/Top_10_2013-Top_10](https://www.owasp.org/index.php/Top_10_2013-Top_10)
and that's _only_ Web security.

~~~
vbezhenar
Almost all of those items are exactly what I wrote: do not trust user input.

~~~
voltagex_
There has to be a middle ground between saying "security is simple" and "the
sky is falling".

I think stating things like "don't trust user input" risks things like
[https://kivikakk.ee/cryptography/2016/02/20/breaking-homegro...](https://kivikakk.ee/cryptography/2016/02/20/breaking-homegrown-crypto.html) happening.

Security is hard, programming is hard, we should all get better at both.

------
jjwiseman
Maciej Ceglowski has been saying the same thing:
[http://idlewords.com/talks/haunted_by_data.htm](http://idlewords.com/talks/haunted_by_data.htm)

------
zmmmmm
This is a great article, and I hope it gets read widely. I love the phrase
"toxic data spill". We won't have reached maturity in the IT world until it
becomes completely accepted and assumed that your systems _will_ be broken
into and whatever data is accessible there _will_ be stolen. Only when we
start designing with that in mind as a first principle will we actually have a
chance of making people safe. For now, virtually every system I come across is
designed around the principle that nobody bad will ever get in, and all we
have to focus on is layers of encryption and network security to stop them -
it is honestly just ludicrously naive. Even with perfect security, one day
someone you _let_ in will turn bad and expose data.

------
stretchwithme
I'd like to see 2 or 3 or 4 step authentication in place whenever someone
tries to USE my data.

Your data is all over the place, in many hands. While it should be protected, it
should also be much harder to use it to pretend to be you.

You should be able to set up 0 or 1 or 2 step authentication for trivial
purchases, 3 step for larger purchases or accessing credit, or even 4 step
authentication for things like buying a car or house.

Some steps could be approval required or denial required. It's enough to be able
to deny the purchase of a latte, but you might want to always have to approve
spending thousands of dollars.

And we need to be able to set up new kinds of authentication steps, like
fingerprints or the approval of one or more trusted relatives for an older
person or child. Or even use a notary public. And you might have to use more
of these if you are away from home.

And none of this should be mandatory, but there should be sensible defaults
that individuals can change. AFTER being well authenticated, of course :-)
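
As a rough sketch of what those defaults could look like, here is a small Python policy table. The dollar thresholds, the `Tier` type, and `required_steps` are invented for illustration; a real system would let each person replace the table with their own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_amount: float  # inclusive upper bound for this tier, in dollars
    steps: int         # number of authentication steps required


# Hypothetical defaults: trivial purchases need little, big ones need more.
DEFAULT_TIERS = (
    Tier(max_amount=10, steps=0),
    Tier(max_amount=100, steps=1),
    Tier(max_amount=1_000, steps=2),
    Tier(max_amount=10_000, steps=3),
)
MAX_STEPS = 4  # anything above the last tier, e.g. a car or a house


def required_steps(amount: float, tiers=DEFAULT_TIERS) -> int:
    """Return how many authentication steps a purchase of `amount` needs."""
    for tier in tiers:  # tiers are assumed to be sorted by max_amount
        if amount <= tier.max_amount:
            return tier.steps
    return MAX_STEPS


latte = required_steps(4.50)
house = required_steps(250_000)
```

With these defaults a latte goes through with no extra step, while a house purchase needs all four.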

If we raise the difficulty level of stealing MOST people's identity, this will
largely solve this problem, especially for those most wanting to solve it.

------
CM30
Or in the simplest terms possible, the best 'private' service or site is the
one that doesn't store any personal information for its users. If you're
running a site like Ashley Madison, then don't store real names and
information. Same with if you're running an anonymous message service, an
anonymous email service, etc.

That's not some shocking new thing. Forums and other such sites have been
letting people sign up with no more than an email address and password for
years. And the payment stuff on these sites and services could easily go
through PayPal or some other third party provider (who's likely got a much
more secure system setup than you).

But no, a lot of sites and companies and services seem to be all 'let's store
everything about everyone, and then wonder why it causes a meltdown when the
site gets hacked and said data leaked all over the internet'.
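
For the email-and-password signup described above, a minimal Python sketch of how little needs to be stored. The `Account` fields, the `payment_ref` string, and the iteration count are assumptions for the example; the payment reference stands in for whatever opaque ID a third-party processor hands back.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

PBKDF2_ITERATIONS = 200_000  # assumed value; tune for your hardware


@dataclass
class Account:
    email: str
    salt: bytes
    password_hash: bytes
    payment_ref: str  # opaque ID from the payment provider, no card data


def _hash_password(password: str, salt: bytes) -> bytes:
    # Salted, deliberately slow hash so a stolen table is costly to crack.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS)


def create_account(email: str, password: str, payment_ref: str) -> Account:
    salt = os.urandom(16)
    return Account(email, salt, _hash_password(password, salt), payment_ref)


def check_password(account: Account, password: str) -> bool:
    # Constant-time comparison to avoid leaking how many bytes matched.
    return hmac.compare_digest(account.password_hash, _hash_password(password, account.salt))


acct = create_account("user@example.com", "correct horse", "proc_12345")
```

If this table leaks, the attacker gets email addresses and slow-to-crack hashes, but no real names and no card numbers.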

------
KirinDave
He's right that data volumes have non-linear risk profiles.

He's wrong that there is evidence more data isn't better. While there are
indications of this for advertising, it is definitely not the case for
financial data.

And the other subtlety is that lots of low quality data is indeed useless, but
small sums of high value data can do a lot. That high value data is what
people are looking to steal. Having a little bit of user financial transaction
data, for example, is incredibly powerful. Much more so than, say, cross-
website shared cookies or Amazon referral patterns.

And there is a whole class of data that has value proportional to the total
sum of it you possess. A good example of that is surveillance data. Ubiquitous
video coverage of an environment is much more useful (to both machines and
humans) than partial coverage.

------
rl3
The trick is to just collect and store data at exabyte scales like Google or
the NSA.

That way when there's a breach, it's impractical for the attackers to
exfiltrate the complete dataset because the target probably represents a non-
trivial percentage of the world's storage capacity.

The attackers can filter the data, but _surely_ someone's going to notice ten
thousand machines whirring away at odd MapReduce queries.

~~~
Terribledactyl
Let's say all of the most sensitive data of a person can fit in 20 KB: SSN,
CC, bank, your dog's high school's mascot's sweetheart, etc. The entire US
would just about fit on a 6 TB drive.

The rest of the data is not that valuable in comparison.
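
The back-of-the-envelope arithmetic behind the drive-size estimate, as a short Python check. The population figure of 320 million is my own round-number assumption, and the units are decimal (1 KB = 1,000 bytes, 1 TB = 10^12 bytes).

```python
BYTES_PER_PERSON = 20_000      # 20 KB of sensitive data per person
US_POPULATION = 320_000_000    # assumed round figure

total_bytes = BYTES_PER_PERSON * US_POPULATION
total_tb = total_bytes / 10**12  # decimal terabytes

print(f"{total_tb:.1f} TB")  # prints "6.4 TB"
```

So it is slightly over 6 TB under these assumptions, which is close enough to "just about".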

~~~
rl3
> _your dog's high school's mascot's sweetheart_

I'm not sure what that is exactly, but it sounds really sensitive. :)

In all seriousness, while the release of banking and identity information is
certainly bad, I'd argue the contents of private communications or
browsing/search history are potentially far more damaging for a lot of people.
In order to include that in your hypothetical, it'd require either a lot of
filtering or the 6TB number would balloon quite a bit.

~~~
Terribledactyl
> I'm not sure what that is exactly, but it sounds really sensitive.

I was poking fun at "security" questions.

These bits of data are gatekeepers. If I have your AOL account password, I get
all of those for free.

(So I think we're in agreement, my SSN isn't controversial to my friends,
employer, family, news, etc, but most people have probably had conversations
or searches that could look really bad)

