
GDPR vs. Blockchain: Technology vs. The Law - velmu
https://blog.trendmicro.com/gdpr-vs-blockchain-technology-vs-the-law/
======
Klathmon
My issue with Blockchain and "information laws" (not just GDPR, but any law
which prohibits or controls information) is that you ("you" being the
creator/maintainer of the blockchain) can never mess up.

If someone puts "illegal information" directly in your blockchain, you either
need a way to roll-back the change (which removes most/all of the benefits of
an append-only structure like a blockchain), or you need to be okay with
breaking the law.

That someone could be a blackhat that wanted to publish a bunch of personal
info, a disgruntled employee, or just regular users if you allow arbitrary
input.

I don't know of any kind of real solution, and at some point someone is going
to push the limits and make it obvious that "illegal information" is on a
blockchain, and we (as society) will have to come up with a way to deal with
it.

~~~
petertodd
> you either need a way to roll-back the change (which removes most/all of the
> benefits of an append-only structure like a blockchain)

Blockchains are not inherently immutable or even append-only. What they
guarantee (assuming the consensus scheme is working) is that you'll be _aware_
of all the data in the chain _or_ the fact that some of it is missing; the
idea that blockchains are inherently immutable is a big misconception that
treats blockchains like they're magic.

In a database, if you are forced to delete data, you'll go ahead and delete
that data. You'd probably also record the fact that you deleted that data in
some kind of auditing log to be able to audit compliance with those deletion
requests, as well as detect illegitimate deletions.

In a blockchain, you can do the exact same thing, but with better controls:
add a block to the blockchain that contains machine-readable instructions
telling all nodes processing the blocks to delete the data if they have a
copy.

Once every node has done that, even though the blockchain still _commits_ to
that data, the data itself is unavailable, thus complying with the requirement
to delete it. Essentially we now have a dangling pointer: there will be a
digest in the chain, but the data that was hashed to produce the digest is no
longer available. The blockchain is of course no longer able to be validated
in the same way as before, but what you can validate is the fact that the
deletion process was followed properly. E.g. you might have rules that state
2-of-3 senior admins have to use their signing keys to sign off on a deletion
request.
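
The deletion flow described above can be sketched roughly as follows. This is a minimal illustration, not anyone's production design: the `Chain` class, the admin names, and the use of plain name sets in place of real signatures are all assumptions made for the example.

```python
import hashlib
import json

ADMINS = {"alice", "bob", "carol"}  # hypothetical senior admins
QUORUM = 2                          # the 2-of-3 rule mentioned above
GENESIS = "0" * 64


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def block_hash(block: dict) -> str:
    # Hash every field except the stored hash itself.
    body = {k: v for k, v in block.items() if k != "hash"}
    return digest(json.dumps(body, sort_keys=True).encode())


class Chain:
    def __init__(self):
        self.blocks = []    # headers: commit to payload digests, never raw data
        self.payloads = {}  # digest -> raw data, kept outside the hashed headers

    def _append(self, body: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block = {"prev": prev, **body}
        block["hash"] = block_hash(block)
        self.blocks.append(block)

    def add_data(self, data: bytes) -> str:
        d = digest(data)
        self.payloads[d] = data
        self._append({"type": "data", "payload": d})
        return d

    def add_deletion(self, target: str, approvers: set) -> None:
        # Stand-in for verifying real signatures from the admins' keys.
        if len(approvers & ADMINS) < QUORUM:
            raise PermissionError("deletion needs 2-of-3 admin approval")
        self._append({"type": "delete", "target": target,
                      "approvers": sorted(approvers & ADMINS)})
        self.payloads.pop(target, None)  # the digest stays, the data goes

    def audit(self) -> bool:
        prev, deleted = GENESIS, set()
        for b in self.blocks:  # pass 1: links, hashes, deletion approvals
            if b["prev"] != prev or b["hash"] != block_hash(b):
                return False
            if b["type"] == "delete":
                if len(set(b["approvers"]) & ADMINS) < QUORUM:
                    return False
                deleted.add(b["target"])
            prev = b["hash"]
        for b in self.blocks:  # pass 2: data is present, or properly deleted
            if b["type"] != "data":
                continue
            data = self.payloads.get(b["payload"])
            if data is None:
                if b["payload"] not in deleted:
                    return False
            elif digest(data) != b["payload"]:
                return False
        return True
```

With this sketch, `audit()` still passes after an approved deletion, but fails if a payload disappears without a matching deletion block.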

A similar example of this principle exists in Git: you can do a shallow
clone that doesn't contain all prior history. Recentish versions of Git
handle this quite well: every operation will succeed _if_ it doesn't happen to
process the missing history. Similarly, if every git repo is in fact a shallow
repo, with some history deleted, that data will be gone.

At the low level, this works because practically all blockchains make use of
merkle trees. Rather than the hash of a block b_n being calculated as
Hash(b_n), you instead have a header that commits to a merkle tree of data
items. Thus you can still validate that the headers are connected without
actually having all the data (extra credit problem: is this always a good
thing?).

1) By "commit", we mean that the blockchain headers contain a hash digest of
some other data; the headers are _committed_ to the data they hashed, because
if that data was changed the hash digest would change, rendering the headers
invalid.
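
A rough sketch of the header-plus-merkle-tree idea, for illustration only. The helper names are made up for this example, and duplicating the last node on odd levels is just one common convention, not something the comment above specifies.

```python
import hashlib

GENESIS = b"\x00" * 32


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_digests: list) -> bytes:
    # Fold pairs of digests upward until a single root remains.
    if not leaf_digests:
        return h(b"")
    level = list(leaf_digests)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def make_header(prev_hash: bytes, items: list) -> dict:
    # The header commits to the items only through the merkle root.
    root = merkle_root([h(item) for item in items])
    return {"prev": prev_hash, "root": root, "hash": h(prev_hash + root)}


def headers_connected(headers: list) -> bool:
    # Needs only the headers; no raw data items are touched.
    prev = GENESIS
    for hd in headers:
        if hd["prev"] != prev or hd["hash"] != h(hd["prev"] + hd["root"]):
            return False
        prev = hd["hash"]
    return True
```

A node that has deleted an item but kept its leaf digest can still recompute `merkle_root` and match the header, which is the "dangling pointer" situation described earlier.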

~~~
Klathmon
I'd argue at that point it's not a "blockchain", since you are removing one of
the major points for using the data structure (the ability to verify the chain
without trust).

If you are just going to rely on a central authority to say which blocks
should be "ignored" (in other words, which hashes should just be trusted),
then why use a blockchain at all? It's just a very slow, complicated,
expensive append-only log file.

The only benefit a blockchain provides (for the sake of this discussion) is
its ability for someone to verify it without trust from the genesis block all
the way to the most current. You take that away, it's not a blockchain
anymore.

~~~
petertodd
> If you are just going to rely on a central authority to say which blocks
> should be "ignored" (in other words, which hashes should just be trusted),
> then why use a blockchain at all?

Even if I ultimately have to kowtow to the central authority, I still want to
be able to audit what they're doing. Similarly, there is no such thing in the
real world as a trusted central authority: the central authorities themselves
are groups of people who don't fully trust each other, and thus want to be
able to do effective internal auditing.

Your line of argument could also be used to argue that we don't need to
audit banks at all.

> It's just a very slow, complicated, expensive append-only log file.

The reason why Bitcoin is slow and expensive is because of the proof-of-work
and decentralization, not because of the blockchain itself.

Distributed databases already make use of complicated log files to achieve
consensus between nodes and for auditing; adding hashes to the entries in
those log files is relatively easy, and as cryptographic hashing is quite
fast, doing that doesn't incur much of a performance hit. Most importantly, in
a centralized environment, you can fairly easily use standard sharding
techniques to make all of this scale well - which is the same way very large
databases scale anyway.

~~~
Klathmon
That's not my argument, my argument is that using a blockchain as your
underlying data structure is entirely pointless if you have a way to
selectively remove blocks. I'm saying nothing about how we should audit other
things, just that a blockchain's "audit method" is the ability to follow the
chain from the genesis block to current without any trusted 3rd party or
outside information. You remove that ability, you've removed the point of
using a blockchain in general. I don't care how other central authorities
audit themselves, that's completely off topic, I care that a blockchain is by
definition a chain of blocks.

A simple list of "data", its hash, and a counter would be better for what you
want in just about every way. You'd still be able to see that data was
modified or removed, and it would be faster, simpler, and easier to work with.

> The reason why Bitcoin is slow and expensive is because of the proof-of-work
> and decentralization, not because of the blockchain itself.

I never said anything about Bitcoin. I'm saying that using a blockchain while
throwing out the "chain" part is more complicated, slower, and more expensive
than using a simple append-only log file.

~~~
petertodd
> I don't care how other central authorities audit themselves, that's
> completely off topic

We're talking about GDPR here, which is most important in the context of
centralized systems whose operators have to worry about compliance with
European law; really, it's easier to argue that the fully trustless systems
you're imagining - what Bitcoin attempts to be - are what are off topic here.

> A simple list of "data", its hash, and a counter would be better for what
> you want in just about every way. You'd still be able to see that data was
> modified or removed, and it would be faster, simpler, and easier to work
> with.

If you implement this in a real system, you'll need to rehash all the data any
time any of the data is changed. The obvious optimization there is to only
hash new data plus a commitment (hash digest) of the old data, at which point
you have a blockchain.
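
To make the optimization concrete, here is a small sketch comparing the two approaches. The function names and the all-zero starting value are assumptions for illustration.

```python
import hashlib

GENESIS = b"\x00" * 32


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def full_rehash(entries: list) -> bytes:
    # Naive approach: every change re-reads and re-hashes every entry.
    return h(b"".join(h(e) for e in entries))


def extend(head: bytes, new_entry: bytes) -> bytes:
    # Chained approach: hash only the new entry plus the previous commitment.
    return h(head + h(new_entry))


def chain_head(entries: list) -> bytes:
    head = GENESIS
    for e in entries:
        head = extend(head, e)
    return head
```

Appending with `extend` costs one hash over the new entry regardless of how long the log already is, yet the resulting head still changes if any earlier entry is altered.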

> I never said anything about Bitcoin. I'm saying that using a blockchain
> while throwing out the "chain" part is more complicated, slower, and more
> expensive than using a simple append-only log file.

How do you know the append-only log file was actually append-only? You'll want
redundancy, which means multiple systems must have their own copies of that
log file.

Next you'll want auditors to be able to challenge those systems, which is
often easiest if you have those systems sign cryptographic statements about
what should be in that log file. Real-world systems have log files whose state
changes over time, thus you'll want the chain-of-blocks optimization so you
don't have to re-download and re-hash everything every time the state changes
to verify the latest set of signatures for the log file.

And at that point, you have a blockchain whose consensus is defined by a
centralized quorum of trusted but auditable entities.

------
DennisP
Pretty reasonable actually:

> GDPR does not prohibit blockchain, but it does put some procedural
> requirements around blockchain’s use in commercial enterprises. For
> individuals who opt into a blockchain, there is no authority to amend or
> correct a block once it is incorporated into the chain. For them, caveat
> emptor. For organizations, make sure you have a mechanism that will allow
> you to disassociate an individual with their blockchain contributions

~~~
joering2
Wow, the blockchain can actually become a great loophole! Concerned about
privacy of your users? Just upload their data to blockchain and then use this
information any way you want directly from the blockchain. I mean, GDPR won't
roll back the blockchain, can it now?

~~~
knorker
IANAL, but pointers to PII are still PII. If you can go from "description or
address" to "human" then it's PII.

~~~
Boulth
Yes.

> Personal data that has been de-identified, encrypted or pseudonymised but
> can be used to re-identify a person remains personal data and falls within
> the scope of the law.

Solution:

> Personal data that has been rendered anonymous in such a way that the
> individual is not or no longer identifiable is no longer considered personal
> data. For data to be truly anonymised, the anonymisation must be
> irreversible.

Source: https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en

~~~
knorker
So you're agreeing? If someone else has the handle->PII mapping then the
handle is PII too, and if you can't scrub the handle, then you are in
violation of GDPR.

~~~
DennisP
According to the article, you only have to scrub the actual PII, not the
pointer to it. A pointer that points to nothing is fine, even if it used to
point to PII.

And even that is only the case if it's an enterprise holding the PII; if it's
out there in the world in a P2P context then the law recognizes that nothing
can be done about it.

~~~
knorker
I'm not so sure. An IP address is PII, not because it's about you but because,
when combined with other databases, it identifies you.

I'd argue that you're still _identifiable_ if this pointer can be converted to
your PII. Even if you point to someone _else's_ database, as long as it's not
been scrubbed.

Again, IP address is PII even though I don't have address-to-human mappings.
My understanding is that this is because their ISP does.

Maybe somewhere the word "identifiable" is defined in the legal context?

In any case we agree that whatever it means, it will not allow me to outsource
PII-storage to China, and have mere random "handles" in their database, right?
Because then the law is completely void through a loophole.

~~~
DennisP
I think the scenario is, you have the PII in your database and the blockchain
has "RecordNumber: 123", and then you delete all the PII in record 123 so
it's just a meaningless number.

If there's another copy of the PII somewhere, then you haven't deleted it. If
the PII was actually public on a P2P network, instead of held by a company,
then they're saying the law doesn't apply.

~~~
knorker
Ok. But why would that be the law? Say my reference number is public on the
internet, and the reference number is in a transaction log.

That means that the transaction log has information about me, that is not
anonymized. Thanks to this reference log you can see my purchase patterns.

And isn't "a database of people's purchase patterns" the exact thing GDPR is
meant to control?

------
vilhelm_s
This applies to all uses of Merkle trees---blockchains are one example, but
someone[1] pointed out that git is another. Each git repo typically contains
the names and email addresses of people who committed, and you can't rewrite
history without disrupting anyone else who is working with the repo.

Does this now mean that github is in violation of the GDPR, since they cannot
retroactively delete people's personal information?

[1] https://social.heldscal.la/notice/7460829

~~~
knorker
> Each git repo typically contains the names and email addresses of people who
> committed, and you can't rewrite history without disrupting anyone else who
> is working with the repo.

But you _can_ do it. You can disrupt people, but _shrug_. You do the same
thing in git to remove copyrighted content from your repo. You can't tell The
Law (copyright law and/or GDPR) that "sorry, can't do it because that'd be
inconvenient to people".

The problem is that if you do it to bitcoin it breaks.

> Does this now mean that github is in violation of the GDPR, since they
> cannot retroactively delete people's personal information?

Sure they can.

First of all github may not be the legal "owner" (I forget the term), the user
is. Second, if github were told by a court to remove the content, of course
they can.

This is even scalable, to github:

"Hello N. N. Due to GDPR request XYZ your repo A is in shallow-clone mode
until you remove Y piece of PII. Click here to automatically
git-filter-branch."

To say they can't remove it is like saying you cannot remove PII because your
business's domain name is TheresaMayIsABigFatTurd.com. That's not going to
fly.

------
jasode
_> Under GDPR, an organization that constructs a blockchain may have to remove
a block or modify some data to comply with a request to forget someone.[...]
GDPR does not prohibit blockchain, but it does put some procedural
requirements around blockchain’s use in commercial enterprises._

If "organizations" and "commercial enterprises" means entities like Goldman
Sachs' proposed blockchain, or Kodak's cryptocurrency, then the GDPR rules
aren't really that radical, nor do they create contradictions of compliance. As the
blog says, just use a record id (a db "foreign key" id) instead of storing the
raw data directly in the blockchain. The record pointer can either point to
real data in a lookup table or point to null. The "erase" only works because
the organization controls the "lookup table" that the record ids point to.
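
A minimal sketch of that record-id pattern, purely to show the mechanism (whether the remaining id still counts as personal data is a legal question this code does not settle). The names `ledger`, `lookup_table`, `append_entry`, `resolve`, and `erase` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64
ledger = []        # append-only, hash-chained entries; holds record ids only
lookup_table = {}  # off-chain and mutable; record id -> personal data or None


def append_entry(record_id: int, event: str) -> str:
    # The hashed entry contains the record id, never the raw personal data.
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"prev": prev, "record_id": record_id, "event": event}
    entry_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    ledger.append({**body, "hash": entry_hash})
    return entry_hash


def resolve(record_id: int):
    # Returns the personal data, or None once it has been erased.
    return lookup_table.get(record_id)


def erase(record_id: int) -> None:
    # The ledger is untouched; only the off-chain target becomes null.
    if record_id in lookup_table:
        lookup_table[record_id] = None
```

After `erase(123)`, every ledger hash is unchanged, but `resolve(123)` returns `None`.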

Instead of that commercial scenario, what people wonder about are the public
blockchains such as Bitcoin and Ethereum. Has there been definitive writing on
that subject that concludes GDPR doesn't apply to those? Since there's no
central authority in charge of those blockchains, any private data can't be
erased as easily.

~~~
knorker
IANAL:

Bitcoin isn't cash[1] (or money?), so presumably doesn't have financial
retention requirements.

But it is data about my behaviour, purchase transactions, donations,
interests, etc.

Which is PII.

Which means (to me) that I should be able to demand all transactions related
to wallet xyz be purged, not just the "additional data" associated with the
transactions.

As we've learned, anonymizing data is hard. Even with IP addresses removed,
people have been identified.[2]

[1] https://www.youtube.com/watch?v=p9HH_dFcoLc
and other references and actual court decisions saying it's not money.

[2] I think it's this one I'm thinking of:
https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/

~~~
jasode
_> I should be able to demand_

If GDPR applies to Bitcoin, __who__ would you demand the purge action from?

There are an estimated ~10000 pseudonymous Bitcoin full nodes[1] all
updating a ~170 GB append-only database[2].

Where does the GDPR compliance demand get sent? To the bitcoin user forum
where some (not all) of the developers hang out? Or to the maintainer of the
github repo for C++ bitcoin client? What would be the procedure? Make those
10,000 nodes update and download new client software that "forks" the
blockchain for each GDPR erasure request? Why would Chinese miners care about
compliance with EU laws like GDPR?

[1] https://bitnodes.earn.com/

[2] https://blockchain.info/charts/blocks-size

~~~
hartator
> If GDPR applies to Bitcoin, _who_ would you demand the purge action from?

It's not doable for sure. However, one can argue you can find all the node
IPs, subpoena their ISPs, sue every company and individual into compliance,
jail the ones who don't obey, and blacklist IPs from non-cooperative
jurisdictions. Fun times.

~~~
knorker
It's doable for a very useful definition of "do".

A good comparison is "the pirate bay" and copyright infringement.

You can't shut down all copyright infringement, but starting a company in 2018
to spread and share other people's copyright is not "a good idea", or at least
you know what you'll be getting into.

GDPR, likewise, is meant to stop dealing in other people's PII without consent
(and consent can be withdrawn).

When you see "PII", think copyrighted content. You can't say "because
algorithms I'm not responsible!".

------
rwcarlsen
Would the GDPR have any effect if an anonymous 3rd party installed personal
info into the blockchain against the wishes of people who now want it to be
forgotten?

~~~
knorker
IANAL, but why would it matter _how_ the PII got into some company's
computers? GDPR demands that the person in question can have it removed.

Otherwise companies could just go "yeah we have everyone's personal data, but
that was done by a previous employee and we fired her, so we don't have to
delete it".

~~~
rwcarlsen
But in the case of the (bitcoin or other public) blockchain - there is no
person or organization you can go to to have it removed. It is a distributed
p2p ledger/database with individuals and orgs all over the world in every
jurisdiction storing and sharing the data.

~~~
knorker
Yes there is.

- "Are _you_ storing my data?"
- "Uhm, technically… I mean it's on our hard drive"
- "Delete all my data now. Please and thank you"

~~~
fixermark
"I'm in China, request denied."

~~~
knorker
Sigh.

1) Sounds great! Hand over all bitcoin control to the Chinese jurisdiction.
Problem solved, eh?

2) Hey CoinBase, Merchant X, etc… you are acting as a front to dealing in EU
citizens' PII, cease and desist with all that.

~~~
fixermark
Yep. I don't know that the EU intended to create a perverse incentive to move
cryptocoin experimentation out of the European jurisdiction, but here we are.

Given how anti-authoritarian the userbase for most cryptocurrencies is, I
wouldn't be surprised if this result is---while not intentional---certainly
not something EU regulators would worry too much about.

~~~
knorker
To me this sounds like "why make murder illegal in the EU? What perverse
incentive are they creating to move murder offshore to China?"

You can't defend doing bad in the world by saying it's not yet banned
everywhere. It's kinda defeatist. We can't allow women to drive because then
Saudis will have less incentives to come here to spend their money.

~~~
fixermark
The thing is whether we assume the behavior in question is bad.

I expect people in the EU still want to make murder illegal even if it "moves
it to China"; they don't want it in their backyard.

Did the people of the EU really not want cryptocurrency in their backyard?
Maybe they didn't, and this is all working as intended. I'm not sure they
intended this side-effect. "Right to be forgotten" may not be something one
actually wants to apply to questions like "Where did the mafia get all this
money?", even if we want to apply it to other spaces (which is why "follow the
money" is so often a useful technique for piercing obfuscatory smokescreens in
other data sets).

(Personal bias disclosure: I'm also kind of expecting they're going to find
the costs of the GDPR outweigh the benefits in general in the next five to ten
years, but I might discover I'm mistaken).

~~~
knorker
I'd say the people of the EU don't want their PII all over the place (even if
it's in China. Actually especially if it's in China). If you can make
cryptocurrencies not have that then that'd be great. Bitcoin very much does
not address this.

As for questions like "Where did the mafia get all this money from" this is an
argument against all cryptocurrencies, and for the financial regulation that
is in place.

Financial regulation exists because people want it, for the most part, and
either cryptocurrencies have to abide by it (which includes
trackability and reversibility) or they do not (which means GDPR applies).

I'd argue that no, there is pretty much nobody who wants cryptocurrencies
once they think about how, almost all the time, financial regulation does
benefit the individual.

I'm not saying GDPR is perfect, or even necessarily a step in the right
direction. But cryptocurrencies are definitely a step in the wrong direction,
going against what people want.

------
SiempreViernes
> A medical record would refer to “Corp-ID Client 192734.” If that person
> wished to be forgotten, the organization would re-assign that pseudo-ID to a
> null ID, eradicating the link from the person to the data.

Putting medical records into a global public database seems like it would
require more anonymisation work than just redacting the name for it to be
acceptable.

~~~
fixermark
Also, the process described will require nothing to ever denormalize the
record from the "handle" identifier to the "pointer" identifier.

Caching is hard, so good luck guaranteeing that.

------
kasperni
If the information does not need to be public, a simple solution is to encrypt
the data you put on the blockchain, and then just throw away the key if
somebody requests that you delete their information. Of course it cannot be
applied in all situations. But sometimes it will do the trick.
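
A toy sketch of this "throw away the key" idea. To keep it runnable with only the standard library it uses a one-time pad (a fresh random key as long as the data) as a stand-in; that choice, and the names `chain`, `key_store`, `put`, `get`, and `forget`, are assumptions for illustration. A real system would more likely use an authenticated cipher from a vetted library.

```python
import secrets

chain = []      # stands in for the immutable on-chain ciphertexts
key_store = {}  # off-chain and deletable; record index -> key


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def put(plaintext: bytes) -> int:
    # One-time pad stand-in: the key is random and as long as the data.
    key = secrets.token_bytes(len(plaintext))
    chain.append(xor(plaintext, key))
    index = len(chain) - 1
    key_store[index] = key
    return index


def get(index: int):
    # Returns the plaintext, or None if the key has been thrown away.
    key = key_store.get(index)
    return None if key is None else xor(chain[index], key)


def forget(index: int) -> None:
    # The ciphertext stays on the chain; without the key it is unreadable.
    key_store.pop(index, None)
```

After `forget(i)`, the chain is byte-for-byte unchanged but `get(i)` returns `None`.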

~~~
knorker
But an IP address is PII. So presumably a wallet address is PII. If so, then
all transactions are PII and subject to GDPR removals.

This seems harder to solve.

------
he0001
With GDPR the scope of what is actual personal information is broadened. This
means that an account number is considered personal information. How is any
variant of an ID in a block chain different from an account id?

~~~
Boulth
Account ID is only personal info if it can be tracked back to a person.

Source: https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en

~~~
he0001
As an example: if I have an account on a block chain, I will still know what it
is, and anyone who has been interacting with that account could still
identify the account. And GDPR says explicitly that it shouldn’t be possible
at _all_ to identify an individual, given any amount of time and
any resources. So I still think an “account” pseudo key in a block chain is
still under GDPR. But we’ll probably see those cases in court under various
conditions.

Source: paragraph 26
http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679&rid=1

------
SCdF
How is this specific to blockchains compared to any other kind of append-only
book-keeping?

~~~
Klathmon
Because blockchains have what most other append-only systems don't, a
mathematical requirement that the data NEVER change in order for the system to
continue running.

In an append-only system the enforcement comes from the software running
around the data, if you need to you can modify the data in the past, and tell
the software to just pretend it wasn't changed. You can't do that with a
blockchain without throwing the whole thing away, so it brings into question
what happens when legally you MUST remove information.

~~~
petertodd
A good way to explain it is that with a blockchain, the math guarantees that
you'll be able to _detect_ modifications to the data, or if some data is
missing, the _potential_ of modifications.

But even with a blockchain, the solution is the same: build an "escape hatch"
into the validation software that allows certain kinds of data modifications
and/or deletions to be ignored. The only difference is with a blockchain
you'll have stronger guarantees that auditors will be able to detect if that
has in fact happened.

It is true that certain types of blockchain cryptographic structures can
greatly limit the granularity of those options. But no-one actually uses those
types of blockchains (namely chains without per-block merkle trees), so that
point is moot.

