
Apache Kafka and GDPR compliance - Antwnis
http://www.landoop.com/blog/2017/12/apache-kafka-gdpr-compliance/
======
throwaway2016a
I'm interested in the right-to-be-forgotten section, but I'm confused as to
what this article is saying...

How exactly do you "forget" the data on the logs?

One interesting solution that kills two birds with one stone: encrypt the
personally identifiable information, then delete the private key if there is a
request to be forgotten. This has the added benefit of effectively destroying
the data in backup copies as well.

~~~
Antwnis
> How exactly do you "forget" the data on the logs?

If we think through the options, you have:

i) eventual deletion (log retention policy)
ii) compacted topics (and push null values)
iii) expensive re-processing of the entire log
iv) expensive segment re-write operation

Each option brings its own set of challenges.
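A simplified model of option ii), in Python: on a compacted topic only the latest value per key is retained, and producing a null value (a tombstone) for a key eventually removes that key entirely. Real Kafka compaction runs asynchronously in the log cleaner, never touches the active segment, and keeps tombstones for `delete.retention.ms` before dropping them; this sketch ignores all of that and compacts in one pass. The `compact` helper is hypothetical. It also shows the catch: a tombstone only erases records under the same key, so the user's data has to be keyed by user.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); a None value is a tombstone


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving log order.

    Simplified model of Kafka log compaction: real Kafka keeps a tombstone
    for delete.retention.ms before removing it; this sketch drops tombstones
    (and every earlier record for that key) immediately.
    """
    latest: Dict[str, int] = {}
    for offset, (key, _) in enumerate(log):
        latest[key] = offset  # later offsets overwrite earlier ones
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest[key] == offset and value is not None
    ]


log = [
    ("user-1", "alice@example.com"),
    ("user-2", "bob@example.com"),
    ("user-1", "alice@new.example"),
    ("user-1", None),  # the "forget user-1" tombstone
]
print(compact(log))  # [('user-2', 'bob@example.com')]
```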

~~~
nerpderp83
Encrypt with a user-specific key when the data enters the log. You can
effectively delete all of a user's data by throwing the key away. No
tracking down files or reprocessing necessary.
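A minimal sketch of this crypto-shredding idea, in Python using only the standard library. `KeyStore`, `encrypt` and `decrypt` are hypothetical names. The HMAC-SHA256 counter-mode keystream is a toy, unauthenticated stand-in for a vetted cipher such as AES-GCM, used only so the example runs without third-party packages. The key store has to live outside the log (and outside its backups); otherwise destroying a key does not remove it from the backed-up copies.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256(key, nonce || counter). Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out.extend(hmac.new(key, msg, hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class KeyStore:
    """Hypothetical per-user key store, kept outside the log and its backups."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def key_for(self, user_id: str) -> bytes:
        # Create a random 256-bit key the first time a user is seen.
        if user_id not in self._keys:
            self._keys[user_id] = os.urandom(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> Optional[bytes]:
        return self._keys.get(user_id)

    def shred(self, user_id: str) -> None:
        # "Forgetting" the user means destroying their key.
        self._keys.pop(user_id, None)


def encrypt(store: KeyStore, user_id: str, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)  # fresh nonce per record, stored in front of it
    ks = _keystream(store.key_for(user_id), nonce, len(plaintext))
    return nonce + _xor(plaintext, ks)


def decrypt(store: KeyStore, user_id: str, record: bytes) -> Optional[bytes]:
    key = store.get(user_id)
    if key is None:
        return None  # key shredded: the record can no longer be read
    nonce, body = record[:16], record[16:]
    return _xor(body, _keystream(key, nonce, len(body)))


store = KeyStore()
record = encrypt(store, "alice", b"alice@example.com")  # what goes into the log
print(decrypt(store, "alice", record))  # b'alice@example.com'
store.shred("alice")
print(decrypt(store, "alice", record))  # None
```

Every copy of `record`, in the log or in a backup, becomes unreadable at once, which is the appeal over rewriting segments.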

~~~
pcl
Is that acceptable within GDPR requirements?

~~~
nerpderp83
[https://en.wikipedia.org/wiki/Crypto-shredding](https://en.wikipedia.org/wiki/Crypto-shredding)

[https://law.stackexchange.com/questions/23375/gdpr-general-d...](https://law.stackexchange.com/questions/23375/gdpr-general-data-protection-regulation-crypto-shredding-or-regular-delete)

I believe it would be, but IANAL.

------
theptip
This "right to be forgotten" requirement is quite staggering in scope. Do I
need to dig out all of my offsite tape backups and re-transcribe them to edit
out my user's data every time a user requests to be forgotten?

Sibling comments mention a cunning scheme with encryption, but that doesn't
really help an enterprise with an existing non-GDPR-compliant backup archive.

~~~
numbsafari
My understanding is that the GDPR “right to be forgotten” does not cover
backups. There may be some exceptions, but there are practical limits on its
reach.

~~~
sulam
I believe your understanding is incorrect. GDPR certainly covers storage and
processing, and backups probably count as both.

Anyway, think about the spirit of the law, and then think about how that
interacts with backups. If someone asks to be deleted from your system, you do
so, and then you restore a backup with their data, you have clearly violated
the intent.

~~~
tscs37
Keep a log of deleted users and re-delete upon restore.

The GDPR contains exceptions for storage from which it is infeasible, or
beyond reasonable effort, to delete individual records, and for cases where
you have legal obligations to uphold.

~~~
jgeraert
Isn't the log of deleted users subject to the GDPR then?

~~~
hobofan
You can make a log of deleted users without it containing personally
identifiable information, by just storing the IDs.
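A minimal sketch of "re-delete upon restore" with an ID-only deletion log, in Python. The row layout (`user_id`, `email`) and the helper names are hypothetical. Every erasure request is re-applied to the restored rows before the data goes back into service, including requests made after the backup was taken.

```python
from typing import Iterable, List, Set


def record_erasure(deletion_log: Set[str], user_id: str) -> None:
    """Remember that user_id asked to be forgotten (the ID only, no other PII)."""
    deletion_log.add(user_id)


def restore_with_redeletion(backup_rows: Iterable[dict], deletion_log: Set[str]) -> List[dict]:
    """Restore rows from a backup, dropping rows of users who asked to be forgotten."""
    return [row for row in backup_rows if row["user_id"] not in deletion_log]


backup = [
    {"user_id": "u1", "email": "a@example.com"},
    {"user_id": "u2", "email": "b@example.com"},
]
deletion_log: Set[str] = set()
record_erasure(deletion_log, "u1")  # request arrives after the backup was taken
print(restore_with_redeletion(backup, deletion_log))
# [{'user_id': 'u2', 'email': 'b@example.com'}]
```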

------
alexatkeplar
We've been doing a lot of thinking about how to support GDPR at Snowplow
(Kafka and Kinesis, but plenty of other logs and stores too). For our first
phase we're just going to support irreversible pseudonymization of tagged PII:

[https://github.com/snowplow/snowplow/issues/3472](https://github.com/snowplow/snowplow/issues/3472)

For later phases, yes, user-specific encryption of PII or hashing with a
lookup table is the way to go...
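A minimal sketch of pseudonymizing tagged PII fields with a keyed hash (HMAC-SHA256), in Python. This is not Snowplow's implementation; `PII_FIELDS`, the field names and the `pseudonymize` helper are hypothetical. A secret is used because an unkeyed hash of something like an email address can often be reversed with a dictionary of candidate values. The same input and secret always give the same digest, so events can still be joined on the pseudonym, which is also why linkage attacks like the one described in the reply remain possible.

```python
import hashlib
import hmac

# Hypothetical tagging: which event fields count as PII.
PII_FIELDS = {"email", "ip_address"}


def pseudonymize(event: dict, secret: bytes) -> dict:
    """Return a copy of event with tagged PII fields replaced by HMAC-SHA256 hex digests."""
    out = {}
    for field, value in event.items():
        if field in PII_FIELDS and value is not None:
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()
        else:
            out[field] = value  # untagged fields pass through unchanged
    return out


event = {"email": "alice@example.com", "page": "/checkout"}
print(pseudonymize(event, b"hypothetical-secret"))
```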

~~~
brians
I wish you wouldn’t call it irreversible. Every large public claim of that
sort has proven false. Consider the Netflix case, where the separate IMDB
review dataset allowed reconstruction of pseudonymous movie watching records.

These approaches may help with compliance, but they’re the opposite of real
safety.

------
polskibus
I'm wondering whether anyone has thought about a GDPR extension covering
machine learning, i.e. being forgotten would mean the model "unlearning" my
data (or being retrained on a dataset from which my data was removed).

~~~
hobofan
I would consider that already covered under the GDPR. Most machine learning
approaches today make little to no guarantees about differential privacy and
allow for (partial) extraction of the training dataset, which would mean that
the request for deletion was never fully fulfilled.

~~~
polskibus
So do you mean that GDPR allows a request for removal from the model, or is
there an exemption for data-mining results?

~~~
hobofan
I think that it allows for a request for removal from the model unless it can
be proven that the PII cannot be retrieved from the model.

(This should not be considered legal advice by me.)

------
skyisblue
With GDPR do we need to get consent from users before we can set any cookies?

~~~
kbart
GDPR itself doesn't regulate cookie use. The "cookie law" is defined in the
ePrivacy Directive (2002/58/EC), which is to be replaced by the ePrivacy
Regulation, an addendum to GDPR. Actually, it's going to be a much saner
approach than the joke that the current "cookie law" is:

_"Simpler rules on cookies: the cookie provision, which has resulted in an
overload of consent requests for internet users, will be streamlined. The new
rule will be more user-friendly as browser settings will provide for an easy
way to accept or refuse tracking cookies and other identifiers. The proposal
also clarifies that no consent is needed for non-privacy intrusive cookies
improving internet experience (e.g. to remember shopping cart history) or
cookies used by a website to count the number of visitors."_[0]

To answer your question, _"do we need to get consent from users before we can
set any cookies?"_

It depends: yes for tracking cookies, no for others. How to tell them apart is
another question.

0\.
[https://en.wikipedia.org/wiki/EPrivacy_Regulation_(European_...](https://en.wikipedia.org/wiki/EPrivacy_Regulation_\(European_Union\))
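A sketch of the rule of thumb above as a server-side gate, in Python with the standard library's `http.cookies`. The cookie names in `TRACKING_COOKIES` and the `build_set_cookie_headers` helper are hypothetical. Deciding which cookies count as tracking is the hard part the comment mentions, and code cannot settle it; this only shows where such a decision would be enforced.

```python
from http.cookies import SimpleCookie
from typing import Dict, List

# Hypothetical classification: cookies treated as tracking and requiring consent.
TRACKING_COOKIES = {"_analytics_id", "ad_user"}


def build_set_cookie_headers(cookies: Dict[str, str], tracking_consent: bool) -> List[str]:
    """Return Set-Cookie header values, omitting tracking cookies without consent."""
    jar = SimpleCookie()
    for name, value in cookies.items():
        if name in TRACKING_COOKIES and not tracking_consent:
            continue  # skip until the user has opted in
        jar[name] = value
    return [morsel.OutputString() for morsel in jar.values()]


wanted = {"cart": "abc123", "_analytics_id": "xyz"}
print(build_set_cookie_headers(wanted, tracking_consent=False))  # ['cart=abc123']
```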

------
Sir_Substance
>The right to be forgotten, becomes one of the hardest challenges because of
data immutability. Apache Kafka does not support deleting records, and
although some eventual deletion is supported, it requires

This always seemed like an incredibly toxic decision to me. It's one that
crops up in all sorts of systems, large and small. What, none of these people
/ever/ foresaw the need to delete some data?

~~~
wiz21c
It's not that simple. For example, in my business we may give some money to
help someone "once in their life" (the law says so). So if that person asks to
be deleted, we could no longer apply the law, because we wouldn't remember the
decision... I think GDPR is a good thing, but at some point, in my business,
those who write the laws will have to become aware of this (and the legal
teams are miles away from the IT stuff, sadly).

~~~
tscs37
The GDPR offers exceptions to the right to erasure. These mostly cover legal
compliance (e.g. banks), legal claims, and data that cannot easily be deleted
as individual records. It also does not apply to non-digital documents that
aren't part of a filing system. This is all laid out very thoroughly in the
legal text.

~~~
wiz21c
I must admit I didn't read the section about removal thoroughly. But I did
read the articles about the "categories of data", which are the major pain
point right now 'cos they force you to, well, find appropriate categories of
data. It's a very interesting exercise but, in my organization, it leads to
many loooong discussions :-)

