
Archiving your website in the Wayback Machine - l1am0
https://blog.simon-frey.eu/posts/gdpr_how_to_close_your_blog_the_right_way/
======
JacobAldridge
Of course, if you're going to offload hosting, GDPR concerns, or just your own
backup peace of mind to the Internet Archive's Wayback Machine, don't forget to say
thank you - [https://archive.org/donate/](https://archive.org/donate/)

------
yoz-y
A few years back I removed all Like buttons, Disqus comments and analytics
from my blog. I also host the images myself and generate the links myself. It
took me maybe an hour, and only because I was browsing Reddit or something at
the same time. Now I do not need to do anything about GDPR.

~~~
pluma
Keep in mind you still need a privacy policy, even if it just says "we store
anonymised access logs which we delete after X months and that's all"
(paraphrased).

EDIT: Not sure why the downvotes. Prove me wrong if you disagree. Every lawyer
I've heard talk about the GDPR echoes the sentiment that even simple sites that
don't do anything fancy should have at least a minimal privacy policy.

There are certain requirements a site could theoretically meet to be entirely
exempt, but if you have a public-facing website in 2018 you're probably not,
even if it's a "digital business card".

Besides, the point is that a privacy policy that explicitly tells a visitor that
you're really not doing much at all with their data is a positive signal
whereas a missing privacy policy is more likely to indicate that a) you don't
care about your users' rights, b) you have no idea what you're doing or c) you
don't want to tell your users what you're doing with their data.
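For the "delete after X months" part of such a policy, a minimal sketch of a retention rule, assuming the log has already been parsed into `{ ip, time }` objects. `purgeOldEntries` and `RETENTION_MONTHS` are hypothetical names and the three-month value is an arbitrary example; on a real server this is more likely handled by log rotation (e.g. logrotate) than by application code.

```javascript
// Hypothetical retention rule for a policy like "we delete access logs after
// X months". Entries are assumed to be parsed already into { ip, time } objects.
const RETENTION_MONTHS = 3; // the "X" in the policy; 3 is an arbitrary example

function purgeOldEntries(entries, now = new Date()) {
  // Compute the cutoff by moving "now" back RETENTION_MONTHS calendar months (UTC).
  const cutoff = new Date(now.getTime());
  cutoff.setUTCMonth(cutoff.getUTCMonth() - RETENTION_MONTHS);
  // Keep only entries recorded at or after the cutoff.
  return entries.filter((entry) => entry.time >= cutoff);
}

// Made-up, already truncated log entries.
const entries = [
  { ip: "203.0.113.0", time: new Date("2018-02-15T12:00:00Z") },
  { ip: "198.51.100.0", time: new Date("2018-05-20T09:30:00Z") },
];

const kept = purgeOldEntries(entries, new Date("2018-06-01T00:00:00Z"));
console.log(kept.length); // the February entry falls before the 2018-03-01 cutoff
```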

~~~
yoz-y
AFAIK whether IP addresses and user agents count as personal information is
still up for debate.

Nevertheless, it is not much work to add, so I probably will. I've already
written that kind of policy for my app: "I keep nothing because I don't collect
anything"

~~~
pluma
IP addresses combined with time stamps are personal information if there's
only one person using that IP address at that moment in time. If you don't
want to personally verify that the IP address was used by more than one person
at the given time, you should consider it personal information.

What is apparently unintuitive to some people (especially programmers) about
the GDPR definition of "personal information" is that it's not a clear-cut
list but highly contextual: if it can be used to unambiguously identify a
single individual by someone, then to that someone it is personal information
and they need to treat it as such.

Absurd example: if someone called "John Smith" writes their name on a piece of
paper and you find it ten years later, it's probably not personal information
as you have no way to determine who that name refers to or even whether it
refers to anyone at all.

Absurd counter-example: any given number can be personal information when used
to identify specific individuals in the real world (e.g. numeric user IDs), so
slipping your coworker a piece of paper with the number "12345" on it can be
personal information if both of you know who it refers to (or can look it up).

~~~
yoz-y
It is personal information but is it personally identifiable?

For example, in order to anonymize medical data you have to ensure that parties
who do have access cannot go from the data to the person. This usually means that
the database with data only has some identifiers and another database has the
correspondence of identifier -> personal information.
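A minimal sketch of that split, assuming an in-memory store: the shared records carry only an identifier, and a separate, access-controlled table maps identifiers back to people. The names (`pseudonymize`, `reidentify`, `researchData`, `identityTable`) and the sample data are hypothetical. Note that under the GDPR, data that can be re-linked like this is generally treated as pseudonymous rather than anonymous.

```javascript
// Hypothetical sketch: measurements and identities live in separate stores.
const researchData = [];         // shared with analysts: pseudonyms only, no names
const identityTable = new Map(); // kept separately: pseudonym -> personal info

let nextId = 1;

function pseudonymize(person, measurement) {
  // In practice a random, unguessable identifier would be used;
  // a counter just keeps this sketch deterministic.
  const pseudonym = `P${String(nextId++).padStart(4, "0")}`;
  identityTable.set(pseudonym, { name: person.name, birthDate: person.birthDate });
  researchData.push({ id: pseudonym, measurement });
  return pseudonym;
}

function reidentify(pseudonym) {
  // Only possible for whoever has access to identityTable.
  return identityTable.get(pseudonym);
}

pseudonymize({ name: "Alice Example", birthDate: "1980-01-01" }, { bloodPressure: 120 });
pseudonymize({ name: "Bob Example", birthDate: "1975-06-15" }, { bloodPressure: 135 });

console.log(researchData); // records contain only ids and measurements
```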

Back to the IP address topic: it is true that an IP address plus a timestamp
is enough to identify a specific computer. However, as a site owner I have no
realistic means to go from there to knowing who the person is. For example, if
somebody asked me to remove their data, they would have to tell me which IP
they had and when. (This is actually an interesting legal/technical question:
should Apache logs be purgeable?)

That being said, it seems fair to me to state that these logs are created and
kept. It would be nice to have a boilerplate paragraph to paste into sites that
only keep the Apache logs.
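One common mitigation, not something the commenter proposes, is to truncate the IP before a log line is stored, e.g. by zeroing the last IPv4 octet. A sketch assuming Apache "combined" log lines that begin with the client IPv4 address; `anonymizeLogLine` is a hypothetical helper, and whether truncation alone makes the data anonymous is itself debatable.

```javascript
// Hypothetical helper: zero the last octet of the client IPv4 address at the
// start of an Apache "combined" log line. Lines that don't start with an
// IPv4 address (e.g. IPv6 clients) are returned unchanged in this sketch.
function anonymizeLogLine(line) {
  return line.replace(/^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}(?=\s)/, "$1.0");
}

const sample =
  '203.0.113.42 - - [25/May/2018:10:00:00 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"';

console.log(anonymizeLogLine(sample));
// 203.0.113.0 - - [25/May/2018:10:00:00 +0200] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
```

In practice this would more likely run as a piped log program or at the server level, so the full address never reaches the disk at all.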

~~~
pluma
I'm using the two phrases synonymously because the GDPR protects personal
information, which is the superset (the distinction of PII as a subset seems
to be more relevant in the US).

The GDPR is concerned with privacy. If you derive information from my personal
information in a way that makes it impossible to go back from the derived
information to identifying me as an individual, it's anonymous and thus not
relevant to my privacy. However if you use this exact same process but only
use it once and record that you only did it with my information, it becomes
linked to me again and can no longer be considered anonymous.

For another example: imagine you have a closed group of 1000 participants.
"One of the guys with blonde hair" is probably a fairly ambiguous identifier
because you'd expect there to be more than one person in the group that
description could apply to. However if I'm the only blonde guy in that group,
it's now clearly referring to me as an individual and thus affecting my
privacy.
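The blonde-hair example is close to what the anonymization literature calls k-anonymity (a term the commenter doesn't use): a description only protects people if several of them share it. A sketch with a made-up group; `matchCount` and `isIdentifying` are hypothetical helpers and k = 2 is an arbitrary threshold.

```javascript
// Count how many members of a group match a description.
function matchCount(group, predicate) {
  return group.filter(predicate).length;
}

// k-anonymity style check: a description singles someone out if it matches
// at least one person but fewer than k people.
function isIdentifying(group, predicate, k = 2) {
  const n = matchCount(group, predicate);
  return n > 0 && n < k;
}

// Made-up example group.
const group = [
  { id: 1, hair: "blonde", gender: "man" },
  { id: 2, hair: "brown", gender: "man" },
  { id: 3, hair: "blonde", gender: "woman" },
  { id: 4, hair: "black", gender: "woman" },
];

const blonde = (p) => p.hair === "blonde";
const blondeGuy = (p) => p.hair === "blonde" && p.gender === "man";

console.log(isIdentifying(group, blonde));    // false: two people match
console.log(isIdentifying(group, blondeGuy)); // true: only person 1 matches
```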

The thing about storing IP plus timestamp on the other hand is that while it
may be practically anonymous to you, you're storing it. Even though you can't
resolve that information to a single person right now, someone else could if
you gave them access to it.

You can make an argument about where exactly the line should be drawn
considering it's rarely impossible that someone somewhere could use seemingly
innocuous data to identify someone but it's not much of a leap to go from an
IP address and a time stamp to a subscriber who might be a single individual:
IPs are publicly registered to ISPs and those ISPs know who they assigned the
IP to at a given point in time (especially in countries like Germany where
ISPs are required to keep records of this) so you can already easily convert
"IP plus timestamp" to "IP plus timestamp plus an organisation that is capable
of resolving that IP plus timestamp to a subscriber". In other words: an IP
plus timestamp isn't anonymous; at best it's pseudonymous (even if you have no
legal means of resolving that pseudonym).

FWIW there are free privacy policy generators out there for small scale
websites (e.g. most blogs). Here's a good one for Germany:
[https://datenschutz-generator.de/](https://datenschutz-generator.de/)

EDIT: For programmers:

You have a tuple (IP, timestamp).

IPs are publicly registered to ISPs so you can resolve that to an ISP, which
can act as a function (IP, timestamp) => subscriber.

A subscriber can be, among other things, a single individual.

So storing the IP and timestamp is the equivalent of storing an identifier
from a lookup table of subscribers (some of whom are single individuals).

Whether the result of the lookup table is accessible by legal means (e.g. a
warrant) or technical means (e.g. a decryption key) or practical means (e.g. a
literal key to a safe) makes no difference.
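That lookup chain can be modelled as two functions: a public registry that maps an address to the ISP it is assigned to, and the ISP's private records that map (IP, timestamp) to a subscriber. All data here is made up (the addresses come from the documentation ranges), `registryLookup` and `ispLookup` are hypothetical, and the registry is simplified to matching the first three octets instead of real WHOIS/RDAP data.

```javascript
// Public knowledge: which ISP an address block belongs to (made-up data,
// simplified to a lookup on the first three octets).
const registry = new Map([
  ["198.51.100", "ExampleNet ISP"],
  ["203.0.113", "OtherNet ISP"],
]);

// Private to each ISP: who held which address during which time window.
const ispRecords = {
  "ExampleNet ISP": [
    {
      ip: "198.51.100.7",
      from: Date.UTC(2018, 4, 25, 8), // 2018-05-25 08:00 UTC
      to: Date.UTC(2018, 4, 25, 20),  // 2018-05-25 20:00 UTC
      subscriber: "subscriber-4711",
    },
  ],
};

function registryLookup(ip) {
  return registry.get(ip.split(".").slice(0, 3).join("."));
}

// The ISP acts as the function (IP, timestamp) => subscriber.
function ispLookup(isp, ip, timestamp) {
  const hit = (ispRecords[isp] || []).find(
    (r) => r.ip === ip && r.from <= timestamp && timestamp < r.to
  );
  return hit ? hit.subscriber : undefined;
}

// What the site owner stores: just an (IP, timestamp) pair...
const logged = { ip: "198.51.100.7", time: Date.UTC(2018, 4, 25, 10, 30) };

// ...which anyone with access to the ISP's records can resolve.
const isp = registryLookup(logged.ip);
console.log(isp, ispLookup(isp, logged.ip, logged.time));
```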

------
tjoff
_" So I searched trough my websites: Remove Facebook Like Button, Remove
analytics, Add privacy statement, Add cookie opt-in/out …. the list goes on."_

This is exactly why GDPR is so essential.

Not a single thought about why they allowed tracking, or about the trade-offs
for others' gain, had occurred before GDPR.

For a blog GDPR is a non-issue.

Though I'm deeply disappointed in how many large sites have interpreted "opt
in" as spending 5 minutes vigorously unchecking boxes with ambiguous meanings;
hopefully that will bite them hard.

This isn't rocket science. Yes, if you go out of your way trying to game your
users' privacy as much as possible then things will get hairy. That's a feature
and the whole point.

~~~
eatitraw
Have you actually tried to read and understand GDPR before getting to this
level of self-righteousness?

A blog writer wants an easy way of tracking the popularity of their posts.
Clearly I need protection from this.

The internet is worse off after GDPR.

~~~
detaro
And they can have easy ways of tracking popularity. You can use Google
Analytics and similar products in compliant ways, and the tools for that have
existed for ages (at least in the case of GA).

~~~
eatitraw
Except that if you add a "Login with" button, you suddenly start processing
personal data (even if you only store social media IDs).

~~~
Festro
But you don't just store social media ids.

You seem to misunderstand that you're entering into a partnership with a third
party. They become a data processor on your behalf. They process a whole lot
more, and you have access to a lot of that data. Fortunately, the social media
platform includes the privacy policy and consent process in their onboarding
of users, so you don't need to worry about it for the purposes of social
login.

~~~
eatitraw
You actually need to worry, because ids are personal data.

Even if you abandon all these social media platforms and use email for login,
email addresses are still personal data, and you are still processing them.

------
skbly7
There is already quite a good plugin from the Berkman Center at Harvard
University that does what the author's idea seems to be. It works with Drupal
as well as WordPress, and I worked on it as my GSoC project (on the nginx/httpd
modules as well). [http://amberlink.org](http://amberlink.org)

It uses the Internet Archive and local copies as backends, and there were ideas
to support IPFS, among others.

What does Amber do?

Amber is an open source tool for websites to provide their visitors persistent
routes to information. It automatically preserves a snapshot of every page
linked to on a website, giving visitors a fallback option if links become
inaccessible.

If one of the pages linked to on this website were to ever go down, Amber can
provide visitors with access to an alternate version. This safeguards the
promise of the URL: that information placed online can remain there, even
amidst network or endpoint disruptions.
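Amber's actual implementation isn't shown here. As a rough illustration of the idea, a sketch that extracts absolute links from a page's HTML and builds the Wayback Machine save URL (`https://web.archive.org/save/` followed by the target URL) for each one. The regex extraction is a simplification (a real tool would use an HTML parser), and `extractLinks` and `archiveUrlsFor` are hypothetical names.

```javascript
// Hypothetical sketch: collect absolute http(s) links from an HTML string.
// Relative links are ignored; duplicates are removed via a Set.
function extractLinks(html) {
  const links = new Set();
  const hrefPattern = /href\s*=\s*["'](https?:\/\/[^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    links.add(match[1]);
  }
  return [...links];
}

// Build a Wayback Machine "save" URL for every linked page. Requesting one of
// these URLs is what asks the Internet Archive to capture a snapshot.
function archiveUrlsFor(html) {
  return extractLinks(html).map((url) => "https://web.archive.org/save/" + url);
}

const page = `
  <p>See <a href="https://example.com/post">this post</a>
  and <a href='http://example.org/'>this site</a>,
  or <a href="/about">our about page</a>.</p>`;

console.log(archiveUrlsFor(page));
```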

------
pmlnr
I'd argue that running a webserver on a VPS/Raspberry/etc, with no logs,
serving a static site or a wget mirrored version of that site is better - you
still own the site, but if you don't store any information of your visitors at
all, in any form, there's no way GDPR will bite you.

------
NVRM
The tooling is ridiculous when we can do this with a couple of bare-bones
bookmarklets.

Save url to archive:

javascript:void(window.open('https://web.archive.org/save/'+location.href));

Search the archive for url:

javascript:void(window.open('https://web.archive.org/web/*/'+location.href));

xD
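In the spirit of those bookmarklets, a sketch that applies the same two URL patterns to a whole list of pages with a plain for loop, e.g. to check or archive every post of a blog before shutting it down. `waybackUrls` is a hypothetical helper and the post URLs are placeholders.

```javascript
// Build Wayback Machine save and search URLs for a list of page URLs,
// using the same patterns as the bookmarklets.
function waybackUrls(pageUrls) {
  const result = [];
  for (const url of pageUrls) {
    result.push({
      save: "https://web.archive.org/save/" + url,
      search: "https://web.archive.org/web/*/" + url,
    });
  }
  return result;
}

// Placeholder URLs for the posts of a blog that is about to be closed.
const posts = [
  "https://blog.example.com/posts/first/",
  "https://blog.example.com/posts/second/",
];

for (const { save, search } of waybackUrls(posts)) {
  console.log(save);
  console.log(search);
}
```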

~~~
NVRM
The people downvoting don't even know how to write a for loop...

------
Aardwolf
A blog only displays public content on a page right? How can that be affected?

~~~
mikejb
By adding a shit-ton of tracking and analytic code from 3rd parties that
provide some convenience. Blog owners are worried because it's not always
clear what happens through the imported 3rd-party code, and now they have to
start caring about that.

~~~
pmlnr
There's Matomo[^1] and ancient tools like awstats[^2] which are self-hosted
and can be configured to be completely GDPR friendly.

I thought the "need" for a silly amount of analytics died with third-party
website visitor counters back in the day.

[^1]: [https://matomo.org/](https://matomo.org/)

[^2]: [http://www.awstats.org/](http://www.awstats.org/)

------
uhnuhnuhn
I've just about had it with these GDPR shitposts on HN.

The website the author talks about (datenschutzhelden.de meaning "data
protection heroes") was apparently a platform to share tools and best
practices for online privacy. Now it turns out the same guy running that
website thinks it's too much hassle to remove Facebook integration and offer a
cookie opt-out.

That's truly next level hypocrisy.

~~~
oytis
I can't get into Simon's head, but I've got a feeling that GDPR contradicts
the basic maxim of hacker culture that my computer belongs to me. The website
seems to teach how those who care enough can protect their privacy by taking
control of their own computers, not by imposing requirements on others'.

~~~
pluma
Your computer belongs to you. My data belongs to me. I can give you my data
and you can keep that data if you tell me what you are going to keep it for
and how you are going to use it and when I agree with all of that, but you
don't get to abuse it for anything else and I can revoke that permission at
any moment and you have to comply.

It's not "imposing requirements", it's called "respecting consent".

The more I hear arguments like this the more it reinforces my impression that
"hacker culture" isn't really about experimenting with technology but more
about self-entitled rich kids abusing other people and shared property for
their own fun and profit (like young Zuckerberg marveling at being trusted
with access to people's private information without understanding the implied
mutual understanding his users assumed to be self-evident).

I feel like the GDPR is the Code of Conduct of privacy laws: it codifies a
modicum of respect that shouldn't need explicit mention but seems to have
been entirely lost on entire generations of (aspiring) Silicon Valley hacker
types and thus catches them by surprise when it really should be the least you
can do.

At the very least you are now aware that when you're violating your users'
privacy (if only by handing off their data to random BigCo's you have no
formal contract with) you're breaking the law just as clearly as those cool
'80s kids were breaking the law when they whistled into phones to cheat their
way to free phone calls.

~~~
oytis
Not to start a long philosophical discussion, but hacker culture (you might
not like it, but the author seems to be sympathetic to it) has traditionally
been critical of the notion of 'intellectual property', that is, that by
creating some intellectual work I can prohibit others from redistributing it.
The idea that I 'own' my personal data seems to go a step further in diluting
the notion of property: this time I don't even need to create anything to
impose limitations on others.

It is also not about 'rich' and 'poor', it's about clear rules that are the
same for the rich and for the poor alike.

~~~
pluma
I would have considered myself a "hacker" in my teenage years when I was
teaching myself programming by digging through language specs online and
looking at other people's code to understand what makes it work.

However it seems that "hacker culture" as the author likely sees it (also as
described in Steven Levy's "Hackers") is really more about privilege than
anything else. A lot of the antics that have entered hacker lore were only
possible because the kids performing them were in relatively risk-free
environments (particularly the notorious MIT Tech Model Railroad Club). Not
necessarily privilege in the modern social sense but certainly in the sense of
class (unless you believe being able to study at MIT is 100% about merit and
nothing else).

It doesn't matter whether the "rules" of hacker culture are the same for those
with privilege and those without: just as in startup culture, you're far
freer to experiment if you have a safe environment to fall back on if you
screw up. If you're an MIT kid with wealthy parents a botched prank is less
likely to land you in jail and this knowledge allows you to take risks more
easily.

Sure, there's a level of anarchism in hacker culture but too often the kind of
"hacking" that lands you venture capital for your startup (especially "growth
hacking") also includes a blatant disregard for others (again remember
Zuckerberg and the "suckers").

You may argue that this is a deviation from the original hacker ethos or not
"true hacking" but there doesn't seem to be anything in hacker culture to
exclude these people by (which is why I mentioned the formal rules you now
often find in codes of conduct, which many decry as superfluous and
unnecessary because they seem to state the obvious).

As to your real point: the idea of owning data is the polar opposite of what
copyright has come to be about (at least in the US): data is owned by the
individual. You can grant a company usage rights but they're always highly
specific and easily revocable. Personal data is not "intellectual property",
it's an aspect of your own identity.

In the years since the "Social Web" we've seen many failed attempts to allow
users to "reclaim" ownership of their data. Microformats, decentralisation,
software like Diaspora, the Unhosted movement, and so on. Most of them failed
for practical reasons. Few of them really addressed privacy concerns, even
fewer really enforced data ownership. The GDPR is promising to accomplish what
hundreds and thousands of hackers have tried to do for years: not by rebelling
against the BigCo's, but by redefining privacy and data ownership as human
rights.

If you understand hacker culture, you will also remember that before the
Social Web the norm was to be anonymous: "on the Internet nobody knew you were
a dog", "men were men, women were men and 14 year old girls were FBI agents".
You'd go by pseudonyms by default and freely pick new ones to swap identities.
Unmasking people was possible, to a degree, but difficult because of dial-up
and dynamic IPs.

Nowadays every single coffee pot in your home could theoretically have a
dedicated IP address and most of the Internet we use to share information is
accessed using a browser that's often uniquely identifiable without even
looking at the IP. It's no longer enough to rely on technology to grant us
anonymity. The GDPR restores some of that early '90s anonymity. Not by
outlawing technology but by enshrining new human rights and forcing us to
respect them.

/rant

~~~
oytis
I wouldn't agree that these attempts failed completely. Just as the whole free
software world works, they created better and better tools that at some point
could have become good enough to actually protect one's privacy, and at a
later point could have become usable by non-hackers as well.

Now, at a time when it's easier than ever for every (well, not every every,
but you get my point) schoolboy or schoolgirl to create their own standalone
page with comments, their own e-mail server and whatever else they want, they
will probably not be able to do so without risking being drowned in an
Abmahnungswelle (a wave of cease-and-desist letters). Not to mention that all
decentralized social network projects are at risk for approximately the same
reason.

One might hope that in the future we'll have a reproducible technology for
creating GDPR-proof websites and the world will be a happy place again, but
solving legal issues with code is a notoriously difficult problem. Legislative
acts are not code, and something as vague as GDPR is not even a spec.

